paper-autoraters

Run the four paper-quality autoraters from PaperOrchestra (arXiv:2604.05018, App. F.3) — Citation F1 (P0/P1 partition + Precision/Recall/F1), Literature Review Quality (6-axis 0-100 with anti-inflation rules), SxS Overall Paper Quality (side-by-side), and SxS Literature Review Quality (side-by-side). TRIGGER when the user asks to "score this paper draft", "evaluate against the benchmark", "compare two papers", or "run the autoraters".

Install: claude install-skill Aukexecutivedepartment5152/PaperOrchestra
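
Each autorater returns JSON (the shapes are listed in the autorater table below), so a host agent will usually want to check a judge reply before using it. The following is a minimal sketch of such a check, not part of the skill itself: the helper names `parse_partition` and `parse_winner` are hypothetical, and the only facts taken from the skill are the allowed labels (`P0`/`P1`) and the allowed `winner` values (`paper_1`, `paper_2`, `tie`).

```python
import json

# Allowed values, taken from the autorater output descriptions.
VALID_TIERS = {"P0", "P1"}
VALID_WINNERS = {"paper_1", "paper_2", "tie"}


def parse_partition(raw: str) -> dict[str, str]:
    """Parse a Citation F1 partition reply (hypothetical helper).

    Expects a JSON object mapping reference numbers to "P0" or "P1".
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("partition reply must be a JSON object")
    bad = {
        key: value
        for key, value in data.items()
        if not isinstance(value, str) or value not in VALID_TIERS
    }
    if bad:
        raise ValueError(f"unexpected tier labels: {bad}")
    return dict(data)


def parse_winner(raw: str) -> str:
    """Parse a side-by-side reply and return its `winner` (hypothetical helper)."""
    data = json.loads(raw)
    winner = data.get("winner") if isinstance(data, dict) else None
    if not isinstance(winner, str) or winner not in VALID_WINNERS:
        raise ValueError(f"unexpected winner value: {winner!r}")
    return winner


if __name__ == "__main__":
    print(parse_partition('{"1": "P0", "2": "P1"}'))
    print(parse_winner('{"winner": "tie"}'))
```

Rejecting malformed replies early keeps a single bad judge response from silently skewing an aggregate score; whether to retry or discard such a reply is left to the caller.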

# Paper Autoraters (App. F.3)

Faithful implementation of the four LLM-as-judge autoraters used in PaperOrchestra (Song et al., 2026, arXiv:2604.05018, §5 and App. F.3). These are the metrics the paper uses to demonstrate that PaperOrchestra beats single-agent and AI-Scientist-v2 baselines. Use them to:

1. Score a generated paper against a ground-truth paper.
2. Compare two paper-writing pipelines side-by-side.
3. Validate your own host-agent execution of the paper-orchestra pipeline.

## The four autoraters

| Autorater | What it does | Inputs | Output |
|---|---|---|---|
| **Citation F1: P0/P1 partition** | Partitions reference list into P0 (must-cite) and P1 (good-to-cite) given the paper text | one paper text + its references list | JSON `{ref_num: "P0"\|"P1"}` |
| **Literature Review Quality** | 6-axis 0-100 score for Intro+Related Work, with anti-inflation hard caps | one paper PDF/text + reference avg citation count | JSON with `axis_scores`, `penalties`, `summary`, `overall_score` |
| **SxS Overall Paper Quality** | Holistic side-by-side preference judgment | two papers (PDF or text) | JSON with `winner` ∈ {paper_1, paper_2, tie} |
| **SxS Literature Review Quality** | Side-by-side preference, Intro+Related Work only | two papers | JSON with `winner` ∈ {paper_1, paper_2, tie} |

The paper uses Gemini-3.1-Pro and GPT-5 as judges, set to temperature 0.0 (Gemini) or default 1.0 (GPT-5, which doesn't allow temperature adjustment). Use whatever your host LLM is.

## Wor