pdf-cascade

Acquire PDFs for bibliographic references via a strict source cascade: 8 sources by default (Crossref OA → arXiv → OpenAlex → Unpaywall → HAL → CORE → archive.org → WebSearch queue), or 10 when Sci-Hub and Anna's Archive are enabled in opt-in mode. Each acquired PDF is validated against the expected author, title, and year (page 1 anti-homonymy check) before being accepted. Trigger this skill whenever the user wants to download a PDF for a reference, fill a cascade, retry an acquisition, or push a `candidate`/`uid_resolved` ref forward in the FSM. Also use it for `/paper-trail:cascade <slug>` and `/paper-trail:reactivate-ocr`. Triggers on French and English phrases: "télécharge le PDF", "lance la cascade", "acquérir les sources", "DL ce papier", "passer en pdf_acquired", "valider page 1", "reprise OCR", "download this paper", "acquire PDFs", "run cascade", "retry acquisition", "advance candidates". The skill never decides whether a citation is truthful; that is the curator's role (sota-auditor skill). It only executes the technical state transitions of the wo
roomi-fields/paper-trail · ★ 3 · Data & Documents · score 76

Install: claude install-skill roomi-fields/paper-trail

# Skill: pdf-cascade

## Purpose

Wraps the paper-trail worker B's acquisition cascade. Given a single reference slug or a state filter, it advances the matching refs from `candidate` toward `page1_validated` through the FSM, with strict page 1 anti-homonymy validation. Anchors all downloads in the local registry (`pdf_path`, `pdf_sha256`, `acquisition_attempts[]`) so the curator can audit everything.

## When to invoke

Trigger this skill for any of:

- The user wants to fetch a PDF for a ref by slug
- The user wants to push the whole batch of `candidate` or `uid_resolved` refs forward
- The user explicitly calls `/paper-trail:cascade`, `/paper-trail:reactivate-ocr`, or `/paper-trail:status`
- `sota-writer` sub-task needs PDFs acquired for its proposed candidates

Do NOT invoke for semantic decisions (is this citation correct?); that belongs to `sota-auditor`.

## How it works

The skill delegates to the worker B Python CLI:

```bash
# Single ref by slug
python -m pipeline run --ref <slug>

# Batch by state filter
python -m pipeline run --state candidate --limit 50

# Dry-run (no mutation)
python -m pipeline run --state candidate --dry-run

# Reactivate refs waiting for OCR
python -m pipeline reactivate-ocr
```

The CLI invokes the 8-source cascade (or 10 sources if `RESEARCH_ENABLE_SHADOW_LIBS=1`; see DISCLAIMER.md). Each acquired PDF must pass page 1 validation (author + title similarity ≥ 0.3 + zero off-domain keywords) before being accepted into the registry.

##
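The page 1 acceptance rule described above (author match, title similarity ≥ 0.3, zero off-domain keywords) can be sketched as follows. This is a minimal illustration, not the worker B implementation: the token-overlap similarity metric, the surname-based author check, and the names `title_similarity`, `validate_page1`, and `off_domain_keywords` are all assumptions made for the example. Only the 0.3 threshold and the three criteria come from the text.

```python
from __future__ import annotations

import re

# Threshold quoted in the skill description; everything else here is assumed.
TITLE_THRESHOLD = 0.3


def tokens(text: str) -> set[str]:
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def title_similarity(expected_title: str, page1_text: str) -> float:
    """Fraction of expected-title tokens found on page 1 (hypothetical metric)."""
    title = tokens(expected_title)
    if not title:
        return 0.0
    return len(title & tokens(page1_text)) / len(title)


def validate_page1(
    page1_text: str,
    author_surname: str,
    expected_title: str,
    off_domain_keywords: list[str],
) -> bool:
    """Accept a PDF only if all three page 1 criteria hold."""
    page = tokens(page1_text)
    surname = tokens(author_surname)
    # 1. Every token of the expected surname appears on page 1.
    author_ok = bool(surname) and surname <= page
    # 2. Title similarity reaches the threshold.
    title_ok = title_similarity(expected_title, page1_text) >= TITLE_THRESHOLD
    # 3. No off-domain keyword appears on page 1 (anti-homonymy guard).
    off_domain_hits = page & {k.lower() for k in off_domain_keywords}
    return author_ok and title_ok and not off_domain_hits


if __name__ == "__main__":
    # Invented sample page, for illustration only.
    page1 = "A Survey of Graph Neural Networks\nJane Doe, University of Example"
    print(validate_page1(page1, "Doe", "A Survey of Graph Neural Networks", ["oncology"]))
```

The off-domain keyword list is what separates homonyms in this sketch: a paper by another "Doe" in an unrelated field would share the surname but trip the keyword check or fail the title threshold.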