Install: claude install-skill Aperivue/medsci-skills
# Fulltext Retrieval Skill
Batch download open-access full-text PDFs from a DOI list using legitimate OA APIs only.
## Pipeline
```
DOI list → Unpaywall → PMC (Europe PMC / OA FTP / web) → OpenAlex → Crossref → landing page
```
Each DOI is tried against these sources in order until one returns a valid PDF (at least 10 KB and starting with the `%PDF-` header).
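The fallback logic and the validity check can be sketched as follows. This is a minimal illustration, not the actual code in `fetch_oa.py`: the names `is_valid_pdf`, `first_valid_pdf`, and `MIN_PDF_BYTES` are hypothetical, each source is assumed to be a callable that returns raw bytes or `None`, and "10 KB" is read here as 10 × 1024 bytes.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumption: "10 KB" is interpreted as 10 * 1024 bytes.
MIN_PDF_BYTES = 10 * 1024

# Hypothetical type: a fetcher takes a DOI and returns bytes, or None on a miss.
Fetcher = Callable[[str], Optional[bytes]]


def is_valid_pdf(data: bytes) -> bool:
    """Accept only payloads that start with %PDF- and meet the size floor."""
    return len(data) >= MIN_PDF_BYTES and data.startswith(b"%PDF-")


def first_valid_pdf(
    doi: str, sources: Iterable[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, bytes]]:
    """Try each (name, fetcher) pair in order; stop at the first valid PDF."""
    for name, fetch in sources:
        try:
            data = fetch(doi)
        except Exception:
            # A failing source should not abort the chain; move to the next one.
            continue
        if data is not None and is_valid_pdf(data):
            return name, data
    return None
```

Checking the header and the size together is what filters out HTML error pages and challenge pages that a server may return with a success status.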
## Quick Start
```bash
# Prepare a DOI list (one per line)
cat > dois.txt << 'EOF'
10.1007/s00330-010-1783-x
10.1002/mp.12524
10.1148/radiol.13131265
EOF
# Run
python fetch_oa.py dois.txt --output pdfs/ --email your@email.com
# Verbose mode for debugging
python fetch_oa.py dois.txt -o pdfs/ -e your@email.com --verbose
```
## Input Formats
**Plain text** — one DOI per line:
```
10.1007/s00330-010-1783-x
10.1002/mp.12524
```
**TSV with header** — must contain a `DOI` column, optional `PMID` column:
```tsv
ID	Title	DOI	PMID	Year
1	Some paper	10.1007/s00330-010-1783-x	20628747	2010
```
When a PMID is available, the PMC lookup is more reliable because the PMID can be converted directly to a PMCID.
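One way to accept both input formats is to inspect the first non-empty line for a `DOI` column. The sketch below is an assumption about how such parsing might work, not the parser used by `fetch_oa.py`; the function name `parse_doi_input` and the returned record shape are hypothetical.

```python
from typing import Dict, List, Optional


def parse_doi_input(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse a plain DOI list or a tab-separated table with a DOI column.

    Returns one record per DOI: {"doi": ..., "pmid": ... or None}.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    header = [cell.strip() for cell in lines[0].split("\t")]
    if "DOI" not in header:
        # Plain text: every non-empty line is a DOI.
        return [{"doi": ln.strip(), "pmid": None} for ln in lines]

    doi_idx = header.index("DOI")
    pmid_idx = header.index("PMID") if "PMID" in header else None

    records: List[Dict[str, Optional[str]]] = []
    for ln in lines[1:]:
        cells = [cell.strip() for cell in ln.split("\t")]
        doi = cells[doi_idx] if doi_idx < len(cells) else ""
        if not doi:
            continue  # skip rows without a DOI
        pmid = None
        if pmid_idx is not None and pmid_idx < len(cells):
            pmid = cells[pmid_idx] or None
        records.append({"doi": doi, "pmid": pmid})
    return records
```

Keeping the PMID next to each DOI lets a later step choose the PMID-based PMC lookup when one is present.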
## PMC Download (JS-Challenge Resistant)
PMC web pages may block automated downloads with JavaScript proof-of-work challenges. This tool uses three fallback methods:
### Method A: Europe PMC REST API (most reliable)
```bash
PMCID="PMC9733600"
curl -sLo output.pdf \
"https://europepmc.org/backend/ptpmcrender.fcgi?accid=${PMCID}&blobtype=pdf"
```
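For use from Python, the same request can be sketched with the standard library only. The endpoint and query parameters are taken from the `curl` command above; the function names, the `User-Agent` string, and the normalization of a bare numeric ID to a `PMC`-prefixed one are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import urlencode

EUROPE_PMC_RENDER = "https://europepmc.org/backend/ptpmcrender.fcgi"


def europe_pmc_pdf_url(pmcid: str) -> str:
    """Build the Europe PMC render URL for a PMCID such as PMC9733600."""
    pmcid = pmcid.strip().upper()
    if not pmcid.startswith("PMC"):
        # Assumption: a bare numeric ID is meant as a PMCID.
        pmcid = "PMC" + pmcid
    return EUROPE_PMC_RENDER + "?" + urlencode({"accid": pmcid, "blobtype": "pdf"})


def download_pdf(pmcid: str, dest: str, timeout: float = 30.0) -> int:
    """Download the PDF to dest and return its size in bytes.

    Raises ValueError if the response does not start with the %PDF- header.
    """
    request = urllib.request.Request(
        europe_pmc_pdf_url(pmcid),
        headers={"User-Agent": "fulltext-retrieval-sketch"},  # hypothetical value
    )
    # urlopen follows redirects by default, similar to curl -L.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"Response for {pmcid} is not a PDF")
    with open(dest, "wb") as fh:
        fh.write(data)
    return len(data)
```

A call such as `download_pdf("PMC9733600", "output.pdf")` would mirror the shell example, with the header check guarding against saving an HTML error page.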
### Method B: PMC OA FTP Service
```bash
curl -s "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=${PMCID}" | \
grep -oE 'hr