Install: `claude install-skill jacob-balslev/skill-graph`
# Link-Rot Detection
## Coverage
- Markdown link extraction: pulling every inline `[text](url)` link and every `[text][ref]` reference link out of every `.md` and `.mdx` file in the content tree
- External vs internal classification: what counts as external (different host) and what gets skipped (relative path, anchor link, `mailto:`, `tel:`)
- Status-code interpretation: 200 is healthy; 301/308 is a permanent redirect (record the new URL but don't fail); 302/307 is a temporary redirect (treat as transient and re-check next run); 404/410 is dead (flag); 5xx is transient (retry with backoff before flagging)
- Soft-404 detection: pages that return HTTP 200 but render an "unknown page" interstitial (compare body length and content against known soft-404 patterns)
- Rate-limiting and politeness: a concurrent request budget per origin host, honoring the `robots.txt` `Crawl-delay` directive, exponential backoff on 429
- Reporting shape: the scanner's output is a structured report (JSON + markdown summary), not a live alert; the report is the audit artifact
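
As a rough illustration of the first three items above (extraction, external/internal classification, and status-code interpretation), here is a minimal Python sketch. It is not the skill's actual implementation: the function names `extract_urls`, `is_external`, and `classify_status`, the regular expressions, and the returned labels are all assumptions made for this example, and the sketch does no network I/O.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for illustration; a real scanner would likely use a
# proper markdown parser instead of regular expressions.
INLINE_LINK = re.compile(r'\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)')          # [text](url "title")
REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]*)\]')                     # [text][ref] or [text][]
REF_DEF = re.compile(r'^ {0,3}\[([^\]]+)\]:\s*(\S+)', re.MULTILINE)   # [ref]: url


def extract_urls(markdown: str) -> list[str]:
    """Return inline link targets, then the targets of resolved reference links."""
    urls = [m.group(1) for m in INLINE_LINK.finditer(markdown)]
    # Reference labels are matched case-insensitively.
    definitions = {label.lower(): url for label, url in REF_DEF.findall(markdown)}
    for text, ref in REF_USE.findall(markdown):
        label = (ref or text).lower()  # "[text][]" reuses the link text as the label
        if label in definitions:
            urls.append(definitions[label])
    return urls


def is_external(url: str, site_host: str) -> bool:
    """True only for http(s) URLs whose host differs from the site's own host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # relative paths, #anchors, mailto:, tel: are skipped
    return parsed.hostname is not None and parsed.hostname.lower() != site_host.lower()


def classify_status(status: int) -> str:
    """Map an HTTP status code to a scanner verdict (labels are assumptions)."""
    if status == 200:
        return "healthy"
    if status in (301, 308):
        return "redirect"      # record the new URL, don't fail
    if status in (302, 307):
        return "recheck"       # transient: look again next run
    if status in (404, 410):
        return "dead"          # flag in the report
    if status == 429 or 500 <= status <= 599:
        return "retry"         # back off and retry before flagging
    return "unclassified"      # codes the list above does not cover
```

A caller would run `extract_urls` over each file, keep only the URLs for which `is_external(url, "example.com")` is true (with the site's own host in place of `example.com`), and pass each response code to `classify_status`. One known limitation of the regex approach: URLs that contain parentheses are cut off at the first `)`.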
## Philosophy
External links rot. Every site that's older than two years has at least a few. The choice is between knowing about them on a schedule or finding out from a user. A periodic scan with a published report turns link health into a maintenance task instead of an emergency. The discipline is to be conservative — distinguish persistent failures (404 across 3 runs) from transient ones (one 503 on a Tuesday) — and to never let the scanner itself become a denial-of-service vector against the targets it