skill-scorerlisted

Rates any SKILL.md on a 0-100 scale across 10 dimensions with a SHIP / REWORK / SCRAP verdict. Evaluates trigger precision, instruction clarity, output predictability, edge case coverage, anti-hallucination guardrails, developer experience, composability, open-source readiness, wow factor, and real-world utility. No flattery — calibrated against Anthropic's own skill-creator rubric. Use this skill whenever someone says "score this skill", "rate my skill", "is this skill good", "skill review", "skill audit", "roast my SKILL.md", "grade this", "will this skill work", "evaluate my skill", "how good is this", or pastes a SKILL.md and asks for feedback. Also trigger when comparing two skills, benchmarking a skill collection, or asking "what's wrong with this skill" — even without the word "score".
mturac/simulacra · ★ 64 · AI & Automation · score 82

Install: claude install-skill mturac/simulacra

# Skill Scorer — 0-100 Rating for Any SKILL.md Calibrated against Anthropic's own skill-creator guidelines and the patterns of 50k+ star repos (Superpowers, Caveman, mattpocock/skills). Average skill scores 45-55. If yours scores 80+, ship it. --- ## Dependencies None. Standalone. Reads any SKILL.md content. --- ## CRITICAL: Auto-start SKILL.md content in the message or just created in conversation → skip to Step 2. No preamble. --- ## Step 1. Get the skill Content in context? Use it. Otherwise: > Paste your SKILL.md or tell me which skill to score. If the user names a skill in the current environment, read it. --- ## Step 2. Score 10 Dimensions (each 0-10) Be precise. 6.5 is 6.5, not 7. Every score needs one sentence of evidence referencing the actual skill content. ### D1: Trigger Precision (0-10) *Will it fire when it should? Will it stay quiet when it shouldn't?* Evaluate the YAML `description` field: - Specific trigger phrases listed? (+2) - Natural language variations covered? (+2) - "Pushy" quality per Anthropic guidance? (+2) - Edge-case triggers (bare paste, implicit intent)? (+2) - False positive risk controlled? (+2) | Range | Calibration | |-------|-------------| | 0-3 | Vague description, undertriggers or fires on everything | | 4-6 | Some triggers but gaps in natural phrasing | | 7-8 | Solid coverage, most natural phrasings caught | | 9-10 | Comprehensive — includes implicit triggers and edge cases | ### D2: Instruction Clarity (0-10) *Can Cla