self-eval
Honestly evaluate AI work quality using a two-axis scoring system. Use after completing a task, code review, or work session to get an unbiased assessment. Detects score inflation, forces devil's advocate reasoning, and persists scores across sessions.
Quality Score: 96/100
Details
- Author: alirezarezvani
- Repository: alirezarezvani/claude-skills
- Created: 7 months ago
- Last Updated: 3 days ago
- Language: Python
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
evaluate
Comprehensive quality grading. Checks prompt compliance, code quality, security, test coverage, architecture fitness. Produces a percentage score. Not lenient. Keywords: evaluate, grade, check, verify, validate, scorecard, quality, percentage, score, how good
agentic-eval
Evaluate and improve AI-generated output with explicit rubrics, reflection loops, and stop conditions. Use when building self-critique workflows, evaluator-optimizer pipelines, or acceptance gates for code, docs, analysis, or plans.
eval-skills
Audit all skills in the current project for frontmatter completeness, effort level appropriateness, allowed-tools scoping, and content quality. Produces a scored report with effort-level recommendations for each skill. Use when onboarding to a new project, reviewing skill quality before shipping, or adding effort fields to an existing skill library.
ai-evals
Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure AI output quality.
ai-reliability-eval
Measures AI system reliability over time by defining pass/fail criteria before implementation, running capability checks, and tracking regression via pass@k metrics. Trigger for 'how reliable is this', 'did my changes break anything', 'measure AI performance', 'define success criteria', 'eval this feature', 'check skill regression'. Not for code correctness; use /ai-test instead. Not for quality gates; use /ai-verify instead — evals measure AI task completion consistency.
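The ai-reliability-eval description mentions tracking regression via pass@k metrics. As a rough illustration only, the sketch below uses the commonly cited combinatorial estimator for pass@k; whether that skill computes it this way is an assumption, and the function name `pass_at_k` and its parameters are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts passes.

    n: total recorded attempts, c: attempts that passed, k: sample size.
    Hypothetical helper; not taken from any of the listed skills.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # With fewer than k failures, every k-sized sample contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k sampled attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 3 passes out of 10 attempts: pass@1 is roughly 0.3.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

Comparing the value before and after a change, on the same pass/fail criteria, gives one simple regression signal.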