evaluation-methodology

Solid

PluginEval quality methodology — dimensions, rubrics, statistical methods, and scoring formulas. Use this skill when understanding how plugin quality is measured, when interpreting a low score on a specific dimension, when deciding how to improve a skill's triggering accuracy or orchestration fitness, when calibrating scoring thresholds for your marketplace, or when explaining quality badges to external partners like Neon.

AI & Automation · 36,649 stars · 3,968 forks · Updated today · MIT


Quality Score: 93/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Evaluation Methodology

This document is the authoritative reference for how PluginEval measures plugin and skill quality. It covers the three evaluation layers, all ten scoring dimensions, the composite formula, badge thresholds, anti-pattern flags, Elo ranking, and actionable improvement tips.

Related: [Full rubric anchors](references/rubrics.md)

---

## The Three Evaluation Layers

PluginEval stacks three complementary layers. Each layer produces a score between 0.0 and 1.0 for each applicable dimension, and later layers override or blend with earlier ones according to per-dimension blend weights.

### Layer 1: Static Analysis

**Speed:** < 2 seconds. No LLM calls. Deterministic.

The static analyzer (`layers/static.py`) runs six sub-checks directly against the parsed SKILL.md:

| Sub-check | What it measures |
|---|---|
| `frontmatter_quality` | Name presence, description length, trigger-phrase quality |
| `orchestration_wiring` | Output/input documentation, code block count, orchestrator anti-pattern |
| `progressive_disclosure` | Line count vs. sweet-spot (200–600 lines), references/ and assets/ bonuses |
| `structural_completeness` | Heading density, code blocks, examples section, troubleshooting section |
| `token_efficiency` | MUST/NEVER/ALWAYS density, duplicate-line repetition ratio |
| `ecosystem_coherence` | Cross-references to other skills/agents, "related"/"see also" mentions |

These six sub-checks feed directly into six of the ten final dimensions (via `...
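
To make the `token_efficiency` row more concrete, here is a minimal Python sketch of the two raw signals it names: MUST/NEVER/ALWAYS density and duplicate-line repetition ratio. The function names, the regular expression, and the exact definitions of "density" and "ratio" are assumptions made for illustration. They are not taken from `layers/static.py`, and how PluginEval turns such signals into a 0.0 to 1.0 score is not shown in this preview.

```python
import re
from collections import Counter

# Hypothetical pattern: upper-case directive words as whole words only.
DIRECTIVE_RE = re.compile(r"\b(MUST|NEVER|ALWAYS)\b")


def directive_density(text: str) -> float:
    """Directive words (MUST/NEVER/ALWAYS) per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(DIRECTIVE_RE.findall(text)) / len(words)


def duplicate_line_ratio(text: str) -> float:
    """Fraction of non-blank lines that repeat an earlier non-blank line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    # Each line seen n times contributes n - 1 repeats.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(lines)


# Illustrative input only, not a real SKILL.md.
sample = "You MUST do this.\nYou MUST do this.\nNEVER skip tests.\n"
density = directive_density(sample)
dup_ratio = duplicate_line_ratio(sample)
```

On the sample above, three of the eleven words are directives and one of the three lines is a repeat. A real analyzer would presumably map these raw ratios onto a score with thresholds or penalties, which this sketch deliberately leaves out.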

Details

Author: wshobson
Repository: wshobson/agents
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT
