bernstein-quality

Solid

Show quality metrics for Bernstein runs - success rates per model, lint/test pass rates, completion time distributions. Use when the user asks about quality, reliability, which model performs best, or pass rates.

AI & Automation · 538 stars · 44 forks · Updated today · Apache-2.0

Quality Score: 89/100

Stars (weight 20%): 91
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 64
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Bernstein Quality Metrics

Analyze quality and reliability of agent-generated code.

## When to Use

- User asks "how reliable are the agents?" or "which model is best?"
- User wants success rates, pass rates, or completion time stats
- User asks about test failures or lint issues across models
- User says "show me quality metrics"

## Instructions

1. Run `scripts/quality.sh metrics` for overall quality metrics.
2. Run `scripts/quality.sh pass-rates` for lint/typecheck/test pass rates by model.
3. Run `scripts/quality.sh times` for completion time distributions.
4. Present a quality dashboard:

```
## Quality Dashboard

### Success Rate by Model
| Model | Tasks | Success | Fail | Rate |
|-------|-------|---------|------|------|
| claude-sonnet-4 | 24 | 22 | 2 | 91.7% |
| gpt-4.1 | 12 | 10 | 2 | 83.3% |

### Pass Rates
| Check | Overall | claude-sonnet-4 | gpt-4.1 |
|-------|---------|-----------------|---------|
| Lint | 96% | 98% | 92% |
| Type-check | 88% | 91% | 83% |
| Tests | 85% | 89% | 75% |

### Completion Times
| Percentile | Time |
|------------|------|
| p50 | 3m 20s |
| p90 | 8m 45s |
| p99 | 15m 12s |
```

5. Highlight any models with significantly lower pass rates.
6. Recommend model routing adjustments if one model consistently underperforms.
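The dashboard figures in the instructions above can be derived from raw per-model counts and task durations. The sketch below is illustrative only: the output format of `scripts/quality.sh` is not documented here, so `ModelStats`, `render_table`, `underperformers`, and `percentile` are hypothetical helpers, and the 5-point gap used to flag a model is an arbitrary threshold, not one the skill defines.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Per-model task counts (hypothetical shape, not the script's real output)."""
    model: str
    success: int
    fail: int

    @property
    def tasks(self) -> int:
        return self.success + self.fail

    @property
    def rate(self) -> float:
        # Success rate as a percentage; 0.0 when no tasks ran.
        return 100.0 * self.success / self.tasks if self.tasks else 0.0


def render_table(stats: list[ModelStats]) -> str:
    """Render the 'Success Rate by Model' table, best model first."""
    lines = [
        "| Model | Tasks | Success | Fail | Rate |",
        "|-------|-------|---------|------|------|",
    ]
    for s in sorted(stats, key=lambda s: s.rate, reverse=True):
        lines.append(
            f"| {s.model} | {s.tasks} | {s.success} | {s.fail} | {s.rate:.1f}% |"
        )
    return "\n".join(lines)


def underperformers(stats: list[ModelStats], gap: float = 5.0) -> list[str]:
    """Return models trailing the best model by more than `gap` percentage points.

    The default gap is an assumed threshold chosen for illustration.
    """
    if not stats:
        return []
    best = max(s.rate for s in stats)
    return [s.model for s in stats if best - s.rate > gap]


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Counts taken from the sample dashboard in the instructions.
    sample = [ModelStats("claude-sonnet-4", 22, 2), ModelStats("gpt-4.1", 10, 2)]
    print(render_table(sample))
    print("Flagged:", underperformers(sample))
```

With the sample counts, `gpt-4.1` trails `claude-sonnet-4` by about 8 points and is the only model flagged; a real routing recommendation would also want enough tasks per model for the difference to be meaningful.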

Details

Author: sipyourdrink-ltd
Repository: sipyourdrink-ltd/bernstein
Created: 2 months ago
Last Updated: today
Language: Python
License: Apache-2.0

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

bernstein-cost

Show detailed cost breakdown and budget status for the Bernstein orchestrator. Use when the user asks about spending, budget, cost per model, cost per agent, or wants a cost projection.

538 Updated today
sipyourdrink-ltd
AI & Automation Listed

benchmark

Run metric quality benchmark, store results, and compare against previous runs. Invoke with /benchmark, "run benchmark", "benchmark metrics", "check metric quality".

0 Updated today
ichabodcognate315
AI & Automation Solid

bernstein-status

Show Bernstein orchestrator status - active agents, task progress, costs, and alerts. Use when the user asks about orchestrator status, what agents are doing, task progress, how much has been spent, or what's happening with the build.

538 Updated today
sipyourdrink-ltd
AI & Automation Solid

bernstein-agents

Manage Bernstein agents - list active agents, inspect their output, kill stalled agents, or stream live logs. Use when the user asks about agents, wants to see what an agent is doing, or needs to kill one.

538 Updated today
sipyourdrink-ltd
AI & Automation Listed

cost-quality-frontier

Adds cost (input + output tokens × model price) and latency (p50, p95) to eval results, plots model options on a Pareto frontier, and produces a quality-per-dollar composite score so production model selection is grounded in trade-offs, not just quality. Use when: model comparison, cost-aware evals, latency budget, quality-per-dollar, Pareto frontier, model selection, eval economics, picking a model for production.

1 Updated today
varunk130