evaluation-framework

Solid

Patterns for building evaluation and scoring systems, quality gates, rubrics, and decision frameworks. Use for any scored assessment.

AI & Automation · 297 stars · 27 forks · Updated today · MIT

Quality Score: 95/100

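| Component | Weight | Score |
|---|---|---|
| Stars | 20% | 82 |
| Recency | 20% | 100 |
| Frontmatter | 20% | 70 |
| Documentation | 15% | 100 |
| Issue Health | 10% | 50 |
| License | 10% | 100 |
| Description | 5% | 100 |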

Skill Content

## Table of Contents

- [Overview](#overview)
- [When to Use](#when-to-use)
- [Core Pattern](#core-pattern)
  - [1. Define Criteria](#1-define-criteria)
  - [2. Score Each Criterion](#2-score-each-criterion)
  - [3. Calculate Weighted Total](#3-calculate-weighted-total)
  - [4. Apply Decision Thresholds](#4-apply-decision-thresholds)
- [Quick Start](#quick-start)
  - [Define Your Evaluation](#define-your-evaluation)
  - [Example: Code Review Evaluation](#example-code-review-evaluation)
- [Evaluation Workflow](#evaluation-workflow)
- [Common Use Cases](#common-use-cases)
- [Integration Pattern](#integration-pattern)
- [Detailed Resources](#detailed-resources)
- [Exit Criteria](#exit-criteria)

# Evaluation Framework

## Overview

A generic framework for weighted scoring and threshold-based decision making. It provides reusable patterns for evaluating any artifact against configurable criteria using a consistent scoring methodology.

This framework abstracts a common sequence: define criteria → assign weights → score against criteria → apply thresholds → make decisions.

## When to Use

- Implementing quality gates or evaluation rubrics
- Building scoring systems for artifacts, proposals, or submissions
- Applying a consistent evaluation methodology across different domains
- Automating decisions based on score thresholds
- Creating assessment tools with weighted criteria

## When NOT to Use

- Simple pass/fail checks that don't need scoring

## Core Pattern

### 1. Define Criteria

```yaml
criteria:
  -...
```
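The four Core Pattern steps (define criteria, score each criterion, calculate the weighted total, apply decision thresholds) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the skill's own implementation: the `Criterion` class, the `weighted_total` and `decide` helpers, `DEFAULT_THRESHOLDS`, the code-review criteria, and the 80/60 accept/revise/reject cut-offs are all hypothetical. Scores are assumed to be on a 0-100 scale, and the type hints assume Python 3.9 or later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One scored dimension of an evaluation (hypothetical type)."""
    name: str
    weight: float  # relative importance; normalized by the sum of all weights


# Hypothetical decision thresholds as (minimum total, decision) pairs.
DEFAULT_THRESHOLDS = [(80.0, "accept"), (60.0, "revise"), (0.0, "reject")]


def weighted_total(criteria: list[Criterion], scores: dict[str, float]) -> float:
    """Step 3: combine per-criterion scores (0-100) into a total on the same scale."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criterion weights must sum to a positive number")
    missing = [c.name for c in criteria if c.name not in scores]
    if missing:
        raise KeyError(f"no score for: {', '.join(missing)}")
    return sum(c.weight * scores[c.name] for c in criteria) / total_weight


def decide(total: float, thresholds=DEFAULT_THRESHOLDS) -> str:
    """Step 4: return the decision for the highest threshold the total meets."""
    for minimum, decision in sorted(thresholds, key=lambda t: t[0], reverse=True):
        if total >= minimum:
            return decision
    raise ValueError(f"no threshold covers a total of {total}")


# Step 1: a hypothetical code-review rubric; weights are percentages summing to 100.
review_criteria = [
    Criterion("correctness", 40),
    Criterion("readability", 30),
    Criterion("test_coverage", 30),
]
# Step 2: scores assigned by a reviewer (illustrative values only).
review_scores = {"correctness": 90, "readability": 70, "test_coverage": 60}

total = weighted_total(review_criteria, review_scores)  # (40*90 + 30*70 + 30*60) / 100
print(total, decide(total))  # 75.0 revise
```

Dividing by the sum of the weights means the weights do not have to add up to exactly 100, and sorting the thresholds from highest to lowest places each total in exactly one decision band.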

Details

- Author: athola
- Repository: athola/claude-night-market
- Created: 6 months ago
- Last Updated: today
- Language: Python
- License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category

**content-evaluation-framework** (Code & Development, Listed)

This skill should be used when evaluating the quality of book chapters, lessons, or educational content. It provides a systematic 6-category rubric with weighted scoring (Technical Accuracy 30%, Pedagogical Effectiveness 25%, Writing Quality 20%, Structure & Organization 15%, AI-First Teaching 10%, Constitution Compliance Pass/Fail) and multi-tier assessment (Excellent/Good/Needs Work/Insufficient). Use this during iterative drafting, after content completion, on-demand review requests, or before validation phases.

335 · Updated today · aiskillstore

**evaluate** (AI & Automation, Listed)

Comprehensive quality grading. Checks prompt compliance, code quality, security, test coverage, architecture fitness. Produces a percentage score. Not lenient. Keywords: evaluate, grade, check, verify, validate, scorecard, quality, percentage, score, how good

2 · Updated today · jvalin17

**agentic-eval** (AI & Automation, Listed)

Evaluate and improve AI-generated output with explicit rubrics, reflection loops, and stop conditions. Use when building self-critique workflows, evaluator-optimizer pipelines, or acceptance gates for code, docs, analysis, or plans.

1 · Updated today · bg-szy

**evaluation** (AI & Automation, Featured)

Build evaluation frameworks for agent systems. Use when testing agent performance systematically, validating context engineering choices, or measuring improvements over time.

39,350 · Updated today · sickn33

**rubric-design-validation** (AI & Automation, Solid)

Develop clear scoring rubrics with defined criteria, performance levels, and anchor examples ensuring inter-rater reliability

1,160 · Updated today · a5c-ai