evaluation-framework

Solid

Patterns for building evaluation and scoring systems, quality gates, rubrics, and decision frameworks. Use for any scored assessment.

AI & Automation 297 stars 27 forks Updated today MIT

Quality Score: 95/100

Stars 20%
Recency 20%: 100
Frontmatter 20%
Documentation 15%: 100
Issue Health 10%
License 10%: 100
Description 5%: 100

Skill Content

## Table of Contents

- [Overview](#overview)
- [When to Use](#when-to-use)
- [Core Pattern](#core-pattern)
  - [1. Define Criteria](#1-define-criteria)
  - [2. Score Each Criterion](#2-score-each-criterion)
  - [3. Calculate Weighted Total](#3-calculate-weighted-total)
  - [4. Apply Decision Thresholds](#4-apply-decision-thresholds)
- [Quick Start](#quick-start)
- [Define Your Evaluation](#define-your-evaluation)
- [Example: Code Review Evaluation](#example-code-review-evaluation)
- [Evaluation Workflow](#evaluation-workflow)
- [Common Use Cases](#common-use-cases)
- [Integration Pattern](#integration-pattern)
- [Detailed Resources](#detailed-resources)
- [Exit Criteria](#exit-criteria)

# Evaluation Framework

## Overview

A generic framework for weighted scoring and threshold-based decision making. It provides reusable patterns for evaluating any artifact against configurable criteria with a consistent scoring methodology.

This framework abstracts a common pattern: define criteria → assign weights → score against criteria → apply thresholds → make decisions.

## When To Use

- Implementing quality gates or evaluation rubrics
- Building scoring systems for artifacts, proposals, or submissions
- Needing a consistent evaluation methodology across different domains
- Wanting threshold-based automated decision making
- Creating assessment tools with weighted criteria

## When NOT To Use

- Simple pass/fail checks that do not need scoring

## Core Pattern

### 1. Define Criteria

```yaml
criteria:
  -...
```
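
The pattern named in the Overview (define criteria → assign weights → score against criteria → apply thresholds → make decisions) can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own implementation: the `Criterion` class, the `weighted_total` and `decide` helpers, the 0-100 score scale, the example criteria, and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Hypothetical criterion: a name plus its share of the total weight."""
    name: str
    weight: float  # fraction of the total; weights are expected to sum to 1.0


def weighted_total(criteria, scores):
    """Return the weighted sum of per-criterion scores (assumed 0-100 scale)."""
    total_weight = sum(c.weight for c in criteria)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total_weight}")
    missing = [c.name for c in criteria if c.name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return sum(c.weight * scores[c.name] for c in criteria)


def decide(total, thresholds, default="reject"):
    """Return the label of the highest threshold that the total meets."""
    for minimum, label in sorted(thresholds, reverse=True):
        if total >= minimum:
            return label
    return default


# Illustrative values only; none of these come from the skill itself.
criteria = [
    Criterion("correctness", 0.5),
    Criterion("readability", 0.3),
    Criterion("tests", 0.2),
]
scores = {"correctness": 90, "readability": 70, "tests": 60}
thresholds = [(85, "approve"), (70, "revise")]

total = weighted_total(criteria, scores)
print(total, decide(total, thresholds))  # 78.0 revise
```

Validating that the weights sum to 1.0 before scoring keeps totals comparable across evaluations, and keeping the thresholds as data rather than hard-coded branches lets the same functions serve different domains.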

Details

Author: athola
Repository: athola/claude-night-market
Created: 6 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Code & Development Listed

content-evaluation-framework

This skill should be used when evaluating the quality of book chapters, lessons, or educational content. It provides a systematic 6-category rubric with weighted scoring (Technical Accuracy 30%, Pedagogical Effectiveness 25%, Writing Quality 20%, Structure & Organization 15%, AI-First Teaching 10%, Constitution Compliance Pass/Fail) and multi-tier assessment (Excellent/Good/Needs Work/Insufficient). Use this during iterative drafting, after content completion, on-demand review requests, or before validation phases.

335 Updated today

aiskillstore

AI & Automation Listed

evaluate

Comprehensive quality grading. Checks prompt compliance, code quality, security, test coverage, architecture fitness. Produces a percentage score. Not lenient. Keywords: evaluate, grade, check, verify, validate, scorecard, quality, percentage, score, how good

2 Updated today

jvalin17

AI & Automation Listed

agentic-eval

Evaluate and improve AI-generated output with explicit rubrics, reflection loops, and stop conditions. Use when building self-critique workflows, evaluator-optimizer pipelines, or acceptance gates for code, docs, analysis, or plans.

1 Updated today

bg-szy

AI & Automation Featured

evaluation

Build evaluation frameworks for agent systems. Use when testing agent performance systematically, validating context engineering choices, or measuring improvements over time.

39,350 Updated today

sickn33

AI & Automation Solid

rubric-design-validation

Develop clear scoring rubrics with defined criteria, performance levels, and anchor examples to ensure inter-rater reliability.

1,160 Updated today

a5c-ai