gan-style-harness

Solid

GAN-inspired Generator-Evaluator agent harness for building high-quality applications autonomously. Based on Anthropic's March 2026 harness design paper.

AI & Automation · 201,447 stars · 30,903 forks · Updated yesterday · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# GAN-Style Harness Skill

> Inspired by [Anthropic's Harness Design for Long-Running Application Development](https://www.anthropic.com/engineering/harness-design-long-running-apps) (March 24, 2026)

A multi-agent harness that separates **generation** from **evaluation**, creating an adversarial feedback loop that drives quality far beyond what a single agent can achieve.

## Core Insight

> When asked to evaluate their own work, agents are pathological optimists: they praise mediocre output and talk themselves out of legitimate issues. But engineering a **separate evaluator** to be ruthlessly strict is far more tractable than teaching a generator to self-critique.

This is the same dynamic as GANs (Generative Adversarial Networks): the Generator produces, the Evaluator critiques, and that feedback drives the next iteration.

## When to Use

- Building complete applications from a one-line prompt
- Frontend design tasks requiring high visual quality
- Full-stack projects that need working features, not just code
- Any task where "AI slop" aesthetics are unacceptable
- Projects where you want to invest $50-200 for production-quality output

## When NOT to Use

- Quick single-file fixes (use standard `claude -p`)
- Tasks with tight budget constraints (<$10)
- Simple refactoring (use de-sloppify pattern instead)
- Tasks that are already well-specified with tests (use TDD workflow)

## Architecture

```
┌─────────────┐
│   PLANNER   │
...
```
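The generate, critique, iterate loop described under Core Insight can be sketched in a few lines. This is only an illustration under stated assumptions, not the skill's actual implementation: `run_harness`, `Review`, the 0-10 score scale, the pass threshold, and both toy agents are hypothetical names invented here, and the Planner stage from the architecture diagram is left out. In a real harness, `generate` and `evaluate` would each call a separate agent with its own context.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    score: float       # hypothetical 0-10 scale, higher is better
    issues: List[str]  # concrete problems handed back to the generator


def run_harness(
    prompt: str,
    generate: Callable[[str, str, List[str]], str],
    evaluate: Callable[[str], Review],
    pass_score: float = 9.0,
    max_rounds: int = 5,
) -> Tuple[str, List[Review]]:
    """Alternate generation and evaluation until the evaluator passes the work."""
    artifact = ""
    issues: List[str] = []
    history: List[Review] = []
    for _ in range(max_rounds):
        # The generator sees its previous draft plus the evaluator's issues.
        artifact = generate(prompt, artifact, issues)
        # The evaluator only sees the artifact, never the generator's reasoning.
        review = evaluate(artifact)
        history.append(review)
        if review.score >= pass_score:
            break
        issues = review.issues
    return artifact, history


# Toy stand-ins so the sketch runs without any model calls (assumptions).
REQUIRED = ["login", "search", "export"]


def toy_generate(prompt: str, previous: str, issues: List[str]) -> str:
    if not previous:
        return prompt  # first draft: nothing implemented yet
    feature = issues[0].split(": ", 1)[1]
    return f"{previous} +{feature}"  # fix one reported issue per round


def toy_evaluate(artifact: str) -> Review:
    missing = [f for f in REQUIRED if f"+{f}" not in artifact]
    score = 10 * (1 - len(missing) / len(REQUIRED))
    return Review(score, [f"missing: {f}" for f in missing])


if __name__ == "__main__":
    final, reviews = run_harness("notes app", toy_generate, toy_evaluate)
    print(final, [round(r.score, 1) for r in reviews])
```

Keeping the evaluator a separate callable with no access to the generator's state mirrors the point made above: strictness is engineered into the critic rather than expected from the producer. The `max_rounds` cap is there because each round costs money in a real run.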

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: yesterday
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

devpilot-harness-engineering

Use when setting up a repository for autonomous coding agents, adding guardrails, context files, or automation so agents ship reliably without constant review. Triggers on "make this repo agent-friendly", "agents keep drifting", "set up AGENTS.md / skills / sub-agents", "harness engineering", architectural drift with agent-authored code, or retrofitting guardrails after output quality decayed.

4 Updated today
SiyuQian
AI & Automation Listed

harness

Cybernetics-based multi-agent orchestration for complex tasks. Coordinates a Planner → Generator → Evaluator → Retro pipeline with clean-context sub-agents, per-checkpoint drift prevention, and persistent retro learning. Recommended workflow: Claude Code plans the spec (Session 1), Codex executes autonomously (Session 2), Claude CLI reviews as cross-model peer. Use when: "harness this task", "use harness", "orchestrate this", "harness plan", "harness continue", "harness execute <task-id>", "harness <spec-name>", or when a task requires structured multi-agent coordination.

26 Updated 4 days ago
stone16
DevOps & Infrastructure Solid

harness-engineering

Design runtime infrastructure around AI agents — permissions, tools, feedback loops, observability. Use when deploying agents to production or designing multi-agent systems.

108 Updated today
Mark393295827
AI & Automation Listed

neo-agent-harness

Use this skill when the user asks to improve AI-assisted development reliability, AGENTS.md, skills, tests, CI, hooks, review loops, or agent workflow governance. It designs feedforward guides, feedback sensors, verification gates, and human decision points from repository evidence.

5 Updated yesterday
Benknightdark
AI & Automation Listed

harness-engineering

Design the harness — the 7-layer scaffolding around the LLM loop that makes agents reliable. Covers the agent loop itself (gather/act/verify), context management, durable execution, guardrails, human-in-the-loop, evals, and observability. In production agents, the harness is 98% of the code. Use whenever the user is structuring code around an agent loop, asks "how do I make this reliable / production-ready," is implementing verification, retry logic, sub-agent delegation, permission systems, approval gates, or wants to understand what makes Claude Code / Codex / Devin work beyond the model.

5 Updated today
AlexDuchDev