harness-engineering

Design the harness — the 7-layer scaffolding around the LLM loop that makes agents reliable. Covers the agent loop itself (gather/act/verify), context management, durable execution, guardrails, human-in-the-loop, evals, and observability. In production agents, the harness is 98% of the code. Use whenever the user is structuring code around an agent loop, asks "how do I make this reliable / production-ready," is implementing verification, retry logic, sub-agent delegation, permission systems, approval gates, or wants to understand what makes Claude Code / Codex / Devin work beyond the model.
AlexDuchDev/agentic-product-standard · ★ 5 · AI & Automation · score 77

Install: claude install-skill AlexDuchDev/agentic-product-standard

# Harness Engineering

OpenAI's "Harness Engineering" post and Liu et al.'s Claude Code analysis (arXiv:2604.14228) converge on the same finding: in a production agent, ~98% of code is *not* the model loop. It's the harness: context management, permission systems, verification, sub-agent delegation, tool routing, recovery.

LangChain's empirical finding (March 2026): holding model constant at gpt-5.2-codex, their coding agent moved from Top 30 to Top 5 on Terminal Bench 2.0 (52.8% → 66.5%) **only by changing the harness**. As model capability converges, harness quality is the durable competitive advantage.

## The 7-layer harness model

Every production agent has these layers. Build them in this order; skipping is technical debt:

```
┌─────────────────────────────────────────────┐
│ 7. Observability & Tracing                  │
├─────────────────────────────────────────────┤
│ 6. Evaluation Layer (CI gates)              │
├─────────────────────────────────────────────┤
│ 5. Human-in-the-Loop (notify/ask/review)    │
├─────────────────────────────────────────────┤
│ 4. Guardrails (input/output validation)     │
├─────────────────────────────────────────────┤
│ 3. Durable Execution (Workflow + Activity)  │
├─────────────────────────────────────────────┤
│ 2. Context & Memory Management              │
├─────────────────────────────────────────────┤
│ 1. Agent Loop (gather → act → verify)       │
└─────────────────────────────────────────────┘
              ↕ MCP / function cal