harness-engineeringlisted
Install: claude install-skill AlexDuchDev/agentic-product-standard
# Harness Engineering
OpenAI's "Harness Engineering" post and Liu et al.'s Claude Code analysis (arXiv:2604.14228) converge on the same finding: in a production agent, ~98% of code is *not* the model loop. It's the harness — context management, permission systems, verification, sub-agent delegation, tool routing, recovery.
LangChain's empirical finding (March 2026): holding model constant at gpt-5.2-codex, their coding agent moved from Top 30 to Top 5 on Terminal Bench 2.0 (52.8% → 66.5%) **only by changing the harness**. As model capability converges, harness quality is the durable competitive advantage.
## The 7-layer harness model
Every production agent has these layers. Build them in this order; skipping is technical debt:
```
┌─────────────────────────────────────────────┐
│ 7. Observability & Tracing │
├─────────────────────────────────────────────┤
│ 6. Evaluation Layer (CI gates) │
├─────────────────────────────────────────────┤
│ 5. Human-in-the-Loop (notify/ask/review) │
├─────────────────────────────────────────────┤
│ 4. Guardrails (input/output validation) │
├─────────────────────────────────────────────┤
│ 3. Durable Execution (Workflow + Activity) │
├─────────────────────────────────────────────┤
│ 2. Context & Memory Management │
├─────────────────────────────────────────────┤
│ 1. Agent Loop (gather → act → verify) │
└─────────────────────────────────────────────┘
↕ MCP / function cal