harness-engineering

Solid

Design runtime infrastructure around AI agents — permissions, tools, feedback loops, observability. Use when deploying agents to production or designing multi-agent systems.

DevOps & Infrastructure · 108 stars · 16 forks · Updated today · MIT

Quality Score: 86/100

Stars (weight 20%): 68
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 80
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Harness Engineering

Design the system *around* AI agents for reliable, safe production use.

Harness Engineering protects the quality ceiling of Agentic Engineering. It turns fast agent output into controlled execution through permissions, observability, recovery, and adversarial validation.

## Agent Runtime Model

Treat the agent harness as the kernel around an LLM OS:

| Kernel concern | Agent harness responsibility |
|---|---|
| Memory management | Curate context, summarize bulky outputs, persist state to wiki/logs. |
| Syscall boundary | Expose tools with contracts, allowlists, deny rules, and retries. |
| Process isolation | Separate write scopes, sandboxes, credentials, and state per agent. |
| Scheduling | Decide sequential, parallel, or event-driven execution. |
| Interrupts | Stop, ask approval, rollback, or route to a safer action. |
| Observability | Log tool calls, decisions, outputs, costs, and verification evidence. |
| Garbage collection | Close idle agents, remove stale tasks, compact context, and record risks. |

## Productized Agent Harness

Google I/O '26 added a practical pressure test for harness design: the same runtime pattern now appears in developer tools, personal agents, search, commerce, generative media, and smart glasses.

| Product surface | Harness control that must exist |
|---|---|
| Agent-first IDE | task queue, subagent ownership, hooks, sandbox, test proof |
| Personal agent | user mandate, memory scope, tool allowlist, resumable log |

...
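
The "Syscall boundary" and "Observability" rows of the Agent Runtime Model can be illustrated with a small sketch. It is not taken from the skill itself: `ToolGate`, `ToolDenied`, `read_file`, and `no_secrets` are hypothetical names chosen for illustration, and a real harness would likely add sandboxing, approval prompts, and cost tracking on top. The idea shown is only that every tool call passes through one gate that checks an allowlist and deny rules, retries failures a bounded number of times, and logs each attempt.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when a call is blocked by the allowlist or a deny rule."""


@dataclass
class ToolGate:
    # Hypothetical harness component; all names here are illustrative.
    tools: dict[str, Callable[..., Any]]
    allowlist: set[str]
    deny_rules: list[Callable[[str, dict], bool]] = field(default_factory=list)
    max_retries: int = 2
    log: list[dict] = field(default_factory=list)

    def _record(self, name: str, args: dict, status: str, **extra: Any) -> None:
        # Observability: keep one entry per decision or attempt.
        self.log.append({"tool": name, "args": args, "status": status, **extra})

    def call(self, name: str, **kwargs: Any) -> Any:
        # Syscall boundary: allowlist first, then deny rules.
        if name not in self.allowlist or name not in self.tools:
            self._record(name, kwargs, "denied", reason="not allowlisted")
            raise ToolDenied(f"tool not allowed: {name}")
        if any(rule(name, kwargs) for rule in self.deny_rules):
            self._record(name, kwargs, "denied", reason="deny rule matched")
            raise ToolDenied(f"deny rule matched for: {name}")

        # Bounded retries: one initial attempt plus max_retries more.
        for attempt in range(1, self.max_retries + 2):
            try:
                result = self.tools[name](**kwargs)
            except Exception as exc:
                self._record(name, kwargs, "error", attempt=attempt, error=repr(exc))
                if attempt > self.max_retries:
                    raise
            else:
                self._record(name, kwargs, "ok", attempt=attempt)
                return result


def read_file(path: str) -> str:
    # Stand-in tool that does no real I/O.
    return f"contents of {path}"


def no_secrets(name: str, args: dict) -> bool:
    # Example deny rule: block any path that mentions .env.
    return ".env" in str(args.get("path", ""))


gate = ToolGate(
    tools={"read_file": read_file, "delete_file": lambda path: None},
    allowlist={"read_file"},
    deny_rules=[no_secrets],
)
print(gate.call("read_file", path="README.md"))
```

In this sketch `delete_file` is registered but not allowlisted, so `gate.call("delete_file", ...)` raises `ToolDenied`, and the attempt still lands in `gate.log`. Keeping the denial in the log is a deliberate choice: blocked calls are often the most useful verification evidence.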

Details

Author
Mark393295827
Repository
Mark393295827/third-brain-v5-skills
Created
4 weeks ago
Last Updated
today
Language
HTML
License
MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

devpilot-harness-engineering

Use when setting up a repository for autonomous coding agents, adding guardrails, context files, or automation so agents ship reliably without constant review. Triggers on "make this repo agent-friendly", "agents keep drifting", "set up AGENTS.md / skills / sub-agents", "harness engineering", architectural drift with agent-authored code, or retrofitting guardrails after output quality decayed.

4 Updated today
SiyuQian
AI & Automation Listed

harness-engineering

Design the harness — the 7-layer scaffolding around the LLM loop that makes agents reliable. Covers the agent loop itself (gather/act/verify), context management, durable execution, guardrails, human-in-the-loop, evals, and observability. In production agents, the harness is 98% of the code. Use whenever the user is structuring code around an agent loop, asks "how do I make this reliable / production-ready," is implementing verification, retry logic, sub-agent delegation, permission systems, approval gates, or wants to understand what makes Claude Code / Codex / Devin work beyond the model.

5 Updated today
AlexDuchDev
AI & Automation Listed

neo-agent-harness

Use this skill when the user asks to improve AI-assisted development reliability, AGENTS.md, skills, tests, CI, hooks, review loops, or agent workflow governance. It designs feedforward guides, feedback sensors, verification gates, and human decision points from repository evidence.

5 Updated yesterday
Benknightdark
AI & Automation Listed

agent-harness-design

Design agent tool sets with stable names, narrow schemas, deterministic output shapes, and explicit error paths. No catch-all tools unless unavoidable.

7 Updated 2 days ago
yeaight7
AI & Automation Listed

harness

Cybernetics-based multi-agent orchestration for complex tasks. Coordinates a Planner → Generator → Evaluator → Retro pipeline with clean-context sub-agents, per-checkpoint drift prevention, and persistent retro learning. Recommended workflow: Claude Code plans the spec (Session 1), Codex executes autonomously (Session 2), Claude CLI reviews as cross-model peer. Use when: "harness this task", "use harness", "orchestrate this", "harness plan", "harness continue", "harness execute <task-id>", "harness <spec-name>", or when a task requires structured multi-agent coordination.

26 Updated 4 days ago
stone16