agent-safety-patterns

Design safe AI agent systems — capability restriction, sandboxed execution, human-in-the-loop gates, anomaly detection, rollback on unexpected behavior, blast radius limiting, and output verification before acting. Use when asked about "agent safety", "safe agent design", "AI agent guardrails", "capability restriction", "agent sandbox", "human approval gate", "agent rollback", "blast radius", "agent anomaly detection", "agent going off the rails", "agent verification", "principle of least capability", or "how to make an agent safe to run autonomously". Do NOT use for: prompt injection defense — see adversarial-prompt-testing. Do NOT use for: hook-based blocking — see hook-block-commands.
phamlongh230-lgtm/yamtam-engine · ★ 3 · AI & Automation · score 65

Install: claude install-skill phamlongh230-lgtm/yamtam-engine

## When to Use

- Use when: deploying an agent that can write files, run commands, or call APIs
- Use when: an agent had unexpected behavior and needs a safety review
- Use when: designing the capability scope for a new autonomous agent
- Use when: determining what requires human approval vs can be auto-approved
- Do NOT use for: prompt injection defense — see adversarial-prompt-testing
- Do NOT use for: hook-level command blocking — see hook-block-commands

---

## Principle of Least Capability

```
Give the agent only the tools it needs for its specific task.
Never give an agent capabilities it might need "someday".

Read-only agent:  Read, Bash (read-only commands), WebFetch
Write agent:      Read, Edit, Bash (scoped), Write (specific dirs only)
Deploy agent:     Read, Bash (deploy scripts only), NOT: Edit, Write to source

Capability matrix example:
┌────────────────┬──────┬────────┬───────┬──────────┐
│ Agent          │ Read │ Edit   │ Write │ Bash     │
├────────────────┼──────┼────────┼───────┼──────────┤
│ research       │  ✅  │   ❌   │  ❌   │ read-only│
│ code-review    │  ✅  │   ❌   │  ❌   │ lint only│
│ code-writer    │  ✅  │   ✅   │  ✅   │ test only│
│ deploy         │  ✅  │   ❌   │  ❌   │ deploy/* │
└────────────────┴──────┴────────┴───────┴──────────┘
```

---

## Sandboxed Execution

```bash
# Run agent in read-only filesystem overlay (Linux)
# Agent can "write" to tmpfs overlay — host filesystem unchanged
mkdir -p /tmp/agent-sandbox/{upper,work,