agent-proof-approval-gate

Build a fail-closed PreToolUse gate for merge/deploy/destructive actions that the AI agent provably cannot self-bypass. Use when you need human-only override for a consequential action the agent orchestrates.
CarlosCaPe/octorato · ★ 5 · AI & Automation · score 73
Install: claude install-skill CarlosCaPe/octorato
# Agent-Proof Approval Gate

## Problem

A PreToolUse hook that blocks a destructive action (merge, deploy, delete) needs a human-only override. Naively you might check for an env var or a flag, but the agent can set those itself with an inline prefix (`APPROVE=1 gh pr merge 96`) or by writing a file. The gate must be unforgeable by the entity it constrains.

## Key Insight: Inline Env Never Reaches the Hook

A PreToolUse hook runs in the **harness process**, not in the shell that executes the agent's command. When the agent writes `VAR=1 cmd`, that assignment is scoped to the child shell that runs `cmd`; the hook fires before `cmd` even starts, in a separate environment. Therefore:

> **An env var set in the agent's command prefix is invisible to the hook.**

Only a human who runs `export OCTO_MERGE_APPROVE=96` in the real terminal session can set the hook's env. The agent cannot reach it.

## Design

### Primary channel: scoped env var (agent-proof)

```bash
# Human grants approval for a specific PR
export OCTO_MERGE_APPROVE=96   # must match the exact PR number being merged
```

The hook validates:

1. `OCTO_MERGE_APPROVE` is set.
2. Its value **equals** the PR number extracted from the command being intercepted (not `startswith`, not `in`; exact equality).
3. Optionally, a TTL: compare against the file-mtime of a stamp written when the var was set.

```python
import os, re, sys

def check_approval(pr_number: str) -> bool:
    approved = os.environ.get("OCTO_MERGE_AP