replay-shadow

Capture a real execution of a worker-shaped system, replay it against a candidate config in an isolated sandbox, diff the two. Use before promoting prompt changes, model swaps, skill set changes, or API upgrades. System-agnostic — works for any unit of work that has identifiable inputs, environment state, an observable transcript, and a measurable output.
a-canary/arc-agents · ★ 0 · AI & Automation · score 60

Install: claude install-skill a-canary/arc-agents

# replay-shadow: Capture-Replay-Diff for Confidence Before Promotion

A/B-test a candidate build against a live baseline *without* mirroring the full system. Pick one unit of work (a worker turn, a request, a job), freeze its execution, replay against the candidate, diff. Repeat across a corpus until variance stabilizes.

Replay-shadow is the unit-test analogue of live shadowing: cheap, deterministic, runnable on every prompt change. Live shadow is the integration-test analogue: expensive, run before major cutovers.

## When to use

- Prompt iteration (system prompt, frame templates, skill set).
- Model swap (cost/quality tradeoff for the same workload).
- API/SDK upgrade.
- Config change with non-obvious downstream effects (timeouts, retries, tool allowlist).

## When NOT to use

- Concurrency bugs, race conditions, backpressure: these need live load.
- UX integration regressions: these need the real transport.
- Anything where the input distribution itself is what changed.

Replay tests *behavior on past inputs*. It does not test the future.

## The three verbs

### 1. capture: freeze a real execution into a fixture

A **fixture** is the minimum reproducible context for one unit of work. Four parts:

| Part | What | Where to find it (varies by system) |
|---|---|---|
| **Input** | Exact stimulus the unit received | rendered prompt, request body, job payload, CLI args |
| **Env-snapshot** | State the unit read against | git sha, DB snapshot, index snapshot, config hash, env vars |