← ClaudeAtlas

replay-shadowlisted

Capture a real execution of a worker-shaped system, replay it against a candidate config in an isolated sandbox, diff the two. Use before promoting prompt changes, model swaps, skill set changes, or API upgrades. System-agnostic — works for any unit of work that has identifiable inputs, environment state, an observable transcript, and a measurable output.
a-canary/arc-agents · ★ 0 · AI & Automation · score 66
Install: claude install-skill a-canary/arc-agents
# replay-shadow — Capture-Replay-Diff for Confidence Before Promotion A general dev practice for A/B-testing a candidate build against a live baseline *without* mirroring the full system. Pick one unit of work (a worker turn, a request, a job), freeze its execution, replay against the candidate, diff. Repeat across a corpus until variance stabilizes. Replay-shadow is the unit-test analogue of live shadowing. Cheap, deterministic, runnable on every prompt change. Live shadow is the integration-test analogue — expensive, run before major cutovers. This skill is the cheap one. ## When to use - Prompt iteration (system prompt, frame templates, skill set). - Model swap (cost/quality tradeoff between two models for the same workload). - API/SDK upgrade (does new client behave the same on real captured inputs?). - Config change with non-obvious downstream effects (timeouts, retry policy, tool allowlist). ## When NOT to use - Concurrency bugs, race conditions, factory backpressure — those need live load, not replay. - UX integration regressions — those need the real transport. - Anything where the input distribution itself is what changed. Replay tests *behavior on past inputs*. It does not test the future. ## The three verbs Three steps, three artifacts. The skill prescribes the contract; the agent wires it to the system at hand. ### 1. capture — freeze a real execution into a fixture A **fixture** is the minimum reproducible context for one unit of work. Four parts: | P