ab-harness

Run a counterfactual A/B harness on Claude Code to measure whether the user's ~/.claude setup (memories, lessons, axioms, skills, hooks, plugins) actually helps on real tasks vs. a blank-canvas env. Use when: (1) User wants to "prove my setup works" or "quantify setup impact" to colleagues; (2) User asks "is my setup actually helping" beyond what ecosystem-audit's reference-count scan can show; (3) A project has the counterfactual measurement step of an audit → measure → clean pipeline queued. Covers the clean-env mechanism (CLAUDE_CONFIG_DIR), what it does and doesn't isolate, fair-comparison knobs (model pinning, permission mode, stdin), how to mine num_turns/tool_calls/pitfall-hits from the resulting JSONL transcripts, and the honest caveats any n=3 harness report must declare.
wan-huiyan/claude-ecosystem-hygiene · ★ 0 · Data & Documents · score 76

Install: claude install-skill wan-huiyan/claude-ecosystem-hygiene

# Claude Code A/B Counterfactual Harness

## Problem

The user wants to quantify whether their accumulated `~/.claude/` setup (axioms, lessons, skills, plugins, hooks, auto-memory) *actually helps* on representative tasks, not just whether artifacts get referenced (that's `ecosystem-audit`'s job). Naive attempts fail because:

- The default Claude Code run pulls the full `~/.claude` context; there's no obvious switch to "turn the setup off."
- Simply `cd`-ing to a different project doesn't help, because global axioms/skills still load.
- Even with a clean env, the *project repo's* `docs/runbooks/`, `docs/findings/`, and in-repo `MEMORY.md` stay visible, so the harness must either accept that leakage (and measure only the *marginal* value of `~/.claude`) or scrub the repo files too.

## Context / Trigger Conditions

- User has completed the `ecosystem-audit` scan (utilization %) and wants the **counterfactual** number next ("setup saved X turns / prevented Y pitfalls").
- User says "I want to show colleagues my setup works," "quantify the setup," "A/B test my Claude setup," "replay a task without my memory."
- A project plan references a counterfactual harness step: the canonical measurement stage, which comes after (1) the usage audit and before (3) cleanup.
- User asks whether a specific skill / lesson / axiom pulls its weight on real work.

## Solution

### Step 1 — Verify the clean-env mechanism before spending a cent

The knob is `CLAUDE_CONFIG_DIR`. Point it at an empty dir:

```bash
mkdir -p /tmp/