← ClaudeAtlas

experiment-designerlisted

Design and run a formal hypothesis-driven experiment end-to-end — write the design doc with go/no-go criteria, dispatch workers, validate their output, and capture findings. Use when 'design an experiment', 'experiment-designer', 'run an experiment to test', 'set up a hypothesis test', 'fan out experiments to figure out', or when the user wants a structured multi-phase investigation rather than a quick exploration. Distinct from tech-spike, which is for fast informal exploration; use experiment-designer when the question deserves a written hypothesis, explicit pass/fail thresholds per metric, and findings that get committed to git as long-lived artifacts.
mistakenot/auto-stack · ★ 0 · Web & Frontend · score 56
Install: claude install-skill mistakenot/auto-stack
# Experiment Designer Design and run a formal experiment with stated hypothesis, explicit pass/fail thresholds, and findings that get checked into git as durable artifacts. Optimized for the case where a research question is too important to settle by intuition and the answer is worth keeping. ## When to use vs. neighbors - **experiment-designer** (this skill): formal research question with hypothesis + thresholds + worker dispatch + findings committed to `docs/experiments/`. Multi-phase OK. - **tech-spike**: quick informal exploration, scratch work in `.tmp/`, no formal writeup. - **new-task / new-solution / new-plan**: planning to build a known feature, not testing a research question. Trigger experiment-designer when: the user states a question with multiple plausible answers, the cost of guessing wrong is high enough that a $1-30 OpenAI bill is a good trade, and the answer would inform a real design decision. ## The full lifecycle Four phases. Each gates the next. ### Phase 1: Design 1. **Capture the hypothesis in one sentence.** "X works for Y" or "method A beats method B on metric M." If you cannot state it crisply, the experiment isn't ready yet — go back to the user with clarifying questions. 2. **Pick the metric(s) and the pass/fail threshold for each.** Vague success criteria produce vague reports. For each metric: write the number that would make you conclude "yes" and the number that would make you conclude "no." Anything in between is "partial" and needs