Install: `claude install-skill alirezarezvani/gaios`
# Experiment loop (autoresearch)
Hill-climb a measurable artifact: **try → measure → keep-or-revert → log**, on a branch, with a trail. Full procedure: `references/sops/experiment-loop.md`. Existing examples + use-case cards: `experiments/`.
## First, the autonomy gate
Before looping unsupervised, confirm **all three**:
1. **Objective, cheap, automatable metric** (code computes it in seconds–minutes).
2. **Reversible sandbox** (git branch / throwaway data / `.tmp`).
3. **No PHI, no confidential data, no external send, no prod deploy.**
If any fails → **human-in-the-loop**: propose each change, get approval, no overnight run. Regulated/production outcomes feed the human-reviewed change process — never auto-deploy. This is `CLAUDE.md`'s scale-caution-to-stakes rule.
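The gate above can be sketched as a small predicate. This is a minimal illustration, not part of the skill: `GateCheck` and `autonomy_mode` are hypothetical names, and the three booleans are assumed to be answered by a person or by a separate check before the loop starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    """Hypothetical record of the three gate answers (names are assumptions)."""
    objective_cheap_metric: bool    # code computes the metric in seconds to minutes
    reversible_sandbox: bool        # git branch / throwaway data / .tmp
    no_sensitive_or_external: bool  # no PHI, confidential data, external send, prod deploy


def autonomy_mode(check: GateCheck) -> str:
    """Return "autonomous" only when all three conditions hold."""
    if all((check.objective_cheap_metric,
            check.reversible_sandbox,
            check.no_sensitive_or_external)):
        return "autonomous"
    # Any single failure drops the run to per-change human approval.
    return "human-in-the-loop"


if __name__ == "__main__":
    print(autonomy_mode(GateCheck(True, True, True)))
    print(autonomy_mode(GateCheck(True, False, True)))
```

Modelling the gate as an all-or-nothing check mirrors the "all three" wording: there is no partial autonomy.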
## Run it
1. Pick or write the experiment's `CARD.md` (objective, metric+direction, the one mutable surface, budget, autonomy, owner). New ones can copy `experiments/forecast-tuning/`.
2. `git checkout -b experiments/<tag>`.
3. Baseline run → `tools/experiment_log.py add <results.tsv> --id ... --metric ... --status keep --note baseline`.
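4. Loop: edit the **one** mutable surface → run the **read-only** harness → log → keep (commit) if improved, else revert the uncommitted edit (`git reset --hard`; plain `git reset` only unstages). A simpler change that scores equal also counts as a keep.
5. Stop on interrupt or when the budget runs out. Hand the winner + trail to the human.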
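The keep-or-revert logic in the steps above can be sketched in memory, without git or the real harness. Everything here is an assumption for illustration: `should_keep`, `hill_climb`, the `propose` and `measure` callables, and the `"revert"` status label are hypothetical, and in the real loop a keep is a commit and a revert is a reset on the experiment branch.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # the mutable surface, e.g. a config value or a file's contents


def should_keep(candidate: float, best: float,
                higher_is_better: bool = True, simpler: bool = False) -> bool:
    """Keep on strict improvement, or on a tie when the change is simpler."""
    if candidate == best:
        return simpler
    return candidate > best if higher_is_better else candidate < best


def hill_climb(state: S,
               propose: Callable[[S], Tuple[S, bool]],  # returns (candidate, is_simpler)
               measure: Callable[[S], float],           # stand-in for the read-only harness
               budget: int,
               higher_is_better: bool = True,
               ) -> Tuple[S, float, List[Tuple[int, float, str]]]:
    best = measure(state)
    trail = [(0, best, "keep")]  # run 0 is the baseline
    for run_id in range(1, budget + 1):
        candidate, simpler = propose(state)
        score = measure(candidate)
        if should_keep(score, best, higher_is_better, simpler):
            state, best = candidate, score           # "commit"
            trail.append((run_id, score, "keep"))
        else:
            trail.append((run_id, score, "revert"))  # candidate is discarded
    return state, best, trail


if __name__ == "__main__":
    # Toy problem: minimise (x - 3)^2 by stepping x upward from 0.
    final, best, trail = hill_climb(
        0, lambda x: (x + 1, False), lambda x: float((x - 3) ** 2),
        budget=5, higher_is_better=False)
    for row in trail:
        print(row)
```

Because `measure` never changes inside the loop, every score in the trail is comparable to every other, which is the point of keeping the harness read-only.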
## Guardrails
- The eval harness is read-only ground truth — never edit it to game the metric.
- One mutable surface only (reviewable diffs).
- Run artif