lab:autoresearch

Solid

Self-improving loop for plugin skills. Reads program.md, proposes one mutation per iteration, evaluates it against a deterministic scorer, keeps improvements via git, and reverts failures. Targets the weakest skill+dimension. Use with /loop for overnight runs.

AI & Automation | 384 stars | 25 forks | Updated 4 days ago | MIT

Quality Score: 95/100

Stars (weight 20%): 86
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Autoresearch — Plugin Skill Self-Improvement

Iteratively improve plugin skills via the autoresearch pattern: propose one mutation -> eval -> keep/revert -> repeat.

## Usage

```
/lab:autoresearch                    # Targeted: attack weakest skill+dimension
/lab:autoresearch --skill review     # Focus on one skill
/lab:autoresearch --strategy sweep   # Process all skills alphabetically
/lab:autoresearch --dry-run          # Show what would change, don't commit
```

For overnight runs:

```
/loop 5m /lab:autoresearch --strategy sweep --max-iterations 200
```

## Iron Laws

1. **ONE mutation per iteration** — if description needs "and", split into two
2. **NEVER mutate read-only files** — check program.md before every write
3. **EVAL is deterministic** — always use the wrapper script, never LLM-judge
4. **REVERT on regression OR checks failure** — no exceptions
5. **LOG every iteration** — use `keep` or `revert` command (never skip)
6. **CHECK ideas.md before proposing** — don't rediscover known optimizations

## Wrapper Script Commands

All eval/git/journal operations go through ONE script. Do NOT run these manually.

```bash
# Find the weakest skill+dimension
python3 lab/autoresearch/scripts/run-iteration.py target --strategy targeted

# Score a skill (before mutation, to get baseline)
python3 lab/autoresearch/scripts/run-iteration.py score <skill-name>

# After mutation: score + checks + compare → verdict (KEEP or REVERT)
python3 lab/autoresea...
```
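
The propose -> eval -> keep/revert cycle described above can be sketched in a few lines of Python. This is an illustrative in-memory sketch, not the skill's `run-iteration.py`: `run_loop`, `propose`, `score`, `checks`, the heading rubric, and the 40-line limit are all hypothetical names and values chosen for the example. Keeping or reverting is modeled by reassigning a variable rather than by git commits, and the sketch assumes a mutation is kept only on a strict score improvement.

```python
from typing import List, Tuple

# Hypothetical rubric: headings a "good" skill file is assumed to contain.
REQUIRED_HEADINGS = ("## Usage", "## Iron Laws", "## Examples")
# Hypothetical pool of single-line mutations the proposer cycles through.
CANDIDATE_LINES = ("## Noise", "## Usage", "## Iron Laws", "## Examples")


def score(text: str) -> int:
    """Deterministic toy scorer: count required headings present as whole lines."""
    lines = set(text.splitlines())
    return sum(heading in lines for heading in REQUIRED_HEADINGS)


def checks(text: str) -> bool:
    """Toy structural check (assumed): the file must stay under 40 lines."""
    return len(text.splitlines()) < 40


def propose(text: str, iteration: int) -> str:
    """Exactly one mutation per iteration: append a single candidate line."""
    return text + CANDIDATE_LINES[iteration % len(CANDIDATE_LINES)] + "\n"


def run_loop(text: str, max_iterations: int) -> Tuple[str, int, List[Tuple[int, str, int]]]:
    """Greedy loop: keep a mutation only if checks pass and the score improves."""
    best, best_score = text, score(text)
    journal: List[Tuple[int, str, int]] = []
    for i in range(max_iterations):
        candidate = propose(best, i)
        candidate_score = score(candidate)
        if checks(candidate) and candidate_score > best_score:
            best, best_score = candidate, candidate_score  # keep
            journal.append((i, "keep", candidate_score))
        else:
            # Revert: 'best' is left untouched, so the candidate is discarded.
            journal.append((i, "revert", candidate_score))
    return best, best_score, journal


best, best_score, journal = run_loop("# Demo skill\n## Usage\n", max_iterations=8)
for entry in journal:
    print(entry)  # every iteration is logged, whether kept or reverted
```

Because the scorer is a pure function of the text, rerunning the loop on the same input yields the same journal, which is the property the "EVAL is deterministic" rule relies on.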

Details

Author: oliver-kriska
Repository: oliver-kriska/claude-elixir-phoenix
Created: 3 months ago
Last Updated: 4 days ago
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Code & Development Listed

autoresearch

Scaffold and run Karpathy-style autoresearch loops in any git repo. This skill should be used when setting up autonomous code improvement, generating adversarial eval harnesses, running hypothesis-implement-eval-keep/discard loops, or checking autoresearch progress. Triggers on "autoresearch", "autonomous improvement", "eval loop", "hypothesis loop", "self-improvement loop".

33 Updated 2 days ago
tdimino

AI & Automation Featured

autoresearch

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

34,233 Updated today
github

AI & Automation Listed

autoresearch

Karpathy's autoresearch: autonomous ratcheting optimization loops for any artifact. A human writes program.md, the agent runs experiments with git-backed keep/revert. Trigger on "optimize this", "make this better", "iterate on", "autoresearch", "loop on this", "A/B test", "find the best version", Karpathy's loop, experiment loops, hill climbing, the ratchet pattern, or program.md workflows. Works across code, prompts, content, models, and configs.

0 Updated 3 days ago
Evarodenas

AI & Automation Listed

eval-autoresearch-fit

Trigger with "evaluate autoresearch fit", "score this skill for karpathy loop", "is this a good autoresearch candidate", "assess autoresearch viability for", "which skills are best for autonomous loop optimization", "score skills for 3-file architecture", or when the user wants to determine if a skill is a good candidate for applying the Karpathy autoresearch autonomous optimization loop pattern.

3 Updated today
richfrem

AI & Automation Listed

skill-improver

Autoresearch loop for Claude Code skills — greedy keep/discard hill climbing on a 10-dimension quality rubric, with blind subagent validation for self-scoring bias, plus a `freshen` mode that probes external references (release notes, docs, deprecation signals) and applies verified updates, plus a `trigger` mode that measures and tunes the skill's frontmatter description until it reliably fires when it should and stays silent when it shouldn't (60/40 train/test split, 3 runs/query, blinded test scores).

3 Updated yesterday
air-gapped