lab:autoresearch

Featured

Self-improving loop for plugin skills. Reads program.md, proposes one mutation per iteration, evaluates it against a deterministic scorer, keeps improvements via git, and reverts failures. Targets the weakest skill+dimension. Use with /loop for overnight runs.
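The keep/revert loop in this description can be sketched as a small hill-climber. This is a minimal illustration, not the plugin's `run-iteration.py`. The callables `propose`, `apply`, `score`, `checks_pass`, `commit` and `revert` are hypothetical stand-ins: in practice the commit and revert steps would wrap git. Treating a tied score as "no improvement" (and therefore reverting) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IterationLog:
    """One journal entry per iteration; every iteration is logged, kept or not."""
    iteration: int
    mutation: str
    baseline: float
    candidate: float
    verdict: str  # "keep" or "revert"


def run_loop(
    propose: Callable[[], str],       # hypothetical: returns exactly one mutation
    apply: Callable[[str], None],     # hypothetical: writes the mutation into the skill file
    score: Callable[[], float],       # hypothetical deterministic scorer, higher is better
    checks_pass: Callable[[], bool],  # hypothetical structural checks on the mutated skill
    commit: Callable[[str], None],    # e.g. wraps `git commit`; keeps the change
    revert: Callable[[], None],       # e.g. wraps `git checkout -- <file>`; drops the change
    max_iterations: int,
) -> List[IterationLog]:
    log: List[IterationLog] = []
    baseline = score()  # score before any mutation
    for i in range(1, max_iterations + 1):
        mutation = propose()
        apply(mutation)
        candidate = score()
        # Assumption: keep only a strict improvement whose checks pass;
        # a tie, a regression, or a failed check is reverted.
        if candidate > baseline and checks_pass():
            commit(mutation)
            verdict = "keep"
        else:
            revert()
            verdict = "revert"
        log.append(IterationLog(i, mutation, baseline, candidate, verdict))
        if verdict == "keep":
            baseline = candidate  # the kept version becomes the new baseline
    return log
```

In a toy run where successive versions score 0.5, 0.7, 0.6 and 0.9, the second mutation is reverted and the baseline stays at 0.7 until the third mutation is kept.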

Code & Development · 505 stars · 35 forks · Updated today · MIT


Quality Score: 97/100

Stars (weight 20%): 90
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Autoresearch: Plugin Skill Self-Improvement

Iteratively improve plugin skills via the autoresearch pattern: propose one mutation -> eval -> keep/revert -> repeat.

## Usage

```
/lab:autoresearch                      # Targeted: attack weakest skill+dimension
/lab:autoresearch --skill review       # Focus on one skill
/lab:autoresearch --strategy sweep     # Process all skills alphabetically
/lab:autoresearch --dry-run            # Show what would change, don't commit
```

For overnight runs:

```
/loop 5m /lab:autoresearch --strategy sweep --max-iterations 200
```

## Iron Laws

1. **ONE mutation per iteration**: if the description needs "and", split it into two
2. **NEVER mutate read-only files**: check program.md before every write
3. **EVAL is deterministic**: always use the wrapper script, never LLM-judge
4. **REVERT on regression OR checks failure**: no exceptions
5. **LOG every iteration**: use the `keep` or `revert` command (never skip)
6. **CHECK ideas.md before proposing**: don't rediscover known optimizations

## Wrapper Script Commands

All eval/git/journal operations go through ONE script. Do NOT run these manually.

```bash
# Find the weakest skill+dimension
python3 lab/autoresearch/scripts/run-iteration.py target --strategy targeted

# Score a skill (before mutation, to get baseline)
python3 lab/autoresearch/scripts/run-iteration.py score <skill-name>

# After mutation: score + checks + compare → verdict (KEEP or REVERT)
python3 lab/autoresea...
```
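The targeted and sweep strategies listed under Usage can be illustrated with a small selector over a score table. This is a sketch under stated assumptions, not the behavior of the real `target` subcommand: scores are a hypothetical `{skill: {dimension: score}}` mapping where lower means weaker, ties are broken by name, and sweep is assumed to pick each skill's weakest dimension while visiting skills alphabetically.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical score table: skill name -> dimension name -> score (lower is weaker).
Scores = Dict[str, Dict[str, float]]


def weakest_target(scores: Scores, skill: Optional[str] = None) -> Tuple[str, str, float]:
    """Return (skill, dimension, score) with the lowest score.

    If `skill` is given, only that skill's dimensions are considered,
    similar in spirit to `--skill <name>`.
    """
    candidates = [
        (name, dim, value)
        for name, dims in scores.items()
        if skill is None or name == skill
        for dim, value in dims.items()
    ]
    if not candidates:
        raise ValueError("no scored skill matches the request")
    # Lowest score wins; ties are broken deterministically by skill, then dimension.
    return min(candidates, key=lambda c: (c[2], c[0], c[1]))


def sweep_order(scores: Scores) -> Iterator[Tuple[str, str, float]]:
    """Visit skills alphabetically, yielding each skill's weakest dimension."""
    for name in sorted(scores):
        yield weakest_target(scores, skill=name)
```

Returning the score together with the names lets a caller record the baseline before mutating, which mirrors the separate `score` step in the wrapper commands.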

Details

Author: oliver-kriska
Repository: oliver-kriska/claude-elixir-phoenix
Created: 5 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Redis · Database · Model Context Protocol · AI

Bundled in these plugins

claude-elixir-phoenix

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Solid

autoresearch

Autoresearch v2. Meta-Harness-informed self-improving skill loop. Upgrades from Karpathy hill-climbing to trace-informed diagnosis: full execution trace access, counterfactual diagnosis after regressions, additive-only safety valve, multi-candidate filesystem. Runs on any skill that produces scoreable output. Also runs on CLAUDE.md routing logic itself. If you can score it, you can autoresearch it.

2 Updated 3 days ago

AI & Automation Solid

autoresearch

Autonomous experiment loop: edit code, commit, run benchmark, extract metrics, keep improvements or revert, repeat forever. Use this skill when the user asks to "run autoresearch", "start an experiment loop", "optimize a metric autonomously", "autonomous experiments", "benchmark loop", "keep/discard experiments", "optimize test speed", "optimize bundle size", "optimize build time", "run experiments overnight", "speed up my tests", "make my build faster", "reduce compile time", "keep trying until it's faster", "run experiments while I sleep", "overnight optimization", "edit-measure-keep loop", "autoresearch status", or mentions "autoresearch", "experiment loop", "autonomous optimization". Always use this skill when the user wants to iteratively and autonomously improve any measurable metric — even if they don't use the word "autoresearch". Also use when the user asks about the status of a running autoresearch session or wants to cancel/stop one.

12 Updated 1 week ago

AI & Automation Solid

autoresearch

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

14 Updated yesterday