
skill-optimizer

SkillOpt-flavored offline training loop for any SKILL.md. Treats accumulated learn-rule corrections as training trajectories, proposes bounded patches via an optimizer LLM, gates each candidate against a held-out validation set built from the user's own past corrections, and ships only candidates that demonstrably improve the score. Inspired by Microsoft SkillOpt's ReflACT pipeline (rollout → reflect → aggregate → select → update → evaluate) adapted to pro-workflow's SQLite store. Use when a skill has accumulated 8+ learn-rule rows and the user wants the skill itself to get better, not just longer.
rohitg00/pro-workflow · ★ 2,280 · AI & Automation · score 83
Install: claude install-skill rohitg00/pro-workflow
# Skill Optimizer

Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, validation-gated acceptance. No model retraining; only the skill markdown changes.

## When to use

Use this skill when:

- A pro-workflow skill has accumulated 8+ learn-rule rows for it
- The user reports the skill is "getting bloated" or "rules keep being repeated"
- The user wants offline, budget-capped improvement over multiple sessions

Do not use when:

- Skill has fewer than 8 trajectories (nothing to learn from)
- The user wants real-time edits (this is offline, single-shot)
- No `ANTHROPIC_API_KEY` (or equivalent provider key) is available

## Architecture (mirrors SkillOpt's six-stage loop)

```text
rollout      pull recent learnings from SQLite (existing learn-rule rows)
reflect      optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate    vote-merge patches across minibatches
select       clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update       apply selected patches to a candidate skill content
evaluate     evaluator LLM scores candidate against held-out validation items
gate         accept candidate only if weighted score >= current + acceptThreshold
slow update  at epoch boundary, consolidate accepted edits into a coherent rewrite
```

Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.
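
The select and gate stages can be illustrated with a minimal Python sketch. This is not pro-workflow's implementation: the `Patch` record, the `select` and `gate` function names, and the `(op, text)` key used for the rejection buffer are all hypothetical. The per-step budget of 3 adds, 2 deletes, and 3 replaces comes from the table above; the acceptance threshold is left as a required argument because its default value is not stated here. For simplicity the rejection buffer is modelled as a filter during selection, whereas the skill feeds it back into the reflect step.

```python
from dataclasses import dataclass

# Per-step edit budget quoted in the select stage of the architecture table.
DEFAULT_BUDGET = {"add": 3, "delete": 2, "replace": 3}


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch record; the real schema may differ."""
    op: str         # "add", "delete", or "replace"
    text: str       # rule text the patch adds, removes, or rewrites
    votes: int = 1  # number of minibatches that proposed this patch


def select(patches, budget=DEFAULT_BUDGET, rejected=frozenset()):
    """Keep the highest-voted patches per op, clipped to the budget.

    Patches whose (op, text) key is in the rejection buffer are skipped
    (assumption: the buffer is a set of such keys).
    """
    remaining = dict(budget)
    chosen = []
    # sorted() is stable, so equally voted patches keep their input order.
    for patch in sorted(patches, key=lambda p: p.votes, reverse=True):
        if (patch.op, patch.text) in rejected:
            continue
        if remaining.get(patch.op, 0) > 0:
            remaining[patch.op] -= 1
            chosen.append(patch)
    return chosen


def gate(candidate_score, current_score, accept_threshold):
    """Accept only if the candidate score is at least current + threshold."""
    return candidate_score >= current_score + accept_threshold
```

Under these assumptions, a step that receives five `add` proposals passes only the three with the most votes on to the update stage, and a candidate that does not clear the threshold would have its patches added to `rejected` before the next step.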