benchmark

Compare Claude Code output with full config vs minimal config using standardized tasks per stack.
luiseiman/dotforge · ★ 7 · AI & Automation · score 73

Install: claude install-skill luiseiman/dotforge

# Benchmark

Compare the effectiveness of a project's full dotforge configuration against a minimal baseline by executing the same standardized task in two isolated worktrees.

**Cost warning:** Each benchmark runs Claude Code twice (full + minimal). Use sparingly and only after Phases 0-2 are working.

## Prerequisites

- Project must have `.claude/settings.json` and `CLAUDE.md`
- Project must be a git repository with a clean working tree
- Task definitions must exist in `$DOTFORGE_DIR/tests/benchmark-tasks/`

## Step 1: Select task

1. Detect project stacks from `.claude/.forge-manifest.json` or infer from project files
2. Load matching task from `$DOTFORGE_DIR/tests/benchmark-tasks/{stack}.yml`
3. If multiple stacks match, let user choose or run the first match
4. If no stack matches, use `generic.yml`

Display:

```
═══ BENCHMARK SETUP ═══
Project: {{name}}
Stack detected: {{stack}}
Task: {{task title}}
Description: {{task description}}

⚠ This will run Claude Code twice in isolated worktrees.
Proceed? (yes/no)
```

## Step 2: Prepare worktrees

Create two git worktrees from the current HEAD:

1. **Full config**: `git worktree add /tmp/bench-full-{{slug}} HEAD`
   - Copy entire `.claude/` directory as-is
   - Copy `CLAUDE.md` as-is
2. **Minimal config**: `git worktree add /tmp/bench-minimal-{{slug}} HEAD`
   - Create minimal `CLAUDE.md` with only project name and "Build & Test" section
   - Create minimal `.claude/settings.json` with only `allowedTools` (no hooks, no den