
benchmark-e2e

End-to-end benchmark suite for vercel-plugin. Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report for overnight self-improvement loops.
build-with-dhiraj/ai-workflow-framework-portability-kit · ★ 1 · Testing & QA · score 74
Install: claude install-skill build-with-dhiraj/ai-workflow-framework-portability-kit
# Benchmark E2E

Single-command pipeline that creates projects, exercises skill injection via `claude --print`, launches dev servers, verifies they work, analyzes conversation logs, and generates actionable improvement reports.

## Quick Start

```bash
# Full suite (9 projects, ~2-3 hours)
bun run scripts/benchmark-e2e.ts

# Quick mode (first 3 projects, ~30-45 min)
bun run scripts/benchmark-e2e.ts --quick
```

Options:

| Flag | Description | Default |
|------|-------------|---------|
| `--quick` | Run only first 3 projects | `false` |
| `--base <path>` | Override base directory | `~/dev/vercel-plugin-testing` |
| `--timeout <ms>` | Per-project timeout (forwarded to runner) | `900000` (15 min) |

## Pipeline Stages

The orchestrator chains four stages sequentially, aborting on failure:

1. **runner**: Creates test dirs, installs plugin, runs `claude --print` with `VERCEL_PLUGIN_LOG_LEVEL=trace`
2. **verify**: Detects package manager, launches dev server, polls for 200 with non-empty HTML
3. **analyze**: Matches JSONL sessions to projects via `run-manifest.json`, extracts metrics
4. **report**: Generates `report.md` and `report.json` with scorecards and recommendations

## Contracts

### `run-manifest.json`

Written by the runner at `<base>/results/run-manifest.json`. Links all downstream stages to the same run.

```typescript
interface BenchmarkRunManifest {
  runId: string;      // UUID for this pipeline run
  timestamp: string;  // ISO 8601
  baseDir: string;