Install: claude install-skill build-with-dhiraj/ai-workflow-framework-portability-kit
# Benchmark E2E
Single-command pipeline that creates projects, exercises skill injection via `claude --print`, launches dev servers, verifies they work, analyzes conversation logs, and generates actionable improvement reports.
## Quick Start
```bash
# Full suite (9 projects, ~2-3 hours)
bun run scripts/benchmark-e2e.ts
# Quick mode (first 3 projects, ~30-45 min)
bun run scripts/benchmark-e2e.ts --quick
```
Options:
| Flag | Description | Default |
|------|-------------|---------|
| `--quick` | Run only first 3 projects | `false` |
| `--base <path>` | Override base directory | `~/dev/vercel-plugin-testing` |
| `--timeout <ms>` | Per-project timeout (forwarded to runner) | `900000` (15 min) |
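As an illustration of how the flags and defaults in the table fit together, here is a minimal parsing sketch. The `BenchmarkOptions` type and `parseOptions` function are hypothetical names chosen for this example, and the strict handling of unknown flags is an assumption; the actual script may parse its arguments differently.

```typescript
// Hypothetical shape for the parsed flags; not taken from the real script.
interface BenchmarkOptions {
  quick: boolean;
  base: string;
  timeoutMs: number;
}

// Parse the flags documented in the table, starting from the documented defaults.
function parseOptions(argv: string[]): BenchmarkOptions {
  const options: BenchmarkOptions = {
    quick: false,
    base: "~/dev/vercel-plugin-testing",
    timeoutMs: 900_000, // 15 minutes
  };

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--quick") {
      options.quick = true;
    } else if (arg === "--base" || arg === "--timeout") {
      // Both flags take a value in the next argument position.
      const value = argv[++i];
      if (value === undefined) {
        throw new Error(`Missing value for ${arg}`);
      }
      if (arg === "--base") {
        options.base = value;
      } else {
        const ms = Number(value);
        if (!Number.isFinite(ms) || ms <= 0) {
          throw new Error(`Invalid --timeout value: ${value}`);
        }
        options.timeoutMs = ms;
      }
    } else {
      // Assumption: unknown flags are rejected rather than ignored.
      throw new Error(`Unknown flag: ${arg}`);
    }
  }
  return options;
}
```

In a Bun or Node script this would typically be called as `parseOptions(process.argv.slice(2))`.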
## Pipeline Stages
The orchestrator runs four stages in sequence and aborts the pipeline as soon as one of them fails:
1. **runner** — Creates test dirs, installs plugin, runs `claude --print` with `VERCEL_PLUGIN_LOG_LEVEL=trace`
2. **verify** — Detects package manager, launches dev server, polls for 200 with non-empty HTML
3. **analyze** — Matches JSONL sessions to projects via `run-manifest.json`, extracts metrics
4. **report** — Generates `report.md` and `report.json` with scorecards and recommendations
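The sequential, abort-on-failure behavior described above can be sketched roughly as follows. This is a simplified model rather than the orchestrator's actual code: `Stage`, `PipelineResult`, and `runPipeline` are hypothetical names, and each stage is represented as a synchronous function returning an exit code, whereas the real stages presumably run as separate processes.

```typescript
type StageName = "runner" | "verify" | "analyze" | "report";

// Hypothetical stage model: run() returns an exit code, where 0 means success.
interface Stage {
  name: StageName;
  run: () => number;
}

interface PipelineResult {
  completed: StageName[];       // stages that finished successfully, in order
  failedAt: StageName | null;   // first failing stage, or null if all passed
}

// Run stages in order and stop at the first non-zero exit code.
function runPipeline(stages: Stage[]): PipelineResult {
  const completed: StageName[] = [];
  for (const stage of stages) {
    const exitCode = stage.run();
    if (exitCode !== 0) {
      // Abort: later stages are never invoked.
      return { completed, failedAt: stage.name };
    }
    completed.push(stage.name);
  }
  return { completed, failedAt: null };
}
```

Returning the list of completed stages alongside the failing one makes it easy to report how far a run got before it stopped.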
## Contracts
### `run-manifest.json`
Written by the runner at `<base>/results/run-manifest.json`. Links all downstream stages to the same run.
```typescript
interface BenchmarkRunManifest {
  runId: string; // UUID for this pipeline run
  timestamp: string; // ISO 8601
  baseDir: string;