debug-buttercuplisted
Install: claude install-skill kevinvwong/stack-agents
# Debug Buttercup
## When to Use
- Pods in the `crs` namespace are in CrashLoopBackOff, OOMKilled, or restarting
- Multiple services restart simultaneously (cascade failure)
- Redis is unresponsive or showing AOF warnings
- Queues are growing but tasks are not progressing
- Nodes show DiskPressure, MemoryPressure, or PID pressure
- Build-bot cannot reach the Docker daemon (DinD failures)
- Scheduler is stuck and not advancing task state
- Health check probes are failing unexpectedly
- Deployed Helm values don't match actual pod configuration
## When NOT to Use
- Deploying or upgrading Buttercup (use Helm and deployment guides)
- Debugging issues outside the `crs` Kubernetes namespace
- Performance tuning that doesn't involve a failure symptom
## Namespace and Services
All pods run in namespace `crs`. Key services:
| Layer | Services |
|-------|----------|
| Infra | redis, dind, litellm, registry-cache |
| Orchestration | scheduler, task-server, task-downloader, scratch-cleaner |
| Fuzzing | build-bot, fuzzer-bot, coverage-bot, tracer-bot, merger-bot |
| Analysis | patcher, seed-gen, program-model, pov-reproducer |
| Interface | competition-api, ui |
## Triage Workflow
Always start with triage. Run these three commands first:
```bash
# 1. Pod status - look for restarts, CrashLoopBackOff, OOMKilled
kubectl get pods -n crs -o wide
# 2. Events - the timeline of what went wrong
kubectl get events -n crs --sort-by='.lastTimestamp'
# 3. Warnings only - filter the noise
k