debug-buttercup

Debugs the Buttercup CRS (Cyber Reasoning System) running on Kubernetes. Use when diagnosing pod crashes, restart loops, Redis failures, resource pressure, disk saturation, DinD issues, or any service misbehavior in the crs namespace. Covers triage, log analysis, queue inspection, and common failure patterns for: redis, fuzzer-bot, coverage-bot, seed-gen, patcher, build-bot, scheduler, task-server, task-downloader, program-model, litellm, dind, tracer-bot, merger-bot, competition-api, pov-reproducer, scratch-cleaner, registry-cache, image-preloader, ui.

Install: claude install-skill kevinvwong/stack-agents

# Debug Buttercup

## When to Use

- Pods in the `crs` namespace are in CrashLoopBackOff, OOMKilled, or restarting
- Multiple services restart simultaneously (cascade failure)
- Redis is unresponsive or showing AOF warnings
- Queues are growing but tasks are not progressing
- Nodes show DiskPressure, MemoryPressure, or PID pressure
- Build-bot cannot reach the Docker daemon (DinD failures)
- Scheduler is stuck and not advancing task state
- Health check probes are failing unexpectedly
- Deployed Helm values don't match actual pod configuration

## When NOT to Use

- Deploying or upgrading Buttercup (use Helm and deployment guides)
- Debugging issues outside the `crs` Kubernetes namespace
- Performance tuning that doesn't involve a failure symptom

## Namespace and Services

All pods run in namespace `crs`. Key services:

| Layer | Services |
|-------|----------|
| Infra | redis, dind, litellm, registry-cache |
| Orchestration | scheduler, task-server, task-downloader, scratch-cleaner |
| Fuzzing | build-bot, fuzzer-bot, coverage-bot, tracer-bot, merger-bot |
| Analysis | patcher, seed-gen, program-model, pov-reproducer |
| Interface | competition-api, ui |

## Triage Workflow

Always start with triage. Run these three commands first:

```bash
# 1. Pod status - look for restarts, CrashLoopBackOff, OOMKilled
kubectl get pods -n crs -o wide

# 2. Events - the timeline of what went wrong
kubectl get events -n crs --sort-by='.lastTimestamp'

# 3. Warnings only - filter the noise
k