datadog-investigate

Investigate production issues by querying Datadog logs, metrics, and APM traces, then correlating findings with the codebase. Use this skill whenever the user mentions production errors, Datadog, observability, log investigation, latency spikes, error rate increases, 500s, trace IDs, monitor alerts, or wants to debug any service issue in a deployed environment.
ClipboardHealth/groundcrew · ★ 40 · Code & Development · score 80
Install: claude install-skill ClipboardHealth/groundcrew
# Datadog Investigation Skill

Investigate production issues by querying Datadog logs, metrics, and APM traces, then correlating findings with the codebase.

## Prerequisites

- Datadog CLI (`dog`) installed and configured via `~/.dogrc` with `apikey` and `appkey`

## Setup: API Credentials

Every Datadog API call needs authentication. Extract credentials once and reuse them to keep commands readable:

```bash
DD_API_KEY=$(grep apikey ~/.dogrc | cut -d= -f2 | tr -d ' ')
DD_APP_KEY=$(grep appkey ~/.dogrc | cut -d= -f2 | tr -d ' ')
```

Use these variables in all subsequent curl calls. If a shell session is lost, re-extract them.

## Default Environment

Filter by `env:production` unless the user specifies otherwise. Production is the default because that's where real user-impacting issues live; staging and dev issues rarely warrant this investigation workflow.

## Timestamps

Use Node.js for portable timestamp calculations (works on macOS and Linux):

```bash
node -e "console.log(Math.floor(Date.now()/1000))"          # now
node -e "console.log(Math.floor(Date.now()/1000) - 3600)"   # 1 hour ago
node -e "console.log(Math.floor(Date.now()/1000) - 86400)"  # 24 hours ago
```

## Investigation Workflow

When a user reports an issue, follow this flow. The goal is to move from symptoms to root cause to fix as quickly as possible.

1. **Clarify the problem**: Get service name, time range, error messages, or trace IDs. If the user is vague, start with the last hour of errors for th