
kubernetes-operations

Debugs Kubernetes pods and controllers — FailedCreate, ImagePullBackOff, init-container failures, probe flapping, missing service endpoints, GKE NEG readiness. Use when a pod is not Running, a Deployment/StatefulSet shows FailedCreate, image pulls fail, or services lack endpoints.
Goodsmileduck/claude-registry · DevOps & Infrastructure
Install: claude install-skill Goodsmileduck/claude-registry
# Kubernetes — pod debug decision tree

For ArgoCD-managed resources, also check the `argocd-operations` skill: direct mutations are reverted by `selfHeal` within ~3 minutes.

## When to invoke

The pod's `STATUS` column (from `kubectl get pod`) tells you which branch to take. Always start with:

```bash
kubectl config current-context                   # confirm cluster/env BEFORE anything
kubectl describe pod <pod> -n <ns> | tail -40    # events are at the bottom
# [...] marks optional flags: -c picks a container in multi-container pods,
# --previous shows logs from the last terminated instance (useful after a crash)
kubectl logs <pod> -n <ns> [-c <container>] [--previous]
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -20
```

The `Events:` section at the end of `describe` is the single highest-signal source. Read it before anything else.

## Pre-flight: is this resource Argo-managed?

Before applying any `kubectl edit`/`patch`/`apply -f` fix, check whether the resource is GitOps-owned:

```bash
kubectl get <kind> <name> -n <ns> -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'
# managed-by indicators: argocd.argoproj.io/tracking-id, meta.helm.sh/release-name, app.kubernetes.io/managed-by
```

If it is managed, fix the source (chart/values/kustomization), not the cluster. See the `argocd-operations` skill.

## Branch 1 — Pod never created (`FailedCreate` on the controller)

The pod doesn't exist yet; the ReplicaSet/StatefulSet/Job can't create it.

```bash
# Look at the controller's events, not the pod's (the pod isn't there)
kubectl describe rs <rs-name> -n <ns> | tail -30
kubectl describe statefulset <ss> -n <ns> | tail -30
kubectl describe job <job> -n <ns> | tail -30
```

| Event mess