couchbase-observability

Monitor, alert on, and observe Couchbase clusters in production. Use whenever the user asks about Couchbase metrics, Prometheus, Grafana, alerting, alert thresholds, memory high watermark, disk usage, replication lag, query latency, index build progress, DCP lag, ops/sec, cache miss ratio, Couchbase Exporter, admin_stats_* tools, log aggregation, SIEM shipping, health checks, or 'how do I know if my Couchbase cluster is healthy.' Distinct from couchbase-mcp (calling the tools) and couchbase-security-hardening (audit log shipping). Use proactively for new production deployments needing an observability stack, incident response setup, and SLO definition.
celticht32/Couchbase-Skills-for-Claude.ai · AI & Automation

Install: claude install-skill celticht32/Couchbase-Skills-for-Claude.ai

# Couchbase Observability

A skill for *monitoring and alerting on* Couchbase clusters in production — metrics, thresholds, Prometheus integration, Grafana dashboards, log aggregation, and health check patterns.

Distinct from:

- `couchbase-mcp` — calling `admin_stats_*` tools to read current state
- `couchbase-security-hardening` — audit log configuration and shipping

## When this skill applies

- "How do I monitor Couchbase?"
- "What metrics should I alert on?"
- "How do I integrate Couchbase with Prometheus / Grafana?"
- "What's a good alert threshold for memory / disk / replication lag?"
- "How do I know if a node is healthy?"
- "How do I aggregate Couchbase logs?"
- "How do I set up dashboards for Couchbase?"
- "What does the cache miss ratio tell me?"
- "How do I detect query performance degradation?"

## Pick the right reference

| Question | Read |
|---|---|
| "What metrics matter and what do they mean?" | `references/key-metrics.md` |
| "Prometheus / Grafana setup — scraping, dashboards, recording rules" | `references/prometheus-grafana.md` |
| "Alert thresholds — what values should trigger pages vs warnings?" | `references/alert-thresholds.md` |
| "Log aggregation — shipping cluster logs to ELK / Splunk / Datadog" | `references/log-aggregation.md` |

## Three core principles

**Principle 1 — Alert on symptoms, not causes.** "Disk usage > 80%" is a symptom. "Compaction not keeping up" is a cause. Alert on the symptom (disk), and investigate causes during the incident.
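Principle 1 can be illustrated with a minimal Python sketch. The check looks only at the symptom (per-node disk usage compared against the 80% figure from the example) and deliberately ignores compaction state, which is the kind of cause you would investigate once the alert fires. `DiskSample`, `disk_symptom_alerts`, and `DISK_USAGE_ALERT_THRESHOLD` are hypothetical names chosen for illustration. In a real deployment the byte counts would come from your metrics pipeline (for example, Prometheus scraping the cluster), not from hand-built objects.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold, taken from the "Disk usage > 80%" example in Principle 1.
DISK_USAGE_ALERT_THRESHOLD = 0.80


@dataclass
class DiskSample:
    """Hypothetical per-node disk reading; real values would come from your metrics source."""
    node: str
    used_bytes: int
    total_bytes: int


def disk_symptom_alerts(samples: List[DiskSample]) -> List[str]:
    """Return one alert message per node whose disk usage exceeds the threshold.

    Only the symptom (usage fraction) is checked. Causes such as compaction
    falling behind are intentionally not inspected here; they belong to the
    investigation that follows the alert.
    """
    alerts = []
    for s in samples:
        if s.total_bytes <= 0:
            # Skip malformed readings instead of dividing by zero.
            continue
        usage = s.used_bytes / s.total_bytes
        if usage > DISK_USAGE_ALERT_THRESHOLD:
            alerts.append(
                f"{s.node}: disk usage {usage:.0%} exceeds "
                f"{DISK_USAGE_ALERT_THRESHOLD:.0%}"
            )
    return alerts


if __name__ == "__main__":
    readings = [
        DiskSample("node1", used_bytes=85, total_bytes=100),
        DiskSample("node2", used_bytes=50, total_bytes=100),
    ]
    for message in disk_symptom_alerts(readings):
        print(message)
```

Keeping the rule on the symptom means it fires whether the disk is filling because compaction is behind or for some other reason; the alert does not need to know which.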