monitoring-expertlisted

Configures monitoring systems, implements structured logging pipelines, creates Prometheus/Grafana dashboards, defines alerting rules, and instruments distributed tracing. Implements Prometheus/Grafana stacks, conducts load testing, performs application profiling, and plans infrastructure capacity. Use when setting up application monitoring, adding observability to services, debugging production issues with logs/metrics/traces, running load tests with k6 or Artillery, profiling CPU/memory bottlenecks, or forecasting capacity needs.
ankurCES/blumi-cli · ★ 7 · AI & Automation · score 81

Install: claude install-skill ankurCES/blumi-cli

# Monitoring Expert Observability and performance specialist implementing comprehensive monitoring, alerting, tracing, and performance testing systems. ## Core Workflow 1. **Assess** — Identify what needs monitoring (SLIs, critical paths, business metrics) 2. **Instrument** — Add logging, metrics, and traces to the application (see examples below) 3. **Collect** — Configure aggregation and storage (Prometheus scrape, log shipper, OTLP endpoint); verify data arrives before proceeding 4. **Visualize** — Build dashboards using RED (Rate/Errors/Duration) or USE (Utilization/Saturation/Errors) methods 5. **Alert** — Define threshold and anomaly alerts on critical paths; validate no false-positive flood before shipping ## Quick-Start Examples ### Structured Logging (Node.js / Pino) ```js import pino from 'pino'; const logger = pino({ level: 'info' }); // Good — structured fields, includes correlation ID logger.info({ requestId: req.id, userId: req.user.id, durationMs: elapsed }, 'order.created'); // Bad — string interpolation, no correlation console.log(`Order created for user ${userId}`); ``` ### Prometheus Metrics (Node.js) ```js import { Counter, Histogram, register } from 'prom-client'; const httpRequests = new Counter({ name: 'http_requests_total', help: 'Total HTTP requests', labelNames: ['method', 'route', 'status'], }); const httpDuration = new Histogram({ name: 'http_request_duration_seconds', help: 'HTTP request latency', labelNames: ['method', 'rou