data-pipeline

Wire ETL, ingestion, cron, edge-function, and queue jobs correctly. Use for "build a pipeline", "sync X into Y", "nightly aggregation", "cron double-counts", "dedupe", "backfill", "the numbers are wrong after a retry". Bakes in idempotency, atomic writes, data contracts, dead-letter, and observability.
kensaurus/cursor-kenji · ★ 4 · Data & Documents · score 80
Install: claude install-skill kensaurus/cursor-kenji
# Data Pipeline Correctness

> Pipelines fail silently: a retry double-counts, a partial write corrupts a table, a schema drift poisons a dashboard, and nobody notices until the numbers are wrong. This skill bakes correctness in at build time. It complements `sbc-qa-data-integrity-audit` (which *detects* these after the fact) and the Supabase plugin (DB/Edge Functions/RLS).

## When this fires

Any job that **moves, transforms, or aggregates** data: ingestion/ETL/ELT, scheduled aggregations, edge-function workers, `pg_cron` jobs, queue consumers, webhook processors, backfills, materialized-view refreshes.

## Non-negotiables (the 5 that prevent silent corruption)

1. **Idempotency**: running the same job twice must not change the result. Retries, at-least-once queues, and overlapping cron fires are guaranteed, not hypothetical.
   - Use `INSERT ... ON CONFLICT (natural_key) DO UPDATE` (upsert), not blind `INSERT`.
   - Derive a deterministic dedup key from the source event, not `now()` or a random id.
   - For aggregates: recompute-and-replace a window, or use idempotent deltas. Never `count = count + 1` on a path that can retry.
2. **Atomicity**: a job either fully applies or not at all. No half-written batches.
   - Wrap multi-row writes in a transaction; stage to a temp/raw table then swap.
   - A function that writes to 3 tables must not leave 1 of them updated on failure.
3. **Data contracts**: validate shape at the boundary before trusting input.
   - Parse/validate (zod / pyd