data-pipeline

Solid

Wire ETL, ingestion, cron, edge-function, and queue jobs correctly. Use for "build a pipeline", "sync X into Y", "nightly aggregation", "cron double-counts", "dedupe", "backfill", "the numbers are wrong after a retry". Bakes in idempotency, atomic writes, data contracts, dead-letter, and observability.

Data & Documents · 6 stars · 0 forks · Updated 3 days ago · MIT

Quality Score: 81/100 (weights: Stars 20%, Recency 20%, Frontmatter 20%, Documentation 15%, Issue Health 10%, License 10%, Description 5%)

Skill Content

# Data Pipeline Correctness

> Pipelines fail silently: a retry double-counts, a partial write corrupts a table, a schema drift poisons a dashboard, and nobody notices until the numbers are wrong. This skill bakes correctness in at build time. It complements post-hoc data-integrity audit skills (which *detect* these after the fact) and the Supabase plugin (DB/Edge Functions/RLS).

## When this fires

Any job that **moves, transforms, or aggregates** data: ingestion/ETL/ELT, scheduled aggregations, edge-function workers, `pg_cron` jobs, queue consumers, webhook processors, backfills, materialized-view refreshes.

## Non-negotiables (the 5 that prevent silent corruption)

1. **Idempotency**: running the same job twice must not change the result. Retries, at-least-once queues, and overlapping cron fires are guaranteed, not hypothetical.
   - Use `INSERT ... ON CONFLICT (natural_key) DO UPDATE` (upsert), not blind `INSERT`.
   - Derive a deterministic dedup key from the source event, not `now()` or a random id.
   - For aggregates: recompute-and-replace a window, or use idempotent deltas; never `count = count + 1` on a path that can retry.
2. **Atomicity**: a job either fully applies or not at all. No half-written batches.
   - Wrap multi-row writes in a transaction; stage to a temp/raw table then swap.
   - A function that writes to 3 tables must not leave 1 of them updated on failure.
3. **Data contracts**: validate shape at the boundary before trusting input.
   - Parse/validate (zod...
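The idempotency rules above (upsert on a natural key, a deterministic dedup key, and recompute-and-replace for aggregates) can be illustrated with a minimal sketch. It is not taken from the skill itself: the `SourceEvent` shape, `dedupKey`, `EventStore`, and `runJob` are hypothetical names, and a plain in-memory object stands in for a table with a unique constraint on the key.

```typescript
// Hypothetical source event; the field names are assumptions for illustration.
interface SourceEvent {
  source: string;     // name of the upstream system
  externalId: string; // id assigned by the source, stable across retries
  day: string;        // ISO date the event belongs to
  amount: number;
}

// Deterministic dedup key built only from source-provided fields,
// never from the current time or a random id.
function dedupKey(e: SourceEvent): string {
  return e.source + ":" + e.externalId;
}

// In-memory stand-in for a table with a unique constraint on the dedup key.
class EventStore {
  private rows: Record<string, SourceEvent> = {};

  // Mirrors the effect of INSERT ... ON CONFLICT (key) DO UPDATE:
  // a repeated key overwrites the existing row instead of adding a second one.
  upsert(e: SourceEvent): void {
    this.rows[dedupKey(e)] = e;
  }

  count(): number {
    return Object.keys(this.rows).length;
  }

  // Recompute-and-replace: the daily total is derived from the stored rows
  // on every call, so a retry cannot inflate it the way a running
  // `total = total + amount` counter would.
  dailyTotal(day: string): number {
    let total = 0;
    for (const key of Object.keys(this.rows)) {
      const row = this.rows[key];
      if (row.day === day) total += row.amount;
    }
    return total;
  }
}

// A job run is just "upsert every event in the batch".
function runJob(store: EventStore, batch: SourceEvent[]): void {
  for (const e of batch) store.upsert(e);
}

const store = new EventStore();
const batch: SourceEvent[] = [
  { source: "billing", externalId: "evt_1", day: "2024-01-01", amount: 10 },
  { source: "billing", externalId: "evt_2", day: "2024-01-01", amount: 20 },
];

runJob(store, batch);
runJob(store, batch); // simulated retry of the same batch

console.log(store.count(), store.dailyTotal("2024-01-01")); // prints "2 30"
```

Running the job twice leaves two rows and a total of 30, which is the property the first non-negotiable asks for. In a real database the same effect would come from the unique constraint plus the upsert statement rather than from an object keyed in application memory.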

Details

Author: kensaurus
Repository: kensaurus/cursor-kenji
Created: 5 months ago
Last Updated: 3 days ago
Language: JavaScript
License: MIT

Integrates with

Supabase (Cloud) · Next.js (Frontend) · Tailwind CSS (Frontend)

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Listed

pipeline-design

Design ETL/ELT pipelines end-to-end — source connectors, extraction strategies, transform logic, load patterns, idempotency, scheduling, and error handling. Use this skill whenever the user is starting a new ingestion job, planning how data moves from a source (REST API, database, file, webhook, message queue) into a data warehouse or data lake. Also trigger when the user asks about pipeline architecture, incremental vs. full loads, backfill strategies, CDC, retry logic, or orchestration choices (Airflow, Prefect, dbt). This skill should feel like pairing with a senior data engineer on day one of a new pipeline project.

1 Updated 1 week ago

Methasit-Pun

Data & Documents Solid

data-pipeline-architect

Designs robust ETL/ELT data pipelines covering ingestion, idempotency, schema evolution, orchestration, and data quality validation. Use this skill when the user asks to design, build, or review a data pipeline, ingest data from APIs/databases/files into a warehouse or lake, set up batch or streaming ETL/ELT, choose an orchestrator (Airflow, Dagster, Prefect, dbt), make a pipeline idempotent or backfill-safe, handle late-arriving or duplicate data, manage schema drift/evolution, add data quality or freshness checks, or model incremental/CDC loads.

3 Updated today

JayRHa

Data & Documents Listed

data-engineer

Builds and hardens the pipelines and warehouse structures that move data from source systems to the people and systems that consume it. Use when the user says "build the ETL pipeline", "design the dbt models", "orchestrate this pipeline", "design the warehouse schema", or "/agent-collab:data-engineer". Also offer this proactively when a pipeline lacks idempotency, has no data-quality checks, or moves data through undocumented schema contracts.

0 Updated today

sumitake