data-pipelines

Solid

Use this skill when building data pipelines, ETL/ELT workflows, or data transformation layers. Triggers on Airflow DAG design, dbt model creation, Spark job optimization, streaming vs batch architecture decisions, data ingestion, data quality checks, pipeline orchestration, incremental loads, CDC (change data capture), schema evolution, and data warehouse modeling. Acts as a senior data engineer advisor for building reliable, scalable data infrastructure.

Data & Documents 167 stars 29 forks Updated today MIT

Install

View on GitHub

Quality Score: 92/100

Stars 20%
74
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

When this skill is activated, always start your first response with the ๐Ÿงข emoji. # Data Pipelines A senior data engineer's decision-making framework for building production data pipelines. This skill covers the five pillars of data engineering - ingestion patterns (ETL vs ELT), orchestration (Airflow), transformation (dbt), large-scale processing (Spark), and architecture choices (streaming vs batch) - with emphasis on when to use each pattern and the trade-offs involved. Designed for engineers who need opinionated guidance on building reliable, observable, and maintainable data infrastructure. --- ## When to use this skill Trigger this skill when the user: - Designs an ETL or ELT pipeline from scratch - Writes or debugs an Airflow DAG - Creates dbt models, tests, or macros - Optimizes a Spark job (shuffles, partitioning, memory tuning) - Decides between streaming and batch processing - Implements incremental loads or change data capture (CDC) - Plans a data warehouse or lakehouse architecture - Needs data quality checks, schema evolution, or pipeline monitoring Do NOT trigger this skill for: - BI/analytics dashboard design or visualization (use an analytics skill) - ML model training or feature engineering (use an ML/data-science skill) --- ## Key principles 1. **Idempotency is non-negotiable** - Every pipeline run with the same input must produce the same output. Design for safe re-runs from day one. Use date partitions, merge keys, or upsert logic so that r...

Details

Author
AbsolutelySkilled
Repository
AbsolutelySkilled/AbsolutelySkilled
Created
2 months ago
Last Updated
today
Language
MDX
License
MIT

Similar Skills

Semantically similar based on skill content โ€” not just same category

Data & Documents Listed

data-pipelines

Use this skill when building data pipelines, ETL/ELT workflows, or data transformation layers. Triggers on Airflow DAG design, dbt model creation, Spark job optimization, streaming vs batch architecture decisions, data ingestion, data quality checks, pipeline orchestration, incremental loads, CDC (change data capture), schema evolution, and data warehouse modeling. Acts as a senior data engineer advisor for building reliable, scalable data infrastructure.

3 Updated today
Samuelca6399
Data & Documents Featured

data-engineering-data-pipeline

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

39,350 Updated today
sickn33
Data & Documents Listed

data-engineering-data-pipeline

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

335 Updated today
aiskillstore
Data & Documents Solid

data-quality

Use this skill when implementing data validation, data quality monitoring, data lineage tracking, data contracts, or Great Expectations test suites. Triggers on schema validation, data profiling, freshness checks, row-count anomalies, column drift, expectation suites, contract testing between producers and consumers, lineage graphs, data observability, and any task requiring data integrity enforcement across pipelines.

167 Updated today
AbsolutelySkilled
Data & Documents Listed

data-quality

Use this skill when implementing data validation, data quality monitoring, data lineage tracking, data contracts, or Great Expectations test suites. Triggers on schema validation, data profiling, freshness checks, row-count anomalies, column drift, expectation suites, contract testing between producers and consumers, lineage graphs, data observability, and any task requiring data integrity enforcement across pipelines.

3 Updated today
Samuelca6399