etl-pipeline-builder

Solid

Build and manage ETL pipelines for data migration with transformation, CDC, and monitoring

Data & Documents 1,160 stars 71 forks Updated today MIT


Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# ETL Pipeline Builder Skill

Builds and manages ETL (Extract, Transform, Load) pipelines for data migration, supporting incremental loads, CDC, and comprehensive monitoring.

## Purpose

Enable data pipeline creation for:

- Source-to-target mapping
- Transformation definition
- Incremental load setup
- CDC configuration
- Pipeline monitoring

## Capabilities

### 1. Source-to-Target Mapping

- Define column mappings
- Handle schema differences
- Configure data type conversions
- Manage derived columns

### 2. Transformation Definition

- Data type transformations
- Value mappings
- Aggregations
- Lookups and enrichments

### 3. Incremental Load Setup

- Define watermarks
- Configure incremental columns
- Handle deletes
- Manage merge logic

### 4. CDC Configuration

- Log-based CDC
- Trigger-based CDC
- Timestamp-based CDC
- Full load comparison

### 5. Error Handling

- Define retry policies
- Configure dead letter queues
- Handle data quality issues
- Implement alerting

### 6. Pipeline Monitoring

- Track pipeline metrics
- Monitor data volumes
- Alert on failures
- Generate SLA reports

## Tool Integrations

| Tool | Type | Integration Method |
|------|------|-------------------|
| Apache Airflow | Orchestration | Python |
| dbt | Transformation | CLI |
| Airbyte | Data integration | API |
| Fivetran | SaaS ETL | API |
| AWS DMS | Cloud migration | CLI |
| Debezium | CDC | Config |

## Output Schema

```json
{
  "pipelineId": "string",
  "timestamp": "ISO8601",
  "pipeline": {...
```
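As a rough illustration of the Incremental Load Setup capability listed above (watermarks, an incremental column, merge logic), the following is a minimal sketch and not the skill's actual implementation. It assumes an in-memory SQLite database as a stand-in for both source and target; the `orders` table, its columns, the `updated_at` incremental column, and the `incremental_load` and `make_db` helpers are all hypothetical names chosen for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    return conn


def incremental_load(src, tgt, watermark):
    """Copy rows changed after `watermark` from src.orders into tgt.orders.

    Returns the new watermark (the largest updated_at that was loaded),
    or the old watermark unchanged if no rows qualified.
    """
    # Extract: only rows whose incremental column is past the watermark.
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Merge: insert new ids, replace the whole row for existing ids.
    tgt.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    tgt.commit()
    return rows[-1][2] if rows else watermark


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T00:00:00"), (2, 20.0, "2024-01-02T00:00:00")],
)
source.commit()

# First run: the initial watermark predates every row, so all rows load.
wm1 = incremental_load(source, target, "1970-01-01T00:00:00")

# Between runs, one existing row changes and one new row arrives.
source.execute(
    "UPDATE orders SET amount = 15.0, updated_at = '2024-01-03T00:00:00' "
    "WHERE id = 1"
)
source.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-04T00:00:00')")
source.commit()

# Second run: only the two rows past the stored watermark are read.
wm2 = incremental_load(source, target, wm1)
final_rows = target.execute(
    "SELECT id, amount FROM orders ORDER BY id"
).fetchall()
```

The sketch compares ISO 8601 timestamp strings, which sort chronologically only when they share one format and time zone. It does not cover the "Handle deletes" item: a row deleted from the source never appears in a timestamp query, so deletes would need a separate mechanism such as soft-delete flags, log-based CDC, or a full load comparison.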

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Listed

etl

ETL pipeline development with focus on data quality, orchestration, and error handling patterns.

0 Updated today
ignKhut
Data & Documents Listed

pipeline-architect

Designs and implements data pipelines: ETL/ELT, streaming, batch processing, schema migrations, and data warehouse architecture. Covers Kafka, Airflow, dbt, Spark, ClickHouse, BigQuery, Snowflake, Redis Streams, and more. Use this skill when the user asks about data pipelines, ETL jobs, data transformation, streaming setup, data warehouse design, CDC, schema migrations, data quality checks, or anything involving moving data from source to target. Also triggers on "build a pipeline," "migrate data from X to Y," "set up streaming," "design my data warehouse," or "data quality is bad, help me fix it."

1 Updated 4 days ago
mturac
Data & Documents Featured

data-engineering-data-pipeline

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

39,350 Updated today
sickn33
Data & Documents Listed

data-engineering-data-pipeline

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

335 Updated today
aiskillstore
Data & Documents Listed

pipeline-design

Design ETL/ELT pipelines end-to-end — source connectors, extraction strategies, transform logic, load patterns, idempotency, scheduling, and error handling. Use this skill whenever the user is starting a new ingestion job, planning how data moves from a source (REST API, database, file, webhook, message queue) into a data warehouse or data lake. Also trigger when the user asks about pipeline architecture, incremental vs. full loads, backfill strategies, CDC, retry logic, or orchestration choices (Airflow, Prefect, dbt). This skill should feel like pairing with a senior data engineer on day one of a new pipeline project.

0 Updated 5 days ago
Methasit-Pun