engineering-data-pipelines

Solid

Data engineering knowledge reference covering Airflow, Dagster, Kafka Streams, Flink, dbt, and data quality patterns. Use when building data pipelines, ETL workflows, stream processing, or data quality checks.

Data & Documents · 228 stars · 30 forks · Updated today · MIT

Quality Score: 89/100

Stars (20%): 79
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 26
Issue Health (10%): 80
License (10%): 100
Description (5%): 100

Skill Content

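# Data Engineering

```
Orchestration: Airflow (scheduling) | Dagster (assets) | Prefect (modern flows)
Stream processing: Kafka Streams (embedded) | Flink (cluster) | Spark Streaming
Quality: Great Expectations | dbt tests | Soda Core
```

## Orchestration checklist

Idempotency (UPSERT / partition overwrite) | Incremental loads (`WHERE updated_at > last_run`) | Event-driven triggers | Cross-DAG dependencies | Data lineage (`ref()` / Asset deps)

## Stream processing checklist

Choice of time semantics (event time vs. processing time) | Watermarks for out-of-order tolerance | State TTL to prevent unbounded state growth | Checkpoint interval | End-to-end exactly-once | Backpressure monitoring

## Quality checklist

Layered validation (source → transformation → target) | Completeness + accuracy + consistency | Timeliness thresholds | Weighted scoring | Alerting (Slack/PagerDuty)

See [references/details.md](references/details.md) for tool comparisons, API usage, and quality dimensions.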
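Watermarks trade latency for tolerance of out-of-order events: a window closes only once the watermark, which trails the largest event time seen by a fixed delay, reaches the window's end, and an event that arrives for an already-closed window is treated as late. The pure-Python simulation below is a simplified illustration of that idea, in the spirit of Flink's bounded-out-of-orderness watermarks; it is not Flink's API, and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per (window, key) using event time and a bounded-out-of-orderness watermark.

    events: (event_time, key) pairs in arrival order, possibly out of order.
    Returns (fired, dropped):
      fired: (window_start, key, count) in the order the windows closed
      dropped: events that arrived after their window had already fired
    Windows still open when the input ends are not flushed in this sketch.
    """
    open_windows = defaultdict(int)  # (window_start, key) -> running count
    max_seen = float("-inf")
    watermark = float("-inf")
    fired, dropped = [], []
    for ts, key in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            # The window [start, start + window_size) already closed: too late.
            dropped.append((ts, key))
            continue
        open_windows[(start, key)] += 1
        # The watermark trails the largest event time seen by the tolerated delay.
        max_seen = max(max_seen, ts)
        watermark = max_seen - max_out_of_orderness
        # Close every window whose end the watermark has reached.
        for wstart, wkey in sorted(k for k in open_windows if k[0] + window_size <= watermark):
            fired.append((wstart, wkey, open_windows.pop((wstart, wkey))))
    return fired, dropped


if __name__ == "__main__":
    events = [(1, "a"), (3, "a"), (12, "a"), (8, "a"), (16, "a"), (2, "a"), (25, "a")]
    fired, dropped = tumbling_window_counts(events, window_size=10, max_out_of_orderness=5)
    print(fired)    # [(0, 'a', 3), (10, 'a', 2)]
    print(dropped)  # [(2, 'a')]
```

In the run above, the event at time 8 arrives after time 12 but is still counted, because the watermark (12 - 5 = 7) has not yet reached the end of window [0, 10). The event at time 2 arrives after time 16 pushed the watermark to 11, so it is dropped. A larger delay accepts more stragglers at the cost of later results.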
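State TTL keeps keyed state from growing without bound when keys stop receiving events. The class below is a hypothetical in-memory sketch of the usual semantics: a write refreshes the expiry time, expired values are never returned, and a cleanup pass reclaims space. Real engines such as Flink configure TTL on the state itself rather than through a class like this one.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TtlState:
    """Keyed state whose entries expire `ttl` seconds after their last write."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def put(self, key: Hashable, value: Any) -> None:
        # Every write refreshes the entry's timestamp.
        self._entries[key] = (value, self.clock())

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if self.clock() - written_at >= self.ttl:
            # Expired values are never returned; drop them lazily on read.
            del self._entries[key]
            return None
        return value

    def cleanup(self) -> int:
        """Eagerly remove all expired entries; returns how many were removed."""
        now = self.clock()
        expired = [k for k, (_, t) in self._entries.items() if now - t >= self.ttl]
        for k in expired:
            del self._entries[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    now = [0.0]
    state = TtlState(ttl=60, clock=lambda: now[0])
    state.put("user-1", 3)
    state.put("user-2", 5)
    now[0] = 30.0
    state.put("user-2", 6)      # refresh: user-2 now expires at t=90
    now[0] = 70.0
    print(state.get("user-1"))  # None (written at t=0, expired at t=60)
    print(state.get("user-2"))  # 6
```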
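The quality checklist's dimensions, timeliness threshold, weighted scoring, and alerting can be folded into a single score per run. The sketch assumes each check reports `passed`/`total` counts for one dimension, and it uses made-up weights and a made-up 95-point alert threshold; the `CheckResult` type and helper names are hypothetical, and actually routing the alert to Slack or PagerDuty is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CheckResult:
    dimension: str  # "completeness", "accuracy", "consistency" or "timeliness"
    passed: int
    total: int


# Hypothetical weights; tune them per dataset.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "completeness": 0.3,
    "accuracy": 0.3,
    "consistency": 0.2,
    "timeliness": 0.2,
}


def timeliness_check(latest_ts: float, now: float, max_lag_s: float) -> CheckResult:
    """Pass if the newest loaded record is at most max_lag_s seconds old."""
    return CheckResult("timeliness", int(now - latest_ts <= max_lag_s), 1)


def quality_score(results: List[CheckResult], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted mean of per-dimension pass rates, scaled to 0-100.

    Weights are renormalized over the dimensions actually present.
    """
    total_weight = sum(weights[r.dimension] for r in results)
    if total_weight <= 0:
        raise ValueError("no weighted checks to score")
    weighted = sum(
        weights[r.dimension] * (r.passed / r.total if r.total else 1.0) for r in results
    )
    return 100 * weighted / total_weight


def should_alert(score: float, threshold: float = 95.0) -> bool:
    # Placeholder decision only; no notification is sent here.
    return score < threshold


if __name__ == "__main__":
    results = [
        CheckResult("completeness", 990, 1000),  # e.g. required fields non-null
        CheckResult("accuracy", 950, 1000),      # e.g. values pass range/format rules
        CheckResult("consistency", 500, 500),    # e.g. keys reconcile between layers
        timeliness_check(latest_ts=1_000, now=5_000, max_lag_s=3_600),  # 4000 s old: fails
    ]
    score = quality_score(results)
    print(round(score, 1), should_alert(score))  # 78.2 True
```

Computing the score separately at the source, transformation, and target layers makes it easier to see which stage introduced a regression.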

Details

Author: telagod
Repository: telagod/code-abyss
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
