data-engineering
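Data engineering: Airflow, Dagster, Kafka Streams, Flink, dbt, data pipelines, stream processing, data quality. Route here when the user mentions data pipelines, ETL, stream processing, or data quality.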
Quality Score: 99/100
Details
- Author: fengshao1227
- Repository: fengshao1227/ccg-workflow
- Created: 4 months ago
- Last Updated: 2 days ago
- Language: Go
- License: MIT
Similar Skills
Semantically similar based on skill content, not just the same category.
data-engineering
Data engineering (Airflow/Dagster/Kafka/Flink/dbt, data pipelines, ETL, stream processing, data quality).
data-pipeline
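[Data Pipeline] ETL pipeline design, Airflow/dbt patterns, data validation, monitoring and alerting. Triggers:
- The user asks to "design a data pipeline" or for an "ETL flow"
- An Airflow DAG needs to be built
- Data transformation and validation
Provides a complete data pipeline design.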
data-engineering-master
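Data engineering: a cognitive operating system for data platform practitioners, covering the full lifecycle of moving data from source systems into a reliable / queryable / trustworthy form for consumption by analytics / ML / data products (generation → ingestion → storage → transformation → serving, plus six undercurrents: security / data management / DataOps / data architecture / orchestration / software engineering; Reis & Housley framework): ingestion and integration (batch + CDC change data capture with Debezium + EL tools Fivetran/Airbyte/Meltano/dlt + Kafka Connect + schema drift) / storage and file and table formats (object-storage data lakes + columnar Parquet/ORC/Arrow/Avro + open table formats Apache Iceberg/Delta Lake/Apache Hudi + lakehouse + partitioning/compaction) / transformation and modeling (ELT with dbt/SQLMesh + Spark + Kimball dimensional modeling + Inmon + Data Vault + one big table (OBT) + slowly changing dimensions (SCD) + incremental models + semantic/metrics layer) / orchestration and workflows (Apache Airflow/Dagster/Prefect/Mage/Kestra/Apache DolphinScheduler + DAG + idempotency + backfill + data-asset scheduling) / batch, streaming, and real-time (Apache Kafka/Apache Flink/Spark Structured Streaming/Kinesis/Pulsar/Redpanda + Lambda vs Kappa + watermark/windowing/exactly-once + streaming SQL Materialize/RisingWave + real-time OLAP ClickHouse/Apache Druid/Apache Pinot/StarRocks/Apache Doris) / data warehouses and query engines (Snowflake/BigQuery/Redshift/Databricks SQL/Trino/Presto/DuckDB/Polars + separation of storage and compute + MPP) / data quality testing and observability (dbt tests/Great Expectations/Soda + data contracts + Monte Carlo data downtime + freshness/volume/schem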
orchestration-patterns
Airflow/Prefect/Dagster DAG design — task dependencies, retries, SLAs, backfill strategies, sensors, and failure recovery. Use this skill whenever the user is building or debugging a scheduled pipeline with multiple steps, asking how to handle task failures, setting up retries or alerts, designing a DAG structure, choosing between orchestrators, or dealing with backfill/reprocessing of historical data. Also trigger when the user mentions Airflow operators, Prefect flows, Dagster assets, task queues, or pipeline scheduling — even if they don't say "orchestration" explicitly. If a pipeline has more than two steps and needs to run on a schedule, this skill should be active.
data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.