data-engineering-master
Solid数据工程 — 数据平台从业者的认知操作系统, 覆盖把数据从源系统搬运成可靠 / 可查询 / 可信赖形态供分析 / ML / 数据产品消费的全生命周期 (生成 → 摄取 → 存储 → 转换 → 服务 + 安全/数据管理/DataOps/数据架构/编排/软件工程 六条暗流, Reis & Housley 框架): 摄取与集成 (批 + CDC 变更数据捕获 Debezium + EL 工具 Fivetran/Airbyte/Meltano/dlt + Kafka Connect + schema drift) / 存储与文件表格式 (对象存储数据湖 + 列存 Parquet/ORC/Arrow/Avro + 开放表格式 Apache Iceberg/Delta Lake/Apache Hudi + lakehouse + 分区/compaction) / 转换与建模 (ELT dbt/SQLMesh + Spark + 维度建模 Kimball + Inmon + Data Vault + 大宽表 OBT + 渐变维 SCD + 增量模型 + 语义/指标层) / 编排与工作流 (Apache Airflow/Dagster/Prefect/Mage/Kestra/Apache DolphinScheduler + DAG + 幂等 + 回填 backfill + 数据资产调度) / 批流与实时 (Apache Kafka/Apache Flink/Spark Structured Streaming/Kinesis/Pulsar/Redpanda + Lambda vs Kappa + watermark/窗口/exactly-once + 流式 SQL Materialize/RisingWave + 实时 OLAP ClickHouse/Apache Druid/Apache Pinot/StarRocks/Apache Doris) / 数仓与查询引擎 (Snowflake/BigQuery/Redshift/Databricks SQL/Trino/Presto/DuckDB/Polars + 存算分离 + MPP) / 数据质量测试与可观测性 (dbt tests/Great Expectations/Soda + 数据契约 + Monte Carlo data downtime + 新鲜度/量/schem
Install
Quality Score: 89/100
Skill Content
Details
- Author
- swaylq
- Repository
- swaylq/master-skill
- Created
- 2 weeks ago
- Last Updated
- 4 days ago
- Language
- Shell
- License
- MIT
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
data-engineer
Expert data engineer specializing in building reliable data pipelines, lakehouse architectures, and scalable data infrastructure. Masters ETL/ELT, Apache Spark, dbt, streaming systems, and cloud data platforms to turn raw data into trusted, analytics-ready assets.
data-engineer
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.
data-engineer
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.
data-engineer
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms. Use PROACTIVELY for data pipeline design, analytics infrastructure, or modern data stack implementation.
data-engineering
数据工程。Airflow、Dagster、Kafka Streams、Flink、dbt、数据管道、流处理、数据质量。当用户提到数据管道、ETL、流处理、数据质量时路由到此。