data-lineage-mapper

Solid

Extracts and maps data lineage from various sources including SQL, dbt, Airflow, and Spark, generating comprehensive lineage graphs for impact analysis.

Data & Documents · 1,160 stars · 71 forks · Updated today · MIT

Quality Score: 99/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Data Lineage Mapper

Extracts and maps data lineage from various sources to provide comprehensive data flow visibility.

## Overview

This skill parses and extracts data lineage information from SQL queries, dbt projects, Airflow DAGs, and Spark jobs. It generates comprehensive lineage graphs showing data flow from source to destination, enabling impact analysis and data governance.

## Capabilities

- **SQL parsing for lineage extraction** - Parse SELECT, INSERT, MERGE statements
- **dbt lineage integration** - Extract lineage from manifest.json
- **Airflow task lineage mapping** - Map data flows across DAG tasks
- **Spark job lineage extraction** - Parse Spark SQL and DataFrame operations
- **Cross-system lineage connection** - Connect lineage across different tools
- **Column-level lineage tracing** - Track individual column transformations
- **Impact analysis** - Downstream/upstream impact assessment
- **Lineage graph generation** - Visual and machine-readable lineage
- **Integration with data catalogs** - Export to DataHub, Amundsen, Alation

## Input Schema

```json
{
  "sources": {
    "type": "array",
    "required": true,
    "items": {
      "type": { "type": "string", "enum": ["sql", "dbt", "airflow", "spark", "file"] },
      "content": { "type": "string|object", "description": "SQL string, file path, or manifest object" },
      "metadata": {
        "type": "object",
        "properties": {
          "database": "str...
```
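The dbt and impact-analysis capabilities listed above can be illustrated with a minimal sketch. It is not the skill's actual implementation. It assumes a manifest dictionary shaped like dbt's `manifest.json`, where each entry under `nodes` lists its parents in `depends_on.nodes`. The function names `edges_from_manifest` and `reachable`, and the sample node ids, are hypothetical.

```python
from collections import defaultdict, deque


def edges_from_manifest(manifest: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs from a dbt-style manifest dict."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        # Each node names the nodes it reads from under depends_on.nodes.
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, node_id))
    return edges


def reachable(edges, start: str, direction: str = "downstream") -> set[str]:
    """Breadth-first walk of the lineage graph, excluding `start` itself."""
    adjacency = defaultdict(set)
    for upstream, downstream in edges:
        if direction == "downstream":
            adjacency[upstream].add(downstream)
        else:  # "upstream": follow edges in reverse
            adjacency[downstream].add(upstream)

    seen: set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


# Hypothetical three-model project reading from one source table.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw.orders"]}},
        "model.shop.orders": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
        "model.shop.revenue": {"depends_on": {"nodes": ["model.shop.orders"]}},
    }
}

edges = edges_from_manifest(manifest)
downstream = reachable(edges, "model.shop.stg_orders", "downstream")
upstream = reachable(edges, "model.shop.orders", "upstream")
print(sorted(downstream))
print(sorted(upstream))
```

Keeping lineage as a flat list of (upstream, downstream) edges means edges derived from other sources, such as SQL or Airflow, could be appended to the same list before running the traversal, which is one plausible way to connect lineage across systems.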

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
