data-quality

Write systematic data quality checks — validation rules, Great Expectations suites, dbt tests, anomaly detection, null/type/range/referential integrity assertions, and monitoring patterns for production pipelines. Use this skill whenever the user is dealing with bad data in a pipeline, setting up validation before or after a load step, adding tests to dbt models, writing Great Expectations expectations, or trying to detect when upstream data has changed shape. Also trigger when stakeholders keep finding incorrect numbers, when a pipeline silently loads garbage, or when the user asks "how do I make sure my data is correct". Prevention is cheaper than debugging.
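The null, type, range, and referential-integrity assertions named in this description can be sketched in plain Python before adopting dbt or Great Expectations. This is a minimal sketch that assumes rows arrive as a list of dicts. The row fields, `KNOWN_CUSTOMERS`, `check_orders`, and `assert_orders_valid` are hypothetical names for illustration, not part of either tool's API. The accepted status values match the ones used for `stg_orders` in the dbt example later in this skill.

```python
# Hypothetical sample rows: field names and values are illustrative only.
ORDERS = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0, "status": "shipped"},
    {"order_id": 2, "customer_id": None, "amount": 40.0, "status": "pending"},
    {"order_id": 3, "customer_id": 11, "amount": -5.0, "status": "lost"},
    {"order_id": 4, "customer_id": 99, "amount": "12", "status": "delivered"},
]
# Hypothetical parent-table keys used for the referential-integrity check.
KNOWN_CUSTOMERS = {10, 11}
# Same status domain as the dbt accepted_values test for stg_orders.
VALID_STATUSES = {"pending", "processing", "shipped", "delivered", "cancelled"}


def check_orders(rows, known_customers):
    """Return (order_id, problem) tuples; an empty list means every row passed."""
    failures = []
    for row in rows:
        oid = row["order_id"]
        customer = row["customer_id"]
        amount = row["amount"]
        # Null check: every order must reference a customer.
        if customer is None:
            failures.append((oid, "customer_id is null"))
        # Referential integrity: a non-null customer_id must exist in the parent set.
        elif customer not in known_customers:
            failures.append((oid, "customer_id has no matching customer"))
        # Type check: amount must be numeric (bool is rejected because it subclasses int).
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            failures.append((oid, "amount is not numeric"))
        # Range check: only run once the type is known to be numeric.
        elif amount < 0:
            failures.append((oid, "amount is negative"))
        # Domain check: status must be one of the accepted values.
        if row["status"] not in VALID_STATUSES:
            failures.append((oid, "status is not an accepted value"))
    return failures


def assert_orders_valid(rows, known_customers):
    """Fail loudly: raise instead of letting bad rows load silently."""
    failures = check_orders(rows, known_customers)
    if failures:
        raise ValueError(f"{len(failures)} data quality failure(s): {failures}")


if __name__ == "__main__":
    for order_id, problem in check_orders(ORDERS, KNOWN_CUSTOMERS):
        print(order_id, problem)
```

Calling `assert_orders_valid` as the last step before a load makes the job exit non-zero on bad data instead of succeeding silently. dbt tests and Great Expectations suites express the same kinds of assertions declaratively, so a hand-rolled check like this is mainly useful where neither tool is in place yet.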
Methasit-Pun/data_engineer_claude_skills · Data & Documents
Install: claude install-skill Methasit-Pun/data_engineer_claude_skills
# Data Quality

## Why Data Quality Fails Silently

Pipelines succeed (exit 0) even when they load garbage. A missing WHERE clause, a silent NULL coercion, an upstream schema change: none of these throw exceptions. The result is a dashboard that looks fine until someone cross-references it with reality.

Systematic data quality checks turn silent failures into loud ones. The goal is to catch issues at the pipeline boundary, not after a stakeholder finds them.

---

## Four Dimensions to Check

Every check falls into one of these categories:

| Dimension | What it catches | Examples |
|---|---|---|
| **Completeness** | Missing or null values | `user_id IS NULL`, row count = 0 |
| **Validity** | Values outside the expected domain | Negative revenue, future birthdates, unknown status codes |
| **Consistency** | Internal contradictions | `end_date < start_date`, child with no parent |
| **Freshness** | Stale or late-arriving data | Last `updated_at` is 3 days old when it should be hourly |

---

## dbt Tests: Start Here

dbt's built-in tests cover the most common checks with zero code:

```yaml
# models/staging/schema.yml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
      - name: customer_id
        tests:
          - not_null
          - relationships: