data-qualitylisted
Install: claude install-skill Methasit-Pun/data_engineer_claude_skills
# Data Quality
## Why Data Quality Fails Silently
Pipelines succeed (exit 0) even when they load garbage. A missing WHERE clause, a silent NULL coercion, an upstream schema change — none of these throw exceptions. The result is a dashboard that looks fine until someone cross-references it with reality.
Systematic data quality checks turn silent failures into loud ones. The goal is to catch issues at the pipeline boundary — not after a stakeholder finds them.
---
## Four Dimensions to Check
Every check falls into one of these categories:
| Dimension | What it catches | Examples |
|---|---|---|
| **Completeness** | Missing or null values | `user_id IS NULL`, row count = 0 |
| **Validity** | Values outside expected domain | Negative revenue, future birthdates, unknown status codes |
| **Consistency** | Internal contradictions | `end_date < start_date`, child with no parent |
| **Freshness** | Data is stale or arrived late | Last `updated_at` is 3 days ago when it should be hourly |
---
## dbt Tests — Start Here
dbt's built-in tests cover the most common checks with zero code:
```yaml
# models/staging/schema.yml
models:
- name: stg_orders
columns:
- name: order_id
tests:
- not_null
- unique
- name: status
tests:
- accepted_values:
values: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
- name: customer_id
tests:
- not_null
- relationships: