data-quality-audit

Runs a structured audit on a table covering nulls, duplicates, freshness, schema drift, primary key uniqueness, value distributions, and referential integrity. Use when the user mentions data quality, DQ check, audit this table, "can I trust this data", null check, dedup, freshness, or before relying on a new source.
vermapragya/analytics-skill · ★ 0 · Data & Documents · score 72
Install: claude install-skill vermapragya/analytics-skill
# Data Quality Audit

## When to use this skill

Use **before** trusting data for analysis, modeling, or reporting. Triggers:

- "Audit this table"
- "Check the data quality of…"
- "Can I trust this data?"
- "Is fct_orders fresh?"
- "Are there duplicates in…"
- Before kicking off any modeling skill (`logistic-regression`, `survival-analysis`, etc.)

## Required inputs

| Input | Why it matters |
|---|---|
| Table name | What to audit |
| Stated primary key | What grain rows should be at |
| Expected freshness | How recent data should be |
| Critical columns | Columns where null/garbage breaks downstream |
| Foreign key columns | For referential integrity |

## Workflow

Run these in order. Stop and report if any check FAILS critically.

### 1. Row count and time range

```sql
select
  count(*) as row_count,
  min(<time_col>) as earliest,
  max(<time_col>) as latest,
  datediff('hour', max(<time_col>), current_timestamp) as hours_since_latest
from <table>;
```

**Expected:** non-zero rows, latest timestamp within freshness SLA.

**Fail if:** row count = 0, or `hours_since_latest` > freshness SLA × 1.5.

### 2. Primary key uniqueness

```sql
select
  <pk_cols>,
  count(*) as dupes
from <table>
group by <pk_cols>
having count(*) > 1
limit 100;
```

**Expected:** zero rows.

**Fail if:** any duplicates. Investigate before proceeding.

### 3. Null check on critical columns

```sql
select
  sum(case when <col_1> is null then 1 else 0 end) as nulls_col_1,
  sum(case when <col