schema-designlisted
Install: claude install-skill Methasit-Pun/data_engineer_claude_skills
# Schema Design for Analytical Workloads
## Start with the Grain
The single most important decision in schema design is the **grain**: what does one row represent? Be explicit and precise.
Bad grain definition: "orders"
Good grain definition: "one row per order line item, at the time of fulfillment"
Every column in a fact table must be true at that grain. If you try to mix grains in one table (order-level and line-item-level facts), you'll produce incorrect aggregations and confuse every analyst who touches it.
---
## Star Schema — Default Choice for Analytics
A star schema has one central fact table surrounded by dimension tables. It's optimized for BI tool query patterns — simple JOINs, fast aggregations.
```
fct_orders
├── order_id (PK)
├── customer_key (FK → dim_customers)
├── product_key (FK → dim_products)
├── date_key (FK → dim_date)
├── quantity
└── revenue_usd
dim_customers
├── customer_key (surrogate PK)
├── customer_id (natural/source key)
├── name
├── country
└── segment
dim_date
├── date_key
├── full_date
├── year, month, week, day_of_week
└── is_holiday
```
**Rules:**
- Fact tables contain measures (numbers you aggregate) and foreign keys to dimensions
- Dimension tables contain descriptive attributes (text, categories, dates)
- Dimension tables are denormalized — repeat values rather than normalizing them out
---
## Slowly Changing Dimensions (SCD)
What happens when a customer changes their country, or a product