Install: `claude install-skill Tibsfox/gsd-skill-creator`
# Data Wrangling
Data wrangling is the work that sits between raw data and analysis: the unglamorous, indispensable practice of making data trustworthy. Estimates vary, but practitioners consistently report spending 60-80% of analysis time on wrangling. This skill covers the principles and techniques of data cleaning, transformation, reshaping, and integration. It is grounded in Hadley Wickham's tidy data framework and extended to handle messy real-world datasets.
**Agent affinity:** tukey (EDA-driven cleaning), nightingale (routing wrangling tasks)
**Concept IDs:** data-data-sources, data-data-quality, data-sampling-bias
## The Wrangling Pipeline
| Stage | Goal | Key operations |
|---|---|---|
| 1. Ingestion | Get data into a working environment | Read CSV/JSON/Parquet/SQL, handle encodings, parse dates |
| 2. Profiling | Understand what you have | Shape, dtypes, nulls, distributions, cardinality |
| 3. Cleaning | Fix structural problems | Dedup, type coercion, standardize categories, fix encodings |
| 4. Missing data | Handle gaps | Detect patterns (MCAR/MAR/MNAR), impute or flag |
| 5. Transformation | Derive analysis-ready features | Normalize, bin, log-transform, create indicators |
| 6. Reshaping | Match the analysis structure | Melt, pivot, tidy form, denormalize |
| 7. Integration | Combine sources | Joins (inner/left/right/full/cross), concatenation, dedup post-join |
| 8. Validation | Confirm readiness | Schema checks, assertion tests, row-count reconci