data-wrangling

Data cleaning, transformation, reshaping, joins, missing data handling, and tidy data principles. Covers the full pipeline from raw ingestion to analysis-ready datasets -- type coercion, deduplication, outlier detection, normalization, melting/pivoting, regex extraction, and reproducible transformation chains. Use when preparing, cleaning, or transforming data for analysis.
Tibsfox/gsd-skill-creator · ★ 61 · Data & Documents · score 80

Install: claude install-skill Tibsfox/gsd-skill-creator

# Data Wrangling

Data wrangling is the work that sits between raw data and analysis -- the unglamorous, indispensable practice of making data trustworthy. Estimates vary, but practitioners consistently report that 60-80% of analysis time is spent wrangling. This skill covers the principles and techniques of data cleaning, transformation, reshaping, and integration, grounded in Hadley Wickham's tidy data framework and extended to the realities of messy real-world datasets.

**Agent affinity:** tukey (EDA-driven cleaning), nightingale (routing wrangling tasks)

**Concept IDs:** data-data-sources, data-data-quality, data-sampling-bias

## The Wrangling Pipeline

| Stage | Goal | Key operations |
|---|---|---|
| 1. Ingestion | Get data into a working environment | Read CSV/JSON/Parquet/SQL, handle encodings, parse dates |
| 2. Profiling | Understand what you have | Shape, dtypes, nulls, distributions, cardinality |
| 3. Cleaning | Fix structural problems | Dedup, type coercion, standardize categories, fix encodings |
| 4. Missing data | Handle gaps | Detect patterns (MCAR/MAR/MNAR), impute or flag |
| 5. Transformation | Derive analysis-ready features | Normalize, bin, log-transform, create indicators |
| 6. Reshaping | Match the analysis structure | Melt, pivot, tidy form, denormalize |
| 7. Integration | Combine sources | Joins (inner/left/right/full/cross), concatenation, dedup post-join |
| 8. Validation | Confirm readiness | Schema checks, assertion tests, row-count reconci