Install: `claude install-skill RAFCERAY/claude-skills-data-tasks`
# Missing Data Imputation
A completeness-remediation skill. Takes a dataset with missing values and produces a **fitted, pickled scikit-learn imputer** that can be reused on new data, plus a **machine-readable JSON report** documenting every imputation decision. Aligned to the DAMA-DMBOK2 **Completeness** dimension.
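As a rough illustration of the two artifacts described above, the following sketch fits a scikit-learn imputer, serializes it with `pickle`, and builds a JSON report. The column names, the choice of `SimpleImputer` with a median strategy, and the report schema are assumptions made for this example, not the skill's actual output format.

```python
import json
import pickle

import numpy as np
from sklearn.impute import SimpleImputer

# Toy numeric data with gaps (illustrative only).
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])
columns = ["age", "income"]  # hypothetical column names

# Fit once; the fitted object stores the learned fill values.
imputer = SimpleImputer(strategy="median")
imputer.fit(X)

# Serialize the fitted imputer so it can be reapplied to new data.
blob = pickle.dumps(imputer)

# Machine-readable record of each decision (hypothetical schema).
report = {
    "strategy": "median",
    "columns": [
        {
            "name": name,
            "n_missing": int(np.isnan(X[:, i]).sum()),
            "fill_value": float(imputer.statistics_[i]),
        }
        for i, name in enumerate(columns)
    ],
}
report_json = json.dumps(report, indent=2)

# Later: restore the imputer and apply it to unseen rows.
restored = pickle.loads(blob)
X_new = restored.transform(np.array([[np.nan, np.nan]]))
```

In practice the bytes would be written to disk with `pickle.dump` and the report saved alongside them, so the same fill values can be applied consistently to training and inference data.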
## When to use this skill
Activate when the user has a dataset with missing values and wants to **fill them in a principled, reproducible way** — not just a one-off `fillna`. Typical signals:
- "Impute the missing values in this dataset"
- "Fill the NaNs so I can train a model"
- "Use KNN / iterative / MICE imputation on the numeric columns"
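- "Remplis les valeurs manquantes de ce dataset" (French: "Fill in the missing values in this dataset")
- "Gère les données manquantes avant le feature engineering" (French: "Handle the missing data before feature engineering")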
**Pre-conditions:**
- The dataset is loaded and tabular (CSV, Excel, Parquet).
- The user has decided to impute rather than drop. If missingness is very high (more than 50% of values missing in many columns), warn that imputation may inject noise and suggest dropping those columns instead, but leave the final decision to the user.
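
The high-missingness check in the pre-conditions could be sketched as follows. The helper name `high_missingness_columns`, the sample data, and the warning text are hypothetical; only the 50% threshold comes from the text above.

```python
import numpy as np


def high_missingness_columns(X, names, threshold=0.5):
    """Return {name: fraction_missing} for columns above the threshold."""
    fractions = np.isnan(X).mean(axis=0)
    return {n: float(f) for n, f in zip(names, fractions) if f > threshold}


# Toy data: column "b" is missing in 3 of 4 rows.
X = np.array([
    [1.0, np.nan, 7.0],
    [2.0, np.nan, np.nan],
    [np.nan, np.nan, 9.0],
    [4.0, 5.0, 10.0],
])

flagged = high_missingness_columns(X, ["a", "b", "c"])
for name, frac in flagged.items():
    # Warn only; whether to drop the column is left to the user.
    print(f"Warning: column {name!r} is {frac:.0%} missing; consider dropping it.")
```

The function only reports candidates and never drops anything itself, which matches the requirement that the user makes the call.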
**Do NOT activate this skill for:**
- Auditing quality without fixing it → use `data-quality-report`
- General exploration → use `eda-explorer`
- Time-series gap filling that needs forward/backward fill or interpolation along the time axis → use `time-series-features` (lag/rolling logic); this skill is for cross-sectional imputation.
**Position in the pipeline:** this skill typically runs **after** `eda-explorer` / `data-qualit