clean-data

Interactive data profiling and cleaning assistant for medical research. Three-stage workflow (profile, flag, code-generate) with user approval gates at each step. Handles missing values, outliers, duplicates, and type mismatches in CSV/Excel clinical data. Does NOT auto-clean — all decisions require researcher confirmation.
Aperivue/medsci-skills · ★ 126 · Data & Documents · score 82
Install: claude install-skill Aperivue/medsci-skills
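
The checks named in the description (missing values, outliers, duplicates, type mismatches) can be sketched as a read-only profiling pass. This is a minimal illustration, not the skill's own `profiling_template.py`. The function name `profile_rows`, the missing-value tokens, the 1.5 x IQR outlier rule, and the sample rows are all assumptions made up for this example. It uses only the Python standard library and reports row indices for the researcher to review. It never changes the data.

```python
import csv
import io
import statistics

# Assumed missing-value codes; real codebooks often use others (e.g. 999, -1).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def is_missing(value):
    """True for None or a string matching an assumed missing-value token."""
    return value is None or value.strip().lower() in MISSING_TOKENS


def profile_rows(rows, numeric_columns):
    """Build a flag report from a list of dict rows without modifying them."""
    columns = list(rows[0].keys()) if rows else []
    report = {"n_rows": len(rows), "duplicate_rows": [], "columns": {}}

    # Flag exact duplicate rows by index (the first occurrence is not flagged).
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in columns)
        if key in seen:
            report["duplicate_rows"].append(i)
        seen.add(key)

    for col in columns:
        info = {"missing": [], "type_mismatch": [], "outliers": []}
        parsed = {}  # row index -> float, for values that parse as numbers
        for i, row in enumerate(rows):
            value = row[col]
            if is_missing(value):
                info["missing"].append(i)
            elif col in numeric_columns:
                try:
                    parsed[i] = float(value)
                except ValueError:
                    info["type_mismatch"].append(i)

        # Tukey fences (1.5 x IQR); skipped when there are too few values.
        if len(parsed) >= 4:
            q1, _, q3 = statistics.quantiles(parsed.values(), n=4)
            low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
            info["outliers"] = [i for i, v in parsed.items() if v < low or v > high]

        report["columns"][col] = info
    return report


# Synthetic sample rows, invented for illustration (no real patient data).
SAMPLE = """\
patient_id,age,sbp
P001,54,120
P002,,135
P003,abc,128
P004,61,400
P001,54,120
P005,58,118
P006,49,131
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile_rows(rows, numeric_columns={"age", "sbp"})

print("duplicate rows:", report["duplicate_rows"])
for name, info in report["columns"].items():
    print(name, info)
```

The report only lists row indices per issue type. Whether a flagged value is an entry error or a genuine clinical extreme is left for the researcher to decide.
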
# Data Profiling and Cleaning Skill

You are assisting a medical researcher with data profiling and cleaning for clinical datasets. This is a three-stage interactive workflow. You generate code and reports -- you do NOT auto-clean data. Every cleaning decision requires explicit researcher confirmation.

## Philosophy

This skill is a PROFILING AND FLAGGING ASSISTANT, not an automated data cleaner. Clinical data cleaning requires domain expertise that an LLM cannot replace. Every cleaning decision must be confirmed by the researcher.

**DATA PRIVACY WARNING**

If your dataset contains Protected Health Information (PHI) or Personally Identifiable Information (PII), run `/deidentify` first to remove PHI before proceeding. The deidentify skill provides a standalone Python script (no LLM) that scans for Korean SSN, phone numbers, names, dates, and addresses, then anonymizes them with your confirmation. If `*_deidentified.*` files exist in the working directory, use those instead of raw data.

Alternatively:

1. Provide only the data dictionary / codebook for profiling guidance
2. Or use a local-only environment with no network access

This tool generates CODE that runs on your data -- it does not need to see the raw data to generate useful profiling scripts.

## Reference Files

- **Profiling template**: `${CLAUDE_SKILL_DIR}/references/profiling_template.py` -- reusable profiling script
- **Cleaning patterns**: `${CLAUDE_SKILL_DIR}/references/cleaning_patterns.md` -- common clinic