generate-codebook

Generate a citable data dictionary / codebook from a tabular dataset (CSV/TSV/Excel/Parquet/Stata/SAS). Profiles every variable — role, type, units placeholder, level frequencies, range/quantiles, missingness — and emits codebook.md + codebook.json. Flags coded variables whose level meanings are unknown as [NEEDS DICTIONARY] rather than guessing them, feeding /define-variables and the dictionary-first workflow.
Aperivue/medsci-skills · ★ 126 · Data & Documents · score 82
Install: claude install-skill Aperivue/medsci-skills
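
As a rough illustration of the profiling described in the summary above, the following is a minimal standard-library sketch, not the skill's bundled profiler. The function names, the JSON field names, the missing-value tokens, and the `max_coded_levels` threshold are all hypothetical; the actual schema lives in the skill's `codebook_schema.md`. It reports what is observable (type, missingness, range, level frequencies) and marks numeric codes as `[NEEDS DICTIONARY]` instead of guessing their meaning.

```python
import csv
import io
import json
import statistics
from collections import Counter

NEEDS_DICTIONARY = "[NEEDS DICTIONARY]"
MISSING_TOKENS = {"", "NA", "NaN", "null"}  # assumed missing-value markers


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_column(name, values, max_coded_levels=10):
    """Profile one column from its raw string values (hypothetical schema)."""
    present = [v.strip() for v in values if v.strip() not in MISSING_TOKENS]
    entry = {
        "name": name,
        "n": len(values),
        "n_missing": len(values) - len(present),
        "units": None,  # placeholder: units are not observable from the data
    }
    if not present:
        entry["type"] = "empty"
        return entry
    numeric = all(_is_number(v) for v in present)
    counts = Counter(present)
    if numeric and len(counts) > max_coded_levels:
        # Many distinct numeric values: treat as a continuous measurement.
        nums = [float(v) for v in present]
        entry["type"] = "numeric"
        entry["range"] = [min(nums), max(nums)]
        entry["median"] = statistics.median(nums)
    else:
        # Few distinct values: report frequencies, never invent meanings.
        entry["type"] = "coded" if numeric else "categorical"
        entry["levels"] = {}
        for level, count in sorted(counts.items()):
            info = {"count": count}
            if numeric:
                info["label"] = NEEDS_DICTIONARY
            entry["levels"][level] = info
    return entry


def build_codebook(csv_text, max_coded_levels=10):
    """Profile every column of a CSV document given as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return [
        profile_column(c, [(r[c] or "") for r in rows], max_coded_levels)
        for c in columns
    ]


# Made-up sample data, for illustration only.
sample = (
    "age,fatty_liver_grade,sex\n"
    "54,0,F\n61,2,M\n,1,F\n47,0,M\n70,1,F\n38,0,M\n"
)
codebook = build_codebook(sample, max_coded_levels=3)
print(json.dumps(codebook, indent=2))
```

The distinct-value threshold is a crude stand-in for role inference: on a small sample it can mistake a continuous variable for a coded one, which is one reason a human still reviews the output against the authoritative dictionary.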
# Generate Codebook Skill

You help a medical researcher turn a raw tabular dataset into a structured, **citable** data dictionary (codebook). This is the *generator* side of the dictionary-first workflow: it produces the artifact that `/define-variables` and dictionary-first QC later consume. You generate code and review output; you do **not** invent the meaning of coded values.

## Communication Rules

- Communicate with the user in their preferred language.
- Variable names, codebook fields, and report output are in English.
- Medical terminology is always in English.

## Philosophy

A codebook describes *what is in the data*, not *what the codes mean*. Column distributions, types, and missingness are observable and safe to profile. The **meaning** of a coded value (`fatty_liver_grade = 0`) is NOT observable from the data; it lives in the authoritative data dictionary. This skill profiles the former deterministically and explicitly flags the latter as `[NEEDS DICTIONARY]` so a human fills it from the source. This is the generator counterpart to the dictionary-first rule that `/define-variables` enforces on consumption.

## Reference Files

- **Schema + role rules**: `${CLAUDE_SKILL_DIR}/references/codebook_schema.md`, which covers the codebook.json schema, the role-inference heuristics, and how the output threads into `/define-variables` and dictionary-first QC. Read this before interpreting output.

## Deterministic Script

Run the bundled profiler rather than describing column