mikestangdevs
UserSkills that make an agent treat code like something you'll have to maintain. Straight, opinionated, small.
Categories
Indexed Skills (17)
boyscout
Use when touching an existing file to make a change, so you leave it slightly cleaner than you found it without derailing into a giant refactor. Identifies small, safe, in-scope improvements adjacent to the change you're already making — a better name, a deleted dead line, an extracted helper, a clarified condition — and keeps them separate from the behavior change so review stays clean. Use on every non-trivial edit to an existing file; the discipline is keeping cleanup small and bounded, not skipping it.
delete-this
Use when a codebase feels heavy, when you suspect dead code, or before adding a feature to an area that's already cluttered. Finds and safely removes code that no longer earns its place — dead branches, unused exports, commented-out blocks, abandoned feature flags, speculative abstractions used once, and TODOs from a previous era. Proves code is unused before deleting and stages deletions so each is independently revertible. Use when the answer to 'why is this here?' is silence.
explain-back
Use before trusting, merging, or building on top of code you (or an agent) just wrote or are about to change. The agent explains the code back in plain language — what it does, why it exists, what breaks if it's wrong — and refuses to proceed until the mental model is confirmed. Use when a diff looks fine but nobody actually understands it, when reviewing AI-generated code, when onboarding to an unfamiliar file, or when you catch yourself about to approve something you can't explain.
loud-errors
Use when you spot a swallowed, vague, or context-losing error — an empty catch, a bare `except: pass`, `catch (e) { console.log(e) }` that then continues, `throw new Error("something went wrong")`, an error logged then ignored, or a caught exception that drops its cause. Makes failures loud and specific: each error says what operation failed, with which inputs, and what the caller should do — and stops being silently swallowed. Use when reviewing AI-generated code (agents swallow errors to make the happy path 'work'), when debugging a failure that left no useful trace, or before shipping a path that can fail.
name-things
Use when naming or renaming functions, variables, types, files, or modules — or when a code review keeps stumbling over confusing names. Forces names to reveal intent and role, kills generic names (data, handle, process, manager, util), and aligns vocabulary with the domain. Use when reading code feels harder than it should, when two things have nearly the same name, or before extracting a new function or type that needs a name.
seams
Use when you want to unit-test logic but can't without a database, network, clock, or a pile of mocks — or when a function fetches, computes, branches on policy, and writes all in one breath. Splits the decision (pure logic) from the action (I/O and side effects) along the natural seam, so the logic becomes testable with plain values and no mocks. This is functional-core / imperative-shell at the function level — about testability and purity, not about where files live. The proof is a unit test that needs zero mocks. Use when a small change ripples into unrelated behaviors, or when one function knows too much.
setup-craft-skills
Run once per repository to get the most out of the other craft skills (explain-back, name-things, delete-this, seams, loud-errors, boyscout, why-comments, fit-in). Scaffolds an optional shared CONTEXT.md — domain glossary, language/stack, conventions, and untouchables — that those skills read to match your project's vocabulary and standards. The craft skills work without it; CONTEXT.md just makes their output fit your codebase instead of generic defaults. Use when first installing the craft-skills plugin in a new codebase, or when the project's domain language or conventions have drifted and need re-capturing.
fit-in
Use before writing new code into an existing codebase — a new function, file, test, or feature in a repo that already has its own idioms. Reads the surrounding code first and matches its patterns (error style, test layout, naming casing, import conventions, file structure) instead of importing a generic house style. Use when AI-generated code is correct but 'doesn't look like ours,' when a codebase is becoming a patchwork of dialects, or when starting work in an unfamiliar repo where you'd otherwise default to your own habits.
why-comments
Use when a file is littered with comments that narrate what the code already says (`// increment the counter`), when load-bearing code has no explanation of why it exists, or when reviewing AI-generated code (agents pad output with narration and never record the why). Deletes comments that restate the code, keeps and sharpens the ones that capture a non-obvious why — the workaround, the spec quirk, the ordering constraint — and adds a why where surprising code has none. Use when you hit a line that makes you ask 'why on earth is this here?' and the comment above it just says what it does.
bank-the-suite
Use after completing any unit of work in a repo that has a real test suite — before starting the next task, compacting context, or declaring done. Runs the FULL suite (not just the tests near the change), in the background and parallelized when it's slow, and 'banks' a green state: honest pass/fail counts, pre-existing failures explicitly carved out by name, no moving on while red is unexplained. Use when the agent only ran the two tests it just wrote, when a long suite keeps getting skipped 'for speed', when failures are described vaguely ('a few unrelated tests fail'), or when work is about to stack on top of an unverified base.
checkpoint-handoff
Use before any context boundary — a context compaction, the end of a long session, a handoff to another agent or a future session — on multi-session work. Writes the resume state DOWN, outside the conversation: a working doc with what's done, what's in flight, the numbered remaining tasks in order, the standing constraints (the rules that must survive verbatim), and the files worth re-reading first. The test: a cold session with zero conversational memory could pick up the next task without asking a single question. Use when the user says they're about to compact, when context is getting long mid-project, when pausing a phased effort, or when work will continue 'tomorrow' (it never continues in the same window).
everywhere-else
Use immediately after fixing a bug or improving a pattern in one place — before reporting done. Asks the question agents never ask themselves: 'did you do that anywhere else?' Enumerates the full population of sibling sites (other templates, scenarios, endpoints, components, fixtures, configs that share the same pattern), audits every one for the same defect, and applies the fix across the population — or explicitly reports which siblings were checked and found clean. Use when a fix was just applied to one instance of a repeated structure, when a codebase has N parallel implementations of anything, or when a report says 'fixed' but only one file changed.
fix-it-now
Use the moment a real bug is found during any task — an audit, a review, a test wave, unrelated feature work — and the temptation appears to document it and move on. Defaults to fixing confirmed bugs in the current pass: a found bug is work surfaced, not work deferred. If a fix genuinely can't happen now (out of scope, needs a decision, blocked), the deferral must be explicit and accepted by the user — never a silent TODO buried in a report. Also fires on the half-kept feature: anything kept in the codebase gets finished properly or removed; there is no 'mostly works' tier. Use when an audit produced findings, when a report contains 'known issue', or when a bug was written down instead of fixed.
prove-every-number
Use whenever code contains a numeric constant, threshold, default, rate, capacity, tolerance, or coefficient — especially one the agent just introduced. Enforces the rule: every number is either derived (from real data or first principles, traceably) or cited (to a spec, datasheet, standard, or documented decision) — never invented, never tuned to make a test or demo pass. Real data is the truth; an assumption is the explicitly-labeled backup, used only when the data genuinely doesn't exist. Use when reviewing AI-generated code for magic numbers, when a value was 'picked to look reasonable', when a test was made green by adjusting a constant, or when building anything where the output's credibility depends on its inputs.
readiness-gate
Use at every 'are we ready?' moment — before declaring a task done, shipping, demoing, advancing to the next phase, or answering 'so everything works and we're good?'. Replaces vibes-based confidence ('should work now', 'looks good') with an explicit, evidence-backed go/no-go: what was verified and how, what was NOT verified, what changed, and the honest residual risk. Calibrates certainty to evidence — never asserts '100% sure' beyond what was actually tested, and never truncates the findings to fit a tidy summary. Use when reporting completion of any unit of work, when the user asks for a readiness check against a spec or checklist, or when you catch yourself about to say 'this should work'.
smell-the-numbers
Use whenever code produces an output someone will read — a report, a metric, a chart, a calculation result, a dashboard value — and especially when a result is surprising, suspiciously clean, or just changed a lot. Treats every implausible number as guilty until root-caused: all-zeros, -0, a value at 1858% of its physical limit, a metric that moved 40x from one run to the next, a flagship case that suddenly fails, an output that's perfectly smooth where reality is noisy. The bug is found and explained — never cosmetically hidden, clamped, or filtered out of the display. Use when reviewing simulation/analytics output, after a big refactor changes results, or when something 'works' but the numbers feel wrong.
test-like-you-mean-it
Use after building any feature, model, or fix that the team has to trust — before calling it done. Replaces smoke tests and tests-written-to-pass with an adversarial test wave: tests designed to BREAK the thing (invariants, conservation laws, edge regimes, reference cross-checks, hostile inputs), where findings are expected and get fixed, not explained away. Use when the agent (or you) just wrote tests alongside the code it's testing, when a suite passes suspiciously fast, when coverage exists but confidence doesn't, or when someone says 'tests pass' and you still wouldn't bet on it in production.
Bio shown is the top-scored skill's repo description as a fallback — real GitHub bios land in a future update.