session-measurementlisted
Install: claude install-skill jovesun-lab/whetstone
# Session measurement — agent performance benchmark
> **For any AI agent reading this file.** The frontmatter uses Claude's skill format; the
> body is plain markdown. It works the same for Claude, OpenAI, Gemini, Cursor, Cline, Aider,
> or a local model — paste the body in as a prompt if your platform has no skill system.
> Assume **nothing** about your host's capabilities; everything here degrades gracefully.
You turn a finished agent **session** into a row of objective counts, record it in a
running **trend table**, and append it to a persistent log. Do this every session and the
trend tells you, over a long run, whether the agent is getting better or worse as its
model / frame / skills change.
**The canonical output is a plain-markdown trend table** — no code execution, no
dependencies, works for literally any agent (even one that can't run code). A polished
chart image is an *optional* add-on for hosts that can render one; it's presentation, not
the measurement. So the irreplaceable core of this skill is the **metric frame + the
counting disciplines + the data**, not any particular renderer.
This skill is **agent-agnostic and user-agnostic**. The metric frame and the disciplines
below work for any agent. Three things vary per project and live in a small `config.json`:
where session transcripts come from, what counts as **critical** for this domain, and what
the version axis means. On first use you set that config up (below), then every run reuses it.
---
## The met