rtl-area-timing
Install: claude install-skill Midstall/claude-for-hardware
# RTL Area and Timing Optimization
## Overview
Making RTL smaller or faster is a sequence of structural decisions, each justified by a measurement. The wins are rarely where intuition points: the giant is often a structure you didn't think of (a "ROM" that is really 94k flops), and the critical path is usually one specific primitive, not "logic depth" in general.
**Core principle:** Diagnose with data, change one structure, re-measure. Optimize the actual critical path or the actual giant, and stop the moment it stops being the bottleneck. Guessing wastes builds and can leave you with a worse placement than you started with.
## When to Use
- A design won't fit, or misses its timing constraint
- A wide multiply, barrel shifter, or big mux is suspected of dominating
- A microcoded or "compute all handlers and select" datapath is too large
- You're about to "optimize" something without having read the reports
This is the RTL-technique companion to `fpga-synthesis-fit` (the tool methodology for measuring). Measure there, transform here.
## Pipeline A Wide Multiply Internally
A single-cycle NxN multiply (for example, 64x64) maps to DSP tiles plus a long partial-product carry chain, and that chain is usually the critical path.
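Arithmetically, a 64x64 unsigned product is an exact sum of four shifted 32x32 products, and any internal pipelining of the multiplier relies on that identity. The Python sketch below is a behavioral model only, not RTL. The names `stage1`, `stage2`, and `mul64_two_stage` are hypothetical, and the two functions stand in for the logic on either side of an assumed register boundary.

```python
MASK32 = (1 << 32) - 1

def stage1(a, b):
    """Model of a first pipeline stage: four independent 32x32 products.

    Assumes a and b are unsigned 64-bit values.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Each partial product fits in 64 bits; in hardware these four values
    # would be captured in registers at the end of the stage.
    return (a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi)

def stage2(partials):
    """Model of a second stage: shift the registered partials and sum them."""
    ll, lh, hl, hh = partials
    # The cross terms carry weight 2**32, the high term weight 2**64.
    return ll + ((lh + hl) << 32) + (hh << 64)

def mul64_two_stage(a, b):
    """Full 128-bit product computed through the two modeled stages."""
    return stage2(stage1(a, b))
```

This checks only the arithmetic. Where the registers actually land, and whether the tool maps each 32x32 product onto DSP tiles, depends on the target and has to be confirmed in the synthesis and timing reports.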
Registering only the multiply's OUTPUT does not break the internal carry chain; the operands-to-output path is still essentially the whole multiply. You must pipeline INTERNALLY: decompose into smaller products (four 32x32), register the partial products, then sum the shifted partials in a second register