mixed-precision

Use FP16/BF16 mixed precision to accelerate training and reduce memory. Use when optimizing GPU performance.
thada2402/AutoResearchClaw · ★ 1 · AI & Automation · score 73

Install: claude install-skill thada2402/AutoResearchClaw

## Mixed Precision Training Best Practice

Use torch.cuda.amp for automatic mixed precision:

- Wrap the forward pass (including the loss computation) in torch.cuda.amp.autocast()
- Use GradScaler for loss scaling when training in FP16; BF16 has the same exponent range as FP32 and generally does not need it
- Prefer BF16 over FP16 on Ampere and newer GPUs (RTX 3xxx, A100, RTX 4xxx)
- Watch for NaN gradients; reduce the learning rate if needed
- Do NOT use amp with custom CUDA kernels unless tested
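
The points above can be put together in a short training loop. What follows is a minimal sketch, not a reference implementation. The toy `nn.Linear` model, the random data, the `train_steps` helper, and the clipping threshold are all assumptions made for illustration. It uses the device-generic `torch.autocast` context manager with `torch.cuda.amp.GradScaler`. Recent PyTorch releases expose the same scaler under `torch.amp` and may emit a deprecation warning for the `torch.cuda.amp` spelling. The sketch falls back to BF16 autocast on CPU so it can run without a GPU.

```python
import torch
import torch.nn as nn


def train_steps(num_steps=5):
    """Run a few mixed-precision steps on a toy model (hypothetical helper)."""
    torch.manual_seed(0)
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"

    # Prefer BF16 where the hardware supports it; otherwise fall back to FP16.
    # CPU autocast is run in BF16 here.
    if use_cuda and not torch.cuda.is_bf16_supported():
        amp_dtype = torch.float16
    else:
        amp_dtype = torch.bfloat16

    model = nn.Linear(16, 1).to(device)  # toy model, illustration only
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    # Loss scaling matters for FP16 only; with enabled=False the scaler
    # passes every call through unchanged.
    scaler = torch.cuda.amp.GradScaler(enabled=(amp_dtype is torch.float16))

    x = torch.randn(64, 16, device=device)
    y = torch.randn(64, 1, device=device)

    losses = []
    for _ in range(num_steps):
        optimizer.zero_grad(set_to_none=True)

        # Forward pass and loss run under autocast; backward does not.
        with torch.autocast(device_type=device, dtype=amp_dtype):
            pred = model(x)
            loss = nn.functional.mse_loss(pred.float(), y)

        scaler.scale(loss).backward()

        # Unscale before inspecting or clipping gradients so the norm is
        # measured in real units rather than scaled ones.
        scaler.unscale_(optimizer)
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        if not torch.isfinite(grad_norm):
            # With FP16, scaler.step() skips this update and update() lowers
            # the scale. Persistent NaNs suggest lowering the learning rate.
            print("non-finite gradient norm detected")

        scaler.step(optimizer)
        scaler.update()
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    print(train_steps())
```

Calling `scaler.unscale_(optimizer)` before clipping is a deliberate choice: without it, the clipping threshold would be compared against gradients still multiplied by the loss scale.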