nlp-pretraininglisted

Best practices for language model pretraining and fine-tuning. Use when generating or reviewing NLP training code.
thada2402/AutoResearchClaw · ★ 1 · AI & Automation · score 75

Install: claude install-skill thada2402/AutoResearchClaw

## NLP Pretraining/Fine-tuning Best Practice Fine-tuning recipe: - Use pre-trained checkpoints (HuggingFace hub) - AdamW optimizer, lr=2e-5 to 5e-5 - Linear warmup (6% of total steps) + linear decay - Batch size: 16-32 (use gradient accumulation for larger effective batch) - 3-5 epochs for classification, 1-2 for generation - Weight decay: 0.01 Parameter-efficient methods: - LoRA: r=8-64, alpha=16-128, apply to q/v projections - Prefix tuning: 10-20 prefix tokens - Adapters: bottleneck dimension 64-256 Evaluation: - Classification: accuracy, F1 (macro for imbalanced) - Generation: perplexity, BLEU/ROUGE, human evaluation - Use multiple seeds and report mean +/- std