fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use it when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.

AI & Automation · 27,705 stars · 2,858 forks · Updated today · MIT

Skill Content

# TRL - Transformer Reinforcement Learning

## Quick start

TRL provides post-training methods for aligning language models with human preferences.

**Installation**:

```bash
pip install trl transformers datasets peft accelerate
```

**Supervised Fine-Tuning** (instruction tuning):

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,  # Prompt-completion pairs
)
trainer.train()
```

**DPO** (align with preferences):

```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(output_dir="model-dpo", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=preference_dataset,  # chosen/rejected pairs
    processing_class=tokenizer
)
trainer.train()
```

## Common workflows

### Workflow 1: Full RLHF pipeline (SFT → Reward Model → PPO)

Complete pipeline from base model to human-aligned model.

Copy this checklist:

```
RLHF Training:
- [ ] Step 1: Supervised fine-tuning (SFT)
- [ ] Step 2: Train reward model
- [ ] Step 3: PPO reinforcement learning
- [ ] Step 4: Evaluate aligned model
```

**Step 1: Supervised fine-tuning**

Train base model on instruction-following data:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

# Load instruction datase...
```
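The quick-start snippets refer to "prompt-completion pairs" and "chosen/rejected pairs" without showing what a record looks like. The sketch below illustrates the two shapes, assuming TRL's standard column names (`prompt`/`completion` for SFT, `prompt`/`chosen`/`rejected` for preference data). The example records and the `missing_columns` helper are hypothetical and not part of TRL; the helper only checks column names before a dataset is handed to a trainer.

```python
# Hypothetical example records; the text content is made up for illustration.
sft_records = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
]
preference_records = [
    {"prompt": "What is 2 + 2?", "chosen": "2 + 2 = 4.", "rejected": "2 + 2 = 5."},
]

# Column names assumed from TRL's standard (non-conversational) dataset formats.
SFT_COLUMNS = {"prompt", "completion"}
PREFERENCE_COLUMNS = {"prompt", "chosen", "rejected"}


def missing_columns(records, required):
    """Return the sorted required columns that are absent from at least one record."""
    missing = set()
    for record in records:
        missing |= required - set(record)
    return sorted(missing)


if __name__ == "__main__":
    print(missing_columns(sft_records, SFT_COLUMNS))
    print(missing_columns(preference_records, PREFERENCE_COLUMNS))
```

Lists like these can be turned into a dataset object with `datasets.Dataset.from_list(...)` and then passed as `train_dataset`. Whether a given trainer version accepts other layouts (for example conversational `messages` records) should be checked against the TRL documentation.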

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic · AI
Hugging Face · AI
