fine-tuning-with-trl

Solid

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

AI & Automation · 175,435 stars · 29,875 forks · Updated today · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# TRL - Transformer Reinforcement Learning

## Quick start

TRL provides post-training methods for aligning language models with human preferences.

**Installation**:

```bash
pip install trl transformers datasets peft accelerate
```

**Supervised Fine-Tuning** (instruction tuning):

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,  # Prompt-completion pairs
)
trainer.train()
```

**DPO** (align with preferences):

```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(output_dir="model-dpo", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=preference_dataset,  # chosen/rejected pairs
    processing_class=tokenizer
)
trainer.train()
```

## Common workflows

### Workflow 1: Full RLHF pipeline (SFT → Reward Model → PPO)

Complete pipeline from base model to human-aligned model.

Copy this checklist:

```
RLHF Training:
- [ ] Step 1: Supervised fine-tuning (SFT)
- [ ] Step 2: Train reward model
- [ ] Step 3: PPO reinforcement learning
- [ ] Step 4: Evaluate aligned model
```

**Step 1: Supervised fine-tuning**

Train base model on instruction-following data:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

# Load instruction datase...
```
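To make the `beta=0.1` setting in the DPO quick start more concrete, here is a minimal sketch of the standard per-example DPO objective written in plain Python. It is not TRL's implementation: the function name `dpo_loss` and its arguments are hypothetical, and the inputs are assumed to be summed sequence log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative helper, not part of TRL).

    All inputs are assumed to be summed log-probabilities of a full response.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # beta scales how strongly deviations from the reference are rewarded.
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)) = softplus(-logits), in a stable form.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss is log(2), about 0.693.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy shifted toward the chosen response: the loss drops below log(2).
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(baseline, improved)
```

In this formulation a larger `beta` widens the gap between the two cases for the same log-probability shift, while a smaller `beta` makes the objective less sensitive to how far the policy moves from the reference.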

Details

Author: NousResearch
Repository: NousResearch/hermes-agent
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

27,705 Updated today
davila7
AI & Automation Solid

fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

9,182 Updated 1 month ago
Orchestra-Research
AI & Automation Featured

hugging-face-model-trainer

Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export.

39,350 Updated today
sickn33
AI & Automation Listed

hugging-face-model-trainer

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

3 Updated today
tayyabexe
AI & Automation Solid

grpo-rl-training

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

9,182 Updated 1 month ago
Orchestra-Research