verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

AI & Automation · 27,984 stars · 2,901 forks · Updated today · MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# verl: Volcano Engine Reinforcement Learning for LLMs

verl is a flexible, efficient, and production-ready RL training library for large language models from ByteDance's Seed team. It implements the HybridFlow framework (EuroSys 2025) and powers models like Doubao-1.5-pro achieving O1-level performance on math benchmarks.

## When to Use verl

**Choose verl when you need:**

- Production-ready RL training at scale (tested up to 671B parameters)
- Flexibility to swap backends (FSDP ↔ Megatron-LM ↔ vLLM ↔ SGLang)
- Support for multiple RL algorithms (PPO, GRPO, RLOO, REINFORCE++, DAPO)
- Multi-turn rollout with tool calling for agentic workflows
- Vision-language model RL training

**Consider alternatives when:**

- You need Megatron-native training → use **slime** or **miles**
- You want PyTorch-native abstractions with Monarch → use **torchforge**
- You only need simple SFT/DPO → use **TRL** or **Axolotl**

## Key Features

- **Training backends**: FSDP, FSDP2, Megatron-LM
- **Rollout engines**: vLLM, SGLang, HuggingFace Transformers
- **Algorithms**: PPO, GRPO, DAPO, RLOO, ReMax, REINFORCE++, SPIN, SPPO
- **Models**: Qwen-3, Llama-3.1, DeepSeek, Gemma-2 (0.5B to 671B)
- **Advanced**: LoRA RL, sequence parallelism, expert parallelism, multi-turn tools

## Installation

```bash
# Option 1: pip install
pip install verl[vllm]  # or verl[sglang] for SGLang backend

# Option 2: Docker (recommended for production)
docker pull verlai/verl:vllm011.latest

# Option 3: From source
git c...
```
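The algorithm list above includes GRPO. As a rough illustration of its central idea, the sketch below normalizes rewards within a group of responses sampled for the same prompt, so that each response is scored relative to its siblings rather than by a learned critic. It is plain Python and does not use verl's API. The function name `group_relative_advantages`, the `eps` constant, and the choice of population standard deviation are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Return one advantage per response: (reward - group mean) / (group std + eps).

    `rewards` holds the scalar rewards of all responses sampled for ONE prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled responses with binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive positive advantages and the rest receive negative ones. A group whose rewards are all identical yields advantages of zero and therefore contributes no learning signal.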

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT
