verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

AI & Automation · 27,984 stars · 2,901 forks · Updated today · MIT


Skill Content

# verl: Volcano Engine Reinforcement Learning for LLMs

verl is a flexible, efficient, and production-ready RL training library for large language models from ByteDance's Seed team. It implements the HybridFlow framework (EuroSys 2025) and powers models like Doubao-1.5-pro achieving O1-level performance on math benchmarks.

## When to Use verl

**Choose verl when you need:**

- Production-ready RL training at scale (tested up to 671B parameters)
- Flexibility to swap backends (FSDP ↔ Megatron-LM ↔ vLLM ↔ SGLang)
- Support for multiple RL algorithms (PPO, GRPO, RLOO, REINFORCE++, DAPO)
- Multi-turn rollout with tool calling for agentic workflows
- Vision-language model RL training

**Consider alternatives when:**

- You need Megatron-native training → use **slime** or **miles**
- You want PyTorch-native abstractions with Monarch → use **torchforge**
- You only need simple SFT/DPO → use **TRL** or **Axolotl**

## Key Features

- **Training backends**: FSDP, FSDP2, Megatron-LM
- **Rollout engines**: vLLM, SGLang, HuggingFace Transformers
- **Algorithms**: PPO, GRPO, DAPO, RLOO, ReMax, REINFORCE++, SPIN, SPPO
- **Models**: Qwen-3, Llama-3.1, DeepSeek, Gemma-2 (0.5B to 671B)
- **Advanced**: LoRA RL, sequence parallelism, expert parallelism, multi-turn tools

## Installation

```bash
# Option 1: pip install
pip install verl[vllm]  # or verl[sglang] for SGLang backend

# Option 2: Docker (recommended for production)
docker pull verlai/verl:vllm011.latest

# Option 3: From source
git c...
```
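
GRPO appears in the algorithm lists above. As a rough illustration of its core idea, the sketch below scores each sampled response relative to the other responses drawn for the same prompt. This is a minimal standalone sketch, not verl's implementation: the function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Hypothetical helper: normalize rewards within one prompt's sample group.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so no learned value function (critic) is needed.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps guards against division by zero when every reward is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses sampled for one prompt, scored by a binary reward
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Responses that beat their group's average receive a positive advantage and the rest a negative one. If every response in a group earns the same reward, all advantages come out as zero.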

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic (AI) · Hugging Face (AI)
