openrlhf-training


High-performance RLHF framework with Ray + vLLM acceleration. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B to 70B+). Built on Ray, vLLM, and DeepSpeed ZeRO-3. The project reports roughly 2× faster training than DeepSpeed-Chat, which it attributes to its distributed architecture and GPU resource sharing.
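GRPO and RLOO, two of the methods listed above, differ from PPO mainly in how they obtain a baseline without a critic. Both sample several completions per prompt and compare each completion's reward against the rest of its group. The following is a minimal, framework-agnostic sketch of those two baselines in plain Python. The function names `grpo_advantages` and `rloo_advantages` are hypothetical and are not OpenRLHF's API. Using the population standard deviation (rather than the sample standard deviation) in GRPO is an assumption, since implementations differ.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: center each reward on the group mean and scale by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumption); some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def rloo_advantages(rewards):
    """REINFORCE leave-one-out: the baseline for sample i is the mean reward of the other k-1 samples."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Four sampled completions for one prompt, scored by a reward model (illustrative values).
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))
print(rloo_advantages(rewards))
```

For these rewards, GRPO yields values very close to `[1, -1, -1, 1]` and RLOO yields `[2/3, -2/3, -2/3, 2/3]`. Both estimators are centered within the group. GRPO additionally divides by the group's spread, while RLOO's leave-one-out baseline never includes the sample's own reward.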

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT license

Quality Score: 94/100

Stars (weight 20%): 100
Recency (weight 20%): 75
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# OpenRLHF - High-Performance RLHF Training

## Quick start

OpenRLHF is a Ray-based RLHF framework optimized for distributed training with vLLM inference acceleration.

**Installation**:

```bash
# Launch Docker container (the remaining commands run inside it)
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN \
  -v $PWD:/openrlhf nvcr.io/nvidia/pytorch:25.02-py3 bash

# Uninstall conflicting packages
sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y

# Install OpenRLHF with vLLM (quoted so the shell does not treat the brackets as a glob)
pip install "openrlhf[vllm]"
```

**PPO Training** (Hybrid Engine):

```bash
# Start a single-node Ray head exposing 8 GPUs
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8

# Submit the PPO job; with --colocate_all_models, all models and vLLM engines share the same 8 GPUs
ray job submit --address="http://127.0.0.1:8265" \
  --runtime-env-json='{"working_dir": "/openrlhf"}' \
  -- python3 -m openrlhf.cli.train_ppo_ray \
  --ref_num_nodes 1 --ref_num_gpus_per_node 8 \
  --reward_num_nodes 1 --reward_num_gpus_per_node 8 \
  --critic_num_nodes 1 --critic_num_gpus_per_node 8 \
  --actor_num_nodes 1 --actor_num_gpus_per_node 8 \
  --vllm_num_engines 4 --vllm_tensor_parallel_size 2 \
  --colocate_all_models \
  --vllm_gpu_memory_utilization 0.5 \
  --pretrain OpenRLHF/Llama-3-8b-sft-mixture \
  --reward_pretrain OpenRLHF/Llama-3-8b-rm-700k \
  --save_path ./output/llama3-8b-rlhf \
  --micro_train_batch_size 8 --train_batch_size 128 \
  --micro_rollout_batch_size 16 --rollout_batch_size 1024 \
  --max_epochs 1 --prompt_max_len 1024 --generate_max_len 1024 \
  --zero_stage 3 --bf16 \
  --actor_learning_rate 5e-7 --critic_learning_rate 9e-6 \
  --init_kl_coe...
```
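The PPO quick-start command ties several sizes together, and a quick arithmetic check helps when adapting it to a different GPU count. The sketch below recomputes the derived quantities under stated assumptions. It assumes that `rollout_batch_size` counts prompts, that each prompt yields one sample (`n_samples_per_prompt = 1`, an assumed default that the command does not set), and that `train_batch_size` is the global number of samples per optimizer step. It further assumes that `micro_train_batch_size` is per GPU and that all 8 actor GPUs act as data-parallel ranks under ZeRO-3. The `PPOLayout` class and `summarize` function are hypothetical helpers, not part of OpenRLHF.

```python
from dataclasses import dataclass


@dataclass
class PPOLayout:
    """Hypothetical helper mirroring flags from the quick-start command."""
    actor_gpus: int = 8                 # actor_num_nodes (1) * actor_num_gpus_per_node (8)
    vllm_num_engines: int = 4
    vllm_tensor_parallel_size: int = 2
    micro_train_batch_size: int = 8     # assumed: samples per GPU per forward/backward pass
    train_batch_size: int = 128         # assumed: global samples per optimizer step
    rollout_batch_size: int = 1024      # assumed: prompts per rollout
    n_samples_per_prompt: int = 1       # assumed default; not set in the command
    max_epochs: int = 1                 # assumed: passes over each rollout's samples


def summarize(cfg: PPOLayout) -> dict:
    """Derive GPU and batch quantities, raising if the sizes do not divide evenly."""
    vllm_gpus = cfg.vllm_num_engines * cfg.vllm_tensor_parallel_size
    samples_per_micro_step = cfg.micro_train_batch_size * cfg.actor_gpus
    if cfg.train_batch_size % samples_per_micro_step:
        raise ValueError("train_batch_size must be a multiple of micro_train_batch_size * actor_gpus")
    rollout_samples = cfg.rollout_batch_size * cfg.n_samples_per_prompt
    if rollout_samples % cfg.train_batch_size:
        raise ValueError("rollout samples must be a multiple of train_batch_size")
    return {
        "vllm_gpus": vllm_gpus,
        "grad_accum_steps": cfg.train_batch_size // samples_per_micro_step,
        "rollout_samples": rollout_samples,
        "optimizer_steps_per_rollout": (rollout_samples // cfg.train_batch_size) * cfg.max_epochs,
        # With --colocate_all_models, the vLLM engines share the actor's GPUs.
        "vllm_fits_colocated": vllm_gpus <= cfg.actor_gpus,
    }


print(summarize(PPOLayout()))
```

With the quick-start values this prints `{'vllm_gpus': 8, 'grad_accum_steps': 2, 'rollout_samples': 1024, 'optimizer_steps_per_rollout': 8, 'vllm_fits_colocated': True}`. The four tensor-parallel-2 vLLM engines occupy all 8 GPUs. That is presumably why the command caps `--vllm_gpu_memory_utilization` at 0.5: it leaves memory for the colocated training models.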

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT