stable-baselines3

Solid

Use this skill for reinforcement learning tasks including training RL agents (PPO, SAC, DQN, TD3, DDPG, A2C, etc.), creating custom Gym environments, implementing callbacks for monitoring and control, using vectorized environments for parallel training, and integrating with deep RL workflows. This skill should be used when users request RL algorithm implementation, agent training, environment design, or RL experimentation.

AI & Automation | 27,705 stars | 2,858 forks | Updated today | MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Stable Baselines3

## Overview

Stable Baselines3 (SB3) is a PyTorch-based library providing reliable implementations of reinforcement learning algorithms. This skill provides guidance for training RL agents, creating custom environments, implementing callbacks, and optimizing training workflows using SB3's unified API.

## Core Capabilities

### 1. Training RL Agents

**Basic Training Pattern:**

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Create environment
env = gym.make("CartPole-v1")

# Initialize agent
model = PPO("MlpPolicy", env, verbose=1)

# Train the agent
model.learn(total_timesteps=10000)

# Save the model
model.save("ppo_cartpole")

# Load the model (without prior instantiation)
model = PPO.load("ppo_cartpole", env=env)
```

**Important Notes:**
- `total_timesteps` is a lower bound; actual training may exceed it because experience is collected in whole batches
- Call `load()` on the algorithm class (e.g. `PPO.load(...)`), not on an existing instance
- The replay buffer is NOT saved with the model, to save space; off-policy algorithms can persist it separately with `save_replay_buffer()`

**Algorithm Selection:**

Use `references/algorithms.md` for detailed algorithm characteristics and selection guidance. Quick reference:
- **PPO/A2C**: General-purpose, supports all action space types, good for multiprocessing
- **SAC/TD3**: Continuous control, off-policy, sample-efficient
- **DQN**: Discrete actions, off-policy
- **HER**: Goal-conditioned tasks

See `scripts/train_rl_agent.py` for a complete training template with best practices.

### 2. Cu...
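The note in section 1 that `total_timesteps` is only a lower bound can be made concrete with a little arithmetic. The sketch below assumes an on-policy algorithm such as PPO, which gathers experience in whole rollouts of `n_steps * n_envs` transitions (PPO's default `n_steps` is 2048) and only checks the step budget between rollouts. The helper name `actual_timesteps` is hypothetical and not part of SB3; it is a simplified model of the training loop, not the library's implementation.

```python
import math


def actual_timesteps(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Estimate the environment steps an on-policy run takes (simplified model).

    Assumes rollouts are always collected in full, so the requested budget
    is rounded up to the next multiple of the rollout size.
    """
    rollout_size = n_steps * n_envs  # transitions gathered per rollout
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    return n_rollouts * rollout_size


# Requesting 10,000 steps with one environment needs 5 full rollouts.
print(actual_timesteps(10_000))            # 10240
# With 4 parallel environments each rollout holds 8,192 transitions.
print(actual_timesteps(10_000, n_envs=4))  # 16384
```

Under these assumptions, choosing `total_timesteps` as a multiple of `n_steps * n_envs` avoids the overshoot.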

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

stable-baselines3

Use this skill for reinforcement learning tasks including training RL agents (PPO, SAC, DQN, TD3, DDPG, A2C, etc.), creating custom Gym environments, implementing callbacks for monitoring and control, using vectorized environments for parallel training, and integrating with deep RL workflows. This skill should be used when users request RL algorithm implementation, agent training, environment design, or RL experimentation.

335 Updated today
aiskillstore
AI & Automation Solid

stable-baselines3

Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.

26,817 Updated today
K-Dense-AI
AI & Automation Solid

stable-baselines3

Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.

2,210 Updated 1 week ago
foryourhealth111-pixel
AI & Automation Solid

reinforcement-learning-skill

RL training for robot control using simulation with sim-to-real transfer

1,160 Updated today
a5c-ai
AI & Automation Solid

pufferlib

High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.

26,817 Updated today
K-Dense-AI