huggingface-accelerate

Featured

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

AI & Automation · 27,705 stars · 2,858 forks · Updated today · MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# HuggingFace Accelerate - Unified Distributed Training

## Quick start

Accelerate simplifies distributed training to 4 lines of code.

**Installation**:

```bash
pip install accelerate
```

**Convert PyTorch script** (4 lines):

```diff
  import torch
+ from accelerate import Accelerator

+ accelerator = Accelerator()

  model = torch.nn.Transformer()
  optimizer = torch.optim.Adam(model.parameters())
  dataloader = torch.utils.data.DataLoader(dataset)

+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

  for batch in dataloader:
      optimizer.zero_grad()
      loss = model(batch)
-     loss.backward()
+     accelerator.backward(loss)
      optimizer.step()
```

**Run** (single command):

```bash
accelerate launch train.py
```

## Common workflows

### Workflow 1: From single GPU to multi-GPU

**Original script**:

```python
# train.py
import torch

model = torch.nn.Linear(10, 2).to('cuda')
optimizer = torch.optim.Adam(model.parameters())
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

for epoch in range(10):
    for batch in dataloader:
        batch = batch.to('cuda')
        optimizer.zero_grad()
        loss = model(batch).mean()
        loss.backward()
        optimizer.step()
```

**With Accelerate** (4 lines added):

```python
# train.py
import torch
from accelerate import Accelerator  # +1

accelerator = Accelerator()  # +2

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters())
dataloader = to...
```
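When a script moves from one GPU to several, as in Workflow 1, the dataloader passed to `accelerator.prepare` no longer yields every batch on every process. The sketch below is a plain-Python illustration of that idea, assuming a simple round-robin assignment of batches to processes. The helper names `make_batches` and `shard_batches` are hypothetical and are not part of the Accelerate API. The sketch ignores details such as padding uneven batch counts, so treat it as a mental model rather than a description of how Accelerate is implemented.

```python
from typing import List, Sequence


def make_batches(samples: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group samples into consecutive batches, as a single-process loader would."""
    return [list(samples[i:i + batch_size]) for i in range(0, len(samples), batch_size)]


def shard_batches(batches: List[List[int]], num_processes: int, process_index: int) -> List[List[int]]:
    """Hypothetical helper: give one process every `num_processes`-th batch."""
    return batches[process_index::num_processes]


batch_size = 4
num_processes = 2  # assumed number of GPUs / worker processes

samples = list(range(16))
batches = make_batches(samples, batch_size)

# Each process iterates over its own disjoint share of the batches.
per_process = [shard_batches(batches, num_processes, rank) for rank in range(num_processes)]
for rank, shard in enumerate(per_process):
    print(f"process {rank}: {shard}")
# process 0: [[0, 1, 2, 3], [8, 9, 10, 11]]
# process 1: [[4, 5, 6, 7], [12, 13, 14, 15]]

# One optimizer step now consumes one batch per process.
effective_batch_size = batch_size * num_processes
print(effective_batch_size)  # 8
```

Under this model, a `batch_size=32` loader run on several processes consumes 32 samples per process per step, so the learning rate or per-device batch size may need revisiting when scaling out.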

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic · AI
Hugging Face · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

ray-train

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

9,182 Updated 1 month ago

Orchestra-Research

AI & Automation Solid

distributed-llm-pretraining-torchtitan

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

9,182 Updated 1 month ago

Orchestra-Research