pytorch-fsdp2

Solid

Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT

Quality Score: 94/100

Stars (weight 20%): 100
Recency (weight 20%): 75
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Skill: Use PyTorch FSDP2 (`fully_shard`) correctly in a training script

This skill teaches a coding agent how to **add PyTorch FSDP2** to a training loop with correct initialization, sharding, mixed precision/offload configuration, and checkpointing.

> FSDP2 in PyTorch is exposed primarily via `torch.distributed.fsdp.fully_shard` and the `FSDPModule` methods it adds in-place to modules. See: `references/pytorch_fully_shard_api.md`, `references/pytorch_fsdp2_tutorial.md`.

---

## When to use this skill

Use FSDP2 when:

- Your model **doesn’t fit** on one GPU (parameters + gradients + optimizer state).
- You want an eager-mode sharding approach built on **DTensor-based per-parameter sharding**, which is more inspectable and gives simpler sharded state dicts than FSDP1.
- You may later compose DP with **Tensor Parallel** using **DeviceMesh**.

Avoid (or be careful) if:

- You need strict backwards-compatible checkpoints across PyTorch versions (DCP warns against this).
- You’re forced onto older PyTorch versions without the FSDP2 stack.

## Alternatives (when FSDP2 is not the best fit)

- **DistributedDataParallel (DDP)**: Use the standard data-parallel wrapper when you want classic distributed data parallel training.
- **FullyShardedDataParallel (FSDP1)**: Use the original FSDP wrapper for parameter sharding across data-parallel workers.

Reference: `references/pytorch_ddp_notes.md`, `references/pytorch_fsdp1_api.md`.

---

## Contract the agent must follow

1. **Launch with `torchrun`** a...
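
The first criterion under "When to use this skill" (parameters + gradients + optimizer state not fitting on one GPU) can be checked with rough arithmetic before touching any distributed code. The sketch below is not part of the skill itself. It assumes fp32 parameters and gradients plus an Adam-style optimizer that keeps two fp32 moments per parameter, and it ignores activations, temporarily all-gathered parameters, and allocator overhead. The names `StateBytes`, `state_gib`, and `needs_sharding` are hypothetical helpers introduced for illustration only.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class StateBytes:
    """Assumed bytes per parameter for each kind of training state."""

    param: int = 4      # assumption: fp32 parameter
    grad: int = 4       # assumption: fp32 gradient
    optimizer: int = 8  # assumption: Adam-style, two fp32 moments

    @property
    def total(self) -> int:
        return self.param + self.grad + self.optimizer


def state_gib(num_params: int, world_size: int = 1,
              per_param: StateBytes = StateBytes()) -> float:
    """Estimate per-GPU GiB of params + grads + optimizer state.

    Assumes the state is split evenly across `world_size` ranks,
    which is the idealised effect of full sharding.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    return num_params * per_param.total / world_size / GIB


def needs_sharding(num_params: int, gpu_gib: float,
                   per_param: StateBytes = StateBytes()) -> bool:
    """True if the unsharded training state alone exceeds one GPU."""
    return state_gib(num_params, 1, per_param) > gpu_gib


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    print(f"unsharded: {state_gib(n):.1f} GiB")        # roughly 104 GiB
    print(f"8-way sharded: {state_gib(n, 8):.1f} GiB")  # roughly 13 GiB
    print("needs sharding on an 80 GiB GPU:", needs_sharding(n, 80.0))
```

Under these assumptions a 7B-parameter model needs about 104 GiB for training state alone, which exceeds a single 80 GiB GPU, while an even 8-way split brings the per-GPU share to about 13 GiB. Real usage will be higher once activations and communication buffers are included, so treat the estimate as a lower bound.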

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT