optimizing-attention-flash


Optimizes transformer attention with Flash Attention for a 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), hitting GPU memory limits in attention, or needing faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.
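To see why long sequences cause memory pressure, it helps to size the attention score matrix that a standard implementation materializes. The following is a rough sketch, not a measurement: the helper name `score_matrix_bytes` is hypothetical, the batch and head counts are illustrative, and it counts only the `[batch, heads, seq, seq]` score tensor in fp16 (2 bytes per element), ignoring all other activations.

```python
def score_matrix_bytes(batch, heads, seq_len, bytes_per_element=2):
    """Bytes needed to store one [batch, heads, seq_len, seq_len] score tensor.

    Hypothetical helper for illustration; 2 bytes per element assumes fp16.
    """
    return batch * heads * seq_len * seq_len * bytes_per_element


# Illustrative shapes: batch=2, heads=8, growing sequence length.
for seq_len in (512, 2048, 8192):
    mib = score_matrix_bytes(2, 8, seq_len) / 2**20
    print(f"seq_len={seq_len:>5}: {mib:,.0f} MiB")
# seq_len=  512: 8 MiB
# seq_len= 2048: 128 MiB
# seq_len= 8192: 2,048 MiB
```

The tensor grows with the square of the sequence length, so a 16x longer sequence needs 256x more memory for the scores alone. Flash Attention avoids storing this tensor in full, which is where most of its memory savings on long sequences come from.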

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT


Quality Score: 94/100

Stars (20%): 100
Recency (20%)
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Flash Attention - Fast Memory-Efficient Attention

## Quick start

Flash Attention provides 2-4x speedup and 10-20x memory reduction for transformer attention through IO-aware tiling and recomputation.

**PyTorch native (easiest, PyTorch 2.2+)**:

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)  # [batch, heads, seq, dim]
k = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)
v = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)

# Automatically uses Flash Attention if available
out = F.scaled_dot_product_attention(q, k, v)
```

**flash-attn library (more features)**:

```bash
pip install flash-attn --no-build-isolation
```

```python
from flash_attn import flash_attn_func

# q, k, v: [batch, seqlen, nheads, headdim]
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
```

## Common workflows

### Workflow 1: Enable in existing PyTorch model

Copy this checklist:

```
Flash Attention Integration:
- [ ] Step 1: Check PyTorch version (≥2.2)
- [ ] Step 2: Enable Flash Attention backend
- [ ] Step 3: Verify speedup with profiling
- [ ] Step 4: Test accuracy matches baseline
```

**Step 1: Check PyTorch version**

```bash
python -c "import torch; print(torch.__version__)"
# Should be ≥2.2.0
```

If <2.2, upgrade:

```bash
pip install --upgrade torch
```

**Step 2: Enable Flash Attention backend**

Replace standard attention:

```python
# Before (standard attention)
attn_weights...
```
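The "IO-aware tiling" named in the quick start can be illustrated without a GPU. The following is a minimal pure-Python sketch of the idea for a single attention head, not the flash-attn kernel itself: keys and values are visited one block at a time, and only a running maximum, a running softmax denominator, and a running output are kept per query. The function names `naive_attention` and `tiled_attention`, the `block_size` parameter, and the sample data are all hypothetical. The real implementation also tiles the queries and runs as a fused GPU kernel.

```python
import math


def naive_attention(q, k, v):
    """Reference single-head attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        # Full row of scores is materialized here.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Same result, computed over key/value blocks with an online softmax."""
    d, dv = len(q[0]), len(v[0])
    out = []
    for qi in q:
        running_max = float("-inf")
        running_sum = 0.0
        acc = [0.0] * dv  # unnormalized output accumulated so far
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                      for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            # On the first block this is exp(-inf) == 0.0.
            scale = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * scale + sum(weights)
            acc = [a * scale + sum(w * vj[c] for w, vj in zip(weights, v_blk))
                   for c, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


# Small made-up inputs: 3 queries, 5 keys/values, head dim 2, value dim 3.
q = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.5]]
k = [[0.5, 0.1], [-0.2, 0.7], [0.9, -0.3], [0.0, 0.4], [1.2, 1.1]]
v = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [2.0, 2.0, 0.0],
     [-1.0, 0.3, 0.7], [0.2, 0.2, 0.2]]

ref = naive_attention(q, k, v)
tiled = tiled_attention(q, k, v, block_size=2)
max_diff = max(abs(a - b) for r, t in zip(ref, tiled) for a, b in zip(r, t))
print(f"max abs difference: {max_diff:.2e}")
```

Because the two functions agree up to floating-point rounding, the sketch also mirrors the checklist's final step in spirit: an optimized attention path should reproduce the baseline output within a small tolerance.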

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Integrates with

Hugging Face · AI
