long-context


Extend context windows of transformer models using RoPE, YaRN, ALiBi, and position interpolation techniques. Use when processing long documents (32k-128k+ tokens), extending pre-trained models beyond original context limits, or implementing efficient positional encodings. Covers rotary embeddings, attention biases, interpolation methods, and extrapolation strategies for LLMs.

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT


Quality Score: 94/100

Stars (weight 20%): 100
Recency (weight 20%): 75
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Long Context: Extending Transformer Context Windows

## When to Use This Skill

Use Long Context techniques when you need to:

- **Process long documents** (32k, 64k, 128k+ tokens) with transformer models
- **Extend context windows** of pre-trained models (LLaMA, Mistral, etc.)
- **Implement efficient positional encodings** (RoPE, ALiBi)
- **Train models** with length extrapolation capabilities
- **Deploy models** that handle variable-length inputs efficiently
- **Fine-tune** existing models for longer contexts with minimal compute

**Key Techniques**: RoPE (Rotary Position Embeddings), YaRN, ALiBi (Attention with Linear Biases), Position Interpolation

**Papers**: RoFormer (arXiv 2104.09864), YaRN (arXiv 2309.00071), ALiBi (arXiv 2108.12409), Position Interpolation (arXiv 2306.15595)

## Installation

```bash
# HuggingFace Transformers (includes RoPE, YaRN support)
pip install transformers torch

# For custom implementations
pip install einops  # Tensor operations
pip install rotary-embedding-torch  # Standalone RoPE

# Optional: FlashAttention for efficiency
pip install flash-attn --no-build-isolation
```

## Quick Start

### RoPE (Rotary Position Embeddings)

```python
import torch
import torch.nn as nn

class RotaryEmbedding(nn.Module):
    """Rotary Position Embeddings (RoPE)."""

    def __init__(self, dim, max_seq_len=8192, base=10000):
        super().__init__()
        # Compute inverse frequencies
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() /...
```
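The RoPE excerpt above is cut off where the inverse frequencies are computed. As a rough illustration of the arithmetic involved, here is a minimal, dependency-free sketch in plain Python rather than PyTorch. It is not the skill's own implementation: the function names `rope_inv_freq`, `apply_rope`, and `dot` are hypothetical, the interleaved pairing of channels `(0, 1), (2, 3), ...` is one common convention (some libraries instead pair channel `i` with `i + dim/2`), and the `scale` argument is an assumed way to show linear position interpolation by dividing the position index.

```python
import math


def rope_inv_freq(dim, base=10000.0):
    """Inverse frequencies base^(-2i/dim), one per pair of channels."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles.

    `scale` is an illustrative knob (an assumption, not from the source):
    scale > 1 divides the position index, as in linear position interpolation.
    """
    inv_freq = rope_inv_freq(len(vec), base)
    pos = position / scale
    out = []
    for i, freq in enumerate(inv_freq):
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation applied to the pair (x0, x1)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors (made-up values, dim = 4)
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions
near = dot(apply_rope(q, 5), apply_rope(k, 2))
far = dot(apply_rope(q, 105), apply_rope(k, 102))
print(abs(near - far) < 1e-9)
```

The final comparison checks the property that makes RoPE attractive for long contexts: the score between a rotated query and key depends only on their relative offset, not on their absolute positions. With `scale=2.0`, position 8 is rotated exactly like position 4 without scaling, which is the idea behind squeezing a longer sequence into the position range seen during pre-training.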

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT
