chunking-strategy

Solid

Provides chunking strategies for RAG systems. Generates chunk size recommendations (256-1024 tokens), overlap percentages (10-20%), and semantic boundary detection methods. Validates semantic coherence and evaluates retrieval precision/recall metrics. Use when building retrieval-augmented generation systems, vector databases, or processing large documents.

AI & Automation · 263 stars · 31 forks · Updated 1 week ago · MIT

Quality Score: 89/100

- Stars (weight 20%): 81
- Recency (weight 20%): 90
- Frontmatter (weight 20%): 70
- Documentation (weight 15%): 100
- Issue Health (weight 10%): 50
- License (weight 10%): 100
- Description (weight 5%): 100

Skill Content

# Chunking Strategy for RAG Systems

## Overview

Provides chunking strategies for RAG systems, vector databases, and document processing. Recommends chunk sizes, overlap percentages, and boundary detection methods; validates semantic coherence; evaluates retrieval metrics.

## When to Use

Use when building or optimizing RAG systems, vector search pipelines, document chunking workflows, or performance-tuning existing systems with poor retrieval quality.

## Instructions

### Choose Chunking Strategy

Select based on document type and use case:

1. **Fixed-Size Chunking** (Level 1)
   - Use for simple documents without clear structure
   - Start with 512 tokens and 10-20% overlap
   - Adjust: 256 for factoid queries, 1024 for analytical
2. **Recursive Character Chunking** (Level 2)
   - Use for documents with structural boundaries
   - Hierarchical separators: paragraphs → sentences → words
   - Customize for document types (HTML, Markdown, JSON)
3. **Structure-Aware Chunking** (Level 3)
   - Use for structured content (Markdown, code, tables, PDFs)
   - Preserve semantic units: functions, sections, table blocks
   - Validate structure preservation post-split
4. **Semantic Chunking** (Level 4)
   - Use for complex documents with thematic shifts
   - Embedding-based boundary detection with 0.8 similarity threshold
   - Buffer size: 3-5 sentences
5. **Advanced Methods** (Level 5)
   - Late Chunking for long-context models
   - Contextual Retrieval for high-precision require...
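
As an illustration of the Level 1 guidance (fixed-size chunks with 10-20% overlap), here is a minimal Python sketch. It is not taken from the skill's repository: the function name `chunk_fixed_size`, its parameters, and the 15% default overlap are assumptions made for this example. It works on an already-tokenized list, so any tokenizer can feed it.

```python
from typing import List


def chunk_fixed_size(
    tokens: List[str],
    chunk_size: int = 512,        # starting point suggested in the text
    overlap_ratio: float = 0.15,  # hypothetical default inside the 10-20% range
) -> List[List[str]]:
    """Split a token list into fixed-size windows that share a fraction of tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)  # tokens repeated between neighbours
    step = chunk_size - overlap                # how far each window advances

    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

For a quick experiment, `chunk_fixed_size(text.split(), chunk_size=256)` treats whitespace-separated words as a rough stand-in for tokens. Counts from the tokenizer of the actual embedding model will differ, so sizes are best checked against that tokenizer before indexing.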

Details

Author: giuseppe-trisciuoglio
Repository: giuseppe-trisciuoglio/developer-kit
Created: 7 months ago
Last Updated: 1 week ago
Language: Python
License: MIT
