chunking-strategy

Solid

Provides chunking strategies for RAG systems. Generates chunk size recommendations (256-1024 tokens), overlap percentages (10-20%), and semantic boundary detection methods. Validates semantic coherence and evaluates retrieval precision/recall metrics. Use when building retrieval-augmented generation systems, vector databases, or processing large documents.

AI & Automation 263 stars 31 forks Updated 1 week ago MIT

Quality Score: 89/100

Stars: 20% weight
Recency: 20% weight
Frontmatter: 20% weight
Documentation: 15% weight, score 100
Issue Health: 10% weight
License: 10% weight, score 100
Description: 5% weight, score 100

Skill Content

# Chunking Strategy for RAG Systems

## Overview

Provides chunking strategies for RAG systems, vector databases, and document processing. Recommends chunk sizes, overlap percentages, and boundary detection methods; validates semantic coherence; evaluates retrieval metrics.

## When to Use

Use when building or optimizing RAG systems, vector search pipelines, document chunking workflows, or performance-tuning existing systems with poor retrieval quality.

## Instructions

### Choose Chunking Strategy

Select based on document type and use case:

1. **Fixed-Size Chunking** (Level 1)
   - Use for simple documents without clear structure
   - Start with 512 tokens and 10-20% overlap
   - Adjust: 256 for factoid queries, 1024 for analytical
2. **Recursive Character Chunking** (Level 2)
   - Use for documents with structural boundaries
   - Hierarchical separators: paragraphs → sentences → words
   - Customize for document types (HTML, Markdown, JSON)
3. **Structure-Aware Chunking** (Level 3)
   - Use for structured content (Markdown, code, tables, PDFs)
   - Preserve semantic units: functions, sections, table blocks
   - Validate structure preservation post-split
4. **Semantic Chunking** (Level 4)
   - Use for complex documents with thematic shifts
   - Embedding-based boundary detection with 0.8 similarity threshold
   - Buffer size: 3-5 sentences
5. **Advanced Methods** (Level 5)
   - Late Chunking for long-context models
   - Contextual Retrieval for high-precision require...
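
The Level 1 guidance (512 tokens with 10-20% overlap) can be illustrated with a minimal sketch. This is not the skill's own implementation: the function name `fixed_size_chunks` and its parameters are hypothetical, and the input is assumed to be already tokenized (a real pipeline would likely use the embedding model's tokenizer).

```python
from typing import List


def fixed_size_chunks(
    tokens: List[str],
    chunk_size: int = 512,
    overlap_ratio: float = 0.15,
) -> List[List[str]]:
    """Split a token list into fixed-size windows that share an overlap.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)  # tokens shared by neighbours
    step = chunk_size - overlap                # how far each window advances

    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk is
        # entirely contained in the previous one.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Calling `fixed_size_chunks(text.split(), chunk_size=256)` would approximate the factoid-query setting, with whitespace-separated words standing in for tokens.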

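The Level 4 idea of embedding-based boundary detection with a 0.8 similarity threshold can be sketched as follows. This is a simplified illustration under stated assumptions: `embed` is a hypothetical caller-supplied function that maps a sentence to a vector, only adjacent sentences are compared, and the 3-5 sentence buffer mentioned above is not implemented.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(
    sentences: List[str],
    embed: Callable[[str], Sequence[float]],  # assumption: supplied by caller
    threshold: float = 0.8,
) -> List[List[str]]:
    """Group consecutive sentences, starting a new chunk at each point where
    the similarity between neighbouring sentences drops below the threshold."""
    if not sentences:
        return []

    vectors = [embed(s) for s in sentences]
    chunks: List[List[str]] = [[sentences[0]]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_similarity(prev_vec, vec) < threshold:
            chunks.append([sentence])    # thematic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend current chunk
    return chunks
```

In practice `embed` would wrap whatever embedding model the pipeline already uses, and the threshold would be tuned against retrieval metrics rather than fixed at 0.8.
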
Details

Author: giuseppe-trisciuoglio
Repository: giuseppe-trisciuoglio/developer-kit
Created: 7 months ago
Last Updated: 1 week ago
Language: Python
License: MIT

Integrates with

LangChain · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

rag-chunking-strategy

Document chunking with multiple strategies including semantic, recursive, and fixed-size chunking

1,160 Updated today

a5c-ai

AI & Automation Solid

rag-implementation

Retrieval-Augmented Generation patterns including chunking, embeddings, vector stores, and retrieval optimization. Use when: rag, retrieval augmented, vector search, embeddings, semantic search.

27,705 Updated today

davila7

AI & Automation Listed

rag-architect

Design RAG pipelines with informed chunking, embedding, retrieval, and evaluation decisions. TRIGGER when: user asks about RAG pipeline design, chunking strategies, embedding models, vector databases, or retrieval-augmented generation. DO NOT TRIGGER when: user asks about fine-tuning, prompt engineering without retrieval, or general LLM usage.

1 Updated 1 week ago

DROOdotFOO

AI & Automation Solid

rag-architect

Designs and implements production-grade RAG systems by chunking documents, generating embeddings, configuring vector stores, building hybrid search pipelines, applying reranking, and evaluating retrieval quality. Use when building RAG systems, vector databases, or knowledge-grounded AI applications requiring semantic search, document retrieval, context augmentation, similarity search, or embedding-based indexing.

9,537 Updated 1 week ago

Jeffallan

AI & Automation Solid

rag-architect

Use when the user asks to design RAG pipelines, optimize retrieval strategies, choose embedding models, implement vector search, or build knowledge retrieval systems.

16,782 Updated 3 days ago

alirezarezvani