rag-chunking-strategy

Solid

Document chunking with multiple strategies including semantic, recursive, and fixed-size chunking

AI & Automation 1,160 stars 71 forks Updated today MIT

Quality Score: 94/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 64
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# RAG Chunking Strategy Skill

## Capabilities

- Implement multiple document chunking strategies
- Configure semantic chunking based on content boundaries
- Set up recursive character text splitting
- Design fixed-size chunking with overlap
- Implement document-aware chunking (markdown, code, etc.)
- Optimize chunk sizes for retrieval quality

## Target Processes

- rag-pipeline-implementation
- chunking-strategy-design

## Implementation Details

### Chunking Strategies

1. **RecursiveCharacterTextSplitter**: Hierarchical splitting with separators
2. **SemanticChunker**: Embedding-based semantic boundaries
3. **TokenTextSplitter**: Token-aware splitting
4. **MarkdownHeaderTextSplitter**: Structure-aware markdown splitting
5. **CodeSplitter**: Language-aware code chunking

### Configuration Options

- Chunk size (characters or tokens)
- Chunk overlap percentage
- Separator hierarchy
- Embedding model for semantic chunking
- Document type detection

### Best Practices

- Match chunk size to embedding model limits
- Use appropriate overlap for context preservation
- Test retrieval quality with different strategies
- Consider document structure in strategy selection

### Dependencies

- langchain-text-splitters
- sentence-transformers (for semantic chunking)
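The fixed-size-with-overlap and recursive-separator strategies listed above can be illustrated with a minimal standard-library sketch. This is not the `langchain-text-splitters` implementation: the function names `fixed_size_chunks` and `recursive_split`, the default separator hierarchy, and the character-based size limit are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def fixed_size_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),  # hypothetical hierarchy
) -> List[str]:
    """Split on the coarsest separator first; recurse with finer ones when a piece is too big."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard character cuts without overlap.
        return fixed_size_chunks(text, chunk_size, overlap=0)
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while the result still fits.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
    print(recursive_split("alpha beta\n\ngamma delta epsilon", chunk_size=12))
```

The sketch measures size in characters; a token-aware variant would replace `len` with a tokenizer count, which matters when matching chunk size to an embedding model's limit.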

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

chunking-strategy

Provides chunking strategies for RAG systems. Generates chunk size recommendations (256-1024 tokens), overlap percentages (10-20%), and semantic boundary detection methods. Validates semantic coherence and evaluates retrieval precision/recall metrics. Use when building retrieval-augmented generation systems, vector databases, or processing large documents.

263 Updated 1 week ago
giuseppe-trisciuoglio
AI & Automation Solid

rag-architect

Use when the user asks to design RAG pipelines, optimize retrieval strategies, choose embedding models, implement vector search, or build knowledge retrieval systems.

16,782 Updated 3 days ago
alirezarezvani
AI & Automation Solid

rag-architect

Designs and implements production-grade RAG systems by chunking documents, generating embeddings, configuring vector stores, building hybrid search pipelines, applying reranking, and evaluating retrieval quality. Use when building RAG systems, vector databases, or knowledge-grounded AI applications requiring semantic search, document retrieval, context augmentation, similarity search, or embedding-based indexing.

9,537 Updated 1 week ago
Jeffallan
AI & Automation Listed

rag-architect

Design RAG pipelines with informed chunking, embedding, retrieval, and evaluation decisions. TRIGGER when: user asks about RAG pipeline design, chunking strategies, embedding models, vector databases, or retrieval-augmented generation. DO NOT TRIGGER when: user asks about fine-tuning, prompt engineering without retrieval, or general LLM usage.

1 Updated 1 week ago
DROOdotFOO
AI & Automation Solid

rag-implementation

Retrieval-Augmented Generation patterns including chunking, embeddings, vector stores, and retrieval optimization. Use when: rag, retrieval augmented, vector search, embeddings, semantic search.

27,705 Updated today
davila7