
ai-data-engineering

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).
ancoleman/ai-design-components · ★ 368 · AI & Automation · score 80
Install: claude install-skill ancoleman/ai-design-components
# AI Data Engineering

## Purpose

Build data infrastructure for AI/ML systems including RAG pipelines, feature stores, and embedding generation. Provides architecture patterns, orchestration workflows, and evaluation metrics for production AI applications.

## When to Use

**Use this skill when:**

- Building RAG (Retrieval-Augmented Generation) pipelines
- Implementing semantic search or vector databases
- Setting up ML feature stores for real-time serving
- Creating embedding generation pipelines
- Evaluating RAG quality with RAGAS metrics
- Orchestrating data workflows for AI systems
- Integrating with frontend skills (ai-chat, search-filter)

**Skip this skill if:**

- Building traditional CRUD applications (use databases-relational)
- Simple key-value storage (use databases-nosql)
- No AI/ML components in the application

## RAG Pipeline Architecture

RAG pipelines have 5 distinct stages. Understanding this architecture is critical for production implementations.

```
┌─────────────────────────────────────────────────────────────┐
│                   RAG Pipeline (5 Stages)                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. INGESTION  → Load documents (PDF, DOCX, Markdown)       │
│  2. INDEXING   → Chunk (512 tokens) + Embed + Store         │
│  3. RETRIEVAL  → Query embedding + Vector search + Filters  │
│  4. GENERATION → Context injection + LLM streaming          │