book-sft-pipeline

This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.
Kalyanikhandare29/Agent-Skills-for-Context-Engineering · ★ 3 · AI & Automation · score 79
Install: claude install-skill Kalyanikhandare29/Agent-Skills-for-Context-Engineering
# Book SFT Pipeline

A complete system for converting books into SFT datasets and training style-transfer models. This skill teaches the pipeline from raw ePub to a model that writes in any author's voice.

## When to Activate

Activate this skill when:

- Building fine-tuning datasets from literary works
- Creating author-voice or style-transfer models
- Preparing training data for Tinker or similar SFT platforms
- Designing text segmentation pipelines for long-form content
- Training small models (8B or less) on limited data

## Core Concepts

### The Three Pillars of Book SFT

**1. Intelligent Segmentation**

Text chunks must be semantically coherent. Breaking mid-sentence teaches the model to produce fragmented output. Target: 150-400 words per chunk, always at natural boundaries.

**2. Diverse Instruction Generation**

Use multiple prompt templates and system prompts to prevent overfitting. A single prompt style leads to memorization. Use 15+ prompt templates with 5+ system prompts.

**3. Style Over Content**

The goal is learning the author's rhythm and vocabulary patterns, not memorizing plots. Synthetic instructions describe what happens without quoting the text.

## Pipeline Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      ORCHESTRATOR AGENT                         │
│  Coordinates pipeline phases, manages state, handles failures   │
└──────────────────────┬──────────────────────────────────────────┘