book-sft-pipelinelisted
Install: claude install-skill Kalyanikhandare29/Agent-Skills-for-Context-Engineering
# Book SFT Pipeline
A complete system for converting books into SFT datasets and training style-transfer models. This skill teaches the pipeline from raw ePub to a model that writes in any author's voice.
## When to Activate
Activate this skill when:
- Building fine-tuning datasets from literary works
- Creating author-voice or style-transfer models
- Preparing training data for Tinker or similar SFT platforms
- Designing text segmentation pipelines for long-form content
- Training small models (8B or less) on limited data
## Core Concepts
### The Three Pillars of Book SFT
**1. Intelligent Segmentation**
Text chunks must be semantically coherent. Breaking mid-sentence teaches the model to produce fragmented output. Target: 150-400 words per chunk, always at natural boundaries.
**2. Diverse Instruction Generation**
Use multiple prompt templates and system prompts to prevent overfitting. A single prompt style leads to memorization. Use 15+ prompt templates with 5+ system prompts.
**3. Style Over Content**
The goal is learning the author's rhythm and vocabulary patterns, not memorizing plots. Synthetic instructions describe what happens without quoting the text.
## Pipeline Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR AGENT │
│ Coordinates pipeline phases, manages state, handles failures │
└──────────────────────┬──────────────────────────────────────────┘