train-fasttext

Solid

This skill provides guidance for training FastText text classification models under constraints on accuracy and model size. Use it when training fastText supervised models, optimizing model size while maintaining accuracy thresholds, or tuning hyperparameters for text classification tasks.

AI & Automation 364 stars 68 forks Updated today MIT

Quality Score: 89/100

Stars (20%): 85
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 80
License (10%): 100
Description (5%): 100

Skill Content

# Train FastText

## Overview

This skill provides structured approaches for training FastText supervised text classification models, particularly when balancing competing constraints like accuracy thresholds and model size limits. It covers hyperparameter tuning strategies, size optimization techniques, and common pitfalls to avoid.

## Pre-Training Analysis

Before starting any training, perform these critical assessments:

### 1. Estimate Training Time

Calculate approximate training time based on:

- Dataset size (number of samples and vocabulary)
- Target epochs
- Model complexity (dimension, wordNgrams)

**Rule of thumb**: For large datasets (>100k samples), expect 5-20+ minutes per training run. Plan iteration budget accordingly.

### 2. Understand Size Drivers

FastText model size is primarily determined by:

```
Size ≈ (vocabulary_size × dimension) + (bucket × dimension)
```

Key parameters affecting size:

- `dim`: Vector dimension (default 100)
- `bucket`: Number of hash buckets (default 2000000)
- `minCount`: Minimum word frequency to include (default 1)
- `minn/maxn`: Character n-gram range (default 0/0 for supervised)

### 3. Identify Target Tradeoffs

Before training, establish:

- Hard constraints (e.g., max file size, minimum accuracy)
- Soft preferences (e.g., prefer smaller model if accuracy is similar)
- Whether quantization will be used

## Training Strategy

### Phase 1: Quick Exploration (Use Data Subset)

To avoid wasting time on full dataset training duri...
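As a rough illustration of the size formula in "Understand Size Drivers", the sketch below turns the parameter count into an approximate file size before any training run. The function name `estimate_model_bytes` and the example vocabulary size are hypothetical. The 4-bytes-per-weight factor is an assumption (unquantized 32-bit floats) that the formula itself does not state, and the estimate ignores the output layer and file metadata.

```python
def estimate_model_bytes(vocab_size, dim=100, bucket=2_000_000, bytes_per_weight=4):
    """Approximate unquantized model size in bytes.

    Implements Size ≈ (vocabulary_size × dimension) + (bucket × dimension),
    then multiplies by bytes_per_weight (assumed: 4, i.e. float32 weights).
    """
    weights = (vocab_size * dim) + (bucket * dim)
    return weights * bytes_per_weight


# Hypothetical 50k-word vocabulary with the default dim and bucket values.
default_size = estimate_model_bytes(50_000)
print(f"defaults:          {default_size / 1e6:.1f} MB")  # 820.0 MB

# Same vocabulary with a smaller dimension and far fewer hash buckets.
small_size = estimate_model_bytes(50_000, dim=16, bucket=100_000)
print(f"dim=16, bucket=1e5: {small_size / 1e6:.1f} MB")   # 9.6 MB
```

With the default `bucket` of 2000000, the bucket term dominates the estimate, so when a size limit applies, `bucket` and `dim` are usually the first values worth checking.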

Details

Author
majiayu000
Repository
majiayu000/claude-skill-registry
Created
5 months ago
Last Updated
today
Language
HTML
License
MIT
