hugging-face-model-trainer

Solid

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

AI & Automation · 3 stars · 0 forks · Updated yesterday · Apache-2.0

Quality Score: 79/100

Stars (20%)

Recency (20%): 100

Frontmatter (20%)

Documentation (15%): 100

Issue Health (10%)

License (10%): 100

Description (5%): 100
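The seven criteria carry weights that add up to 100%, which suggests (an assumption; the page does not state its formula) that the overall score is a weighted sum of per-criterion scores on a 0 to 100 scale. Under that reading, and assuming each "100" belongs to the criterion listed just before it, the four criteria scored at 100 contribute 50 points, leaving 29 points to come from Stars, Frontmatter, and Issue Health. The helper names below are hypothetical.

```python
import math

# Weights as listed in the quality-score breakdown (they sum to 100%).
WEIGHTS = {
    "Stars": 0.20,
    "Recency": 0.20,
    "Frontmatter": 0.20,
    "Documentation": 0.15,
    "Issue Health": 0.10,
    "License": 0.10,
    "Description": 0.05,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine 0-100 per-criterion scores into one 0-100 score (assumed formula)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return math.fsum(weights[name] * scores[name] for name in weights)


def contribution(partial: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Points contributed by the subset of criteria whose scores are known."""
    return math.fsum(weights[name] * value for name, value in partial.items())


# Criteria shown with a score of 100 (assuming each 100 belongs to the
# criterion listed just before it on the page).
known = {"Recency": 100, "Documentation": 100, "License": 100, "Description": 100}
known_points = contribution(known)      # roughly 50 points
remaining_points = 79 - known_points    # roughly 29 points left for the other three
```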

Skill Content

# TRL Training on Hugging Face Jobs

## Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup is required: models train on cloud GPUs, and results are automatically saved to the Hugging Face Hub.

**TRL provides multiple training methods:**

- **SFT** (Supervised Fine-Tuning) - Standard instruction tuning
- **DPO** (Direct Preference Optimization) - Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) - Online RL training
- **Reward Modeling** - Train reward models for RLHF

**For detailed TRL method documentation:**

```python
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.
```

**See also:** `references/training_methods.md` for method overviews and selection guidance

## When to Use This Skill

Use this skill when users want to:

- Fine-tune language models on cloud GPUs without local infrastructure
- Train with TRL methods (SFT, DPO, GRPO, etc.)
- Run training jobs on Hugging Face Jobs infrastructure
- Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
- Ensure trained models are permanently saved to the Hub
- Use modern workflows with optimized defaults

### When to Use Unsloth

Use **Unsloth** (`references/unsloth.md`) instead of standard TRL when:

- **Limited GPU memory** - Unsloth uses ~60% less VR...
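SFT and DPO each expect a differently shaped dataset, and a format mistake is cheaper to catch locally than after a paid GPU job has started. The sketch below is a minimal pre-flight check based on the column layouts in TRL's dataset-format documentation: SFT rows with a `text` string, a `messages` list of `{role, content}` dicts, or a `prompt`/`completion` pair; DPO rows with `chosen` and `rejected`, plus an optional `prompt` (TRL also accepts an implicit prompt embedded in both responses). The function names are hypothetical and not part of TRL or this skill, and the checks are deliberately shallow; the identical-pair check is a heuristic, not a TRL requirement.

```python
from typing import Any


def check_sft_row(row: dict[str, Any]) -> list[str]:
    """Return problems found in one SFT example; an empty list means it looks OK."""
    if "text" in row:
        # Language-modeling format: a single non-empty string.
        if isinstance(row["text"], str) and row["text"].strip():
            return []
        return ["'text' must be a non-empty string"]
    if "messages" in row:
        # Conversational format: a non-empty list of {"role", "content"} dicts.
        msgs = row["messages"]
        if not isinstance(msgs, list) or not msgs:
            return ["'messages' must be a non-empty list"]
        return [
            f"messages[{i}] needs 'role' and 'content'"
            for i, m in enumerate(msgs)
            if not (isinstance(m, dict) and "role" in m and "content" in m)
        ]
    if "prompt" in row and "completion" in row:
        # Prompt-completion format; contents are not inspected further here.
        return []
    return ["expected 'text', 'messages', or 'prompt' + 'completion'"]


def check_dpo_row(row: dict[str, Any]) -> list[str]:
    """Return problems found in one preference (DPO) example."""
    problems = [f"missing '{key}'" for key in ("chosen", "rejected") if key not in row]
    # 'prompt' is optional: the prompt may be implicit in chosen/rejected.
    # Heuristic: an identical pair carries no preference signal.
    if not problems and row["chosen"] == row["rejected"]:
        problems.append("'chosen' and 'rejected' are identical")
    return problems


def validate(rows: list[dict[str, Any]], method: str) -> dict[int, list[str]]:
    """Map row index -> problems, keeping only rows that have problems."""
    checker = {"sft": check_sft_row, "dpo": check_dpo_row}[method]
    report = {}
    for i, row in enumerate(rows):
        problems = checker(row)
        if problems:
            report[i] = problems
    return report
```

Iterating a `datasets.Dataset` yields plain dict rows, so running `validate(list(ds.select(range(100))), "sft")` on a small sample is a quick way to catch most format errors before submitting a job.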

Details

Author: tayyabexe
Repository: tayyabexe/skills
Created: 8 months ago
Last Updated: yesterday
Language: Python
License: Apache-2.0

Integrates with

OpenAI (AI) · Hugging Face (AI) · Ollama (AI)

Similar Skills

Semantically similar based on skill content — not just same category

DevOps & Infrastructure · Solid

hugging-face-jobs

This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.

3 stars · Updated yesterday

tayyabexe

AI & Automation · Listed

huggingface

HuggingFace ecosystem for robotics projects: hub datasets and models for robot learning, and demo Spaces. DELEGATES: for hub mechanics (download/upload/auth/jobs), install HuggingFace's own skills — /plugin marketplace add huggingface/skills, then /plugin install hf-cli@huggingface-skills — and defer to them; this skill adds only the robotics-specific layer (which datasets and models matter for manipulation and navigation, robotics dataset conventions on the hub). Use when: HF hub operations inside a robotics project, 'huggingface dataset for robots', 'upload the policy to the hub', and the HF skills aren't installed yet. Pairs with lerobot and data.

0 stars · Updated yesterday

robium-ai

AI & Automation · Featured

ai-toolkit-trainer

Train custom LoRAs with ostris AI-Toolkit — covers WAN 2.2/2.1 (people, styles, video motion) and Z-Image (Turbo & Base, low-VRAM image LoRAs). Use when the user wants to train a WAN or Z-Image LoRA; covers local + RunPod setup, dataset prep, key params, and using the result in a ComfyUI workflow.

455 stars · Updated today

artokun