dvc-dataset-versioning
SolidDataset versioning skill using DVC for tracking data changes, managing data pipelines, and ensuring reproducibility.
Install
Quality Score: 96/100
Skill Content
Details
- Author
- a5c-ai
- Repository
- a5c-ai/babysitter
- Created
- 4 months ago
- Last Updated
- today
- Language
- JavaScript
- License
- MIT
Similar Skills
Semantically similar based on skill content — not just same category
version-dataset
Dataset version control for research reproducibility. Builds a deterministic content-hash manifest of a dataset (file SHA-256 + tabular schema + per-column value hashes), verifies a later copy against it to detect drift (schema change, row-count change, value changes), and diffs two manifests. Use to prove an analysis ran on the intended data, lock a dataset version, or reproducibility-lock bundled demos.
data-versioning-manager
Skill for managing data versions and provenance
dataset-curator
Use this skill when designing, cleaning, deduplicating, or documenting datasets for model training and evaluation including schema design, class imbalance handling, and train/val/test splits. Not for running model training or hyperparameter tuning. Not for real-time data pipeline engineering.
dataset-transformation
Generates a Jupyter notebook that transforms datasets between ML schemas for model training or evaluation. Use when the user says "transform", "convert", "reformat", "change the format", or when a dataset's schema needs to change to match the target format — always use this skill for format changes rather than writing inline transformation code. Supports OpenAI chat, SageMaker SFT/DPO/RLVR, HuggingFace preference, Bedrock Nova, VERL, and custom JSONL formats from local files or S3.
hugging-face-datasets
Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.