dataset-transformation
SolidGenerates a Jupyter notebook that transforms datasets between ML schemas for model training or evaluation. Use when the user says "transform", "convert", "reformat", "change the format", or when a dataset's schema needs to change to match the target format — always use this skill for format changes rather than writing inline transformation code. Supports OpenAI chat, SageMaker SFT/DPO/RLVR, HuggingFace preference, Bedrock Nova, VERL, and custom JSONL formats from local files or S3.
Install
Quality Score: 95/100
Skill Content
Details
- Author
- awslabs
- Repository
- awslabs/agent-plugins
- Created
- 3 months ago
- Last Updated
- 2 days ago
- Language
- Shell
- License
- Apache-2.0
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
transforming-data
Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.
dataset-curator
Use this skill when designing, cleaning, deduplicating, or documenting datasets for model training and evaluation including schema design, class imbalance handling, and train/val/test splits. Not for running model training or hyperparameter tuning. Not for real-time data pipeline engineering.
dataset-evaluation
Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.
datarobot-data-preparation
Tools and guidance for data upload, dataset management, data validation, and preparing data for DataRobot projects. Use when uploading datasets, managing data, or validating data for DataRobot.
dataset-loader-creator
Create dataset loader creator operations. Auto-activating skill for ML Training. Triggers on: dataset loader creator, dataset loader creator Part of the ML Training skill category. Use when working with dataset loader creator functionality. Trigger with phrases like "dataset loader creator", "dataset creator", "dataset".