generate-from-schema

Generate synthetic datasets from a schema specification using the Rockfish SDK. Use when a user wants to create synthetic tabular or time-series data with specific structure — independent or derived columns, state machines, timeseries, entity relationships (including composite foreign keys), or realistic PII-like values (names, emails, addresses, SSNs) via NamedEntityProvider. Trigger on phrases like "generate synthetic data", "fake data from a schema", "create a test dataset", "GenerateFromDataSchema", or mentions of entity/foreign-key/state-machine data.
Rockfish-Data/tacklebox · ★ 0 · API & Backend · score 72
Install: claude install-skill Rockfish-Data/tacklebox
# Generate from schema

Use `rockfish.actions.GenerateFromDataSchema` to produce synthetic datasets from a schema specification.

## When to use this skill

Use when the user wants to generate synthetic tabular or time-series data with:

- Specific column types (IDs, categoricals, numeric distributions).
- Derived columns (computed from other columns, e.g. mapping or sampling).
- Stateful behavior (state machines or timeseries).
- Cross-entity relationships (foreign keys, including composite keys).
- Realistic PII-like values (names, emails, addresses, SSNs).

If the user wants to inject *scenarios* (spikes, outages, ramps, shifts) into an existing time-series dataset, use the `inject-scenarios` skill instead.

## Concept

`rockfish.actions.GenerateFromDataSchema` takes a `DataSchema` and produces one synthetic dataset per `Entity`. A schema is a tree:

```
DataSchema
├── entities: list[Entity]
│   ├── name, cardinality
│   ├── columns: list[Column]
│   │   ├── name, data_type
│   │   ├── column_type (independent | derived | stateful | foreign_key)
│   │   └── domain (id | categorical | uniform_dist | state_machine
│   │               | timeseries | named_entity_provider | ...)
│   └── (optional) timestamp
├── entity_relationships: list[EntityRelationship]
└── (optional) global_timestamp
```

## How to use

1. Construct a `DataSchema` matching the user's data requirements. Two equivalent forms:
   - **JSON dict**: convenient for simple cases, language-agnostic.