datachain-knowledge

Solid

Use whenever datasets, cloud storage buckets, or data pipelines are mentioned — creating, saving, querying, listing, exploring, deleting, or processing data in S3, GCS, Azure Blob, or local storage. Also use when running any script that may create datasets as a side effect. Maintains a knowledge base at dc-knowledge/ (JSON + markdown). ALWAYS use this skill when the user creates a dataset, saves pipeline output, runs a data script, or references any storage bucket.

AI & Automation 2,781 stars 145 forks Updated today Apache-2.0


Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

Maintain a knowledge base at `dc-knowledge/`. `.md` files are the persistent output. `.json` files are intermediate (generated in Step 3, consumed in Step 4, then deleted).

`CAST.md` (sibling to this file) is the canonical methodology: the four layers, naming + tagging, layer-ladder planning, calibration, dialogue template, reuse rules, methodology transmission. Mode B reads it in full as a precondition. When something methodology-related needs to change, change `CAST.md`, not this file.

## Critical Rules

`CAST.md` §6 owns the CAST-doctrine rules (follow CAST, never bypass DataChain, C/A/S substrate mandatory, one script per stage, one `.save()` per script). The rules below are operational additions unique to this skill.

1. **Path is `dc-knowledge/`**, NOT `.datachain/`. The `.datachain/` directory is the internal database; the knowledge base lives at `dc-knowledge/`.
2. **Never pass `update=True`** to `dc.read_storage()` in Task or exploration code unless the user explicitly asks to refresh the listing. L1/L2/L3 build scripts are the exception (`CAST.md` §5).
3. **Prefer DataChain operations** over plain Python for all metadata analysis.
4. **Bounded output**: JSON and markdown files stay small regardless of data size.
5. **Stop on auth/connection errors**: `bucket_scan.py` runs a fast access check. If it exits with an error JSON on stderr, **stop immediately** and show the error to the user. Do not retry with different regions, profiles, or endpoints; ask for the m...
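The stop-on-error behavior in rule 5 can be sketched roughly as below. This is a minimal illustration, not the skill's actual code: `run_access_check`, `AccessCheckError`, and the `"error"` key in the JSON payload are hypothetical names chosen for the example, and the command passed in is whatever invocation of `bucket_scan.py` the skill really uses (its arguments are not shown in the listing).

```python
import json
import subprocess
import sys


class AccessCheckError(RuntimeError):
    """Hypothetical error type: carries the parsed payload from stderr."""

    def __init__(self, payload):
        # The "error" key is an assumption about the payload's shape.
        super().__init__(payload.get("error", "access check failed"))
        self.payload = payload


def run_access_check(command):
    """Run the check exactly once; raise on failure instead of retrying."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout

    # Non-zero exit: try to read the error JSON the script wrote to stderr.
    try:
        payload = json.loads(result.stderr)
    except json.JSONDecodeError:
        payload = {"error": result.stderr.strip()}
    if not isinstance(payload, dict):
        payload = {"error": str(payload)}

    # No retry with other regions, profiles, or endpoints: surface the error.
    raise AccessCheckError(payload)


if __name__ == "__main__":
    # Stand-in for a failing scan, so the sketch runs without cloud access.
    fake_scan = [
        sys.executable,
        "-c",
        "import sys, json; "
        "sys.stderr.write(json.dumps({'error': 'AccessDenied'})); "
        "sys.exit(1)",
    ]
    try:
        run_access_check(fake_scan)
    except AccessCheckError as exc:
        print("Stopping; show this to the user:", exc.payload)
```

The caller is expected to catch `AccessCheckError`, show the payload to the user unchanged, and wait for instructions rather than attempting another configuration.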

Details

Author: datachain-ai
Repository: datachain-ai/datachain
Created: 1 year ago
Last Updated: today
Language: Python
License: Apache-2.0
