data-formats

Solid

Working with diverse data formats: binary, text, structured, and custom

AI & Automation 859 stars 98 forks Updated yesterday MIT

Quality Score: 96/100

Stars (20%): 98
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Data Formats

How to work with diverse and unknown data formats.

## Format Detection

Always inspect before parsing:

```bash
file <filename>               # file type detection (add --mime-type for the MIME type)
xxd <filename> | head -5      # hex dump (first bytes)
head -3 <filename>            # text preview
python3 -c "
with open('<filename>', 'rb') as f:
    h = f.read(16)
print(h, h.hex())
"
```

## Common Formats

### Binary

- **Magic bytes**: Most binary formats start with a signature (ELF: `\x7fELF`, PNG: `\x89PNG`)
- **Endianness**: Check if little-endian or big-endian (`struct.unpack('<I', ...)` vs `'>I'`)
- **Alignment**: Fields are often aligned to 4 or 8 bytes
- **Offsets**: Binary headers often contain offsets to other sections

### Structured text

- **CSV/TSV**: Check delimiter (comma, tab, pipe), quoting, header row
- **JSON**: `python3 -c "import json; json.load(open('f'))"`
- **YAML**: Check indentation, anchors/aliases
- **TOML**: `python3 -c "import tomllib; ..."`
- **XML**: Check encoding declaration, namespaces

### Checkpoints / Model files

- **PyTorch**: `.pt`, `.pth` → `torch.load(f, map_location='cpu')`
- **TensorFlow**: `.ckpt` → index + data files, use `tf.train.load_checkpoint()`
- **NumPy**: `.npy`, `.npz` → `numpy.load()`
- **HuggingFace**: `config.json` + `model.safetensors`
- **ONNX**: `onnx.load()`

### Database files

- **SQLite**: `file` says "SQLite 3.x database" → `sqlite3 <file> ".tables"`
- **WAL files**: SQLite write-ahead log; recover with `sqlit...
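
The magic-byte and endianness checks listed under Format Detection and Binary can be sketched in Python roughly as follows. This is a minimal illustration, not part of the skill itself: the helper names `sniff_format` and `read_u32` are hypothetical, the ELF and PNG prefixes come from the text above, and the NumPy and SQLite signatures are added here as assumptions about which formats are worth checking.

```python
import struct

# Known file signatures. ELF and PNG are the examples from the skill text;
# the NumPy (.npy) and SQLite entries are extra illustrations.
MAGIC = {
    b"\x7fELF": "ELF",
    b"\x89PNG": "PNG",
    b"\x93NUMPY": "NumPy .npy",
    b"SQLite format 3\x00": "SQLite 3",
}


def sniff_format(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "unknown"


def read_u32(data: bytes, offset: int = 0) -> tuple:
    """Decode the same 4 bytes as little-endian and as big-endian."""
    little = struct.unpack_from("<I", data, offset)[0]
    big = struct.unpack_from(">I", data, offset)[0]
    return little, big


if __name__ == "__main__":
    print(sniff_format(b"\x89PNG\r\n\x1a\n"))  # PNG signature
    print(read_u32(b"\x01\x00\x00\x00"))       # both interpretations of one field
```

When one byte order yields a small, plausible value (a count, an offset inside the file) and the other yields an implausibly large one, that is often a useful hint about the format's endianness, though it is a heuristic rather than proof.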

Details

Author: vstorm-co
Repository: vstorm-co/pydantic-deepagents
Created: 6 months ago
Last Updated: yesterday
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

exploratory-data-analysis

Perform comprehensive exploratory data analysis on scientific data files across 200+ file formats. This skill should be used when analyzing any scientific data file to understand its structure, content, quality, and characteristics. Automatically detects file type and generates detailed markdown reports with format-specific analysis, quality metrics, and downstream analysis recommendations. Covers chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and general scientific data formats.

16 Updated 1 week ago
charlieviettq
AI & Automation Solid

exploratory-data-analysis

Perform comprehensive exploratory data analysis on scientific data files across 200+ file formats. This skill should be used when analyzing any scientific data file to understand its structure, content, quality, and characteristics. Automatically detects file type and generates detailed markdown reports with format-specific analysis, quality metrics, and downstream analysis recommendations. Covers chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and general scientific data formats.

2,279 Updated 3 weeks ago
foryourhealth111-pixel
Data & Documents Solid

exploratory-data-analysis

Perform comprehensive exploratory data analysis on scientific data files across 200+ file formats. This skill should be used when analyzing any scientific data file to understand its structure, content, quality, and characteristics. Automatically detects file type and generates detailed markdown reports with format-specific analysis, quality metrics, and downstream analysis recommendations. Covers chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and general scientific data formats.

27,984 Updated today
davila7