
geniml

This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.
aiskillstore/marketplace · ★ 350 · AI & Automation · score 80
Install: claude install-skill aiskillstore/marketplace
# Geniml: Genomic Interval Machine Learning

## Overview

Geniml is a Python package for building machine learning models on genomic interval data from BED files. It provides unsupervised methods for learning embeddings of genomic regions, single cells, and metadata labels, enabling similarity searches, clustering, and downstream ML tasks.

## Installation

Install geniml using uv:

```bash
uv pip install geniml
```

For ML dependencies (PyTorch, etc.):

```bash
uv pip install 'geniml[ml]'
```

Development version from GitHub:

```bash
uv pip install git+https://github.com/databio/geniml.git
```

## Core Capabilities

Geniml provides five primary capabilities, each detailed in a dedicated reference file:

### 1. Region2Vec: Genomic Region Embeddings

Train unsupervised embeddings of genomic regions using word2vec-style learning.

**Use for:** Dimensionality reduction of BED files, region similarity analysis, feature vectors for downstream ML.

**Workflow:**

1. Tokenize BED files using a universe reference
2. Train a Region2Vec model on the tokens
3. Generate embeddings for regions

**Reference:** See `references/region2vec.md` for detailed workflow, parameters, and examples.

### 2. BEDspace: Joint Region and Metadata Embeddings

Train shared embeddings for region sets and metadata labels using StarSpace.

**Use for:** Metadata-aware searches, cross-modal queries (region→label or label→region), joint analysis of genomic content and experimental conditions.

**Workflow:**

1.