# Geniml: Genomic Interval Machine Learning
## Overview
Geniml is a Python package for building machine learning models on genomic interval data from BED files. It provides unsupervised methods for learning embeddings of genomic regions, single cells, and metadata labels, enabling similarity searches, clustering, and downstream ML tasks.
## Installation
Install geniml using uv:
```bash
uv pip install geniml
```
For ML dependencies (PyTorch, etc.):
```bash
uv pip install 'geniml[ml]'
```
Development version from GitHub:
```bash
uv pip install git+https://github.com/databio/geniml.git
```
## Core Capabilities
Geniml provides five primary capabilities, each detailed in dedicated reference files:
### 1. Region2Vec: Genomic Region Embeddings
Train unsupervised embeddings of genomic regions using word2vec-style learning.
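To see what "word2vec-style" means here, treat each tokenized BED file as a "sentence" whose words are universe regions and build skip-gram (target, context) pairs from it. The sketch below is plain Python and does not use geniml's API. The function `context_pairs`, the `chr:start-end` token format, and the decision to shuffle tokens before windowing (a BED file has no natural word order) are illustrative assumptions.

```python
import random
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Build (target, context) skip-gram pairs from one tokenized region set.

    Hypothetical helper for illustration; not part of geniml.
    """
    # A BED file has no intrinsic token order, so shuffle a copy before windowing
    # (assumption mirroring the word2vec analogy, not geniml's exact code).
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)

    pairs = []
    for i, target in enumerate(shuffled):
        # Every other token within `window` positions counts as context.
        lo = max(0, i - window)
        hi = min(len(shuffled), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, shuffled[j]))
    return pairs


region_set = ["chr1:0-100", "chr1:100-200", "chr2:50-150"]
for target, context in context_pairs(region_set, window=1):
    print(target, "->", context)
```

A skip-gram model trained on such pairs pushes regions that co-occur in the same BED files toward nearby vectors.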
**Use for:** Dimensionality reduction of BED files, region similarity analysis, feature vectors for downstream ML.
**Workflow:**
1. Tokenize BED files using a universe reference
2. Train Region2Vec model on tokens
3. Generate embeddings for regions
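Step 1 maps every region in a BED file onto the universe regions it overlaps, so that all files share one vocabulary. The sketch below shows that idea with a naive overlap scan over half-open `[start, end)` intervals. The `tokenize` function and the string token format are hypothetical, and a real tokenizer, including geniml's, would use an interval index rather than an O(n·m) loop.

```python
from typing import List, Tuple

# A genomic interval as (chromosome, start, end), half-open [start, end) like BED.
Region = Tuple[str, int, int]


def tokenize(query: List[Region], universe: List[Region]) -> List[str]:
    """Map each query region to every universe region it overlaps.

    Hypothetical helper for illustration; not part of geniml.
    """
    tokens = []
    for q_chr, q_start, q_end in query:
        for u_chr, u_start, u_end in universe:
            # Half-open intervals overlap when each starts before the other ends.
            if q_chr == u_chr and q_start < u_end and u_start < q_end:
                tokens.append(f"{u_chr}:{u_start}-{u_end}")
    return tokens


universe = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 50, 150)]
bed_file = [("chr1", 90, 120), ("chr2", 10, 40)]
print(tokenize(bed_file, universe))  # ['chr1:0-100', 'chr1:100-200']
```

The region on `chr2` falls outside every universe region, so it produces no token. Regions missing from the universe are simply invisible to the model.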
**Reference:** See `references/region2vec.md` for detailed workflow, parameters, and examples.
### 2. BEDspace: Joint Region and Metadata Embeddings
Train shared embeddings for region sets and metadata labels using StarSpace.
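StarSpace learns label embeddings alongside word embeddings when labels appear on the same line as the input tokens, marked by a label prefix. StarSpace's default prefix is `__label__`. The sketch below shows how one tokenized region set and its metadata could be written as a single training line. The helper `starspace_line`, the token format, and the example labels are assumptions for illustration, not geniml's actual preprocessing code.

```python
from typing import List


def starspace_line(tokens: List[str], labels: List[str], prefix: str = "__label__") -> str:
    """Format one region set plus its metadata labels as a StarSpace input line.

    Hypothetical helper for illustration; not part of geniml.
    """
    # Whitespace separates fields in StarSpace input, so spaces inside a label
    # are replaced with underscores to keep each label a single field.
    label_fields = [prefix + label.replace(" ", "_") for label in labels]
    return " ".join(tokens + label_fields)


print(starspace_line(["chr1:0-100", "chr1:100-200"], ["K562", "H3K27ac"]))
# chr1:0-100 chr1:100-200 __label__K562 __label__H3K27ac
```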
**Use for:** Metadata-aware searches, cross-modal queries (region→label or label→region), joint analysis of genomic content and experimental conditions.
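Because region sets and labels live in the same vector space, a region→label query reduces to ranking label vectors by similarity to a region-set vector. The reverse (label→region) query ranks region sets instead. The sketch below uses cosine similarity over toy 3-dimensional vectors. The vectors, label names, and the `rank_labels` helper are hypothetical, and geniml's own search utilities may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_labels(query_vec: Sequence[float], label_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return label names ordered from most to least similar to the query."""
    return sorted(label_vecs, key=lambda name: cosine(query_vec, label_vecs[name]), reverse=True)


# Toy embeddings, invented for illustration only.
labels = {"liver": [0.9, 0.1, 0.0], "brain": [0.0, 1.0, 0.2], "CTCF": [0.5, 0.5, 0.5]}
region_set = [1.0, 0.2, 0.1]
print(rank_labels(region_set, labels))  # ['liver', 'CTCF', 'brain']
```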
**Workflow:**
1.