alterlab-tiledbvcf

Solid

Store and query genomic variant data at scale with TileDB-VCF — ingest VCF/BCF into compressed TileDB arrays, add samples incrementally, run fast parallel region/sample queries, and export back to VCF. Use when managing population-genomics variant datasets that are too large for flat VCF, building joint variant stores, or querying thousands of samples by region. Part of the AlterLab Academic Skills suite.

AI & Automation · 27 stars · 4 forks · Updated today · MIT


Quality Score: 87/100

Stars (weight 20%): 48
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# TileDB-VCF

## Overview

TileDB-VCF is a high-performance C++ library with Python and CLI interfaces for efficient storage and retrieval of genomic variant-call data. Built on TileDB's sparse array technology, it enables scalable ingestion of VCF/BCF files, incremental sample addition without expensive merging operations, and efficient parallel queries of variant data stored locally or in the cloud.

## When to Use This Skill

This skill should be used when:

- Building a queryable, compressed variant store from many single-sample VCF/BCF files (cohort/population datasets too large for flat VCF)
- Incrementally adding new samples to an existing store without re-merging
- Querying specific genomic regions across many samples (region/sample-partitioned reads)
- Exporting region/sample subsets back to VCF/BCF for downstream tools
- Working with variant data on cloud storage (S3, Azure, GCS) or TileDB Cloud
- Prototyping or teaching scalable genomics-variant workflows

## Quick Start

### Installation

**Preferred method: conda/mamba from the `tiledb` channel.** `tiledbvcf-py` is NOT on PyPI, conda-forge, or bioconda; it ships from the `tiledb` Anaconda channel, with native `osx-arm64` builds (no Rosetta/`CONDA_SUBDIR` workaround needed on Apple Silicon). Supports Python 3.9–3.12.

```bash
# Native Apple Silicon (osx-arm64); also works on osx-64 / linux-64
conda create -n tiledb-vcf -c conda-forge -c tiledb \
  python=3.12 tiledbvcf-py=0.40 pandas pyarrow numpy
conda activate t...
```
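The ingest-then-query workflow described in the overview can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's own code: it assumes the `tiledbvcf` Python package exposes `Dataset`, `create_dataset`, `ingest_samples`, and `read` as in the upstream documentation, and the helper names (`format_region`, `ingest`, `query`) and the attribute list are illustrative choices. The `tiledbvcf` import is deferred into the functions so the module loads even where the package is not installed.

```python
from typing import Optional, Sequence


def format_region(contig: str, start: int, end: int) -> str:
    """Build a 'contig:start-end' region string, e.g. 'chr1:1000-2000'.

    Assumption: coordinates are 1-based and inclusive, as in common VCF tooling.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    return f"{contig}:{start}-{end}"


def ingest(dataset_uri: str, vcf_uris: Sequence[str], create: bool = False) -> None:
    """Add single-sample VCF/BCF files to a dataset (hypothetical helper).

    Pass create=True on the first call to create the dataset; later calls
    add samples incrementally to the existing store.
    """
    import tiledbvcf  # deferred: installed from the conda `tiledb` channel

    ds = tiledbvcf.Dataset(dataset_uri, mode="w")
    if create:
        ds.create_dataset()
    ds.ingest_samples(sample_uris=list(vcf_uris))


def query(dataset_uri: str,
          regions: Sequence[str],
          samples: Optional[Sequence[str]] = None):
    """Read variants overlapping the given regions (hypothetical helper).

    Returns whatever Dataset.read returns (a pandas DataFrame in the
    upstream documentation). samples=None means all samples.
    """
    import tiledbvcf  # deferred import, see ingest()

    ds = tiledbvcf.Dataset(dataset_uri, mode="r")
    return ds.read(
        attrs=["sample_name", "contig", "pos_start", "pos_end", "alleles"],
        regions=list(regions),
        samples=list(samples) if samples is not None else None,
    )
```

With a hypothetical local dataset path, usage might look like `ingest("my_dataset", ["sample1.vcf.gz"], create=True)` followed by `query("my_dataset", [format_region("chr1", 1000, 2000)])`. The file and dataset names here are placeholders, not paths from the skill.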

Details

Author: AlterLab-IEU
Repository: AlterLab-IEU/AlterLab-Academic-Skills
Created: 2 months ago
Last Updated: today
Language: Python
License: MIT
