geo-database

NCBI GEO access via GEOparse and E-utilities. Search by keyword/organism/platform, download GSE series matrices, parse GPL annotations, extract GSM metadata, load expression matrices into pandas. For single-cell use cellxgene-census; for multi-DB access use gget-genomic-databases.
jaechang-hits/SciAgent-Skills · ★ 183 · Data & Documents · score 81
Install: claude install-skill jaechang-hits/SciAgent-Skills
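
The description above mentions searching GEO by keyword and organism through E-utilities. As a minimal sketch of what such a query might look like, the snippet below only builds an ESearch URL against the GEO DataSets database (`db=gds`) and makes no network call. The helper names `build_geo_search_url` and `min_request_interval` are hypothetical and not part of this skill or of GEOparse; the field tags `[Entry Type]` and `[Organism]` and the rate limits (3 req/s unauthenticated, 10 req/s with an API key) are assumed to apply as stated in the skill's prerequisites.

```python
from urllib.parse import urlencode

# Public NCBI ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(keyword, organism=None, retmax=20, api_key=None):
    """Build an ESearch URL for GSE series in GEO DataSets (hypothetical helper)."""
    # Restrict hits to series records; add an organism filter if requested.
    terms = [keyword, "gse[Entry Type]"]
    if organism:
        terms.append(f"{organism}[Organism]")
    params = {
        "db": "gds",
        "term": " AND ".join(terms),
        "retmax": retmax,
        "retmode": "json",
    }
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def min_request_interval(api_key=None):
    """Seconds to wait between requests under the assumed rate limits."""
    return 1 / 10 if api_key else 1 / 3


url = build_geo_search_url("breast cancer", organism="Homo sapiens")
print(url)
```

Fetching the URL (for example with `requests.get(url).json()`) should return a JSON document whose `esearchresult.idlist` holds the matching record IDs; sleeping for `min_request_interval()` between calls is one simple way to stay under the limit.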
# GEO Gene Expression Omnibus Database

## Overview

GEO (Gene Expression Omnibus) is NCBI's public repository for high-throughput functional genomics data, containing 200,000+ datasets (series) from microarrays, RNA-seq, ChIP-seq, methylation, and proteomics experiments. GEOparse provides a Python interface for downloading and parsing GEO records (GSE series, GPL platforms, GSM samples) while NCBI E-utilities enables programmatic search across GEO's metadata.

## When to Use

- Searching for publicly available gene expression datasets by organism, tissue, disease, or experimental condition
- Downloading and parsing a specific GEO series (GSE) with its expression matrix and sample metadata
- Extracting sample annotation tables (e.g., treatment groups, clinical covariates) for meta-analysis
- Loading microarray expression data (GPL platform-annotated probes) into a tidy DataFrame
- Retrieving all GEO experiments associated with a gene or pathway of interest
- Building automated pipelines that download and process GEO datasets for downstream analysis
- For single-cell RNA-seq data at scale, use `cellxgene-census`; for aligned reads, download FASTQ from ENA/SRA instead

## Prerequisites

- **Python packages**: `GEOparse`, `requests`, `pandas`
- **Data requirements**: GSE/GPL/GSM accession numbers, or search terms
- **Environment**: internet connection; write access to local directory for downloads
- **Rate limits**: E-utilities: 3 req/s unauthenticated, 10 req/s with API key; GE