geo-database
Install: claude install-skill jaechang-hits/SciAgent-Skills
# GEO Gene Expression Omnibus Database
## Overview
GEO (Gene Expression Omnibus) is NCBI's public repository for high-throughput functional genomics data, containing 200,000+ datasets (series) from microarray, RNA-seq, ChIP-seq, methylation, and proteomics experiments. GEOparse provides a Python interface for downloading and parsing GEO records (GSE series, GPL platforms, GSM samples), while the NCBI E-utilities enable programmatic search across GEO's metadata.
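As a rough illustration of the search side, the sketch below builds an E-utilities `esearch` request against the GEO DataSets database (`db=gds`) and parses the JSON reply. The helper names (`build_esearch_url`, `parse_esearch`, `search_geo`) are hypothetical and not part of any library. It uses only the standard library so it runs without extra packages; `requests` would work equally well.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20, api_key=None):
    """Build an esearch URL for the GEO DataSets database (db=gds)."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    if api_key:
        # An API key raises the E-utilities limit from 3 to 10 requests/s.
        params["api_key"] = api_key
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_esearch(payload):
    """Return (total hit count, list of UIDs) from an esearch JSON payload."""
    result = payload["esearchresult"]
    return int(result["count"]), list(result["idlist"])


def search_geo(term, retmax=20, api_key=None):
    """Run the search over the network (hypothetical convenience wrapper)."""
    url = build_esearch_url(term, retmax, api_key)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_esearch(json.load(resp))
```

A call such as `search_geo("breast cancer AND Homo sapiens[ORGN] AND gse[ETYP]")` would return the hit count and a list of GEO DataSets UIDs. These UIDs are not accession numbers, so a follow-up `esummary` request is needed to map them to GSE accessions.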
## When to Use
- Searching for publicly available gene expression datasets by organism, tissue, disease, or experimental condition
- Downloading and parsing a specific GEO series (GSE) with its expression matrix and sample metadata
- Extracting sample annotation tables (e.g., treatment groups, clinical covariates) for meta-analysis
- Loading microarray expression data (GPL platform-annotated probes) into a tidy DataFrame
- Retrieving all GEO experiments associated with a gene or pathway of interest
- Building automated pipelines that download and process GEO datasets for downstream analysis
- For single-cell RNA-seq data at scale, use `cellxgene-census` instead; for raw sequencing reads, download FASTQ files from ENA/SRA
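The sample-annotation use case above can be sketched as follows. GEO samples typically store covariates as `key: value` strings under the `characteristics_ch1` metadata field; the sketch splits those strings into columns. The helper names (`parse_characteristics`, `fetch_sample_annotations`) are hypothetical. It assumes that `GEOparse.get_GEO` returns a series object whose `gsms` dictionary maps sample accessions to objects with a `metadata` dictionary of lists.

```python
def parse_characteristics(lines):
    """Turn ['tissue: liver', ...] into {'tissue': 'liver', ...}."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # skip entries that are not 'key: value' pairs
            fields[key.strip()] = value.strip()
    return fields


def fetch_sample_annotations(accession, destdir="."):
    """Download a GSE and return one annotation record per sample."""
    # Imported here so the parsing helper stays usable without GEOparse.
    import GEOparse

    gse = GEOparse.get_GEO(geo=accession, destdir=destdir)
    records = []
    for name, gsm in gse.gsms.items():
        record = {"sample": name}
        record.update(
            parse_characteristics(gsm.metadata.get("characteristics_ch1", []))
        )
        records.append(record)
    return records
```

The records from `fetch_sample_annotations` can be passed to `pandas.DataFrame(records)` to get a sample-by-covariate table. Samples in the same series do not always share the same keys, so some columns may contain missing values.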
## Prerequisites
- **Python packages**: `GEOparse`, `requests`, `pandas`
- **Data requirements**: GSE/GPL/GSM accession numbers, or search terms
- **Environment**: internet connection; write access to local directory for downloads
- **Rate limits**: E-utilities: 3 req/s unauthenticated, 10 req/s with API key; GE