Install: claude install-skill jaechang-hits/SciAgent-Skills
# Biopython: Computational Molecular Biology Toolkit
## Overview
Biopython is the standard open-source Python library for computational molecular biology, providing modular APIs for sequence handling, biological file parsing, NCBI database access, BLAST searches, protein structure analysis, and phylogenetics. It supports Python 3 and requires NumPy.
## When to Use
- Parse and convert biological file formats (FASTA, GenBank, FASTQ, PDB, mmCIF, PHYLIP)
- Fetch sequences or publications from NCBI databases (GenBank, PubMed, Protein) programmatically
- Run and parse BLAST searches (remote NCBI or local BLAST+)
- Perform pairwise or multiple sequence alignments with custom scoring
- Analyze 3D protein structures: distances, angles, DSSP, superimposition
- Build and visualize phylogenetic trees from sequence alignments
- Calculate sequence statistics (GC content, molecular weight, melting temperature)
- Batch-process thousands of sequences with custom filtering logic
- Not a fit for every task: use `pysam` for reading SAM/BAM/CRAM alignment files and working with mapped reads, and `scikit-bio` for advanced ecological diversity metrics
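
As a rough illustration of the parsing, statistics, and batch-filtering use cases listed above, the sketch below reads FASTA records with `SeqIO.parse`, computes GC content from base counts, and writes back the records that pass a length filter. The inlined sequences, the `gc_percent` helper, and the 20 nt threshold are hypothetical and chosen only for the example; GC content is computed by hand here to avoid depending on a particular `Bio.SeqUtils` version.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA payload, inlined so the sketch needs no files.
FASTA_TEXT = """>seq1 example
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>seq2 example
ATGAAATTTTAA
"""


def gc_percent(seq):
    """Return GC content as a percentage, computed from base counts."""
    if len(seq) == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Parse every record from the in-memory handle.
records = list(SeqIO.parse(StringIO(FASTA_TEXT), "fasta"))

# Keep only records of at least 20 nt (hypothetical filter threshold).
MIN_LENGTH = 20
long_records = [rec for rec in records if len(rec.seq) >= MIN_LENGTH]

for rec in long_records:
    print(rec.id, len(rec.seq), round(gc_percent(rec.seq), 1))

# Write the filtered records back out in FASTA format.
out = StringIO()
written = SeqIO.write(long_records, out, "fasta")
```

For real data, a file path can be passed to `SeqIO.parse` and `SeqIO.write` in place of the `StringIO` handles, and iterating over the parser directly (rather than building a list) keeps memory use low for large files.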
## Prerequisites
- **Python packages**: `biopython`, `numpy`, `matplotlib` (for tree visualization)
- **Data requirements**: Sequence files (FASTA, GenBank, FASTQ) or accession IDs for NCBI access
- **Environment**: Python 3.8+; NCBI Entrez access expects a contact email address, set via `Entrez.email` before making requests
```bash
pip install biopython numpy matplotlib
```
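
After installing, a quick sanity check along the following lines can confirm that the package imports and that an Entrez contact email is set. This is a minimal sketch: the `configure_entrez` helper and the `you@example.org` address are placeholders introduced for illustration, not part of Biopython, and no network request is made.

```python
import Bio
from Bio import Entrez


def configure_entrez(email):
    """Set the contact email that Biopython sends with Entrez requests."""
    if "@" not in email:
        raise ValueError("expected an email address")
    Entrez.email = email
    return Entrez.email


# Confirm the package imports and report its version.
print("Biopython version:", Bio.__version__)

# Placeholder address; replace with your own before querying NCBI.
configured = configure_entrez("you@example.org")
```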
## Quick Start
```python
from Bio