bio-fasta-database-curator
Install: claude install-skill fmschulz/omics-skills
# FASTA Database Curator
## Overview
Automate the curation and standardization of biological sequence databases. This skill handles the tedious work of processing FASTA/FAA files, ensuring consistent header formats, removing duplicates, and preparing databases for downstream analysis.
Supplementary tool notes, grounded in specific tool versions: [tools.md](tools.md).
**Key Capabilities:**
- Header format standardization (pipe separators, prefixes)
- Duplicate detection and removal (by sequence or ID)
- Format conversion (GenBank → FASTA, multi-line → single-line)
- Database merging with conflict resolution
- Statistics generation (counts, lengths, taxonomy, GC content)
- Validation (no whitespace in headers, proper formatting)
- Taxonomy label extraction and standardization
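
As a rough illustration of two of the capabilities listed above (multi-line → single-line conversion and duplicate removal by sequence), here is a minimal standard-library sketch. The function names `parse_fasta` and `dedupe_by_sequence` are hypothetical and not part of this skill. Comparing sequences case-insensitively and keeping the first occurrence are assumptions, not documented behavior.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_sequence(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence.

    Assumption: sequences are compared case-insensitively.
    """
    seen = set()
    unique: List[Record] = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Illustrative input: record "b" repeats the sequence of "a" in lower case.
example_lines = [">a", "MKV", "LLT", ">b", "mkvllt", ">c", "MKA"]
unique_records = dedupe_by_sequence(parse_fasta(example_lines))
```

Deduplicating by ID instead would only require changing the key from the sequence to the header's accession field.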
## When to Use This Skill
Use this skill when:
- User needs to standardize sequence headers
- User wants to merge multiple FASTA files
- User needs to remove duplicate sequences
- User is preparing a database for HMM/BLAST/MMseqs2
- User wants database statistics and quality metrics
- User needs to convert between sequence formats
## Header Format Standards
### Recommended Format
Use pipe-separated fields with consistent prefixes:
```
>PREFIX|ACCESSION|DESCRIPTION
SEQUENCE...
```
**Examples:**
```
>VP|Mavirus_MCP|Major capsid protein [Virophage]
>PLV|NC_021333_1|Polinton-like virus hypothetical protein
>NCLDV|YP_009173877.1|DNA polymerase [Marseilleviridae]
```
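
One possible way to split and rebuild headers in the `PREFIX|ACCESSION|DESCRIPTION` layout shown above is sketched below. The names `Header`, `parse_header`, and `format_header` are hypothetical helpers introduced for illustration. The sketch assumes that only the first two pipes act as separators, so any further pipe characters stay in the description.

```python
from typing import NamedTuple


class Header(NamedTuple):
    prefix: str
    accession: str
    description: str


def parse_header(line: str) -> Header:
    """Split a header line into its three pipe-separated fields."""
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    # maxsplit=2: pipes inside the description are left untouched (assumption).
    parts = text.split("|", 2)
    if len(parts) != 3 or not parts[0] or not parts[1]:
        raise ValueError(f"expected PREFIX|ACCESSION|DESCRIPTION, got: {line!r}")
    return Header(*parts)


def format_header(header: Header) -> str:
    """Rebuild the FASTA header line from its fields."""
    return f">{header.prefix}|{header.accession}|{header.description}"


parsed = parse_header(">NCLDV|YP_009173877.1|DNA polymerase [Marseilleviridae]")
rebuilt = format_header(parsed)
```

A header without both separators raises `ValueError`, which makes malformed entries easy to flag during validation.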
### Common Transformations
```python
# Remove white