Install: claude install-skill jaechang-hits/SciAgent-Skills
# Prokka Genome Annotation
## Overview
Prokka is a command-line pipeline for rapid annotation of prokaryotic genomes (bacteria and archaea) and viruses. It uses a tiered search strategy: protein-coding genes (CDS) are predicted with Prodigal and searched first against a genus-specific database, then RefSeq proteins, then Pfam/TIGRFAMs HMMs. Non-coding RNA genes are identified with dedicated tools: rRNA with Barrnap, tRNA and tmRNA with Aragorn, and other ncRNA with Infernal. Prokka takes a single FASTA assembly, typically finishes in minutes, and writes the annotation in GFF3, GenBank, FASTA, and tab-separated formats.
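As a rough illustration of how such a run might be launched from Python, the sketch below assembles a Prokka argument list. The helper `build_prokka_command` and the file names are hypothetical and exist only for this example; the flags it emits (`--outdir`, `--prefix`, `--kingdom`, `--genus`, `--usegenus`, `--metagenome`, `--cpus`) are standard Prokka options, but check `prokka --help` for your installed version.

```python
from typing import List, Optional

# Annotation modes accepted by Prokka's --kingdom option.
KINGDOMS = {"Archaea", "Bacteria", "Mitochondria", "Viruses"}


def build_prokka_command(
    assembly: str,
    outdir: str,
    prefix: str,
    kingdom: str = "Bacteria",
    genus: Optional[str] = None,
    metagenome: bool = False,
    cpus: int = 4,
) -> List[str]:
    """Return a Prokka argument list (hypothetical helper, not part of Prokka)."""
    if kingdom not in KINGDOMS:
        raise ValueError(f"kingdom must be one of {sorted(KINGDOMS)}")
    cmd = [
        "prokka",
        "--outdir", outdir,
        "--prefix", prefix,
        "--kingdom", kingdom,
        "--cpus", str(cpus),
    ]
    if genus:
        # --usegenus asks Prokka to use the genus-specific database for --genus.
        cmd += ["--genus", genus, "--usegenus"]
    if metagenome:
        cmd.append("--metagenome")
    cmd.append(assembly)  # the input FASTA goes last
    return cmd


if __name__ == "__main__":
    # Example file and sample names are placeholders.
    cmd = build_prokka_command("contigs.fasta", "prokka_out", "sample1", genus="Escherichia")
    print(" ".join(cmd))
    # To actually run it (requires Prokka on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.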
## When to Use
- Annotating a newly assembled bacterial or archaeal genome from Illumina, PacBio, or Nanopore assemblies
- Getting functional protein annotations (CDS with product names, gene names, EC numbers, and COG identifiers) from a draft or complete genome
- Preparing annotation files for downstream comparative genomics (Roary pan-genome, OrthoFinder)
- Annotating viral or phage genomes when kingdom-specific databases are important
- Performing metagenome-assembled genome (MAG) annotation with the `--metagenome` flag
- Parsing annotated outputs in Python with BioPython for downstream sequence or feature analysis
- Use **PGAP** (NCBI Prokaryotic Genome Annotation Pipeline) instead when the goal is NCBI GenBank submission with standards compliance
- Use **Bakta** instead for faster annotation with built-in NCBI-compatible outputs and a more regularly updated database
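For the downstream-parsing use case above, here is a minimal sketch that tallies feature types in a Prokka GFF3 file using only the Python standard library. It stands in for a BioPython-based parser and is not Prokka's or BioPython's own API: the function names and the sample records are made up for illustration. It relies on two properties of the format: GFF3 feature lines have nine tab-separated columns, and Prokka appends the contig sequences after a `##FASTA` line.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import unquote


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split the GFF3 attributes column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for pair in column9.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count features per type (column 3), stopping at the embedded FASTA block."""
    counts: Counter = Counter()
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # ignore malformed lines
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records in the shape of Prokka output, for demonstration only.
    sample = (
        "##gff-version 3\n"
        "contig_1\tProdigal:002006\tCDS\t100\t1200\t.\t+\t0\tID=ABC_00001;product=hypothetical protein\n"
        "contig_1\tAragorn:001002\ttRNA\t1500\t1575\t.\t-\t.\tID=ABC_00002;product=tRNA-Ala\n"
        "##FASTA\n"
        ">contig_1\n"
        "ATGC\n"
    )
    print(count_feature_types(sample.splitlines()))
```

For a real run, pass an open file handle instead of the sample string, for example `count_feature_types(open("prokka_out/sample1.gff"))`, where the path is a placeholder.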
## Prerequisites
- **Software**: Prokka ≥ 1.14, Perl 5, Prodi