alterlab-nf-core-sarek

Solid

Runs FASTQ-to-VCF germline and somatic variant calling via the Nextflow nf-core/sarek pipeline pinned to -r 3.8.1. The skill builds the samplesheet.csv (patient, sex, status, sample, lane, fastq_1, fastq_2), runs bwa-mem/bwa-mem2/dragmap alignment plus GATK4 MarkDuplicates and BQSR against the GATK GRCh38 resource bundle (dbSNP, Mills/1000G indels), and selects callers. It explicitly flags that sarek defaults to Strelka when --tools is unset, so pass haplotypecaller for GATK best practice or deepvariant for CNN accuracy. A manual, non-Nextflow GATK4 fallback is included. Use when the user wants a variant-calling pipeline, FASTQ to VCF, germline or somatic SNV/indel calling, nf-core/sarek, GATK best-practices alignment-to-VCF, or BQSR/HaplotypeCaller/Mutect2/DeepVariant. Downstream, annotate hits with alterlab-clinvar/alterlab-gnomad/alterlab-cosmic, parse VCFs with alterlab-pysam, and store them at scale with alterlab-tiledbvcf. Part of the AlterLab Academic Skills suite.
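The samplesheet and caller selection described above can be sketched in Python. This is a minimal illustration, not the skill's own code: the helper names `samplesheet_csv` and `sarek_command`, the example patient and file paths, and the `status` convention (0 for normal) are assumptions, and `--input` and `--outdir` are the usual nf-core parameters rather than flags named in the description. The sketch only builds the CSV text and the argument list; it does not launch Nextflow.

```python
import csv
import io

# Column order taken from the skill description.
COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]


def samplesheet_csv(rows):
    """Render rows (dicts keyed by COLUMNS) as samplesheet CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        writer.writerow(row)
    return buf.getvalue()


def sarek_command(samplesheet, outdir, tools, revision="3.8.1"):
    """Assemble the nextflow argument list with an explicit --tools value."""
    if not tools:
        # Force an explicit caller choice instead of relying on a default.
        raise ValueError("pass at least one caller, e.g. ['haplotypecaller']")
    return [
        "nextflow", "run", "nf-core/sarek",
        "-r", revision,              # pinned pipeline revision
        "--input", samplesheet,      # assumed standard nf-core parameter
        "--outdir", outdir,          # assumed standard nf-core parameter
        "--tools", ",".join(tools),  # comma-separated caller list
    ]


# Hypothetical germline example: one normal sample sequenced on two lanes.
rows = [
    {"patient": "patient1", "sex": "XX", "status": 0, "sample": "normal1",
     "lane": "lane_1", "fastq_1": "reads/n1_L1_R1.fastq.gz",
     "fastq_2": "reads/n1_L1_R2.fastq.gz"},
    {"patient": "patient1", "sex": "XX", "status": 0, "sample": "normal1",
     "lane": "lane_2", "fastq_1": "reads/n1_L2_R1.fastq.gz",
     "fastq_2": "reads/n1_L2_R2.fastq.gz"},
]

sheet = samplesheet_csv(rows)
cmd = sarek_command("samplesheet.csv", "results", ["haplotypecaller"])

print(sheet)
print(" ".join(cmd))
```

Refusing an empty `tools` list is a design choice of this sketch: it makes the caller decision visible in the command instead of leaving it to whatever the pipeline does when `--tools` is unset.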

AI & Automation 27 stars 4 forks Updated today MIT


Quality Score: 87/100

Stars (weight 20%): 48
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# nf-core/sarek: FASTQ-to-VCF Variant Calling

The workflow-runner entry point for raw-reads-to-variants: drive the **Nextflow [nf-core/sarek](https://nf-co.re/sarek/3.8.1/) pipeline (pinned `-r 3.8.1`)** to take germline or somatic short-read FASTQ through alignment, GATK4 duplicate marking and base-quality recalibration, and SNV/indel calling, then hand the resulting VCFs to the suite's database and parsing skills for interpretation.

This skill is the **command-line / workflow** counterpart to the suite's Python-library bioinformatics skills. Use it for the *raw-data-to-VCF* leg; use the library skills (`alterlab-pysam`, `alterlab-tiledbvcf`) once you hold a VCF.

## When to Use This Skill

Trigger this skill when the user wants to:

- Go from **FASTQ to VCF**: call variants on whole-genome (WGS) or whole-exome (WES) short reads.
- Run **germline** SNV/indel calling (one or many normal samples).
- Run **somatic / tumor-normal** calling (matched tumor + normal, or tumor-only).
- Use **nf-core/sarek** specifically, or want a reproducible "GATK best-practices alignment-to-VCF" pipeline without hand-writing every step.
- Resume a run from an intermediate **`--step`** (already have BAM/CRAM, only need recalibration or variant calling).

### Does NOT Trigger: route adjacent requests here

| The request is really about… | Route to |
|---|---|
| Parsing / filtering / reading an **existing** VCF/BAM in Python (pysam/htslib) | `alterlab-pysam` |
| **Storing / querying** lar...

Details

Author
AlterLab-IEU
Repository
AlterLab-IEU/AlterLab-Academic-Skills
Created
2 months ago
Last Updated
today
Language
Python
License
MIT
