Eager
A fully reproducible and state-of-the-art ancient DNA analysis pipeline
[!IMPORTANT]
nf-core/eager versions 2.* are only compatible with Nextflow versions up to 22.10.6!
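The version limit above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: `version_key` and `nf_version_ok` are hypothetical helper names, the lower bound 20.07.1 is taken from the Quick Start section, and the comparison assumes plain numeric `MAJOR.MINOR.PATCH` strings with every field below 1000.

```shell
# Hypothetical helpers (not shipped with nf-core/eager).

# Turn "MAJOR.MINOR.PATCH" into a single integer so versions can be compared.
# Assumes numeric fields below 1000; a missing field counts as 0.
version_key() {
    printf '%s\n' "$1" | awk -F. '{ printf "%d\n", $1 * 1000000 + $2 * 1000 + $3 }'
}

# Succeed only when the given version lies in the range stated in this README
# for nf-core/eager 2.*: >=20.07.1 and <=22.10.6.
nf_version_ok() {
    v=$(version_key "$1")
    [ "$v" -ge "$(version_key 20.07.1)" ] && [ "$v" -le "$(version_key 22.10.6)" ]
}

# Try a few example version strings.
for candidate in 20.04.0 22.10.6 23.04.0; do
    if nf_version_ok "$candidate"; then
        echo "$candidate: within the stated range"
    else
        echo "$candidate: outside the stated range"
    fi
done
```

If the installed Nextflow is newer than the limit, setting the `NXF_VER` environment variable (for example `NXF_VER=22.10.6`) in front of the `nextflow` command is one way to make the launcher use an older release.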
Introduction
nf-core/eager is a scalable and reproducible bioinformatics best-practice processing pipeline for genomic next-generation sequencing (NGS) data, with a focus on ancient DNA (aDNA) data. It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It pre-processes raw data from FASTQ inputs, or accepts preprocessed BAM inputs. It can align reads and perform extensive general NGS and aDNA-specific quality control on the results. It can be run with Docker, Singularity or Conda, making installation trivial and results highly reproducible.
<p align="center"> <img src="docs/images/usage/eager2_workflow.png" alt="nf-core/eager schematic workflow" width="70%"> </p>
Quick Start
- Install nextflow (>=20.07.1 && <=22.10.6)
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
    nextflow run nf-core/eager -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
  Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
    nextflow run nf-core/eager -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
- Once your run has completed successfully, clean up the intermediate files.
    nextflow clean -f -k
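The `--input` pattern in the run command above is wrapped in single quotes. A likely reason, sketched below under stated assumptions, is that the quotes stop the shell from expanding the wildcard, so the pattern reaches the pipeline intact and can be matched there. The file names and the `count_args` helper are invented for this demonstration, and the unquoted case uses `?` rather than `{1,2}` because brace expansion is not available in every shell.

```shell
# Illustration only: the directory and file names are made up for the demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/libA_R1.fastq.gz" "$demo_dir/libA_R2.fastq.gz"

# Hypothetical stand-in for any program: reports how many arguments it received.
count_args() { echo "$#"; }

# Quoted: the program receives exactly one argument, the pattern itself.
quoted_count=$(count_args '*_R{1,2}.fastq.gz')

# Unquoted: the shell matches the wildcard against existing files first,
# so the program receives the matching file names instead of the pattern.
unquoted_count=$(count_args "$demo_dir"/*_R?.fastq.gz)

echo "quoted: $quoted_count argument(s)"      # quoted: 1 argument(s)
echo "unquoted: $unquoted_count argument(s)"  # unquoted: 2 argument(s)

# Remove the temporary demo files.
rm -rf "$demo_dir"
```

With the quotes in place, the whole pattern arrives as a single `--input` value regardless of which files happen to be in the current directory.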
See usage docs for all of the available options when running the pipeline.
N.B. You can see an overview of the run in the MultiQC report located at ./results/MultiQC/multiqc_report.html
Modifications to the default pipeline are easily made using various options as described in the documentation.
Pipeline Summary
Default Steps
By default the pipeline currently performs the following:
- Create reference genome indices for mapping (bwa, samtools, and picard)
- Sequencing quality control (FastQC)
- Sequencing adapter removal, paired-end data merging (AdapterRemoval)
- Read mapping to reference (bwa aln, bwa mem, CircularMapper, or bowtie2)
- Post-mapping processing, statistics and conversion to BAM (samtools)
- Ancient DNA C-to-T damage pattern visualisation (DamageProfiler or mapDamage)
- PCR duplicate removal (DeDup or MarkDuplicates)
- Post-mapping statistics and BAM quality control (Qualimap)
- Library complexity estimation (preseq)
- Overall pipeline statistics summaries (MultiQC)
Additional Steps
Additional functionality contained by the pipeline currently includes:
Input
- Automatic merging of complex sequencing setups (e.g. multiple lanes, sequencing configurations, library types)
Preprocessing
- Illumina two-coloured sequencer poly-G tail removal (fastp)
- Post-AdapterRemoval trimming of FASTQ files prior to mapping (fastp)
- Automatic conversion of unmapped reads to FASTQ (samtools)
- Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)
aDNA Damage manipulation
- Damage removal/clipping for UDG+/UDG-half treatment protocols (BamUtil)
- Damaged reads extraction and assessment (PMDTools)
- Nuclear DNA contamination estimation of human samples (angsd)
Genotyping
- Creation of VCF genotyping files (GATK UnifiedGenotyper, GATK HaplotypeCaller and FreeBayes)
- Creation of EIGENSTRAT genotyping files (pileupCaller)
- Creation of Genotype Likelihood files (angsd)
- Consensus sequence FASTA creation (VCF2Genome)
- SNP Table generation (MultiVCFAnalyzer)
Biological Information
- Mitochondrial to Nuclear read ratio calculation (MtNucRatioCalculator)
- Statistical sex determination of human individuals (Sex.DetERRmine)
Metagenomic Screening
- Low sequence complexity filtering (BBduk)
- Taxonomic binner with alignment (MALT)
- Taxonomic binner without alignment (Kraken2)
- aDNA characteristic screening of taxonomically binned data from MALT (MaltExtract)
Functionality Overview
A graphical overview of suggested routes through the pipeline depending on context can be seen below.
<p align="center"> <img src="docs/images/usage/eager2_metromap_complex.png" alt="nf-core/eager metro map" width="70%"> </p>
Documentation
The nf-core/eager pipeline comes with documentation about the pipeline: usage and output.
- Nextflow installation
- Pipeline configuration
- Running the pipeline
  - This includes tutorials, FAQs, and troubleshooting instructions
- Output and how to interpret the results
Credits
This pipeline was mostly written by Alexander Peltzer (apeltzer) and James A. Fellows Yates, with contributions from Stephen Clayton, Thiseas C. Lamnidis, Maxime Borry, Zandra Fagernäs, Aida Andrades Valtueña, Maxime Garcia and the nf-core community.
We thank the following people for their extensive assistance in the development of this pipeline:
Authors (alphabetical)
- Aida Andrades Valtueña
- Alexander Peltzer
- James A. Fellows Yates
- Judith Neukamm
- Maxime Borry
- Maxime Garcia
- Stephen Clayton
- Thiseas C. Lamnidis
- Zandra Fagernäs
Additional Contributors (alphabetical)
Those who have provided conceptual guidance, suggestions, bug reports etc.
- Alex Hübner
- Alexandre Gilardet
- Arielle Munters
- Åshild Vågene
- Asmaa Ali
- Charles Plessy
- Elina Salmela
- Fabian Lehmann
- He Yu
- Hester van Schalkwyk
- Ido Bar
- Irina Velsko
- Işın Altınkaya
- Johan Nylander
- Jonas Niemann
- Katerine Eaton
- Kathrin Nägele
- Kevin Lord
- Laura Lacher
- Luc Venturini
- Mahesh Binzer-Panchal
- Marcel Keller
- Megan Michel
- Pierre Lindenbaum
- [Pontus Skoglund](https://github.com/po

