EVG
A rapid and accurate ensemble pipeline for graph-based variant genotyping with lower depth of short reads
Introduction
A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline

Requirements
Please note the following requirements before building and running the software:
- Linux operating system
- cmake version 3.12 or higher
- Python version 3.9
- C++ compiler that supports C++17 or higher, and the zlib library installed (we recommend using GCC version 7.3.0 or newer) for building graphvcf and fastAQ
- The following dependencies must also be installed: tabix, bwa, samtools, VG, GraphAligner, Paragraph, BayesTyper, GraphTyper2, PanGenie
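As a quick pre-flight check, the external tools listed above can be looked up on PATH from Python. This is only a sketch: the executable names are copied from the commands in the Test section of this README and may be spelled differently on your system, and `missing_tools` is a hypothetical helper, not part of EVG.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names as they are invoked in the Test section of this README.
# Assumption: your installation uses the same spellings.
REQUIRED_TOOLS = (
    "tabix", "bwa", "samtools", "vg", "GraphAligner",
    "paragraph", "bayesTyper", "graphtyper", "PanGenie",
)


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` argument is injectable only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.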
Recent major updates:
(2025/04/30, v1.2.2)
- Updated Giraffe indexing and alignment commands for vg ≥1.63.0.
- Pinned BayesTyper to 1.5=h176a8bc_0 due to bugs in newer conda versions.
(2024/06/25, v1.2.0)
- If a sample's genotype information is missing in the VCF file, earlier versions crashed with a segmentation fault. As of v1.2.0, the missing genotype is replaced with 0|0.
Installation
Install via Anaconda
The easiest way to install EVG is through Anaconda, but please note that in this case, the Python version must be 3.9. Conda will automatically set the Python version for you, so please ensure that your system can install Python 3.9.
# Create a new environment named evg_env
conda create -n evg_env
# Activate the environment
conda activate evg_env
# Install EVG with all dependencies
conda install -c bioconda -c conda-forge -c kdm801 -c duzezhen evg
Building on Linux
Use the following script to build the software:
- First, obtain the source code.
git clone https://github.com/JiaoLab2021/EVG.git
cd EVG
- Next, compile the software and add the current directory to your system's PATH environment variable. Please make sure that EVG, graphvcf, and fastAQ are all in the same folder, as EVG calls the other two programs from its own directory.
cmake ./
make
chmod +x EVG.py
ln -sf EVG.py EVG
echo 'export PATH="$PATH:'$(pwd)'"' >> ~/.bashrc
source ~/.bashrc
- Assuming that you have installed all the required software dependencies, please make sure they have been added to your environment path or activated in the corresponding code environment. If you haven't installed them yet, you can use the following code to install all the dependencies:
# Create a new environment named evg_env
conda create -n evg_env
# Activate the environment
conda activate evg_env
# Install software using conda
conda install -c bioconda -c conda-forge -c kdm801 tabix bwa samtools vg graphaligner paragraph 'bayestyper==1.5=h176a8bc_0' graphtyper kmc pangenie
# If you see "ModuleNotFoundError: No module named 'pysam.bcftools'", upgrade pysam
conda update pysam
Note
The default version of PanGenie installed by conda is 2.1.0, but EVG requires version 3.0 or higher. If you choose PanGenie as your downstream tool, please remove the current PanGenie from your conda environment and manually install the latest version of PanGenie, then add it to your environment variables.
Test
To verify that the software has been installed correctly, perform a test run using the following steps:
EVG -h
graphvcf -h
fastAQ -h
tabix -h
bwa
samtools
vg -h
GraphAligner -h
paragraph -h
bayesTyper -h
graphtyper -h
PanGenie -h
kmc -h
jellyfish -h
# test
cd test
EVG -r test.fa -v test.vcf.gz -s sample.txt --software VG-MAP VG-Giraffe GraphAligner Paragraph BayesTyper GraphTyper2 PanGenie &>log.txt &
Usage
Input Files
- Reference Genome
- VCF File of Population Variants
- Sample File:
# Sample File
sample1 sample1.r1.fq.gz sample1.r2.fq.gz
sample2 sample2.r1.fq.gz sample2.r2.fq.gz
...
sampleN sampleN.r1.fq.gz sampleN.r2.fq.gz
Please note that the Sample file must be formatted exactly as shown above, where each sample is listed with its corresponding read files.
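Because the layout is strict, it can help to validate the sample file before launching a long run. The sketch below assumes whitespace-separated columns (sample name, read 1, read 2), skips blank lines and lines starting with `#`, and rejects duplicate sample names. Those rules, and the names `Sample` and `parse_sample_lines`, are illustrative assumptions rather than EVG's own parser.

```python
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    name: str
    read1: str
    read2: str


def parse_sample_lines(lines: Iterable[str]) -> List[Sample]:
    """Parse 'name read1 read2' records, raising ValueError on bad lines."""
    samples: List[Sample] = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no sample data.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 columns, got {len(fields)}"
            )
        sample = Sample(*fields)
        if sample.name in seen:
            raise ValueError(f"line {lineno}: duplicate sample {sample.name!r}")
        seen.add(sample.name)
        samples.append(sample)
    return samples


if __name__ == "__main__":
    with open("sample.txt") as handle:
        for sample in parse_sample_lines(handle):
            print(sample.name, sample.read1, sample.read2)
```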
Running
For convenience, let's assume the following file names for the input:
- refgenome.fa
- input.vcf.gz
- sample.txt
EVG automatically selects suitable software based on the genome, the variants, and the sequencing data. To choose the tools yourself, use the --software option. The default command is:
EVG -r refgenome.fa -v input.vcf.gz -s sample.txt
The results are stored in the merge/ folder, and each file is named after the corresponding sample listed in sample.txt: sample1.vcf.gz, sample2.vcf.gz, ..., sampleN.vcf.gz.
$ tree merge/
merge/
├── test1.vcf.gz
└── test2.vcf.gz
0 directories, 2 files
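For scripted runs, the same command line can be assembled as an argument list and handed to `subprocess.run`. This is a minimal sketch: `build_evg_command` is a hypothetical helper, and it assumes that `--depth` takes a number and that `--mode` accepts the literal values `fast` and `precise` named in the Parameter section. Check `EVG -h` before relying on either.

```python
import shlex
from typing import List, Optional, Sequence


def build_evg_command(
    reference: str,
    vcf: str,
    samples: str,
    software: Optional[Sequence[str]] = None,
    depth: Optional[float] = None,
    mode: Optional[str] = None,
) -> List[str]:
    """Build an EVG argument list; optional flags are added only when set."""
    cmd = ["EVG", "-r", reference, "-v", vcf, "-s", samples]
    if software:
        # Tool names follow --software separated by spaces, as in the test run.
        cmd += ["--software", *software]
    if depth is not None:
        cmd += ["--depth", str(depth)]  # assumption: numeric argument
    if mode is not None:
        cmd += ["--mode", mode]  # assumption: "fast" or "precise"
    return cmd


if __name__ == "__main__":
    cmd = build_evg_command(
        "refgenome.fa", "input.vcf.gz", "sample.txt",
        software=["VG-Giraffe", "Paragraph"],
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```

Passing a list rather than a shell string avoids quoting problems when file names contain spaces.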
Parameter
- --depth: This parameter specifies the maximum sequencing data depth allowed for downstream analysis. If this value is exceeded, EVG will randomly downsample reads to the specified level in order to speed up the run. The default downsampling level is set at 15×, but it can be adjusted to meet specific requirements.
- --mode: This parameter determines the operating mode of EVG. In fast mode, only certain software is utilized to genotype SNPs and indels, while precise mode employs all software to genotype all variants.
- --force: If there are pre-existing files in the running directory of EVG, this parameter can be used to forcibly empty the folder. Otherwise, the software will encounter an error and exit.
- --restart: This parameter allows the software to resume from where it left off if it unexpectedly stops, enabling a breakpoint restart. Note that software completion is determined by file existence. It's recommended to manually check for incomplete or empty files before using this parameter and delete them.
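Two of these options lend themselves to a small illustration. The first helper shows the arithmetic behind a depth cap: depth is taken as total read bases divided by genome size, and the fraction of reads to keep is the cap divided by that depth. The second lists zero-byte files, which is one way to do the manual check recommended before --restart. Both are sketches with hypothetical names, not EVG's implementation, and how EVG actually estimates depth is an assumption here.

```python
from pathlib import Path
from typing import List, Union


def downsample_fraction(
    total_read_bases: int, genome_size: int, max_depth: float = 15.0
) -> float:
    """Fraction of reads to keep so that depth does not exceed max_depth."""
    if genome_size <= 0 or max_depth <= 0:
        raise ValueError("genome_size and max_depth must be positive")
    depth = total_read_bases / genome_size
    if depth <= max_depth:
        return 1.0  # already at or below the cap: keep every read
    return max_depth / depth


def empty_files(directory: Union[str, Path]) -> List[Path]:
    """Return all zero-byte regular files under directory, sorted by path."""
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # 30x of data against a 15x cap: keep half of the reads.
    print(downsample_fraction(total_read_bases=30_000, genome_size=1_000))
    for path in empty_files("."):
        print("empty:", path)
```

A size check only catches fully empty files. A truncated or partially written output is not zero bytes and still needs to be inspected by hand.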
graphvcf
If you already have results from different genotyping software and do not need to use EVG, you can directly use graphvcf to merge your results.
graphvcf merge -v merged.vcf.gz --Paragraph xx.vcf.gz --BayesTyper xx.vcf.gz --VG-Giraffe xx.vcf.gz -n sample1 -o sample.vcf.gz
Detailed instructions for using graphvcf can be found on the Wiki page.
Citation
When using the following tools, please cite the corresponding articles:
- EVG: Du, ZZ., He, JB. & Jiao, WB. A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline. Genome Biol 25, 91 (2024).
- vg map: Hickey, G., Heller, D., Monlong, J. et al. Genotyping structural variants in pangenome graphs using the vg toolkit. Genome Biol 21, 35 (2020).
- vg giraffe: Jouni Sirén et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 374, abg8871 (2021).
- GraphAligner: Rautiainen, M., Marschall, T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 21, 253 (2020).
- Paragraph:
  - Chen, S., Krusche, P., Dolzhenko, E. et al. Paragraph: a graph-based structural variant genotyper for short-read sequence data. Genome Biol 20, 291 (2019).
  - Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: Genomics. (2013).
- BayesTyper:
  - Sibbesen, J.A., Maretty, L., The Danish Pan-Genome Consortium. et al. Accurate genotyping across variant classes and lengths using variant graphs. Nat Genet 50, 1054–1059 (2018).
  - Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: Genomics. (2013).
- GraphTyper2: Eggertsson, H.P., Kristmundsdottir, S., Beyter, D. et al. [GraphTyper2 enables population-sca