varigraph

Introduction

An accurate and widely applicable pangenome graph-based variant genotyper for diploid and polyploid genomes

Please note the following requirements before building and running the software:

- Linux operating system
- cmake version 3.12 or higher
- A C++ compiler supporting C++17 or later (GCC 7.3.0 or newer recommended) and the zlib library, both needed to build varigraph
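Before building, you can compare your cmake and GCC versions with these minimums. The sketch below is a hypothetical helper (`version_ge` and `check_build_tools` are not part of varigraph). It assumes GNU coreutils, whose `sort -V` orders version strings, and it does not check for zlib.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of varigraph: succeed if version $1 >= $2.
# Relies on GNU `sort -V`, which compares version strings field by field.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report installed cmake/gcc versions against the stated minimums.
check_build_tools() {
  local cm gc
  if command -v cmake >/dev/null 2>&1; then
    cm=$(cmake --version | head -n 1 | awk '{print $3}')  # "cmake version X.Y.Z"
    if version_ge "$cm" 3.12; then echo "cmake $cm: OK"; else echo "cmake $cm: need >= 3.12"; fi
  else
    echo "cmake: not found"
  fi
  if command -v gcc >/dev/null 2>&1; then
    gc=$(gcc -dumpfullversion -dumpversion)  # full X.Y.Z version on GCC >= 7
    if version_ge "$gc" 7.3.0; then echo "gcc $gc: OK"; else echo "gcc $gc: 7.3.0 or newer recommended"; fi
  else
    echo "gcc: not found"
  fi
}
```

Running `check_build_tools` prints one status line per tool.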

varigraph can be installed with conda:

conda create -n varigraph
conda activate varigraph
conda install -c duzezhen varigraph

To build from source instead, first clone the repository:

git clone https://github.com/JiaoLab2021/varigraph.git
cd varigraph

Next, compile the software and add the current directory to your system's PATH environment variable.

cmake ./
make -j 5
echo 'export PATH="$PATH:'$(pwd)'"' >> ~/.bashrc
source ~/.bashrc

# Sample File
sample1 sample1.r1.fq.gz sample1.r2.fq.gz
sample2 sample2.r1.fq.gz sample2.r2.fq.gz
...
sampleN sampleN.r1.fq.gz sampleN.r2.fq.gz

The sample file must follow exactly the format shown above: one sample per line, giving the sample name followed by the paths to its read files.
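If your reads follow the `<sample>.r1.fq.gz` / `<sample>.r2.fq.gz` naming used in the example, a short script can generate the sample file. This is a sketch under that naming assumption: `make_sample_file` is a hypothetical helper, not part of varigraph, and it separates fields with single spaces to mirror the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of varigraph: print one sample-file line per
# sample in a directory, assuming reads are named <sample>.r1.fq.gz and
# <sample>.r2.fq.gz as in the example sample file.
make_sample_file() {
  local dir="${1:-.}"
  local r1 r2 sample
  for r1 in "$dir"/*.r1.fq.gz; do
    [ -e "$r1" ] || continue              # no matches: glob left unexpanded
    sample=$(basename "$r1" .r1.fq.gz)    # strip directory and suffix
    r2="$dir/$sample.r2.fq.gz"
    if [ ! -e "$r2" ]; then
      echo "warning: no r2 file for $sample, skipped" >&2
      continue
    fi
    printf '%s %s %s\n' "$sample" "$r1" "$r2"
  done
}
```

For example, `make_sample_file /path/to/reads > samples.cfg`; samples whose r2 file is missing are reported on stderr and left out.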

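For convenience, the commands below assume the following input file names: refgenome.fa (reference genome), input.vcf.gz (variants in VCF format) and samples.cfg (the sample file described above).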

varigraph runs in two steps: first it builds the genome graph, then it genotypes the samples against that graph.

1. Building the Genome Graph:

varigraph construct -r refgenome.fa -v input.vcf.gz --save-graph graph.bin

Adjustment for Tetraploid Genome:
- If your VCF file contains variants called from a tetraploid genome, add the --vcf-ploidy 4 parameter.
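If you are unsure of the ploidy of your VCF, you can count the alleles in a genotype (GT) field. The hypothetical helper below (`vcf_ploidy`, not part of varigraph) inspects only the first sample of the first variant record and relies on the VCF convention that GT is the first FORMAT sub-field. It assumes GNU gzip, whose `-cdf` passes uncompressed files through unchanged, and prints nothing for sites-only VCFs.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of varigraph: print the number of alleles in
# the GT of the first sample of the first record (e.g. 0/0/1/1 -> 4).
vcf_ploidy() {
  # gzip -cdf decompresses .vcf.gz and passes plain text through as-is.
  gzip -cdf "$1" | awk -F'\t' '
    /^#/ { next }                     # skip meta-information and header lines
    NF >= 10 {
      split($10, f, ":")              # first sample; GT is the first sub-field
      print split(f[1], a, "[/|]")    # alleles are separated by / or |
      exit
    }'
}
```

If `vcf_ploidy input.vcf.gz` prints 4, add --vcf-ploidy 4 to the construct command.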

2. Performing Genotyping:

varigraph genotype --load-graph graph.bin -s samples.cfg --use-depth

Adjustments for Genotyping:
- Homozygous samples: add -g hom to improve genotyping accuracy.
- Keep --use-depth (as in the command above) for accurate genotyping at any ploidy level.
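The two steps can be wrapped in one function that builds the graph only when the saved graph file does not exist yet, so later genotyping runs reuse it through --load-graph. This is a sketch, not a script shipped with varigraph: `run_varigraph` and the `VARIGRAPH` override (handy for a dry run with `echo`) are hypothetical, and only options shown earlier in this README are used.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of varigraph, for the two-step workflow.
# Usage: run_varigraph REF VCF SAMPLES [GRAPH] [hom]
# Set VARIGRAPH=echo to print the commands instead of running them.
run_varigraph() {
  local ref=$1 vcf=$2 samples=$3 graph=${4:-graph.bin} mode=${5:-}
  local vg=${VARIGRAPH:-varigraph}
  local extra=()

  # Step 1: build and save the graph only if the file is missing or empty,
  # so repeated genotyping runs reuse the same graph.
  if [ ! -s "$graph" ]; then
    "$vg" construct -r "$ref" -v "$vcf" --save-graph "$graph" || return 1
  fi

  # Step 2: genotype against the saved graph; add -g hom for homozygous samples.
  if [ "$mode" = "hom" ]; then
    extra=(-g hom)
  fi
  "$vg" genotype --load-graph "$graph" -s "$samples" --use-depth "${extra[@]}"
}
```

For example, `run_varigraph refgenome.fa input.vcf.gz samples.cfg graph.bin hom` genotypes homozygous samples, and prefixing the call with `VARIGRAPH=echo` only prints the commands.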

The software supports species with ploidy levels from 2 to 8. For accurate genotyping, set the --sample-ploidy parameter according to whether your species is an autopolyploid or an allopolyploid:
- Autopolyploids: set --sample-ploidy to the species' ploidy level, e.g. --sample-ploidy 4 for autotetraploids such as Solanum tuberosum (potato), Medicago sativa and Vaccinium corymbosum.
- Allopolyploids, such as Brassica napus (AACC) or hexaploid wheat (AABBDD): set --sample-ploidy 2.
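The rule above can be encoded in a small helper. `sample_ploidy` is hypothetical and not part of varigraph; it prints the value to pass to --sample-ploidy and rejects autopolyploid levels outside the supported 2 to 8 range.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of varigraph: print the --sample-ploidy value.
#   $1: "auto" (autopolyploid, or diploid) or "allo" (allopolyploid)
#   $2: the species' ploidy level, e.g. 4 for tetraploid potato
sample_ploidy() {
  local kind=$1 level=$2
  case $level in
    ''|*[!0-9]*) echo "ploidy level must be a positive integer" >&2; return 1 ;;
  esac
  case $kind in
    auto)
      # Autopolyploids use their full ploidy level, within the supported range.
      if [ "$level" -lt 2 ] || [ "$level" -gt 8 ]; then
        echo "ploidy $level is outside the supported range 2-8" >&2
        return 1
      fi
      echo "$level" ;;
    allo)
      # Allopolyploids such as AACC or AABBDD genomes are genotyped with 2.
      echo 2 ;;
    *)
      echo "unknown type '$kind' (expected auto or allo)" >&2
      return 1 ;;
  esac
}
```

For hexaploid wheat, `--sample-ploidy "$(sample_ploidy allo 6)"` expands to `--sample-ploidy 2`; for tetraploid potato, `sample_ploidy auto 4` prints 4.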

GPU Version: varigraph also has a GPU-enabled version that speeds up computation on servers equipped with GPUs.
Usage:
- Use --gpu to select the GPU; for example, --gpu 0 runs on GPU 0.
- Use --buffer to adjust GPU memory usage; smaller values consume less GPU memory.
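On a shared server you may want to pass --gpu the index of the GPU with the most free memory. The sketch below assumes NVIDIA GPUs and reads the CSV output of `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits` (lines such as `0, 39000`); `freest_gpu` is a hypothetical helper, not part of varigraph.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of varigraph.
# Reads "<index>, <free MiB>" lines on stdin and prints the index of the GPU
# with the most free memory (prints nothing if the input is empty).
freest_gpu() {
  sort -t, -k2,2nr | head -n 1 | cut -d, -f1
}
```

For example, `gpu=$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | freest_gpu)`, then add `--gpu "$gpu"` to the varigraph command of the GPU build.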

License

varigraph is released under the MIT License.