NgsTools
Programs to analyse NGS data for population genetics purposes
ngsTools
NGS (Next-Generation Sequencing) technologies have revolutionised population genetic research by enabling unparalleled data collection from the genomes or subsets of genomes from many individuals. Current technologies produce short fragments of sequenced DNA called reads that are either de novo assembled or mapped to a pre-existing reference genome. This leads to chromosomal positions being sequenced a variable number of times across the genome. This parameter is usually referred to as the sequencing depth. Individual genotypes are then inferred from the proportion of nucleotide bases covering each site after the reads have been aligned.
Low sequencing depth and high error rates stemming from base calling and mapping errors can cause SNP (Single Nucleotide Polymorphism) and genotype calling from NGS data to be associated with considerable statistical uncertainty. Probabilistic models, which take these errors into account, have been proposed to accurately assign genotypes and estimate allele frequencies (e.g. Nielsen et al. 2012; for a review Nielsen et al. 2011).
ngsTools is a collection of programs for population genetics analyses from NGS data that takes the statistical uncertainty of the data into account. The methods implemented in these programs do not rely on SNP or genotype calling, and are particularly suitable for low sequencing depth data. An application note illustrating its use has been published (Fumagalli et al. 2014).
NOTE: this repository is intended for general use as it groups together the latest stable version of each tool. Developers (and only them) may want to check each tool's main repository.
Packages
- ANGSD - Software for analyzing next generation sequencing data taking genotype uncertainty into account by working with genotype probabilities (instead of called genotypes). This is especially useful for low and medium depth data (Korneliussen et al., 2014). NOTE: this program is NOT developed by ngsTools so, if you have any questions about it or encounter any errors/bugs, please check its wiki or contact its authors.
- ngsSim - Simple sequencing read simulator that can generate data for multiple populations with variable levels of depth, error rates, genetic variability, and individual inbreeding (Kim et al., 2011). It generates mapped reads and the corresponding genotype likelihoods, avoiding mapping uncertainty and being extremely fast.
- ngsF - This program provides a method to estimate individual inbreeding coefficients using an EM algorithm (Vieira et al., 2013). These can provide insight into a population's mating system or demographic history and, more importantly, they can be used as a prior in ANGSD.
- ngsPopGen - Several tools to perform population genetic analyses from NGS data (Fumagalli et al., 2013, Fumagalli, 2013).
  - ngsFst - Quantifying population genetic differentiation
  - ngsCovar - Population structure via PCA (principal components analysis)
  - ngs2dSFS - Estimate 2D-SFS from posterior probabilities of sample allele frequencies
  - ngsStat - Estimate number of segregating sites, expected average heterozygosity and other nucleotide diversity indexes
- ngsUtils - General tools to manipulate data produced by ngsTools.
  - GetMergedGeno - Merge genotype posterior probabilities files
  - GetSubGeno - Select a subset of genotype posterior probabilities files
  - GetSubSim - Select a subset of simulated data files
  - GetSwitchedGeno - Switch major/minor in genotype posterior probabilities files
- ngsDist is a program that estimates genetic distances from genotype posterior probabilities (Vieira et al., 2016).
- ngsF-HMM is a program developed and written by F.G. Vieira to estimate per-individual inbreeding tracts using a two-state Hidden Markov Model (Vieira et al. 2016). It uses a probabilistic framework that takes the uncertainty of genotype assignment into account, making it especially suited for low-quality or low-coverage datasets. It is not officially part of ngsTools so it must be installed separately.
- ngsLD is a program that calculates pairwise Linkage Disequilibrium (LD) under a probabilistic framework (Fox et al., 2019).
- HMMploidy is a program written by S. Soraggi to infer ploidy levels from low-coverage sequencing data. It is not officially part of ngsTools so it must be installed separately.
Installation
ngsTools is easy to install, but some packages have external dependencies:
- Mandatory:
  - gcc: >= 4.9.2 tested on Debian 7.8 (wheezy)
  - zlib: v1.2.7 tested on Debian 7.8 (wheezy)
  - gsl: v1.15 tested on Debian 7.8 (wheezy)
  - libbz2.so, required by htslib
  - liblzma.so, required by htslib
  - libcurl.so, required by angsd
- Optional (only needed for testing or auxiliary scripts):
  - md5sum
  - samtools
  - Perl packages: Getopt::Long, Graph::Easy, Math::BigFloat, and IO::Zlib
  - R packages: optparse, tools, ggplot2, reshape2, plyr, gtools, LDheatmap, ape, grid, methods, phangorn, and plot3D
If you have issues with the gsl package on Linux, make sure the following packages are installed: gsl-bin libgsl-dbg libgsl-dev libgslcblas0.
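A quick pre-flight check can save a failed build. The sketch below is not part of ngsTools: the function name check_cmds is hypothetical, and it only tests whether commands are on the PATH. It does not verify versions, nor the presence of the zlib, gsl, bz2, lzma or curl libraries.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ngsTools): report which of the
# given commands can and cannot be found on the PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Tools needed to fetch and compile the package.
check_cmds git gcc make || echo "Install the missing tools before running make."
```

The same function can be pointed at the optional tools, for example check_cmds md5sum samtools perl Rscript, before running the auxiliary scripts.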
To download ngsTools and its submodules use:
% git clone --recursive https://github.com/mfumagalli/ngsTools.git
If you prefer, although it is not recommended, you can download a zipped folder on the right side of this page ("Download ZIP").
To install these tools just run:
% cd ngsTools
% make
To run the tests (currently deprecated):
% make test
Executables are built into each tool directory in the repository. If you wish to clean all binaries and intermediate files:
% make clean
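Because the executables stay inside their own tool directories, calling them by name requires adding those directories to the PATH. A minimal sketch, assuming the directory names used in the container definition later in this README; the function add_ngstools_to_path and the NGSTOOLS variable are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Hypothetical helper: prepend each existing tool directory under the
# given repository root ($1) to PATH. Directory names are taken from the
# container definition in this README; directories that do not exist
# (e.g. tools that were not built) are skipped.
add_ngstools_to_path() {
    root=$1
    for tool in angsd ngsDist ngsF ngsF-HMM ngsLD ngsPopGen ngsSim ngsUtils; do
        if [ -d "$root/$tool" ]; then
            PATH="$root/$tool:$PATH"
        fi
    done
    export PATH
}

# NGSTOOLS is an assumed variable pointing at your clone of the repository.
add_ngstools_to_path "${NGSTOOLS:-$HOME/ngsTools}"
```

Sourcing these lines from a shell start-up file makes the tools available in every session. Run make before relying on it, since make clean removes the binaries again.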
To get the latest version of the ngsTools package:
% git pull
% git submodule update
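After an update it can be useful to confirm that every submodule matches the commit recorded in the main repository. The sketch below filters the output of git submodule status, where git marks each line with a leading space (in sync), '-' (not initialised), '+' (a different commit is checked out) or 'U' (merge conflicts). The function name report_stale_submodules is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `git submodule status` output on stdin and
# print only the lines for submodules that are NOT in sync with the
# main repository (prefix '-', '+' or 'U'). Prints nothing if all is well.
report_stale_submodules() {
    grep -E '^[-+U]' || true
}

# Usage, from inside the ngsTools directory:
#   git submodule status | report_stale_submodules
```

If anything is reported, git submodule update --init --recursive (the form used in the container definition in this README) checks out the recorded commits again.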
NOTE: for developers only: if you wish to make changes and update the whole package:
# in the modified repo
# be sure to be on the master branch: git checkout master
% git commit -a -m 'Local changes...'
% git push
# in ngsTools main repo
% git commit -a -m 'Submodules updated'
% git push
# check that everything went well:
% git status
To build the tool set into an Apptainer or SingularityCE container you can take the following definition file as a starting point:
<details> <summary> Apptainer/SingularityCE container definition file </summary>

Bootstrap: docker
From: ubuntu:20.04
%post
export DEBIAN_FRONTEND=noninteractive
apt-get update -y
apt install -y git build-essential pkg-config
apt install -y libz-dev libbz2-dev liblzma-dev libcurl4-openssl-dev libssl-dev libgsl-dev
git clone --recursive https://github.com/mfumagalli/ngsTools.git
cd ngsTools
# this will build a specific commit (in this case commit 6505f80) #
# alternatively you can build the latest commit by commenting out the next two lines
git checkout 6505f80
git submodule update --init --recursive
make
%environment
export LC_ALL=C
%runscript
export PATH=/ngsTools/angsd:$PATH
export PATH=/ngsTools/ngsDist:$PATH
export PATH=/ngsTools/ngsF:$PATH
export PATH=/ngsTools/ngsF-HMM:$PATH
export PATH=/ngsTools/ngsLD:$PATH
export PATH=/ngsTools/ngsPopGen:$PATH
export PATH=/ngsTools/ngsSim:$PATH
export PATH=/ngsTools/ngsUtils:$PATH
$@
%labels
Author Radovan Bast
%help
You can build this container with:
$ sudo singularity build container.sif container.def
This is how I use this container image:
$ ./container.sif ngsDist [other arguments]
$ ./container.sif [some other NGS tool]
</details>
Input Files
All programs receive as input files produced by ANGSD. In general, these files can contain genotype likelihoods, genotype posterior probabilities, sample allele frequency posterior probabilities or an estimate of the SFS (Site Frequency Spectrum). Please refer to each tool's repository or the comprehensive Tutorial for more explanations and examples on how these tools work.
Tutorial
A tutorial on some analyses using ngsTools/ANGSD from BAM files can be found here. In this Tutorial, you will find how to filter your data, assess population structure and estimate summary statistics using these tools for low-depth data. For most cases, you will find all the information you need here.
Additional information
Authors
Matteo Fumagalli & Filipe G. Vieira. Other programmers and developers: Tyler Linderoth, Rasmus Nielsen. Some lines of code have been and adap
