Selscan
Haplotype based scans for selection
selscan -- a program to calculate EHH-based scans for positive selection in genomes
Copyright (C) 2014 Zachary A Szpiech
selscan currently implements EHH, iHS, XP-EHH, nSL, XP-nSL and iHH12.
It should be run separately for each chromosome and population (or population
pair for XP-EHH and XP-nSL). selscan is 'dumb' with respect to ancestral/derived coding and
simply expects haplotype data to be coded 0/1. Unstandardized iHS/nSL scores are
thus reported as log(iHH1/iHH0) based on the coding you have provided.
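Because the score depends only on which allele you label '1', swapping the 0/1 coding at a site flips the sign of its unstandardized score. The following is a minimal sketch of that relationship, not selscan's own code: the function name `unstandardized_ihs` and the input values are hypothetical, and the natural log is an assumption (the text above does not state the base; the sign flip holds for any base).

```python
import math


def unstandardized_ihs(ihh1: float, ihh0: float) -> float:
    """Return log(iHH1/iHH0) for one core site.

    Assumption: natural log. The base is not stated in the README text.
    """
    if ihh1 <= 0 or ihh0 <= 0:
        raise ValueError("iHH values must be positive")
    return math.log(ihh1 / ihh0)


# Hypothetical integrated haplotype homozygosity values for one site.
score = unstandardized_ihs(ihh1=0.12, ihh0=0.03)

# Recoding the site (swapping which allele is '1') swaps the two inputs,
# so the score changes sign but keeps its magnitude.
flipped = unstandardized_ihs(ihh1=0.03, ihh0=0.12)
```

If you later polarize alleles as ancestral/derived, a site whose coding was reversed only needs its score negated.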
📚 Citations
A Rahman, TQ Smith, and ZA Szpiech (2025) Fast and Memory-Efficient Dynamic Programming Approach for Large-Scale EHH-Based Selection Scans. Molecular Biology and Evolution 42(11): msaf275. doi: https://doi.org/10.1093/molbev/msaf275
ZA Szpiech (2024) selscan 2.0: scanning for sweeps in unphased data. Bioinformatics, 40(1), btae006.
doi: https://doi.org/10.1093/bioinformatics/btae006
ZA Szpiech and RD Hernandez (2014) selscan: an efficient multi-threaded program
to calculate EHH-based scans for positive selection. Molecular Biology and Evolution
31: 2824-2827.
ZA Szpiech et al. (2021) Application of a novel haplotype-based scan for local adaptation
to study high-altitude adaptation in rhesus macaques. Evolution Letters
doi: https://doi.org/10.1002/evl3.232
R Torres et al. (2018) Human demographic history has amplified the effects of
background selection across the genome. PLoS Genetics 15: e1007898.
N Garud et al. (2015) Recent selective sweeps in North American Drosophila
melanogaster show signatures of soft sweeps. PLoS Genetics 11: 1–32.
A Ferrer-Admetlla et al. (2014) On detecting incomplete soft or hard selective sweeps
using haplotype structure. Molecular Biology and Evolution 31: 1275-1291.
K Wagh et al. (2012) Lactase Persistence and Lipid Pathway Selection in the Maasai. PLoS ONE 7: e44751.
PC Sabeti et al. (2007) Genome-wide detection and characterization of positive
selection in human populations. Nature 449: 913–918.
BF Voight et al. (2006) A map of recent positive selection in the human
genome. PLoS Biology 4: e72.
PC Sabeti et al. (2002) Detecting recent positive selection in the human
genome from haplotype structure. Nature 419: 832–837.
🛠️ Installation from source
git clone https://github.com/szpiech/selscan/
cd selscan && git checkout main
cd src && make
If you prefer OS-specific makefiles, replace make with one of the following:
- make -f Makefile_macos → for macOS
- make -f Makefile_linux → for Linux
- make -f Makefile_win → for Windows
📦 Precompiled Binaries
Precompiled binaries are available for the following platforms:
- Linux: /bin/linux/
- Windows: /bin/win/
- macOS Universal: /bin/macos/
Additionally, we provide binaries for:
- macOS Apple Silicon, ARM64 only: /bin/macos-arm64/
- macOS Intel, x86_64 and older macs: /bin/osx/
📖 Usage
For details, refer to the manual.
**Data must have no missing genotypes.**
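Since input with missing genotypes is not supported, it can help to screen a VCF before running a scan. The sketch below is an illustrative pre-check and not part of selscan: the function name `sites_with_missing_genotypes` and the example records are hypothetical. It assumes an uncompressed, tab-delimited VCF and reports records in which any sample's GT field contains a missing allele (`.`).

```python
def sites_with_missing_genotypes(vcf_lines):
    """Yield (chrom, pos) for each VCF record with a missing allele in any GT."""
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, fmt = fields[0], fields[1], fields[8]
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            # Treat phased ('|') and unphased ('/') separators alike.
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                yield chrom, int(pos)
                break  # one report per record is enough


# Hypothetical two-sample VCF; the second record has a missing allele.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0|.\t1|0",
]
missing = list(sites_with_missing_genotypes(example))
```

For a real file, pass an open file handle instead of the list; flagged sites would need to be imputed or removed before running selscan.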
selscan v2.1.0 -- a program to calculate EHH-based scans for positive selection in genomes.
Source code and binaries can be found at <https://www.github.com/szpiech/selscan>.
selscan currently implements EHH, iHS, XP-EHH, nSL, and XP-nSL.
To calculate EHH:
./selscan --ehh <locusID> --vcf <vcf> --map <mapfile> --out <outfile>
To calculate iHS:
./selscan --ihs --vcf <vcf> --map <mapfile> --out <outfile>
To calculate nSL:
./selscan --nsl --vcf <vcf> --out <outfile>
To calculate XP-nSL:
./selscan --xpnsl --vcf <vcf> --vcf-ref <vcf> --out <outfile>
To calculate iHH12:
./selscan --ihh12 --vcf <vcf> --map <mapfile> --out <outfile>
To calculate XP-EHH:
./selscan --xpehh --vcf <vcf> --vcf-ref <vcf> --map <mapfile> --out <outfile>
----------Command Line Arguments----------
--alt <bool>: Set this flag to calculate homozygosity based on the sum of the
squared haplotype frequencies in the observed data instead of using
binomial coefficients.
Default: false
--cutoff <double>: The EHH decay cutoff.
Default: 0.05
--ehh <string>: Calculate EHH of the '1' and '0' haplotypes at the specified
locus. Output: <physical dist> <genetic dist> <'1' EHH> <'0' EHH>
Default: __NO_LOCUS__
--ehh-win <int>: When calculating EHH, this is the length of the window in bp
in each direction from the query locus.
Default: 100000
--gap-scale <int>: Gap scale parameter in bp. If a gap is encountered between
two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is
scaled by GAP_SCALE/GAP.
Default: 20000
--hap <string>: A hapfile with one row per haplotype, and one column per
variant. Variants should be coded 0/1
Default: __hapfile1
--help <bool>: Prints this help dialog.
Default: false
--ihh12 <bool>: Set this flag to calculate iHH12.
Default: false
--ihs <bool>: Set this flag to calculate iHS.
Default: false
--ihs-detail <bool>: Set this flag to write out left and right iHH scores for '1' and '0' in addition to iHS.
Default: false
--keep-low-freq <bool>: Include low frequency variants in the construction of your haplotypes.
Default: false
--maf <double>: If a site has a MAF below this value, the program will not use
it as a core snp.
Default: 0.05
--map <string>: A mapfile with one row per variant site.
Formatted <chr#> <locusID> <genetic pos> <physical pos>.
Default: __mapfile
--max-extend <int>: The maximum distance an EHH decay curve is allowed to extend from the core.
Set <= 0 for no restriction.
Default: 1000000
--max-extend-nsl <int>: The maximum distance an nSL haplotype is allowed to extend from the core.
Set <= 0 for no restriction.
Default: 100
--max-gap <int>: Maximum allowed gap in bp between two snps.
Default: 200000
--multi-param <string>: Specify a JSON file with multiple parameter sets.
Each set should match the structure of command-line arguments.
The program will run the analysis for each set, generating separate outputs.
Useful for batch processing and exploring different configurations.
Default: __jsonFile
--nsl <bool>: Set this flag to calculate nSL.
Default: false
--out <string>: The basename for all output files.
Default: outfile
--pi <bool>: Set this flag to calculate mean pairwise sequence difference in a sliding window.
Default: false
--pi-win <int>: Sliding window size in bp for calculating pi.
Default: 100
--pmap <bool>: Use physical map instead of a genetic map.
Default: false
--ref <string>: A hapfile with one row per haplotype, and one column per
variant. Variants should be coded 0/1. This is the 'reference'
population for XP-EHH calculations. Ignored otherwise.
Default: __hapfile2
--skip-low-freq <bool>: This flag is now on by default. If you want to include low frequency variants
in the construction of your haplotypes, please use the --keep-low-freq flag.
Default: false
--thap <string>: A hapfile in IMPUTE hap format with one column per haplotype, and one row per
variant. Variants should be coded 0/1
Default: __thapfile1
--thap-ref <string>: A hapfile in IMPUTE hap format with one column per haplotype, and one row per
variant. Variants should be coded 0/1. This is the 'reference'
population for XP calculations. Ignored otherwise.
Default: __thapfile2
--threads <int>: The number of threads to spawn during the calculation.
Partitions loci across threads.
Default: Maximum concurrency supported by the system (hardware threads).
--tped <string>: A TPED file containing haplotype and map data.
Variants should be coded 0/1
Default: __hapfile1
--tped-ref <string>: A TPED file containing haplotype and map data.
Variants should be coded 0/1. This is the 'reference'
population for XP-EHH calculations and should contain the same number
of loci as the query population. Ignored otherwise.
Default: __hapfile2
--trunc-ok <bool>: If an EHH decay reaches the end of a sequence before reaching the cutoff,
integrate the curve anyway (iHS and XP-EHH only).
Normal function is to disregard the score for that core.
Default: false
--unphased <bool>: Set this flag to use multilocus genotypes.
Default: false
--vcf <string>: A VCF file containing haplotype data.
A map file must be specified with --map.
Default: __hapfile1
--vcf-ref <string>: A VCF file containing haplotype and map data.
Variants should be coded 0/1. This is the 'reference'
population for XP-EHH calculations and should contain the same number
of loci as the query population. Ignored otherwise.
Default: __hapfile2
--wagh <bool>: Set this flag to calculate XP-EHH using definition of EHH which
separates core SNP alleles in the denominator.
Default: false
--xpehh <bool>: Set this flag to calculate XP-EHH.
Default: false
--xpnsl <bool>: Set this flag to calculate XP-nSL.
Default: false
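Two of the options above describe small calculations that are easier to see in code: the gap scaling controlled by --gap-scale and --max-gap, and the two homozygosity definitions selected by --alt. The following is a minimal sketch based only on the help text, not selscan's implementation. The function names are hypothetical, returning `None` for a gap larger than the maximum is an assumption (the help text does not say what happens then), and so is the handling of a gap exactly equal to either threshold.

```python
from collections import Counter
from math import comb


def scaled_genetic_distance(gdist, gap_bp, gap_scale=20000, max_gap=200000):
    """Scale a genetic distance by GAP_SCALE/GAP for large physical gaps.

    Assumption: a gap above max_gap returns None so the caller can decide
    how to handle it; the help text only calls it the maximum allowed gap.
    """
    if gap_bp > max_gap:
        return None
    if gap_bp > gap_scale:
        return gdist * gap_scale / gap_bp
    return gdist


def haplotype_homozygosity(haplotypes, alt=False):
    """Homozygosity of a haplotype sample under either definition.

    alt=False: ratio of binomial coefficients, sum C(n_h, 2) / C(n, 2).
    alt=True:  sum of squared observed haplotype frequencies.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    if alt:
        return sum((c / n) ** 2 for c in counts.values())
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# A 50 kb gap with the default 20 kb gap scale shrinks the distance by 20/50.
d = scaled_genetic_distance(gdist=0.001, gap_bp=50000)

# Hypothetical sample: three copies of one haplotype and one of another.
sample = ["0101", "0101", "0101", "0110"]
h_binomial = haplotype_homozygosity(sample)       # 3 / 6
h_alt = haplotype_homozygosity(sample, alt=True)  # (9 + 1) / 16
```

The two homozygosity definitions differ most in small samples: the binomial form counts only pairs of distinct chromosomes, while the squared-frequency form also counts each chromosome paired with itself.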
📝 Change Log
10MAR2026 - selscan v2.1.2 - Bug fixes for the scalable version v2.1+:
- Removed NaNs from output (fixed issue #152).
- Fixed a bug affecting input where number of haplotypes is a multiple of 64 causing crashes (fixed issue #154).
- Updated binaries for compatibility with older macOS versions.
- Fixed filename for EHH12 output and log.
- Skips VCF entries with multiple records at the same genomic position.
- MAF filtering now correctly affects iHH12 (was previously ignored).
26SEP2025 - selscan v2.1.1 - Bug fixes for the fast and memory-efficient version introduced in v2.1:
- Refined cutoff handling for edge cases, improving correlation with v2.0 outputs.
- Corrected reporting of sites with low minor allele counts (avoids zero-area and infinite nSL/iHS).
- Fixed nSL distance cutof
