<!-- badges: start -->

License: GPL v3

<!--badges: end -->

selscan -- a program to calculate EHH-based scans for positive selection in genomes

Copyright (C) 2014 Zachary A Szpiech

selscan currently implements EHH, iHS, XP-EHH, nSL, XP-nSL and iHH12.

It should be run separately for each chromosome and population (or population
pair for XP-EHH and XP-nSL). selscan is 'dumb' with respect to ancestral/derived
coding and simply expects haplotype data to be coded 0/1. Unstandardized iHS/nSL
scores are thus reported as log(iHH1/iHH0) based on the coding you have provided.
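
Because the score is a log ratio, swapping the 0/1 coding only flips its sign. The following is a minimal sketch of that relationship, not selscan's own code. The iHH values are hypothetical, and the natural logarithm is an assumption made for illustration (the base is not stated here; the sign flip holds for any base).

```python
import math

def unstandardized_score(ihh1: float, ihh0: float) -> float:
    """Log ratio of integrated haplotype homozygosity for the '1' and '0' alleles.

    Natural log is an assumption for illustration only.
    """
    return math.log(ihh1 / ihh0)

# Hypothetical iHH values for one core SNP.
ihh_for_1, ihh_for_0 = 0.08, 0.02

score = unstandardized_score(ihh_for_1, ihh_for_0)
# Recoding the data (swapping 0 and 1) swaps the two iHH values.
recoded = unstandardized_score(ihh_for_0, ihh_for_1)

print(score, recoded)  # equal magnitude, opposite sign
```

If you know which allele is ancestral, polarizing the input so that 1 means derived before running selscan gives the scores the conventional orientation.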

📚 Citations

A Rahman, TQ Smith, ZA Szpiech (2025) Fast and Memory-Efficient Dynamic Programming Approach for Large-Scale EHH-Based Selection Scans. Molecular Biology and Evolution 42(11): msaf275.
	doi: https://doi.org/10.1093/molbev/msaf275
ZA Szpiech (2024) selscan 2.0: scanning for sweeps in unphased data. Bioinformatics, 40(1), btae006.
	doi: https://doi.org/10.1093/bioinformatics/btae006
ZA Szpiech and RD Hernandez (2014) selscan: an efficient multi-threaded program 
	to calculate EHH-based scans for positive selection. Molecular Biology and Evolution 
	31: 2824-2827.
ZA Szpiech et al. (2021) Application of a novel haplotype-based scan for local adaptation 
	to study high-altitude adaptation in rhesus macaques. Evolution Letters 
	doi: https://doi.org/10.1002/evl3.232
R Torres et al. (2018) Human demographic history has amplified the effects of
	background selection across the genome. PLoS Genetics 15: e1007898.
N Garud et al. (2015) Recent selective sweeps in North American Drosophila
	melanogaster show signatures of soft sweeps. PLoS Genetics 11: 1–32.
A Ferrer-Admetlla et al. (2014) On detecting incomplete soft or hard selective sweeps
	using haplotype structure. Molecular Biology and Evolution 31: 1275-1291.
K Wagh et al. (2012) Lactase Persistence and Lipid Pathway Selection in the Maasai. PLoS ONE 7: e44751.
PC Sabeti et al. (2007) Genome-wide detection and characterization of positive 
	selection in human populations. Nature 449: 913–918.
BF Voight et al. (2006) A map of recent positive selection in the human 
	genome. PLoS Biology 4: e72.
PC Sabeti et al. (2002) Detecting recent positive selection in the human 
	genome from haplotype structure. Nature 419: 832–837.

🛠️ Installation from source

git clone https://github.com/szpiech/selscan/
cd selscan && git checkout main
cd src && make

If you prefer OS-specific makefiles, replace make with one of the following:

  • make -f Makefile_macos    → for macOS
  • make -f Makefile_linux    → for Linux
  • make -f Makefile_win       → for Windows

📦 Precompiled Binaries

Precompiled binaries are available for the following platforms:

  • Linux: /bin/linux/
  • Windows: /bin/win/
  • macOS Universal: /bin/macos/

Additionally, we provide binaries for:

  • macOS Apple Silicon, ARM64 only: /bin/macos-arm64/
  • macOS Intel, x86_64 and older Macs: /bin/osx/
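
As a rough guide to choosing among these directories, the sketch below maps an operating system name and CPU architecture to one of the paths listed above, written relative to the repository root. The helper `binary_dir` is hypothetical and not part of selscan. Preferring the architecture-specific macOS builds over the universal one is an assumption made for illustration.

```python
def binary_dir(system: str, machine: str) -> str:
    """Map an OS name and CPU architecture to one of the directories listed above."""
    system = system.lower()
    machine = machine.lower()
    if system == "linux":
        return "bin/linux/"
    if system == "windows":
        return "bin/win/"
    if system == "darwin":
        # Assumption: use an architecture-specific build when one matches,
        # otherwise fall back to the universal binary.
        if machine == "arm64":
            return "bin/macos-arm64/"
        if machine == "x86_64":
            return "bin/osx/"
        return "bin/macos/"
    raise ValueError(f"no precompiled binary listed for {system!r}")

print(binary_dir("Darwin", "arm64"))  # bin/macos-arm64/
```

On a real machine the two arguments can be taken from `platform.system()` and `platform.machine()` in the Python standard library.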

📖 Usage

For details, refer to the manual.

**Data must have no missing genotypes.**
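
One way to check this requirement before running selscan is to scan the VCF for missing alleles. The sketch below is a hypothetical helper, not part of selscan. It assumes an uncompressed, tab-delimited VCF in which GT is the first colon-separated subfield of each sample column.

```python
def sites_with_missing_genotypes(vcf_lines):
    """Return (chrom, pos) for each VCF record with at least one missing allele.

    Assumes standard VCF layout: nine fixed columns, then one column per sample,
    with GT as the first colon-separated subfield.
    """
    missing = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        for sample in fields[9:]:
            genotype = sample.split(":")[0]
            alleles = genotype.replace("|", "/").split("/")
            if "." in alleles:
                missing.append((fields[0], int(fields[1])))
                break  # one missing call is enough to flag the site
    return missing

# Hypothetical two-sample example; the second record has a missing allele.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0|.\t1|0\n",
]
print(sites_with_missing_genotypes(example))  # [('1', 200)]
```

For a file on disk, pass an open file handle instead of the list (for example `open(path)`, or `gzip.open(path, "rt")` for a gzipped VCF). Flagged sites need to be removed or imputed before the scan.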

selscan v2.1.0 -- a program to calculate EHH-based scans for positive selection in genomes.
Source code and binaries can be found at <https://www.github.com/szpiech/selscan>.

selscan currently implements EHH, iHS, XP-EHH, nSL, and XP-nSL.

To calculate EHH:
./selscan --ehh <locusID> --vcf <vcf> --map <mapfile> --out <outfile>

To calculate iHS:
./selscan --ihs --vcf <vcf> --map <mapfile> --out <outfile>

To calculate nSL:
./selscan --nsl --vcf <vcf> --out <outfile>

To calculate XP-nSL:
./selscan --xpnsl --vcf <vcf> --vcf-ref <vcf> --out <outfile>

To calculate iHH12:
./selscan --ihh12 --vcf <vcf> --map <mapfile> --out <outfile>

To calculate XP-EHH:
./selscan --xpehh --vcf <vcf> --vcf-ref <vcf> --map <mapfile> --out <outfile>
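
Since selscan is run once per chromosome and population, the invocations above are often generated in a loop. The sketch below builds iHS command lines using only flags documented in this help text. The file naming scheme and the population label are hypothetical.

```python
import shlex

def ihs_command(chrom: str, pop: str, threads: int = 4) -> list[str]:
    """Build one iHS invocation for a single chromosome and population.

    The file naming scheme (<pop>.chr<chrom>.vcf, chr<chrom>.map) is hypothetical.
    """
    return [
        "./selscan", "--ihs",
        "--vcf", f"{pop}.chr{chrom}.vcf",
        "--map", f"chr{chrom}.map",
        "--out", f"{pop}.chr{chrom}",
        "--threads", str(threads),
    ]

commands = [ihs_command(c, "popA") for c in ("1", "2", "22")]
for cmd in commands:
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Keeping each command as a list of arguments avoids shell quoting problems if a file name contains spaces.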

----------Command Line Arguments----------

--alt <bool>: Set this flag to calculate homozygosity based on the sum of the
	squared haplotype frequencies in the observed data instead of using
	binomial coefficients.
	Default: false

--cutoff <double>: The EHH decay cutoff.
	Default: 0.05

--ehh <string>: Calculate EHH of the '1' and '0' haplotypes at the specified
	locus. Output: <physical dist> <genetic dist> <'1' EHH> <'0' EHH>
	Default: __NO_LOCUS__

--ehh-win <int>: When calculating EHH, this is the length of the window in bp
	in each direction from the query locus.
	Default: 100000

--gap-scale <int>: Gap scale parameter in bp. If a gap is encountered between
	two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is
	scaled by GAP_SCALE/GAP.
	Default: 20000

--hap <string>: A hapfile with one row per haplotype, and one column per
	variant. Variants should be coded 0/1.
	Default: __hapfile1

--help <bool>: Prints this help dialog.
	Default: false

--ihh12 <bool>: Set this flag to calculate iHH12.
	Default: false

--ihs <bool>: Set this flag to calculate iHS.
	Default: false

--ihs-detail <bool>: Set this flag to write out left and right iHH scores for '1' and '0' in addition to iHS.
	Default: false

--keep-low-freq <bool>: Include low frequency variants in the construction of your haplotypes.
	Default: false

--maf <double>: If a site has a MAF below this value, the program will not use
	it as a core snp.
	Default: 0.05

--map <string>: A mapfile with one row per variant site.
	Formatted <chr#> <locusID> <genetic pos> <physical pos>.
	Default: __mapfile

--max-extend <int>: The maximum distance an EHH decay curve is allowed to extend from the core.
	Set <= 0 for no restriction.
	Default: 1000000

--max-extend-nsl <int>: The maximum distance an nSL haplotype is allowed to extend from the core.
	Set <= 0 for no restriction.
	Default: 100

--max-gap <int>: Maximum allowed gap in bp between two snps.
	Default: 200000

--multi-param <string>: Specify a JSON file with multiple parameter sets.
	Each set should match the structure of command-line arguments.
	The program will run the analysis for each set, generating separate outputs.
	Useful for batch processing and exploring different configurations.
	Default: __jsonFile

--nsl <bool>: Set this flag to calculate nSL.
	Default: false

--out <string>: The basename for all output files.
	Default: outfile

--pi <bool>: Set this flag to calculate mean pairwise sequence difference in a sliding window.
	Default: false

--pi-win <int>: Sliding window size in bp for calculating pi.
	Default: 100

--pmap <bool>: Use physical map instead of a genetic map.
	Default: false

--ref <string>: A hapfile with one row per haplotype, and one column per
	variant. Variants should be coded 0/1. This is the 'reference'
	population for XP-EHH calculations.  Ignored otherwise.
	Default: __hapfile2

--skip-low-freq <bool>: This flag is now on by default. To include low frequency
	variants in the construction of your haplotypes, use the --keep-low-freq flag.
	Default: false

--thap <string>: A hapfile in IMPUTE hap format with one column per haplotype, and one row per
        variant. Variants should be coded 0/1
        Default: __thapfile1

--thap-ref <string>: A hapfile in IMPUTE hap format with one column per haplotype, and one row per
        variant. Variants should be coded 0/1. This is the 'reference'
        population for XP calculations.  Ignored otherwise.
        Default: __thapfile2

--threads <int>: The number of threads to spawn during the calculation.
	Partitions loci across threads.
	Default: Maximum concurrency supported by the system (hardware threads).

--tped <string>: A TPED file containing haplotype and map data.
	Variants should be coded 0/1
	Default: __hapfile1

--tped-ref <string>: A TPED file containing haplotype and map data.
	Variants should be coded 0/1. This is the 'reference'
	population for XP-EHH calculations and should contain the same number
	of loci as the query population. Ignored otherwise.
	Default: __hapfile2

--trunc-ok <bool>: If an EHH decay reaches the end of a sequence before reaching the cutoff,
	integrate the curve anyway (iHS and XPEHH only).
	Normal function is to disregard the score for that core.
	Default: false

--unphased <bool>: Set this flag to use multilocus genotypes.
	Default: false

--vcf <string>: A VCF file containing haplotype data.
	A map file must be specified with --map.
	Default: __hapfile1

--vcf-ref <string>: A VCF file containing haplotype and map data.
	Variants should be coded 0/1. This is the 'reference'
	population for XP-EHH calculations and should contain the same number
	of loci as the query population. Ignored otherwise.
	Default: __hapfile2

--wagh <bool>: Set this flag to calculate XP-EHH using definition of EHH which
	separates core SNP alleles in the denominator.
	Default: false

--xpehh <bool>: Set this flag to calculate XP-EHH.
	Default: false

--xpnsl <bool>: Set this flag to calculate XP-nSL.
	Default: false
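
The --gap-scale and --max-gap options described above can be read as a simple rule on the genetic distance between two adjacent SNPs. The sketch below illustrates that reading using the documented defaults. It is not selscan's implementation. How selscan treats a gap beyond --max-gap is not described here, so the `None` return value is only a placeholder, and the behavior at exactly `max_gap` is likewise an assumption.

```python
def scaled_genetic_distance(genetic_dist, gap_bp, gap_scale=20000, max_gap=200000):
    """Apply the --gap-scale rule to the genetic distance between two adjacent SNPs.

    Returns None when the gap exceeds max_gap (placeholder behavior, an assumption).
    """
    if gap_bp > max_gap:
        return None
    if gap_bp > gap_scale:
        # Down-weight the distance by GAP_SCALE / GAP.
        return genetic_dist * gap_scale / gap_bp
    return genetic_dist

print(scaled_genetic_distance(0.5, 10_000))   # below gap_scale: unchanged, 0.5
print(scaled_genetic_distance(0.5, 40_000))   # scaled by 20000/40000: 0.25
print(scaled_genetic_distance(0.5, 300_000))  # beyond max_gap: None
```

The effect is that sparsely genotyped regions contribute less to the integrated EHH than their raw genetic distance would suggest.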

📝 Change Log

10MAR2026 - selscan v2.1.2 - Bug fixes for the scalable version v2.1+: 
	- Removed NaNs from output (fixed issue #152).
	- Fixed a bug affecting input where number of haplotypes is a multiple of 64 causing crashes (fixed issue #154).
	- Updated binaries for compatibility with older macOS versions.
	- Fixed filename for EHH12 output and log.
	- Skips VCF entries with multiple records at the same genomic position.
	- MAF filtering now correctly affects iHH12 (was previously ignored).

26SEP2025 - selscan v2.1.1 - Bug fixes for the fast and memory-efficient version introduced in v2.1:
	- Refined cutoff handling for edge cases, improving correlation with v2.0 outputs.
	- Corrected reporting of sites with low minor allele counts (avoids zero-area and infinite nSL/iHS).
	- Fixed nSL distance cutof
