SnpCountCU

Count common and unique SNPs among several populations from a VCF format file.

Brief description

This script counts the SNPs that are common among populations and the SNPs that are unique to each population in a VCF format file.

The populations, and the individuals belonging to each of them, are specified with the -l/--list parameter. The numbers of population-shared and population-specific SNPs are calculated from the intersection and difference of the SNP sets of the populations. By default, the SNPs of a population are all sites in the VCF file where individuals of that population have a non-"0/0" genotype. You can set the -f/--freq-threshold parameter to filter the SNPs of each population by ALT allele frequency. By default, the script counts the SNPs shared among all populations in the list file and the SNPs unique to each of them. Use the --common-pop parameter to define additional population combinations whose shared SNPs should also be counted.
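The set logic described above can be illustrated with a minimal pure-Python sketch. It is not the script's actual code: the function names (alt_frequency, population_snps, count_common_unique), the encoding of genotypes as tuples of allele indices, and the toy data are assumptions made for illustration. "Unique" is interpreted here as present in one population and absent from every other listed population.

```python
def alt_frequency(genotypes):
    """ALT allele frequency among called alleles (None marks a missing allele)."""
    alleles = [a for gt in genotypes for a in gt if a is not None]
    if not alleles:
        return 0.0
    return sum(1 for a in alleles if a > 0) / len(alleles)


def population_snps(sites, threshold=0.00001):
    """Keep the sites whose ALT allele frequency is greater than the threshold."""
    return {key for key, gts in sites.items() if alt_frequency(gts) > threshold}


def count_common_unique(pop_sets):
    """Return (number of SNPs shared by all populations, {pop: unique count})."""
    common = set.intersection(*pop_sets.values())
    unique = {}
    for pop, snps in pop_sets.items():
        # Union of the SNP sets of every other population.
        others = set().union(*(s for p, s in pop_sets.items() if p != pop))
        unique[pop] = len(snps - others)
    return len(common), unique


# Hypothetical toy data: {population: {(chrom, pos): [genotype per individual]}}
sites_by_pop = {
    "pop1": {
        ("chr1", 100): [(0, 1), (0, 0)],
        ("chr1", 200): [(1, 1), (0, 1)],
        ("chr1", 300): [(0, 0), (0, 0)],
    },
    "pop2": {
        ("chr1", 100): [(0, 0), (1, 1)],
        ("chr1", 200): [(0, 0), (0, 0)],
        ("chr1", 300): [(0, 1), (0, 0)],
    },
}

pop_sets = {pop: population_snps(sites) for pop, sites in sites_by_pop.items()}
print(count_common_unique(pop_sets))
```

Raising the threshold shrinks each population's SNP set before the intersection and difference are taken, so both the common and the unique counts can change.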

Additionally, you can use multiple threads to process the populations in parallel and speed up the program. Because the work is parallelized per population, the number of threads should be lower than the total number of populations; more threads will not improve efficiency.
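The following sketch shows why extra threads bring no benefit when there is one task per population. It uses Python's standard concurrent.futures module, which may differ from the mechanism the script actually uses; collect_snps is a hypothetical stand-in for the real per-population work, and run_per_population is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_snps(pop_id, sample_ids):
    """Hypothetical placeholder for the per-population work.

    A real worker would read the VCF file for these samples; this one
    only returns the number of samples so the sketch stays runnable.
    """
    return pop_id, len(sample_ids)


def run_per_population(populations, num_threads):
    """Submit one task per population and collect the results in a dict."""
    # There are only len(populations) tasks, so workers beyond that stay idle.
    workers = max(1, min(num_threads, len(populations)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(collect_snps, pop_id, samples)
            for pop_id, samples in populations.items()
        ]
        return dict(future.result() for future in futures)


populations = {"pop1": ["s1", "s2"], "pop2": ["s3"]}
print(run_per_population(populations, num_threads=8))
```

With two populations, asking for eight threads still runs at most two tasks at a time, which is the situation the paragraph above warns about.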

Note: This script requires Python 3. Before running it, make sure that your Python version meets this requirement and that the modules the script needs are installed, especially cyvcf2.
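One way to check these prerequisites before running the script is sketched below. It relies only on the standard library; missing_modules is a hypothetical helper name, and cyvcf2 is the only required module named in this README, so the list of names to check is an assumption you may need to extend.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Only cyvcf2 is named in this README; add other modules if the script needs them.
required = ["cyvcf2"]

if sys.version_info.major < 3:
    print("Python 3 is required")

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
```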

Please don't hesitate to open an Issue if you find a problem or have a suggestion for a new feature.

Usage

Just type python3 SnpCountCU.py -h or ./SnpCountCU.py -h to show the help of the program:

usage: SnpCountCU.py [-h] -v VCF -l LIST -r REGION -o OUT [-f FREQ_THRESHOLD]
                     [--common-pop COMMON_POP] [-nt NUM_THREADS]

This script is used to count the number of SNPs that are common among
population and unique within population from VCF formart file.

optional arguments:
  -h, --help            show this help message and exit
  -v VCF, --vcf VCF     Name of the input vcf file, must be gziped and indexed
                        by 'bcftools index'
  -l LIST, --list LIST  Name of the input population list, two columns: pop_id
                        sample_id
  -r REGION, --region REGION
                        Name the input list file, one chromosome name per line
  -o OUT, --out OUT     Name of the output file
  -f FREQ_THRESHOLD, --freq-threshold FREQ_THRESHOLD
                        A frequence threshold value, SNPs with ALT allele
                        frequence greater than this value are regarded as
                        existing in a population.(default: 0.00001)
  --common-pop COMMON_POP
                        Add additional population combinations to count common
                        snp, e.g. 'GroupA:pop1,pop2,pop3;GroupB:pop4,pop5'
  -nt NUM_THREADS, --num-threads NUM_THREADS
                        Number of threads. This value should be lower than the
                        number of populations in the list, otherwise it will
                        not provide additional efficiency
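To make the two text formats in the help message concrete, here is a small sketch of how the population list (two columns: pop_id sample_id) and a --common-pop string could be parsed. The function names parse_pop_list and parse_common_pop and the sample IDs are hypothetical, and whitespace-separated columns are an assumption; the script's own parsing may differ.

```python
from collections import defaultdict


def parse_pop_list(lines):
    """Group sample IDs by population from 'pop_id sample_id' lines."""
    pops = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pop_id, sample_id = fields[0], fields[1]
        pops[pop_id].append(sample_id)
    return dict(pops)


def parse_common_pop(spec):
    """Parse 'GroupA:pop1,pop2;GroupB:pop4,pop5' into {group: [populations]}."""
    groups = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, members = item.partition(":")
        groups[name.strip()] = [m.strip() for m in members.split(",") if m.strip()]
    return groups


# Hypothetical list content; in practice, pass an open file object instead.
pop_lines = ["pop1 sampleA", "pop1 sampleB", "pop2 sampleC"]
print(parse_pop_list(pop_lines))
print(parse_common_pop("GroupA:pop1,pop2,pop3;GroupB:pop4,pop5"))
```

Because the group specification contains semicolons, it needs to be quoted on the command line so the shell passes it to the script as a single argument.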

Examples

In the following examples you can omit python3 if you make SnpCountCU.py executable.

Example 1: Use default parameters and 4 threads:

python3 SnpCountCU.py -v file.vcf.gz -l pop.list  -r chrom.list -o snpcout_common_uniq.out -nt 4

Example 2: Add two new population combinations (G1 and G2) and use a frequency threshold of 0.8:

python3 SnpCountCU.py -v file.vcf.gz -l pop.list -r chrom.list -f 0.8 --common-pop 'G1:pop1,pop2;G2:pop3,pop4' -o snpcout_common_uniq.out -nt 4
