Radseq
Collection of Python scripts for parsing/analyses of RAD-seq data
RAD-seq script library
Collection of Python scripts for parsing/analysis of reduced representation sequencing data (e.g. RAD-seq, nextRAD). While many of the scripts are functional, some still need considerable cleaning up and more thorough testing - and this repository therefore very much represents a work in progress.
These scripts all require Python 3, and some require additional packages: BioPython and NumPy (both of which can be easily installed using the Miniconda or Anaconda installers) or PyVCF (which can be installed using e.g. pip install PyVCF). Usage information for each script can be obtained using the -h or --help flag (e.g. python3 name_of_script.py -h), and is also listed in this README.
This documentation is dynamically generated using the listed README_compile.py script, extracting purpose, usage and links to example files from the argparse information of each script.
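As a rough illustration of where such information lives, the sketch below builds a hypothetical argparse parser modelled on the vcf_missing_data.py usage listed further down and reads its purpose and usage line back out. It is not taken from README_compile.py, whose actual implementation may differ.

```python
import argparse

# Hypothetical parser mirroring the vcf_missing_data.py usage in this README;
# the real script may define its parser differently.
parser = argparse.ArgumentParser(
    prog="vcf_missing_data.py",
    description="Outputs list of missing data for each sample in VCF.",
)
parser.add_argument("vcf_file", help="input file with SNP data (`.vcf`)")

purpose = parser.description    # free-text purpose of the script
usage = parser.format_usage()   # the "usage: ..." line printed by -h

print(purpose)
print(usage.strip())
```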
Recently added
vcf_remap2genome.py - script to remap VCF from de novo RAD assembly back to a reference genome
pyrad_find_caps_markers.py - search PyRAD output file for diagnostic CAPS loci that can distinguish two groups (or one group and all other samples)
vcf_clone_detect.py - script to facilitate identification of clones in dataset
vcf
vcf_remap.py - Remaps variants in VCF format to new CHROM and POS as obtained through the
mapping_get_bwa_matches.py script. Positions are rough estimates because:
(1) the new position is simply the mapping position offset by the 0-based
position in the locus (and e.g. does not take into account reference insertions),
(2) one standard contig length is used to determine the position in reverse-mapped
reads (flag 16). [File did not pass PEP8 check]
usage: vcf_remap.py [-h] vcf_file mapping_file locus_length
positional arguments:
  vcf_file      vcf input file
  mapping_file  file with mapping results
  locus_length  length of query loci
optional arguments:
  -h, --help    show this help message and exit
Example input file(s): vcf_file.vcf.
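The position estimate described for vcf_remap.py can be sketched as below. The function name, its parameters and the exact reverse-strand arithmetic are assumptions for illustration; the script itself may compute the offset differently.

```python
def remap_position(map_pos, vcf_pos, locus_length, is_reverse=False):
    """Estimate a 1-based reference position for a SNP in a mapped locus.

    map_pos      -- 1-based leftmost mapping position of the locus
    vcf_pos      -- 1-based position of the SNP within the locus (VCF POS)
    locus_length -- the single standard locus length used for all loci
    is_reverse   -- True when the locus mapped to the reverse strand (flag 16)
    """
    offset = vcf_pos - 1  # 0-based position within the locus
    if is_reverse:
        # Assumed handling: count the offset from the other end of the locus.
        offset = locus_length - 1 - offset
    # Reference insertions/deletions are ignored, hence "rough estimate".
    return map_pos + offset


forward = remap_position(1000, 5, 100)
reverse = remap_position(1000, 5, 100, is_reverse=True)
print(forward, reverse)
```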
vcf_missing_data.py - Outputs list of missing data (# and % of SNPs) for each sample in VCF, to identify poor-performing samples to eliminate prior to SNP filtering. Takes vcf_filename as argument. Outputs to STDOUT (no output file). [File did not pass PEP8 check]
usage: vcf_missing_data.py [-h] vcf_file
positional arguments:
  vcf_file    input file with SNP data (`.vcf`)
optional arguments:
  -h, --help  show this help message and exit
Example input file(s): vcf_file.vcf.
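A minimal sketch of this kind of per-sample missing-data tally is shown below. It parses VCF text directly rather than using PyVCF, and the function name and example records are made up for illustration.

```python
def missing_per_sample(vcf_lines):
    """Return {sample: (n_missing, pct_missing)} from an iterable of VCF lines."""
    samples, missing, total = [], [], 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            missing = [0] * len(samples)
            continue
        total += 1
        for i, call in enumerate(cols[9:]):
            genotype = call.split(":")[0]  # GT is the first FORMAT field
            if "." in genotype:
                missing[i] += 1
    return {
        s: (m, 100.0 * m / total if total else 0.0)
        for s, m in zip(samples, missing)
    }


# Made-up example records (two samples, two SNPs)
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "loc1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.",
    "loc2\t7\t.\tC\tT\t.\tPASS\t.\tGT\t./.\t./.",
]
report = missing_per_sample(example)
for sample, (count, pct) in report.items():
    print(sample, count, pct)
```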
vcf_rename_loci.py - Renames CHROMS in .vcf file according to list with old/new names, and only
outputs those loci that are listed. [File did not pass PEP8 check]
usage: vcf_rename_loci.py [-h] vcf_file locusnames_file
positional arguments:
  vcf_file         input file with SNP data (`.vcf`)
  locusnames_file  text file (tsv or csv) with old and new name for each locus
                   (/CHROM)
optional arguments:
  -h, --help       show this help message and exit
Example input file(s): vcf_file.vcf, locusnames_file.txt.
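The renaming step could look roughly like the sketch below. The function name is hypothetical, and passing header lines through unchanged is an assumption about how the script treats them.

```python
def rename_loci(vcf_lines, name_map):
    """Yield VCF lines with CHROM renamed; records not in name_map are dropped."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # assumption: header lines are kept as-is
            continue
        chrom, rest = line.split("\t", 1)
        if chrom in name_map:
            yield name_map[chrom] + "\t" + rest


# Made-up example: only loc1 is listed, so loc2 is not output
records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "loc1\t10\t.\tA\tG",
    "loc2\t7\t.\tC\tT",
]
renamed = list(rename_loci(records, {"loc1": "chr1_locus1"}))
print("\n".join(renamed))
```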
vcf_find_clones.py - Script compares the allelic similarity of individuals in a VCF, and outputs
all pairwise comparisons. This can be used to detect potential clones based on
percentage match. Note: highest matches can be assessed in the output file by
using $ sort -rn --key=5 output_file.txt | head -n 50 in the terminal. [File did not pass PEP8 check]
usage: vcf_find_clones.py [-h] vcf_file
positional arguments:
  vcf_file    input file with SNP data (`.vcf`)
optional arguments:
  -h, --help  show this help message and exit
Example input file(s): vcf_file.vcf.
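One way to compute a pairwise allelic similarity is sketched below. The definition used here (shared alleles divided by compared alleles, over SNPs genotyped in both individuals) is an assumption, as are all function names; the script may define the percentage match differently.

```python
from itertools import combinations


def parse_gt(call):
    """Return the sorted alleles of a genotype call, or None if missing."""
    alleles = call.split(":")[0].replace("|", "/").split("/")
    return None if "." in alleles else sorted(alleles)


def shared_alleles(a, b):
    """Count alleles of genotype a that can be paired with an allele of b."""
    remaining = list(b)
    shared = 0
    for allele in a:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    return shared


def pairwise_similarity(genotypes):
    """genotypes: {sample: [call, ...]} -> {(s1, s2): percent match}."""
    result = {}
    for s1, s2 in combinations(sorted(genotypes), 2):
        shared = compared = 0
        for c1, c2 in zip(genotypes[s1], genotypes[s2]):
            g1, g2 = parse_gt(c1), parse_gt(c2)
            if g1 is None or g2 is None:
                continue  # only SNPs genotyped in both individuals count
            shared += shared_alleles(g1, g2)
            compared += len(g1)
        result[(s1, s2)] = 100.0 * shared / compared if compared else 0.0
    return result


# Made-up genotype calls for three individuals at three SNPs
calls = {
    "a": ["0/0", "0/1", "./."],
    "b": ["0/0", "1/1", "0/1"],
    "c": ["0/0", "0/1", "1/1"],
}
similarities = pairwise_similarity(calls)
for pair, pct in similarities.items():
    print(pair, round(pct, 2))
```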
vcf_get_chrom_pos_from_number.py - Translates sequential marker numbers back to CHROM/POS from the original .vcf
file. Several programs only allow integers to identify markers; this
script restores the original CHROM/POS for the markers that were identified. [File did not pass PEP8 check]
usage: vcf_get_chrom_pos_from_number.py [-h] vcf_file markernumbers_file
positional arguments:
  vcf_file            input file with SNP data (`.vcf`)
  markernumbers_file  text file with SNP numbers that were identified
optional arguments:
  -h, --help          show this help message and exit
Example input file(s): vcf_file.vcf, markernumbers_file.txt.
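The lookup can be sketched as follows, assuming markers are numbered from 1 in the order the records appear in the .vcf file. Both that numbering convention and the function name are assumptions.

```python
def chrom_pos_by_number(vcf_lines, first_number=1):
    """Map sequential marker numbers to (CHROM, POS) in record order."""
    lookup = {}
    number = first_number
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t")[:2]
        lookup[number] = (chrom, int(pos))
        number += 1
    return lookup


# Made-up records; markers 1 and 3 were flagged by some downstream program
records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "loc1\t10\t.\tA\tG",
    "loc1\t25\t.\tC\tT",
    "loc2\t7\t.\tG\tA",
]
lookup = chrom_pos_by_number(records)
identified = [lookup[n] for n in (1, 3)]
print(identified)
```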
vcf_spider.py - Wrapper for PGDspider on Mac OS to convert .vcf files to various formats.
Note : set PGDSPIDER_PATH constant before use, and make script executable in
terminal with $ chmod +x vcf_spider.py.
usage: vcf_spider.py [-h] vcf_filename pop_filename output_filename
positional arguments:
  vcf_filename     original vcf file
  pop_filename     pop filename (.txt)
  output_filename  output filename (extension used to determine file format:
                   .genepop, .bayescan, .structure or .arlequin)
optional arguments:
  -h, --help       show this help message and exit
vcf_clone_detect.py - Attempts to identify groups of clones in a dataset. The script (1) conducts
pairwise comparisons (allelic similarity) for all individuals in a .vcf
file, (2) produces a histogram of genetic similarities, (3) lists the highest
matches to assess for a potential clonal threshold, (4) clusters the groups of
clones based on a particular threshold (supplied or roughly inferred), and (5)
lists the clonal individuals that can be removed from the dataset (so that one
individual with the least amount of missing data remains). If optional popfile
is given, then clonal groups are sorted by population. Note: Firstly, the
script is run with a .vcf file and an optional popfile to produce an output
file (e.g. python3 vcf_clone_detect.py --vcf vcf_file.vcf --pop pop_file.txt --output compare_file.csv). Secondly, it can be rerun using the
precalculated similarities under different thresholds (e.g. python3 vcf_clone_detect.py --input compare_file.csv --threshold 94.5). [File did not pass PEP8 check]
usage: vcf_clone_detect.py [-h] [-v vcf_file] [-p pop_file] [-i compare_file]
                           [-o compare_file] [-t threshold]
optional arguments:
  -h, --help            show this help message and exit
  -v vcf_file, --vcf vcf_file
                        input file with SNP data (`.vcf`)
  -p pop_file, --pop pop_file
                        text file (tsv or csv) with individuals and
                        populations (to accompany `.vcf` file)
  -i compare_file, --input compare_file
                        input file (csv) with previously calculated pairwise
                        comparisons (using the `--output` option)
  -o compare_file, --output compare_file
                        output file (csv) for all pairwise comparisons (can
                        later be used as input with `--input`)
  -t threshold, --threshold threshold
                        manual similarity threshold (e.g. `94.5` means at
                        least 94.5 percent allelic similarity for individuals
                        to be considered clones)
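Steps (4) and (5) can be sketched as below: pairs at or above the threshold are linked, linked individuals are merged into groups, and all but the individual with the least missing data are listed for removal. The single-linkage grouping via union-find is a swapped-in technique chosen for illustration, and all names and values are hypothetical; the script's own clustering may differ.

```python
def cluster_clones(similarities, threshold):
    """similarities: {(s1, s2): percent}. Returns sorted groups of size >= 2."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (s1, s2), pct in similarities.items():
        if pct >= threshold:
            parent[find(s1)] = find(s2)  # link the two groups

    groups = {}
    for sample in list(parent):
        groups.setdefault(find(sample), []).append(sample)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def individuals_to_remove(groups, missing_counts):
    """Keep the individual with the least missing data in each group."""
    removable = []
    for group in groups:
        keep = min(group, key=lambda s: missing_counts[s])
        removable.extend(s for s in group if s != keep)
    return removable


# Made-up similarities (percent) and missing-SNP counts
sims = {("a", "b"): 99.0, ("b", "c"): 95.0, ("a", "c"): 90.0, ("c", "d"): 60.0}
groups = cluster_clones(sims, threshold=94.5)
removable = individuals_to_remove(groups, {"a": 10, "b": 2, "c": 5, "d": 0})
print(groups, removable)
```

Note that with single linkage, "a" and "c" end up in one group through "b" even though their own similarity is below the threshold.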
vcf_minrep_filter_abs.py - Filters .vcf file for SNPs that are genotyped for a minimum number of
individuals in each of the populations (rather than overall proportion of
individuals). This can help to guarantee a minimum number of individuals to
calculate population-based statistics, and eliminate loci that might be
suffering from locus drop-out in particular populations. Note: only
individuals that are listed in popfile are taken into account to determine
number of individuals genotyped (but all indivs are outputted). [File did not pass PEP8 check]
usage: vcf_minrep_filter_abs.py [-h]
                                vcf_file pop_file min_proportion
                                output_filename
positional arguments:
  vcf_file         input file with SNP data (`.vcf`)
  pop_file         text file (tsv or csv) with individuals and populations
  min_proportion   proportion of individuals required to be genotyped in each
                   population for a SNP to be included (e.g. `0.8` for 80
                   percent of individuals)
  output_filename  name of output file (`.vcf`)
optional arguments:
  -h, --help       show this help message and exit
Example input file(s): vcf_file.vcf, pop_file.txt.
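The per-population check behind this kind of filter can be sketched as below for a single SNP, with one predicate for an absolute minimum and one for a minimum proportion. The function names and the `min_count` parameter are hypothetical and not taken from the script.

```python
def genotyped_per_pop(calls, sample_pops):
    """Return {pop: (n_genotyped, n_listed)} for one SNP.

    calls       -- {sample: genotype call} for this SNP
    sample_pops -- {sample: population}, as read from the popfile
    """
    counts = {}
    for sample, pop in sample_pops.items():
        genotype = calls.get(sample, "./.").split(":")[0]
        called, listed = counts.get(pop, (0, 0))
        counts[pop] = (called + ("." not in genotype), listed + 1)
    return counts


def passes_min_count(calls, sample_pops, min_count):
    """True if every population has at least min_count genotyped individuals."""
    counts = genotyped_per_pop(calls, sample_pops)
    return all(called >= min_count for called, _ in counts.values())


def passes_min_proportion(calls, sample_pops, min_proportion):
    """True if every population reaches the genotyped proportion."""
    counts = genotyped_per_pop(calls, sample_pops)
    return all(called / listed >= min_proportion for called, listed in counts.values())


# Made-up data: s5 is not in the popfile, so it is ignored by the check
pops = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
snp = {"s1": "0/1", "s2": "./.", "s3": "1/1", "s4": "0/0", "s5": "./."}
print(genotyped_per_pop(snp, pops))
print(passes_min_count(snp, pops, 1), passes_min_proportion(snp, pops, 0.8))
```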
vcf_minrep_filter.py - Filters .vcf file for SNPs that are genotyped for a minimum proportion of
individuals in each of the populations (rather than overall proportion of
individuals). This can help to guarantee a minimum number of individuals to
calculate population-based statistics, and eliminate loci that might be
suffering from locus drop-out in particular populations. Note: only
individuals that are listed in popfile are taken into account to determine
proportion of individuals genotyped (but all indivs are outputted). [File did not pass PEP8 check]
usage: vcf_minrep_fi