PoolSeq
Analyze and simulate Pool-Seq time series data
Install / Use
/learn @ThomasTaus/PoolSeqREADME
poolSeq
poolSeq is an R-package that allows you to analyze and simulate Pool-Seq time-series data. Its functionality includes estimation of the effective population size and quantification of selective strength, as well as dominance. Besides simulating Pool-Seq data under a specific scenario, you can also load empirical data into R using the common sync-file format (see PoPoolation2).
Please cite the related publication: Taus T, Futschik A and Schlötterer C. 2017. Quantifying selection with pool-Seq data. Mol. Bio. Evol. 34:3023-3034.
Installation
Before installing poolSeq you need to make sure that all the dependencies are available:
- R (>= 3.3.1)
- data.table (>= 1.9.4)
- foreach (>= 1.4.2)
- stringi (>= 0.4-1)
- matrixStats (>= 0.14.2)
- Rcpp (>= 1.0.3)
For now you need to install these manually. Once this is done you can proceed by downloading the latest release of poolSeq. After the download you can install poolSeq with the following R command:
install.packages("/Path/To/poolSeq_0.3.0.tar.gz", repos=NULL, type="source")
Usage
Detailled documentation is available for each function of poolSeq, including exemplary code.
?wf.traj
example(wf.traj)
The following sections provide a basic introduction to the core functions of poolSeq.
Read sync-files
Synchronized (sync) files contain allele frequencies at specific genomic loci in multiple populations. Suppose you want to load a sync-file containing allele frequency trajectories of 2 populations (F0.R1, F10.R1, F20.R1, F0.R2, F10.R2, F20.R2). The following command allows you to read such file with poolSeq:
mySync <- read.sync(file="/Path/to/data.sync", gen=c(0, 10, 20, 0, 10, 20), repl=c(1, 1, 1, 2, 2, 2), rising = FALSE)
Once the data is loaded in R you can easily get allele frequency trajectories for a specific genomic position pos on chromosome chr in replicate repl with af.traj(mySync, chr, pos, repl). Alternatively, allele frequencies and sequence coverage of all genomic loci can be obtained for a given replicate and generation using af(sync, repl, gen) or coverage(sync, repl, gen), respectively.
Simulate Pool-Seq time-series data
poolSeq enables you to simulate allele frequency trajectories of unlinked loci under a Wright-Fisher model. Assuming an effective population size Ne of 1,000 diploid individuals, genetic drift at 10,000 unlinked loci with random starting allele frequencies p0 over 20 generations can be simulated like this:
simTraj <- wf.traj(p0=runif(10000, 0, 1), Ne=1000, t=c(0, 10, 20))
Pool-Seq can either be applied using all individuals of a population, or only a subset. Assuming that only 300 out of the 1,000 individuals were selected for Pool-Seq, the resulting sampling noise can be added as follows:
simTraj <- matrix(sample.alleles(simTraj, size=300, mode="individuals", Ncensus=1000), nrow=nrow(simTraj), dimnames=dimnames(simTraj))
Then sampling noise at an average sequencing coverage of 60x can be added, drawing coverage depths from a Poisson distribution:
af <- sample.alleles(simTraj, size=60, mode="coverage")
simTraj <- matrix(af$p.smpld, nrow=nrow(simTraj), dimnames=dimnames(simTraj))
simCov <- matrix(af$size, nrow=nrow(simTraj), dimnames=dimnames(simTraj))
Estimate effective population size
Based on the allele frequency data simulated in the previous examples, Ne can be estimated like this:
estimateNe(p0=simTraj[,"F0"], pt=simTraj[,"F20"], cov0=simCov[,"F0"], covt=simCov[,"F20"], t=20, Ncensus=1000, poolSize=c(300, 300))
Estimate selection parameters
poolSeq also enables you to estimate the selection coefficient from time-series data. In the following example, first an allele frequency trajectory is simulated assuming a selection coefficient s of 0.1 and co-dominance h=0.5. Then s is re-estimated from the simulated data:
simTraj <- wf.traj(p0=0.05, Ne=1000, t=seq(0, 60, by=10), s=0.1, h=0.5)
estimateSH(simTraj, Ne=1000, t=seq(0, 60, by=10), h=0.5)
You can also assess, whether the estimate of s is significantly different from 0
estimateSH(simTraj, Ne=1000, t=seq(0, 60, by=10), h=0.5, simulate.p.value=TRUE)
and compute a confidence interval for the s-estimate:
est <- estimateSH(simTraj, Ne=1000, t=seq(0, 60, by=10), h=0.5)
confint(est)
Related Skills
node-connect
343.3kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
92.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
343.3kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
343.3kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
