# OTTERS: Omnibus Transcriptome Test using Expression Reference Summary data

A powerful TWAS framework leveraging summary-level reference data.
We provide an integrated tool that can
- perform LD clumping
- prepare the required inputs (LD reference and eQTL summary statistics) in the formats required by P+T, lassosum, SDPR, and PRS-CS
- perform TWAS Stage I: run P+T, lassosum, SDPR, and PRS-CS to train eQTL weights
- perform TWAS Stage II: run gene-based association tests using GWAS summary statistics

In this tool, we
- integrate the TABIX and PLINK 1.9 tools to extract input data per target gene more efficiently
- integrate the PRS-CS software, the lassosum R package, and the SDPR software to train eQTL weights
- enable parallel computation to train GReX imputation models and perform gene-based association tests for multiple genes simultaneously.
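The two stages above can be sketched numerically: Stage I yields per-SNP eQTL weights for a gene, and Stage II (or GReX imputation) combines them with genotypes or GWAS statistics. A minimal illustration of GReX imputation; all names and numbers here are made up, not OTTERS output:

```python
import numpy as np

# Hypothetical Stage I output: eQTL weights for 4 SNPs of one gene
weights = np.array([0.2, -0.1, 0.05, 0.3])

# Genotypes (samples x SNPs), coded as 0/1/2 minor-allele counts
genotypes = np.array([
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 2, 0, 0],
])

# Imputed genetically regulated expression (GReX), one value per sample
grex = genotypes @ weights
print(grex)
```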
## Latest Updates
January 14, 2025: Added the --lassosum_ld_blocks argument
- Purpose: select the LD blocks used in the lassosum analysis
- Available options (from Berisa & Pickrell (2015)): EUR.hg19, ASN.hg19, AFR.hg19, EUR.hg38, ASN.hg38, AFR.hg38
- Default: EUR.hg38
March 29, 2025: Added support for sex chromosomes (--chrom=X and --chrom=Y)
- Note: In OTTERS outputs:
  - Chromosome X → labeled as 23
  - Chromosome Y → labeled as 24
- Example resources:
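When matching OTTERS output back to standard chromosome names, the numeric labels can be converted with a small helper. This is a sketch following the mapping in the note above, not a function shipped with OTTERS:

```python
def otters_chrom_label(chrom):
    """Map a chromosome name to the numeric label used in OTTERS outputs."""
    mapping = {"X": "23", "Y": "24"}
    chrom = str(chrom).replace("chr", "")
    return mapping.get(chrom, chrom)

def standard_chrom_name(label):
    """Map an OTTERS numeric label back to the standard chromosome name."""
    reverse = {"23": "X", "24": "Y"}
    return reverse.get(str(label), str(label))

print(otters_chrom_label("chrX"))
print(standard_chrom_name(24))
```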
March 31, 2025: Added VCF/VCF.gz support for LD reference panels
- New --geno_type option: vcf
- Default remains: plink (binary PLINK format)
- Note:
  - Requires bgzipped and tabix-indexed VCF files, e.g. Exp_geno.vcf.gz and Exp_geno.vcf.gz.tbi
  - For each gene, OTTERS uses the TABIX tool to extract the relevant region from the VCF, and then calls PLINK to convert the extracted VCF into binary PLINK format. See the functions extract_vcf_region_with_tabix and call_PLINK_extract for details.
- Example resources:
  - Analysis with VCF.gz genotype data (set geno_dir=Exp_geno.vcf.gz and --geno_type=vcf; example data are provided to run this analysis)
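The per-gene extraction described above amounts to building two commands: a tabix call to pull the gene region from the indexed VCF, and a PLINK call to convert it. The sketch below only constructs illustrative command lines; the flags OTTERS passes internally may differ:

```python
def build_extraction_cmds(vcf_path, chrom, start, end, out_prefix):
    """Sketch of the two-step per-gene extraction: tabix pulls the gene
    region from a bgzipped, indexed VCF, and PLINK converts the extracted
    VCF to binary PLINK format. Flags are illustrative, not OTTERS' exact calls."""
    region = f"{chrom}:{start}-{end}"
    region_vcf = f"{out_prefix}.vcf"
    # tabix -h keeps the VCF header; output would be redirected to region_vcf
    tabix_cmd = ["tabix", "-h", vcf_path, region]
    plink_cmd = ["plink", "--vcf", region_vcf, "--make-bed", "--out", out_prefix]
    return tabix_cmd, plink_cmd

tabix_cmd, plink_cmd = build_extraction_cmds(
    "Exp_geno.vcf.gz", 4, 100000, 200000, "ENSG0000_geno")
print(" ".join(tabix_cmd))
print(" ".join(plink_cmd))
```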
## Getting Started
Download and install the following required tools, modules, and packages:

- Download OTTERS using:

  ```bash
  git clone https://github.com/daiqile96/OTTERS.git
  ```

- Here is my code to download and install BGZIP and TABIX:

  ```bash
  wget https://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.6.tar.bz2
  tar jxvf tabix-0.2.6.tar.bz2
  cd tabix-0.2.6
  make
  # copy the binaries bgzip and tabix to my path: ~/projects/bin
  cp bgzip tabix ~/projects/bin
  # test if BGZIP and TABIX are successfully installed
  export PATH=~/projects/bin:$PATH
  bgzip
  tabix
  ```

- Here is my code to download and install PLINK 1.9:

  ```bash
  # Download the latest binary of PLINK 1.9
  wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20220402.zip
  # Unzip the archive
  unzip plink_linux_x86_64_20220402.zip -d ~/projects/bin
  # Remove the archive
  rm plink_linux_x86_64_20220402.zip
  # Test if PLINK 1.9 is successfully installed
  export PATH=~/projects/bin:$PATH
  plink
  ```

- To permanently store the path, add `export PATH=~/projects/bin:$PATH` to your ~/.bash_profile (or ~/.bashrc), then reload it from the command line:

  ```bash
  source ~/.bash_profile
  ```
- Python modules/libraries:

  If you're new to Python, here is an example of setting up a Python environment to use OTTERS.

- To apply SDPR and lassosum as imputation models, please install:

  - SDPR to perform SDPR. Here is my code to download SDPR:

    ```bash
    cd ~/projects/bin
    git clone https://github.com/eldronzhou/SDPR.git
    # make sure that the dynamic libraries gsl/lib/libgsl.so and MKL/lib/libmkl_rt.so are not changed
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/projects/bin/SDPR/MKL/lib
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/projects/bin/SDPR/gsl/lib
    ```

    Please make sure that these dynamic libraries are unchanged every time you use SDPR.
- R packages:
## Example

```bash
# activate the environment
conda activate otters
# set number of threads to be used
N_THREADS=1

# set up my OTTERS directory and SDPR directory
OTTERS_DIR=/home/qdai8/projects/bin/OTTERS
SDPR_DIR=/home/qdai8/projects/bin/SDPR

# make sure the dynamic libraries of SDPR are not changed (for SDPR)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SDPR_DIR}/MKL/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SDPR_DIR}/gsl/lib

# prevent automatically using all available cores on a compute node (for SDPR and PRS-CS)
export MKL_NUM_THREADS=$N_THREADS
export NUMEXPR_NUM_THREADS=$N_THREADS
export OMP_NUM_THREADS=$N_THREADS

# load R to perform lassosum
module load R

# start to run OTTERS
cd ${OTTERS_DIR}/Example

# Input for OTTERS STAGE I
# annotation file
exp_anno=exp_anno.txt
# genotype data from LD reference panel
geno_dir=Exp_geno
# eQTL summary statistics
sst_file=Exp_eQTLSumStats.txt

# Input for OTTERS STAGE II
# GWAS summary statistics
gwas_sst_file=Exp_GWASSumStats.txt

# set chromosome number (the example is for chromosome 4)
chr=4
# set LD-clumping threshold in STAGE I
clump_r2=0.99
# set output directory for STAGE I
out_dir=Results

# STAGE I
# train eQTL weights using P+T, lassosum, SDPR, and PRS-CS.
# It may take several minutes to complete.
python3 ${OTTERS_DIR}/training.py \
--OTTERS_dir=${OTTERS_DIR} \
--SDPR_dir=${SDPR_DIR} \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--sst_file=${sst_file} \
--out_dir=${out_dir} \
--chrom=${chr} \
--r2=${clump_r2} \
--models=PT,lassosum,SDPR,PRScs \
--lassosum_ld_blocks=EUR.hg38 \
--thread=$N_THREADS

# set output directory for STAGE II
twas_dir=TWAS

# STAGE II
# gene-based association test using the eQTL weights trained by P+T, lassosum, SDPR, and PRS-CS.
python3 ${OTTERS_DIR}/testing.py \
--OTTERS_dir=${OTTERS_DIR} \
--weight_dir=${OTTERS_DIR}/Example/Results \
--models=P0.001,P0.05,lassosum,SDPR,PRScs \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--out_dir=${twas_dir} \
--gwas_file=${gwas_sst_file} \
--chrom=${chr} \
--thread=$N_THREADS

# get imputed genetically regulated gene expression
impute_dir=GReX
# samples to perform imputation
samples=Exp_samples.txt

# imputation
python3 ${OTTERS_DIR}/imputing.py \
--OTTERS_dir=${OTTERS_DIR} \
--weight_dir=${OTTERS_DIR}/Example/Results \
--models=P0.001,P0.05,lassosum,SDPR,PRScs \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--out_dir=${impute_dir} \
--chrom=${chr} \
--samples=${samples} \
--thread=$N_THREADS
```
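Stage II yields one TWAS p-value per imputation model (P0.001, P0.05, lassosum, SDPR, PRScs). The "omnibus" in OTTERS refers to aggregating these into a single gene-level p-value with the ACAT (Cauchy combination) test, per the OTTERS paper. A minimal equal-weight ACAT sketch, not OTTERS' exact implementation:

```python
import math

def acat(pvalues):
    """Combine p-values with the equal-weight Cauchy combination (ACAT) test.
    Sketch only; OTTERS' own aggregation may weight or filter differently."""
    k = len(pvalues)
    # map each p-value to a standard Cauchy quantile, then average
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / k
    # map the averaged Cauchy statistic back to a p-value
    return 0.5 - math.atan(t) / math.pi

# hypothetical per-model p-values for one gene
p_omnibus = acat([0.01, 0.04, 0.20, 0.03, 0.60])
print(p_omnibus)
```

A useful sanity check of the formula: if all input p-values are equal, ACAT returns that same value.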
## Required Inputs
- Annotation File should contain the following columns, with the same names and in the same order (text file):

  | CHROM | GeneStart | GeneEnd | TargetID |
  |:-----:|:---------:|:-------:|:--------:|
  | 1     | 100       | 200     | ENSG0000 |

  Example: Example/exp_anno.txt
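A quick way to check that an annotation file matches the expected layout, using only the standard library (the column names follow the table above; this helper is illustrative, not part of OTTERS):

```python
import csv
import io

EXPECTED = ["CHROM", "GeneStart", "GeneEnd", "TargetID"]

def check_anno(text):
    """Verify the header names and order of a tab-delimited annotation file,
    returning the data rows if the header is valid."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header != EXPECTED:
        raise ValueError(f"Expected columns {EXPECTED}, got {header}")
    return list(reader)

anno = "CHROM\tGeneStart\tGeneEnd\tTargetID\n4\t100\t200\tENSG0000\n"
rows = check_anno(anno)
print(rows)
```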
- Genotype data from the LD reference panel in PLINK binary files:

  Example: Example/Exp_geno.bed; Example/Exp_geno.bim; Example/Exp_geno.fam
- eQTL summary statistics should contain the following columns, with the same names and in the same order (text file):

  Please sort by chromosome and then by SNP position in ascending order.

  | CHROM | POS | A1 | A2 | Zscore | TargetID | N   |
  |:-----:|:---:|:--:|:--:|:------:|:--------:|:---:|
  | 1     | 100 | C  | T  | 3      | ENSG0000 | 200 |

  To convert a two-sided p-value and beta (coefficient) to a Zscore, in Python, we can do:

  ```python
  import numpy as np
  from scipy.stats import norm

  Z = np.sign(beta) * abs(norm.ppf(pvalue / 2.0))
  ```

  Example: Example/Exp_eQTLSumStats.txt
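The sort requirement above (chromosome, then SNP position, both ascending) can be met in many ways; a standard-library sketch over toy records (field names follow the table above):

```python
# toy eQTL records, deliberately out of order
rows = [
    {"CHROM": 2, "POS": 50,  "TargetID": "ENSG0002"},
    {"CHROM": 1, "POS": 300, "TargetID": "ENSG0001"},
    {"CHROM": 1, "POS": 100, "TargetID": "ENSG0001"},
]

# sort by chromosome first, then SNP position, both ascending
rows.sort(key=lambda r: (r["CHROM"], r["POS"]))
print([(r["CHROM"], r["POS"]) for r in rows])
```

For a large file on disk, the same ordering can be produced with a command-line sort on the first two columns (e.g. `sort -k1,1n -k2,2n` after the header).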
- GWAS summary statistics should contain the following columns, with the same names and in the same order (text file):

  Not required to be sorted.

  | CHROM | POS | A1 | A2 | Zscore |
  |:-----:|:---:|:--:|:--:|:------:|
  | 1     | 100 | C  | T  | 2      |

  Example: Example/Exp_GWASSumStats.txt
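Conceptually, Stage II combines the Stage I eQTL weights with these GWAS SNP Z-scores and an LD reference. A common summary-statistic TWAS form (shown as an illustration of the idea, not OTTERS' exact formula) is Z_gene = wᵀZ / sqrt(wᵀRw), where w are the eQTL weights and R is the SNP LD matrix:

```python
import numpy as np

# hypothetical eQTL weights (Stage I) and GWAS Z-scores for 3 SNPs
w = np.array([0.5, -0.2, 0.1])
z = np.array([2.0, -1.5, 0.5])

# hypothetical LD (correlation) matrix among the 3 SNPs
R = np.array([
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

# burden-style gene-level Z statistic
z_gene = (w @ z) / np.sqrt(w @ R @ w)
print(z_gene)
```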
## Notes
- OTTERS is based on [PLINK 1.9](https://www.cog-ge