# OTTERS: Omnibus Transcriptome Test using Expression Reference Summary data

A powerful TWAS framework leveraging summary-level reference data.
We provide an integrated tool that can
- perform LD clumping
- prepare the required inputs (LD reference and eQTL summary statistics) in the formats required by P+T, lassosum, SDPR, and PRS-CS
- perform TWAS Stage I: run P+T, lassosum, SDPR, and PRS-CS to train eQTL weights
- perform TWAS Stage II: run gene-based association tests using GWAS summary statistics

In this tool, we
- integrate the TABIX and PLINK 1.9 tools to extract input data per target gene more efficiently
- integrate the PRS-CS software, the lassosum R package, and the SDPR software to train eQTL weights
- enable parallel computation to train GReX imputation models and perform gene-based association tests for multiple genes simultaneously.
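The two stages above can be sketched numerically: Stage I yields per-SNP eQTL weights for a gene, and Stage II (or GReX imputation) combines them with genotypes or GWAS statistics. A minimal illustration of GReX imputation; all names and numbers here are made up, not OTTERS output:

```python
import numpy as np

# Hypothetical Stage I output: eQTL weights for 4 SNPs of one gene
weights = np.array([0.2, -0.1, 0.05, 0.3])

# Genotypes (samples x SNPs), coded as 0/1/2 minor-allele counts
genotypes = np.array([
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 2, 0, 0],
])

# Imputed genetically regulated expression (GReX), one value per sample
grex = genotypes @ weights
print(grex)
```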
## Latest Updates
January 14, 2025: Added the --lassosum_ld_blocks argument
- Purpose: select the LD blocks used in the lassosum analysis
- Available options (from Berisa & Pickrell (2015)): EUR.hg19, ASN.hg19, AFR.hg19, EUR.hg38, ASN.hg38, AFR.hg38
- Default: EUR.hg38
March 29, 2025: Added support for sex chromosomes (--chrom=X and --chrom=Y)
- Note: In OTTERS outputs:
  - Chromosome X → labeled as 23
  - Chromosome Y → labeled as 24
- Example resources:
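When matching OTTERS output back to standard chromosome names, the numeric labels can be converted with a small helper. This is a sketch following the mapping in the note above, not a function shipped with OTTERS:

```python
def otters_chrom_label(chrom):
    """Map a chromosome name to the numeric label used in OTTERS outputs."""
    mapping = {"X": "23", "Y": "24"}
    chrom = str(chrom).replace("chr", "")
    return mapping.get(chrom, chrom)

def standard_chrom_name(label):
    """Map an OTTERS numeric label back to the standard chromosome name."""
    reverse = {"23": "X", "24": "Y"}
    return reverse.get(str(label), str(label))

print(otters_chrom_label("chrX"))
print(standard_chrom_name(24))
```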
March 31, 2025: Added VCF/VCF.gz support for LD reference panels
- New --geno_type option: vcf
- Default remains: plink (binary PLINK format)
- Note:
  - Requires bgzipped and tabix-indexed VCF files, e.g. Exp_geno.vcf.gz and Exp_geno.vcf.gz.tbi
  - For each gene, OTTERS uses the TABIX tool to extract the relevant region from the VCF, and then calls PLINK to convert the extracted VCF into binary PLINK format. See the functions extract_vcf_region_with_tabix and call_PLINK_extract for details.
- Example resources:
  - Analysis with VCF.gz genotype data (set geno_dir=Exp_geno.vcf.gz and --geno_type=vcf; example data are provided to run this analysis)
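The per-gene extraction described above amounts to building two commands: a tabix call to pull the gene region from the indexed VCF, and a PLINK call to convert it. The sketch below only constructs illustrative command lines; the flags OTTERS passes internally may differ:

```python
def build_extraction_cmds(vcf_path, chrom, start, end, out_prefix):
    """Sketch of the two-step per-gene extraction: tabix pulls the gene
    region from a bgzipped, indexed VCF, and PLINK converts the extracted
    VCF to binary PLINK format. Flags are illustrative, not OTTERS' exact calls."""
    region = f"{chrom}:{start}-{end}"
    region_vcf = f"{out_prefix}.vcf"
    # tabix -h keeps the VCF header; output would be redirected to region_vcf
    tabix_cmd = ["tabix", "-h", vcf_path, region]
    plink_cmd = ["plink", "--vcf", region_vcf, "--make-bed", "--out", out_prefix]
    return tabix_cmd, plink_cmd

tabix_cmd, plink_cmd = build_extraction_cmds(
    "Exp_geno.vcf.gz", 4, 100000, 200000, "ENSG0000_geno")
print(" ".join(tabix_cmd))
print(" ".join(plink_cmd))
```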
## Getting Started
Download and install the following required tools, modules, and packages:

- Download OTTERS using:

  ```bash
  git clone https://github.com/daiqile96/OTTERS.git
  ```

- Here is my code to download and install BGZIP and TABIX:

  ```bash
  wget https://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.6.tar.bz2
  tar jxvf tabix-0.2.6.tar.bz2
  cd tabix-0.2.6
  make
  # copy the binaries bgzip and tabix to my path: ~/projects/bin
  cp bgzip tabix ~/projects/bin
  # test if BGZIP and TABIX are successfully installed
  export PATH=~/projects/bin:$PATH
  bgzip
  tabix
  ```

- Here is my code to download and install PLINK 1.9:

  ```bash
  # Download the latest binary of PLINK 1.9
  wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20220402.zip
  # Unzip the archive
  unzip plink_linux_x86_64_20220402.zip -d ~/projects/bin
  # Remove the archive
  rm plink_linux_x86_64_20220402.zip
  # Test if PLINK 1.9 is successfully installed
  export PATH=~/projects/bin:$PATH
  plink
  ```

- To permanently store the path, add `export PATH=~/projects/bin:$PATH` to your ~/.bash_profile (or ~/.bashrc), then reload it from the command line:

  ```bash
  source ~/.bash_profile
  ```
- Python modules/libraries:

  If you're new to Python, here is an example of setting up a Python environment to use OTTERS.

- To apply SDPR and lassosum as imputation models, please install:

  - SDPR to perform SDPR. Here is my code to download SDPR:

    ```bash
    cd ~/projects/bin
    git clone https://github.com/eldronzhou/SDPR.git
    # make sure that the dynamic libraries gsl/lib/libgsl.so and MKL/lib/libmkl_rt.so are not changed
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/projects/bin/SDPR/MKL/lib
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/projects/bin/SDPR/gsl/lib
    ```

    Please make sure that these dynamic libraries are unchanged every time you use SDPR.
- R packages:
## Example

```bash
# activate the environment
conda activate otters
# set number of threads to be used
N_THREADS=1

# set up my OTTERS directory and SDPR directory
OTTERS_DIR=/home/qdai8/projects/bin/OTTERS
SDPR_DIR=/home/qdai8/projects/bin/SDPR

# make sure the dynamic libraries of SDPR are not changed (for SDPR)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SDPR_DIR}/MKL/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SDPR_DIR}/gsl/lib

# prevent automatically using all available cores on a compute node (for SDPR and PRS-CS)
export MKL_NUM_THREADS=$N_THREADS
export NUMEXPR_NUM_THREADS=$N_THREADS
export OMP_NUM_THREADS=$N_THREADS

# load R to perform lassosum
module load R

# start to run OTTERS
cd ${OTTERS_DIR}/Example

# Input for OTTERS STAGE I
# annotation file
exp_anno=exp_anno.txt
# genotype data from LD reference panel
geno_dir=Exp_geno
# eQTL summary statistics
sst_file=Exp_eQTLSumStats.txt

# Input for OTTERS STAGE II
# GWAS summary statistics
gwas_sst_file=Exp_GWASSumStats.txt

# set chromosome number (the example is for chromosome 4)
chr=4
# set LD-clumping threshold in STAGE I
clump_r2=0.99
# set output directory for STAGE I
out_dir=Results

# STAGE I
# train eQTL weights using P+T, lassosum, SDPR, and PRS-CS.
# It may take several minutes to complete.
python3 ${OTTERS_DIR}/training.py \
--OTTERS_dir=${OTTERS_DIR} \
--SDPR_dir=${SDPR_DIR} \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--sst_file=${sst_file} \
--out_dir=${out_dir} \
--chrom=${chr} \
--r2=${clump_r2} \
--models=PT,lassosum,SDPR,PRScs \
--lassosum_ld_blocks=EUR.hg38 \
--thread=$N_THREADS

# set output directory for STAGE II
twas_dir=TWAS

# STAGE II
# gene-based association test using the eQTL weights trained by P+T, lassosum, SDPR, and PRS-CS.
python3 ${OTTERS_DIR}/testing.py \
--OTTERS_dir=${OTTERS_DIR} \
--weight_dir=${OTTERS_DIR}/Example/Results \
--models=P0.001,P0.05,lassosum,SDPR,PRScs \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--out_dir=${twas_dir} \
--gwas_file=${gwas_sst_file} \
--chrom=${chr} \
--thread=$N_THREADS

# get imputed genetically regulated gene expression
impute_dir=GReX
# samples to perform imputation
samples=Exp_samples.txt

# imputation
python3 ${OTTERS_DIR}/imputing.py \
--OTTERS_dir=${OTTERS_DIR} \
--weight_dir=${OTTERS_DIR}/Example/Results \
--models=P0.001,P0.05,lassosum,SDPR,PRScs \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--out_dir=${impute_dir} \
--chrom=${chr} \
--samples=${samples} \
--thread=$N_THREADS
```
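Stage II yields one TWAS p-value per imputation model (P0.001, P0.05, lassosum, SDPR, PRScs). The "omnibus" in OTTERS refers to aggregating these into a single gene-level p-value with the ACAT (Cauchy combination) test, per the OTTERS paper. A minimal equal-weight ACAT sketch, not OTTERS' exact implementation:

```python
import math

def acat(pvalues):
    """Combine p-values with the equal-weight Cauchy combination (ACAT) test.
    Sketch only; OTTERS' own aggregation may weight or filter differently."""
    k = len(pvalues)
    # map each p-value to a standard Cauchy quantile, then average
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / k
    # map the averaged Cauchy statistic back to a p-value
    return 0.5 - math.atan(t) / math.pi

# hypothetical per-model p-values for one gene
p_omnibus = acat([0.01, 0.04, 0.20, 0.03, 0.60])
print(p_omnibus)
```

A useful sanity check of the formula: if all input p-values are equal, ACAT returns that same value.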
## Required Inputs
- Annotation File should contain the following columns, with the same names and in the same order (text file):

  | CHROM | GeneStart | GeneEnd | TargetID |
  |:-----:|:---------:|:-------:|:--------:|
  | 1     | 100       | 200     | ENSG0000 |

  Example: Example/exp_anno.txt
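A quick way to check that an annotation file matches the expected layout, using only the standard library (the column names follow the table above; this helper is illustrative, not part of OTTERS):

```python
import csv
import io

EXPECTED = ["CHROM", "GeneStart", "GeneEnd", "TargetID"]

def check_anno(text):
    """Verify the header names and order of a tab-delimited annotation file,
    returning the data rows if the header is valid."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header != EXPECTED:
        raise ValueError(f"Expected columns {EXPECTED}, got {header}")
    return list(reader)

anno = "CHROM\tGeneStart\tGeneEnd\tTargetID\n4\t100\t200\tENSG0000\n"
rows = check_anno(anno)
print(rows)
```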
- Genotype data from the LD reference panel in PLINK binary files:

  Example: Example/Exp_geno.bed; Example/Exp_geno.bim; Example/Exp_geno.fam
- eQTL summary statistics should contain the following columns, with the same names and in the same order (text file):

  Please sort by chromosome and then by SNP position in ascending order.

  | CHROM | POS | A1 | A2 | Zscore | TargetID | N   |
  |:-----:|:---:|:--:|:--:|:------:|:--------:|:---:|
  | 1     | 100 | C  | T  | 3      | ENSG0000 | 200 |

  To convert a two-sided p-value and beta (coefficient) to a Zscore, in Python, we can do:

  ```python
  import numpy as np
  from scipy.stats import norm

  Z = np.sign(beta) * abs(norm.ppf(pvalue / 2.0))
  ```

  Example: Example/Exp_eQTLSumStats.txt
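The sort requirement above (chromosome, then SNP position, both ascending) can be met in many ways; a standard-library sketch over toy records (field names follow the table above):

```python
# toy eQTL records, deliberately out of order
rows = [
    {"CHROM": 2, "POS": 50,  "TargetID": "ENSG0002"},
    {"CHROM": 1, "POS": 300, "TargetID": "ENSG0001"},
    {"CHROM": 1, "POS": 100, "TargetID": "ENSG0001"},
]

# sort by chromosome first, then SNP position, both ascending
rows.sort(key=lambda r: (r["CHROM"], r["POS"]))
print([(r["CHROM"], r["POS"]) for r in rows])
```

For a large file on disk, the same ordering can be produced with a command-line sort on the first two columns (e.g. `sort -k1,1n -k2,2n` after the header).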
- GWAS summary statistics should contain the following columns, with the same names and in the same order (text file):

  Not required to be sorted.

  | CHROM | POS | A1 | A2 | Zscore |
  |:-----:|:---:|:--:|:--:|:------:|
  | 1     | 100 | C  | T  | 2      |

  Example: Example/Exp_GWASSumStats.txt
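Conceptually, Stage II combines the Stage I eQTL weights with these GWAS SNP Z-scores and an LD reference. A common summary-statistic TWAS form (shown as an illustration of the idea, not OTTERS' exact formula) is Z_gene = wᵀZ / sqrt(wᵀRw), where w are the eQTL weights and R is the SNP LD matrix:

```python
import numpy as np

# hypothetical eQTL weights (Stage I) and GWAS Z-scores for 3 SNPs
w = np.array([0.5, -0.2, 0.1])
z = np.array([2.0, -1.5, 0.5])

# hypothetical LD (correlation) matrix among the 3 SNPs
R = np.array([
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

# burden-style gene-level Z statistic
z_gene = (w @ z) / np.sqrt(w @ R @ w)
print(z_gene)
```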
## Notes
- OTTERS is based on [PLINK 1.9](https://www.cog-ge