Exogene

A workflow for identifying viral integrations in both short and long read data

Generate Convert Improve

Install / Use

/learn @zstephens/Exogene

About this skill

Quality Score

0/100

README

Exogene

A workflow for detecting viral integrations from both short read and long read sequencing data.

usage

docker pull zstephens/exogene:v15

docker run -it -v ${HOME}:${HOME} zstephens/exogene:v15

Create human + viral reference sequence:

(from inside the container)

/home/init_ref.sh \
-i /path/to/hg38.fa \
-o /path/to/hg38_plus_viral.fa

Using custom viral references:

If you wish to use viral reference sequences different than what Exogene uses by default, you can use the -v input option to specify a fasta file of viral genomes. Exogene expects that contigs are named in the following format:

>accession_id full_name (space delimited)

For example: >NC_009334 Human herpesvirus 4, complete genome.

Additionally, the viral fasta should be indexed using bwa index

Running Exogene-SR (with BAM input)

/home/Exogene-SR.sh \
-b input.bam \
-r hg38_plus_viral.fa \
-o outDir/

If custom viral sequences were used, the -v input option will be required.

Running Exogene-SR (with FQ input)

/home/Exogene-SR.sh \
-f1 read1.fq.gz \
-f2 read2.fq.gz \
-r hg38_plus_viral.fa \
-o outDir/

Input FASTQ files must be gzipped. Currently Exogene-SR does not support single-end reads. If custom viral sequences were used, the -v input option will be required.

Running Exogene-LR (with FASTQ input, e.g. PacBio HiFi reads)

/home/Exogene-LR.sh \
-f input.fq.gz \
-r hg38_plus_viral.fa \
-m hifi \
-o outDir/

Running Exogene-LR (with FASTA input, e.g. PacBio CLR reads)

/home/Exogene-LR.sh \
-f input.fa.gz \
-r hg38_plus_viral.fa \
-m clr \
-o outDir/

Running Exogene-LR (with BAM input)

/home/Exogene-LR.sh \
-b input.bam \
-r hg38_plus_viral.fa \
-m [hifi/clr] \
-o outDir/

Intersecting Exogene-SR and Exogene-LR results:

python /home/combine_reports.py \
-s Viral_Reads_Report.tsv \
-l Viral_Junctions_LongReads.tsv \
-o combined_report_outDir/ \
-ms minimum_number_of_softclipped_reads_per_site [1] \
-md minimum_number_of_discordant_pairs_per_site [5]

Either -s or -l must be specified (or both, for a combined report). Viral_Reads_Report.tsv is created in the output directory of Exogene-SR, Viral_Junctions_LongReads.tsv is created in the output directory of Exogene-LR.

Test Data:

The Docker container contains a small quantity of test data which can be processed as follows:

/home/Exogene-SR.sh \
-f1 /home/test_data/SRR3104446_1.fq.gz \
-f2 /home/test_data/SRR3104446_2.fq.gz \
-r /path/to/hg38_and_viral.fa \
-o /path/to/out_SR/

/home/Exogene-LR.sh \
-f /home/test_data/a1el_ccs.fq.gz \
-r /path/to/hg38_and_viral.fa \
-m hifi \
-o /path/to/out_LR/

For the included hg38+viral reference, the bwa/pbmm2 alignment steps require ~32GB of memory.

Related Skills

node-connect

352.5k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

111.3k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

352.5k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

352.5k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。