OncoGAN
A pipeline that accurately simulates high-quality, publicly shareable cancer genomes (VCFs, CNAs and SVs).
A pipeline that accurately simulates high-quality, publicly shareable cancer genomes (VCFs, CNAs and SVs) for eight different tumor types: Breast-AdenoCa, CNS-PiloAstro, Eso-AdenoCa, Kidney-RCC, Liver-HCC, Lymph-CLL, Panc-Endocrine and Prost-AdenoCa. OncoGAN offers a solution to current challenges in data accessibility and privacy while also serving as a tool for algorithm development and benchmarking.
In addition to this pipeline, we have released 200 simulated VCFs for each of the eight tumor types, which are available on HuggingFace and Zotero.
Installation
We have created four docker images with all dependencies installed, as there are version incompatibility issues between the different modules:
- Training -> Environment and scripts used to train OncoGAN models (CUDA)
- Simulating -> Pipeline for synthetic tumor simulation (CPU only)
- DeepTumour -> Algorithm developed to detect the tumor type of origin based on somatic mutations (Ref)
- fasta2bam -> Module to generate FASTQ/BAM files using OncoGAN's output
However, due to the size of the models, they couldn’t be stored in the Docker images and need to be downloaded separately (see Download models section below).
Docker
If docker is not already installed on your system, please follow these instructions.
# Training
docker pull oicr/oncogan:training_v0.2
# Simulating
docker pull oicr/oncogan:simulating_v0.2.1
# DeepTumour
docker pull ghcr.io/lincolnsteinlab/deeptumour:3.0
# fasta2bam
docker pull oicr/oncogan:fasta2bam_v0.1
Singularity
If singularity is not already installed on your system, please follow these instructions.
# Training
singularity pull docker://oicr/oncogan:training_v0.2
# Simulating
singularity pull docker://oicr/oncogan:simulating_v0.2.1
# DeepTumour
singularity pull docker://ghcr.io/lincolnsteinlab/deeptumour:3.0
# fasta2bam
singularity pull docker://oicr/oncogan:fasta2bam_v0.1
Download models
OncoGAN trained models for the eight tumor types and DeepTumour models can be found on HuggingFace and Zotero.
Generate synthetic VCFs
OncoGAN needs two external inputs to simulate new samples:
- The directory with OncoGAN models downloaded previously
- hg19 fasta reference genome without the chr prefix
For each donor, the output is a VCF file (mutations), two TSV files (CNAs and SVs) and a PNG (CNA+SV plot). Since the PCAWG dataset used for training is based on the hg19 version of the genome, the simulated mutations are also aligned to that version. The integrated LiftOver can be used to switch to hg38 (--hg38 option).
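Because the reference genome must not use the chr prefix, it can be worth checking the FASTA headers before launching a simulation. The snippet below is a minimal sketch and not part of OncoGAN: the function name check_no_chr_prefix is hypothetical, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper (not part of OncoGAN): succeed only if no sequence
# header in the given FASTA file starts with the "chr" prefix.
check_no_chr_prefix() {
    local fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "cannot read $fasta" >&2
        return 2
    fi
    # FASTA header lines start with '>'; count those that begin with '>chr'.
    # grep -c exits non-zero when the count is 0, hence the '|| true'.
    local n
    n=$(grep -c '^>chr' "$fasta" || true)
    if [ "$n" -gt 0 ]; then
        echo "$n header(s) use the chr prefix in $fasta" >&2
        return 1
    fi
    echo "OK: no chr-prefixed headers in $fasta"
}

# Usage (placeholder path):
#   check_no_chr_prefix /PATH_TO_HG19_DIR/hs37d5.fa
```

The check only looks at header lines, so it stays fast even on a full genome FASTA.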
Tumors with real profiles
# Docker command
docker run --rm -u $(id -u):$(id -g) \
-v $(pwd):/home \
-v /PATH_TO_HG19_DIR/:/reference \
-v /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
-it oicr/oncogan:simulating_v0.2.1 \
vcfGANerator -n 1 --tumor Breast-AdenoCa -r /reference/hs37d5.fa [--hg38]
# Singularity command
singularity exec -H $(pwd):/home \
-B /PATH_TO_HG19_DIR/:/reference \
-B /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
/PATH_TO/oncogan_simulating_v0.2.1.sif launcher.py \
vcfGANerator -n 1 --tumor Breast-AdenoCa -r /reference/hs37d5.fa [--hg38]
The options for the vcfGANerator function are:
vcfGANerator --help
# Command to simulate mutations (VCF), CNAs and SVs for different tumor types using a GAN model
# Options:
# -@, --cpus INTEGER Number of CPUs to use [default: 1]
# --tumor TEXT Tumor type to be simulated. Run 'availTumors'
# subcommand to check the list of available tumors that
# can be simulated [required]
# -n, --nCases INTEGER Number of cases to simulate [default: 1]
# --NinT FLOAT Normal in Tumor contamination to be taken into account when
# adjusting VAF for CNA-SV events (e.g. 0.20 = 20%) [default: 0.0]
# -r, --refGenome PATH hg19 reference genome in fasta format [required]
# --prefix TEXT Prefix to name the output. If not, '--tumor' option is
# used as prefix
# --outDir DIRECTORY Directory where save the simulations. Default is the
# current directory
# --hg38 Transform the mutations to hg38
# --mut / --no-mut Simulate mutations [default: mut]
# --CNA-SV / --no-CNA-SV Simulate CNA and SV events [default: CNA-SV]
# --plots / --no-plots Save plots [default: plots]
# --version Show the version and exit
# --help Show this message and exit
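The options above can be combined in a small loop to simulate several tumor types in one go. This is a sketch under stated assumptions rather than an official workflow: REF_DIR, MODELS_DIR, N_CASES and DRY_RUN are hypothetical shell variables, simulate_tumor is a hypothetical helper, and the -it flags are left out because the loop is meant to run non-interactively. By default the commands are only printed; set DRY_RUN=0 to actually launch the containers.

```shell
# Placeholders to adapt to your system (all variable names are hypothetical).
REF_DIR="${REF_DIR:-/PATH_TO_HG19_DIR}"
MODELS_DIR="${MODELS_DIR:-/PATH_TO_ONCOGAN_MODELS}"
N_CASES="${N_CASES:-5}"
DRY_RUN="${DRY_RUN:-1}"

# Build the docker command for one tumor type, then print or run it.
simulate_tumor() {
    local tumor_type="$1"
    # Store the command as positional parameters so that each argument
    # stays a separate word when it is executed.
    set -- docker run --rm -u "$(id -u):$(id -g)" \
        -v "$(pwd):/home" \
        -v "$REF_DIR/:/reference" \
        -v "$MODELS_DIR/:/oncoGAN/trained_models" \
        oicr/oncogan:simulating_v0.2.1 \
        vcfGANerator -n "$N_CASES" --tumor "$tumor_type" \
        -r /reference/hs37d5.fa
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

for t in Breast-AdenoCa Liver-HCC Lymph-CLL; do
    simulate_tumor "$t"
done
```

Printing the commands first makes it easy to confirm the mounted paths before committing to a long run.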
Tumors with custom profiles
To generate tumors with custom profiles, users can fill in the template, which contains a list of possible mutation types and signatures to simulate. If no CNA or SV events are required, the cna-sv profile can be set to -.
# Docker command
docker run --rm -u $(id -u):$(id -g) \
-v $(pwd):/home \
-v /PATH_TO_HG19_DIR/:/reference \
-v /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
-it oicr/oncogan:simulating_v0.2.1 \
vcfGANerator-custom --template /home/template_custom_simulation.csv -r /reference/hs37d5.fa [--hg38]
# Singularity command
singularity exec -H $(pwd):/home \
-B /PATH_TO_HG19_DIR/:/reference \
-B /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
/PATH_TO/oncogan_simulating_v0.2.1.sif launcher.py \
vcfGANerator-custom --template /home/template_custom_simulation.csv -r /reference/hs37d5.fa [--hg38]
The options for the vcfGANerator-custom function are:
vcfGANerator-custom --help
# Command to simulate mutations (VCF), CNAs and SVs for personalized tumors using a GAN model
# Options:
# -@, --cpus INTEGER Number of CPUs to use [default: 1]
# --template PATH File in CSV format with the number of each type of
# mutation to simulate for each donor (template
# available on GitHub) [required]
# -r, --refGenome PATH hg19 reference genome in fasta format [required]
# --outDir DIRECTORY Directory where save the simulations. Default is the
# current directory
# --hg38 Transform the mutations to hg38
# --CNA-SV / --no-CNA-SV Simulate CNA and SV events [default: CNA-SV]
# --plots / --no-plots Save plots [default: plots]
# --version Show the version and exit
# --help Show this message and exit
Among all the options offered by docker (docker run --help), we recommend:
- --rm: Automatically remove the container when it exits.
- -u, --user: Specify the user ID and its group ID. It's useful to not run the pipeline as root.
- -v, --volume: Mount local volumes in the container.
  - With the option -v $(pwd):/home/, OncoGAN results will be in your current directory.
- -i, --interactive: Keep STDIN open even if not attached.
- -t, --tty: Allocate a pseudo-TTY. When combined with -i it allows you to connect your terminal with the container terminal.
For singularity, the -H and -B options are analogous to the docker -v option.
More options
List of available tumors:
docker run --rm -it oicr/oncogan:simulating_v0.2.1 availTumors
# or
singularity exec /PATH_TO/oncogan_simulating_v0.2.1.sif launcher.py availTumors
# This is the list of available tumor types that can be simulated using OncoGAN:
# Breast-AdenoCa CNS-PiloAstro Eso-AdenoCa Kidney-RCC
# Liver-HCC Lymph-CLL Panc-Endocrine Prost-AdenoCA
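Since --tumor expects one of the names printed by availTumors, a script can check a name against that listing before starting a simulation. The helper below is a minimal sketch; tumor_in_list is a hypothetical name, and the match is exact and case-sensitive on purpose (whether vcfGANerator itself is case-sensitive is not stated here, so treat that as an assumption).

```shell
# Hypothetical helper: read an availTumors listing on stdin and succeed
# if the given name appears in it as a whole, case-sensitive word.
tumor_in_list() {
    # Drop carriage returns, put one word per line, then match the full line.
    tr -d '\r' | tr -s ' \t' '\n' | grep -Fxq -- "$1"
}

# Usage (the -it flags are omitted because the output is piped):
#   docker run --rm oicr/oncogan:simulating_v0.2.1 availTumors | tumor_in_list Liver-HCC \
#       && echo "Liver-HCC can be simulated"
```

Checking against the tool's own output avoids relying on a hand-typed list of tumor names.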
Train new models
Files used to train OncoGAN models can be found on HuggingFace and [Zotero](h
