
OncoGAN

A pipeline that accurately simulates high-quality, publicly shareable synthetic cancer genomes (VCFs, CNAs and SVs) for eight tumor types: Breast-AdenoCa, CNS-PiloAstro, Eso-AdenoCa, Kidney-RCC, Liver-HCC, Lymph-CLL, Panc-Endocrine and Prost-AdenoCa. OncoGAN addresses current challenges in data accessibility and privacy, and also serves as a powerful tool for algorithm development and benchmarking.

In addition to this pipeline, we have released 200 simulated VCFs for each of the eight tumor types, available on HuggingFace and Zenodo.

Index

  1. Installation
  2. Generate synthetic VCFs
  3. Train new models
  4. DeepTumour
  5. Create tumor BAMs

Installation

We provide four container images with all dependencies installed, because the different modules have incompatible dependency versions:

  • Training -> Environment and scripts used to train OncoGAN models (CUDA)
  • Simulating -> Pipeline for synthetic tumor simulation (CPU only)
  • DeepTumour -> Algorithm developed to predict the tumor type of origin from somatic mutations (Ref)
  • fasta2bam -> Module to generate FASTQ/BAM files using OncoGAN's output

Because of their size, the models are not included in the Docker images and must be downloaded separately (see the Download models section below).

Docker

If Docker is not already installed on your system, please follow the official Docker installation instructions.

# Training
docker pull oicr/oncogan:training_v0.2

# Simulating
docker pull oicr/oncogan:simulating_v0.2.1

# DeepTumour
docker pull ghcr.io/lincolnsteinlab/deeptumour:3.0

# fasta2bam
docker pull oicr/oncogan:fasta2bam_v0.1

Singularity

If Singularity is not already installed on your system, please follow the official Singularity installation instructions.

# Training
singularity pull docker://oicr/oncogan:training_v0.2

# Simulating
singularity pull docker://oicr/oncogan:simulating_v0.2.1

# DeepTumour
singularity pull docker://ghcr.io/lincolnsteinlab/deeptumour:3.0

# fasta2bam
singularity pull docker://oicr/oncogan:fasta2bam_v0.1
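If you need all four images, a loop like the following can pull them in one go. This is a minimal sketch: SIF_DIR and DRY_RUN are hypothetical variables introduced here, the image tags are copied from the commands above, and by default it only prints the commands it would run. singularity pull names each file after the image and tag (e.g. oncogan_simulating_v0.2.1.sif), which matches the file name used in the Singularity commands below.

```bash
#!/usr/bin/env bash
# Sketch: pull every OncoGAN-related image as a Singularity .sif file.
# SIF_DIR and DRY_RUN are hypothetical variables introduced for this example.
set -euo pipefail

SIF_DIR="${SIF_DIR:-$PWD/sif}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually pull

IMAGES=(
  "docker://oicr/oncogan:training_v0.2"
  "docker://oicr/oncogan:simulating_v0.2.1"
  "docker://ghcr.io/lincolnsteinlab/deeptumour:3.0"
  "docker://oicr/oncogan:fasta2bam_v0.1"
)

pull_all() {
  local uri
  for uri in "${IMAGES[@]}"; do
    if [[ "$DRY_RUN" == "1" ]]; then
      echo "singularity pull --dir $SIF_DIR $uri"
    else
      mkdir -p "$SIF_DIR"
      # --dir stores the .sif in SIF_DIR instead of the current directory.
      singularity pull --dir "$SIF_DIR" "$uri"
    fi
  done
}

pull_all
```

Set DRY_RUN=0 to perform the downloads.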

Download models

The trained OncoGAN models for the eight tumor types, as well as the DeepTumour models, are available on HuggingFace and Zenodo.

Generate synthetic VCFs

OncoGAN needs two external inputs to simulate new samples:

  1. The directory containing the OncoGAN models downloaded previously
  2. An hg19 reference genome in FASTA format whose chromosome names have no chr prefix (e.g. 1 instead of chr1)
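Many hg19 FASTA files (for example the UCSC build) name chromosomes chr1, chr2 and so on, so it can be worth checking the headers before running the pipeline. The helpers below are a hypothetical sketch, not part of OncoGAN. Note that stripping the prefix does not turn a UCSC hg19 file into a GRCh37-style reference such as hs37d5.fa, the file used in the examples below (UCSC uses chrM where GRCh37 uses MT, and the extra contigs differ), so using a GRCh37-style FASTA directly is the safer option. If the FASTA has a .fai index, regenerate it after renaming.

```bash
#!/usr/bin/env bash
# Sketch: check whether an hg19 FASTA uses the "chr" prefix and, if needed,
# write a copy without it. has_chr_prefix and strip_chr_prefix are hypothetical
# helpers for this example, not OncoGAN commands.
set -euo pipefail

# Succeeds (exit 0) if any sequence header starts with ">chr".
has_chr_prefix() {
  grep -q '^>chr' "$1"
}

# Copy FASTA $1 to $2, removing a leading "chr" from header lines only;
# sequence lines never start with ">" and are left untouched.
strip_chr_prefix() {
  sed 's/^>chr/>/' "$1" > "$2"
}
```

For example: has_chr_prefix hg19.fa && strip_chr_prefix hg19.fa hg19.nochr.fa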

The output for each donor is a VCF file (mutations), two TSV files (CNAs and SVs) and a PNG (CNA+SV plot). Because the PCAWG dataset used for training is based on the hg19 genome build, the simulated mutations are reported in hg19 coordinates as well. The integrated LiftOver step (--hg38 option) can be used to convert them to hg38.

Tumors with real profiles

# Docker command
docker run --rm -u $(id -u):$(id -g) \
           -v $(pwd):/home \
           -v /PATH_TO_HG19_DIR/:/reference \
           -v /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
           -it oicr/oncogan:simulating_v0.2.1 \
           vcfGANerator -n 1 --tumor Breast-AdenoCa -r /reference/hs37d5.fa [--hg38]

# Singularity command
singularity exec -H $(pwd):/home \
            -B /PATH_TO_HG19_DIR/:/reference \
            -B /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
            /PATH_TO/oncogan_simulating_v0.2.1.sif launcher.py \
            vcfGANerator -n 1 --tumor Breast-AdenoCa -r /reference/hs37d5.fa [--hg38]
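To simulate several tumor types in one run, the Docker command above can be wrapped in a loop. This is a sketch under stated assumptions: HG19_DIR, MODELS_DIR, N_CASES, CPUS and DRY_RUN are hypothetical variables, the tumor list is the one reported by availTumors, and only the -@, -n, --tumor, -r and --outDir options of vcfGANerator are used. With DRY_RUN=1 (the default) it only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: simulate N donors for each of the eight tumor types with vcfGANerator.
# HG19_DIR, MODELS_DIR, N_CASES, CPUS and DRY_RUN are hypothetical variables;
# replace the placeholder paths with your own.
set -euo pipefail

HG19_DIR="${HG19_DIR:-/PATH_TO_HG19_DIR}"
MODELS_DIR="${MODELS_DIR:-/PATH_TO_ONCOGAN_MODELS}"
N_CASES="${N_CASES:-10}"
CPUS="${CPUS:-4}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = run them

TUMORS=(Breast-AdenoCa CNS-PiloAstro Eso-AdenoCa Kidney-RCC
        Liver-HCC Lymph-CLL Panc-Endocrine Prost-AdenoCa)

simulate_all() {
  local tumor
  for tumor in "${TUMORS[@]}"; do
    # No -it: a scripted loop usually runs without a terminal attached.
    local -a cmd=(docker run --rm -u "$(id -u):$(id -g)"
      -v "$PWD:/home"
      -v "$HG19_DIR:/reference"
      -v "$MODELS_DIR:/oncoGAN/trained_models"
      oicr/oncogan:simulating_v0.2.1
      vcfGANerator -@ "$CPUS" -n "$N_CASES" --tumor "$tumor"
      -r /reference/hs37d5.fa --outDir "/home/$tumor")
    if [[ "$DRY_RUN" == "1" ]]; then
      echo "${cmd[*]}"
    else
      mkdir -p "$tumor"   # host side of /home/$tumor
      "${cmd[@]}"
    fi
  done
}

simulate_all
```

-it is dropped because docker run -t fails with "the input device is not a TTY" when no terminal is attached (e.g. under cron or a job scheduler). Each tumor type goes to its own subdirectory, created on the host beforehand in case vcfGANerator does not create --outDir itself.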

The options for the vcfGANerator function are:

vcfGANerator --help

# Command to simulate mutations (VCF), CNAs and SVs for different tumor types using a GAN model

# Options:
#   -@, --cpus INTEGER      Number of CPUs to use  [default: 1]
#   --tumor TEXT            Tumor type to be simulated. Run 'availTumors'
#                           subcommand to check the list of available tumors that
#                           can be simulated  [required]
#   -n, --nCases INTEGER    Number of cases to simulate  [default: 1]
#   --NinT FLOAT            Normal in Tumor contamination to be taken into account when 
#                           adjusting VAF for CNA-SV events (e.g. 0.20 = 20%) [default: 0.0]
#   -r, --refGenome PATH    hg19 reference genome in fasta format  [required]
#   --prefix TEXT           Prefix to name the output. If not, '--tumor' option is
#                           used as prefix
#   --outDir DIRECTORY      Directory where save the simulations. Default is the
#                           current directory
#   --hg38                  Transform the mutations to hg38
#   --mut / --no-mut        Simulate mutations  [default: mut]
#   --CNA-SV / --no-CNA-SV  Simulate CNA and SV events  [default: CNA-SV]
#   --plots / --no-plots    Save plots  [default: plots]
#   --version               Show the version and exit
#   --help                  Show this message and exit
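The --NinT option accounts for normal cells mixed into the tumor sample. As a point of reference, a standard purity-adjusted expression for the expected VAF of a somatic variant is shown below, where c is the normal contamination (the --NinT value), C_T is the total copy number in tumor cells at the locus and m is the number of mutated copies, assuming normal cells are diploid and do not carry the variant. This is a textbook approximation for illustration and not necessarily the exact formula OncoGAN implements.

```latex
\mathrm{VAF} = \frac{(1-c)\,m}{(1-c)\,C_T + 2c}

% Worked example with c = 0.20 (--NinT 0.20)
% diploid locus (C_T = 2), heterozygous variant (m = 1):
\mathrm{VAF} = \frac{0.8 \cdot 1}{0.8 \cdot 2 + 2 \cdot 0.2} = \frac{0.8}{2.0} = 0.40
% one-copy gain (C_T = 3), variant on one copy (m = 1):
\mathrm{VAF} = \frac{0.8 \cdot 1}{0.8 \cdot 3 + 2 \cdot 0.2} = \frac{0.8}{2.8} \approx 0.29
```

With --NinT 0.0 (the default), the same expression gives a VAF of 0.5 for a heterozygous variant in a diploid region.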

Tumors with custom profiles

To generate tumors with custom profiles, fill in the template (template_custom_simulation.csv, available on GitHub), which lists the mutation types and signatures that can be simulated. If no CNAs or SVs are needed, set the cna-sv profile to -.

# Docker command
docker run --rm -u $(id -u):$(id -g) \
           -v $(pwd):/home \
           -v /PATH_TO_HG19_DIR/:/reference \
           -v /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
           -it oicr/oncogan:simulating_v0.2.1 \
           vcfGANerator-custom --template /home/template_custom_simulation.csv -r /reference/hs37d5.fa [--hg38]

# Singularity command
singularity exec -H $(pwd):/home \
            -B /PATH_TO_HG19_DIR/:/reference \
            -B /PATH_TO_ONCOGAN_MODELS/:/oncoGAN/trained_models \
            /PATH_TO/oncogan_simulating_v0.2.1.sif launcher.py \
            vcfGANerator-custom --template /home/template_custom_simulation.csv -r /reference/hs37d5.fa [--hg38]

The options for the vcfGANerator-custom function are:

vcfGANerator-custom --help

# Command to simulate mutations (VCF), CNAs and SVs for personalized tumors using a GAN model

# Options:
#   -@, --cpus INTEGER      Number of CPUs to use  [default: 1]
#   --template PATH         File in CSV format with the number of each type of
#                           mutation to simulate for each donor (template
#                           available on GitHub)  [required]
#   -r, --refGenome PATH    hg19 reference genome in fasta format  [required]
#   --outDir DIRECTORY      Directory where save the simulations. Default is the
#                           current directory
#   --hg38                  Transform the mutations to hg38
#   --CNA-SV / --no-CNA-SV  Simulate CNA and SV events  [default: CNA-SV]
#   --plots / --no-plots    Save plots  [default: plots]
#   --version               Show the version and exit
#   --help                  Show this message and exit

Among all the options offered by docker (docker run --help), we recommend:

  • --rm: Automatically remove the container when it exits.
  • -u, --user: Specify the user ID and group ID. This avoids running the pipeline as root, and the output files will be owned by your user.
  • -v, --volume: Mount local volumes in the container.
    • With the option -v $(pwd):/home/, OncoGAN results will be in your current directory.
  • -i, --interactive: Keep STDIN open even if not attached.
  • -t, --tty: Allocate a pseudo-TTY. When combined with -i it allows you to connect your terminal with the container terminal.

For Singularity, the -H (--home) and -B (--bind) options are analogous to Docker's -v option.
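The recommended options can be bundled into a small shell function so that the same call works with either engine. This is a hypothetical helper (oncogan_run, ENGINE, SIF, HG19_DIR, MODELS_DIR and DRY_RUN are names introduced for this sketch); it mirrors the commands shown earlier, mapping Docker's -v mounts to Singularity's -H and -B options, and prints the command instead of running it while DRY_RUN=1.

```bash
#!/usr/bin/env bash
# Sketch: run an OncoGAN "simulating" subcommand with Docker or Singularity.
# oncogan_run, ENGINE, SIF, HG19_DIR, MODELS_DIR and DRY_RUN are hypothetical.
set -euo pipefail

ENGINE="${ENGINE:-docker}"   # docker | singularity
SIF="${SIF:-/PATH_TO/oncogan_simulating_v0.2.1.sif}"
HG19_DIR="${HG19_DIR:-/PATH_TO_HG19_DIR}"
MODELS_DIR="${MODELS_DIR:-/PATH_TO_ONCOGAN_MODELS}"
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the command only

oncogan_run() {
  local -a cmd
  case "$ENGINE" in
    docker)
      # --rm: remove the container on exit; -u: do not run as root;
      # -v: mount host directories into the container.
      cmd=(docker run --rm -u "$(id -u):$(id -g)"
           -v "$PWD:/home" -v "$HG19_DIR:/reference"
           -v "$MODELS_DIR:/oncoGAN/trained_models"
           oicr/oncogan:simulating_v0.2.1 "$@") ;;
    singularity)
      # -H sets the home directory; -B binds additional host directories.
      cmd=(singularity exec -H "$PWD:/home"
           -B "$HG19_DIR:/reference"
           -B "$MODELS_DIR:/oncoGAN/trained_models"
           "$SIF" launcher.py "$@") ;;
    *)
      echo "unknown ENGINE: $ENGINE" >&2
      return 1 ;;
  esac
  if [[ "$DRY_RUN" == "1" ]]; then echo "${cmd[*]}"; else "${cmd[@]}"; fi
}
```

For example, after sourcing the function, ENGINE=singularity DRY_RUN=0 oncogan_run availTumors runs the tumor listing through Singularity. -it is omitted so the helper also works in scripts; add it for interactive use with Docker.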

More options

List of available tumors:

docker run --rm -it oicr/oncogan:simulating_v0.2.1 availTumors

# or 

singularity exec /PATH_TO/oncogan_simulating_v0.2.1.sif launcher.py availTumors

# This is the list of available tumor types that can be simulated using OncoGAN:
# Breast-AdenoCa          CNS-PiloAstro           Eso-AdenoCa             Kidney-RCC              
# Liver-HCC               Lymph-CLL               Panc-Endocrine          Prost-AdenoCa
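When scripting over several tumor types, it can help to parse this listing rather than hard-code it. The two helpers below are a hypothetical sketch that assumes the output layout shown above (a header line followed by whitespace-separated names, with or without the leading # used in this README).

```bash
#!/usr/bin/env bash
# Sketch: turn the availTumors listing into one tumor name per line and check a
# name before launching a long simulation. tumor_list and is_available_tumor
# are hypothetical helpers, not OncoGAN commands.
set -euo pipefail

# Read availTumors output on stdin; print one tumor type per line.
tumor_list() {
  sed -e '/list of available/d' -e 's/^#//' |  # drop header and README "#"
    tr -d '\r' |                               # strip CRs from TTY output
    tr -s ' \t' '\n' |                         # one token per line
    sed '/^$/d'                                # drop empty lines
}

# Succeeds if $1 exactly matches a name in the listing given on stdin.
# grep without -q reads all input, avoiding SIGPIPE under pipefail.
is_available_tumor() {
  tumor_list | grep -x -- "$1" > /dev/null
}
```

For example: docker run --rm oicr/oncogan:simulating_v0.2.1 availTumors | tumor_list (without -it, since the output is piped).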

Train new models

Files used to train OncoGAN models can be found on HuggingFace and Zenodo.
