AmpliconArchitect (AA)

AmpliconArchitect is best used through AmpliconSuite-pipeline

Installation instructions for AmpliconArchitect are provided here, but please use AmpliconSuite-pipeline to prepare the inputs, invoke AA, and classify the outputs.

Recent updates:

July 2023 updates:

1.3.r6 adds multiple new features and fixes:
- A --sv_vcf argument, which lets users augment AA's SV detection with their own SV calls provided in VCF format.
- Automated protection against improperly formatted inputs.
- Fewer bugs when AA is rerun into a directory that already contains files with the same sample name but generated from different input files.
- A bugfix for an edge case where AA did not properly expand a newly discovered interval if a discovered SV landed exactly on the endpoint of the explored interval.

March 2023 updates:

1.3.r5 provides better compatibility with the AmpliconSuite-pipeline Singularity image and versions of Mosek installed via pip/conda.
1.3.r4 adds a bugfix to coverage plotting, some code reorganization to provide a modest speedup (approx 20% in the average case), automatic testing of the MOSEK license status, and better handling of the coverage stats lookup file.

January 2023 update:

Version 1.3.r3 adds support for Mosek versions 9 and 10. Many thanks to the Mosek team for adding these changes (especially Michal Adamaszek). Our testing revealed that using different Mosek versions slightly changes AA copy number estimates (typical difference < 0.02 copies). 1.3.r3 also makes text in the PDF amplicon plots editable by storing it as text objects instead of text outlines (thank you to Kaiyuan Zhu for proposing this improvement), so adjusting font type and size on AA output figures is now much easier. This update also improves the cached coverage stats lookup and gives more control when using downsample.py manually.

Older update descriptions are available here.

Introduction

Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. Proposed mechanisms for focal amplification include extrachromosomal DNA (ecDNA) formation, the breakage-fusion-bridge (BFB) mechanism, tandem duplications, chromothripsis and others. Focally amplified regions are often hotspots for genomic rearrangements. As a result, a focally amplified region may undergo rapid copy number changes, and its structure may evolve over time, contributing to tumor evolution. Furthermore, ecDNA elements may reintegrate into the genome to form homogeneously staining regions (HSRs). The inter-cell heterogeneity in ecDNA copy number, as well as the interchangeability between ecDNA and HSRs, may allow the tumor to adapt to a changing environment, e.g. targeted drug application. Understanding the architecture of focal amplifications is therefore important for gaining insight into cancer biology. AmpliconArchitect (AA) is a tool that reconstructs the structure of focally amplified regions (>10kbp) in a cancer sample using short paired-end whole-genome sequencing data.

Please check out the detailed guide on running AA available here to learn about best practices and see some FAQs.

AmpliconArchitect was originally developed by Viraj Deshpande, and is maintained by Jens Luebeck, Viraj Deshpande, and others in Vineet Bafna's lab. A full description of the method can be found in the following publication:

Deshpande, V. et al., Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019). PMID: 30674876. (Article)

Recommended way to run AA: AmpliconSuite-pipeline

We provide an end-to-end wrapper that supports entry from any intermediate step, so users may start with fastqs or a bam file. The wrapper generates the CNV calls and amplicon seed regions before running AA. After invoking AA, AmpliconSuite-pipeline calls AmpliconClassifier to predict ecDNA status and other modes of focal amplification. AmpliconSuite-pipeline is available at https://github.com/AmpliconSuite/AmpliconSuite-pipeline.

Importantly, AmpliconSuite-pipeline uses all our recommended best practices, and simplifies both upstream preparation and downstream interpretation of results. We strongly recommend AmpliconSuite-pipeline be used to invoke AmpliconArchitect.

Singularity and Docker images containing AmpliconArchitect can be found on the AmpliconSuite-pipeline GitHub page.

Installation-free ways to use AA (via AmpliconSuite-pipeline):

- GenePattern Web Interface

In collaboration with the GenePattern team, AmpliconSuite-pipeline can now be used from your web browser. No tool installation required. Visit https://genepattern.ucsd.edu/ to register. After registering and signing in, search for the "AmpliconSuite" module.

- Nextflow

AmpliconSuite can also be run through Nextflow, using the nf-core/circdna pipeline constructed by Daniel Schreyer.

Installation

AA can be installed in three ways:

Conda installation of AmpliconSuite-pipeline (which includes AA and all recommended modules).
Obtain a containerized image of AmpliconSuite-pipeline (Docker or Singularity).
Manual installation from GitHub source code and manual management of dependencies.

Option 1: Conda

Follow the instructions here.

Option 2: Containerized images

Follow the instructions here.

Option 3: Standalone installation

AmpliconSuite-pipeline (including AmpliconArchitect and AmpliconClassifier modules) can be installed manually following the instructions here.

AmpliconArchitect can be installed as a standalone module using the instructions here. Note that this is not recommended, as it removes AA from the modules that prepare and filter the input bed file. Failure to properly filter inputs can lead to extreme runtimes and false-positive calls.

Setting up the AA data repo

This is required regardless of the installation option selected above.

To set up the annotations directory and the environment variable AA_DATA_REPO:

mkdir -p data_repo
echo export AA_DATA_REPO=$PWD/data_repo >> ~/.bashrc
# Load the new variable before using it; otherwise AA_DATA_REPO is still unset
source ~/.bashrc
cd $AA_DATA_REPO && touch coverage.stats && chmod a+r coverage.stats
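
As a quick sanity check after the setup above, something like the following sketch can confirm that the variable is set and that coverage.stats is readable. The function name check_aa_data_repo is a hypothetical helper introduced here for illustration; it is not part of AA or AmpliconSuite-pipeline.

```shell
# Hypothetical helper (not shipped with AA): verifies the data repo setup.
check_aa_data_repo() {
    # The environment variable must be set and non-empty.
    if [ -z "${AA_DATA_REPO:-}" ]; then
        echo "AA_DATA_REPO is not set" >&2
        return 1
    fi
    # It must point to an existing directory.
    if [ ! -d "$AA_DATA_REPO" ]; then
        echo "AA_DATA_REPO is not a directory: $AA_DATA_REPO" >&2
        return 1
    fi
    # coverage.stats must exist and be readable.
    if [ ! -r "$AA_DATA_REPO/coverage.stats" ]; then
        echo "coverage.stats is missing or unreadable in $AA_DATA_REPO" >&2
        return 1
    fi
    echo "AA data repo looks OK: $AA_DATA_REPO"
}
```

Calling check_aa_data_repo in a new terminal is a simple way to confirm that the export line in ~/.bashrc took effect.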

Download and uncompress AA data repo files matching the reference genome(s) needed. Data repo files are available here: https://datasets.genepattern.org/?prefix=data/module_support_files/AmpliconArchitect.

cd $AA_DATA_REPO
wget [url for data repo [hg19/GRCh37/GRCh38/mm10].tar.gz]
tar -xzf [hg19/GRCh37/GRCh38/mm10].tar.gz
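
The extraction step can also be wrapped in a small function that confirms the reference directory actually appeared. This is only a sketch: install_data_repo_tarball is a hypothetical name, and it assumes that an archive named <ref>.tar.gz unpacks into a directory named <ref>. If an archive unpacks under a different name, pass that name as the second argument.

```shell
# Hypothetical helper: unpack an already-downloaded data repo tarball
# into $AA_DATA_REPO and check that the expected directory exists.
# Usage: install_data_repo_tarball <tarball> [expected_directory_name]
install_data_repo_tarball() {
    aa_tarball="$1"
    # Assumption: <ref>.tar.gz unpacks into <ref>/ unless told otherwise.
    aa_ref="${2:-$(basename "$aa_tarball" .tar.gz)}"

    if [ -z "${AA_DATA_REPO:-}" ]; then
        echo "AA_DATA_REPO is not set" >&2
        return 1
    fi
    if [ ! -f "$aa_tarball" ]; then
        echo "No such file: $aa_tarball" >&2
        return 1
    fi

    # Extract directly into the data repo directory.
    tar -xzf "$aa_tarball" -C "$AA_DATA_REPO" || return 1

    if [ -d "$AA_DATA_REPO/$aa_ref" ]; then
        echo "Installed $aa_ref into $AA_DATA_REPO"
    else
        echo "Expected directory $AA_DATA_REPO/$aa_ref not found" >&2
        return 1
    fi
}
```

For example, after downloading GRCh38.tar.gz, running install_data_repo_tarball GRCh38.tar.gz would extract it and report whether $AA_DATA_REPO/GRCh38 exists.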

Available data repo annotations:

hg19
GRCh37
GRCh38 (hg38)
GRCh38_viral (includes oncoviral sequences)
mm10 (GRCm38)

On the data repo download page, the suffix "indexed" indicates that the BWA index is packaged as well. This is only needed if you also use the packaged fasta for alignment.

Running AmpliconArchitect

Please see the example commands here.

AmpliconArchitect output files and command-line arguments

Outputs

AA generates informative output at each step in the algorithm (details below):

Summary file: Lists the amplicons and their corresponding intervals.
SV view: A PNG/PDF image for each amplicon displaying all rearrangement signatures. Underlying data is provided in text format as intermediate files.
Graph file: For each amplicon, a text file describing the graph and predicted copy count.
Cycles file: For each amplicon, a text file describing the list of simple cycles predicted.
Cycle view: A web interface with operations for visualizing and modifying the simple cycles.

The user may provide intermediate files as a way to either kickstart AA from an intermediate step or to use alternative intermediate data (e.g. from external tools) for reconstruction.
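
To get a quick overview of what a run produced, a sketch like the one below counts the per-amplicon files for a given output prefix. The function name summarize_aa_outputs is hypothetical, and the file name patterns (<prefix>_summary.txt, <prefix>_amplicon<N>_graph.txt, <prefix>_amplicon<N>_cycles.txt) are assumptions that should be checked against the files from your own run.

```shell
# Hypothetical helper: report which AA output files exist for a prefix.
# The file name patterns below are assumptions, not confirmed by this README.
summarize_aa_outputs() {
    aa_prefix="$1"

    if [ -f "${aa_prefix}_summary.txt" ]; then
        echo "summary: ${aa_prefix}_summary.txt"
    else
        echo "summary: missing"
    fi

    # Count graph files; an unmatched glob stays literal, so test existence.
    aa_n_graph=0
    for aa_f in "${aa_prefix}"_amplicon*_graph.txt; do
        if [ -e "$aa_f" ]; then
            aa_n_graph=$((aa_n_graph + 1))
        fi
    done

    # Count cycles files the same way.
    aa_n_cycles=0
    for aa_f in "${aa_prefix}"_amplicon*_cycles.txt; do
        if [ -e "$aa_f" ]; then
            aa_n_cycles=$((aa_n_cycles + 1))
        fi
    done

    echo "graph files: $aa_n_graph"
    echo "cycles files: $aa_n_cycles"
}
```

A mismatch between the two counts may indicate an interrupted run, which is worth checking before passing the files to downstream tools.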

Required Arguments

| Argument | Type | Description |
| ---------- | ---- | ----------- |
