AmpliconArchitect
AmpliconArchitect (AA) is a tool that identifies one or more connected genomic regions with simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next-generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for the GRCh38 human reference genome coming up soon.
AmpliconArchitect (AA)
AmpliconArchitect is best used through AmpliconSuite-pipeline
Installation instructions for AmpliconArchitect are provided here, but to prepare the inputs, invoke AA and classify the outputs, please use AmpliconSuite-pipeline.
Recent updates:
July 2023 update:
1.3.r6 adds multiple new features:
- --sv_vcf argument, which allows users to augment AA's SV detection with their own SV calls provided in VCF format.
- Automated protection against improperly formatted inputs.
- Reduces bugs created when AA is rerun into the same directory with existing files having the same sample name but different input files.
- Bugfix for an edge case where AA does not properly expand a newly discovered interval if a discovered SV lands exactly on the endpoint of the explored interval.
March 2023 updates:
- 1.3.r5 provides better compatibility with the AmpliconSuite-pipeline Singularity image and versions of Mosek installed via pip/conda.
- 1.3.r4 adds a bugfix to coverage plotting, some code reorganization to provide a modest speedup (approx 20% in the average case), automatic testing of the MOSEK license status, and better handling of the coverage stats lookup file.
January 2023 update:
Version 1.3.r3 adds support for Mosek versions 9 and 10. Many thanks to the Mosek team for adding these changes (especially Michal Adamaszek). Our testing revealed that using different Mosek versions slightly changes AA copy number estimates (typical difference < 0.02 copies).
1.3.r3 makes text in the PDF amplicon plots editable by storing it as a text object instead of a text outline (thank you to Kaiyuan Zhu for proposing this improvement).
Adjusting font type and size on AA output figures is now much easier.
This update also adds improvements to cached coverage stats lookup and more control when using downsample.py manually.
Older update descriptions are available here.
Introduction
Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. Proposed mechanisms for focal amplification include extrachromosomal DNA (ecDNA) formation, the breakage-fusion-bridge (BFB) mechanism, tandem duplications, chromothripsis and others. Focally amplified regions are often hotspots for genomic rearrangements. As a result, a focally amplified region may undergo rapid copy number changes, and its structure may evolve over time, contributing to tumor evolution. Furthermore, ecDNA elements may reintegrate into the genome to form homogeneously staining regions (HSRs). The inter-cell heterogeneity in ecDNA copy number, as well as the interchangeability between ecDNA and HSRs, may allow the tumor to adapt to a changing environment, e.g. targeted drug application. Understanding the architecture of focal amplifications is therefore important for gaining insight into cancer biology. AmpliconArchitect (AA) is a tool that reconstructs the structure of focally amplified regions (>10kbp) in a cancer sample using short-read, paired-end whole-genome sequencing data.
Please check out the detailed guide on running AA available here to learn about best practices and see some FAQs.
AmpliconArchitect was originally developed by Viraj Deshpande, and is maintained by Jens Luebeck, Viraj Deshpande, and others in Vineet Bafna's lab. A full description of the method can be found in the following publication:
Deshpande, V. et al., Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019). PMID: 30674876. (Article)
Table of contents:
- AmpliconSuite-pipeline
- Installation
- Usage
- AA outputs and arguments
- The AA Algorithm
- Checkpointing and modular integration with other tools
Recommended way to run AA: AmpliconSuite-pipeline
We provide an end-to-end wrapper, which supports entry from any intermediate step, so users may start with fastqs, or a bam file, and the wrapper enables generation of the CNV calls and amplicon seed regions before running AA. After invoking AA, AmpliconSuite-pipeline calls AmpliconClassifier to enable predictions of ecDNA status, and other modes of focal amplification. AmpliconSuite-pipeline is available at https://github.com/AmpliconSuite/AmpliconSuite-pipeline.
Importantly, AmpliconSuite-pipeline uses all our recommended best practices, and simplifies both upstream preparation and downstream interpretation of results. We strongly recommend AmpliconSuite-pipeline be used to invoke AmpliconArchitect.
Singularity and Docker images containing AmpliconArchitect can be found on the AmpliconSuite-pipeline GitHub page
Installation-free ways to use AA (via AmpliconSuite-pipeline):
- GenePattern Web Interface
In collaboration with the GenePattern team, AmpliconSuite-pipeline can now be used from your web browser. No tool installation required. Visit https://genepattern.ucsd.edu/ to register. After registering and signing in, search for the "AmpliconSuite" module.
- Nextflow
AmpliconSuite can also be run through Nextflow, using the nf-core/circdna pipeline constructed by Daniel Schreyer.
Installation
AA can be installed in three ways:
- Conda installation of AmpliconSuite-pipeline (which includes AA and all recommended modules).
- Obtain a containerized image of AmpliconSuite-pipeline (Docker or Singularity).
- Manual installation from GitHub source code and manual management of dependencies.
Option 1: Conda
Option 2: Containerized images
Option 3: Standalone installation
AmpliconSuite-pipeline (including AmpliconArchitect and AmpliconClassifier modules) can be installed manually following the instructions here.
AmpliconArchitect can be installed as a standalone module using the instructions here. Note that this is not recommended, as it removes AA from the modules that prepare and filter the input bed file. Failure to properly filter inputs can lead to extreme runtimes and false-positive calls.
Setting up the AA data repo
This is required regardless of the installation option selected above
- To set the annotations directory and the AA_DATA_REPO environment variable:
mkdir -p data_repo
echo export AA_DATA_REPO=$PWD/data_repo >> ~/.bashrc
source ~/.bashrc
cd $AA_DATA_REPO && touch coverage.stats && chmod a+r coverage.stats
- Download and uncompress AA data repo files matching the reference genome(s) needed. Data repo files are available here: https://datasets.genepattern.org/?prefix=data/module_support_files/AmpliconArchitect.
cd $AA_DATA_REPO
wget [url for data repo [hg19/GRCh37/GRCh38/mm10].tar.gz]
tar -xzf [hg19/GRCh37/GRCh38/mm10].tar.gz
Available data repo annotations:
- hg19
- GRCh37
- GRCh38 (hg38)
- GRCh38_viral (includes oncoviral sequences)
- mm10 (GRCm38)
On the data repo download page, the suffix "indexed" indicates that the BWA index is packaged as well. The index is only needed if you also use the packaged fasta for alignment.
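The setup steps above can be sanity-checked with a short script. The following is a minimal sketch and not part of AA: the function name `check_data_repo` is hypothetical, and it assumes that each downloaded archive unpacks into a directory named after its reference (e.g. `GRCh38`) directly inside the data repo.

```python
import os
from pathlib import Path

# Reference names listed in this README. Assumption: each archive unpacks
# into a directory of the same name inside the data repo.
KNOWN_REFS = ["hg19", "GRCh37", "GRCh38", "GRCh38_viral", "mm10"]


def check_data_repo(repo_path):
    """Return (problems, installed_refs) for a candidate AA data repo path."""
    if not repo_path:
        return ["AA_DATA_REPO is not set"], []
    repo = Path(repo_path)
    if not repo.is_dir():
        return [f"{repo} is not a directory"], []

    problems = []
    stats = repo / "coverage.stats"
    if not stats.is_file():
        problems.append("coverage.stats is missing")
    elif not os.access(stats, os.R_OK):
        problems.append("coverage.stats is not readable")

    # Report which of the known references appear to be unpacked.
    installed = [ref for ref in KNOWN_REFS if (repo / ref).is_dir()]
    if not installed:
        problems.append("no reference annotation directories found")
    return problems, installed


if __name__ == "__main__":
    problems, installed = check_data_repo(os.environ.get("AA_DATA_REPO"))
    print("installed references:", ", ".join(installed) or "none")
    for problem in problems:
        print("problem:", problem)
```

Running the script in a fresh shell (so that `~/.bashrc` has been re-read) is a quick way to confirm that the environment variable persisted.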
Running AmpliconArchitect
Please see the example commands here.
AmpliconArchitect output files and command-line arguments
Outputs
AA generates informative output at each step in the algorithm (details below):
- Summary file: A list of amplicons and their corresponding intervals.
- SV view: A PNG/PDF image for each amplicon displaying all rearrangement signatures. Underlying data is provided in text format as intermediate files.
- Graph file: For each amplicon, a text file describing the graph and predicted copy count.
- Cycles file: For each amplicon, a text file describing the list of predicted simple cycles.
- Cycle view: A web interface with operations for visualizing and modifying the simple cycles.
The user may provide intermediate files as a way to either kickstart AA from an intermediate step or to use alternative intermediate data (e.g. from external tools) for reconstruction.
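When post-processing a run, it can help to group the per-amplicon outputs listed above and check that none are missing. The sketch below is illustrative only. It assumes file names of the form `<prefix>_amplicon<N>_graph.txt`, `<prefix>_amplicon<N>_cycles.txt` and `<prefix>_amplicon<N>.png`/`.pdf`, which this section does not spell out and which should be checked against your own output directory. The helper names `group_outputs` and `missing_outputs` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern (verify against your own AA output directory):
#   <prefix>_amplicon<N>_graph.txt, <prefix>_amplicon<N>_cycles.txt,
#   <prefix>_amplicon<N>.png, <prefix>_amplicon<N>.pdf
AMPLICON_FILE = re.compile(
    r"^(?P<prefix>.+)_amplicon(?P<num>\d+)"
    r"(?:_(?P<kind>graph|cycles)\.txt|\.(?P<image>png|pdf))$"
)


def group_outputs(filenames):
    """Map amplicon number -> {kind: filename} for names matching the assumed pattern."""
    grouped = defaultdict(dict)
    for name in filenames:
        match = AMPLICON_FILE.match(name)
        if match is None:
            continue  # the summary file and unrecognised names are skipped
        kind = match.group("kind") or match.group("image")
        grouped[int(match.group("num"))][kind] = name
    return dict(grouped)


def missing_outputs(grouped, required=("graph", "cycles")):
    """List (amplicon number, kind) pairs whose expected file is absent."""
    return [
        (num, kind)
        for num in sorted(grouped)
        for kind in required
        if kind not in grouped[num]
    ]
```

For a real run, the file names could come from `os.listdir(output_dir)`, where `output_dir` is the directory AA wrote to. Anything `missing_outputs` returns indicates an amplicon whose graph or cycles file was not found under the assumed naming.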
Required Arguments
| Argument | Type | Description | | ---------- | ---- |--
