
MACHINA - Metastatic And Clonal History INtegrative Analysis

MACHINA is a computational framework for inferring migration patterns between a primary tumor and metastases using DNA sequencing data.

[Figure: Overview of MACHINA]

Contents

  1. Installation
  2. Usage instructions

<a name="installation"></a>

Installation

<a name="bioconda"></a>

bioconda

  1. Install Anaconda or Miniconda if you do not already have one installed.
  2. (recommended) Create a new conda environment for machina and activate it:
         conda create -n machina
         conda activate machina
  3. Set up conda channels for bioconda (once per Anaconda/Miniconda installation):
         conda config --add channels defaults
         conda config --add channels bioconda
         conda config --add channels conda-forge
  4. Install machina from bioconda:
         conda install machina

<a name="compilation"></a>

Manual compilation

Note that binaries for macOS and Linux are available here. These binaries require a valid Gurobi installation and license key. The location of the license file can be specified via the environment variable GRB_LICENSE_FILE. In addition, if Gurobi is installed in a non-standard location, LD_LIBRARY_PATH (Linux) or DYLD_LIBRARY_PATH (macOS) must be updated accordingly.

Also note that to run the below examples, you must either provide the full path to the executable (e.g., /path/to/machina/build/pmh_sankoff) or add the build directory to your PATH.

<a name="dep"></a>

Dependencies

MACHINA is written in C++11 and thus requires a modern C++ compiler (GCC >= 4.8.1, or Clang). In addition, MACHINA depends on several external tools and libraries, including CMake, LEMON and Gurobi.

Graphviz is required to visualize the resulting DOT files, but is not required for compilation.

Gurobi is a commercial ILP solver with two licensing options: (1) a single-host license where the license is tied to a single computer and (2) a network license for use in a compute cluster. Both options are freely available for users in academia.

If Doxygen is available, extended source code documentation will be generated.

<a name="comp"></a>

Compilation

To compile MACHINA, execute the following commands from the root of the repository:

$ mkdir build
$ cd build
$ cmake ..
$ make

In case CMake fails to detect LEMON or Gurobi, run the following command with adjusted paths:

$ cmake -DLIBLEMON_ROOT=~/lemon \
-DGUROBI_HOME=/path/to/gurobiXXX

where XXX is the 3-digit version of Gurobi.

The compilation results in the following files in the build directory:

COMMAND | DESCRIPTION
-----------|-------------
cluster | Clusters mutations using a combinatorial algorithm that models variant read counts using a binomial distribution.
generatemigrationtrees | Generates all migration trees given anatomical site labels. These migration trees can be used to constrain the search space of the pmh, pmh_tr and pmh_ti algorithms.
generatemutationtrees | Generates all mutation trees given a frequency matrix.
pmh_sankoff | Enumerates all minimum-migration vertex labelings given a clone tree.
pmh | Solves the Parsimonious Migration History (PMH) problem given a migration pattern restriction and a clone tree.
pmh_tr | Solves the Parsimonious Migration History with Polytomy Resolution (PMH-PR) problem given a migration pattern restriction and a clone tree.
pmh_ti | Solves the Parsimonious Migration History and Tree Inference (PMH-TI) problem given a migration pattern restriction, a mutation tree and a frequency matrix.
simulate | Simulates a metastatic tumor.
visualizeclonetree | Visualizes a clone tree and optional vertex labeling.
visualizemigrationgraph | Visualizes the migration graph given a clone tree and vertex labeling.

<a name="usage"></a>

Usage instructions

<a name="io"></a>

I/O formats

Below we describe the various formats used by the algorithms of the MACHINA framework.

<a name="clonetree"></a>

Clone tree

A clone tree is provided as an edge list. Each line specifies an edge by listing the labels of the incident vertices separated by a space or tab character. For example:

A A1
A A2
A A3
A A4
A A5
A A6
...

See patient1.tree for the complete clone tree.
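To make the edge-list format concrete, here is a minimal Python sketch of a reader for it. It is not part of MACHINA; the function name `parse_clone_tree` is hypothetical, and the sketch assumes that the first label on each line is the parent and the second is the child, as the example above suggests.

```python
def parse_clone_tree(text):
    """Parse an edge list into (children map, roots, leaves).

    Assumption: each non-empty line is 'parent child', separated by
    spaces or tabs.
    """
    children = {}
    has_parent = set()
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"expected two labels per line, got: {line!r}")
        parent, child = fields
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
        has_parent.add(child)
    roots = [v for v in children if v not in has_parent]
    leaves = [v for v in children if not children[v]]
    return children, roots, leaves


example = "A A1\nA A2\nA\tA3\n"
children, roots, leaves = parse_clone_tree(example)
print(roots, leaves)  # prints ['A'] ['A1', 'A2', 'A3']
```

A well-formed clone tree should yield exactly one root; more than one entry in `roots` suggests a disconnected or malformed edge list.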

<a name="leaflabeling"></a>

Leaf labeling

A leaf labeling assigns an anatomical site label to each leaf of a clone tree. Each line contains two values, the leaf label and the anatomical site label separated by a space or tab character. For example:

A1 Om
A2 SBwl
A3 LFTB
A4 LOv
A5 ApC
A6 RFTA
...

See patient1.labeling for the complete leaf labeling.
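As an illustration of how a leaf labeling relates to a clone tree, the following Python sketch reads a labeling and reports leaves that lack an anatomical site label. The helper names `parse_labeling` and `unlabeled_leaves` are hypothetical, and the leaf list is toy data rather than the contents of patient1.tree.

```python
def parse_labeling(text):
    """Parse 'vertex site' lines (space or tab separated) into a dict."""
    labeling = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"expected two values per line, got: {line!r}")
        vertex, site = fields
        labeling[vertex] = site
    return labeling


def unlabeled_leaves(leaves, labeling):
    """Return the leaves that have no anatomical site label."""
    return [leaf for leaf in leaves if leaf not in labeling]


labeling = parse_labeling("A1 Om\nA2 SBwl\nA3\tLFTB\n")
# Toy leaf list; A4 is deliberately left without a label.
print(unlabeled_leaves(["A1", "A2", "A3", "A4"], labeling))  # prints ['A4']
```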

<a name="vertexlabeling"></a>

Vertex labeling

A vertex labeling assigns an anatomical site label to each vertex of a clone tree (including the leaves). Each line contains two values, the vertex label and the anatomical site label separated by a space or tab character. For example:

A ROv
B SBwl
D ROv
F ROv
H ROv
A1 Om
A2 SBwl
...

See patient1.reported.labeling for the complete vertex labeling.
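A vertex labeling determines which tree edges are migrations: an edge whose two endpoints carry different anatomical site labels. The Python sketch below counts these edges per (source site, target site) pair. It is an illustration under that reading, not MACHINA's implementation; the name `migration_edges` and the small tree are hypothetical.

```python
from collections import Counter


def migration_edges(tree_edges, labeling):
    """Count tree edges whose endpoints are labeled by different sites.

    tree_edges: iterable of (parent, child) vertex labels.
    labeling:   dict mapping every vertex to an anatomical site label.
    Returns a Counter keyed by (source_site, target_site).
    """
    migrations = Counter()
    for parent, child in tree_edges:
        source, target = labeling[parent], labeling[child]
        if source != target:
            migrations[(source, target)] += 1
    return migrations


# Toy tree and labeling (not the patient1 data).
edges = [("A", "A1"), ("A", "A2"), ("A", "B"), ("B", "B1")]
labels = {"A": "ROv", "A1": "Om", "A2": "SBwl", "B": "ROv", "B1": "Om"}
result = migration_edges(edges, labels)
print(sum(result.values()))  # prints 3
```

The keys of the returned counter are the edges of the migration graph, and the counts are their multiplicities.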

Frequencies

A frequency file encodes the frequency of every mutation (cluster) in each anatomical site (sample). It is a tab-separated file. The first three lines give the number of anatomical sites, the number of samples and the number of mutations, respectively. The fourth line is ignored, but describes the format of the rest of the file. Each subsequent line encodes the cell frequency of a mutation in a sample: the 0-based index of the sample, the label of the sample, the 0-based index of the anatomical site, the anatomical site label, the 0-based index of the mutation, the label of the mutation, and the frequency lower bound and upper bound.

6 #anatomical sites							
6 #samples							
10 #mutation clusters							
#sample_index	sample_label	anatomical_site_index	anatomical_site_label	character_index	character_label	f_lb	f_ub
0	breast	0	breast	0	1	0.503628522	0.545237495
0	breast	0	breast	1	2	0	0.01213794
...

See F.tsv for the complete frequency file. For an example of how to obtain this file from read data, please see process_A7_new.ipynb. Specifically, you will need to process the bulk DNA sequencing data by first calling single-nucleotide variants (SNVs) and copy-number aberrations. Then, SNVs that occur in copy-neutral regions need to be clustered (e.g., using SciClone or PyClone). Confidence intervals can then be obtained by first pooling, for each sample, the read counts of the mutations that belong to the same cluster, and then using a beta distribution. Please see the supplement of the MACHINA paper for more details.
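The layout of the frequency file can be sketched as a small Python reader. This is an illustrative parser rather than MACHINA's own. The names `FrequencyRow` and `parse_frequencies` are hypothetical, and the check that bounds satisfy 0 <= f_lb <= f_ub <= 1 is an added sanity check that the format description does not state explicitly.

```python
from typing import NamedTuple


class FrequencyRow(NamedTuple):
    sample_index: int
    sample_label: str
    site_index: int
    site_label: str
    character_index: int
    character_label: str
    f_lb: float
    f_ub: float


def parse_frequencies(text):
    """Return (n_sites, n_samples, n_characters, rows) from a frequency file."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected three count lines and a header line")
    # Each of the first three lines starts with a count; the rest is a comment.
    n_sites, n_samples, n_characters = (int(ln.split()[0]) for ln in lines[:3])
    rows = []
    for ln in lines[4:]:  # lines[3] is the column header and is ignored
        if not ln.strip():
            continue
        f = ln.split("\t")
        if len(f) < 8:
            raise ValueError(f"expected 8 tab-separated fields, got: {ln!r}")
        row = FrequencyRow(int(f[0]), f[1], int(f[2]), f[3],
                           int(f[4]), f[5], float(f[6]), float(f[7]))
        # Assumed sanity check: bounds form an interval within [0, 1].
        if not 0.0 <= row.f_lb <= row.f_ub <= 1.0:
            raise ValueError(f"invalid frequency interval in: {ln!r}")
        rows.append(row)
    return n_sites, n_samples, n_characters, rows


example = "\n".join([
    "6 #anatomical sites",
    "6 #samples",
    "10 #mutation clusters",
    "#sample_index\tsample_label\tanatomical_site_index\t"
    "anatomical_site_label\tcharacter_index\tcharacter_label\tf_lb\tf_ub",
    "0\tbreast\t0\tbreast\t0\t1\t0.503628522\t0.545237495",
    "0\tbreast\t0\tbreast\t1\t2\t0\t0.01213794",
])
n_sites, n_samples, n_characters, rows = parse_frequencies(example)
print(n_sites, n_samples, n_characters, len(rows))  # prints 6 6 10 2
```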

<a name="pmh"></a>

Parsimonious Migration History (pmh_sankoff and pmh)

In the parsimonious migration history (PMH) problem, we are given a clone tree T whose leaves are labeled by anatomical sites. The task is to label the inner vertices of T such that the resulting migration graph G has minimum migration number and comigration number. Additionally, it is possible to specify constraints on the topology of the migration graph.

PATTERN | DESCRIPTION
--------|------------
PS (parallel single-source seeding) | Each metastatic site is seeded directly from the primary tumor, i.e. G is a multi-tree such that the primary P is the only vertex with out-degree greater than 1.
S (single-source seeding) | Each metastatic site is seeded from only one other anatomical site, i.e. G is a multi-tree.
M (multi-source seeding) | A metastatic site may be seeded from multiple anatomical sites, but no directed cycles are introduced. That is, G is a multi-DAG.
R (reseeding) | Directed cycles in G are allowed.

In our algorithms we allow for the following restrictions on the migration pattern:

  1. Unrestricted: PS, S, M and R
  2. No reseeding: PS, S, M
  3. No reseeding and no multi-source seeding: PS and S
  4. No reseeding, no multi-source seeding and no single-source seeding: PS
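The nesting of the four patterns can be illustrated with a short Python sketch that reports the most restrictive pattern a given migration graph satisfies. It is not MACHINA code: `classify_pattern` and its arguments are hypothetical, the input is assumed to be a valid migration graph given as (source site, target site) pairs, and PS is interpreted as "every migration originates at the primary".

```python
def _has_directed_cycle(successors):
    """Depth-first search for a directed cycle in a simple directed graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(v):
        color[v] = GRAY
        for w in successors.get(v, ()):
            state = color.get(w, WHITE)
            if state == GRAY or (state == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color.get(v, WHITE) == WHITE and visit(v)
               for v in list(successors))


def classify_pattern(migrations, primary):
    """Return 'PS', 'S', 'M' or 'R' for a migration multigraph."""
    # Collapse multi-edges; the pattern depends only on which pairs occur.
    edges = {(s, t) for s, t in migrations if s != t}
    successors, parents = {}, {}
    for s, t in edges:
        successors.setdefault(s, set()).add(t)
        parents.setdefault(t, set()).add(s)
    if _has_directed_cycle(successors):
        return "R"   # reseeding: a directed cycle exists
    if any(len(p) > 1 for p in parents.values()):
        return "M"   # some site is seeded from several sites
    if all(s == primary for s, _ in edges):
        return "PS"  # every site is seeded directly from the primary
    return "S"


print(classify_pattern([("P", "A"), ("P", "B"), ("P", "A")], "P"))  # prints PS
print(classify_pattern([("P", "A"), ("A", "B")], "P"))              # prints S
print(classify_pattern([("P", "A"), ("P", "B"), ("A", "B")], "P"))  # prints M
print(classify_pattern([("P", "A"), ("A", "P")], "P"))              # prints R
```

A graph classified as PS also satisfies every less restrictive option in the list above, which is why the restrictions are expressed as growing sets of allowed patterns.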

The unconstrained PMH problem can be solved by running pmh_sankoff, which is an adaptation of the Sankoff algorithm and enumerates all minimum-migration histories:

Usage:
pmh_sankoff [--help|-h|-help] [-c str] [-o str] [-p str] T leaf_labeling
Where:
T
    Clone tree
leaf_labeling
    Leaf labeling
--help|-h|-help
    Print a short help message
-c str
    Color map file
-o str
    Output prefix
-p str
    Primary anatomical sites separated by commas (if omitted, every
    anatomical site will be considered iteratively as the primary)

An example execution of the pmh_sankoff algorithm (executed from the root directory of the MACHINA repository):

$ mkdir patient1
$ pmh_sankoff -p LOv,ROv -c data/mcpherson_2016/coloring.txt data/mcpherson_2016/patient1.tree \
data/mcpherson_2016/patient1.labeling -o patient1/ 2> patient1/result.
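For intuition about what a Sankoff-style algorithm computes, the following Python sketch runs the classic dynamic program with unit costs on a small clone tree and returns the minimum number of migrations. It is a simplified illustration, not the pmh_sankoff implementation: it computes only the optimal count rather than enumerating all optimal labelings, and the function name, arguments and toy tree are hypothetical.

```python
import math


def min_migrations(children, root, leaf_site, primary=None):
    """Minimum number of label changes along tree edges (unit-cost Sankoff).

    children:  dict mapping each inner vertex to its list of children.
    leaf_site: dict mapping each leaf to its anatomical site.
    primary:   if given, the root is forced to carry this site.
    """
    sites = sorted(set(leaf_site.values()) | ({primary} if primary else set()))

    def cost(v):
        """Map each site s to the best cost of v's subtree when v is labeled s."""
        if not children.get(v):
            # A leaf may only take its observed site.
            return {s: (0 if s == leaf_site[v] else math.inf) for s in sites}
        table = dict.fromkeys(sites, 0)
        for child in children[v]:
            child_cost = cost(child)
            for s in sites:
                # Pay 1 whenever the child's site differs from the parent's.
                table[s] += min(child_cost[t] + (s != t) for t in sites)
        return table

    root_cost = cost(root)
    if primary is not None:
        return root_cost[primary]
    return min(root_cost.values())


# Toy clone tree (not the patient1 data).
tree = {"A": ["A1", "A2", "B"], "B": ["B1", "B2"]}
leaves = {"A1": "ROv", "A2": "Om", "B1": "Om", "B2": "SBwl"}
print(min_migrations(tree, "A", leaves, primary="ROv"))  # prints 3
print(min_migrations(tree, "A", leaves))                 # prints 2
```

Fixing the primary site corresponds to the role of the -p option described above; leaving it unset lets the root take whichever site gives the fewest migrations.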
