Machina
Framework for Metastatic And Clonal History INtegrative Analysis
MACHINA - Metastatic And Clonal History INtegrative Analysis
MACHINA is a computational framework for inferring migration patterns between a primary tumor and metastases using DNA sequencing data.

<a name="installation"></a>
Installation
<a name="bioconda"></a>
bioconda
- Install Anaconda or Miniconda if you do not already have one installed.
- (recommended) Create a new conda environment for machina and activate it:
conda create -n machina
conda activate machina
- Set up conda channels for bioconda (once per Anaconda/Miniconda installation):
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
- Install machina from bioconda:
conda install machina
<a name="compilation"></a>
Manual compilation
Note that binaries for macOS and Linux are available here. These binaries require a valid Gurobi installation and license key. The location of the license file can be specified via the environment variable GRB_LICENSE_FILE. In addition, if Gurobi is installed in a non-standard location, you will need to update LD_LIBRARY_PATH (Linux) or DYLD_LIBRARY_PATH (macOS).
Also note that to run the below examples, you must either provide the full path to the executable (e.g., /path/to/machina/build/pmh_sankoff) or add the build directory to your PATH.
<a name="dep"></a>
Dependencies
MACHINA is written in C++11 and thus requires a modern C++ compiler (GCC >= 4.8.1, or Clang). In addition, MACHINA has the following dependencies.
Graphviz is required to visualize the resulting DOT files, but is not required for compilation.
Gurobi is a commercial ILP solver with two licensing options: (1) a single-host license where the license is tied to a single computer and (2) a network license for use in a compute cluster. Both options are freely available for users in academia.
If Doxygen is available, extended source code documentation will be generated.
<a name="comp"></a>
Compilation
To compile MACHINA, execute the following commands from the root of the repository:
$ mkdir build
$ cd build
$ cmake ..
$ make
In case CMake fails to detect LEMON or Gurobi, run the following command with adjusted paths:
$ cmake -DLIBLEMON_ROOT=~/lemon \
-DGUROBI_HOME=/path/to/gurobiXXX
where XXX is the 3-digit version of Gurobi.
The compilation results in the following files in the build directory:
COMMAND | DESCRIPTION
-----------|-------------
cluster | Cluster mutations using a combinatorial algorithm that models variant read counts using a binomial distribution.
generatemigrationtrees | Generates all migration trees given anatomical site labels. These migration trees can be used to constrain the search space of the pmh, pmh_tr and pmh_ti algorithms.
generatemutationtrees | Generates all mutation trees given a frequency matrix.
pmh_sankoff | Enumerates all minimum-migration vertex labelings given a clone tree.
pmh | Solves the Parsimonious Migration History (PMH) problem given a migration pattern restriction and a clone tree.
pmh_tr | Solves the Parsimonious Migration History with Polytomy Resolution (PMH-PR) problem given a migration pattern restriction and a clone tree.
pmh_ti | Solves the Parsimonious Migration History and Tree Inference (PMH-TI) problem given a migration pattern restriction, a mutation tree and a frequency matrix.
simulate | Simulates a metastatic tumor.
visualizeclonetree | Visualizes a clone tree and optional vertex labeling.
visualizemigrationgraph | Visualizes the migration graph given a clone tree and vertex labeling.
<a name="usage"></a>
Usage instructions
<a name="io"></a>
I/O formats
Below we describe the various formats used by the algorithms of the MACHINA framework.
<a name="clonetree"></a>
Clone tree
A clone tree is provided as an edge list. Each line specifies an edge by listing the labels of the incident vertices separated by a space or tab character. For example:
A A1
A A2
A A3
A A4
A A5
A A6
...
See patient1.tree for the complete clone tree.
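The edge-list format above can be read with a few lines of Python. The following is a minimal sketch, not part of MACHINA: the function names are hypothetical, and it assumes that each line lists the parent vertex first and the child second.

```python
from collections import defaultdict


def parse_clone_tree(text):
    """Parse an edge list into a {parent: [children]} map.

    Assumption: the first label on each line is the parent vertex.
    """
    children = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()  # splits on spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected two vertex labels, got: {line!r}")
        parent, child = fields
        children[parent].append(child)
    return dict(children)


def root_and_leaves(children):
    """Return the unique root and the sorted list of leaves."""
    all_children = {c for kids in children.values() for c in kids}
    roots = set(children) - all_children
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    leaves = sorted(all_children - set(children))
    return roots.pop(), leaves


# Excerpt: the first three edges of the example above.
example = "A A1\nA A2\nA A3\n"
tree = parse_clone_tree(example)
root, leaves = root_and_leaves(tree)
print(root, leaves)
```

The leaves found this way are the vertices that a leaf labeling must cover.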
<a name="leaflabeling"></a>
Leaf labeling
A leaf labeling assigns an anatomical site label to each leaf of a clone tree. Each line contains two values, the leaf label and the anatomical site label separated by a space or tab character. For example:
A1 Om
A2 SBwl
A3 LFTB
A4 LOv
A5 ApC
A6 RFTA
...
See patient1.labeling for the complete leaf labeling.
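A labeling file can be parsed and checked against the leaves of a clone tree as sketched below. This is an illustration only: the helper names are hypothetical, and the leaf list passed in the example (including the unlabeled leaf A4) is made up to show the check.

```python
def parse_labeling(text):
    """Parse 'vertex site' lines into a {vertex: site} dictionary."""
    labeling = {}
    for line in text.splitlines():
        fields = line.split()  # splits on spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 'vertex site', got: {line!r}")
        vertex, site = fields
        if vertex in labeling:
            raise ValueError(f"vertex {vertex!r} is labeled twice")
        labeling[vertex] = site
    return labeling


def check_leaf_labeling(leaves, labeling):
    """Return (leaves without a label, labeled vertices that are not leaves)."""
    missing = sorted(set(leaves) - set(labeling))
    extra = sorted(set(labeling) - set(leaves))
    return missing, extra


# First three lines of the example above; the leaf list is hypothetical.
leaf_text = "A1 Om\nA2 SBwl\nA3 LFTB\n"
leaf_labeling = parse_labeling(leaf_text)
missing, extra = check_leaf_labeling(["A1", "A2", "A3", "A4"], leaf_labeling)
print(missing, extra)
```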
<a name="vertexlabeling"></a>
Vertex labeling
A vertex labeling assigns an anatomical site label to each vertex of a clone tree (including the leaves). Each line contains two values, the vertex label and the anatomical site label separated by a space or tab character. For example:
A ROv
B SBwl
D ROv
F ROv
H ROv
A1 Om
A2 SBwl
...
See patient1.reported.labeling for the complete vertex labeling.
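A clone tree together with a vertex labeling determines a migration graph. The sketch below assumes the usual definition, in which every clone-tree edge whose endpoints are labeled by different anatomical sites contributes one migration edge, so the migration number is the number of such edges. The toy edges are hypothetical; only the site labels are taken from the example above.

```python
from collections import Counter


def migration_graph(edges, labeling):
    """Count migrations per (source site, target site) pair.

    Assumption: a migration is a clone-tree edge (parent, child) whose
    endpoints are labeled by different anatomical sites.
    """
    migrations = Counter()
    for parent, child in edges:
        source, target = labeling[parent], labeling[child]
        if source != target:
            migrations[(source, target)] += 1
    return migrations


# Hypothetical toy tree; the labels follow the vertex labeling above.
edges = [("A", "B"), ("A", "D"), ("A", "A1"), ("B", "A2")]
labeling = {"A": "ROv", "B": "SBwl", "D": "ROv", "A1": "Om", "A2": "SBwl"}

graph = migration_graph(edges, labeling)
migration_number = sum(graph.values())
print(dict(graph), migration_number)
```

Parallel edges are kept as counts because the migration graph is a multigraph: two distinct clones may migrate between the same pair of sites.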
Frequencies
A frequency file encodes the frequency of every mutation (cluster) in an anatomical site (sample). It is a tab-separated file. The first three lines list the number of anatomical sites, the number of samples and the number of mutations, respectively. The fourth line is ignored but describes the format of the rest of the file. Each subsequent line encodes the cell frequency of a mutation in a sample: first the 0-based index of the sample, followed by the label of the sample, the 0-based index of the anatomical site, the anatomical site label, the 0-based index of the mutation, the label of the mutation, and the lower and upper bounds of the frequency.
6 #anatomical sites
6 #samples
10 #mutation clusters
#sample_index sample_label anatomical_site_index anatomical_site_label character_index character_label f_lb f_ub
0 breast 0 breast 0 1 0.503628522 0.545237495
0 breast 0 breast 1 2 0 0.01213794
...
See F.tsv for the complete frequency file. For an example of how to obtain this file from read data, please see: process_A7_new.ipynb. Specifically, you will need to process the bulk DNA sequencing data by first calling single-nucleotide variants (SNVs) and copy-number aberrations. Then, SNVs that occur in copy-neutral regions need to be clustered (e.g., using SciClone or PyClone). Confidence intervals can then be obtained by first pooling, for each sample, the read counts of the mutations that belong to the same cluster and then using a beta distribution. Please see the supplement of the MACHINA paper for more details.
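To illustrate the layout of the frequency file, here is a minimal reader. It is a sketch rather than MACHINA's own parser: the function name is hypothetical, and the small input is a made-up file with one sample that reuses the two data rows shown above.

```python
def parse_frequencies(text):
    """Return (counts, bounds) for a frequency file.

    counts: (number of sites, number of samples, number of mutations)
    bounds: {(sample label, mutation label): (f_lb, f_ub)}
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines 1-3 hold one count each, optionally followed by a '#' comment.
    counts = tuple(int(ln.split("#")[0]) for ln in lines[:3])
    bounds = {}
    for ln in lines[4:]:  # lines[3] only describes the columns
        fields = ln.split("\t")
        if len(fields) != 8:
            raise ValueError(f"expected 8 tab-separated fields, got: {ln!r}")
        sample_label, mutation_label = fields[1], fields[5]
        f_lb, f_ub = float(fields[6]), float(fields[7])
        if f_lb > f_ub:
            raise ValueError(f"lower bound exceeds upper bound in: {ln!r}")
        bounds[(sample_label, mutation_label)] = (f_lb, f_ub)
    return counts, bounds


# Hypothetical one-sample file built from the two rows shown above.
columns = ["#sample_index", "sample_label", "anatomical_site_index",
           "anatomical_site_label", "character_index", "character_label",
           "f_lb", "f_ub"]
rows = [
    ["0", "breast", "0", "breast", "0", "1", "0.503628522", "0.545237495"],
    ["0", "breast", "0", "breast", "1", "2", "0", "0.01213794"],
]
example_file = "\n".join(
    ["1 #anatomical sites", "1 #samples", "2 #mutation clusters",
     "\t".join(columns)]
    + ["\t".join(row) for row in rows]
)

counts, bounds = parse_frequencies(example_file)
print(counts, bounds[("breast", "1")])
```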
<a name="pmh"></a>
Parsimonious Migration History (pmh_sankoff and pmh)
In the parsimonious migration history problem, we are given a clone tree T whose leaves are labeled by anatomical sites. The task is to label the inner vertices of T such that the resulting migration graph G has minimum migration number and comigration number. Additionally, it is possible to specify constraints on the topology of the migration graph.
PATTERN | DESCRIPTION
--------|------------
PS (parallel single-source seeding) | Each metastatic site is seeded directly from the primary tumor, i.e. G is a multi-tree such that the primary P is the only vertex with out-degree greater than 0.
S (single-source seeding) | Each metastatic site is seeded from only one other anatomical site, i.e. G is a multi-tree.
M (multi-source seeding) | A metastatic site may be seeded from multiple anatomical sites, but no directed cycles are introduced. That is, G is a multi-DAG.
R (reseeding) | Directed cycles in G are allowed.
In our algorithms we allow for the following restrictions on the migration pattern:
- Unrestricted: PS, S, M and R
- No reseeding: PS, S, M
- No reseeding and no multi-source seeding: PS and S
- No reseeding, no multi-source seeding and no single-source seeding: PS
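The four patterns are nested, so a given migration graph can be assigned the most restrictive pattern it satisfies. The sketch below follows the descriptions in the table; it is not MACHINA code, the function names are hypothetical, and the site names P, M1 and M2 are placeholders. The graph is passed as a list of (source site, target site) pairs, with repeated pairs standing for parallel edges.

```python
def has_cycle(targets_of):
    """Detect a directed cycle with a depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in targets_of}

    def visit(v):
        color[v] = GREY
        for w in targets_of[v]:
            # A grey successor is on the current DFS path: a back edge.
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in targets_of)


def classify_pattern(edges, primary):
    """Return the most restrictive pattern ('PS', 'S', 'M' or 'R')."""
    sources_of, targets_of = {}, {}
    for source, target in edges:
        targets_of.setdefault(source, set()).add(target)
        targets_of.setdefault(target, set())
        sources_of.setdefault(target, set()).add(source)
    if has_cycle(targets_of):
        return "R"   # reseeding: directed cycle present
    if any(len(srcs) > 1 for srcs in sources_of.values()):
        return "M"   # some site is seeded from several distinct sites
    if all(source == primary for source, _ in edges):
        return "PS"  # every migration leaves the primary
    return "S"       # multi-tree with seeding from a metastasis


print(classify_pattern([("P", "M1"), ("P", "M1"), ("P", "M2")], "P"))
print(classify_pattern([("P", "M1"), ("M1", "M2")], "P"))
print(classify_pattern([("P", "M1"), ("P", "M2"), ("M1", "M2")], "P"))
print(classify_pattern([("P", "M1"), ("M1", "P")], "P"))
```

Parallel edges between the same pair of sites do not change the pattern, which is why the sources and targets are stored as sets.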
The unconstrained PMH problem can be solved by running pmh_sankoff, which is an adaptation of the Sankoff algorithm and enumerates all migration histories:
Usage:
  pmh_sankoff [--help|-h|-help] [-c str] [-o str] [-p str] T leaf_labeling
Where:
  T
     Clone tree
  leaf_labeling
     Leaf labeling
  --help|-h|-help
     Print a short help message
  -c str
     Color map file
  -o str
     Output prefix
  -p str
     Primary anatomical sites separated by commas (if omitted, every
     anatomical site will be considered iteratively as the primary)
An example execution of the pmh_sankoff algorithm (executed from the root directory of the MACHINA repository):
$ mkdir patient1
$ pmh_sankoff -p LOv,ROv -c data/mcpherson_2016/coloring.txt data/mcpherson_2016/patient1.tree \
data/mcpherson_2016/patient1.labeling -o patient1/ 2> patient1/result.
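When pmh_sankoff is driven from a script, the argument list can be assembled programmatically. The helper below is a hypothetical sketch that uses only the options listed in the usage message above; it builds the command but does not run it, and it does not handle the redirection of standard error.

```python
import shlex


def pmh_sankoff_command(tree, leaf_labeling, primaries=None, color_map=None,
                        output_prefix=None, executable="pmh_sankoff"):
    """Build the argument list for pmh_sankoff (hypothetical helper)."""
    cmd = [executable]
    if primaries:
        cmd += ["-p", ",".join(primaries)]  # comma-separated primary sites
    if color_map:
        cmd += ["-c", color_map]
    if output_prefix:
        cmd += ["-o", output_prefix]
    cmd += [tree, leaf_labeling]  # positional arguments come last
    return cmd


cmd = pmh_sankoff_command(
    "data/mcpherson_2016/patient1.tree",
    "data/mcpherson_2016/patient1.labeling",
    primaries=["LOv", "ROv"],
    color_map="data/mcpherson_2016/coloring.txt",
    output_prefix="patient1/",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, provided the executable is on your PATH or the executable argument is set to its full path.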
