Pasta

PASTA (Practical Alignment using SATé and Transitivity)


This is an implementation of the PASTA (Practical Alignment using Saté and TrAnsitivity) algorithm published in RECOMB-2014 and JCB:

  • Mirarab S, Nguyen N, Warnow T. PASTA: ultra-large multiple sequence alignment. Sharan R, ed. Res Comput Mol Biol. 2014:177-191.
  • Mirarab S, Nguyen N, Guo S, Wang L-S, Kim J, Warnow T. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences. J Comput Biol. 2015;22(5):377-386. doi:10.1089/cmb.2014.0156.

The latest version includes a new decomposition technique described here:

  • Balaban, Metin, Niema Moshiri, Uyen Mai, and Siavash Mirarab. “TreeCluster: Clustering Biological Sequences Using Phylogenetic Trees.” BioRxiv, 2019, 591388. doi:10.1101/591388.

Contact:

All questions and inquiries should be addressed to our user email group: pasta-users@googlegroups.com. Please check our Tutorial and previous posts before sending new requests.

Developers

  • The code and the algorithm are developed by Siavash Mirarab and Tandy Warnow, with help from Nam Nguyen. The latest version of the code includes a new code decomposition designed and implemented by Uyen Mai.

  • The current PASTA code is heavily based on the SATé code developed by Mark Holder's group at KU. Refer to the sate-doc directory for documentation of the SATé code, including the list of authors, license, etc.

  • Niema Moshiri has contributed to the port to DendroPy 4 and Python 3 and to the Docker image.

Documentation

In addition to this README file, you can consult this Tutorial.

INSTALLATION

You have three options:

1. From Source Code

  • The current version of PASTA has been developed and tested entirely on Linux and MAC.
  • Windows is not currently supported (future versions may or may not support it).

You need to have:

  • Python (version 2.7 or later, including python 3)
  • Dendropy (but the setup script should automatically install dendropy for you if you don't have it)
  • Java (only required for using OPAL)
  • wxPython - only required if you want to use the GUI. The setup script does not automatically install this.
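
Before installing, the prerequisites listed above can be checked with a short script. This is a minimal sketch, not part of PASTA: the function name check_prerequisites is hypothetical, the sketch itself needs Python 3, and it assumes wxPython is importable under its usual module name wx.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report which of the prerequisites listed above appear to be present."""
    return {
        # Python 2.7 or later, including Python 3.
        "python": sys.version_info >= (2, 7),
        # DendroPy; the setup script should install it if it is missing.
        "dendropy": importlib.util.find_spec("dendropy") is not None,
        # Java is only required for using OPAL.
        "java": shutil.which("java") is not None,
        # wxPython (module name "wx") is only required for the GUI.
        "wxpython": importlib.util.find_spec("wx") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("{:10s} {}".format(name, "found" if found else "missing"))
```

A missing dendropy entry is not necessarily a problem, since the setup script should install it; missing java or wxpython entries only matter if you plan to use OPAL or the GUI.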

Installation steps:

  1. Open a terminal and create a directory where you want to keep PASTA and go to this directory. For example:

    mkdir ~/pasta-code
    cd ~/pasta-code
    
  2. Clone the PASTA code repository from our github repository. For example you can use:

    git clone https://github.com/smirarab/pasta.git
    

    If you don't have git, you can directly download a zip file from the repository and decompress it into your desired directory.

  3. A. Clone the relevant "tools" directory (these are also forked from the SATé project). There are different repositories for linux and MAC. You can use

    git clone https://github.com/smirarab/sate-tools-linux.git #for Linux
    

    or

    git clone https://github.com/smirarab/sate-tools-mac.git #for MAC
    

    Or you can directly download these as zip files for Linux or MAC and decompress them in your target directory (e.g. pasta-code).

    • Note that the tools directory and the PASTA code directory should be under the same parent directory.
    • When you use the zip files instead of using git, after decompressing the zip file you may get a directory called sate-tools-mac-master or sate-tools-linux-master instead of sate-tools-mac or sate-tools-linux. You need to rename these directories and remove the -master part.
    • Those with 32-bit Linux machines need to be aware that the master branch has 64-bit binaries. 32-bit binaries are provided in the 32bit branch of sate-tools-linux git project (so download this zip file instead).
  4. B. (Optional) Only if you want to use MAFFT-Homologs within PASTA: run cd sate-tools-linux or cd sate-tools-mac, then run git clone https://github.com/koditaraszka/pasta-databases (or download the repository directly from https://github.com/koditaraszka/pasta-databases.git).

    • Be sure to leave this directory (cd ..) before starting the next step.
  5. cd pasta (or cd pasta-master if you used the zip file instead of cloning the git repository)

  6. Then run:

     sudo python setup.py develop 
    

    If you don't have root access, use:

    python setup.py develop  --user
    

    Common Problems:

    • Could not find SATé tools bundle directory: this means you don't have the right tools directory at the right location. Maybe you downloaded MAC instead of Linux? Or maybe you didn't put the directory in the parent directory of where the PASTA code is? Most likely, you used the zip files and forgot to remove the -master from the directory name. Run mv sate-tools-mac-master sate-tools-mac on MAC or mv sate-tools-linux-master sate-tools-linux on Linux to fix this issue.
    • The setup.py script is supposed to install setuptools for you if you don't have it. This sometimes works and sometimes doesn't. If you get an error with a message like invalid command 'develop', it means that setuptools is not installed. To solve this issue, you can manually install setuptools. For example, on Linux, you can run curl https://bootstrap.pypa.io/ez_setup.py -o - | sudo python (but note there are other ways of installing setuptools as well).
  7. PASTA now includes additional aligners for Linux and MAC users: mafft-ginsi, mafft-homologs, contralign (version 1), and probcons. To use mafft-homologs and contralign, you must set the environment variable CONTRALIGN_DIR=/dir/to/sate-tools-linux. You can run export CONTRALIGN_DIR=/dir/to/sate-tools-linux, or add that export line to ~/.bashrc to make the setting permanent.

    • To use these aligners, add --aligner=NAME_OF_ALIGNER to your PASTA command, where NAME_OF_ALIGNER can now also be ginsi, homologs, contralign, or probcons.
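
Most of the common problems above come down to the directory layout: the tools directory has to sit next to the PASTA code directory and must not keep the -master suffix. The following is a minimal sketch of a layout check under those stated rules; check_layout is a hypothetical helper and is not part of PASTA or its setup script.

```python
from pathlib import Path

# Tools directory names used in the installation steps above.
TOOLS_NAMES = ("sate-tools-linux", "sate-tools-mac")


def check_layout(parent):
    """Return a list of problems found under `parent`; an empty list means none."""
    parent = Path(parent)
    problems = []

    # The code directory is "pasta" after git clone, "pasta-master" after unzipping.
    if not any((parent / name).is_dir() for name in ("pasta", "pasta-master")):
        problems.append("no pasta code directory in {}".format(parent))

    # The tools directory must be a sibling of the code directory, without "-master".
    if not any((parent / name).is_dir() for name in TOOLS_NAMES):
        leftovers = [n for n in TOOLS_NAMES if (parent / (n + "-master")).is_dir()]
        if leftovers:
            for n in leftovers:
                problems.append("rename {0}-master to {0}".format(n))
        else:
            problems.append("no tools directory in {}".format(parent))
    return problems
```

For example, check_layout("~/pasta-code") would need the path expanded first (Path("~/pasta-code").expanduser()), because pathlib does not expand ~ on its own.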

2. From Docker

  1. Make sure you have Docker installed

  2. Run

    docker pull smirarab/pasta
    

You are done. You can test using

 docker run smirarab/pasta run_pasta.py -h

3. From Conda

Please see https://anaconda.org/bioconda/pasta

You should be good with:

conda install bioconda::pasta

Email pasta-users@googlegroups.com for installation issues.

EXECUTION

To run PASTA using the command-line:

python run_pasta.py -i input_fasta [-t starting_tree] 

PASTA by default picks the appropriate configurations automatically for you. The starting tree is optional. If not provided, PASTA estimates a starting tree.

Run

python run_pasta.py --help

to see PASTA's various options and descriptions of how they work.
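
When PASTA is called from a script, the command line can be assembled as an argument list. The sketch below uses only the options mentioned in this README (-i, -t, and --aligner); build_pasta_command and the file names are hypothetical, and the sketch assumes run_pasta.py is in the current directory.

```python
def build_pasta_command(input_fasta, starting_tree=None, aligner=None):
    """Build an argument list for run_pasta.py from the options shown in this README."""
    cmd = ["python", "run_pasta.py", "-i", input_fasta]
    if starting_tree is not None:
        # The starting tree is optional; without it PASTA estimates one.
        cmd += ["-t", starting_tree]
    if aligner is not None:
        # For example ginsi, homologs, contralign, or probcons.
        cmd.append("--aligner={}".format(aligner))
    return cmd


if __name__ == "__main__":
    # "input.fasta" and "start.tre" are placeholder file names.
    print(" ".join(build_pasta_command("input.fasta", "start.tre", "probcons")))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with file names that contain spaces.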

To run the GUI version,

  • if you have installed from the source code, cd into your installation directory and run

python run_pasta_gui.py

On some machines you may instead need to use

pythonw run_pasta_gui.py

To run PASTA using Docker, run

docker run -v [path to the directory with your input files]:/data smirarab/pasta run_pasta.py -i input_fasta [-t starting_tree] 
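
The Docker command can be assembled the same way. The sketch below is an illustration only: build_docker_command is a hypothetical helper, and it converts the host directory to an absolute path because Docker treats a -v source that is not an absolute path as a named volume rather than a directory. The input file name is passed through unchanged, as in the command above.

```python
import os


def build_docker_command(data_dir, input_fasta, starting_tree=None):
    """Build the docker run argument list shown above, mounting data_dir at /data."""
    mount = "{}:/data".format(os.path.abspath(data_dir))
    cmd = ["docker", "run", "-v", mount,
           "smirarab/pasta", "run_pasta.py", "-i", input_fasta]
    if starting_tree is not None:
        cmd += ["-t", starting_tree]
    return cmd


if __name__ == "__main__":
    # "." and "input.fasta" are placeholders for your own directory and file.
    print(" ".join(build_docker_command(".", "input.fasta")))
```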

On Windows, you may have to enable drive sharing; see Shared Drives on this page.

Options

PASTA estimates alignments and maximum likelihood (ML) trees from unaligned sequences using an iterative approach. In each iteration, it first estimates a multiple sequence alignment and then estimates an ML tree on (a masked version of) that alignment. By default PASTA performs 3 iterations, but a host of options enable changing that behavior.

In each iteration, a divide-and-conquer strategy is used for estimating the alignment. The set of sequences is divided into smaller subsets, each of which is aligned using an external alignment tool (the default is MAFFT L-INS-i). These subset alignments are then merged in pairs (by default using Opal), and finally the pairwise merged alignments are combined into a final alignment using transitivity merge. The tree from the previous iteration guides both the division of the dataset into smaller subsets and the choice of which alignments to merge pairwise. The first iteration therefore needs an initial tree.
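
The transitivity idea behind the final merge can be illustrated with a toy sketch: if a pairwise merge places a column of subset A with a column of subset B, and another pairwise merge places that same column of B with a column of C, then all three belong together. The code below is not PASTA's implementation. It only groups columns with a union-find structure, under the assumption that each merged column is given as a tuple of hypothetical (subset_name, column_index) labels, and it leaves out the ordering of the resulting columns.

```python
def transitivity_classes(pairwise_merges):
    """Group subset-alignment columns linked through any chain of pairwise merges.

    Each pairwise merge is a list of merged columns; each merged column is a
    tuple of (subset_name, column_index) labels that the merge placed together.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for merge in pairwise_merges:
        for column in merge:
            first = column[0]
            find(first)  # register columns that were not merged with anything
            for other in column[1:]:
                union(first, other)

    classes = {}
    for label in list(parent):
        classes.setdefault(find(label), set()).add(label)
    return sorted(sorted(members) for members in classes.values())


# Subset B appears in both merges, so it links columns of A and C.
merge_ab = [(("A", 0), ("B", 0)), (("A", 1),), (("B", 1),)]
merge_bc = [(("B", 0), ("C", 0)), (("B", 1), ("C", 1))]
print(transitivity_classes([merge_ab, merge_bc]))
```

Here column 0 of A, B, and C end up in one class even though A and C were never merged directly, column 1 of A stays alone, and column 1 of B and C form a third class.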

When the GUI is used, a limited set of important options can be adjusted. The command line also allows you to alter the behavior of the algorithm, and provides a larger set of options that can be adjusted.

Options can also be passed in as configuration files with the format:

[commandline]
option-name = value

[sate]
option-name = value

With every run, PASTA saves the configuration file for that run as a temporary file called [jobname]_temp_pasta_config.txt in your output directory.

Multiple configuration files can be provided. Configuration files are read in the order they occur as arguments (with values in later files replacing previously read values). Options specified on the command line are read last, so they override any settings from the configuration files.
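
The precedence described above can be mimicked with Python's standard configparser module. This is a minimal sketch of the reading order only, not PASTA's own parser: effective_options is a hypothetical helper, option-a and option-b are placeholder option names, and keying command-line values by (section, option) is an assumption made for the illustration.

```python
import configparser


def effective_options(config_texts, command_line_options):
    """Apply configuration texts in order, then command-line values; later values win."""
    parser = configparser.ConfigParser()
    for text in config_texts:
        parser.read_string(text)  # values in later files replace earlier ones
    merged = {section: dict(parser[section]) for section in parser.sections()}
    for (section, option), value in command_line_options.items():
        merged.setdefault(section, {})[option] = value  # command line is read last
    return merged


first = "[commandline]\noption-a = 1\n\n[sate]\noption-b = x\n"
second = "[sate]\noption-b = y\n"
print(effective_options([first, second], {("commandline", "option-a"): "2"}))
```

In this example the second file replaces option-b from the first, and the command-line value replaces option-a from both files.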

Note: the use of --auto option can overwrite some of the other options provided by comman
