PhyloSofS
A tool to model the evolution and structural impact of alternative splicing
PhyloSofS (Phylogenies of Splicing isoforms Structures) is a
fully automated computational tool that infers plausible evolutionary scenarios
explaining a set of transcripts observed in several species and models the
three-dimensional structures of the produced protein isoforms.
The phylogenetic reconstruction algorithm relies on a combinatorial approach
and the maximum parsimony principle. The generation of the isoforms' 3D models
is performed using comparative modeling.
Case study
PhyloSofS was applied to the c-Jun N-terminal kinase (JNK) family (60 transcripts in 7 species). It made it possible to date the appearance of an alternative splicing event (ASE) resulting in substrate affinity modulation back to the ancestor common to mammals, amphibians and fishes, and to identify key residues responsible for this modulation. It also highlighted a new ASE that induces a large deletion yet is conserved across several species. The resulting isoform is stable in solution and could play a role in the cell. More details about this case study, together with the algorithm description, can be found in the PhyloSofS preprint available at bioRxiv.
Installation
1. Download
You can clone this PhyloSofS package using git:
git clone https://github.com/PhyloSofS-Team/PhyloSofS.git
2. Install
Then, you can access the cloned PhyloSofS folder and install the package
using Python 3's pip:
cd PhyloSofS
python -m pip install .
3. Install dependencies
Phylogenetic inference
To run the phylogenetic module of PhyloSofS, you need to have
Graphviz installed.
The easiest way to install Graphviz in...
- Debian/Ubuntu is:
sudo apt-get install graphviz
- Windows is using Chocolatey:
choco install graphviz
- macOS is using Homebrew:
brew install graphviz
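After installing, it can help to confirm that Graphviz is reachable before running the phylogenetic module. The sketch below is a minimal check, assuming that the Graphviz dot executable is the one to look for on the PATH; the helper name is_on_path is hypothetical and not part of PhyloSofS.

```python
import shutil


def is_on_path(executable):
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    # "dot" is the layout program shipped with Graphviz (assumed check target).
    if is_on_path("dot"):
        print("Graphviz found:", shutil.which("dot"))
    else:
        print("Graphviz not found; install it with your package manager.")
```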
Molecular modelling
The molecular modelling pipeline depends on Julia, HH-suite3 and MODELLER. This module can only run on Unix systems (because of the HH-suite). To alleviate that, we offer a Docker image with all these dependencies installed (see the Docker section for more details).
Julia
You can download Julia 1.1.1 binaries from its site.
LibZ
Some BioJulia packages may need LibZ to precompile. If you encounter a related
error, you can install LibZ from its site.
In Ubuntu 18.04, you can install it by running: sudo apt-get install zlib1g-dev
HH-suite3
Clone our HH-suite fork at AntoineLabeeuw/hh-suite and follow the Compilation instructions in its README.md file.
MODELLER
PhyloSofS needs MODELLER version 9.21. Follow the instructions on the MODELLER site to install it and get the license key.
Databases
To run the molecular modelling module you need the HH-suite databases:
- Sequence database:
uniclust30_yyyy_mm_hhsuite.tar.gz (we have tested PhyloSofS using 2018_08 as yyyy_mm)
- Structural database:
pdb70_from_mmcif_latest.tar.gz
The mmCIF PDB files that MODELLER needs are downloaded on demand into the indicated folder if they are not already present.
To set up the databases, you can use the script setup_databases (recommended).
Alternatively, a manual installation can be performed following the instructions
in docs/get_databases.md.
Using the setup_databases script
The setup_databases script downloads and decompresses the needed databases. It creates
the following folder structure that can be easily used by PhyloSofS with the
--databases argument:
databases
├── pdb
├── pdb70
└── uniclust
You can run setup_databases -h to learn more about the script and its arguments.
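To confirm that a databases folder has the layout shown above before passing it to --databases, a small check such as the following can be used. It is a sketch only: the function name missing_database_folders is hypothetical, and it verifies nothing beyond the presence of the three subfolders.

```python
from pathlib import Path

# Subfolder names taken from the layout created by setup_databases.
EXPECTED_SUBFOLDERS = ("pdb", "pdb70", "uniclust")


def missing_database_folders(databases_dir):
    """Return the expected subfolders that are absent from databases_dir."""
    root = Path(databases_dir)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_database_folders("databases")
    if missing:
        print("Missing database folders:", ", ".join(missing))
    else:
        print("Database layout looks complete.")
```

An empty list means all three subfolders exist; it does not validate the database files inside them.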
Docker (without installation)
You can directly use PhyloSofS via Docker without cloning this GitHub repository. To run PhyloSofS' Docker image you need to install Docker following these instructions.
The following example is going to run PhyloSofS' Docker image using
Windows PowerShell. Databases for the molecular modelling module stored in
D:\databases are going to be mounted in /databases and the local directory
in /project. The current folder is ${PWD} in Windows PowerShell, %cd% in
Windows Command Line (cmd), and $(pwd) in Unix.
docker run -ti --rm --mount type=bind,source=d:\databases,target=/databases --mount type=bind,source=${PWD},target=/project diegozea/phylosofs
After this, we have access to the bash terminal of an Ubuntu 18.04 image
with PhyloSofS and all its dependencies installed. You only need to indicate
your MODELLER license key to
use PhyloSofS. To do that, you run the following command after replacing
license_key with your MODELLER license key:
sed -i 's/xxx/license_key/' /usr/lib/modeller9.21/modlib/modeller/config.py
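Where sed is not available (for example, in a native install rather than the Docker image), the same substitution can be sketched in Python. This is only an illustration, assuming that the config.py file contains the xxx placeholder as in the command above; the function name set_license_key is hypothetical.

```python
from pathlib import Path


def set_license_key(config_path, license_key, placeholder="xxx"):
    """Replace the first placeholder on each line, like sed 's/xxx/KEY/'."""
    path = Path(config_path)
    lines = path.read_text().splitlines(keepends=True)
    # str.replace with count=1 mirrors sed's default of one substitution per line.
    updated = [line.replace(placeholder, license_key, 1) for line in lines]
    path.write_text("".join(updated))
```

Unlike sed, this treats the license key as a literal string, so characters such as / or & need no escaping.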
Homology modelling example using Docker CE in Ubuntu
After installing Docker CE following these instructions, you can create a folder to work with the app, e.g.:
mkdir phylosofs
And then go into that folder and run the PhyloSofS Docker image bind-mounting
the local folder into /project:
cd phylosofs
sudo docker run -ti --rm --mount type=bind,source=$(pwd),target=/project diegozea/phylosofs
This starts a bash console with PhyloSofS and all its dependencies installed. The sources are taken from diegozea/phylosofs.
First, replace xxx with your MODELLER license key using sed as indicated in the banner.
Then, you can use the setup_databases script the first time to install the needed
databases into the project folder. The databases take some time to download and
decompress, depending on your internet connection and disk speed.
You need almost 129 GB of free disk space before downloading and decompressing them:
setup_databases
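Because the download is large, it may be worth checking the free disk space first. The following sketch uses Python's standard shutil.disk_usage; the function name has_enough_space is hypothetical, and the 129 figure is the approximate requirement quoted above, interpreted here as gigabytes.

```python
import shutil

# Approximate requirement quoted in this README (treated here as gigabytes).
REQUIRED_GB = 129


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    if has_enough_space("."):
        print("Enough free space to run setup_databases here.")
    else:
        print("Less than", REQUIRED_GB, "GB free; free up space first.")
```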
This has created a databases folder in /project
(and therefore in the phylosofs folder of your system) with the needed sequence
and structure databases for the homology modelling step.
To test the molecular modelling suite, we are going to create an example input
pir file in a GeneName folder. PhyloSofS is going to look for transcripts.pir files
in the indicated folder and its subfolders:
mkdir GeneName
echo ">P1;gene transcript ABCDE" >> ./GeneName/transcripts.pir
echo "AAAAAABBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCDDDDDDDDDEEEEEEEEE" >> ./GeneName/transcripts.pir
echo "ACTNEFCASCPTFLRMDNNAAAIKALELYKINAKLCDPHPSKKGASSAYLENSKGAPNNS*" >> ./GeneName/transcripts.pir
Here, the PIR annotation line below the id indicates the exon (A, B, C...)
to which each residue of the protein isoform belongs. You can then run the molecular modelling module:
phylosofs -M -i GeneName --databases databases
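For more than a handful of isoforms, writing transcripts.pir by hand with echo becomes error-prone. The sketch below builds records in the same three-line layout as the example above and writes them to a transcripts.pir file. It assumes one exon letter per residue, so the annotation and the sequence (without the trailing *) must have the same length; the function names and the short sequence used in the usage line are hypothetical.

```python
from pathlib import Path


def pir_record(header, exon_annotation, sequence):
    """Build one record: '>P1;' header, exon annotation line, sequence ending in '*'."""
    residues = sequence.rstrip("*")
    # Assumption: the annotation assigns exactly one exon letter to each residue.
    if len(exon_annotation) != len(residues):
        raise ValueError("exon annotation and sequence must have the same length")
    return f">P1;{header}\n{exon_annotation}\n{residues}*\n"


def write_transcripts(folder, records):
    """Write the records to <folder>/transcripts.pir and return the file path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "transcripts.pir"
    path.write_text("".join(records))
    return path


if __name__ == "__main__":
    # Made-up five-residue isoform covering two exons, for illustration only.
    print(pir_record("gene transcript AB", "AAABB", "MKTLV*"), end="")
```

Calling write_transcripts("GeneName", [record]) would then produce a file that can be passed to phylosofs through the -i GeneName argument.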
Note: Installing databases with Windows as a host
If you are using the PhyloSofS Docker image, be aware that errors can
occur when very large files are being written to bind-mounted NTFS file
systems. This happens particularly when setup_databases is run because it
tries to download and decompress large files. To avoid this problem, you can
install PhyloSofS on Windows and run setup_databases.exe to set up the
databases before using the docker image.
Running PhyloSofS
You can run phylosofs -h to see the help and the list of arguments.
1. Phylogenetic Inference
phylosofs -P -s 100 --tree path_to_newick_tree --transcripts path_to_transcripts
2. Molecular modelling
If the databases were installed using setup_databases and the HH-Suite3
scripts and programs are in the executable paths, then you can run:
phylosofs -M -i path_to_input_dir --databases path_to_databases_folder
PhyloSofS is going to look for transcripts.pir files in the folder and
sub-folders of path_to_input_dir to perform the homology modelling of each
sequence in those files.
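To preview which files would be picked up from an input directory, the recursive search described above can be mimicked with pathlib. This is a sketch of the described behaviour, not PhyloSofS's actual implementation; the function name find_transcript_files is hypothetical.

```python
from pathlib import Path


def find_transcript_files(input_dir):
    """List every transcripts.pir in input_dir and its subfolders, sorted by path."""
    return sorted(Path(input_dir).rglob("transcripts.pir"))
```

For example, find_transcript_files("path_to_input_dir") returns the matching paths, which makes it easy to spot a misnamed or misplaced file before starting a long modelling run.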
If you have a more manual installation of the databases and/or the HH-Suite3 scripts and programs are not in the path:
phylosofs -M -i path_to_input_dir --hhlib path_to_hhsuite_folder --hhdb path_to_uniclust_database/uniclust_basename --structdb path_to_pdb70/pdb70 --allpdb path_mmcif_pdb_cache_folder
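The manual invocation is long, so it can be convenient to assemble it programmatically. The sketch below only builds the argument list from the flags shown in the command above; the function name, its parameter names, and the default pdb70 basename are assumptions made for illustration.

```python
import os


def manual_modelling_args(input_dir, hhlib, uniclust_dir, uniclust_basename,
                          pdb70_dir, pdb_cache, pdb70_basename="pdb70"):
    """Build the phylosofs -M argument list for a manual database installation."""
    return [
        "phylosofs", "-M",
        "-i", input_dir,
        "--hhlib", hhlib,
        # --hhdb and --structdb take the folder joined with the file basename.
        "--hhdb", os.path.join(uniclust_dir, uniclust_basename),
        "--structdb", os.path.join(pdb70_dir, pdb70_basename),
        "--allpdb", pdb_cache,
    ]


if __name__ == "__main__":
    # All paths below are placeholders; replace them with your own locations.
    args = manual_modelling_args(
        "path_to_input_dir", "path_to_hhsuite_folder",
        "path_to_uniclust_database", "uniclust30_2018_08",
        "path_to_pdb70", "path_mmcif_pdb_cache_folder",
    )
    print(" ".join(args))
```

The resulting list can be passed to subprocess.run once the paths point to real locations.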
Please note that for the databases --hhdb and --structdb, you need to
provide the path to the folder and also the basename of the files in it.
For example, if the database uniclust30_2018_08 is
