Evolocity
Evolutionary velocity with protein language models
Evolocity is a Python package that implements evolutionary velocity, which constructs landscapes of protein evolution by using the local evolutionary predictions of language models to predict the directionality of evolution. The method is described in the paper "Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins" by Brian Hie, Kevin Yang, and Peter Kim. This repository also contains the analysis code and links to the data for reproducing the results in the paper.
Evolocity is a fork of the scVelo tool for RNA velocity by Bergen et al. and relies on many aspects of the Scanpy library for high-dimensional biological data analysis. Like Scanpy and scVelo, evolocity makes use of anndata, a convenient way to store and organize biological data. Our main implementation is based on the ESM-1b language model by Rives et al.
Documentation
For in-depth API documentation, go to https://evolocity.readthedocs.io.
Installation
You should be able to install evolocity using pip:
pip install evolocity
API example and tutorials
Below is a quick Python example of using evolocity to load and analyze sequences in a FASTA file.
import evolocity as evo
import scanpy as sc
# Load sequences and compute language model embeddings.
fasta_fname = 'data.fasta'
adata = evo.pp.featurize_fasta(fasta_fname)
# Construct sequence similarity network.
evo.pp.neighbors(adata)
# Run evolocity analysis.
evo.tl.velocity_graph(adata)
# Embed network and velocities in two dimensions and plot.
sc.tl.umap(adata)
evo.tl.velocity_embedding(adata)
evo.pl.velocity_embedding_grid(adata)
evo.pl.velocity_embedding_stream(adata)
More detailed documentation is provided at https://evolocity.readthedocs.io.
Tutorials are available in the documentation and on Google Colab for influenza A nucleoprotein and cytochrome c.
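The example above expects a FASTA file on disk. As a minimal sketch using only the Python standard library, the snippet below writes a tiny FASTA file and reads it back to confirm the format. The helpers `write_fasta` and `read_fasta`, the record names, and the sequences are hypothetical placeholders for illustration and are not part of evolocity.

```python
import tempfile
from pathlib import Path


def write_fasta(records, path):
    """Write (name, sequence) pairs to `path` in FASTA format."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n{seq}\n")


def read_fasta(path):
    """Parse a FASTA file into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical toy sequences, for illustration only.
toy_records = [
    ("seq_a", "MKTAYIAKQR"),
    ("seq_b", "MKTAYIAKQK"),
    ("seq_c", "MKSAYIAKQK"),
]
fasta_fname = Path(tempfile.mkdtemp()) / "data.fasta"
write_fasta(toy_records, fasta_fname)
print(read_fasta(fasta_fname))
```

A file written this way could then be passed as `fasta_fname` in the example above.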
Testing
Unit tests require pytest and can be run with the command
python -m pytest tests/
from the top-level directory.
Experiments
Below are scripts for reproducing the experiments in our paper. To apply evolocity to your own sequence data, we also encourage you to check out the tutorials in the documentation. Our experiments were run with Python version 3.7 on Ubuntu 20.04.
Data
You can download the relevant datasets using the commands
wget https://zenodo.org/record/5590361/files/data.tar.gz
tar xvf data.tar.gz
ln -s data/target/ target
within the same directory as this repository. Be sure to move the target/ directory one level up or create a symlink to it (as done above).
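For readers who prefer to script the extraction and symlink steps, here is a rough Python equivalent of the `tar` and `ln -s` commands above. It is only a sketch: `extract_and_link` is a hypothetical helper, the archive is assumed to unpack to `data/target/` (as the `ln -s` command implies), and the demonstration builds a throwaway archive with placeholder contents rather than downloading the real `data.tar.gz`.

```python
import os
import tarfile
import tempfile


def extract_and_link(archive_path, repo_dir):
    """Extract `archive_path` into `repo_dir`, then link target -> data/target."""
    with tarfile.open(archive_path) as archive:
        # Newer Python versions support extraction filters; use the
        # conservative "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(repo_dir, filter="data")
        else:
            archive.extractall(repo_dir)
    link_path = os.path.join(repo_dir, "target")
    if not os.path.lexists(link_path):
        # Relative link, equivalent to `ln -s data/target/ target`.
        os.symlink(os.path.join("data", "target"), link_path)
    return link_path


# Demonstration on a throwaway archive (hypothetical contents), so the
# sketch runs without downloading the real data.tar.gz.
workdir = tempfile.mkdtemp()
src_dir = os.path.join(workdir, "src", "data", "target")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "example.txt"), "w") as handle:
    handle.write("placeholder\n")

archive_path = os.path.join(workdir, "data.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(os.path.join(workdir, "src", "data"), arcname="data")

repo_dir = os.path.join(workdir, "repo")
os.makedirs(repo_dir)
link = extract_and_link(archive_path, repo_dir)
print(os.listdir(link))
```

In practice, `archive_path` would point at the downloaded `data.tar.gz` and `repo_dir` at the top-level directory of this repository.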
Dependencies
Before running the scripts below, we encourage you to create the conda environment defined in environment-evolocity.yml using
conda env create --file environment-evolocity.yml
Evolocity analysis
Our main evolocity analyses can be reproduced using the command
bash bin/main.sh
which will create new log files and figures in a new figures/ directory. Analyses should fit within 100 GB of CPU RAM and 8 GB of GPU RAM, and should finish within a few hours.
Benchmark results are generated by the commands
python bin/benchmark.py
python bin/benchmark_downsample.py
Benchmarking results can be reproduced with the commands below, but can take several days to complete if run in serial.
bash bin/benchmark.sh
bash bin/benchmark_downsample.sh
Scripts for other analyses
Phylogenetic tree reconstruction of NP and ancient proteins can be done with the commands below (you will first need to install PhyML and FastTree):
bash bin/phylo_np.sh > phylo_np.log 2>&1
bash bin/phylo_eno.sh > phylo_eno.log 2>&1
bash bin/phylo_pgk.sh > phylo_pgk.log 2>&1
bash bin/phylo_ser.sh > phylo_ser.log 2>&1
Deep mutational scan benchmarking can be done with the commands
python bin/dms.py esm1b > dms_esm1b.log 2>&1
python bin/dms.py tape > dms_tape.log 2>&1
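The commands in this section all redirect standard output and standard error into a single log file (`> file.log 2>&1`). If you would rather drive them from Python, the same pattern can be sketched with `subprocess`. The helper `run_logged` is hypothetical, and the demonstration uses a stand-in command so that it runs outside the repository.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_logged(command, log_path):
    """Run `command`, writing stdout and stderr to `log_path`; return the exit code."""
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT mirrors the shell's `2>&1`.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in command so the sketch runs anywhere; inside the repository this
# could be, for example, ["bash", "bin/phylo_np.sh"].
log_file = Path(tempfile.mkdtemp()) / "demo.log"
demo = [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"]
exit_code = run_logged(demo, log_file)
print(exit_code)
print(log_file.read_text())
```

Checking the returned exit code makes it easier to notice a failed run than inspecting each log by hand.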
