Foldseek
Foldseek enables fast and sensitive comparisons of large protein structure sets, supporting monomer and multimer searches, as well as clustering. It runs on CPU, supports GPU acceleration for faster searches, and optionally allows ultra-fast and sensitive comparisons directly from protein sequence inputs using a language model, bypassing the need for structures.
<p align="center"><img src="https://github.com/steineggerlab/foldseek/blob/master/.github/foldseek.png" height="250"/></p>

Publications
Webserver
Search your protein structures against the AlphaFoldDB and PDB in seconds using the Foldseek webserver (code): search.foldseek.com 🚀
Installation
```sh
# Linux AVX2 build (check using: cat /proc/cpuinfo | grep avx2)
wget https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz; tar xvzf foldseek-linux-avx2.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH
# Linux ARM64 build
wget https://mmseqs.com/foldseek/foldseek-linux-arm64.tar.gz; tar xvzf foldseek-linux-arm64.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH
# Linux AVX2 & GPU build (req. glibc >= 2.17 and nvidia driver >=525.60.13)
wget https://mmseqs.com/foldseek/foldseek-linux-gpu.tar.gz; tar xvfz foldseek-linux-gpu.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH
# MacOS
wget https://mmseqs.com/foldseek/foldseek-osx-universal.tar.gz; tar xvzf foldseek-osx-universal.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH
# Conda installer (Linux and macOS)
conda install -c conda-forge -c bioconda foldseek
```
Other precompiled binaries are available at https://mmseqs.com/foldseek.
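If you are unsure which of the CPU tarballs above fits your machine, the check can be scripted. This is a minimal sketch, not part of Foldseek: `pick_foldseek_tarball` is a hypothetical helper that only distinguishes the macOS, Linux ARM64 and Linux AVX2 builds listed above (it ignores the GPU build) and assumes that `/proc/cpuinfo` lists `avx2` among the CPU flags, as in the `grep` check in the comment above.

```sh
# Hypothetical helper: print the name of one of the precompiled CPU tarballs
# listed above that matches this machine. The GPU build is not considered.
pick_foldseek_tarball() {
  os="$(uname -s)"
  arch="$(uname -m)"
  if [ "$os" = "Darwin" ]; then
    echo "foldseek-osx-universal.tar.gz"
  elif [ "$os" = "Linux" ] && { [ "$arch" = "aarch64" ] || [ "$arch" = "arm64" ]; }; then
    echo "foldseek-linux-arm64.tar.gz"
  elif [ "$os" = "Linux" ] && grep -qw avx2 /proc/cpuinfo 2>/dev/null; then
    echo "foldseek-linux-avx2.tar.gz"
  else
    # None of the builds above matches; other binaries live at mmseqs.com/foldseek
    echo "no matching build above; see https://mmseqs.com/foldseek" >&2
    return 1
  fi
}
```

For example: `tarball=$(pick_foldseek_tarball) && wget "https://mmseqs.com/foldseek/$tarball" && tar xvzf "$tarball"`, followed by the same `export PATH=...` step as above.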
> [!NOTE]
> We recently added support for GPU-accelerated protein sequence and profile searches. This requires an NVIDIA GPU of the Ampere generation or newer for full speed; however, it also works at reduced speed on Turing-generation GPUs. The bioconda and precompiled binaries will not work on older GPU generations (e.g., Volta or Pascal).
Memory requirements
For optimal performance, choose one of three options based on your available RAM and search requirements:
- With Cα info (default). Calculate the required RAM as `(6 bytes Cα + 1 3Di byte + 1 AA byte) * (database residues)`. For example, the 54M AFDB50 entries require 151GB.
- Without Cα info. Disabling structure-bit sorting with `--sort-by-structure-bits 0` reduces the RAM requirement for AFDB50 to 35GB. However, this changes hit rankings and final scores, though not E-values. Structure bits mostly affect the ranking of hits with E-value > 10^-1.
- Single query searches. Use `--prefilter-mode 1`, which isn't memory-limited and computes all optimal ungapped alignments. This option makes the best use of Foldseek's multithreading for single queries and supports GPU acceleration.
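The default-mode formula above (8 bytes per database residue) is easy to evaluate before a search. A minimal sketch in shell arithmetic, assuming you already know the total number of residues in your target database; `foldseek_ram_bytes` and `foldseek_ram_gb` are hypothetical helper names, not Foldseek commands, and the result covers only the terms in the formula.

```sh
# Hypothetical helpers: RAM estimate for the default mode (with Cα info),
# using (6 bytes Cα + 1 3Di byte + 1 AA byte) * (database residues).
foldseek_ram_bytes() {
  residues="$1"
  bytes_per_residue=$(( 6 + 1 + 1 ))   # Cα + 3Di + amino acid
  echo $(( bytes_per_residue * residues ))
}

# Same estimate in decimal gigabytes (10^9 bytes), rounded down.
foldseek_ram_gb() {
  echo $(( $(foldseek_ram_bytes "$1") / 1000000000 ))
}
```

For a hypothetical database of 20 billion residues, `foldseek_ram_gb 20000000000` prints `160`.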
Tutorial Video
A Foldseek tutorial covering the webserver and command-line usage is available on YouTube: <a href="https://www.youtube.com/watch?v=k5Rbi22TtOA"><img src="https://img.shields.io/youtube/views/k5Rbi22TtOA?style=social"></a>
Documentation
Many of Foldseek's modules (subprograms) rely on MMseqs2. For more information about these modules, refer to the MMseqs2 wiki. For Foldseek-specific documentation, check out the Foldseek wiki.
Quick start
Search
The `easy-search` module lets you query one or more single-chain proteins, provided either as protein structures in PDB/mmCIF format (flat or gzipped) or as protein sequences in FASTA format, against a target database, a folder, or individual single-chain protein structures (for multi-chain proteins, see `complexsearch`). By default, the alignment information is written to a tab-separated file, but Foldseek also supports superposed Cα PDBs and HTML output.
```sh
foldseek easy-search example/d1asha_ example/ aln tmpFolder
```
Output Search
Tab-separated
The default output fields are `query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits`, but they can be customized with the `--format-output` option. For example, `--format-output "query,target,qaln,taln"` returns the query and target accessions and the pairwise alignments in tab-separated format. You can choose from many different output columns.
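Because the default columns come in a fixed order, the E-value is column 11 and the bit score column 12, so results can be filtered with standard tools. A minimal sketch with `awk`, assuming a result file in the default 12-column layout (for example `aln` from the quick-start command); `filter_hits` is a hypothetical helper name, and results written with a custom `--format-output` need different column numbers.

```sh
# Hypothetical helper: print query, target, E-value and bit score for every hit
# whose E-value is at or below a cutoff. Assumes the default 12-column output.
filter_hits() {
  aln_file="$1"
  max_evalue="$2"
  # "+ 0" forces numeric comparison, so values such as 1.0E-10 work as expected
  awk -v max="$max_evalue" '
    BEGIN { FS = "\t"; OFS = "\t" }
    ($11 + 0) <= (max + 0) { print $1, $2, $11, $12 }
  ' "$aln_file"
}
```

For example, `filter_hits aln 1e-3` keeps only hits with an E-value of at most 10^-3.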
| Code | Description |
| --- | --- |
| query | Query sequence identifier |
| target | Target sequence identifier |
| qca | Cα coordinates of the query |
| tca | Cα coordinates of the target |
| alntmscore | TM-score of the alignment |
| qtmscore | TM-score normalized by the query length |
| ttmscore | TM-score normalized by the target length |
| u | Rotation matrix (computed by TM-score) |
| t | Translation vector (computed by TM-score) |
| lddt | Average LDDT of the alignment |
| lddtfull | LDDT per aligned position |
| prob | Estimated probability that query and target are homologous (e.g., within the same SCOPe superfamily) |
Check out the MMseqs2 documentation for additional output format codes.
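The difference between `qtmscore` and `ttmscore` is only the length used for normalization. For reference, the standard TM-score definition for a fixed alignment is shown below; we assume here that Foldseek follows this standard definition, with $L = L_{\mathrm{query}}$ for `qtmscore` and $L = L_{\mathrm{target}}$ for `ttmscore`. $L_{\mathrm{ali}}$ is the number of aligned residue pairs and $d_i$ the distance between the $i$-th aligned Cα pair after superposition with rotation $R$ and translation $\mathbf{t}$.

```math
\mathrm{TM}(L) = \max_{R,\,\mathbf{t}} \; \frac{1}{L} \sum_{i=1}^{L_{\mathrm{ali}}} \frac{1}{1 + \left( d_i / d_0(L) \right)^2},
\qquad
d_0(L) = 1.24 \sqrt[3]{L - 15} - 1.8
```

Because $L$ appears both in the prefactor and in $d_0(L)$, the same alignment scores higher when normalized by the shorter of the two chains.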
Superposed Cα-only PDB files
Foldseek's `--format-mode 5` generates PDB files with all target Cα atoms superimposed onto the query structure based on the aligned coordinates.
Because it writes a separate PDB file for each pairwise alignment, be careful when using this option for large searches.
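Before enabling `--format-mode 5` on a large search, it can help to check how many alignments, and therefore how many PDB files, to expect. A minimal sketch, assuming you already have a tab-separated result (such as `aln` from the quick-start command) from a run with the same queries, targets and search parameters, and that each non-empty line is one pairwise alignment; `count_alignments` is a hypothetical helper name.

```sh
# Hypothetical helper: count pairwise alignments in a tab-separated result file.
# With --format-mode 5, each alignment would become its own PDB file.
count_alignments() {
  # Skip empty lines; "n + 0" prints 0 for an empty file
  awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_alignments aln` prints the number of alignments in `aln`.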
Interactive HTML
Locally run Foldseek can generate an HTML search result, similar to the one produced
