Foldseek

Foldseek enables fast and sensitive comparisons of large protein structure sets, supporting monomer and multimer searches, as well as clustering. It runs on CPU, supports GPU acceleration for faster searches, and optionally allows ultra-fast and sensitive comparisons directly from protein sequence inputs using a language model, bypassing the need for structures.

<p align="center"><img src="https://github.com/steineggerlab/foldseek/blob/master/.github/foldseek.png" height="250"/></p>

Publications

van Kempen M, Kim S, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, and Steinegger M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology, doi:10.1038/s41587-023-01773-0 (2023)

Barrio-Hernandez I, Yeo J, Jänes J, Mirdita M, Gilchrist CLM, Wein T, Varadi M, Velankar S, Beltrao P and Steinegger M. Clustering predicted structures at the scale of the known protein universe. Nature, doi:10.1038/s41586-023-06510-w (2023)

Kim W, Mirdita M, Levy Karin E, Gilchrist CLM, Schweke H, Söding J, Levy E, and Steinegger M. Rapid and sensitive protein complex alignment with Foldseek-Multimer. Nature Methods, doi:10.1038/s41592-025-02593-7 (2025)

Kallenborn F, Chacon A, Hundt C, Sirelkhatim H, Didi K, Cha S, Dallago C, Mirdita M, Schmidt B, and Steinegger M. GPU-accelerated homology search with MMseqs2. bioRxiv, doi:10.1101/2024.11.13.623350 (2024)

Webserver

Search your protein structures against the AlphaFoldDB and PDB in seconds using the Foldseek webserver (code): search.foldseek.com 🚀

Installation

# Linux AVX2 build (check using: cat /proc/cpuinfo | grep avx2)
wget https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz; tar xvzf foldseek-linux-avx2.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH

# Linux ARM64 build
wget https://mmseqs.com/foldseek/foldseek-linux-arm64.tar.gz; tar xvzf foldseek-linux-arm64.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH

# Linux AVX2 & GPU build (req. glibc >= 2.17 and nvidia driver >=525.60.13)
wget https://mmseqs.com/foldseek/foldseek-linux-gpu.tar.gz; tar xvfz foldseek-linux-gpu.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH

# macOS
wget https://mmseqs.com/foldseek/foldseek-osx-universal.tar.gz; tar xvzf foldseek-osx-universal.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH

# Conda installer (Linux and macOS)
conda install -c conda-forge -c bioconda foldseek

Other precompiled binaries are available at https://mmseqs.com/foldseek.
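The choice between the archives listed above can be scripted. The following is a minimal sketch, not part of Foldseek: `pick_foldseek_build` and `detect_foldseek_build` are hypothetical helper names, and the mapping only covers the CPU builds named in this section. The GPU build is left as a manual choice because it also depends on the driver and glibc versions.

```shell
# Map an OS / CPU combination to one of the archive names listed above.
# pick_foldseek_build is a hypothetical helper, not part of Foldseek.
# $1: output of `uname -s`, $2: output of `uname -m`, $3: "yes" if AVX2 is available.
pick_foldseek_build() {
    case "$1/$2" in
        Darwin/*)
            echo "foldseek-osx-universal.tar.gz" ;;
        Linux/aarch64|Linux/arm64)
            echo "foldseek-linux-arm64.tar.gz" ;;
        Linux/x86_64)
            if [ "$3" = "yes" ]; then
                echo "foldseek-linux-avx2.tar.gz"
            else
                # No AVX2: look for another build at https://mmseqs.com/foldseek
                return 1
            fi ;;
        *)
            return 1 ;;
    esac
}

# Detect the current machine and print the matching archive name, if any.
detect_foldseek_build() {
    avx2=no
    if [ -r /proc/cpuinfo ] && grep -q avx2 /proc/cpuinfo; then
        avx2=yes
    fi
    pick_foldseek_build "$(uname -s)" "$(uname -m)" "$avx2"
}
```

Calling `detect_foldseek_build` prints an archive name that can be appended to `https://mmseqs.com/foldseek/`. A non-zero exit status means none of the builds above matched.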

> [!NOTE]
> We recently added support for GPU-accelerated protein sequence and profile searches. Full speed requires an NVIDIA GPU of the Ampere generation or newer; Turing-generation GPUs also work, but at reduced speed. The Bioconda and precompiled binaries will not work on older GPU generations (e.g. Volta or Pascal).

Memory requirements

For optimal software performance, consider three options based on your RAM and search requirements:

  1. With Cα info (default). Estimate the RAM as (6 bytes Cα + 1 byte 3Di + 1 byte AA) * (database residues). The 54M AFDB50 entries require 151GB.

  2. Without Cα info. Disabling structure-bit sorting with --sort-by-structure-bits 0 reduces the RAM requirement to 35GB. This alters hit rankings and final scores, but not E-values. Structure bits mostly affect the ranking of hits with E-value > 10^-1.

  3. Single query searches. Use --prefilter-mode 1, which is not memory-limited and computes all optimal ungapped alignments. This option makes full use of Foldseek's multithreading for single queries and supports GPU acceleration.
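The formula in option 1 comes to 8 bytes per residue. As a rough sketch, the helper below applies it to a residue count. `estimate_ram_gb` is a hypothetical name, and the result assumes decimal gigabytes (10^9 bytes); the source does not state which unit the 151GB figure uses.

```shell
# Bytes per residue with Cα info: 6 (Cα) + 1 (3Di) + 1 (AA) = 8.
# estimate_ram_gb is a hypothetical helper, not part of Foldseek.
# $1: total number of residues in the target database.
estimate_ram_gb() {
    awk -v residues="$1" 'BEGIN { printf "%.1f\n", residues * (6 + 1 + 1) / 1e9 }'
}
```

For a hypothetical database of one billion residues, `estimate_ram_gb 1000000000` prints `8.0`. If the estimate exceeds the available RAM, options 2 and 3 above are the alternatives.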

Tutorial Video

A Foldseek tutorial covering the webserver and command-line usage is available here. <a href="https://www.youtube.com/watch?v=k5Rbi22TtOA"><img src="https://img.shields.io/youtube/views/k5Rbi22TtOA?style=social"></a>

Documentation

Many of Foldseek's modules (subprograms) rely on MMseqs2. For more information about these modules, refer to the MMseqs2 wiki. For documentation specific to Foldseek, check out the Foldseek wiki.

Quick start

Search

The easy-search module queries one or more single-chain proteins, provided either as protein structures in PDB/mmCIF format (flat or gzipped) or as protein sequences in FASTA format, against a target database, a folder, or individual single-chain protein structures (for multi-chain proteins see complexsearch). The default alignment output is a tab-separated file, but Foldseek also supports superposed Cα PDBs and HTML.

foldseek easy-search example/d1asha_ example/ aln tmpFolder
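When the same targets are searched repeatedly, the folder can be converted into a target database once and reused. The sketch below assumes the `createdb` module from the wider Foldseek documentation, which this section does not cover. `build_and_search` and the database name `targetDB` are illustrative choices, not fixed names.

```shell
# build_and_search is a hypothetical wrapper around two Foldseek modules.
# $1: query structure file, $2: folder of target structures,
# $3: result file, $4: temporary folder.
build_and_search() {
    # Convert the target folder into a Foldseek database (name is illustrative).
    foldseek createdb "$2" targetDB || return 1
    # Search the query against the prebuilt database.
    foldseek easy-search "$1" targetDB "$3" "$4"
}
```

With the example data from above, the call would be `build_and_search example/d1asha_ example/ aln tmpFolder`.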

Output Search

Tab-separated

The default output fields are: query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits. They can be customized with the --format-output option. For example, --format-output "query,target,qaln,taln" returns the query and target accessions and the pairwise alignments in tab-separated format. Many other output columns are available:

| Code | Description |
| --- | --- |
| query | Query sequence identifier |
| target | Target sequence identifier |
| qca | Cα coordinates of the query |
| tca | Cα coordinates of the target |
| alntmscore | TM-score of the alignment |
| qtmscore | TM-score normalized by the query length |
| ttmscore | TM-score normalized by the target length |
| u | Rotation matrix (computed by TM-score) |
| t | Translation vector (computed by TM-score) |
| lddt | Average LDDT of the alignment |
| lddtfull | LDDT per aligned position |
| prob | Estimated probability for query and target to be homologous (e.g. being within the same SCOPe superfamily) |

Check out the MMseqs2 documentation for additional output format codes.
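As an illustration of custom columns, the sketch below requests a few of the structural scores from the table and then filters the result by the `prob` column. `search_with_scores`, `filter_by_prob`, and the `FIELDS` variable are hypothetical names; only `easy-search`, `--format-output`, and the column codes come from this README.

```shell
# Column layout requested from Foldseek; all codes come from the table above.
FIELDS="query,target,alntmscore,lddt,prob"

# search_with_scores is a hypothetical wrapper.
# $1: query, $2: target database or folder, $3: result file, $4: temporary folder.
search_with_scores() {
    foldseek easy-search "$1" "$2" "$3" "$4" --format-output "$FIELDS"
}

# Keep rows whose prob value (column 5 in FIELDS) is at least the threshold.
# $1: tab-separated result file, $2: minimum probability.
filter_by_prob() {
    awk -F '\t' -v min="$2" '$5 + 0 >= min + 0' "$1"
}
```

For example, `search_with_scores example/d1asha_ example/ aln tmpFolder` followed by `filter_by_prob aln 0.9` would keep only the high-probability hits. The column index in `filter_by_prob` must be adjusted if `FIELDS` changes.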

Superpositioned Cα only PDB files

Foldseek's --format-mode 5 generates PDB files with all target Cα atoms superimposed onto the query structure based on the aligned coordinates. It writes a separate PDB file for each pairwise alignment, so be careful when using this option for large searches.
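A minimal sketch of such a run, reusing the argument order of the easy-search command shown earlier. `superpose_hits` is a hypothetical wrapper name, and where the PDB files end up is not specified in this section.

```shell
# superpose_hits is a hypothetical wrapper, not part of Foldseek.
# $1: query, $2: target database or folder, $3: result name, $4: temporary folder.
superpose_hits() {
    # --format-mode 5 writes one superposed Cα PDB file per pairwise alignment.
    foldseek easy-search "$1" "$2" "$3" "$4" --format-mode 5
}
```

With the example data, the call would be `superpose_hits example/d1asha_ example/ aln tmpFolder`. Because every hit produces its own file, trying this on a small target set first is the safer choice.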

Interactive HTML

Locally run Foldseek can generate an HTML search result, similar to the one produced
