SkillAgentSearch skills...

Peptriever

Bi-Encoder approach for large-scale protein-peptide binding search

Install / Use

/learn @RoniGurvich/Peptriever
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Peptriever

demo Paper CI Code style: black Code Quality Preprint

About

This repo contains all the code needed in order to train Peptriever end to end.

Local Setup

The dependencies are managed using Poetry.

You can set up your local virtual environment with all the dependencies by running:

make setup

System Architecture Diagram

flowchart TD
    subgraph legend[Legend]
        data[Data]
        process{{Process}}
    end

    subgraph data_sources[Data Sources]

        subgraph pdb_seq[PDB Sequences]
            pdb_dump[PDB Data Dump] --> extract_sequences{{Extract Sequences}} --> pdb_sequences[PDB Sequences]
            click pdb_sequences "https://huggingface.co/datasets/ronig/pdb_sequences" "huggingface dataset"
        end
        
        subgraph binding[Binding]
            huang_data[Huang Lab Data]
            propedia_data[Propedia Data]
            yapp_data[YAPP-Cd]
            huang_data --> preprocess_train_data{{Prepare Binding Training Set}}
            propedia_data --> preprocess_train_data
            yapp_data --> preprocess_train_data
            preprocess_train_data --> binding_sequences[Binding Sequences]
            click binding_sequences "https://huggingface.co/datasets/ronig/protein_binding_sequences" "huggingface dataset"
        end
        
    end

    subgraph pretraining[Pretraining]
        pdb_sequences --> train_tokenizer{{Train Tokenizer}} --> tokenizer[Tokenizer]
        tokenizer --> mlm_pretraining{{Masked Language Pretraining}}
        mlm_pretraining --> pretrained_mlm[Pretrained Models]
        click tokenizer "https://huggingface.co/ronig/pdb_bpe_tokenizer_1024_mlm" "huggingface model"
    end
    
    subgraph training[Training]
        pretrained_mlm --> finetune{{Finetune Models}}
        binding_sequences --> finetune
        finetune --> trained_model[Trained Model]
        click trained_model "https://huggingface.co/ronig/protein_biencoder" "huggingface model"
    end
    
    subgraph indexing[Indexing]
        trained_model --> build_index{{Build Index}}
        pdb_sequences --> build_index
        build_index --> vector_db[(Vector Database)]
        vector_db --> publish_index_model{{Publish Index and Model}}
    end

    publish_index_model --> search_app((Search App))
    click search_app "https://peptriever.app" "Peptriever App"

Model Details

Model Architecture

Peptriever is a Bi Encoder Bert model, combined with a Byte-Pair Encoding tokenizer.

flowchart TD
    protein_sequence[Protein Sequence] --> protein_encoder[Protein BERT] --> protein_vector[Protein Vector]
    peptide_sequence[Peptide Sequence] --> peptide_encoder[Peptide BERT] --> peptide_vector[Peptide Vector]
    peptide_vector --> euclidean[Euclidean Distance == Binding Score] 
    protein_vector --> euclidean

Evaluation Results

The model was evaluated on the test set from Johansson-Akhe et al.

Precision-Recall ROC

View on GitHub
GitHub Stars13
CategoryDevelopment
Updated19d ago
Forks2

Languages

Python

Security Score

90/100

Audited on Mar 12, 2026

No findings