HydraRNA

HydraRNA is a full-length RNA language model. It employs a hybrid architecture with 12 layers: each layer contains a Hydra module, except the 6th and 12th layers, which each contain a multi-head attention (MHA) module. It is pre-trained on both non-coding and protein-coding RNAs and accepts RNA sequences of up to 10K nt as input.
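The layer layout described above can be written out as a small sketch. This only illustrates the ordering of module types; the names `layer_types`, `NUM_LAYERS`, and `MHA_LAYERS` are hypothetical and do not come from the HydraRNA code base.

```python
# Illustrative only: enumerate which module type sits in each of the 12 layers.
NUM_LAYERS = 12
MHA_LAYERS = {6, 12}  # 1-based positions of the multi-head attention layers


def layer_types(num_layers=NUM_LAYERS, mha_layers=MHA_LAYERS):
    """Return the module type for each layer, in order from layer 1 upward."""
    return ["MHA" if i in mha_layers else "Hydra" for i in range(1, num_layers + 1)]


layout = layer_types()
for index, kind in enumerate(layout, start=1):
    print(f"layer {index:2d}: {kind}")
```

Under this layout, 10 of the 12 layers are Hydra modules and 2 are MHA modules.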

This repository contains code and pre-trained models for RNA feature extraction and secondary structure prediction.

Overview

We use the fairseq sequence modeling framework to train HydraRNA. HydraRNA is based on Hydra and FlashAttention. We appreciate these excellent works!

Create Environment and Install

Installation via Script

This automated installation script is designed for Linux systems only.

Prerequisites:

  • Ensure you have Conda installed.

  • Ensure there is no existing Conda environment named HydraRNA.

To install, simply run:

./install_hydrarna_env.sh
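Both prerequisites can be checked before running the script. The sketch below is not part of the repository. It assumes `conda env list --json` prints a JSON object whose `envs` key lists environment paths, and it treats the last path component as the environment name. The helper names `env_names` and `check_prerequisites` are hypothetical.

```python
import json
import shutil
import subprocess
from pathlib import Path


def env_names(conda_json):
    """Extract environment names from the JSON text of `conda env list --json`."""
    # Each entry is a path; the base install shows up under its directory name.
    return {Path(p).name for p in json.loads(conda_json).get("envs", [])}


def check_prerequisites(env="HydraRNA"):
    """Return 'ok' or a short message describing the unmet prerequisite."""
    if shutil.which("conda") is None:
        return "conda was not found on PATH"
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    if env in env_names(result.stdout):
        return f"a Conda environment named {env} already exists"
    return "ok"


if __name__ == "__main__":
    print(check_prerequisites())
```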

Manual Install

First, make sure you have CUDA-11.8 installed. Then create the environment.

conda create -n HydraRNA python==3.9.12
conda activate HydraRNA

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1  --index-url https://download.pytorch.org/whl/cu118

pip install --no-build-isolation mamba-ssm[causal-conv1d]==2.2.2
pip install flash-attn
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/csrc/fused_dense_lib/ && pip install .

pip install pip==24.0

pip install pandas tqdm tensorboardX pysam

pip install transformers==4.44.0

Then, download the repository and install it.

git clone https://github.com/GuipengLi/HydraRNA.git
cd HydraRNA/fairseq
pip install --editable ./
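After the manual install, a quick check from inside the activated environment can confirm that the packages are discoverable. This is a minimal sketch rather than an official test. The list uses the usual import names for the packages installed above (for example `mamba_ssm` for `mamba-ssm` and `flash_attn` for `flash-attn`), and the helper `missing_modules` is hypothetical. It only checks that each module can be located, not that the CUDA builds work.

```python
import importlib.util

# Import names for the packages installed in the steps above.
REQUIRED = [
    "torch", "torchvision", "torchaudio",
    "mamba_ssm", "flash_attn", "transformers", "fairseq",
    "pandas", "tqdm", "tensorboardX", "pysam",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required modules were found.")
```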

Download the pre-trained models.

You can download the models from our Google Drive and put the .pt files in the weights directory. HydraRNA_model_V2.pt is intended only for fine-tuning on the RNA secondary structure prediction task; it was obtained by further pre-training HydraRNA_model.pt on additional ncRNAs.
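The choice between the two checkpoints can be summarized in a short sketch. The file names come from the text above. The task labels, the helper `checkpoint_path`, and the assumption that `weights` is resolved relative to the current working directory are all hypothetical.

```python
from pathlib import Path

# File names as given above; the dictionary keys are illustrative labels.
CHECKPOINTS = {
    "default": "HydraRNA_model.pt",
    "secondary_structure": "HydraRNA_model_V2.pt",
}


def checkpoint_path(task, weights_dir="weights"):
    """Pick the checkpoint file for a task; V2 is used only for secondary structure."""
    key = "secondary_structure" if task == "secondary_structure" else "default"
    return Path(weights_dir) / CHECKPOINTS[key]


for task in ("embedding", "secondary_structure"):
    path = checkpoint_path(task)
    status = "found" if path.is_file() else "missing"
    print(f"{task}: {path} ({status})")
```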

Usage

We provide example scripts that show how to use HydraRNA to extract embeddings and to fine-tune the model for downstream tasks. As a special case, we also provide an example script that shows how to use HydraRNA for RNA secondary structure prediction.

1. Embedding Extraction

We provide an example script that shows how to use HydraRNA to extract embeddings:

python extract_HydraAttRNA12_5UTRMRL.py

The corresponding feature extraction code is inside this file. For a sequence of length N, the function model.encoder.extract_features returns a tensor of size 1 x (N+2) x 1024. We recommend using the mean embedding, excluding the special tokens at both ends.
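The recommended pooling amounts to dropping the first and last positions and averaging the remaining N vectors. The sketch below shows that indexing on plain nested lists so it runs without PyTorch. The helper `mean_embedding` is hypothetical, and the toy input uses 2 dimensions instead of 1024.

```python
def mean_embedding(features):
    """Average per-token vectors from a 1 x (N+2) x D nested list, skipping both ends."""
    tokens = features[0][1:-1]  # drop the special tokens at both ends
    if not tokens:
        raise ValueError("no sequence positions left after removing special tokens")
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


# Toy input: N = 3 sequence positions, D = 2, with special tokens at both ends.
toy = [[[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]]
pooled = mean_embedding(toy)
print(pooled)

# With a PyTorch tensor of the same shape, the equivalent expression is:
#     features[0, 1:-1].mean(dim=0)
```

The result is a single vector of length D per sequence, which can be passed to a downstream model.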

2. Fine-tuning

We provide an example script that shows how to fine-tune HydraRNA for downstream tasks:

python finetune_HydraAttRNA12_mlp_5UTRMRL_scaled.py

3. Secondary Structure Prediction

mkdir predict

python predict_HydraAttRNA12_RNA_SecStruct12.py

We directly use the bpRNA datasets prepared by RiNALMo, so RiNALMo needs to be installed. The prediction head is also taken from the RiNALMo repository. This script outputs the RNA secondary structures predicted by HydraRNA on the TS0 dataset, along with a summary file, bpRNA_test_HydraRNA_predict_resulst_allow_flexible_pairings.csv.

Citations

If you find these models useful, please cite our work:

Li G, Jiang F, Zhu J, Cui H, Wang Z, Chen W. HydraRNA: a hybrid architecture based full-length RNA language model. Genome Biol. 2025;26:383.

License

This source code is licensed under the MIT license.
