HydraRNA
HydraRNA is a full-length RNA language model.
HydraRNA is a full-length RNA language model. HydraRNA employs a hybrid architecture with 12 layers. Each layer contains a Hydra module, except the 6th and 12th layers, which each contain a multi-head attention (MHA) module. It is pre-trained on both non-coding and protein-coding RNAs. It accepts RNA sequences of up to 10K nt as input.
This repository contains code and pre-trained models for RNA feature extraction and secondary structure prediction.
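The layer arrangement described above can be pictured with a short sketch. This is only an illustration of the stated layout (12 layers, MHA at positions 6 and 12, Hydra elsewhere); the names `layer_layout`, `NUM_LAYERS`, and `MHA_LAYERS` are hypothetical and do not come from the HydraRNA code base.

```python
# Illustrative only: reproduces the layer layout described in the text,
# not the actual HydraRNA model definition.
NUM_LAYERS = 12
MHA_LAYERS = {6, 12}  # 1-based positions of the attention layers


def layer_layout(num_layers=NUM_LAYERS, mha_layers=MHA_LAYERS):
    """Return the module type used at each layer, in order."""
    return ["MHA" if i in mha_layers else "Hydra" for i in range(1, num_layers + 1)]


layout = layer_layout()
for position, kind in enumerate(layout, start=1):
    print(f"layer {position:2d}: {kind}")
```

Under these assumptions the stack has ten Hydra layers and two MHA layers, with one attention layer closing each half of the network.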

We use the fairseq sequence modeling framework to train HydraRNA. HydraRNA is based on Hydra and FlashAttention. We appreciate these excellent works!
Create Environment and Install
Installation via Script
This automated installation script is designed for Linux systems only.
Prerequisites:
- Ensure you have Conda installed.
- Ensure there is no existing Conda environment named HydraRNA.
To install, simply run:
./install_hydrarna_env.sh
Manual Install
First, make sure you have CUDA 11.8 installed. Then create the environment.
conda create -n HydraRNA python==3.9.12
conda activate HydraRNA
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
pip install --no-build-isolation mamba-ssm[causal-conv1d]==2.2.2
pip install flash-attn
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/csrc/fused_dense_lib/ && pip install .
pip install pip==24.0
pip install pandas tqdm tensorboardX pysam
pip install transformers==4.44.0
Then, download the repository and install it.
git clone https://github.com/GuipengLi/HydraRNA.git
cd HydraRNA/fairseq
pip install --editable ./
Download the pre-trained models.
You can download the models from our Google Drive. Then put the .pt files in the weights directory. The HydraRNA_model_V2.pt file is intended only for fine-tuning on the RNA secondary structure prediction task. We obtained the HydraRNA_model_V2.pt weights by further pre-training HydraRNA_model.pt on additional ncRNAs.
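Before running the example scripts, it can help to confirm that the downloaded files are where they are expected. The following is a minimal sketch, assuming the weights directory is named `weights` and sits in the current working directory; the helper name `missing_weights` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# File names as given in the text above.
EXPECTED_WEIGHTS = ("HydraRNA_model.pt", "HydraRNA_model_V2.pt")


def missing_weights(weights_dir="weights"):
    """Return the expected weight files that are not present in weights_dir."""
    root = Path(weights_dir)  # assumed location; adjust to your checkout
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


missing = missing_weights()
if missing:
    print("Missing weight files:", ", ".join(missing))
else:
    print("All expected weight files found.")
```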
Usage
We provide example scripts that show how to use HydraRNA to extract embeddings and to fine-tune it for downstream tasks. As a special case, we also provide an example script that shows how to use HydraRNA for RNA secondary structure prediction.
1. Embedding Extraction
We provide an example script that shows how to use HydraRNA to extract embeddings.
python extract_HydraAttRNA12_5UTRMRL.py
The corresponding feature extraction code is inside this file. For a sequence of length N, the function model.encoder.extract_features returns a tensor of size 1 x (N+2) x 1024. We recommend using the mean embedding, excluding the special tokens at both ends.
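The recommended pooling can be sketched as follows. This is a toy illustration rather than the code in the example script: a nested Python list stands in for the 1 x (N+2) x 1024 tensor, the embedding width is shrunk to 2 for readability, and the function name `mean_embedding` is hypothetical.

```python
def mean_embedding(features):
    """Average per-token embeddings, skipping the first and last positions.

    `features` is a nested list shaped 1 x (N+2) x D, standing in for the
    tensor returned by feature extraction (D would be 1024 for HydraRNA).
    """
    tokens = features[0][1:-1]  # drop the special tokens at both ends
    if not tokens:
        raise ValueError("sequence has no nucleotide positions to average")
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


# Toy input: N = 2 nucleotides, D = 2, with special tokens at both ends.
toy = [[[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]
pooled = mean_embedding(toy)
print(pooled)
```

With a PyTorch tensor of that shape, the same reduction can be written as `features[0, 1:-1].mean(dim=0)`, which yields a single vector of length 1024 per sequence.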
2. Fine-tuning
We provide an example script that shows how to fine-tune HydraRNA for downstream tasks.
python finetune_HydraAttRNA12_mlp_5UTRMRL_scaled.py
3. Secondary Structure Prediction
mkdir predict
python predict_HydraAttRNA12_RNA_SecStruct12.py
We use the bpRNA datasets prepared by RiNALMo directly, so RiNALMo needs to be installed. The prediction head is also taken from the RiNALMo repository. This script outputs the RNA secondary structures predicted by HydraRNA for the TS0 dataset, along with a summary file, bpRNA_test_HydraRNA_predict_resulst_allow_flexible_pairings.csv.
Citations
If you find these models useful, please cite our work:
License
This source code is licensed under the MIT license.
