RNAstructure.jl
Unofficial Julia interface to the RNAstructure program suite for RNA structure prediction and analysis. Please cite the appropriate publications listed on the RNAstructure website if you use this library.
Installation
Enter the package mode from the Julia REPL by pressing ] and then
install with
add RNAstructure
Usage
using RNAstructure
Note: sequence conventions
Sequences passed to RNAstructure use the following convention:
- uppercase character: normal nucleotide, U equivalent to T
- lowercase character: nucleotide cannot form basepairs
- X or N character: unknown base or base that cannot interact with others (cannot pair or stack)
See the RNAstructure manual section for sequences for more details.
Some programs make exceptions to these rules; check the manual pages of the RNAstructure programs for details on any differences.
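The convention above can be checked in plain Julia before a sequence is passed to any of the functions. This is a minimal sketch: `can_pair`, `pairable_positions` and `normalize_ut` are hypothetical helper names, not part of RNAstructure.jl, and they encode only the three rules listed above, not the per-program exceptions.

```julia
# Hypothetical helpers, not part of RNAstructure.jl.
# Uppercase letters are normal nucleotides; lowercase letters and
# uppercase X or N cannot form basepairs.
can_pair(c::Char) = isuppercase(c) && c != 'X' && c != 'N'

# Indices of the positions that are allowed to pair.
pairable_positions(seq::AbstractString) =
    [i for (i, c) in enumerate(seq) if can_pair(c)]

# U and T are equivalent, so map T to U before comparing sequences.
normalize_ut(seq::AbstractString) = replace(replace(seq, 'T' => 'U'), 't' => 'u')

pairable_positions("GGGaaaCCN")   # -> [1, 2, 3, 7, 8]
```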
Note: Overriding energy parameter directories
The environment variable RNASTRUCTURE_JL_DATAPATH can be set to
override the directory where energy parameters are read from. For the
cyclefold_* functions the environment variable is called
RNASTRUCTURE_JL_CYCLEFOLD_DATAPATH.
In the original RNAstructure program these environment variables are
called DATAPATH and CYCLEFOLD_DATAPATH. RNAstructure.jl (this
package) sets these environment variables automatically to the
corresponding installation directory of the RNAstructure_jll binary
package. The names of the env vars were changed to avoid clashes with
possible settings you might already have in your shell startup files
from a pre-existing manual RNAstructure installation, which could be a
different version and have different parameters. In this way, you can
be sure that this package uses the correct parameters, while still
being able to override them if necessary.
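A sketch of how the override might be set from within Julia. The helper name `use_datapath!` and the example directory are hypothetical. It is also an assumption that the variable has to be set before the package is loaded, since the text above does not say when it is read.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Sets the override variable only if the directory actually exists.
function use_datapath!(dir::AbstractString; cyclefold::Bool=false)
    isdir(dir) || return false
    key = cyclefold ? "RNASTRUCTURE_JL_CYCLEFOLD_DATAPATH" :
                      "RNASTRUCTURE_JL_DATAPATH"
    ENV[key] = dir
    return true
end

# Placeholder path; replace it with a real parameter directory.
use_datapath!(joinpath(homedir(), "my_rnastructure", "data_tables"))

# using RNAstructure   # assumption: load the package after setting the variable
```

The same variables can also be exported in the shell that starts Julia.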
Minimum free energy (MFE) and structure
The mfe function calculates the minimum free energy and the
corresponding minimum free energy structure of an RNA
sequence. Internally, this function calls the Fold program from
RNAstructure.
Additional information on the Fold program and possible command-line
options that can be passed via args can be found at the
RNAstructure Fold
documentation.
# returns mfe and structure
mfe("GGGAAACCC") # -> (-1.2 kcal mol^-1, "(((...)))")
# set temperature to 300 K
mfe("GGGAAACCC"; args=`-T 300`) # -> (-1.9 kcal mol^-1, "(((...)))")
# show possible options for args
mfe(""; args=`-h`)
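The structure is returned in dot-bracket notation. To work with the individual basepairs, the string can be parsed with a small stack-based routine. This is a minimal sketch in plain Julia; `basepairs` is a hypothetical helper, not part of RNAstructure.jl.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Returns the basepairs (i, j) with i < j encoded in a dot-bracket string.
# "()" and "[]" are tracked on separate stacks, so simple pseudoknots work.
function basepairs(dbn::AbstractString)
    closing = Dict(')' => '(', ']' => '[')
    stacks = Dict('(' => Int[], '[' => Int[])
    pairs = Tuple{Int,Int}[]
    for (i, c) in enumerate(dbn)
        if haskey(stacks, c)
            push!(stacks[c], i)          # remember the opening position
        elseif haskey(closing, c)
            stack = stacks[closing[c]]
            isempty(stack) && error("unbalanced '$c' at position $i")
            push!(pairs, (pop!(stack), i))
        end
    end
    all(isempty, values(stacks)) || error("unbalanced opening bracket")
    return sort!(pairs)
end

basepairs("(((...)))")   # -> [(1, 9), (2, 8), (3, 7)]
```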
Suboptimal structures
The subopt function generates suboptimal structures for a nucleic acid
sequence. Internally, this function calls the Fold program from
RNAstructure.
Additional information on the Fold program and possible command-line
options that can be passed via args can be found at the
RNAstructure Fold
documentation.
subopt("GGGAAACCC")
subopt("GGGGAAACCCC"; args=`-w 0 -p 100`)
# show possible options for args
subopt(""; args=`-h`)
All suboptimal structures in an energy range
The subopt_all function generates all suboptimal structures in an energy
range for a nucleic acid sequence using the AllSub program from RNAstructure.
Additional information on the AllSub program and possible
command-line options that can be passed via args can be found at
the RNAstructure AllSub
documentation.
subopt_all("GGGAAACCC")
# maximum absolute energy difference of 10 kcal/mol to the MFE, up to
# 500 percent relative difference to MFE
subopt_all("GGGGAAACCCC"; args=`-a 10 -p 500`)
# set temperature to 300 K
subopt_all("GGGGAAACCCC"; args=`-T 300`)
# show possible options for args
subopt_all(""; args=`-h`)
Partition function (ensemble energy)
The partfn function calculates the partition function and returns
the ensemble free energy for a nucleotide sequence.
Additional information on the EnsembleEnergy program and possible
command-line options that can be passed via args can be found at
the RNAstructure EnsembleEnergy
documentation.
partfn("GGGAAACCC")
partfn("GGGAAACCC"; args=`--DNA`)
# show possible options for args
partfn(""; args=`-h`)
Probability of a structure
The prob_of_structure function calculates the probability of a
secondary structure for a given nucleotide sequence.
The supported args are those common to energy and partfn.
prob_of_structure("GGGAAACCC", "(((...)))")
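One way to read this quantity is through the standard Boltzmann relation p = exp(-(E - G) / (R T)), where E is the free energy of the structure and G the ensemble free energy. Whether prob_of_structure computes it in exactly this way is an assumption here. The helper `structure_probability`, the default temperature of 310.15 K and the input numbers are illustrative only.

```julia
# Gas constant in kcal / (mol K).
const R_KCAL = 1.98720425864083e-3

# Hypothetical helper, not part of RNAstructure.jl.
# Boltzmann probability of a structure with free energy `E` in an ensemble
# with ensemble free energy `G` (both in kcal/mol) at temperature `T` (K).
structure_probability(E, G; T=310.15) = exp(-(E - G) / (R_KCAL * T))

# Illustrative numbers only, not output of RNAstructure.
structure_probability(-1.2, -1.5)
```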
Maximum expected accuracy (MEA) structure
The mea function predicts the maximum expected accuracy structure
(and possibly suboptimals) for a nucleotide sequence.
Additional information on the partition program and possible
command-line options that can be passed via args_partition can be
found at the RNAstructure partition
documentation.
Additional information on the MaxExpect program and possible
command-line options that can be passed via args_maxexpect can be
found at the RNAstructure MaxExpect
documentation.
mea("GGGAAACCC")
mea("GGGAAACCC"; args_partition=`-T 300`, args_maxexpect=`-s 10 -w 0`)
# show possible options for args_partition, args_maxexpect
mea(""; args_partition=`-h`)
Free energy of folding
The energy function calls the efn2 program and parses its
output. It calculates the folding free energy and experimental
uncertainty of a sequence and one or more secondary structures.
Additional information on the efn2 program and possible command-line
options that can be passed via args can be found at the
RNAstructure efn2
documentation.
# returns energy and experimental uncertainty
energy("GGGAAACCC",
       "(((...)))")
# pseudoknot
energy("GGGAAAAGGGAAAACCCAAAACCC",
       "(((....[[[....)))....]]]")
# set temperature to 300 K
energy("GGGAAAAGGGAAAACCCAAAACCC",
       "(((....[[[....)))....]]]";
       args=`-T 300`)
# multiple structures, returns array of results
energy("GGGAAACCC",
       ["(((...)))",
        "((.....))"])
# show possible options for args
energy("", ""; args=`-h`)
Basepair probabilities
The bpp function calls the partition and ProbabilityPlot
programs from RNAstructure to calculate the basepair probabilities for
an RNA sequence.
bpp("GGGAAACCC") # -> 9×9 Matrix
# show possible options for args
bpp(""; args=`-h`)
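A basepair probability matrix can be reduced to per-position unpaired probabilities. The sketch below is plain Julia and uses a toy matrix. `unpaired_probabilities` is a hypothetical helper, and the layout of the matrix returned by bpp (symmetric or triangular) is not assumed: taking the larger of the two mirrored entries covers both cases.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Probability that each position is unpaired, given an n×n matrix of
# basepair probabilities. max(P[i, j], P[j, i]) handles both symmetric
# and triangular layouts.
function unpaired_probabilities(P::AbstractMatrix{<:Real})
    n = size(P, 1)
    size(P, 2) == n || throw(ArgumentError("matrix must be square"))
    return [1 - sum(max(P[i, j], P[j, i]) for j in 1:n) for i in 1:n]
end

# Toy 3×3 input, not real RNAstructure output:
# positions 1 and 3 pair with probability 0.8.
P = [0.0 0.0 0.8;
     0.0 0.0 0.0;
     0.0 0.0 0.0]
unpaired_probabilities(P)   # approximately [0.2, 1.0, 0.2]
```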
Sampling structures
The sample_structures function samples secondary structures from the Boltzmann ensemble of secondary structures.
Additional information on the stochastic program and possible
command-line options that can be passed via args can be found at
the RNAstructure stochastic
documentation.
# returns a 1000-element Vector{String}
sample_structures("GGGAAACCC")
# show possible options for args
sample_structures(""; args=`-h`)
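Since the result is a vector of dot-bracket strings, the sample can be summarised by counting how often each structure occurs. This is a minimal sketch with a toy input standing in for real output; `structure_frequencies` is a hypothetical helper, not part of RNAstructure.jl.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Returns (structure, frequency) pairs sorted by decreasing frequency,
# with ties broken alphabetically.
function structure_frequencies(samples::AbstractVector{<:AbstractString})
    counts = Dict{String,Int}()
    for s in samples
        counts[s] = get(counts, s, 0) + 1
    end
    n = length(samples)
    freqs = [(s, c / n) for (s, c) in counts]
    return sort!(freqs; by = x -> (-x[2], x[1]))
end

# Toy input standing in for the output of sample_structures.
toy = ["(((...)))", "(((...)))", ".((...)).", "(((...)))"]
structure_frequencies(toy)   # -> [("(((...)))", 0.75), (".((...)).", 0.25)]
```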
Nucleotide cyclic motif model (CycleFold)
The cyclefold_* functions call the CycleFold program from
RNAstructure, which uses the nucleotide cyclic motif model of
Parisien & Major (2008). This model allows for non-canonical and
canonical basepairs.
NOTE: Use the energy values with caution. I think the energy unit is kJ/mol, but I am not sure.
Additional information on the CycleFold program and possible
command-line options that can be passed via args can be found at the
RNAstructure CycleFold
documentation.
cyclefold_mea("GGGAAACCC") # -> [9, 8, 7, 6, 0, 4, 3, 2, 1]
cyclefold_mfe("GGGAAACCC") # -> (-7.8305 kJ mol^-1, [9, 8, 7, 6, 0, 4, 3, 2, 1])
cyclefold_bpp("GGGAAACCC") # -> 9×9 Matrix{Float64}
# show possible options for args
cyclefold_mea(""; args=`-h`)
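The integer vectors in the example outputs above look like pair tables, where entry i holds the pairing partner of position i and 0 means unpaired. Reading them this way is an assumption. Under that assumption, a pseudoknot-free table can be turned into dot-bracket notation with a hypothetical helper such as `pairtable_to_dbn` (not part of RNAstructure.jl):

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Converts a pair table (pt[i] == j if i pairs with j, 0 if unpaired)
# into dot-bracket notation. Assumes a pseudoknot-free structure.
function pairtable_to_dbn(pt::AbstractVector{<:Integer})
    chars = fill('.', length(pt))
    for (i, j) in enumerate(pt)
        if j > i
            pt[j] == i || error("inconsistent pair table at position $i")
            chars[i] = '('
            chars[j] = ')'
        end
    end
    return join(chars)
end

pairtable_to_dbn([9, 8, 7, 6, 0, 4, 3, 2, 1])   # -> "((((.))))"
```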
Sequence design
The design function calls the design program from RNAstructure.
Additional information on the design program and possible
command-line options that can be passed via args can be found at
the RNAstructure design
documentation.
target = "(((...)))"
# returns designed sequence and random seed used for design
design(target)
# set the random number seed used by the design process
seed = 42
design(target; args=`-s $seed`)
# show possible options for args
design(""; args=`-h`)
Ensemble defect
The ensemble_defect function calls the EDcalculator program from
RNAstructure. It calculates the ensemble defect and normalised
ensemble defect of a sequence and one or more secondary structures.
Additional information on the EDcalculator program and possible
command-line options that can be passed via args can be found at
the RNAstructure EDcalculator
documentation.
seq = "GGGAAACCC"
dbn = "(((...)))"
dbns = [dbn, "((.....))"]
ensemble_defect(seq, dbn)
ensemble_defect(seq, dbns)
ensemble_defect("AAACCCTTT", "(((...)))"; args=`-a dna`)
# show possible options for args
ensemble_defect("", ""; args=`-h`)
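For intuition, the ensemble defect is usually defined as the expected number of positions whose pairing state differs from the target, and the normalised value divides this by the sequence length. The sketch below computes it from a basepair probability matrix and a target pair table. It is an illustration of that definition, not a description of what EDcalculator does internally. `ensemble_defect_sketch` and the toy inputs are hypothetical.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# P  : n×n basepair probability matrix (symmetric or triangular)
# pt : target pair table (pt[i] == j if i pairs with j, 0 if unpaired)
# Returns (ensemble defect, normalised ensemble defect).
function ensemble_defect_sketch(P::AbstractMatrix{<:Real},
                                pt::AbstractVector{<:Integer})
    n = length(pt)
    size(P) == (n, n) || throw(ArgumentError("P must be n×n"))
    prob(i, j) = max(P[i, j], P[j, i])
    defect = 0.0
    for i in 1:n
        if pt[i] == 0
            # target is unpaired: defect is the probability of being paired
            defect += sum(prob(i, j) for j in 1:n)
        else
            # target is paired: defect is the probability of not forming that pair
            defect += 1 - prob(i, pt[i])
        end
    end
    return defect, defect / n
end

# Toy input, not real RNAstructure output.
P_toy = [0.0 0.0 0.8;
         0.0 0.0 0.0;
         0.0 0.0 0.0]
ensemble_defect_sketch(P_toy, [3, 0, 1])   # approximately (0.4, 0.133)
```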
Remove pseudoknots
The remove_pknots function returns the pseudoknot-free
substructure with the maximum possible number of basepairs.
remove_pknots("(((...[[[[...)))...]]]]") # -> "......((((.........))))"
dbn2ct: convert dot-bracket notation to ct format
This
