RNAstructure.jl

Unofficial Julia interface to the RNAstructure program suite for RNA structure prediction and analysis. Please cite the appropriate publications listed on the RNAstructure website if you use this library.

Installation

Enter the package mode from the Julia REPL by pressing ] and then install with

add RNAstructure

Usage

using RNAstructure

Note: sequence conventions

Sequences passed to RNAstructure use the following convention:

  • uppercase character: normal nucleotide, U equivalent to T
  • lowercase character: nucleotide cannot form basepairs
  • X or N character: unknown base or base that cannot interact with others (cannot pair or stack)

See the RNAstructure manual section for sequences for more details.

Some programs make exceptions to these rules; check the manual pages of the RNAstructure programs for details on any differences.
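
The conventions above can be sketched as a small classifier. This is a hypothetical helper for illustration only, not part of RNAstructure.jl; the names classify_base and classify_sequence are made up here, and treating lowercase x or n as "cannot form basepairs" is an assumption.

```julia
# Hypothetical helper, not part of RNAstructure.jl: label each character of a
# sequence according to the conventions listed above.
function classify_base(c::Char)
    if c in ('X', 'N')
        return :unknown      # unknown base, cannot pair or stack
    elseif islowercase(c)
        return :unpaired     # nucleotide that cannot form basepairs
    elseif isuppercase(c)
        return :normal       # normal nucleotide (U equivalent to T)
    else
        return :invalid      # anything else is outside the convention
    end
end

classify_sequence(seq::AbstractString) = [classify_base(c) for c in seq]

classify_sequence("GaN")  # -> [:normal, :unpaired, :unknown]
```

A check like this can be useful before passing user-supplied sequences to the wrapped programs, since individual programs may deviate from these rules.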

Note: Overriding energy parameter directories

The environment variable RNASTRUCTURE_JL_DATAPATH can be set to override the directory where energy parameters are read from. For the cyclefold_* functions the environment variable is called RNASTRUCTURE_JL_CYCLEFOLD_DATAPATH.

In the original RNAstructure program these environment variables are called DATAPATH and CYCLEFOLD_DATAPATH. RNAstructure.jl (this package) sets these environment variables automatically to the corresponding installation directory of the RNAstructure_jll binary package. The names were changed to avoid clashes with settings you might already have in your shell startup files from a pre-existing manual RNAstructure installation, which could be a different version with different parameters. This way, you can be sure that this package uses the correct parameters, while still being able to override them if necessary.
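
A minimal sketch of an override from within Julia. Both paths are placeholders, not real locations. Setting the variables before calling any of the package functions is an assumption; this README does not state when they are read.

```julia
# Placeholder paths (hypothetical); point these at directories that contain
# the RNAstructure energy parameter files you want to use.
ENV["RNASTRUCTURE_JL_DATAPATH"] = "/path/to/custom/data_tables"

# The cyclefold_* functions read a separate variable.
ENV["RNASTRUCTURE_JL_CYCLEFOLD_DATAPATH"] = "/path/to/custom/cyclefold_tables"
```

The same variables can also be exported in the shell before starting Julia.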

Minimum free energy (MFE) and structure

The mfe function calculates the minimum free energy and the corresponding minimum free energy structure of an RNA sequence. Internally, this function calls the Fold program from RNAstructure.

Additional information on the Fold program and possible command-line options that can be passed via args can be found at the RNAstructure Fold documentation.

# returns mfe and structure
mfe("GGGAAACCC")              # -> (-1.2 kcal mol^-1, "(((...)))")

# set temperature to 300 K
mfe("GGGAAACCC"; args=`-T 300`)    # -> (-1.9 kcal mol^-1, "(((...)))")

# show possible options for args
mfe(""; args=`-h`)

Suboptimal structures

The subopt function generates suboptimal structures for a nucleic acid sequence. Internally, this function calls the Fold program from RNAstructure.

Additional information on the Fold program and possible command-line options that can be passed via args can be found at the RNAstructure Fold documentation.

subopt("GGGAAACCC")
subopt("GGGGAAACCCC"; args=`-w 0 -p 100`)

# show possible options for args
subopt(""; args=`-h`)

All suboptimal structures in an energy range

The subopt_all function generates all suboptimal structures in an energy range for a nucleic acid sequence using the AllSub program from RNAstructure.

Additional information on the AllSub program and possible command-line options that can be passed via args can be found at the RNAstructure AllSub documentation.

subopt_all("GGGAAACCC")

# maximum absolute energy difference of 10 kcal/mol to the MFE, up to
# 500 percent relative difference to MFE
subopt_all("GGGGAAACCCC"; args=`-a 10 -p 500`)

# set temperature to 300 K
subopt_all("GGGGAAACCCC"; args=`-T 300`)

# show possible options for args
subopt_all(""; args=`-h`)

Partition function (ensemble energy)

The partfn function calculates the partition function and returns the ensemble free energy for a nucleotide sequence.

Additional information on the EnsembleEnergy program and possible command-line options that can be passed via args can be found at the RNAstructure EnsembleEnergy documentation.

partfn("GGGAAACCC")

partfn("GGGAAACCC"; args=`--DNA`)

# show possible options for args
partfn(""; args=`-h`)

Probability of a structure

The prob_of_structure function calculates the probability of a secondary structure for a given nucleotide sequence.

The supported args are those common to energy and partfn.

prob_of_structure("GGGAAACCC", "(((...)))")
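
This README does not say how the probability is computed. A common definition, assumed here, is the Boltzmann weight of the structure relative to the ensemble: p = exp(-(E - G_ens) / (R T)), where E is the folding free energy of the structure and G_ens is the ensemble free energy. The sketch below is a hypothetical helper working on plain numbers in kcal/mol, not the package's implementation; the default temperature of 310.15 K and the example energies are illustrative assumptions.

```julia
# Gas constant in kcal/(mol*K)
const R_KCAL = 1.98720425864e-3

# Hypothetical helper, not part of RNAstructure.jl.
# en:          free energy of one structure (kcal/mol)
# en_ensemble: ensemble free energy of the sequence (kcal/mol)
# temp:        temperature in kelvin (310.15 K assumed as default)
function boltzmann_prob(en::Real, en_ensemble::Real; temp::Real = 310.15)
    return exp(-(en - en_ensemble) / (R_KCAL * temp))
end

# Illustrative numbers only, not outputs of the package
boltzmann_prob(-1.2, -1.5)
```

Under this definition a structure whose energy equals the ensemble free energy has probability 1, and every less stable structure has a probability below 1.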

Maximum expected accuracy (MEA) structure

The mea function predicts the maximum expected accuracy structure (and possibly suboptimals) for a nucleotide sequence.

Additional information on the partition program and possible command-line options that can be passed via args_partition can be found at the RNAstructure partition documentation.

Additional information on the MaxExpect program and possible command-line options that can be passed via args_maxexpect can be found at the RNAstructure MaxExpect documentation.

mea("GGGAAACCC")

mea("GGGAAACCC"; args_partition=`-T 300`, args_maxexpect=`-s 10 -w 0`)

# show possible options for args_partition, args_maxexpect
mea(""; args_partition=`-h`)

Free energy of folding

The energy function calls the efn2 program and parses its output. It calculates the folding free energy and experimental uncertainty of a sequence and one or more secondary structures.

Additional information on the efn2 program and possible command-line options that can be passed via args can be found at the RNAstructure efn2 documentation.

# returns energy and experimental uncertainty
energy("GGGAAACCC",
       "(((...)))")

# pseudoknot
energy("GGGAAAAGGGAAAACCCAAAACCC",
       "(((....[[[....)))....]]]")

# set temperature to 300 K
energy("GGGAAAAGGGAAAACCCAAAACCC",
       "(((....[[[....)))....]]]";
       args=`-T 300`)

# multiple structures, returns array of results
energy("GGGAAACCC",
      ["(((...)))",
       "((.....))"])

# show possible options for args
energy("", ""; args=`-h`)
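
The pseudoknot example above uses a second bracket type for the crossing helix. As a sketch of how such a string maps to basepairs, the hypothetical helper below (not part of RNAstructure.jl) converts dot-bracket notation into a pair table, using one stack per bracket type. It only handles () and [], which are the bracket types that appear in this README.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Returns a vector pt where pt[i] is the 1-based partner of position i,
# or 0 if position i is unpaired.
function dbn_to_pairtable(dbn::AbstractString)
    closers = Dict(')' => '(', ']' => '[')
    stacks = Dict('(' => Int[], '[' => Int[])
    pt = zeros(Int, length(dbn))
    for (i, c) in enumerate(dbn)
        if haskey(stacks, c)
            push!(stacks[c], i)              # remember the opening position
        elseif haskey(closers, c)
            st = stacks[closers[c]]
            isempty(st) && throw(ArgumentError("unmatched '$c' at position $i"))
            j = pop!(st)                     # most recent matching opener
            pt[i] = j
            pt[j] = i
        elseif c != '.'
            throw(ArgumentError("unexpected character '$c' at position $i"))
        end
    end
    all(isempty, values(stacks)) || throw(ArgumentError("unmatched opening bracket"))
    return pt
end

dbn_to_pairtable("(((...)))")  # -> [9, 8, 7, 0, 0, 0, 3, 2, 1]
```

Keeping a separate stack per bracket type is what allows crossing pairs such as "([)]" to be resolved correctly.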

Basepair probabilities

The bpp function calls the partition and ProbabilityPlot programs from RNAstructure to calculate the basepair probabilities for an RNA sequence.

bpp("GGGAAACCC")  # -> 9x9 Matrix

# show possible options for args
bpp(""; args=`-h`)

Sampling structures

The sample_structures function samples secondary structures from the Boltzmann ensemble using the stochastic program from RNAstructure.

Additional information on the stochastic program and possible command-line options that can be passed via args can be found at the RNAstructure stochastic documentation.

# returns a 1000-element Vector{String}
sample_structures("GGGAAACCC")

# show possible options for args
sample_structures(""; args=`-h`)

Nucleotide cyclic motif model (CycleFold)

The cyclefold_* functions call the CycleFold program from RNAstructure, which uses the nucleotide cyclic motif model by Parisien & Major (2008). This model allows for non-canonical and canonical basepairs.

NOTE: use the energy with caution. I think the energy unit is kJ/mol, but I am not sure.

Additional information on the CycleFold program and possible command-line options that can be passed via args can be found at the RNAstructure CycleFold documentation.

cyclefold_mea("GGGAAACCC")  # -> [9, 8, 7, 6, 0, 4, 3, 2, 1]
cyclefold_mfe("GGGAAACCC")  # -> (-7.8305 kJ mol^-1, [9, 8, 7, 6, 0, 4, 3, 2, 1])
cyclefold_bpp("GGGAAACCC")  # -> 9×9 Matrix{Float64}

# show possible options for args
cyclefold_mea(""; args=`-h`)
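
Judging from the example output above, the structures are returned as a vector in which entry i holds the pairing partner of position i and 0 marks an unpaired position. That reading is an assumption based on the example, not a documented guarantee. Under it, a hypothetical helper (not part of RNAstructure.jl) can turn such a vector into dot-bracket notation, provided the structure has no pseudoknots:

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# Assumes pt[i] is the 1-based partner of position i (0 if unpaired)
# and that the structure is free of pseudoknots.
function pairtable_to_dbn(pt::AbstractVector{<:Integer})
    dbn = fill('.', length(pt))
    for (i, j) in enumerate(pt)
        if j > i                 # visit each pair once, from its 5' side
            dbn[i] = '('
            dbn[j] = ')'
        end
    end
    return join(dbn)
end

pairtable_to_dbn([9, 8, 7, 6, 0, 4, 3, 2, 1])  # -> "((((.))))"
```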

Sequence design

The design function calls the design program from RNAstructure.

Additional information on the design program and possible command-line options that can be passed via args can be found at the RNAstructure design documentation.

target = "(((...)))"

# returns designed sequence and random seed used for design
design(target)

# set the random number seed used by the design process
seed = 42
design(target; args=`-s $seed`)

# show possible options for args
design(""; args=`-h`)

Ensemble defect

The ensemble_defect function calls the EDcalculator program from RNAstructure. It calculates the ensemble defect and normalised ensemble defect of a sequence and one or more secondary structures.

Additional information on the EDcalculator program and possible command-line options that can be passed via args can be found at the RNAstructure EDcalculator documentation.

seq = "GGGAAACCC"
dbn = "(((...)))"
dbns = [dbn, "((.....))"]
ensemble_defect(seq, dbn)
ensemble_defect(seq, dbns)
ensemble_defect("AAACCCTTT", "(((...)))"; args=`-a dna`)

# show possible options for args
ensemble_defect("", ""; args=`-h`)
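
This README does not spell out the definition used by EDcalculator. A common definition, assumed here, is the expected number of positions whose pairing state differs from the target structure, with the normalised value being that number divided by the sequence length. The hypothetical helper below (not part of RNAstructure.jl) computes both from a symmetric matrix of basepair probabilities and a target pair table, where pt[i] is the partner of position i or 0 if unpaired.

```julia
# Hypothetical helper, not part of RNAstructure.jl.
# P:  n×n symmetric matrix, P[i, j] = probability that i pairs with j
# pt: target structure as a pair table (pt[i] = partner of i, 0 if unpaired)
function ensemble_defect_from_bpp(P::AbstractMatrix{<:Real},
                                  pt::AbstractVector{<:Integer})
    n = length(pt)
    size(P) == (n, n) || throw(ArgumentError("P must be an n×n matrix"))
    correct = 0.0
    for i in 1:n
        if pt[i] == 0
            correct += 1 - sum(P[i, :])   # probability that i is unpaired
        else
            correct += P[i, pt[i]]        # probability of the target pair
        end
    end
    ed = n - correct
    return ed, ed / n                     # defect and normalised defect
end

# Toy input: nothing ever pairs, but the target pairs positions 1 and 3
ensemble_defect_from_bpp(zeros(3, 3), [3, 0, 1])  # -> (2.0, 0.6666666666666666)
```

In the toy input only the unpaired middle position matches the target, so two of the three positions count as defects.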

Remove pseudoknots

The remove_pknots function returns the pseudoknot-free substructure with the maximum possible number of basepairs.

remove_pknots("(((...[[[[...)))...]]]]")  # -> "......((((.........))))"

dbn2ct: convert dot-bracket notation to ct format

This
