Rhmmer
Simple R utilities for working with HMMER
rhmmer
HMMER is a powerful package for profile HMM analysis. If
you want to interface with the HMMER web server through R, for example to
search for domains in a small number of proteins, consider using the
Bio3D package. rhmmer is
specifically designed for working with the standalone HMMER tool.
Installation
rhmmer is available on CRAN:
install.packages('rhmmer')
Alternatively, you may install the development version from GitHub:
library(devtools)
install_github('arendsee/rhmmer')
Examples
rhmmer currently exports exactly two functions: read_domtblout and
read_tblout. These read the files that hmmscan writes when given the
--domtblout and --tblout options, respectively.
The domtblout files have the format:
# --- full sequence --- -------------- this domain ------------- hmm coord ali coord env coord
# target name accession tlen query name accession qlen E-value score bias # of c-Evalue i-Evalue score bias from to from to from to acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
Rer1 PF03248.12 171 AT2G18240.1 - 221 2.3e-61 206.4 10.4 1 2 4.3e-65 7.1e-61 204.8 11.2 4 168 24 181 21 183 0.96 Rer1 family
Rer1 PF03248.12 171 AT2G18240.1 - 221 2.3e-61 206.4 10.4 2 2 0.43 7.2e+03 -3.6 0.0 55 67 187 199 184 217 0.69 Rer1 family
DUF4220 PF13968.5 352 AT5G45530.1 - 798 4.6e-78 263.0 0.0 1 1 1.5e-81 1.3e-77 261.5 0.0 1 344 51 396 51 427 0.78 Domain of unknown function (DUF4220)
DUF594 PF04578.12 55 AT5G45530.1 - 798 4.7e-24 83.6 1.5 1 1 1.3e-27 1.1e-23 82.4 1.5 3 55 726 778 724 778 0.96 Protein of unknown function, DUF594
DEAD PF00270.28 176 AT1G27880.2 - 890 6.7e-20 71.5 0.1 1 1 3.4e-23 1.9e-19 70.0 0.1 2 171 272 433 271 438 0.83 DEAD/DEAH box helicase
...
#
# Program: hmmscan
# Version: 3.1b2 (February 2015)
# Pipeline mode: SCAN
# Query file: five.faa
# Target file: /home/z/db/Pfam-A.hmm
# Option settings: hmmscan --tblout x.tblout --domtblout x.domtblout --pfamtblout x.pfamtblout --noali /home/z/db/Pfam-A.hmm five.faa
# Current dir: /home/z/src/git/rhmmer/tests/testthat/sample-data
# Date: Fri Dec 15 02:09:00 2017
# [ok]
This format is tricky to parse. It is mostly space-delimited, but spaces appear
freely in the description of target column. The column names, as given, cannot
be used directly because they 1) contain illegal characters and 2) are not
unique unless information from both header rows is combined (e.g. ali_from
versus env_from). I do not currently extract the metadata at the end of the
file, though I will likely add handling for it in the future.
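To make the problem concrete, here is a minimal base R sketch of one way to handle such lines: split on whitespace, keep the first 22 tokens as fixed fields, and glue everything that remains back together as the description. The function name parse_domtbl_lines and the column names below are assumptions chosen for this illustration (they combine the two header rows so that names such as ali_from and env_from stay distinct). This is not necessarily how rhmmer parses the file internally.

```r
# Two data lines and one metadata line copied from the sample above.
lines <- c(
  "Rer1 PF03248.12 171 AT2G18240.1 - 221 2.3e-61 206.4 10.4 1 2 4.3e-65 7.1e-61 204.8 11.2 4 168 24 181 21 183 0.96 Rer1 family",
  "DUF594 PF04578.12 55 AT5G45530.1 - 798 4.7e-24 83.6 1.5 1 1 1.3e-27 1.1e-23 82.4 1.5 3 55 726 778 724 778 0.96 Protein of unknown function, DUF594",
  "# Program: hmmscan"
)

# Illustrative, syntactically valid, unique names for the 23 columns.
col_names <- c(
  "domain_name", "domain_accession", "domain_len",
  "query_name", "query_accession", "qlen",
  "sequence_evalue", "sequence_score", "sequence_bias",
  "domain_N", "domain_of",
  "domain_cevalue", "domain_ievalue", "domain_score", "domain_bias",
  "hmm_from", "hmm_to", "ali_from", "ali_to", "env_from", "env_to",
  "acc", "description"
)

# Hypothetical helper: parse domtblout data lines into a data frame.
parse_domtbl_lines <- function(lines, col_names) {
  # Drop comment lines (header and trailing metadata) and blank lines.
  lines <- lines[!grepl("^#", lines) & nzchar(trimws(lines))]
  n_fixed <- length(col_names) - 1

  parse_one <- function(line) {
    tokens <- strsplit(trimws(line), "\\s+")[[1]]
    fixed <- tokens[seq_len(n_fixed)]
    # Everything after the fixed fields belongs to the description.
    # Runs of spaces inside the description collapse to a single space.
    description <- paste(tokens[-seq_len(n_fixed)], collapse = " ")
    c(fixed, description)
  }

  out <- as.data.frame(do.call(rbind, lapply(lines, parse_one)),
                       stringsAsFactors = FALSE)
  names(out) <- col_names

  # Convert the numeric fields; the remaining columns stay as character.
  int_cols <- c("domain_len", "qlen", "domain_N", "domain_of",
                "hmm_from", "hmm_to", "ali_from", "ali_to",
                "env_from", "env_to")
  num_cols <- c("sequence_evalue", "sequence_score", "sequence_bias",
                "domain_cevalue", "domain_ievalue", "domain_score",
                "domain_bias", "acc")
  out[int_cols] <- lapply(out[int_cols], as.integer)
  out[num_cols] <- lapply(out[num_cols], as.numeric)
  out
}

parsed <- parse_domtbl_lines(lines, col_names)
parsed[, c("domain_name", "ali_from", "env_from", "description")]
```

The sketch assumes at least one data line and that no fixed field contains whitespace. With rhmmer, none of this is needed, since the whole file is read in one call: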
library(rhmmer)
domtblout <- system.file('extdata', 'example.domtblout.txt', package='rhmmer')
read_domtblout(domtblout)
## # A tibble: 70 x 23
## domain_name domain_accession domain_len query_name query_accession qlen
## <chr> <chr> <int> <chr> <chr> <int>
## 1 Rer1 PF03248.12 171 AT2G18240… - 221
## 2 Rer1 PF03248.12 171 AT2G18240… - 221
## 3 DUF4220 PF13968.5 352 AT5G45530… - 798
## 4 DUF594 PF04578.12 55 AT5G45530… - 798
## 5 DEAD PF00270.28 176 AT1G27880… - 890
## 6 Helicase_C PF00271.30 111 AT1G27880… - 890
## 7 Helicase_C PF00271.30 111 AT1G27880… - 890
## 8 ResIII PF04851.14 171 AT1G27880… - 890
## 9 ResIII PF04851.14 171 AT1G27880… - 890
## 10 TIR PF01582.19 176 AT1G56520… - 897
## 11 NB-ARC PF00931.21 288 AT1G56520… - 897
## 12 LRR_3 PF07725.11 20 AT1G56520… - 897
## 13 LRR_3 PF07725.11 20 AT1G56520… - 897
## 14 LRR_8 PF13855.5 61 AT1G56520… - 897
## 15 LRR_8 PF13855.5 61 AT1G56520… - 897
## 16 LRR_8 PF13855.5 61 AT1G56520… - 897
## 17 LRR_8 PF13855.5 61 AT1G56520… - 897
## 18 LRR_8 PF13855.5 61 AT1G56520… - 897
## 19 LRR_8 PF13855.5 61 AT1G56520… - 897
## 20 LRR_4 PF12799.6 43 AT1G56520… - 897
## 21 LRR_4 PF12799.6 43 AT1G56520… - 897
## 22 LRR_4 PF12799.6 43 AT1G56520… - 897
## 23 LRR_4 PF12799.6 43 AT1G56520… - 897
## 24 LRR_4 PF12799.6 43 AT1G56520… - 897
## 25 TIR_2 PF13676.5 102 AT1G56520… - 897
## 26 TIR_2 PF13676.5 102 AT1G56520… - 897
## 27 AAA_16 PF13191.5 170 AT1G56520… - 897
## 28 AAA_18 PF13238.5 130 AT1G56520… - 897
## 29 AAA_22 PF13401.5 137 AT1G56520… - 897
## 30 ATPase_2 PF01637.17 233 AT1G56520… - 897
## 31 PhoH PF02562.15 205 AT1G56520… - 897
## 32 PhoH PF02562.15 205 AT1G56520… - 897
## 33 NACHT PF05729.11 166 AT1G56520… - 897
## 34 AAA_23 PF13476.5 200 AT1G56520… - 897
## 35 AAA_23 PF13476.5 200 AT1G56520… - 897
## 36 ABC_tran PF00005.26 137 AT1G56520… - 897
## 37 NTPase_1 PF03266.14 168 AT1G56520… - 897
## 38 AAA_29 PF13555.5 61 AT1G56520… - 897
## 39 AAA_29 PF13555.5 61 AT1G56520… - 897
## 40 LRR_1 PF00560.32 23 AT1G56520… - 897
## 41 LRR_1 PF00560.32 23 AT1G56520… - 897
## 42 LRR_1 PF00560.32 23 AT1G56520… - 897
## 43 LRR_1 PF00560.32 23 AT1G56520… - 897
## 44 LRR_1 PF00560.32 23 AT1G56520… - 897
## 45 LRR_1 PF00560.32 23 AT1G56520… - 897
## 46 LRR_1 PF00560.32 23 AT1G56520… - 897
## 47 LRR_1 PF00560.32 23 AT1G56520… - 897
## 48 DAO PF01266.23 352 AT3G10370… - 629
## 49 DAO PF01266.23 352 AT3G10370… - 629
## 50 DAO_C PF16901.4 126 AT3G10370… - 629
## 51 FAD_bindin… PF00890.23 417 AT3G10370… - 629
## 52 FAD_bindin… PF00890.23 417 AT3G10370… - 629
## 53 FAD_oxidor… PF12831.6 433 AT3G10370… - 629
## 54 FAD_oxidor… PF12831.6 433 AT3G10370… - 629
## 55 Pyr_redox_2 PF07992.13 292 AT3G10370… - 629
## 56 FAD_bindin… PF01494.18 356 AT3G10370… - 629
## 57 FAD_bindin… PF01494.18 356 AT3G10370… - 629
## 58 GIDA PF01134.21 392 AT3G10370… - 629
## 59 GIDA PF01134.21 392 AT3G10370… - 629
## 60 Pyr_redox PF00070.26 81 AT3G10370… - 629
## 61 HI0933_like PF03486.13 409 AT3G10370… - 629
## 62 AAA_30 PF13604.5 192 AT3G10370… - 629
## 63 AAA_30 PF13604.5 192 AT3G10370… - 629
## 64 Pyr_redox_3 PF13738.5 305 AT3G10370… - 629
## 65 DUF4179 PF13786.5 88 AT3G10370… - 629
## 66 DUF4179 PF13786.5 88 AT3G10370… - 629
## 67 3HCDH_N PF02737.17 180 AT3G10370… - 629
## 68 3HCDH_N PF02737.17 180 AT3G10370… - 629
## 69 NAD_bindin… PF13450.5 68 AT3G10370… - 629
## 70 NAD_bindin… PF13450.5 68 AT3G10370… - 629
## sequence_evalue sequ
