HubPredictor

HubPredictor is an R function that runs a Bayesian Additive Regression Trees (BART) model to predict chromatin interaction hubs from histone mark information.

HubPredictor v1.0 beta

Overview

HubPredictor is an R function that runs a Bayesian Additive Regression Trees (BART) model to predict chromatin interaction hubs from histone mark information. HubPredictor can also be used to predict topologically associating domain (TAD) boundaries.

System Requirements

HubPredictor is written in R. It requires the R package bartMachine, which is available at http://cran.r-project.org/web/packages/bartMachine/index.html.
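Because bartMachine runs on Java through rJava, it can help to set the Java heap size before the package is loaded. The snippet below is a minimal sketch of that setup, not part of HubPredictor itself; the value "-Xmx4g" is an illustrative assumption and should be adjusted to the memory available on your machine.

```r
# Java heap size must be set before bartMachine (and rJava) is loaded.
# "-Xmx4g" is an illustrative value, not a recommendation from the authors.
options(java.parameters = "-Xmx4g")

if (requireNamespace("bartMachine", quietly = TRUE)) {
  library(bartMachine)
} else {
  # Installation is left to the user so that nothing is installed silently.
  message("bartMachine is not installed; run install.packages(\"bartMachine\") first.")
}
```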

Usage

Unzip the package and change the current directory in R to the folder containing the scripts. HubPredictor.R defines the R function that runs the BART model to predict chromatin interaction hubs from histone mark information. HubPredictor performs two main steps: (1) read the training set and build the BART regression model; (2) read the input data and make predictions using the trained BART model. It takes three parameters:

(1) file_in <required> is the input file containing the histone mark density for each chromatin anchor;

(2) file_out <required> is the output file to which the prediction information is written;

(3) file_trainingset [optional] is the dataset used to train the BART model, which defaults to 'TrainingSet/Hub.txt'.
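The two steps described above can be sketched with bartMachine roughly as follows. This is a hypothetical re-implementation for illustration only; the actual HubPredictor.R may differ. It assumes that the label column is named group, that it has exactly two classes, and that every numeric column other than the label is a histone mark predictor. The names split_predictors and predict_hubs_sketch are invented for this example.

```r
# Hypothetical sketch of the two-step workflow; HubPredictor.R itself may differ.

# Keep numeric columns only, dropping the label column if present.
split_predictors <- function(df, label_col = "group") {
  keep <- vapply(df, is.numeric, logical(1)) & names(df) != label_col
  df[, keep, drop = FALSE]
}

predict_hubs_sketch <- function(file_in, file_out,
                                file_trainingset = "TrainingSet/Hub.txt",
                                label_col = "group") {
  library(bartMachine)

  # Step 1: read the training set and build the BART model.
  train <- read.delim(file_trainingset, stringsAsFactors = FALSE)
  stopifnot(label_col %in% names(train))
  X_train <- split_predictors(train, label_col)
  y_train <- factor(train[[label_col]])  # assumed to have exactly two levels
  model <- bartMachine(X_train, y_train)

  # Step 2: read the input anchors and predict with the trained model.
  input <- read.delim(file_in, stringsAsFactors = FALSE)
  X_new <- input[, names(X_train), drop = FALSE]
  input$group <- predict(model, new_data = X_new, type = "class")
  # See the bartMachine documentation for which class level "prob" refers to.
  input$probability <- predict(model, new_data = X_new, type = "prob")

  write.table(input, file_out, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(input)
}
```

Selecting the input columns by the training set's column names keeps the predictors in the same order for training and prediction, and fails early if a histone mark is missing from the input file.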

Example

source("HubPredictor.R")

HubPredictor(file_in='input.txt',file_out='output.txt',file_trainingset = 'TrainingSet/Hub.txt')

File Format

(1) file_in

A tab-delimited text file that provides the histone mark density for each chromatin anchor, which is used as input for model prediction. Specifically, for a given chromatin anchor, we summarized the local pattern of each histone mark by averaging the sequence reads over a 300 kb window (about twice the average distance between an anchor and its target site) centered at the anchor location. These values can be calculated using Bedtools coverage.
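As an illustration of the window definition above, the following sketch builds 300 kb windows centered on anchor positions and writes them as a BED file that a tool such as Bedtools coverage could take as its intervals. The function name anchor_windows, the example coordinates, and the output file name are assumptions made for this example; clipping at chromosome ends other than position 0 is not handled.

```r
# Build a window of `width` bp centered on each anchor position.
# Starts are clipped at 0; chromosome ends are NOT clipped in this sketch.
anchor_windows <- function(chrom, pos, width = 300000L) {
  half <- width %/% 2L
  data.frame(
    chrom = chrom,
    start = pmax(0L, as.integer(pos) - half),
    end   = as.integer(pos) + half,
    stringsAsFactors = FALSE
  )
}

# Two made-up anchors on chr1.
windows <- anchor_windows(c("chr1", "chr1"), c(100000L, 1000000L))

# Write a headerless, tab-delimited BED file (written to a temporary folder here).
bed_path <- file.path(tempdir(), "anchor_windows.bed")
write.table(windows, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```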

(2) file_out

A tab-delimited text file that provides the prediction information, where group gives the class prediction and probability gives the predicted probability.
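Once the output file exists, it can be read back into R and ranked by predicted probability. The sketch below uses a small made-up table in place of a real output file; the anchor column, the 1/0 coding of group, and the 0.8 cutoff are assumptions for illustration, and only the group and probability column names come from the description above.

```r
# Stand-in for a real file_out; all values are made up.
out_path <- tempfile(fileext = ".txt")
toy <- data.frame(
  anchor      = c("a1", "a2", "a3"),
  group       = c(1, 0, 1),
  probability = c(0.91, 0.12, 0.67),
  stringsAsFactors = FALSE
)
write.table(toy, out_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-delimited predictions and rank anchors by probability.
pred <- read.delim(out_path, stringsAsFactors = FALSE)
ranked <- pred[order(pred$probability, decreasing = TRUE), ]

# Keep only anchors above an illustrative probability cutoff.
confident <- ranked[ranked$probability >= 0.8, ]
```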

(3) file_trainingset

A tab-delimited text file that provides the training dataset used to build the BART model. It includes the histone mark density of each chromatin anchor (same as file_in) and a column containing the group information. By default, the code builds the BART model from the hubs defined on the Hi-C dataset used in our paper ('TrainingSet/Hub.txt'). Users can also define their own hubs from a Hi-C dataset according to the method described in our paper. For TAD boundary prediction, we also provide the TAD boundaries defined on the Hi-C dataset used in our paper ('TrainingSet/TADBoundary.txt') to build the BART model.
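When supplying a custom training set, it may be worth checking that it contains a group column and every column present in the input file before training. The helper below is a hypothetical sketch (check_training_set is not part of HubPredictor), and the histone mark names in the toy data are only examples.

```r
# Report whether `train` has the label column and all columns found in `input`.
check_training_set <- function(train, input, label_col = "group") {
  missing_marks <- setdiff(setdiff(names(input), label_col), names(train))
  has_label <- label_col %in% names(train)
  list(
    ok = has_label && length(missing_marks) == 0,
    has_label = has_label,
    missing_marks = missing_marks
  )
}

# Toy tables with made-up values; column names are illustrative.
input <- data.frame(H3K4me1 = c(1.2, 0.4), H3K27ac = c(2.0, 0.1))
train_good <- data.frame(H3K4me1 = 0.9, H3K27ac = 1.1, group = 1)
train_bad  <- data.frame(H3K4me1 = 0.9, group = 1)

res_good <- check_training_set(train_good, input)
res_bad  <- check_training_set(train_bad, input)

# For TAD boundary prediction, the provided training set can be passed instead:
# HubPredictor(file_in = 'input.txt', file_out = 'output.txt',
#              file_trainingset = 'TrainingSet/TADBoundary.txt')
```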

Reference

J. Huang, E. Marco, L. Pinello, G.C. Yuan. Predicting chromatin organization using histone marks. Genome Biology. 2015.

Contact

Jialiang Huang (jhuang at jimmy.harvard.edu or huangjialiangcn at gmail.com)
