Converters
Format converters: bam2bigwig, FitHiC2bigInteract/longRange, HiCPro2JuiceBox, bedGraph2bigwig; GTF/BAM deduplication
Format Converters
This repository contains the following shell/awk/python scripts for converting between file formats commonly used with high-throughput sequencing data.
- bam2bigwig.sh: BAM to bigWig for genome browser visualization.
- BedGraph2bigwig.sh: BedGraph (output of MACS2) to bigWig for genome browser visualization.
- loop2bigInteract.sh: FitHiC and HiCCUPS output to bigInteract format for WashU Epigenome Browser visualization.
- HiCpro2Juicebox.sh: HiCPro output to Juicebox for HiC/HiChIP interaction visualization.
- GTF_rmdup.sh: deduplicate transcripts in GTF format.
- rmdup_rdm.sh: deduplicate alignments in BAM format RANDOMLY using picard.
- mus2hum.R: convert mouse gene symbols to human by homology search.
Requirements: awk, python3, bedtools, picard.jar, bgzip, tabix, R, biomaRt, and the UCSC Genome Browser utilities bedGraphToBigWig, bedItemOverlapCount, gtfToGenePred, genePredToBed, bedClip, and bedToBigBed.
bam2bigwig.sh
This script was split out of ChIPseq.sh.
Usage
./bam2bigwig.sh input.bam
Output
- input.bw
BedGraph2bigwig.sh
This script comes from MACS2.
Usage
./BedGraph2bigwig.sh input.bedGraph hg19_len
hg19_len can be downloaded with:
curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz | gunzip -c > hg19_len
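The downloaded table lists each chromosome with its length. As a minimal sketch, assuming hg19_len keeps the tab-separated chromInfo.txt layout (chromosome name, size, then a file-name column), it can be read into a dictionary to sanity-check coordinates before conversion. `read_chrom_sizes` is a hypothetical helper and not part of this repository; the file-name column in the example is illustrative.

```python
import io


def read_chrom_sizes(handle):
    """Read chromosome lengths from a chromInfo-style table.

    Assumes tab-separated lines whose first two columns are the
    chromosome name and its size; any further columns are ignored.
    """
    sizes = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        sizes[fields[0]] = int(fields[1])
    return sizes


# Two hg19 entries in the assumed three-column layout
example = io.StringIO(
    "chr1\t249250621\t/gbdb/hg19/hg19.2bit\n"
    "chrM\t16571\t/gbdb/hg19/hg19.2bit\n"
)
sizes = read_chrom_sizes(example)
print(sizes)
```

With a real file, pass `open("hg19_len")` instead of the in-memory example.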
Output
- input.bw
loop2bigInteract.sh
This script converts FitHiC or HiCCUPS output to bigInteract format for Genome Browser visualization. The q-value (FitHiC) or the raw read count (HiCCUPS) is used for color depth.
Input
FitHiC output (q-value filtered), e.g. all_10k_2m.spline_pass2.significants.txt
HiCCUPS output, e.g. merged_loops.bedpe
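To illustrate the FitHiC case, here is a minimal sketch of how one contact line could be mapped to an 18-field interact row (the text form that bedToBigBed turns into bigInteract). It is not the script's actual code. It assumes the FitHiC columns are chr1, fragmentMid1, chr2, fragmentMid2, contactCount, p-value, q-value, and that each midpoint is widened by half the resolution on either side. The `fithic_to_interact` name and the q-value to score scaling are hypothetical.

```python
import math


def fithic_to_interact(line, resolution):
    """Turn one FitHiC significant-contact line into an 18-field interact row.

    Assumed input columns (whitespace-separated): chr1, fragmentMid1, chr2,
    fragmentMid2, contactCount, p-value, q-value.
    """
    fields = line.split()
    chrom1, mid1 = fields[0], int(fields[1])
    chrom2, mid2 = fields[2], int(fields[3])
    q_value = float(fields[6])

    # Widen each fragment midpoint to a bin of `resolution` bases.
    half = resolution // 2
    start1, end1 = max(0, mid1 - half), mid1 + half
    start2, end2 = max(0, mid2 - half), mid2 + half

    # Hypothetical scaling: smaller q-values give a higher score,
    # capped to the 0-1000 range of the score field.
    if q_value <= 0:
        strength = 10.0
    elif q_value >= 1:
        strength = 0.0
    else:
        strength = -math.log10(q_value)
    score = min(1000, int(strength * 100))

    # The leading chrom/start/end span both anchors on the same chromosome;
    # otherwise this sketch falls back to the first anchor only.
    if chrom1 == chrom2:
        span_start, span_end = min(start1, start2), max(end1, end2)
    else:
        span_start, span_end = start1, end1

    return [
        chrom1, span_start, span_end, ".", score, strength, ".", "0",
        chrom1, start1, end1, ".", ".",
        chrom2, start2, end2, ".", ".",
    ]


row = fithic_to_interact("chr1\t15000\tchr1\t105000\t42\t1e-12\t1e-8", 10000)
print("\t".join(str(value) for value in row))
```

The placeholder `.` values fill the name, exp, and strand fields, which this sketch does not derive from the input.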
Options
The help message can be shown with loop2bigInteract.sh -h:
Usage: loop2bigInteract.sh -h | [-f -r <res>] <input file>
### INPUT: FitHiC or HiCCUPS output files ###
All results will be store in current (./) directory.
### bedToBigBed/sortBed required ###
Options:
-f input is FitHiC
-r [int] resolution of FitHiC
-c input is HiCCUPs
-h Print this help message
Example
# FitHiC output
loop2bigInteract.sh -f -r 5000 all_10k_2m.spline_pass2.significants.txt
# HiCCUPS output
loop2bigInteract.sh -c merged_loops.bedpe
Output
- A bigInteract file (.bb) that keeps the input file's prefix, written to the current directory.
HiCpro2Juicebox.sh
This script comes from HiCPro.
Usage
./HiCpro2Juicebox.sh -i test.allValidPairs -g hg19 -j /path/to/juicer_tools.jar
Note: the -r|--resfrag option does not work, for reasons that are unclear.
Output
- .hic file
GTF_rmdup.sh
This script removes duplicate transcripts by converting the GTF to BED12 and sorting on columns 1, 2, 3, 11, and 12 (chromosome, start, end, block sizes, and block starts).
Usage
./GTF_rmdup.sh input.gtf
Output
- input_uniq.gtf
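The deduplication idea can be sketched in a few lines: two BED12 records count as the same transcript when columns 1, 2, 3, 11, and 12 all match. The script itself relies on sorting; this illustration swaps in a set of seen keys and keeps the first record for each key, so it is an assumption about the intended result rather than the script's code. `dedup_bed12` is a hypothetical name.

```python
def dedup_bed12(rows):
    """Keep the first BED12 row for each (chrom, start, end,
    blockSizes, blockStarts) combination, i.e. columns 1, 2, 3, 11, 12."""
    seen = set()
    unique = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[2], fields[10], fields[11])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    "chr1\t100\t500\ttxA\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,",
    # Same coordinates and exon structure as txA, different name: a duplicate
    "chr1\t100\t500\ttxB\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,",
    # Same span but a different exon structure: kept
    "chr1\t100\t500\ttxC\t0\t+\t100\t500\t0\t2\t80,100,\t0,300,",
]
kept = dedup_bed12(rows)
print([row.split("\t")[3] for row in kept])
```

Strand (column 6) is not part of the key described above, so this sketch ignores it as well.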
rmdup_rdm.sh
This script removes duplicate alignments RANDOMLY (no SNP bias) using picard.jar. The input BAM file must be sorted.
Usage
./rmdup_rdm.sh sort.bam sort_rm.bam
Output
- sort_rm.bam
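The README does not show the Picard invocation the script uses. As a hedged sketch, random duplicate removal can be requested from Picard's MarkDuplicates tool by setting DUPLICATE_SCORING_STRATEGY=RANDOM together with REMOVE_DUPLICATES=true. The helper name, the default jar path, and the metrics file name below are assumptions for illustration.

```python
def picard_random_rmdup_cmd(in_bam, out_bam,
                            picard_jar="picard.jar",
                            metrics="dup_metrics.txt"):
    """Build an argument list for Picard MarkDuplicates that drops
    duplicates and picks the retained read at random."""
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics}",          # required by MarkDuplicates
        "REMOVE_DUPLICATES=true",           # drop duplicates instead of flagging them
        "DUPLICATE_SCORING_STRATEGY=RANDOM",
    ]


cmd = picard_random_rmdup_cmd("sort.bam", "sort_rm.bam")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java and picard.jar are available.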
mus2hum.R
This script converts mouse gene symbols to human gene symbols by homology search. R and biomaRt are required.
Usage
./mus2hum.R input.txt
Input
A text file of mouse gene symbols, one gene per row, in the first column.
Output
input2hum.txt. The homologous human gene symbol is added to the beginning of each row.
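The output layout can be illustrated with a small sketch in which a plain dictionary stands in for the biomaRt homolog query. How the script treats genes without a human homolog is not stated here, so the `NA` placeholder is an assumption, and `prepend_human_symbols` is a hypothetical name.

```python
def prepend_human_symbols(lines, mouse_to_human, missing="NA"):
    """Prefix each row with the human homolog of the mouse symbol
    found in its first tab-separated column."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank rows
        mouse = line.split("\t")[0]
        human = mouse_to_human.get(mouse, missing)  # placeholder when no homolog is known
        out.append(f"{human}\t{line}")
    return out


# Stand-in lookup table; the real script queries biomaRt instead
lookup = {"Trp53": "TP53"}
converted = prepend_human_symbols(["Trp53\t12.5", "NotAGene\t3.0"], lookup)
print("\n".join(converted))
```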
Author @zhaoshuoxp
Nov 7 2023
