Format Converters


This repository contains shell, awk, Python, and R scripts for converting between common high-throughput sequencing data formats.

  • bam2bigwig.sh: BAM to bigWig for genome browser visualization.
  • BedGraph2bigwig.sh: BedGraph (MACS2 output) to bigWig for genome browser visualization.
  • loop2bigInteract.sh: FitHiC and HiCCUPS output to bigInteract format for WashU Epigenome Browser visualization.
  • HiCpro2Juicebox.sh: HiCPro output to Juicebox format for HiC/HiChIP interaction visualization.
  • GTF_rmdup.sh: deduplicate transcripts in GTF format.
  • rmdup_rdm.sh: deduplicate alignments in BAM format RANDOMLY with Picard.
  • mus2hum.R: convert mouse gene symbols to human by homology search.

Requirements: awk, python3, bedtools, picard.jar, bgzip, tabix, R, biomaRt, and the UCSC Genome Browser utilities bedGraphToBigWig, bedItemOverlapCount, gtfToGenePred, genePredToBed, bedClip, and bedToBigBed.
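
Before running any of the scripts, it can help to confirm that the command-line tools are on PATH. The snippet below is a minimal sketch, not part of the repository: missing_tools is a hypothetical helper, and the tool names are simply copied from the requirements list. picard.jar is a file rather than a command, so it is not checked here.

```shell
#!/bin/sh
# Print every command from the argument list that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not shipped with the repository.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names taken from the requirements list.
missing=$(missing_tools awk python3 bedtools bgzip tabix R \
    bedGraphToBigWig bedItemOverlapCount gtfToGenePred genePredToBed \
    bedClip bedToBigBed)

if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```

Nothing is printed when every tool is found; otherwise the missing names are listed on stderr, one per line.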



bam2bigwig.sh

This script was split out of ChIPseq.sh.

Usage

./bam2bigwig.sh input.bam 

Output

  • input.bw
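
For orientation, one common route from BAM to bigWig goes through a bedGraph coverage track. The sketch below illustrates that route only; it is not a copy of bam2bigwig.sh, which may use different tools or options (the requirements list also mentions bedItemOverlapCount). The function names and the explicit chromosome-sizes argument are assumptions made for this example.

```shell
#!/bin/sh
# Derive the output name the same way the README shows it: input.bam -> input.bw.
bw_name() {
    printf '%s\n' "${1%.bam}.bw"
}

# Hypothetical wrapper: coverage as bedGraph, sorted, then converted to bigWig.
# $1 = coordinate-sorted BAM, $2 = two-column chromosome sizes file (assumed).
bam_to_bigwig() {
    bam=$1
    sizes=$2
    tmp=$(mktemp) || return 1
    # bedtools genomecov -bg reports per-base coverage in bedGraph format.
    bedtools genomecov -ibam "$bam" -bg |
        LC_ALL=C sort -k1,1 -k2,2n > "$tmp" &&
        bedGraphToBigWig "$tmp" "$sizes" "$(bw_name "$bam")"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With these definitions, bam_to_bigwig input.bam hg19_len would write input.bw next to the input file, provided bedtools and bedGraphToBigWig are installed.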

BedGraph2bigwig.sh

This script comes from MACS2.

Usage

./BedGraph2bigwig.sh input.bedGraph hg19_len

hg19_len can be downloaded with:

curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz | gunzip -c > hg19_len
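
The chromInfo table contains a third column (a file name) after the chromosome name and size. If a strict two-column chromosome sizes file is preferred, the extra column can be dropped. This is an optional sketch; chrom_sizes is a hypothetical helper and is not part of the repository.

```shell
#!/bin/sh
# Keep only the chromosome name and size columns of a tab-separated
# chromInfo-style table read from stdin.
chrom_sizes() {
    awk 'BEGIN { FS = OFS = "\t" } NF >= 2 { print $1, $2 }'
}

# Example (requires network access, so it is left commented out):
# curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz \
#     | gunzip -c | chrom_sizes > hg19_len
```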

Output

  • input.bw

loop2bigInteract.sh

This script converts FitHiC or HiCCUPS output to bigInteract format for Genome Browser visualization. The q value (FitHiC) or raw read count (HiCCUPS) determines the color depth.

Input

FitHiC output (q value filtered), e.g. all_10k_2m.spline_pass2.significants.txt

HiCCUPS output, e.g. merged_loops.bedpe

Options

The help message can be shown with loop2bigInteract.sh -h:

Usage: loop2bigInteract.sh -h | [-f -r <res>] <input file>

### INPUT: FitHiC or HiCCUPS output files ###
All results will be store in current (./) directory.
### bedToBigBed/sortBed required ###

  Options:
    -f input is FitHiC
    -r [int] resolution of FitHiC
    -c input is HiCCUPs
    -h Print this help message

Example

# FitHiC output
loop2bigInteract.sh -f -r 5000 all_10k_2m.spline_pass2.significants.txt
# HiCCUPS output
loop2bigInteract.sh -c merged_loops.bedpe 

Output

  • A bigInteract file (.bb) that keeps the input file's prefix is written to the current directory.
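
To make the target format concrete, the sketch below maps BEDPE-style loop records onto the 18 fields of the UCSC interact format, which bedToBigBed then packs into a .bb file. It is an illustration under assumptions, not the logic of loop2bigInteract.sh: bedpe_to_interact is a hypothetical helper, the column holding the read count must be supplied by the caller because HiCCUPS column layouts differ between versions, interchromosomal pairs are skipped for simplicity, and the score is just the count capped at 1000.

```shell
#!/bin/sh
# Convert tab-separated BEDPE-like lines (chr1 s1 e1 chr2 s2 e2 ...) on stdin
# to interact-format lines.
# $1 = 1-based column that holds the value used for shading (assumed input).
bedpe_to_interact() {
    awk -v vc="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }          # skip header lines
    $1 != $4 { next }      # keep intrachromosomal pairs only
    {
        start = ($2 < $5) ? $2 : $5
        end   = ($3 > $6) ? $3 : $6
        value = $vc
        score = (value > 1000) ? 1000 : int(value)
        # chrom start end name score value exp color
        # source(chrom start end name strand) target(chrom start end name strand)
        print $1, start, end, ".", score, value, ".", "0", \
              $1, $2, $3, ".", ".", $4, $5, $6, ".", "."
    }'
}

# A possible follow-up, assuming interact.as and a chromosome sizes file
# are available locally (left commented out):
# bedpe_to_interact 7 < loops.bedpe | sort -k1,1 -k2,2n > loops.interact.bed
# bedToBigBed -as=interact.as -type=bed5+13 loops.interact.bed hg19_len loops.bb
```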

HiCpro2Juicebox.sh

This script comes from HiCPro.

Usage

./HiCpro2Juicebox.sh -i test.allValidPairs -g hg19 -j /path/to/juicer_tools.jar

The -r|--resfrag option does not currently work.

Output

  • .hic file

GTF_rmdup.sh

This script removes duplicate transcripts by converting to BED12 and sorting on columns 1, 2, 3, 11, and 12 (chromosome, start, end, block sizes, and block starts).

Usage

./GTF_rmdup.sh input.gtf

Output

  • input_uniq.gtf
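
The deduplication step can be pictured as a unique sort over those BED12 columns. The sketch below shows only that step, under the assumption that the BED12 records have already been produced (for example with gtfToGenePred and genePredToBed); dedup_bed12 is a hypothetical helper, and the actual script may differ in detail.

```shell
#!/bin/sh
# Keep one BED12 record per combination of chromosome, start, end,
# block sizes and block starts (columns 1, 2, 3, 11, 12).
# With -u and explicit keys, sort compares only the listed keys.
dedup_bed12() {
    LC_ALL=C sort -t "$(printf '\t')" -u \
        -k1,1 -k2,2n -k3,3n -k11,11 -k12,12 "$@"
}
```

Records that differ only in name, score, or strand collapse to a single line, which matches the column list given above.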

rmdup_rdm.sh

This script removes duplicate alignments RANDOMLY (avoiding SNP bias) using picard.jar. The input BAM file must be sorted.

Usage

./rmdup_rdm.sh sort.bam sort_rm.bam

Output

  • sort_rm.bam
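
For readers unfamiliar with Picard, random duplicate removal can be requested from MarkDuplicates with its RANDOM scoring strategy. The sketch below is an assumption about how such a call might look, not a copy of rmdup_rdm.sh: rmdup_random, PICARD_JAR, and DRY_RUN are hypothetical names, and the legacy KEY=VALUE argument syntax is used.

```shell
#!/bin/sh
# Remove duplicates from a coordinate-sorted BAM, choosing the retained read
# of each duplicate set at random.
# $1 = input BAM, $2 = output BAM.
# PICARD_JAR (assumed variable) points at picard.jar; set DRY_RUN=1 to print
# the command instead of running it.
rmdup_random() {
    in_bam=$1
    out_bam=$2
    ${DRY_RUN:+echo} java -jar "${PICARD_JAR:-picard.jar}" MarkDuplicates \
        I="$in_bam" O="$out_bam" M="${out_bam%.bam}.metrics.txt" \
        REMOVE_DUPLICATES=true DUPLICATE_SCORING_STRATEGY=RANDOM
}
```

Setting DRY_RUN=1 before calling rmdup_random sort.bam sort_rm.bam prints the full java command, which is a convenient way to inspect the options before running Picard.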

mus2hum.R

This script converts mouse gene symbols to human by homology search. R and biomaRt are required.

Usage

./mus2hum.R input.txt

Input

A text file of mouse gene symbols, one gene per row in the first column.

Output

input2hum.txt. The homologous human gene symbols are prepended to each row.


Author @zhaoshuoxp
Nov 7 2023
