DiTing

DiTing: A pipeline to infer and compare biogeochemical pathways in metagenomic data

Generate Convert Improve

Install / Use

/learn @xuechunxu/DiTing

About this skill

Quality Score

0/100

README

DiTing

[!NOTE] 🚀 Major Update (v2.0): DiTing has been boosted to version 2! Compared to the initially published version, this release introduces several significant upgrades.

Rewrite using Snakemake: The entire pipeline has been rewritten using Snakemake, providing robust workflow management, better parallelization, and the ability to resume execution from breakpoints.

Upgrade annotation engine (kofamscan): We have replaced the manual hmmsearch parsing system with the standardized kofamscan engine for KEGG annotations. This effectively resolves the frequent parsing errors and compatibility issues previously encountered with raw hmmsearch outputs.

Bioconda: DiTing is now available on Bioconda, making it easier to install and manage dependencies.

Check out the new version in this fork.

Etymology

DiTing is a Chinese mythical creature who knows everything when he puts ears on the earth's surface. Parallelly, this program is developed to recognize biogeochemical cycles from environmental omic data accurately and efficiently.
谛听(DiTing) 若伏在地下，一霎时，便可将四大部洲山川社稷、洞天福地之间，蠃虫、鳞虫、毛虫、羽虫、昆虫，天仙、地仙、神仙、人仙、鬼仙，顾鉴善恶，察听贤愚。

Citation

To cite DiTing please use

Xue CX, Lin H, Zhu XY, Liu J, Zhang Y, Rowley G, Todd JD, Li M, Zhang XH. DiTing: A Pipeline to Infer and Compare Biogeochemical Pathways From Metagenomic and Metatranscriptomic Data. Front Microbiol. 2021 Aug 2;12:698286. doi: 10.3389/fmicb.2021.698286.

Introduction

DiTing is designed to determine the relative abundance of metabolic and biogeochemical functional pathways in a set of given metagenomic/metatranscriptomic data. The input is expected to be a folder containing a group of paired-end clean reads. These reads will be assembled, annotated, and parsed for producing a table of relative abundance of elemental/biogeochemical cycling pathways (e.g., Nitrogen, Carbon, Sulfur) in each sample. Sketch maps and heatmaps will also be produced accordingly for comparing biogeochemical functions visually.

Procedure

Dependencies

Megahit
SPAdes
Prodigal
bwa
BBMap
HMMER3
python3
Python modules:
- Pandas
- matplotlib
- opencv
- Pillow
- seaborn
KofamKOALA hmm database (ftp://ftp.genome.jp/pub/db/kofam/)
- ko_list.gz (ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz)
- profiles.tar.gz (ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz)

Installation

Recommended configuration:

CPU threads ≥ 8  
RAM ≥ 64 Gb

Option 1: Conda (recommended)

Configure conda environment

# order matters
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels silentgene

Set up a Diting environments

conda create -n diting-env

Activate diting-env and install DiTing program

conda activate diting-env
conda install -c silentgene diting

Deactivate diting-env

conda deactivate

Option 2: Repository from GitHub

Step 1. Download main scripts

git clone https://github.com/xuechunxu/DiTing.git or click the green button Clone or download and select download ZIP to download the repo and unzip manually.

Step 2. Download databases

DiTing requires KofamKOALA hmm database. This database will be downloaded and unzipped automatically on the first run. You can also download the database manually. This database should be stored in the same directory with the diting.py scripts.

# At the home directory of this program
mkdir kofam_database
cd kofam_database
wget -c ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz 
wget -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz 
gzip -d ko_list.gz
tar zxvf profiles.tar.gz

Step 3. Install the Dependencies

The Dependencies are required to be installed and added to the system $PATH

Running

1. One step running

diting.py -r <clean_reads_dir> -o <output_dir>  
diting.py -r <clean_reads_dir> -a <metagenomic_assembly> -o <output_dir>

Example reads run:

#download the example reads  
Google Drive:  
URL: https://drive.google.com/file/d/132605rtKuA-Xx--eh3aC7i5WIExNWl5k/view?usp=sharing
after download, run:
unzip Clean-reads_interleaved.zip

OR If you are in China, you can download from Baiduyun:  
URL: https://pan.baidu.com/s/1gFtJnz1G3pdEqBSFnUqFJw  
Password: diti

# run Example
diting.py -r Clean-reads_interleaved -o Clean-reads_interleaved.diting.out

The input is the <clean_reads_dir> folder containing a group of paired-end metagenomic clean reads, looks like:

sample_one_1.fastq
sample_one_2.fastq
sample_two_1.fastq
sample_two_2.fastq
sample_three_1.fastq
sample_three_2.fastq

The paired-end metagenomic clean reads should end with .fq, .fq.gz, .fastq, or .fastq.gz. The interleaved reads are also supported.

2. Optional parameter

2.1 --spades

Using metaSPAdes instead of megahit to assemble reads

Consider setting memory limitation by -m when usign SPAdes as assembler

-m(--memory) <int> default: 50 (in Gb)

2.2 -a (--assembly) metagenomic assembly

Path to a folder containing metagenomic assemblies corresponding to the provided reads, which is expected to have the same base name as the reads. The reads will not be assembled when this parameter was used.

python diting.py -r <clean_reads_dir> -a <metagenomic_assembly> -o <output_dir>

The <metagenomic_assembly> folder looks like:

sample_one.fa
sample_two.fa
sample_three.fa

2.3 Using interleaved paired-end reads

DiTing supports interleaved paired-end fastq files. Note that the reads type must be all interleaved or all separated.

e.g. [clean_reads_dir] content:
samples1.fq.gz
samples2.fq.gz
samples3.fq.gz
samples4.fq.gz

2.4 -n (--threads) number of threads

Number of threads to run (default: 4)

diting.py -r <clean_reads_Dir> -a <metagenomic_assembly> -o <output_dir> -n 20

2.5 --noclean

The sam files would be retained if this flag was used.

diting.py -r <clean_reads_dir> -a <metagenomic_assembly> -o <output_dir> -n 12 --noclean

2.6 -vis (--visualization) pathways_relative_abundance.tab

Visualization can also be executed independently, which allows users to adjust the final result table (e.g., merge some similar samples) before the visualization.

diting.py -vis <pathways_relative_abundance.tab>

3. Output

3.1 Table

pathways_relative_abundance.tab :The final result with the relative abundance of pathways in each sample.
ko_abundance_among_samples.tab : A table with the relative abundance of each k_number of KEGG annotation is produced in KEGG_annotation folder.

3.2 Visualization

carbon_cycle_sketch.png, nitrogen_cycle_sketch.png, DMSP_cycle_sketch.png and sulfur_cycle_sketch.png Sketch maps regarding carbon, nitrogen and sulfur cycles
carbon_cycle_heatmap.pdf, nitrogen_cycle_heatmap.pdf, sulfur_cycle_heatmap.pdf and other_cycle_heatmap.pdf Heatmaps regarding carbon, nitrogen, sulfur cycles and other pathways

Example: sketchlook like: <img src="./example/diting.out/sketch.png" width="792" height="624">

heatmaplook like: <img src="./example/diting.out/heatmap.png" width="792" height="627">

Copyright

Xue Chunxu, xuechunxu (at) outlook.com
Heyu Lin, heyu.lin (at) qut.edu.au
Xiaoyu Zhu, xiaoyuzhu321 (at) 126.com
Xiao-Hua Zhang, xhzhang (at) ouc.edu.cn
Lab of Microbial Oceanography
College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China

Related Skills

node-connect

354.5k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

112.4k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

354.5k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

354.5k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。