CompRanking
Metagenomic resistome risk ranking pipeline. CompRanking comprehensively ranks the antimicrobial resistance risk of environmental metagenomic samples.
🧬 CompRanking
CompRanking is a pipeline designed for the comprehensive assessment of antimicrobial resistance (AMR) in metagenomic samples. It ranks the resistome risk by analyzing:
- Abundance of antibiotic resistance genes (ARGs)
- Mobility of ARGs via mobile genetic elements (MGEs)
- The potential of ARGs to be acquired by pathogens at the contig level
CompRanking integrates these features by identifying the co-occurrence of ARGs and MGEs on the same contigs, and estimating their pathogenic potential.
📌 Citation
📰 News: CompRanking has been published in Environmental Science & Technology!
If you use CompRanking in your research, please cite the following paper:
Luo, G., et al. (2025). Determining Antimicrobial Resistance in the Plastisphere: Lower Risks of Nonbiodegradable vs Higher Risks of Biodegradable Microplastics. Environmental Science & Technology. https://doi.org/10.1021/acs.est.5c00246
🚀 Getting Started
🔧 Installing
Step 1: Change the current working directory to the location where you want the cloned CompRanking directory to be made.
Step 2: Clone the repository using the git command:
git clone https://github.com/GaoyangLuo/CompRanking.git
cd CompRanking
🌎 Environment settings
1. Set conda path
CompRanking relies on multiple conda environments. Before running the demo test, you must set the absolute path to your miniconda bin directory, for example /home/username/miniconda3/bin.
Edit the test_yaml.yaml file
vi test_yaml.yaml
Update the path so that it points to your actual miniconda/bin directory:
CompRanking:
abs_path_to_conda_bin: /your_real_path/miniconda/bin #don't use "~" or "./", please use absolute path
❗ Please do not use relative paths like ~ or ./.
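A quick pre-flight check can catch a relative or misspelled path before the pipeline starts. The sketch below is not part of CompRanking: `check_conda_bin` is a hypothetical helper, and it only assumes that the value you put in `abs_path_to_conda_bin` should be an existing directory containing a `conda` executable.

```python
import os

def check_conda_bin(path):
    """Return a list of problems found with a conda bin path (empty if none)."""
    problems = []
    # "~" is rejected explicitly; "./" and other relative forms fail isabs().
    if path.startswith("~") or not os.path.isabs(path):
        problems.append("path must be absolute (no '~' or './')")
    elif not os.path.isdir(path):
        problems.append("directory does not exist")
    elif not os.path.isfile(os.path.join(path, "conda")):
        problems.append("no 'conda' executable found in this directory")
    return problems

if __name__ == "__main__":
    # Example value from this README; replace it with your own path.
    for issue in check_conda_bin("/home/username/miniconda3/bin"):
        print("WARNING:", issue)
```

An empty list means the path passed all three checks.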
2. Create environment
First, set up the environments with the commands below. They configure everything the pipeline needs.
conda env create -f CompRanking.yaml
conda activate CompRanking
bash setup.sh
pip install MicrobeCensus
📦 Database download
You can download the databases via:
🔗 https://doi.org/10.5281/zenodo.8073486.
Or run the command lines below.
wget -O CompRanking_database_v1.tar.gz "https://zenodo.org/record/8073486/files/CompRanking_database_v1.tar.gz?download=1"
wget -O localDB.zip "https://zenodo.org/record/8073486/files/localDB.zip?download=1"
tar -zxvf CompRanking_database_v1.tar.gz && mv CompRanking_database_v1 databases
unzip localDB.zip
🧪 Demo test
We provide a small dataset for testing:
python cpr_multiprocess.py -i test_data -t 4 -r 1 -p test_demo
🔍 Running AMR risk ranking
Step 1: Gene prediction generates contextual information on AMR and pathogen information for the whole metagenome. Run the command below:
python cpr_multiprocess.py -i <input_dir> -t <threads> -r <if_restart> -p <project_name_prefix>
Parameters:
- -i, <input_dir>: directory containing all the FASTQ and FASTA files. Files belonging to the same sample must share an identical <prefix>. For example, FileNameOne_1.fq, FileNameOne_2.fq and FileNameOne.fa represent the paired-end read FASTQ files (after quality control) and the assembly file (containing contigs; please do not cut contigs to a custom length, because default_min_length=500 cannot be altered).
- -t, <threads>: the number of threads to use (Default=16).
- -r, <if_restart>: 0 or 1. 0 means continue from the last checkpoint. 1 means restart from the beginning.
- -p, <project_name_prefix> You should set a project name here, or use the default name CompRanking.
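Because every sample needs three files with a shared prefix, it can help to check the input directory before launching a long run. The following is a minimal sketch, not part of the pipeline: `find_incomplete_samples` is a hypothetical helper, and the suffixes `_1.fq`, `_2.fq` and `.fa` are assumed from the example file names above.

```python
from pathlib import Path

# Suffixes taken from the example names in this README (an assumption).
REQUIRED_SUFFIXES = ("_1.fq", "_2.fq", ".fa")

def find_incomplete_samples(input_dir):
    """Map each sample prefix to the expected files that are missing."""
    names = {p.name for p in Path(input_dir).iterdir() if p.is_file()}

    # Collect every prefix that appears with at least one known suffix.
    prefixes = set()
    for name in names:
        for suffix in REQUIRED_SUFFIXES:
            if name.endswith(suffix):
                prefixes.add(name[: -len(suffix)])
                break

    # Report the files each prefix still lacks.
    missing = {}
    for prefix in sorted(prefixes):
        absent = [prefix + s for s in REQUIRED_SUFFIXES if prefix + s not in names]
        if absent:
            missing[prefix] = absent
    return missing
```

An empty dictionary means every detected prefix has both read files and an assembly file.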
🧮 Step 2: Generate a risk score and the corresponding values for each sample. This step reports various parameters, such as the number of ARG-carrying contigs and of phage- or plasmid-related contigs in your samples. Run the command below:
python ./compranking/baseInfoExtra_nContigs.py -i <input_dir> -p <project_name_prefix>
<div style="overflow-x: auto;">
<table>
<tr>
<th>sample_name/index</th>
<th>nContigs</th>
<th>nARGs_contigs</th>
<th>nMGEs_contig</th>
<th>nMGEs_plasmid_contig</th>
<th>nMGEs_phage_contigs</th>
<th>nPAT_contigs</th>
<th>nARGs_MGEs_contig</th>
<th>nARGs_MGEs_plasmid_contigs</th>
<th>nARGs_MGEs_phage_contigs</th>
<th>nARGs_MGEs_PAT_contigs </th>
<th>fARG </th>
<th>fMGE </th>
<th>fMGE_plasmid </th>
<th>fMGE_phage </th>
<th>fPAT</th>
<th>fARG_MGE </th>
<th>fARG_MGE_plasmid </th>
<th>fARG_MGE_phage</th>
<th>fARG_MGE_PAT</th>
<th>score_pathogenic</th>
<th>score_phage</th>
<th>score_plasmid</th>
<!-- Add more columns as needed -->
</tr>
<tr>
<td>Sample2</td>
<td>433650 </td>
<td> 482</td>
<td> 327780</td>
<td> 270746</td>
<td> 32238</td>
<td> 397572</td>
<td> 357</td>
<td> 304</td>
<td> 28</td>
<td> 61</td>
<td> 0.0011</td>
<td> 0.7558</td>
<td> 0.6247</td>
<td> 0.0746</td>
<td> 0.9168</td>
<td> 0.0008</td>
<td> 0.0007</td>
<td> 6.4568</td>
<td> 0.0001</td>
<td> 23.1492</td>
<td> 20.7351</td>
<td> 22.7372</td>
<!-- Add more data as needed -->
</tr>
<tr>
<td>Sample1</td>
<td>433650 </td>
<td> 482</td>
<td> 327780</td>
<td> 270746</td>
<td> 32238</td>
<td> 397572</td>
<td> 357</td>
<td> 304</td>
<td> 28</td>
<td> 61</td>
<td> 0.0011</td>
<td> 0.7558</td>
<td> 0.6247</td>
<td> 0.0746</td>
<td> 0.9168</td>
<td> 0.0008</td>
<td> 0.0007</td>
<td> 6.4568</td>
<td> 0.0001</td>
<td> 23.1492</td>
<td> 20.7351</td>
<td> 22.7372</td>
<!-- Add more rows as needed -->
</tr>
<!-- Add more rows as needed -->
</table>
</div>
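In the demo row, several of the f* columns are consistent with the matching n* count divided by nContigs (for example, fARG is close to nARGs_contigs / nContigs). This is an observation from the demo values, not a documented formula, and it does not cover the score_* columns. A minimal sketch of that arithmetic, using counts copied from the table above (`contig_fraction` is a hypothetical name):

```python
def contig_fraction(n_feature_contigs, n_contigs):
    """Fraction of contigs carrying a feature; 0.0 when there are no contigs."""
    if n_contigs == 0:
        return 0.0
    return n_feature_contigs / n_contigs

# Counts copied from the demo row (Sample2) in the table above.
n_contigs = 433650
demo_counts = {"fARG": 482, "fPAT": 397572, "fARG_MGE": 357}

for column, count in demo_counts.items():
    print(column, round(contig_fraction(count, n_contigs), 4))
```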
Additional functions
🔍 Quantification of genes
🧬 Step 1: Calculating scg and AGS
Before Step 2, we need to generate the average genome equivalents (AGS) and single-copy gene (scg) files.
❗Note: We recommend using per genome equivalent (AGS) or per cell copy (scg) as the normalization basis for quantifying genes and comparing samples. For details, please refer to our article and its supporting information:
Luo, G., et al. (2025). Determining Antimicrobial Resistance in the Plastisphere: Lower Risks of Nonbiodegradable vs Higher Risks of Biodegradable Microplastics. Environmental Science & Technology. https://doi.org/10.1021/acs.est.5c00246
To generate the AGS files, run the commands below:
cd $PATH_TO_CompRanking
vi $PATH_TO_CompRanking/scripts/AGS.sh #change your input_dir in the workdir
bash $PATH_TO_CompRanking/scripts/AGS.sh
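To make the normalization idea concrete, the sketch below computes reads per kilobase per genome equivalent (RPKG), with genome equivalents taken as total sequenced bases divided by the average genome size, which is how MicrobeCensus defines them. The function names and example numbers are hypothetical, and this is not the code used inside the CompRanking scripts.

```python
def genome_equivalents(total_bases, average_genome_size):
    """Number of genome equivalents in a sequencing library."""
    return total_bases / average_genome_size

def rpkg(mapped_reads, gene_length_bp, n_genome_equivalents):
    """Reads per kilobase of gene per genome equivalent."""
    reads_per_kb = mapped_reads / (gene_length_bp / 1000)
    return reads_per_kb / n_genome_equivalents

# Hypothetical example: a 6 Gbp library with a 3 Mbp average genome size.
ge = genome_equivalents(6_000_000_000, 3_000_000)
print(rpkg(mapped_reads=500, gene_length_bp=1000, n_genome_equivalents=ge))
```

Dividing by genome equivalents rather than by total reads is what makes values comparable between samples whose communities differ in average genome size.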
📈 Step 2: After all the prediction steps have finished, calculate the relative abundance of functional genes with the command below:
python ./compranking/multiGeneCal_metagenome_rpkg_scg_geneName.py \
  -i <input_dir> \
  -p <project_prefix> \
  -n AGS \
  -t 16 \
  -d <pth2KK2db> #this option is for cell copy normalized by sequence abundance, need to run multiGeneCal_16s.py
The output looks like this:
| ARG_name | Class | Database | MGE_type |Sample_1 |Sample_2 |Sample_3 |
|----------|---------|----------|----------|---------|---------|---------|
| AAC(2')-I | aminoglycoside | DeepARG | Unknown |0.003 |0.004 |0.006 |
| ERMB | macrolide | RGI | phage/plasmid |0.002 |0.003 |0.004 |
| SUL3 | sulfonamide | SARG | plasmid |0.001 |0.003 |0.005 |
Note: phage/plasmid means the ARG was found co-located with phage- or plasmid-like contigs in one sample (microbial community).
plasmid means the ARG was found co-located only with plasmid-like contigs. Unknown means no co-located MGE was detected. This does not prove that the ARG is not co-located with any MGE, because detection is limited by the accuracy and recall of the identification method.
📊 How to calculate each ARG class and their carriers counts
Use the Jupyter notebook MGE_carried_ARGs_type_count.ipynb for this calculation. The metadata records the number of elements of five types that co-exist with ARGs: plasmid, phage, unclassified (any type of sequence, including chromosomes and other unknown or unidentified MGEs), IS (insertion sequence), and IE (integrated element). The generated table looks like this:
| |Sample1_x |Sample2_y |Sample3_z |Sample3_m |Sample3_n |
|---------|----------|----------|----------|----------|----------|
| aminoglycoside |33 |35 |36 |37 |38 |
| macrolide |44 |45 |46 |37 |38 |
| tetracycline |22 |23 |24 |37 |38 |
Legend for suffixes:
sampleName_x: #plasmid
sampleName_y: #phage
sampleName_z: #unclassified
sampleName_m: #IS
sampleName_n: #IE
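When post-processing this table, the suffix legend above can be turned into a small lookup so that each column name is split into a sample and a carrier type. This is a sketch under the assumption that column names always end in one of the five suffixes. `split_column` is a hypothetical helper, not part of the notebook.

```python
# Suffix-to-carrier mapping taken from the legend above.
CARRIER_BY_SUFFIX = {
    "x": "plasmid",
    "y": "phage",
    "z": "unclassified",
    "m": "IS",
    "n": "IE",
}

def split_column(column_name):
    """Split 'sampleName_x' into ('sampleName', 'plasmid')."""
    # rpartition splits on the last underscore, so sample names
    # that themselves contain underscores are kept intact.
    sample, _, suffix = column_name.rpartition("_")
    if not sample or suffix not in CARRIER_BY_SUFFIX:
        raise ValueError(f"unrecognized column name: {column_name!r}")
    return sample, CARRIER_BY_SUFFIX[suffix]

print(split_column("Sample1_x"))
```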
🔁 Re-running if pipeline halted
Every process generates a checkpoint file in the checkdone directory, with a name like <Your_Project_Name>.index_build.done. The pipeline uses these files to determine which steps have finished and which have not.
- To continue from the last unfinished step: set -r 0
- To restart from scratch: set -r 1
- To re-run one specific step: delete the corresponding .done file
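The checkpoint files can also be inspected or cleared from a short script. The sketch below assumes the `<Your_Project_Name>.<step>.done` naming shown above and a directory called `checkdone`. Both function names are hypothetical, and `index_build` is the only step name this README mentions.

```python
from pathlib import Path

def list_checkpoints(checkdone_dir, project):
    """Return the step names that have a '<project>.<step>.done' file."""
    prefix = project + "."
    steps = []
    for path in sorted(Path(checkdone_dir).glob(prefix + "*.done")):
        # Strip the project prefix and the '.done' extension.
        steps.append(path.name[len(prefix):-len(".done")])
    return steps

def clear_checkpoint(checkdone_dir, project, step):
    """Delete one checkpoint file; return True if a file was removed."""
    target = Path(checkdone_dir) / f"{project}.{step}.done"
    if target.is_file():
        target.unlink()
        return True
    return False
```

For example, `clear_checkpoint("checkdone", "test_demo", "index_build")` would remove that one marker so that a later run with -r 0 repeats the step.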
