Plasmer

An accurate and sensitive bacterial plasmid identification tool based on deep machine-learning of shared k-mers and genomic features.

Generate Convert Improve

Install / Use

/learn @nekokoe/Plasmer

About this skill

Quality Score

0/100

README

Plasmer

An accurate and sensitive bacterial plasmid identification tool based on deep machine-learning of shared k-mers and genomic features.

System Requirements

Currently tested on CentOS 7 and Ubuntu 20.04, should be working on other Linux releases
A minimum of 32GB system memory is required for kmer-db to load the databases
The AVX instruction is required (required by kmer-db)

Before Running

Please download and decompress our pre-built database.

The pre-built database are available at Zenodo and Google Drive.

The link contains two file, plasmerMainDB.tar.xz and customizedKraken2DB.tar.xz.

Check the sha1sum:

$ sha1sum plasmerMainDB.tar.xz 
0b08f5c30d60b137f54de6024ab7557031850db6  plasmerMainDB.tar.xz

$ sha1sum customizedKraken2DB.tar.xz 
b14efdd9232fd5f6d066716bd8e3e6ca80c9c0de  customizedKraken2DB.tar.xz

Extract the contents into the same directory, and provide the absolute path of the directory to the -d parameter on the command line.

Installation

We recommend run Plasmer with Docker, with Docker you do not need to figure out how to install Plasmer. However, run Plasmer in shell directly on Linux is also feasible.

Install Plasmer using conda

You can simply install Plasmer using conda:

conda install -c iskoldt -c bioconda -c conda-forge -c defaults plasmer

Install Plasmer from scratch

If you do not use conda, here is the tutorial for you to install Plasmer from scratch:

Be sure you installed all the required dependencies first, the required dependencies:

seqkit 2.2.0
python 3.10.4 （gzip; os; sys; Bio）
perl v5.26.2
kmer-db 1.9.4
Prodigal V2.6.3
HMMER 3.3.2
BLAST 2.10.1+
INFERNAL 1.1.4
diamond v2.0.8.146
GNU Parallel 20220722
Kraken version 2.1.2
R version 4.2.0 （hash; randomForest 4.7-1.1）

Then download Plasmer from GitHub:

git clone https://github.com/nekokoe/Plasmer.git
cd Plasmer
export PATH=$PATH:$(pwd)

Add the current directory to your PATH environment variable permanently:

echo 'export PATH=$PATH:'$(pwd) >> ~/.bashrc && source ~/.bashrc

Usage

Plasmer -g input_fasta -p out_prefix -d db -t threads -m minimum_length -l length -o outpath

The parameters:

-h	--help				Print the help info and exit.

-v	--version			Print the version info.

-g	--genome			The input fasta. [required]

-p	--prefix			The prefix for intermediate files and results. [Default: output]

-d	--db				The path of pre-built Plasmer databases. [required]

-t	--threads			Number of threads. [Default: 8]

-m	--minimum_length	The minimum length(bp) of sequences, the sequences shorter than the length will be dropped. [Default: 500]
		
-l	--length			The length(bp) threshold of sequences as chromosome to filtered. If set 0, no sequence are filtered, all sequences will be predicted. [Default: 500000]

-o	--outpath			The outpath. [required]

Run Plasmer with Docker

With docker, you don't have to install any of the dependencies. See more about Docker

Download the Docker image first:

docker pull nekokoe/plasmer:latest

Assuming the input FASTA file was deposited in {inputfilepath}/input.fasta

Run the following command to get result in {outputfilepath}

You can replace input.fasta with the actual name of your file.

docker run -d --rm --name plasmer \
	-v {inputfilepath}:/input \
	-v {outputfilepath}:/output \
	-v {databasepath}:/db \
	 nekokoe/plasmer:latest \
	/bin/sh /scripts/Plasmer \
	-g /input/input.fasta \
	-p {prefix} \
	-d /db \
	-t {threadnumber} \
	-m 500 \
	-l 500000 \
	-o /output

Replace with your own input: {inputfilepath} : Absolute path contains input.fasta in your file system

{outputfilepath} : Absolute path for output in your file system

{databasepath} : Absolute path for the downloaded pre-built Plasmer database

{prefix} : Prefix for intermediate and output files

{threadnumber} : Number of CPUs wish to use

dockerrun_batch.sh

We also provide a bash shell script that runs the Docker for you, if you have many input files in a directory.

bash dockerrun_batch.sh /input/files/path /output/files/path /database/path CPU_threads minimum_length length

Output

In the outpath/results, 5 files are generated, including:

prefix.plasmer.predProb.tsv

prefix.plasmer.predClass.tsv

prefix.plasmer.predPlasmids.taxon

prefix.plasmer.predPlasmids.fa

prefix.plasmer.shorterM.fasta

Have a look at result_example folder of the Github repository:

The example.plasmer.predProb.tsv: The probability of each contig classified to chromosome and plasmid.

| Contig | chromosome | plasmid | | --- | --- | --- | | contig_1 | 0.832 | 0.168 | | contig_2 | 0.952 | 0.048 | | contig_3 | 0.022 | 0.978 | | contig_4 | 0.984 | 0.016 | | contig_5 | 0 | 1 | | contig_6 | 0 | 1 | | contig_7 | 0.906 | 0.094 | | contig_8 | 0 | 1 | | contig_9 | 0.84 | 0.16 | | contig_10 | 0 | 1 |

The example.plasmer.predClass.tsv: The class of each contig.

| Contig | Type | | --- | --- | | contig_1 | chromosome | | contig_2 | chromosome | | contig_3 | plasmid | | contig_4 | chromosome | | contig_5 | plasmid | | contig_6 | plasmid | | contig_7 | chromosome | | contig_8 | plasmid | | contig_9 | chromosome | | contig_10 | plasmid |

The example.plasmer.predPlasmids.taxon: The taxonomy of each predicted plasmid contig.

| Contig | Taxonomy ID | | --- | --- | | contig_1 | Enterococcus faecium (taxid 1352) | | contig_2 | Enterococcus faecium (taxid 1352) | | contig_3 | Enterococcus faecium (taxid 1352) | | contig_4 | Enterococcus faecium (taxid 1352) | | contig_5 | Enterococcus faecium (taxid 1352) | | contig_6 | Enterococcus faecium Aus0085 (taxid 1305849) | | contig_7 | Enterococcus faecium (taxid 1352) | | contig_8 | Enterococcus faecium (taxid 1352) | | contig_9 | Enterococcus faecium (taxid 1352) | | contig_10 | Enterococcus faecium (taxid 1352) |

The example.plasmer.predPlasmids.fa: The sequences of predicted plasmid contigs.

The prefix.plasmer.shorterM.fasta contains the sequences filtered out by the -m parameter.

Prediction results of other tools

Download the results of other tools from Zenodo or Google Drive.

Feedback

Your feedback, bug-report and suggestions are welcomed to nekokoe (at) qq.com and husn (at) im.ac.cn

License

This project is licensed under the terms of the MIT license.

Related Skills

groundhog

398

Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).

isf-agent

a repo for an agent that helps researchers apply for isf funding

workshop-rules

Materials used to teach the summer camp <Data Science for Kids>

last30days-skill

13.4k

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary