GetOrganelle

notice: please update to 1.7.5+, which fixed the bug on the multiplicity estimation of self-loop vertices.

This toolkit assemblies organelle genome from genomic skimming data.

It achieved the best performance overall both on simulated and real data and was recommended as the default for chloroplast genome assembly in a third-party comparison paper (Freudenthal et al. 2020. Genome Biology).

Please denote the version of GetOrganelle as well as the dependencies in your manuscript for reproducible science.

Citation: Jian-Jun Jin*, Wen-Bin Yu*, Jun-Bo Yang, Yu Song, Claude W. dePamphilis, Ting-Shuang Yi, De-Zhu Li. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21, 241 (2020). https://doi.org/10.1186/s13059-020-02154-5

License: GPL https://www.gnu.org/licenses/gpl-3.0.html

Please also cite the dependencies if used:

SPAdes: Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A. and Korobeynikov, A. 2020. Using SPAdes de novo assembler. Current protocols in bioinformatics, 70(1), p.e102.

Bowtie2: Langmead, B. and S. L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357-359.

BLAST+: Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer and T. L. Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

Bandage: Wick, R. R., M. B. Schultz, J. Zobel and K. E. Holt. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31: 3350-3352.

Installation & Initialization

GetOrganelle is currently maintained under Python 3.7.0, but designed to be compatible with versions higher than 3.5.1 and 2.7.11. It was built for Linux and macOS. Windows Subsystem Linux is currently not supported, we are working on this.

The easiest way to install GetOrganelle and its dependencies is using conda:
```
conda install -c bioconda getorganelle
```
You have to install Anaconda or Miniconda before using the above command. If you don't like conda, or want to follow the latest updates, you can find more installation options here (my preference).
After installation of GetOrganelle v1.7+, please download and initialize the database of your preferred organelle genome type (embplant_pt, embplant_mt, embplant_nr, fungus_mt, fungus_nr, animal_mt, and/or other_pt). Supposing you are assembling chloroplast genomes:
```
get_organelle_config.py --add embplant_pt,embplant_mt
```
If connection keeps failing, please manually download the latest database from GetOrganelleDB and initialization from local files.

The database will be located at ~/.GetOrganelle by default, which can be changed via the command line parameter --config-dir, or via the shell environment variable GETORG_PATH (see more here).

Test

Download a simulated Arabidopsis thaliana WGS dataset:

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz

then verify the integrity of downloaded files using md5sum:

md5sum Arabidopsis_simulated.*.fq.gz
# 935589bc609397f1bfc9c40f571f0f19  Arabidopsis_simulated.1.fq.gz
# d0f62eed78d2d2c6bed5f5aeaf4a2c11  Arabidopsis_simulated.2.fq.gz
# Please re-download the reads if your md5 values unmatched above

then do the fast plastome assembly (memory: ~600MB, CPU time: ~60s):

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10

You are going to get a similar running log as here and the same result as here.

Find more real data examples at GetOrganelle/wiki/Examples, GetOrganelleGallery and GetOrganelleComparison.

Instruction

Find more organelle genome assembly instruction at GetOrganelle/wiki.

In most cases, what you actually need to do is just typing in one simple command as suggested in <a href="#recipes">Recipes</a >. But you are still highly recommended reading the following minimal introductions:

Starting from Reads

The green workflow in the flowchart below shows the processes of get_organelle_from_reads.py.

Input data

Currently, get_organelle_from_reads.py was written for illumina pair-end/single-end data (fastq or fastq.gz). We recommend using adapter-trimmed raw reads without quality control. Usually, >1G per end is enough for plastome for most normal angiosperm samples, and >5G per end is enough for mitochondria genome assembly. Since v1.6.2, get_organelle_from_reads.py will automatically estimate the read data it needs, without user assignment nor data reducing (see flags --reduce-reads-for-coverage and --max-reads).
Main Options
- -w The value word size, like the kmer in assembly, is crucial to the feasibility and efficiency of this process. The best word size changes upon data and will be affected by read length, read quality, base coverage, organ DNA percent and other factors. By default, GetOrganelle would automatically estimate a proper word size based on the data characters. Although the automatically-estimated word size value does not ensure the best performance nor the best result, you do not need to adjust this value (-w) if a complete/circular organelle genome assembly is produced, because the circular result generated by GetOrganelle is highly consistent under different options and seeds. The automatically estimated word size may be screwy in some animal mitogenome data due to inaccurate coverage estimation, fo

GetOrganelle

Install / Use

README

GetOrganelle

Installation & Initialization

Test

Instruction

Starting from Reads