SkillAgentSearch skills...

CloudBrush

A De Novo Next Generation Genomic Sequence Assembler Based on String Graph and MapReduce Cloud Computing Framework

Install / Use

/learn @CSCLabTW/CloudBrush
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

CloudBrush

CloudBrush is a de novo genome assembler based on the string graph and mapreduce framework, it is released under Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0) as a Free and Open Source Software Project.

More details about CloudBrush Project you can get under: https://github.com/ice91/CloudBrush


requirement: hadoop cluster

step 1: convert *.fastq to *.sfa

e.g. perl data/preprocessor.pl -renum data/Ec10k.sim.fastq > data/Ec10k.sim.sfa

step 2: upload *.sfa to HDFS

e.g. hadoop fs -put data/Ec10k.sim.sfa Ec10k

(You can use CloudRS(ReadStackCorrector) as preprocesser to correct sequencing errors and prepare *.sfa in HDFS. More details about CloudRS Project you can get under: https://github.com/ice91/ReadStackCorrector)

step 3 perform the CloudBrush using hadoop cluster

e.g. hadoop jar CloudBrush.jar -reads Ec10k -asm Ec10k_Brush -k 21 -readlen 36

e.g. hadoop jar CloudBrush.jar -reads Ec10k -asm Ec10k_Brush -k 21 -readlen 36 -javaopts -Djava.util.Arrays.useLegacyMergeSort=True

step 4 download *.fasta from HDFS

e.g. hadoop fs -cat Ec10k_Brush/* > Ec10k_Brush.fasta

[Reference] Yu-Jung Chang, Chien-Chih Chen, Chuen-Liang Chen and Jan-Ming Ho, "De Novo Next Generation Genomic Sequence Assembler Based on String Graph and MapReduce Cloud Computing Framework," BMC Genomics, volume 13, number Suppl 7, pages S28, December 2012.

Related Skills

View on GitHub
GitHub Stars6
CategoryDevelopment
Updated6y ago
Forks5

Languages

Java

Security Score

70/100

Audited on Aug 13, 2019

No findings