GRAbB

GRAbB (Genome Region Assembly by Baiting) is a program designed to assemble selected regions of the genome or transcriptome using reference sequences and NGS data.


Table of contents

  1. Usage
    1. Installation
    2. Documentation
    3. Examples
  2. Prerequisites
    1. mirabait
    2. Edena
    3. Velvet
    4. Seqtk
    5. exonerate
    6. PRINSEQ lite
  3. Helper programs
    1. fastq2fasta
    2. get_overlaps
    3. interleaved2pairs
    4. merge_contigs
    5. rename_fastq
    6. single2pairs
    7. uniform_length
    8. fasta_shift
    9. pairwise_alignment_score
    10. reverse_complement
  4. Algorithm overview
    1. Main loop
    2. Creating the bait
    3. Baiting
    4. Collecting reads
    5. Assembly
    6. Testing completion
    7. Modes
  5. Arguments
    1. ref
    2. bait
    3. reads
    4. folder
    5. prefix
    6. single
    7. min_length
    8. type
    9. arg1
    10. arg2
    11. assembler
    12. clean
  6. Using custom assembler program
    1. Adding to the source code of GRAbB
    2. Using external_skeleton
    3. Using SPAdes as assembler
  7. Citation
  8. Contact

Usage

Installation

Asciinema casts:

  • Ubuntu: asciicast
  • CentOS: asciicast
  • Fedora: asciicast

Steps:

  1. Install prerequisites (If this step is skipped, then configure_GRAbB.pl tries to use prerequisites included in the package)

    • Minimal set:

      • Baiting program: mirabait (recommended) OR kmer_bait.pl (no installation needed)
      • Read collecting program: seqtk (recommended) OR create_readpool.pl (no installation needed)
      • Assembler: Edena OR Velvet OR external_scaffold.pl (needs to be modified and requires a working installation of an assembler)
    • Assemblers:

      • Edena: default assembler for GRAbB.pl
      • Velvet
      • Other assemblers: external_scaffold.pl also has to be edited
    • Exonerate:

      • Required for running GRAbB.pl in exonerate mode
  2. Configure GRAbB.pl

     ./configure_GRAbB.pl
    

    It is recommended to add the prerequisites to the PATH, or to add the absolute paths of the executables to the source code of GRAbB.pl, before running configure_GRAbB.pl.

    The configured GRAbB.pl can be found in the bin directory.

    Bug: On some systems the exonerate binary included in the package runs extremely slowly, and the configuration script can get stuck at the 'Testing exonerate' block. If this happens, press Ctrl + C. On these systems exonerate has to be installed or built from source code (see Prerequisites::exonerate) before rerunning the configuration script.

  3. To test the installation, run the following (the assembler has to be adjusted if GRAbB.pl was configured without Edena)

     bin/GRAbB.pl --ref for_testing/assembly.fas --reads for_testing/read* --folder test --prefix test
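
If the test run fails, it can help to check which prerequisite executables are actually visible on the PATH. The following is a minimal Python sketch and is not part of GRAbB; the helper name find_prerequisites is hypothetical, and the executable names are simply the ones mentioned in this README.

```python
import shutil

# Executable names as they appear in this README (assumed to be the
# names GRAbB.pl looks for; adjust if your installation differs).
PREREQUISITES = ["mirabait", "seqtk", "edena", "velveth", "velvetg", "exonerate"]


def find_prerequisites(names=PREREQUISITES):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_prerequisites().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Only a minimal set is required (one baiting program, one read collecting program and one assembler), so a missing entry is not necessarily an error.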
    

OR

Use Docker (See Docker.md for more detailed instructions)

  • Either pull the image from the Docker registry: docker pull brankovics/grabb

  • Or create a local docker image:

      git clone https://github.com/b-brankovics/grabb
      cd grabb/docker
      docker build -t localhost:5000/$USER/grabb .
    

Documentation

Run GRAbB.pl without any arguments to print the usage information.

The documentation consists of this file and the files listed under Examples.

Examples

See the wiki or the files Docker.md, Examples.md and Tutorial.md.


Prerequisites

mirabait

  1. Download the MIRA (4.0) assembler (http://sourceforge.net/projects/mira-assembler/)

  2. Extract it. The executable files are in the bin folder.

  3. Copy, move, or symlink 'mirabait' to a directory in the PATH, or add the bin folder to the PATH (GRAbB uses only mirabait)

    Warning: the name of the executable file has to be mirabait!

    Bug: mira 4.0.2 does not work properly, but mira 4.0 does

OR you may use kmer_bait.pl, which is less efficient but uses only Perl and standard Unix commands

Edena

  1. Download EDENA and extract it or use the copy in the 3rd_party_programs
  2. Change to the directory
  3. Type make on the command line (g++ needs to be installed; on Ubuntu, type sudo apt-get install g++)
  4. Copy, move, or symlink 'edena' to a directory in the PATH, or add its folder to the PATH (the executables are in the bin folder)

Velvet

  1. Download Velvet and extract it or use the copy in the 3rd_party_programs
  2. Change to the directory. First zlib needs to be installed
  3. Change to 'third-party/zlib-1.2.3/'
  4. Type make on the command line
  5. Type sudo make install on the command line
  6. Go back to the parent directory (cd ../..)
  7. Type make on the command line
  8. Copy, move, or symlink 'velveth' and 'velvetg' to a directory in the PATH, or add their folder to the PATH

Seqtk

  1. Download Seqtk from github and uncompress it or git clone https://github.com/lh3/seqtk.git
  2. Change to the directory
  3. Type make on the command line (zlib needs to be installed; see the Velvet section for instructions)
  4. Copy, move, or symlink 'seqtk' to a directory in the PATH, or add its folder to the PATH (The files in the bin folder)

OR you may use create_readpool.pl, which is less efficient but uses only Perl and standard Unix commands

exonerate

On Ubuntu, run sudo apt-get install exonerate

Otherwise:

  1. Download exonerate from the EBI website and uncompress it or use the version included in the GRAbB package (3rd_party_programs)

  2. Change to the directory

  3. Type the following commands (The following packages have to be installed on the system before running ./configure: gcc, make and glib2)

     ./configure
     make
     make check
     make install
    
  4. The executable is found at src/program/exonerate. Copy, move, or symlink 'exonerate' to a directory in the PATH, or add the src/program folder to the PATH

PRINSEQ lite

  1. Download PRINSEQ lite and extract it or use the copy in the 3rd_party_programs
  2. Copy, move, or symlink 'prinseq-lite.pl' to a directory in the PATH, or add its folder to the PATH

Helper programs

fastq2fasta

This program creates a FASTA format read file for each FASTQ read file specified.

Usage:

./fastq2fasta <reads_1.fastq> <reads_2.fastq>
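
The conversion itself can be sketched in a few lines of Python. This is only an illustration of the FASTQ to FASTA transformation, not the code of fastq2fasta; the function name fastq_to_fasta is hypothetical, and the sketch assumes every FASTQ record occupies exactly four lines.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text.

    Assumption: each record is exactly four lines (header, sequence,
    separator, quality). A trailing incomplete record is ignored.
    """
    lines = fastq_text.splitlines()
    out = []
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header!r}")
        # Replace the leading '@' with '>' and drop the quality lines.
        out.append(">" + header[1:])
        out.append(seq)
    return "\n".join(out) + ("\n" if out else "")


fastq = "@r1\nACGT\n+\nIIII\n@r2 extra\nTTGA\n+\n!!!!\n"
print(fastq_to_fasta(fastq), end="")
```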

get_overlaps

This program reads the contigs from a FASTA file and checks whether they overlap with each other, using a minimal overlap size that is specified at invocation. Finally, it prints all the overlaps found.

Usage:

./get_overlaps <contigs.fasta> <overlap>
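
One simple way to detect such overlaps is to compare the end of each contig with the start of every other contig. The Python sketch below shows that idea under stated assumptions: it checks only same-strand suffix/prefix matches that are exact, and it reports the longest match per ordered pair. Whether get_overlaps also considers reverse complements or reports overlaps differently is not stated here, so treat this as an illustration only. The function name find_overlaps is hypothetical.

```python
def find_overlaps(contigs, min_overlap):
    """Return (left_id, right_id, length) tuples where a suffix of the
    left contig exactly matches a prefix of the right contig.

    contigs: dict mapping contig id to sequence.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    overlaps = []
    for left_id, left in contigs.items():
        for right_id, right in contigs.items():
            if left_id == right_id:
                continue
            longest = min(len(left), len(right))
            # Try the longest candidate first so one overlap is reported per pair.
            for size in range(longest, min_overlap - 1, -1):
                if left[-size:] == right[:size]:
                    overlaps.append((left_id, right_id, size))
                    break
    return overlaps


contigs = {"a": "AAACGT", "b": "CGTTTT", "c": "GGGG"}
print(find_overlaps(contigs, 3))  # [('a', 'b', 3)]
```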

interleaved2pairs

This program creates a forward and reverse read file from an interleaved file

Usage:

./interleaved2pairs <reads.fastq>
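
As an illustration of what splitting an interleaved file involves, here is a minimal Python sketch. It is not the code of interleaved2pairs; the function name split_interleaved is hypothetical, and the sketch assumes four-line FASTQ records that strictly alternate between forward and reverse reads.

```python
def split_interleaved(fastq_text):
    """Split interleaved FASTQ text into (forward_text, reverse_text).

    Assumption: records are four lines each and alternate
    forward, reverse, forward, reverse, ...
    """
    lines = fastq_text.splitlines()
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    if len(records) % 2 != 0 or any(len(rec) != 4 for rec in records):
        raise ValueError("expected an even number of complete four-line records")

    def to_text(recs):
        return "".join(line + "\n" for rec in recs for line in rec)

    # Even-indexed records are forward reads, odd-indexed ones are reverse reads.
    return to_text(records[0::2]), to_text(records[1::2])


interleaved = "@p1/1\nAC\n+\nII\n@p1/2\nGT\n+\nII\n"
forward, reverse = split_interleaved(interleaved)
print(forward, end="")
print(reverse, end="")
```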

merge_contigs

This program reads the contigs from a fasta file and checks if they are overlapping with each other by using a minimal overlap size that is specified at invocation. Afterwards it loops through all the contigs and merges contig pairs that only overlap with each other at the given side. In the end it saves the contigs that were created into a file.

Usage:

./merge_contigs <contigs.fasta> <overlap> <output.fas>
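
The merging rule described above (merge only when two contig ends overlap with each other and with nothing else) can be sketched as follows. This Python example is an interpretation under stated assumptions, not the actual merge_contigs code: it uses exact same-strand suffix/prefix matches, recomputes all overlaps after every merge, and names merged contigs by joining the original ids with "+". The function names longest_overlap and merge_unambiguous are hypothetical.

```python
def longest_overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`, or 0."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_unambiguous(contigs, min_overlap):
    """Repeatedly merge contig pairs whose touching ends overlap only each other."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contigs = dict(contigs)  # work on a copy
    merged = True
    while merged:
        merged = False
        # right_hits[a]: contigs whose start overlaps the end of a.
        # left_hits[b]: contigs whose end overlaps the start of b.
        right_hits = {name: [] for name in contigs}
        left_hits = {name: [] for name in contigs}
        for a, seq_a in contigs.items():
            for b, seq_b in contigs.items():
                if a != b:
                    size = longest_overlap(seq_a, seq_b, min_overlap)
                    if size:
                        right_hits[a].append((b, size))
                        left_hits[b].append(a)
        for a, hits in right_hits.items():
            if len(hits) == 1:
                b, size = hits[0]
                if left_hits[b] == [a]:
                    # Join the two contigs, keeping the overlap only once.
                    contigs[a + "+" + b] = contigs[a] + contigs[b][size:]
                    del contigs[a], contigs[b]
                    merged = True
                    break  # overlaps are stale after a merge, so recompute
    return contigs


chain = {"a": "AAACGT", "b": "CGTTTTGA", "c": "TGACCC"}
print(merge_unambiguous(chain, 3))  # {'a+b+c': 'AAACGTTTTGACCC'}
```

In this sketch a contig end that overlaps two different contigs is left unmerged, because the correct continuation is ambiguous.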

rename_fastq

This program creates a FASTA format read file for each FASTQ read file

Usage:

./rename_fastq <reads_1.fastq> <reads_2.fastq>

single2pairs

This program creates paired-end read files from single-end files

  1. Selects the reads that are at least as long as the specified length (<int>)
  2. Gets the first <int> characters to be used as forward read
  3. Gets the last <int> characters to be used as reverse read (reverse complement)

All the produced reads are <int> characters long.

The output file will be created in the current working directory

Usage:

./single2pairs <int> <reads.fastq>
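
The three steps listed above can be sketched in Python as follows. This is an illustration, not the code of single2pairs. The function names are hypothetical, the "/1" and "/2" name suffixes are an assumption about read naming, and reversing the quality string for the reverse read is likewise an assumption made so that qualities stay aligned with the bases.

```python
# Complement table for the reverse-complement step.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_to_pairs(records, length):
    """Build artificial read pairs from single-end reads.

    records: iterable of (name, sequence, quality) tuples.
    Returns (forward_records, reverse_records).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    forward, reverse = [], []
    for name, seq, qual in records:
        if len(seq) < length:
            continue  # step 1: keep only reads that are long enough
        # Step 2: the first `length` characters become the forward read.
        forward.append((name + "/1", seq[:length], qual[:length]))
        # Step 3: the last `length` characters, reverse complemented.
        reverse.append((name + "/2",
                        reverse_complement(seq[-length:]),
                        qual[-length:][::-1]))
    return forward, reverse


reads = [("r1", "AACCGGTT", "ABCDEFGH"), ("r2", "ACG", "III")]
print(single_to_pairs(reads, 4))
```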

uniform_length

This program creates a read file with reads with uniform length

  1. Selects the reads that are at least as long as the specified length (<int>)
  2. Trims the reads to the specified length

All the produced reads are <int> characters long.

The output file will be created in the current working directory

Usage:

./uniform_length <int> <reads.fastq>
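
The filter-and-trim logic can be sketched in Python as shown below. This is not the code of uniform_length. The function name trim_to_uniform_length is hypothetical, and the sketch assumes that trimming keeps the start of each read, which the description above does not specify.

```python
def trim_to_uniform_length(records, length):
    """Keep reads of at least `length` characters and trim them to `length`.

    records: iterable of (name, sequence, quality) tuples.
    Assumption: the first `length` characters of each read are kept.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return [
        (name, seq[:length], qual[:length])
        for name, seq, qual in records
        if len(seq) >= length
    ]


reads = [("r1", "AACCGGTT", "ABCDEFGH"), ("r2", "ACG", "III")]
print(trim_to_uniform_length(reads, 4))  # [('r1', 'AACC', 'ABCD')]
```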

fasta_shift

This program takes a FASTA file and shifts the sequence in it according to a position value or a reference file. The output is printed to STDOUT.

  1. Using a position value (n is the position value)

    Usage:

     ./fasta_shift -i <input.fas> -p <int> ><output.fas>
    
  2. Using a reference file: the output will start wit
