FMAP

Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies

Some example results are available at the homepage: https://qbrc.swmed.edu/FMAP/.

Features

  • FMAP provides a more sensible reference protein sequence database based on UniRef.

  • Identification of differentially-abundant genes (KEGG Orthology)

  • Mapping differentially-abundant genes to pathways and modules (KEGG Pathway and KEGG Module)

  • Mapping differentially-abundant genes to operons (ODB (v3))

Requirements

  • Perl - scripting language

  • R - statistical computing

  • Statistics::R - Perl interface with the R statistical program

    • Use CPAN to install the module
    perl -MCPAN -e 'install Statistics::R'
    
    • or download the source and compile manually
    wget 'http://search.cpan.org/CPAN/authors/id/F/FA/FANGLY/Statistics-R-0.34.tar.gz'
    tar zxf Statistics-R-0.34.tar.gz
    cd Statistics-R-0.34
    perl Makefile.PL
    make
    make test
    make install
    
  • Mapping program providing BLASTX search of sequencing reads: DIAMOND or USEARCH

  • Linux commands: wget, cat, sort

  • Bio::DB::Taxonomy - Access to a taxonomy database (required only if you want to build a custom database)

  • XML::LibXML - Perl binding for libxml2 (required only if you want to download genome sequences)
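
Before running the pipeline it can help to confirm that the requirements above are actually present. The following is a minimal sketch, not part of FMAP: the helper names `have_cmd` and `have_perl_module` are made up for illustration, and `diamond` is checked only because it is the default mapping program.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with FMAP).

# Return 0 if a command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Return 0 if a Perl module can be loaded by the perl on PATH.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Commands named in the requirements list.
for cmd in perl R wget cat sort diamond; do
    if have_cmd "$cmd"; then
        echo "found    command $cmd"
    else
        echo "missing  command $cmd"
    fi
done

# Statistics::R is always required; the other two only for optional features.
for mod in Statistics::R Bio::DB::Taxonomy XML::LibXML; do
    if have_perl_module "$mod"; then
        echo "found    module  $mod"
    else
        echo "missing  module  $mod"
    fi
done
```

A "missing" line for Bio::DB::Taxonomy or XML::LibXML matters only if you plan to build a custom database or download genome sequences.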

Command usages

  • FMAP_database.pl
    • Process
    • Input
      1. UniRef sequence identity (50, 90, or 100)
      2. (optional) NCBI taxonomy IDs (integer)
    • Requires Bio::DB::Taxonomy.
    • The following data files are downloaded over FTP. If the FTP connection fails, download the files by another method and copy them into the "FMAP_data" directory before running "FMAP_database.pl".
      ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
      ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz
      ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref100/uniref100.fasta.gz
      or ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz
      or ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz
    • Requires an HTTP connection for the KEGG API.
Usage:   perl FMAP_database.pl [options] 50|90|100 [NCBI_TaxID [...]]

Options: -h       display this help message
         -s       switch database
         -r       redownload data
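
If the FTP download fails, the files listed for FMAP_database.pl can be fetched by hand. The sketch below only prints the `wget` commands (a dry run) for one UniRef identity level. The function name `fmap_data_urls` is hypothetical, and the location of the "FMAP_data" directory relative to your working directory is an assumption to adjust. It lists all three files from the list above, whether or not a given run needs each of them.

```shell
#!/bin/sh
# Hypothetical helper: print the data file URLs for one UniRef identity level.
fmap_data_urls() {
    case "$1" in
        50|90|100) ;;
        *) echo "identity must be 50, 90, or 100" >&2; return 1 ;;
    esac
    echo "ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"
    echo "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz"
    echo "ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref$1/uniref$1.fasta.gz"
}

# Dry run: print the download commands instead of executing them.
# FMAP_data is assumed to be in the current directory.
fmap_data_urls 90 | while read -r url; do
    echo "wget -P FMAP_data '$url'"
done
```

Once the files are in place, the database build itself is started as shown in the usage line, for example `perl FMAP_database.pl 90`.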
  • FMAP_prepare.pl
Usage:   perl FMAP_prepare.pl [options]

Options: -h       display this help message
         -r       redownload data
         -m FILE  executable file path of mapping program, "diamond" or "usearch" [diamond]
         -k       download prebuilt KEGG files
  • FMAP_assembly.pl
    • Process
      • Read mapping: nucleotide sequence alignment using BWA
      • ORF mapping: protein sequence alignment using DIAMOND
    • Input
      1. Prefix of output files
      2. De novo assembled sequences in FASTA format
        • A FASTA file can be generated by metagenome assemblers such as SPAdes and MetaVelvet.
        • A FASTA file containing target genome sequences can be input instead.
      3. Whole metagenomic/metatranscriptomic shotgun sequencing reads in FASTQ or FASTA format
        • Multiple read files can be specified.
        • Paired-end read files must be given as a comma-separated pair, e.g. "input.R1.fastq,input.R2.fastq".
        • Read files may be gzip-compressed.
    • Output
      1. Prefix.region.abundance.txt (abundances of ORF regions mapping to KEGG orthologies)
      2. Prefix.abundance.txt (abundances of KEGG orthologies)
Usage:   perl FMAP_assembly.pl [options] output.prefix assembly.fasta [input.fastq|input.R1.fastq,input.R2.fastq [...]] > summary.txt

Options: -h       display this help message
         -A STR   prepared assembly prefix
         -B       input indexed sorted BAM file instead of FASTQ file
         -m FILE  executable file path of mapping program, "diamond" or "usearch" [diamond]
         -p INT   number of threads [1]
         -e FLOAT maximum e-value to report alignments [10]
         -t DIR   directory for temporary files [$TMPDIR or /tmp]
         -a FLOAT search acceleration for ublast [0.5]
         -C STR   codon and translation e.g. ATG=M [NCBI genetic code 11 (Bacterial, Archaeal and Plant Plastid)]
         -S STR   comma-separated start codons [GTG,ATG,CTG,TTG,ATA,ATC,ATT]
         -T STR   comma-separated termination codons [TAG,TAA,TGA]
         -l INT   minimum translation length [10]
         -c FLOAT minimum coverage [0.8]
         -q INT   minimum mapping quality [0]
         -s STR   strand specificity, "f" or "r"
         -P STR   contig prefix used for abundance estimation
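
The argument layout for FMAP_assembly.pl can be easier to see in a concrete invocation. This is a sketch that only prints the command (a dry run). All file names are hypothetical, as is the `paired` helper, and `-p 8` is just an example thread count.

```shell
#!/bin/sh
# Hypothetical helper: join the two files of a paired-end library with a comma.
paired() {
    printf '%s,%s' "$1" "$2"
}

# Made-up file names, for illustration only.
prefix="sampleA"
assembly="sampleA.contigs.fasta"
reads="$(paired sampleA.R1.fastq.gz sampleA.R2.fastq.gz) sampleA.single.fastq.gz"

# Dry run: print the command rather than executing it.
echo "perl FMAP_assembly.pl -p 8 $prefix $assembly $reads > $prefix.summary.txt"
```

Here one paired-end library and one single-end file are passed together, which relies on the statement above that multiple read files can be specified.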
  • FMAP_assembly_centrifuge.pl
    • Requires Centrifuge.
    • Input
      1. FMAP_assembly.region.txt (ORF regions mapping to KEGG orthologies generated by FMAP_assembly)
      2. De novo assembled sequences in FASTA format
      3. Centrifuge index filename prefix (minus trailing .X.cf)
    • Output: FMAP_assembly.region.taxon.txt (FMAP_assembly.region.txt including a column of NCBI taxonomy IDs (integer))
Usage:   perl FMAP_assembly_centrifuge.pl [options] FMAP_assembly.region.txt assembly.fasta centrifuge.index

Options: -h       display this help message
         -p INT   number of threads [1]
  • FMAP_assembly_heatmap.pl
    • Requires Bio::DB::Taxonomy.
    • Input: FMAP_assembly.abundance.txt (abundances generated by FMAP_assembly)
    • Output: HTML format of abundance heatmap table
Usage:   perl FMAP_assembly_heatmap.pl [options] [name=]FMAP_assembly.abundance.txt [...] > FMAP_assembly_heatmap.html

Options: -h       display this help message
         -c FILE  comparison output file including orthology and filter columns
         -f INT   HTML font size
         -w INT   HTML table cell width
  • FMAP_assembly_operon.pl
    • Input: FMAP_assembly.region.txt (ORF regions mapping to KEGG orthologies generated by FMAP_assembly)
    • Output: FMAP_assembly_operon.txt (ODB (v3) known operons consisting of orthologies located together on an assembled contig/scaffold/transcript)
Usage:   perl FMAP_assembly_operon.pl [options] FMAP_assembly.region.txt > FMAP_assembly_operon.txt

Options: -h       display this help message
         -a       print single-gene operons as well
  • FMAP_download_genome.pl
Usage:   perl FMAP_download_genome.pl [options] NCBI_TaxID [...] > genome.fasta

Options: -h       display this help message
         -a       assembly instead of genome
  • FMAP_download.pl
Usage:   perl FMAP_download.pl [options]

Options: -h       display this help message
         -m FILE  executable file path of mapping program, "diamond" or "usearch" [diamond]
         -k       download prebuilt KEGG files
         -x       download only KEGG files
  • FMAP_mapping.pl
    • Input: whole metagenomic (or metatranscriptomic) shotgun sequencing reads in FASTQ or FASTA format
    • Output: best-match hits in NCBI BLAST -m8 (= NCBI BLAST+ -outfmt 6) format
Usage:   perl FMAP_mapping.pl [options] input1.fastq|input1.fasta [input2.fastq|input2.fasta [...]] > blastx_hits.txt

Options: -h       display this help message
         -m FILE  executable file path of mapping program, "diamond" or "usearch" [diamond]
         -p INT   number of threads [1]
         -e FLOAT maximum e-value to report alignments [10]
         -t DIR   directory for temporary files [$TMPDIR or /tmp]
         -a FLOAT search acceleration for ublast [0.5]
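
The BLAST tabular format has 12 standard columns: query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, and bit score. As a rough sketch for inspecting a hit file, the following counts hits per subject protein at or above a chosen percent identity. The sample rows are made up (they are not real FMAP output), and the function names are hypothetical.

```shell
#!/bin/sh
# Made-up hits in the 12-column tabular layout (not real FMAP output).
sample_hits() {
    printf 'read1\tprotA\t95.0\t33\t1\t0\t1\t99\t10\t42\t1e-10\t70.1\n'
    printf 'read2\tprotB\t72.5\t30\t8\t0\t4\t93\t55\t84\t2e-04\t45.2\n'
    printf 'read3\tprotA\t88.0\t33\t4\t0\t1\t99\t12\t44\t3e-08\t62.0\n'
}

# Count hits per subject protein (column 2) whose percent identity
# (column 3) is at least the threshold given as the first argument.
hits_per_protein() {
    awk -F '\t' -v min="$1" \
        '$3 >= min { n[$2]++ } END { for (p in n) print p "\t" n[p] }' | sort
}

# With a threshold of 80, only the two protA hits remain.
sample_hits | hits_per_protein 80
```

In practice the input would be the file written by FMAP_mapping.pl, for example `hits_per_protein 80 < blastx_hits.txt`.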
  • FMAP_quantification.pl
    • Input: output of "FMAP_mapping.pl"
    • Output: abundances (RPKM) of KEGG orthologies
    • Output columns: KEGG Orthology ID, orthology definition, abundance (RPKM)
Usage:   perl FMAP_quantification.pl [options] blast_hits1.txt [blast_hits2.txt [...]] > abundance.txt

Options: -h       display this help message
         -c       use CPM values instead of RPKM values
         -i FLOAT minimum percent identity [80]
         -l FILE  tab-delimited text file with the first column having protein names and the second column having the sequence lengths
         -o FILE  tab-delimited text file with the first column having protein names and the second column having the orthology names
         -d FILE  tab-delimited text file with the first column having orthology names and the second column having the definitions
         -w FILE  tab-delimited text file with the first column having read names and the second column having the weights
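
For readers unfamiliar with the two abundance units, here is a small worked example using the textbook definitions: RPKM = reads × 10^9 / (gene length in bp × total mapped reads) and CPM = reads × 10^6 / total mapped reads. This is an assumption about the units only. How FMAP_quantification.pl derives lengths, applies read weights, or counts totals is not described in this section, so the sketch should not be read as its implementation.

```shell
#!/bin/sh
# Textbook RPKM: rpkm READS GENE_LENGTH_BP TOTAL_MAPPED_READS
rpkm() {
    awk -v r="$1" -v l="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", r * 1e9 / (l * t) }'
}

# Textbook CPM: cpm READS TOTAL_MAPPED_READS
cpm() {
    awk -v r="$1" -v t="$2" \
        'BEGIN { printf "%.2f\n", r * 1e6 / t }'
}

# 200 reads on a 2000 bp gene, 4,000,000 mapped reads in total.
rpkm 200 2000 4000000   # prints 25.00
cpm 200 4000000         # prints 50.00
```

The example shows why the two units differ: CPM ignores gene length, so the same read count yields 50 CPM but 25 RPKM for a 2 kb gene.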
  • FMAP_table.pl
    • Input: outputs of "FMAP_quantification.pl"
    • Output: abundance table
    • Output columns: [KEGG Ortho
