Bifrost

Parallel construction, indexing and querying of colored and compacted de Bruijn graphs

  • Build, index, color and query the compacted de Bruijn graph
  • Reads or assembled genomes as input
  • Output graph in GFA (can be visualized with Bandage), FASTA or binary
  • Graph cleaning: short tip clipping, etc.
  • Multi-threaded
  • No parameters to estimate with other tools
  • Exact or approximate k-mer search of queries
  • C++ API available:
    • Associate your data with vertices
    • Add or remove (sub-)sequences / k-mers / colors
    • Find unitigs containing queried k-mers

Other tools integrating or using Bifrost: Kallisto, Ratatosk, ggCaller, popIns2, PLAST and more.

Requirements

It is highly recommended to install Bifrost from source. However, a Conda installation is possible (see the Installation section). Bifrost's requirements (a C++ compiler, CMake and zlib) are pre-installed by default on most operating systems.

In case you are missing one or more of those:

  • Ubuntu/Debian:
sudo apt-get install build-essential cmake zlib1g-dev
  • MacOS (with Homebrew):
brew install --with-toolchain llvm
brew install cmake zlib
  • Windows: Bifrost does not run natively on Windows, but you can install the Windows Subsystem for Linux (WSL) and run it from there. Bifrost will be slower on WSL compared to a native Linux installation. From the WSL:
sudo apt-get install build-essential cmake zlib1g-dev

Installation

  • From source

    git clone https://github.com/pmelsted/bifrost.git
    cd bifrost && mkdir build && cd build
    cmake ..
    make
    make install
    

    By default, the installation creates:

    • a binary (Bifrost)
    • a dynamic library (libbifrost.so for Unix or libbifrost.dylib for MacOS)
    • a static library (libbifrost.a)

    Advanced

    • make install might require sudo (sudo make install) to proceed.
    • To install in a non-default path /some/path/, add the option -DCMAKE_INSTALL_PREFIX=/some/path/ to the cmake command.
    • Bifrost compiles by default with -march=native: the compiler targets architecture instructions specific to the machine Bifrost is compiled on. Hence, the binary and library produced might not work on a different machine. Native compilation can be disabled by adding the option -DCOMPILATION_ARCH=OFF to the cmake command (disables all AVX2 optimizations too). Alternatively, you can use this option to specify the architecture you want to target: x86-64, knl, etc. Default is -DCOMPILATION_ARCH=native.
    • Bifrost uses AVX2 instructions during graph construction which can be disabled by adding the option -DENABLE_AVX2=OFF to the cmake command.

    If you encounter any problem during the installation, see the Troubleshooting section.
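As a rough illustration of how the advanced options combine, the sketch below assembles a cmake command line with a custom install prefix and native compilation disabled. It only prints the commands (a dry run), and the prefix /some/path/ is the placeholder from the text, not a real location.

```shell
#!/bin/sh
# Dry-run sketch: assemble the cmake options described above and print the commands.
# /some/path/ is the placeholder prefix used in the text; replace it with a real directory.
PREFIX="/some/path/"

# COMPILATION_ARCH=OFF disables native compilation (and, per the text, all AVX2
# optimizations), which helps when the binary must run on a machine other than the build host.
CMAKE_FLAGS="-DCMAKE_INSTALL_PREFIX=$PREFIX -DCOMPILATION_ARCH=OFF"

# Print rather than execute, so the script is safe to run outside bifrost/build.
echo "cmake .. $CMAKE_FLAGS"
echo "make"
echo "make install"
```

To actually build, run the printed commands from inside the bifrost/build directory created in the installation steps.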

Large k-mers

The default maximum k-mer size supported is 31. To work with larger k in the binary, you must install Bifrost from source and replace MAX_KMER_SIZE with a larger multiple of 32. This can be done in two ways:

  • By adding the following option to the cmake command:
-DMAX_KMER_SIZE=64
  • By replacing MAX_KMER_SIZE in CMakeLists.txt:
SET(MAX_KMER_SIZE "64" CACHE STRING "MAX_KMER_SIZE")

The actual maximum k-mer size is MAX_KMER_SIZE-1, e.g., the maximum k is 63 for MAX_KMER_SIZE=64. Increasing MAX_KMER_SIZE increases Bifrost's memory usage (k=31 uses 8 bytes of memory per k-mer while k=63 uses 16 bytes of memory per k-mer).
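The relationship between a desired k and the value to pass as MAX_KMER_SIZE can be worked out with a little integer arithmetic. The helper names below are hypothetical, and the bytes-per-k-mer formula (MAX_KMER_SIZE / 4) is an assumption extrapolated from the two figures quoted above, not a documented guarantee.

```shell
#!/bin/sh
# Smallest multiple of 32 strictly greater than k, because the maximum k is MAX_KMER_SIZE-1.
max_kmer_size_for() {
    echo $(( ($1 / 32 + 1) * 32 ))
}

# Assumption: memory per k-mer scales as MAX_KMER_SIZE / 4 bytes,
# extrapolated from the two data points in the text (32 -> 8 bytes, 64 -> 16 bytes).
bytes_per_kmer() {
    echo $(( $1 / 4 ))
}

for k in 31 63 100; do
    m=$(max_kmer_size_for "$k")
    echo "k=$k: -DMAX_KMER_SIZE=$m ($(bytes_per_kmer "$m") bytes per k-mer)"
done
# Prints:
# k=31: -DMAX_KMER_SIZE=32 (8 bytes per k-mer)
# k=63: -DMAX_KMER_SIZE=64 (16 bytes per k-mer)
# k=100: -DMAX_KMER_SIZE=128 (32 bytes per k-mer)
```

Note that k=32 already requires MAX_KMER_SIZE=64, since a value of 32 only supports k up to 31.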

The maximum size of minimizers (g-mers) MAX_GMER_SIZE can be adjusted the same way as MAX_KMER_SIZE. This is especially useful if you want to use a large k-mer size but a small g-mer size. By default, MAX_GMER_SIZE is equal to MAX_KMER_SIZE.

To work with larger k when using the Bifrost API, the new value of MAX_KMER_SIZE must be given to the compiler and linker, as explained in the API section.

Binary usage:

Bifrost

displays the command line interface:

Bifrost x.y.z

Highly parallel construction, indexing and querying of colored and compacted de Bruijn graphs

Usage: Bifrost [COMMAND] [PARAMETERS]

[COMMAND]:

   build                   Build a compacted de Bruijn graph, with or without colors
   update                  Update a compacted (colored) de Bruijn graph with new sequences
   query                   Query a compacted (colored) de Bruijn graph

[PARAMETERS]: build

   > Mandatory with required argument:

   -s, --input-seq-file     Input sequence file in fasta/fastq(.gz) format
                            Multiple files can be provided as a list in a text file (one file per line)
                            K-mers with exactly 1 occurrence in the input sequence files will be discarded
   -r, --input-ref-file     Input reference file in fasta/fastq(.gz) or gfa(.gz) format
                            Multiple files can be provided as a list in a text file (one file per line)
                            All k-mers of the input reference files are used
   -o, --output-file        Prefix for output file(s)

   > Optional with required argument:

   -t, --threads            Number of threads (default: 1)
   -k, --kmer-length        Length of k-mers (default: 31)
   -m, --min-length         Length of minimizers (default: auto)
   -B, --bloom-bits         Number of Bloom filter bits per k-mer (default: 24)
   -T, --tmp-dir            Path for tmp directory (default: creates tmp directory in output directory)
   -l, --load-mbbf          Input Blocked Bloom Filter file, skips filtering step (default: no input)
   -w, --write-mbbf         Output Blocked Bloom Filter file (default: no output)

   > Optional with no argument:

   -c, --colors             Color the compacted de Bruijn graph
   -i, --clip-tips          Clip tips shorter than k k-mers in length
   -d, --del-isolated       Delete isolated contigs shorter than k k-mers in length
   -f, --fasta-out          Output file in fasta format (only sequences) instead of gfa (unless graph is colored)
   -b, --bfg-out            Output file in bfg/bfi format (Bifrost graph/index) instead of gfa (unless graph is colored)
   -n, --no-compress-out    Output files must be uncompressed
   -N, --no-index-out       Do not make index file
   -v, --verbose            Print information messages during execution

[PARAMETERS]: update

  > Mandatory with required argument:

   -g, --input-graph-file   Input graph file to update in gfa(.gz) or bfg format
   -s, --input-seq-file     Input sequence file in fasta/fastq(.gz) format
                            Multiple files can be provided as a list in a text file (one file per line)
                            K-mers with exactly 1 occurrence in the input sequence files will be discarded
   -r, --input-ref-file     Input reference file in fasta/fastq(.gz) or gfa(.gz) format
                            Multiple files can be provided as a list in a text file (one file per line)
                            All k-mers of the input reference files are used
   -o, --output-file        Prefix for output file(s)

   > Optional with required argument:

   -I, --input-index-file   Input index file associated with graph to update in bfi format
   -C, --input-color-file   Input color file associated with graph to update in color.bfg format
   -t, --threads            Number of threads (default: 1)
   -k, --kmer-length        Length of k-mers (default: read from input graph file if built with Bifrost or 31)
   -m, --min-length         Length of minimizers (default: read from input graph if built with Bifrost, auto otherwise)
   -T, --tmp-dir            Path for tmp directory (default: creates tmp directory in output directory)

   > Optional with no argument:

   -i, --clip-tips          Clip tips shorter than k k-mers in length
   -d, --del-isolated       Delete isolated contigs shorter than k k-mers in length
   -f, --fasta-out          Output file in fasta format (only sequences) instead of gfa (unless colors are output)
   -b, --bfg-out            Output file in bfg/bfi format (Bifrost graph/index) instead of gfa (unless graph is colored)
   -n, --no-compress-out    Output files must be uncompressed
   -N, --no-index-out       Do not make index file
   -v, --verbose            Print information messages during execution

[PARAMETERS]: query

  > Mandatory with required argument:

   -g, --input-graph-file   Input graph file to query in gfa(.gz) or bfg
   -q, --input-query-file   Input query file in fasta/fastq(.gz). Each record is a query.
                            Multiple files can be provided as a list in a text file (one file per line)
   -o, --output-file        Prefix for output file

   > Optional with required argument:

   -e, --min_ratio-kmers    Minimum ratio of k-mers from each query that must occur in the graph
   -E, --min-nb-colors      Minimum number of colors from each query that must occur in the graph
   -I, --input-index-file   Input index file associated with graph to query in bfi format
   -C, --input-color-file   Input color file associated with the graph to query in color.bfg format
   -t, --threads            Number of threads (default: 1)
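
To show how the build options listed above fit together, here is a small dry-run sketch. It writes a list file with one input path per line and prints two possible build commands. All file names (genomeA.fasta, genomeB.fasta, reads.fastq.gz, refs_graph, reads_graph) are hypothetical placeholders; only the flags come from the help text.

```shell
#!/bin/sh
# Placeholder inputs: two empty FASTA files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/genomeA.fasta" "$workdir/genomeB.fasta"

# List file with one input path per line, the form accepted by -s and -r.
printf '%s\n' "$workdir"/*.fasta > "$workdir/refs.txt"

# Colored graph from reference genomes: 4 threads, k=31, all k-mers kept (-r).
build_refs="Bifrost build -t 4 -k 31 -c -r $workdir/refs.txt -o $workdir/refs_graph"

# Uncolored graph from reads (reads.fastq.gz is hypothetical): -s discards k-mers
# seen exactly once, -i clips short tips, -d deletes short isolated contigs.
build_reads="Bifrost build -t 4 -k 31 -i -d -s reads.fastq.gz -o reads_graph"

# Print the commands instead of running them, since the inputs here are empty placeholders.
echo "$build_refs"
echo "$build_reads"
```

Using -r for assembled genomes and -s for raw reads follows the help text: -s filters out k-mers with a single occurrence, which are likely sequencing errors in read data, whereas -r keeps every k-mer.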
  
