Edlib
Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
A lightweight and super fast C/C++ library for sequence alignment using edit distance.
Calculating edit distance of two strings is as simple as:
edlibAlign("hello", 5, "world!", 6, edlibDefaultAlignConfig()).editDistance;
Edlib is also available for Python (see the Python README), with code residing at bindings/python.
There are third-party bindings to edlib in other languages as well:
- Edlib.jl, a Julia package created and supported by Christopher Rowley (@cjdoris)
- edlibR, an R package created and supported by Evan Biederstedt (@evanbiederstedt)
- ruby-edlib, a Ruby package created and supported by @kojix2
- Text::Levenshtein::Edlib, a Perl wrapper created and supported by Marius Gavrilescu
Features
- Calculates edit distance (Levenshtein distance).
- It can find the optimal alignment path (instructions for transforming the first sequence into the second sequence).
- It can find just the start and/or end locations of the alignment path, which can be useful when speed is more important than having the exact alignment path.
- Supports multiple alignment methods: global (NW), prefix (SHW) and infix (HW), each of them useful for different scenarios.
- You can extend the character equality definition, enabling you to e.g. have wildcard characters, do case-insensitive alignment or work with degenerate nucleotides.
- It can easily handle small or very large sequences, even when finding alignment path, while consuming very little memory.
- Super fast thanks to Myers's bit-vector algorithm.
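The three alignment methods in the list above can be illustrated with a naive dynamic-programming reference. This is only a sketch for intuition and for cross-checking small inputs, not Edlib's implementation (Edlib uses Myers's bit-vector algorithm), and the names Mode, referenceEditDistance and the equal predicate are hypothetical. It assumes the usual reading of the methods: NW penalizes every gap, SHW does not penalize the unaligned end of the target, and HW does not penalize the unaligned start or end of the target. The equal predicate stands in for an extended character equality definition, such as a wildcard.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical names for this sketch; they are not part of Edlib's API.
enum class Mode { NW, SHW, HW };

// Naive O(|query| * |target|) reference, assuming the definitions stated above.
// `equal` decides whether two characters match (exact equality by default).
int referenceEditDistance(
    const std::string& query, const std::string& target, Mode mode,
    const std::function<bool(char, char)>& equal =
        [](char a, char b) { return a == b; }) {
    const size_t n = query.size();
    const size_t m = target.size();
    // prev[j]: cost of aligning the query prefix processed so far
    // against an alignment ending at target position j.
    std::vector<int> prev(m + 1), curr(m + 1);
    for (size_t j = 0; j <= m; ++j) {
        // HW: skipping the start of the target is free; NW and SHW pay for it.
        prev[j] = (mode == Mode::HW) ? 0 : static_cast<int>(j);
    }
    for (size_t i = 1; i <= n; ++i) {
        curr[0] = static_cast<int>(i);  // all query characters so far are unmatched
        for (size_t j = 1; j <= m; ++j) {
            int matchOrMismatch =
                prev[j - 1] + (equal(query[i - 1], target[j - 1]) ? 0 : 1);
            int skipQueryChar = prev[j] + 1;
            int skipTargetChar = curr[j - 1] + 1;
            curr[j] = std::min({matchOrMismatch, skipQueryChar, skipTargetChar});
        }
        std::swap(prev, curr);
    }
    // prev now holds the row for the whole query.
    if (mode == Mode::NW) {
        return prev[m];  // the whole target must be consumed
    }
    // SHW and HW: the alignment may end anywhere in the target.
    return *std::min_element(prev.begin(), prev.end());
}
```

With query "CTG" and target "ACTCTGAAA" this sketch returns 6 for NW, 2 for SHW and 0 for HW. Passing a predicate such as [](char a, char b) { return a == b || a == 'N' || b == 'N'; } makes N match any character.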
Contents
- Features
- Building
- Using Edlib in your project
- Usage and examples
- API documentation
- Alignment methods
- Aligner
- Running tests
- Time and space complexity
- Test data
- Development and contributing
- Publication
- Acknowledgements
Using Edlib in your project
You can use Edlib in your project either by directly copying the header and source files from edlib/, or by linking the Edlib library (see Building for instructions on how to build Edlib libraries).
In either case, the only thing you have to do in your source files is include edlib.h.
To get you started quickly, let's take a look at a few ways to get a simple Hello World project working.
Our Hello World project has just one source file, helloWorld.cpp, and it looks like this:
#include <cstdio>
#include "edlib.h"

int main() {
    EdlibAlignResult result = edlibAlign("hello", 5, "world!", 6, edlibDefaultAlignConfig());
    if (result.status == EDLIB_STATUS_OK) {
        printf("edit_distance('hello', 'world!') = %d\n", result.editDistance);
    }
    edlibFreeAlignResult(result);
}
Running it should output edit_distance('hello', 'world!') = 5.
Approach #1: Directly copying edlib source and header files.
Here we directly copied the edlib/ directory into our project, to get the following project structure:
edlib/ -> copied from edlib/
  include/
    edlib.h
  src/
    edlib.cpp
helloWorld.cpp -> your program
Since helloWorld is a C++ program, we can compile it with just one line: c++ helloWorld.cpp edlib/src/edlib.cpp -o helloWorld -I edlib/include.
If helloWorld were a C program, we would compile it like this:
c++ -c edlib/src/edlib.cpp -o edlib.o -I edlib/include
cc -c helloWorld.c -o helloWorld.o -I edlib/include
c++ helloWorld.o edlib.o -o helloWorld
Approach #2: Copying edlib header file and static library.
Instead of copying the edlib source files, you could copy the static library (check Building on how to create the static library). We also need to copy the edlib header files. We get the following project structure:
edlib/ -> copied from edlib
  include/
    edlib.h
  libedlib.a
helloWorld.cpp -> your program
Now you can compile it with c++ helloWorld.cpp -o helloWorld -I edlib/include -L edlib -ledlib.
Approach #3: Install edlib library on machine.
Alternatively, you could avoid copying any Edlib files and instead install the libraries by running sudo make install (check Building for exact instructions depending on the approach you used for building). Now, all you have to do to compile your project is c++ helloWorld.cpp -o helloWorld -ledlib.
If you get an error message like cannot open shared object file: No such file or directory, make sure that your linker's search path includes the directory where edlib was installed.
Approach #4: Use edlib in your project via CMake.
Using git submodule
If you are using CMake for compilation, we suggest adding edlib as a git submodule with the command git submodule add https://github.com/martinsos/edlib vendor/edlib. Afterwards, modify your top level CMakeLists.txt file accordingly:
add_subdirectory(vendor/edlib EXCLUDE_FROM_ALL)
target_link_libraries(your_exe edlib)
The add_subdirectory command adds a folder to the build tree, meaning it will run CMakeLists.txt from the included folder as well. The EXCLUDE_FROM_ALL flag disables building (and installation) of targets in the added folder that are not needed in your project. In the above example only the (static) library edlib will be built, while edlib-aligner, hello_world and the rest won't. In order to access the edlib API, add #include "edlib.h" in your source file (CMake will automatically update your include path).
For more example projects take a look at applications in apps/.
Using VCPKG
Edlib is available in the VCPKG package manager. With VCPKG on your system, Edlib can be downloaded using the command vcpkg install edlib. Once the library has been downloaded, add the following instructions to your CMakeLists.txt file:
find_package(edlib CONFIG REQUIRED)
target_link_libraries(MyProject PRIVATE edlib::edlib)
Then you should be able to include the library header in your project (#include "edlib.h").
Building
Meson
The primary way of building Edlib is via the Meson build tool.
Requirements: make sure that you have meson installed on your system.
Execute
make
to build the static library and binaries (apps and tests) and also run the tests.
To build the shared library and binaries, run make LIBRARY_TYPE=shared.
The library and binaries will be created in the meson-build directory.
You can choose an alternate build directory like this: make BUILD_DIR=some-other-dir.
Optionally, you can run
sudo make install
to install the edlib library on your machine (on Linux, this will usually install it to /usr/local/lib and /usr/local/include).
Check the Makefile if you want to run individual steps on your own (building, tests, ...).
NOTE: If you need more control, use the meson command directly. The Makefile is here only to help with common commands.
CMake
Edlib can alternatively be built with CMake.
Execute the following command to build Edlib using CMake:
cd build && cmake -D CMAKE_BUILD_TYPE=Release .. && make
This will create binaries in the bin/ directory and libraries (static and shared) in the lib/ directory.
Execute
./bin/runTests
to run tests.
Optionally, you can run
sudo make install
to install edlib library on your machine.
Conda
Edlib can also be installed via Conda:
conda install edlib
Usage and examples
The main function in edlib is edlibAlign. Given two sequences (and their lengths), it will find the edit distance, the alignment path, or its start and end locations.
const char* query = "ACCTCTG";
const char* target = "ACTCTGAAA";
EdlibAlignResult result = edlibAlign(query, 7, target, 9, edlibDefaultAlignConfig());
if (result.status == EDLIB_STATUS_OK) {
    printf("%d", result.editDistance);
}
edlibFreeAlignResult(result);
NOTE: One character is expected to occupy one char/byte, meaning that characters spanning multiple chars/bytes are not supported. As long as your alphabet size is <= 256, you can work around this by manually mapping each symbol to a number/char from 0 to 255, but if its size is > 256 you will not be able to use Edlib.
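The mapping described in the note above can be sketched as follows. encodeAsBytes and ByteEncoded are hypothetical helpers, not part of Edlib, and symbols are assumed to arrive as 32-bit integers (for example Unicode code points). Both sequences share one lookup table so that equal symbols get equal bytes, and the helper throws once a 257th distinct symbol appears.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper types for this sketch; not part of Edlib.
struct ByteEncoded {
    std::string query;   // one byte per original query symbol
    std::string target;  // one byte per original target symbol
};

// Maps every distinct symbol seen in either sequence to a byte value 0..255,
// in order of first appearance. Throws if more than 256 symbols are needed.
ByteEncoded encodeAsBytes(const std::vector<uint32_t>& query,
                          const std::vector<uint32_t>& target) {
    std::unordered_map<uint32_t, unsigned char> codeOf;

    auto encode = [&codeOf](const std::vector<uint32_t>& symbols) {
        std::string bytes;
        bytes.reserve(symbols.size());
        for (uint32_t symbol : symbols) {
            auto it = codeOf.find(symbol);
            if (it == codeOf.end()) {
                if (codeOf.size() == 256) {
                    throw std::runtime_error("alphabet has more than 256 symbols");
                }
                unsigned char next = static_cast<unsigned char>(codeOf.size());
                it = codeOf.emplace(symbol, next).first;
            }
            bytes.push_back(static_cast<char>(it->second));
        }
        return bytes;
    };

    ByteEncoded out;
    out.query = encode(query);
    out.target = encode(target);
    return out;
}
```

The encoded strings can then be handed to edlibAlign through data() and their sizes (cast to int). Since edlibAlign receives the lengths explicitly, a byte value of 0 in the encoded data should not act as a terminator.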
Configuring edlibAlign()
edlibAlign takes a configuration object (a struct EdlibAlignConfig), which allows you to further customize how the alignment will be done. You can choose the alignment method, tell edlib what to calculate (just the edit distance or also the path and locations) and set an upper limit for the edit distance.
For example, if you want to use infix(HW) alignment method, want to find alignment path (and edit distance), are interested in result only if edit dista
