CRoaring
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, Alibaba Tair, Redpanda, YDB and StarRocks
Install / Use
/learn @RoaringBitmap/CRoaringREADME
CRoaring
Portable Roaring bitmaps in C (and C++) with full support for your favorite compiler (GNU GCC, LLVM's clang, Visual Studio, Apple Xcode, Intel oneAPI). Included in the Awesome C list of open source C software.
Table of Contents
- Introduction
- Objective
- Requirements
- Quick Start
- Packages
- Using Roaring as a CPM dependency
- Using as a CMake dependency with FetchContent
- Amalgamating
- API
- Dealing with large volumes of data
- Running microbenchmarks
- Custom memory allocators
- Example (C)
- Compressed 64-bit Roaring bitmaps (C)
- Conventional bitsets (C)
- Example (C++)
- Building with cmake (Linux and macOS, Visual Studio users should see below)
- Building (Visual Studio under Windows)
- SIMD-related throttling
- Thread safety
- How to best aggregate bitmaps?
- Wrappers for Roaring Bitmaps
- Mailing list/discussion group
- Contributing
- References about Roaring
Introduction
Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps.
Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. They are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Metamarkets' Druid, LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, InfluxDB, Pilosa, Bleve, Microsoft Visual Studio Team Services (VSTS), and eBay's Apache Kylin. The CRoaring library is used in several systems such as Apache Doris, ClickHouse, Redpanda, YDB, Alibaba Tair, clice, biscuit and StarRocks. The YouTube SQL Engine, Google Procella, uses Roaring bitmaps for indexing.
We published a peer-reviewed article on the design and evaluation of this library:
- Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 arXiv:1709.07821
Roaring bitmaps are found to work well in many important applications:
Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)
There is a serialized format specification for interoperability between implementations. Hence, it is possible to serialize a Roaring Bitmap from C++, read it in Java, modify it, serialize it back and read it in Go and Python.
Objective
The primary goal of the CRoaring is to provide a high performance low-level implementation that fully take advantage of the latest hardware. Roaring bitmaps are already available on a variety of platform through Java, Go, Rust... implementations. CRoaring is a library that seeks to achieve superior performance by staying close to the latest hardware.
(c) 2016-... The CRoaring authors.
Requirements
- Linux, macOS, FreeBSD, Windows (MSYS2 and Microsoft Visual studio).
- We test the library with ARM, x64/x86 and POWER processors. We only support little endian systems (big endian systems are vanishingly rare).
- Recent C compiler supporting the C11 standard (GCC 7 or better, LLVM 8 or better (clang), Xcode 11 or better, Microsoft Visual Studio 2022 or better, Intel oneAPI Compiler 2023.2 or better), there is also an optional C++ class that requires a C++ compiler supporting the C++11 standard.
- CMake (to contribute to the project, users can rely on amalgamation/unity builds if they do not wish to use CMake).
- The CMake system assumes that git is available.
- Under x64 systems, the library provides runtime dispatch so that optimized functions are called based on the detected CPU features. It works with GCC, clang (version 9 and up) and Visual Studio (2017 and up). Other systems (e.g., ARM) do not need runtime dispatch.
Hardly anyone has access to an actual big-endian system. Nevertheless, We support big-endian systems such as IBM s390x through emulators---except for IO serialization which is only supported on little-endian systems (see issue 423).
Quick Start
The CRoaring library can be amalgamated into a single source file that makes it easier for integration into other projects. Moreover, by making it possible to compile all the critical code into one compilation unit, it can improve the performance. For the rationale, please see the SQLite documentation, or the corresponding Wikipedia entry. Users who choose this route, do not need to rely on CRoaring's build system (based on CMake).
We offer amalgamated files as part of each release.
Linux or macOS users might follow the following instructions if they have a recent C or C++ compiler installed and a standard utility (wget).
- Pull the library in a directory
wget https://github.com/RoaringBitmap/CRoaring/releases/download/v2.1.0/roaring.c wget https://github.com/RoaringBitmap/CRoaring/releases/download/v2.1.0/roaring.h wget https://github.com/RoaringBitmap/CRoaring/releases/download/v2.1.0/roaring.hh - Create a new file named
demo.cwith this content:
You can try it out on Godbolt.#include <stdio.h> #include <stdlib.h> #include "roaring.c" int main() { roaring_bitmap_t *r1 = roaring_bitmap_create(); for (uint32_t i = 100; i < 1000; i++) roaring_bitmap_add(r1, i); printf("cardinality = %d\n", (int) roaring_bitmap_get_cardinality(r1)); roaring_bitmap_free(r1); bitset_t *b = bitset_create(); for (int k = 0; k < 1000; ++k) { bitset_set(b, 3 * k); } printf("%zu \n", bitset_count(b)); bitset_free(b); return EXIT_SUCCESS; } - Create a new file named
demo.cppwith this content:#include <iostream> #include "roaring.hh" // the amalgamated roaring.hh includes roaring64map.hh #include "roaring.c" int main() { roaring::Roaring r1; for (uint32_t i = 100; i < 1000; i++) { r1.add(i); } std::cout << "cardinality = " << r1.cardinality() << std::endl; roaring::Roaring64Map r2; for (uint64_t i = 18000000000000000100ull; i < 18000000000000001000ull; i++) { r2.add(i); } std::cout << "cardinality = " << r2.cardinality() << std::endl; return 0; } - Compile
cc -o demo demo.c c++ -std=c++11 -o demopp demo.cpp ./democardinality = 900 1000./demoppcardinality = 900 cardinality = 900
Packages
Using Roaring as a CPM dependency
If you like CMake and CPM, you can add just a few lines in your CMakeLists.txt file to grab a CRoaring release. [See our CPM demonstration for
