USearch
Fast Open-Source Search & Clustering engine for Vectors & Arbitrary Objects in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
Install / Use
- ✅ 10x faster HNSW implementation than FAISS.
- ✅ Simple and extensible single C++11 header library.
- ✅ Trusted by giants like Google and DBs like ClickHouse & DuckDB.
- ✅ SIMD-optimized and user-defined metrics with JIT compilation.
- ✅ Hardware-agnostic f16 & i8: half-precision & quarter-precision support.
- ✅ View large indexes from disk without loading into RAM.
- ✅ Heterogeneous lookups, renaming/relabeling, and on-the-fly deletions.
- ✅ Binary Tanimoto and Sorensen coefficients for Genomics and Chemistry applications.
- ✅ Space-efficient point-clouds with uint40_t, accommodating 4B+ size.
- ✅ Compatible with OpenMP and custom "executors" for fine-grained parallelism.
- ✅ Semantic Search and Joins.
- 🔥 Near-real-time clustering and sub-clustering for Tens or Millions of clusters.
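The binary Tanimoto metric in the list above reduces to popcounts over bit-vectors. A minimal NumPy sketch of the formula, independent of USearch's own SIMD implementation (the helper name here is made up for illustration):

```python
import numpy as np

def tanimoto_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto distance between two packed binary vectors (dtype uint8):
    1 - |A AND B| / |A OR B|, where |x| counts set bits."""
    intersection = int(np.unpackbits(np.bitwise_and(a, b)).sum())
    union = int(np.unpackbits(np.bitwise_or(a, b)).sum())
    return 1.0 - intersection / union

# Two tiny 8-bit fingerprints: 2 shared bits, 4 bits set in total
a = np.packbits(np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=np.uint8))
b = np.packbits(np.array([1, 1, 0, 1, 0, 0, 0, 0], dtype=np.uint8))
print(tanimoto_distance(a, b))  # → 0.5
```

In practice the fingerprints are much longer (e.g. 2048-bit molecular fingerprints), but the arithmetic is the same bitwise AND/OR plus popcount.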
Technical Insights and related articles:
- Uses Arm SVE and x86 AVX-512's masked loads to eliminate tail for-loops.
- Uses Horner's method for polynomial approximations, beating GCC 12 by 119x.
- Implements a custom, separate binding for every supported language.
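Horner's method, referenced above, evaluates a degree-n polynomial with just n multiply-adds, which compilers can fuse into FMA instructions. A sketch with arbitrary coefficients (not USearch's actual approximation tables):

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients ordered from the highest
    degree down to the constant term, one multiply-add per coefficient."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

# 3x^2 + 2x + 1 at x = 2: ((0*2 + 3)*2 + 2)*2 + 1 = 17
print(horner([3.0, 2.0, 1.0], 2.0))  # → 17.0
```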
Comparison with FAISS
FAISS is a widely recognized standard for high-performance vector search engines. USearch and FAISS both employ the same HNSW algorithm, but they differ significantly in their design principles. USearch is compact and broadly compatible without sacrificing performance, primarily focusing on user-defined metrics and fewer dependencies.
| | FAISS | USearch | Improvement |
| :------------------------------------------- | ----------------------: | -----------------------: | ----------------------: |
| Indexing time ⁰ | | | |
| 100 Million 96d f32, f16, i8 vectors | 2.6 · 2.6 · 2.6 h | 0.3 · 0.2 · 0.2 h | 9.6 · 10.4 · 10.7 x |
| 100 Million 1536d f32, f16, i8 vectors | 5.0 · 4.1 · 3.8 h | 2.1 · 1.1 · 0.8 h | 2.3 · 3.6 · 4.4 x |
| | | | |
| Codebase length ¹ | 84 K SLOC | 3 K SLOC | maintainable |
| Supported metrics ² | 9 fixed metrics | any metric | extendible |
| Supported languages ³ | C++, Python | 10 languages | portable |
| Supported ID types ⁴ | 32-bit, 64-bit | 32-bit, 40-bit, 64-bit | efficient |
| Filtering ⁵ | ban-lists | any predicates | composable |
| Required dependencies ⁶ | BLAS, OpenMP | - | light-weight |
| Bindings ⁷ | SWIG | Native | low-latency |
| Python binding size ⁸ | ~ 10 MB | < 1 MB | deployable |
⁰ Tested on Intel Sapphire Rapids, with the simplest inner-product distance, equivalent recall, and memory consumption, while also providing far superior search speed.
¹ A shorter codebase of usearch/ over faiss/ makes the project easier to maintain and audit.
² User-defined metrics allow you to customize your search for various applications, from GIS to creating custom metrics for composite embeddings from multiple AI models or hybrid full-text and semantic search.
³ With USearch, you can reuse the same preconstructed index in various programming languages.
⁴ The 40-bit integer allows you to store 4B+ vectors without allocating 8 bytes for every neighbor reference in the proximity graph.
⁵ With USearch, the index can be combined with arbitrary external containers, like Bloom filters or third-party databases, to filter out irrelevant keys during index traversal.
⁶ Lack of obligatory dependencies makes USearch much more portable.
⁷ Native bindings introduce lower call latencies than more straightforward approaches.
⁸ Lighter bindings make downloads and deployments faster.
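The impact of the 40-bit identifiers from footnote ⁴ is easy to estimate: every neighbor reference in the proximity graph shrinks from 8 bytes to 5. A back-of-the-envelope sketch, assuming a hypothetical graph connectivity of 32 neighbors per node (an assumption for illustration, not a USearch default):

```python
num_vectors = 4_000_000_000   # 4B+ points, the scale uint40_t accommodates
neighbors_per_node = 32       # assumed graph connectivity, for illustration only

bytes_u64 = num_vectors * neighbors_per_node * 8  # 64-bit neighbor references
bytes_u40 = num_vectors * neighbors_per_node * 5  # 40-bit neighbor references

saved_gb = (bytes_u64 - bytes_u40) / 1e9
print(f"{saved_gb:.0f} GB saved")  # → 384 GB saved
```

At billion-vector scale, the neighbor lists dominate the graph's memory footprint, so trimming each reference by 3 bytes translates into hundreds of gigabytes.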
Base functionality is identical to FAISS, and the interface will feel familiar if you have ever investigated Approximate Nearest Neighbors search:
```python
# pip install usearch

import numpy as np
from usearch.index import Index

index = Index(ndim=3)               # Default settings for 3D vectors
vector = np.array([0.2, 0.6, 0.4])  # Can be a matrix for batch operations
index.add(42, vector)               # Add one or many vectors in parallel
matches = index.search(vector, 10)  # Find 10 nearest neighbors

assert matches[0].key == 42
assert matches[0].distance <= 0.001
assert np.allclose(index[42], vector, atol=0.1)  # Ensure high tolerance in mixed-precision comparisons
```
More settings are always available, and the API is designed to be as flexible as possible.
The default storage/quantization level is hardware-dependent for efficiency, but bf16 is recommended for most modern CPUs.
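To see why low-precision storage rarely hurts rankings, here is a standalone NumPy sketch of symmetric int8 quantization; this illustrates the general technique, not USearch's internal scheme:

```python
import numpy as np

rng = np.random.default_rng(42)
v = rng.standard_normal(96).astype(np.float32)
v /= np.linalg.norm(v)  # Normalize, as typical for cosine-based search

# Symmetric int8 quantization: map [-max|v|, +max|v|] onto [-127, 127]
scale = np.abs(v).max() / 127.0
v_i8 = np.round(v / scale).astype(np.int8)
v_restored = v_i8.astype(np.float32) * scale

cosine = float(np.dot(v, v_restored) /
               (np.linalg.norm(v) * np.linalg.norm(v_restored)))
assert cosine > 0.99  # Quantization error stays far below ranking noise
```

The vector occupies a quarter of the f32 footprint, yet the round-tripped copy remains almost perfectly aligned with the original, which is why distance orderings are largely preserved.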
```python
index = Index(
    ndim=3,        # Define the number of dimensions in input vectors
    metric='cos',  # Choose 'ip', 'l2sq', 'haversine', or another metric
)
```