VectorDBBench
Benchmark for vector databases.
VectorDBBench (VDBBench): A Benchmark Tool for VectorDB
What is VDBBench
VDBBench provides benchmark results for mainstream vector databases and cloud services, and it is also a tool you can run yourself to compare their performance and cost-effectiveness. Designed for ease of use, it helps users, including non-specialists, reproduce published results or test new systems, which simplifies choosing among the many cloud services and open-source vector databases.
To support this, we provide an intuitive visual interface. It lets users start benchmarks with ease and view comparative result reports, so benchmark results can be reproduced with little effort. For cloud services in particular, we also provide cost-effectiveness reports, which makes the benchmarking process more realistic and applicable.
To closely mimic real-world production environments, we've set up diverse testing scenarios, including insertion, searching, and filtered searching. To provide credible and reliable data, we've included public datasets from actual production scenarios, such as SIFT, GIST, Cohere, and a dataset generated by OpenAI from an open-source raw dataset. It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
Prepare to delve into the world of VDBBench, and let it guide you in uncovering your perfect vector database match.
VDBBench is sponsored by Zilliz, the leading open-source vector database company behind Milvus. Choose smarter with VDBBench and start your free test on Zilliz Cloud today!
Leaderboard: https://zilliz.com/benchmark
Quick Start
Prerequisite
python >= 3.11
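A quick way to confirm that the interpreter meets this requirement before installing is sketched below. The helper name `meets_minimum` is illustrative only and is not part of VDBBench.

```python
import sys

# Minimum interpreter version stated in the prerequisite above.
MINIMUM = (3, 11)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for vectordb-bench"
    print(f"Python {found}: {status}")
```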
Install
Install vectordb-bench with only PyMilvus
pip install vectordb-bench
Install all database clients
pip install 'vectordb-bench[all]'
Install a specific database client
pip install 'vectordb-bench[pinecone]'
All supported database clients
| Optional database client | install command |
|--------------------------|---------------------------------------------|
| pymilvus, zilliz_cloud (default) | pip install vectordb-bench |
| all (client requirements might conflict with each other) | pip install vectordb-bench[all] |
| qdrant | pip install vectordb-bench[qdrant] |
| pinecone | pip install vectordb-bench[pinecone] |
| weaviate | pip install vectordb-bench[weaviate] |
| elastic, aliyun_elasticsearch| pip install vectordb-bench[elastic] |
| pgvector, pgvectorscale, pgdiskann, alloydb | pip install vectordb-bench[pgvector] |
| pgvecto.rs | pip install vectordb-bench[pgvecto_rs] |
| redis | pip install vectordb-bench[redis] |
| memorydb | pip install vectordb-bench[memorydb] |
| chromadb | pip install vectordb-bench[chromadb] |
| cockroachdb | pip install vectordb-bench[cockroachdb] |
| awsopensearch | pip install vectordb-bench[opensearch] |
| aliyun_opensearch | pip install vectordb-bench[aliyun_opensearch] |
| mongodb | pip install vectordb-bench[mongodb] |
| tidb | pip install vectordb-bench[tidb] |
| vespa | pip install vectordb-bench[vespa] |
| oceanbase | pip install vectordb-bench[oceanbase] |
| hologres | pip install vectordb-bench[hologres] |
| tencent_es | pip install vectordb-bench[tencent_es] |
| alisql | pip install 'vectordb-bench[alisql]' |
| doris | pip install vectordb-bench[doris] |
| zvec | pip install vectordb-bench[zvec] |
| endee | pip install vectordb-bench[endee] |
| lindorm | pip install vectordb-bench[lindorm] |
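Some shells (zsh, for example) try to expand the square brackets in an extras specifier, which is why a few commands above are wrapped in quotes. The sketch below builds a shell-safe install command for any extra name from the table. The function `pip_install_command` is a hypothetical helper for illustration, not something shipped with VDBBench.

```python
import shlex
from typing import Optional

PACKAGE = "vectordb-bench"


def pip_install_command(extra: Optional[str] = None) -> str:
    """Build a shell-safe `pip install` command for an optional client extra."""
    requirement = PACKAGE if extra is None else f"{PACKAGE}[{extra}]"
    # shlex.quote wraps the requirement in single quotes when it contains
    # characters such as '[' that a shell might otherwise try to expand.
    return f"pip install {shlex.quote(requirement)}"


if __name__ == "__main__":
    for name in (None, "pinecone", "pgvector"):
        print(pip_install_command(name))
```

With no extra, the command is left unquoted because the bare package name contains no special characters.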
Run
init_bench
OR:
Run from the command line.
vectordbbench [OPTIONS] COMMAND [ARGS]...
To list the clients that can be run from the command line, execute: vectordbbench --help
$ vectordbbench --help
Usage: vectordbbench [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
pgvectorhnsw
pgvectorivfflat
test
weaviate
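When runs are scripted, it can help to assemble the command as an argument list rather than a single string. The sketch below does this for the `pgvectorhnsw` command, using only flags that appear in the `pgvectorhnsw --help` listing in this README. The helper name `build_pgvectorhnsw_args` and all connection values are placeholders chosen for illustration.

```python
import shlex


def build_pgvectorhnsw_args(host, db_name, user_name, password,
                            case_type="Performance1536D50K",
                            num_concurrency=(1, 10, 20),
                            dry_run=True):
    """Assemble an argv list for `vectordbbench pgvectorhnsw`."""
    args = [
        "vectordbbench", "pgvectorhnsw",
        "--host", host,
        "--db-name", db_name,
        "--user-name", user_name,
        "--password", password,
        "--case-type", case_type,
        # The CLI expects a comma-separated list, e.g. "1,10,20".
        "--num-concurrency", ",".join(str(n) for n in num_concurrency),
    ]
    if dry_run:
        # Per the help text, --dry-run prints the configuration and exits
        # without running the tasks.
        args.append("--dry-run")
    return args


if __name__ == "__main__":
    # Placeholder connection values; replace them with your own.
    argv = build_pgvectorhnsw_args("localhost", "vectordb", "postgres", "change-me")
    print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run` once the tool and the relevant database client are installed. Keeping `dry_run=True` first is a cheap way to inspect the resolved configuration before starting a long benchmark.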
To list the options for each command, execute vectordbbench [command] --help
$ vectordbbench pgvectorhnsw --help
Usage: vectordbbench pgvectorhnsw [OPTIONS]
Options:
--config-file PATH Read configuration from yaml file
--drop-old / --skip-drop-old Drop old or skip [default: drop-old]
--load / --skip-load Load or skip [default: load]
--search-serial / --skip-search-serial
Search serial or skip [default: search-
serial]
--search-concurrent / --skip-search-concurrent
Search concurrent or skip [default: search-
concurrent]
--case-type [CapacityDim128|CapacityDim960|Performance768D100M|Performance768D10M|Performance768D1M|Performance768D10M1P|Performance768D1M1P|Performance768D10M99P|Performance768D1M99P|Performance1536D500K|Performance1536D5M|Performance1536D500K1P|Performance1536D5M1P|Performance1536D500K99P|Performance1536D5M99P|Performance1536D50K]
Case type
--db-label TEXT Db label, default: date in ISO format
[default: 2024-05-20T20:26:31.113290]
--dry-run Print just the configuration and exit
without running the tasks
--k INTEGER K value for number of nearest neighbors to
search [default: 100]
--concurrency-duration INTEGER Adjusts the duration in seconds of each
concurrency search [default: 30]
--num-concurrency TEXT Comma-separated list of concurrency values
to test during concurrent search [default:
1,10,20]
--concurrency-timeout INTEGER Timeout (in seconds) to wait for a
concurrency slot before failing. Set to a
negative value to wait indefinitely.
[default: 3600]
--user-name TEXT Db username [required]
--password TEXT Db password [required]
--host TEXT Db host [required]
--db-name TEXT Db name [required]
--maintenance-work-mem TEXT Sets the maximum memory to be used for
maintenance operations (index creation). Can
be entered as string with unit like '64GB'
or as an integer number of KB.This will set
the parameters:
max_parallel_maintenance_workers,
max_parallel_workers &
table(parallel_workers)
--max-parallel-workers INTEGER Sets the maximum number of parallel
processes per maintenance operation (index
creation)
--m INTEGER hnsw m
--ef-construction INTEGER hnsw ef-construction
--ef-search INTEGER hnsw ef-search
--quantization-type [none|bit|halfvec]
quantization type for vectors (in index)
--table-quantization-type [none|bit|halfvec]
quantization type for vectors (in table). If
equal to bit, the parameter
quantization_type will be set to bit too.
--reranking / --skip-reranking Enable reranking for HNSW search for binary
quantization
--reranking-metric [L2|COSINE|IP|DP]
Distance metric for reranking [default:
COSINE]
--quantized-fetch-limit INTEGER
Limit of fetching quantized vector ranked by
distance for reranking --
bound by ef_search
--custom-case-name TEXT Custom case name i.e. PerformanceCase1536D50K
--custom-case-description TEXT Custom name description
--custom-case-load-timeout INTEGER
Custom case load timeout [default: 36000]
--custom-case-optimize-timeout INTEGER
Custom case optimize timeout [default: 36000]
--custom-dataset-name TEXT
Dataset name i.e OpenAI
--custom-dataset-dir TEXT Dataset directory i.e. openai_medium_500k
--custom-dataset-size INTEGER Dataset size i.e. 500000
--custom-dataset-dim INTEGER Dataset dimension
--custom-dataset-metric-type TEXT
Dataset distance metric [default: COSINE]
--custom-dataset-file-count INTEGER
