VectorDBBench (VDBBench): A Benchmark Tool for VectorDB

What is VDBBench

VDBBench is more than a collection of benchmark results for mainstream vector databases and cloud services: it is a tool for comparing their performance and cost-effectiveness. Designed for ease of use, VDBBench helps users, including non-professionals, reproduce results or test new systems, which makes it easier to choose among the many cloud services and open-source vector databases.

Because user experience matters, we provide an intuitive visual interface. It lets users start benchmarks with ease and view comparative result reports, so benchmark results can be reproduced with little effort. For cloud services, we also provide cost-effectiveness reports, which makes the benchmarking process more realistic and applicable.

To closely mimic real-world production environments, we've set up diverse testing scenarios, including insertion, searching, and filtered searching. To provide credible and reliable data, we've included public datasets from actual production scenarios, such as SIFT, GIST, Cohere, and a dataset generated by OpenAI from an open-source raw dataset. It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!

Prepare to delve into the world of VDBBench, and let it guide you in uncovering your perfect vector database match.

VDBBench is sponsored by Zilliz, the leading open-source vector database company behind Milvus. Choose smarter with VDBBench: start your free test on Zilliz Cloud today!

Leaderboard: https://zilliz.com/benchmark

Quick Start

Prerequisite

python >= 3.11
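If you are unsure which interpreter your environment uses, a quick check like the one below can confirm it before installing. This is only a minimal sketch: the helper name `python_is_supported` is hypothetical and is not part of VDBBench; the 3.11 minimum is taken from the prerequisite above.

```python
import sys

# Minimum interpreter version stated in the prerequisite above.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version is at least MIN_PYTHON.

    Hypothetical helper for illustration; not part of vectordb-bench.
    """
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the >= 3.11 requirement")
    else:
        print(f"Python {current} is too old; 3.11 or newer is required")
```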

Install

Install vectordb-bench with only PyMilvus

pip install vectordb-bench

Install all database clients

pip install 'vectordb-bench[all]'

Install a specific database client

pip install 'vectordb-bench[pinecone]'

All supported database clients

| Optional database client | Install command |
|--------------------------|---------------------------------------------|
| pymilvus, zilliz_cloud (default) | pip install vectordb-bench |
| all (client requirements may conflict with each other) | pip install vectordb-bench[all] |
| qdrant | pip install vectordb-bench[qdrant] |
| pinecone | pip install vectordb-bench[pinecone] |
| weaviate | pip install vectordb-bench[weaviate] |
| elastic, aliyun_elasticsearch | pip install vectordb-bench[elastic] |
| pgvector, pgvectorscale, pgdiskann, alloydb | pip install vectordb-bench[pgvector] |
| pgvecto.rs | pip install vectordb-bench[pgvecto_rs] |
| redis | pip install vectordb-bench[redis] |
| memorydb | pip install vectordb-bench[memorydb] |
| chromadb | pip install vectordb-bench[chromadb] |
| cockroachdb | pip install vectordb-bench[cockroachdb] |
| awsopensearch | pip install vectordb-bench[opensearch] |
| aliyun_opensearch | pip install vectordb-bench[aliyun_opensearch] |
| mongodb | pip install vectordb-bench[mongodb] |
| tidb | pip install vectordb-bench[tidb] |
| vespa | pip install vectordb-bench[vespa] |
| oceanbase | pip install vectordb-bench[oceanbase] |
| hologres | pip install vectordb-bench[hologres] |
| tencent_es | pip install vectordb-bench[tencent_es] |
| alisql | pip install 'vectordb-bench[alisql]' |
| doris | pip install vectordb-bench[doris] |
| zvec | pip install vectordb-bench[zvec] |
| endee | pip install vectordb-bench[endee] |
| lindorm | pip install vectordb-bench[lindorm] |
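To install more than one client without pulling in `all`, pip's standard extras syntax accepts a comma-separated list inside the brackets. The sketch below only builds and prints such a command; it does not run pip. The helper name `pip_install_command` is hypothetical, and whether a given combination of clients installs cleanly is not guaranteed, since client requirements may conflict.

```python
import shlex

PACKAGE = "vectordb-bench"


def pip_install_command(extras=()):
    """Build a pip install argv for vectordb-bench with optional extras.

    Hypothetical helper for illustration; extras names should come from
    the client list in this README (e.g. "qdrant", "pinecone").
    """
    requirement = PACKAGE
    if extras:
        # pip accepts several extras separated by commas: pkg[a,b]
        requirement += "[" + ",".join(extras) + "]"
    return ["pip", "install", requirement]


if __name__ == "__main__":
    # shlex.join quotes the requirement, because square brackets are
    # glob characters in many shells.
    print(shlex.join(pip_install_command(["qdrant", "pinecone"])))
```

Quoting the requirement, as in `pip install 'vectordb-bench[all]'`, avoids shells such as zsh trying to expand the square brackets.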

Run

init_bench

OR:

Run from the command line.

vectordbbench [OPTIONS] COMMAND [ARGS]...

To list the clients that can be run from the command line, execute: vectordbbench --help

$ vectordbbench --help
Usage: vectordbbench [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  pgvectorhnsw
  pgvectorivfflat
  test
  weaviate

To list the options for each command, execute vectordbbench [command] --help

$ vectordbbench pgvectorhnsw --help
Usage: vectordbbench pgvectorhnsw [OPTIONS]

Options:
  --config-file PATH              Read configuration from yaml file
  --drop-old / --skip-drop-old    Drop old or skip  [default: drop-old]
  --load / --skip-load            Load or skip  [default: load]
  --search-serial / --skip-search-serial
                                  Search serial or skip  [default: search-
                                  serial]
  --search-concurrent / --skip-search-concurrent
                                  Search concurrent or skip  [default: search-
                                  concurrent]
  --case-type [CapacityDim128|CapacityDim960|Performance768D100M|Performance768D10M|Performance768D1M|Performance768D10M1P|Performance768D1M1P|Performance768D10M99P|Performance768D1M99P|Performance1536D500K|Performance1536D5M|Performance1536D500K1P|Performance1536D5M1P|Performance1536D500K99P|Performance1536D5M99P|Performance1536D50K]
                                  Case type
  --db-label TEXT                 Db label, default: date in ISO format
                                  [default: 2024-05-20T20:26:31.113290]
  --dry-run                       Print just the configuration and exit
                                  without running the tasks
  --k INTEGER                     K value for number of nearest neighbors to
                                  search  [default: 100]
  --concurrency-duration INTEGER  Adjusts the duration in seconds of each
                                  concurrency search  [default: 30]
  --num-concurrency TEXT          Comma-separated list of concurrency values
                                  to test during concurrent search  [default:
                                  1,10,20]
  --concurrency-timeout INTEGER   Timeout (in seconds) to wait for a
                                  concurrency slot before failing. Set to a
                                  negative value to wait indefinitely.
                                  [default: 3600]
  --user-name TEXT                Db username  [required]
  --password TEXT                 Db password  [required]
  --host TEXT                     Db host  [required]
  --db-name TEXT                  Db name  [required]
  --maintenance-work-mem TEXT     Sets the maximum memory to be used for
                                  maintenance operations (index creation). Can
                                  be entered as string with unit like '64GB'
                                  or as an integer number of KB.This will set
                                  the parameters:
                                  max_parallel_maintenance_workers,
                                  max_parallel_workers &
                                  table(parallel_workers)
  --max-parallel-workers INTEGER  Sets the maximum number of parallel
                                  processes per maintenance operation (index
                                  creation)
  --m INTEGER                     hnsw m
  --ef-construction INTEGER       hnsw ef-construction
  --ef-search INTEGER             hnsw ef-search
  --quantization-type [none|bit|halfvec]
                                  quantization type for vectors (in index)
  --table-quantization-type [none|bit|halfvec]
                                  quantization type for vectors (in table). If
                                  equal to bit, the parameter
                                  quantization_type will be set to bit too.
  --reranking / --skip-reranking  Enable reranking for HNSW search for binary
                                  quantization
  --reranking-metric [L2|COSINE|IP|DP]
                                  Distance metric for reranking  [default:
                                  COSINE]
  --quantized-fetch-limit INTEGER
                                  Limit of fetching quantized vector ranked by
                                  distance for reranking                 --
                                  bound by ef_search
  --custom-case-name TEXT         Custom case name i.e. PerformanceCase1536D50K
  --custom-case-description TEXT  Custom name description
  --custom-case-load-timeout INTEGER
                                  Custom case load timeout [default: 36000]
  --custom-case-optimize-timeout INTEGER
                                  Custom case optimize timeout [default: 36000]
  --custom-dataset-name TEXT
                                  Dataset name i.e OpenAI
  --custom-dataset-dir TEXT       Dataset directory i.e. openai_medium_500k
  --custom-dataset-size INTEGER   Dataset size i.e. 500000
  --custom-dataset-dim INTEGER    Dataset dimension
  --custom-dataset-metric-type TEXT
                                  Dataset distance metric [default: COSINE]
  --custom-dataset-file-count INTEGER
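The options above can be combined into a single invocation. The sketch below assembles one for `pgvectorhnsw` using only flags that appear in the help output, and prints it rather than running it. Everything else is an assumption for illustration: the helper name, the connection values, and the HNSW parameter values are placeholders, not recommended settings.

```python
import shlex


def build_pgvectorhnsw_command(
    host: str,
    user_name: str,
    password: str,
    db_name: str,
    case_type: str = "Performance1536D50K",  # one of the listed case types
    num_concurrency: tuple = (1, 10, 20),    # matches the documented default
    m: int = 16,                             # illustrative value only
    ef_construction: int = 64,               # illustrative value only
    ef_search: int = 40,                     # illustrative value only
    dry_run: bool = True,
) -> list:
    """Assemble an argv list for `vectordbbench pgvectorhnsw`.

    Hypothetical helper; it only builds the command and never executes it.
    """
    argv = [
        "vectordbbench", "pgvectorhnsw",
        "--host", host,
        "--user-name", user_name,
        "--password", password,
        "--db-name", db_name,
        "--case-type", case_type,
        # --num-concurrency takes a comma-separated list of values
        "--num-concurrency", ",".join(str(n) for n in num_concurrency),
        "--m", str(m),
        "--ef-construction", str(ef_construction),
        "--ef-search", str(ef_search),
    ]
    if dry_run:
        # Print just the configuration and exit without running the tasks
        argv.append("--dry-run")
    return argv


if __name__ == "__main__":
    command = build_pgvectorhnsw_command(
        host="localhost",             # placeholder connection values
        user_name="postgres",
        password="example-password",
        db_name="vectordb",
    )
    print(shlex.join(command))
```

Starting with `--dry-run` is a cautious choice: per the help text it prints the configuration and exits, so the settings can be reviewed before a run that, by default, drops old data and reloads the dataset.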
