mindtro / Semafold
17Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
universal
embedding-compressionkv-cachellm-inference+7
3