Semafold
Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
Install / Use
/learn @mindtro/SemafoldREADME
Semafold
Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — no GPU required by default, but professionally accelerated on NVIDIA (CUDA) and Apple Silicon (Metal) when available.
Semafold is a vector-first compression toolkit for AI workloads that compresses embeddings, retrieval representations, and cache-shaped KV tensors with explicit byte accounting, typed encode/decode contracts, and validation evidence. It is designed for teams building AI infrastructure that need measurable storage reduction without losing visibility into distortion, artifact size, or integration boundaries.
Today it is strongest at two jobs:
- compressing embedding / vector workloads
- compressing cache-shaped K/V tensors with TurboQuant-based codecs
It gives you:
- typed encode/decode contracts
- measured byte accounting
- explicit guarantees and validation evidence
- deterministic synthetic validation and benchmarks
- pure NumPy core — no GPU required, runs anywhere
- enterprise GPU acceleration — zero-config, automatic offloading to PyTorch (CUDA/MPS) or MLX (Apple Metal) when installed
Compression Results
| Workload | Baseline | Setting | Artifact Size | Smaller | Ratio |
|---|---:|---|---:|---:|---:|
| Embedding 128 x 1536 | float32 786,432 B | TurboQuantMSE 3-bit | 74,738 B | 90.50% | 10.52x |
| Embedding 128 x 1536 | fp16/bf16 393,216 B | TurboQuantMSE 3-bit | 74,738 B | 80.99% | 5.26x |
| KV tensor (4,8,256,128) | float32 8,388,608 B | K=Prod 3b, V=MSE 3b | 885,734 B | 89.44% | 9.47x |
| KV tensor (4,8,256,128) | fp16/bf16 4,194,304 B | K=Prod 3b, V=MSE 3b | 885,734 B | 78.88% | 4.74x |
Full benchmark details: turboquant_benchmark_report.md
Distribution / import names today:
- distribution:
semafold - import:
semafold
Architecture
semafold
├─ Stable root API
│ ├─ core
│ │ ├─ CompressionBudget
│ │ ├─ CompressionEstimate
│ │ ├─ CompressionFootprint
│ │ ├─ CompressionGuarantee
│ │ └─ ValidationEvidence
│ └─ vector
│ ├─ VectorEncodeRequest
│ ├─ VectorEncoding
│ ├─ VectorDecodeRequest
│ └─ VectorCodec
├─ Codec layer
│ ├─ PassthroughVectorCodec
│ ├─ ScalarReferenceVectorCodec
│ └─ TurboQuant family
│ ├─ TurboQuantMSEVectorCodec
│ ├─ TurboQuantProdVectorCodec
│ └─ kv
│ ├─ TurboQuantKVConfig
│ └─ TurboQuantKVPreviewCodec
├─ Compute backend layer (v0.2.0)
│ ├─ ComputeBackend protocol
│ ├─ NumPyBackend — always available (default)
│ ├─ TorchBackend — CUDA / MPS (pip install semafold[torch])
│ └─ MLXBackend — Metal (pip install semafold[mlx])
└─ Validation and benchmarking
├─ contract / unit / integration tests
├─ paper-shaped vector validation
└─ synthetic KV benchmark and benchmark report
Read it as:
- the stable root gives you the generic Semafold contract surface
- the codec layer provides concrete compression implementations
- the TurboQuant family is the current high-performance path for vector and KV-tensor workloads
- the validation layer keeps storage, distortion, and behavioral checks measurable
Where It Fits
Semafold is a good fit when you want to reduce the storage footprint of numeric AI representations:
- embedding stores
- vector databases and retrieval pipelines
- long-term vector memory in AI orchestrators
- cache-shaped K/V tensor compression in custom inference stacks
Semafold is not a text summarizer. It does not shorten prompts or reduce token counts by rewriting text. Its current strength is compression of vectors and tensors.
Current Capability Surface
Stable today:
- root imports from
semafold CompressionBudgetCompressionEstimateCompressionFootprintCompressionGuaranteeValidationEvidenceEncodingBoundTypeWorkloadSuitabilityVectorEncodeRequestVectorEncodingSegmentVectorEncodingVectorDecodeRequestVectorDecodeResultVectorCodecPassthroughVectorCodecEncodeObjectiveEncodeMetricEncodingSegmentKind
Available today, but intentionally outside the stable root surface:
semafold.turboquantsemafold.turboquant.kvScalarReferenceVectorCodec
That means TurboQuant already works, but it is currently a deep-import surface rather than a root export.
Install
pip install semafold # NumPy core — no GPU required
pip install semafold[torch] # + NVIDIA CUDA / Apple MPS acceleration
pip install semafold[mlx] # + Apple Silicon Metal acceleration
pip install "semafold[torch,mlx]" # both
Quickstart
Install locally from the package directory:
python3 -m pip install -e ".[dev]"
Runnable versions of the examples below live in examples/.
Stable Root Quickstart
Run the exact file here: examples/wire_roundtrip.py
import numpy as np
from semafold import EncodeObjective
from semafold import PassthroughVectorCodec
from semafold import VectorDecodeRequest
from semafold import VectorEncodeRequest
codec = PassthroughVectorCodec()
request = VectorEncodeRequest(
data=np.linspace(-1.0, 1.0, 1024, dtype=np.float32),
objective=EncodeObjective.RECONSTRUCTION,
)
encoding = codec.encode(request)
decoded = codec.decode(VectorDecodeRequest(encoding=encoding))
assert decoded.data.shape == request.data.shape
TurboQuant Embedding Example
Run the exact file here: examples/turboquant_embedding.py
import numpy as np
from semafold import EncodeMetric
from semafold import EncodeObjective
from semafold import VectorDecodeRequest
from semafold import VectorEncodeRequest
from semafold.turboquant import TurboQuantMSEConfig
from semafold.turboquant import TurboQuantMSEVectorCodec
rows = np.random.default_rng(7).normal(size=(128, 1536)).astype(np.float32)
codec = TurboQuantMSEVectorCodec(
config=TurboQuantMSEConfig(default_bits_per_scalar=3, default_rotation_seed=7)
)
encoding = codec.encode(
VectorEncodeRequest(
data=rows,
objective=EncodeObjective.RECONSTRUCTION,
metric=EncodeMetric.MSE,
role="embedding",
seed=11,
)
)
decoded = codec.decode(VectorDecodeRequest(encoding=encoding))
print(encoding.footprint.total_bytes, encoding.footprint.compression_ratio)
assert decoded.data.shape == rows.shape
TurboQuant KV Tensor Example
Run the exact file here: examples/turboquant_kv_block.py
These examples use the current TurboQuant deep-import surface rather than stable root exports.
import numpy as np
from semafold.turboquant.kv import TurboQuantKVConfig
from semafold.turboquant.kv import TurboQuantKVPreviewCodec
keys = np.random.default_rng(7).normal(size=(4, 8, 256, 128)).astype(np.float32)
values = np.random.default_rng(11).normal(size=(4, 8, 256, 128)).astype(np.float32)
codec = TurboQuantKVPreviewCodec(
config=TurboQuantKVConfig(
key_total_bits_per_scalar=3,
value_bits_per_scalar=3,
default_key_rotation_seed=7,
default_key_qjl_seed=11,
default_value_rotation_seed=7,
)
)
artifact = codec.compress(keys, values)
keys_hat, values_hat = codec.decompress(artifact)
stats = codec.memory_stats(artifact)
print(stats["combined_bytes"], stats["combined_compression_ratio"])
assert keys_hat.shape == keys.shape
assert values_hat.shape == values.shape
Runnable versions of these examples live here:
- examples/README.md
- examples/wire_roundtrip.py
- examples/turboquant_embedding.py
- examples/turboquant_kv_block.py
Benchmark Details
Benchmark runners and detailed report:
Benchmarks
Run the synthetic benchmark runners from the package directory:
PYTHONPATH=src python benchmarks/turboquant_paper_validation.py --output /tmp/turboquant-paper.json
PYTHONPATH=src python benchmarks/turboquant_synthetic_kv_benchmark.py --output /tmp/turboquant-kv.json
Benchmark documentation lives here:
Validation and Quality Gates
Current local closeout commands:
PYTHONPATH=src pytest tests -q
PYTHONPATH=src pyright --project pyproject.toml src tests examples benchmarks
python3 -m build
Repository Notes
- stability policy: STABILITY.md
- change log: CHANGELOG.md
License
Semafold is currently packaged here under:
For the current package directory, the intended license is Apache-2.0.
Current Maturity
Semafold already supports:
- vector / embedding compression
- cache-shaped K/V tensor compression
- measured compression accounting
- synthetic attention-proxy validation for compressed K/V tensors
The next layer after this is runtime/backend integration, not core compression math.
References
- TurboQuant paper: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Related Skills
feishu-drive
349.0k|
things-mac
349.0kManage Things 3 via the `things` CLI on macOS (add/update projects+todos via URL scheme; read/search/list from the local Things database)
clawhub
349.0kUse the ClawHub CLI to search, install, update, and publish agent skills from clawhub.com
postkit
PostgreSQL-native identity, configuration, metering, and job queues. SQL functions that work with any language or driver
