Chunk
The Fastest Chunker in the West. Up to 1 TB/s "semantic" chunking, quick and easy!
you know how every chunking library claims to be fast? yeah, we actually meant it.
chunk splits text at semantic boundaries (periods, newlines, the usual suspects) and does it stupid fast. we're talking "chunk the entire english wikipedia in 120ms" fast.
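to make "split at semantic boundaries" concrete, here is a minimal std-only sketch of the general idea: take a window of at most `size` bytes, look backward for the last delimiter, and cut right after it. `naive_chunk` is a hypothetical helper written for this illustration, not the library's implementation, and keeping the delimiter with the chunk it ends is an assumption of the sketch.

```rust
/// Split `text` into chunks of at most `size` bytes, ending each chunk just
/// after the last delimiter found inside the window. Falls back to a hard cut
/// at `size` when the window holds no delimiter.
/// NOTE: hypothetical illustration, not the `chunk` crate's actual code.
fn naive_chunk<'a>(text: &'a [u8], size: usize, delimiters: &[u8]) -> Vec<&'a [u8]> {
    assert!(size > 0, "chunk size must be non-zero");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let end = (start + size).min(text.len());
        let cut = if end == text.len() {
            end // the remaining text fits in one chunk
        } else {
            // Search backward from the window end for the last delimiter.
            match text[start..end].iter().rposition(|b| delimiters.contains(b)) {
                Some(i) => start + i + 1, // keep the delimiter with the chunk it ends
                None => end,              // no boundary in the window: hard cut
            }
        };
        chunks.push(&text[start..cut]);
        start = cut;
    }
    chunks
}
```

every chunk is a borrowed slice of the input, so nothing is copied, and concatenating the chunks gives back the original text.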
want to know how? read the blog post where we nerd out about SIMD instructions and lookup tables.
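the blog post is not reproduced here, but the lookup-table half of that sentence is easy to sketch in plain Rust: precompute a 256-entry table so that "is this byte a delimiter?" becomes a single indexed load, however many delimiters there are. `build_table` and `rfind_delimiter` are hypothetical names for this illustration. The sketch is scalar, contains no SIMD, and is not taken from the library's source.

```rust
/// Build a 256-entry table where `table[b]` is true if byte `b` is a delimiter.
/// NOTE: hypothetical illustration of the lookup-table idea only.
fn build_table(delimiters: &[u8]) -> [bool; 256] {
    let mut table = [false; 256];
    for &d in delimiters {
        table[d as usize] = true;
    }
    table
}

/// Find the index of the last delimiter in `window`, using one table lookup
/// per byte instead of comparing against every delimiter in turn.
fn rfind_delimiter(window: &[u8], table: &[bool; 256]) -> Option<usize> {
    window.iter().rposition(|&b| table[b as usize])
}
```

the table costs 256 bytes and is built once per delimiter set, so the per-byte work stays constant as the delimiter list grows.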
<p align="center"> <img src="assets/benchmark.png" alt="Benchmark comparison" width="700"> </p> <p align="center"> <em>See <a href="benches/">benches/</a> for detailed benchmarks.</em> </p>

> [!NOTE]
> chunk was previously known as memchunk. It still contains the `chunk` method but now also includes other basic operations that power chonkie core.
Installation
cargo add chunk
looking for python or javascript?
Usage
use chunk::chunk;
let text = b"Hello world. How are you? I'm fine.\nThanks for asking.";
// With defaults (4KB chunks, split at \n . ?)
let chunks: Vec<&[u8]> = chunk(text).collect();
// With custom size
let chunks: Vec<&[u8]> = chunk(text).size(1024).collect();
// With custom delimiters
let chunks: Vec<&[u8]> = chunk(text).delimiters(b"\n.?!").collect();
// With multi-byte pattern (e.g., metaspace ▁ for SentencePiece tokenizers)
let metaspace = "▁".as_bytes();
let chunks: Vec<&[u8]> = chunk(text).pattern(metaspace).prefix().collect();
// With consecutive pattern handling (split at START of runs, not middle)
let chunks: Vec<&[u8]> = chunk(b"word   next")
    .pattern(b" ")
    .consecutive()
    .collect();
// With forward fallback (search forward if no pattern in backward window)
let chunks: Vec<&[u8]> = chunk(text)
    .pattern(b" ")
    .forward_fallback()
    .collect();
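the last two options are easier to picture with a toy model. The sketch below picks a single split offset for a one-byte pattern and mimics what the comments above describe: with `consecutive`, a delimiter found inside a run moves the cut back to the start of that run; with `forward_fallback`, a window with no delimiter searches ahead instead of cutting hard. `split_offset` and its parameters are hypothetical, and the exact placement of the cut (before the run, after a forward match) is an assumption of this sketch, not confirmed library behavior.

```rust
/// Choose where to cut the first chunk of `text`, given a target `size`.
/// Returns an offset `n` such that the chunk is `&text[..n]`.
/// NOTE: hypothetical model of the options, not the `chunk` crate's code.
fn split_offset(
    text: &[u8],
    size: usize,
    pattern: u8,
    consecutive: bool,
    forward_fallback: bool,
) -> usize {
    let end = size.min(text.len());
    if end == text.len() {
        return end; // everything fits in one chunk
    }
    // Backward search: last occurrence of the pattern inside the window.
    if let Some(i) = text[..end].iter().rposition(|&b| b == pattern) {
        if !consecutive {
            return i + 1; // cut right after the delimiter
        }
        // Walk back to the first byte of the run so it is not split mid-run.
        let mut run_start = i;
        while run_start > 0 && text[run_start - 1] == pattern {
            run_start -= 1;
        }
        if run_start > 0 {
            return run_start; // cut before the run, keeping the run whole
        }
    }
    if forward_fallback {
        // Nothing usable in the window: take the first delimiter ahead of it.
        if let Some(j) = text[end..].iter().position(|&b| b == pattern) {
            return end + j + 1;
        }
    }
    end // hard cut at the size limit
}
```

in this model, `split_offset(b"word   next", 6, b' ', true, false)` cuts at offset 4, before the run of spaces, where the non-consecutive version cuts at 6, inside it.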
Citation
If you use chunk in your research, please cite it as follows:
@software{chunk2025,
author = {Minhas, Bhavnick},
title = {chunk: The fastest text chunking library},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/chonkie-inc/chunk}},
}
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
