SkillAgentSearch skills...

Chunk

๐Ÿš€ The Fastest Chunker in the West ๐Ÿ‡บ๐Ÿ‡ธ Upto 1TB/s "semantic" chunking, quick and easy!

Install / Use

/learn @chonkie-inc/Chunk
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

<p align="center"> <img src="assets/memchunk_wide.png" alt="chunk" width="500"> </p> <h1 align="center">chunk</h1> <p align="center"> <em>the fastest text chunking library โ€” up to 1 TB/s throughput</em> </p> <p align="center"> <a href="https://crates.io/crates/chunk"><img src="https://img.shields.io/crates/v/chunk.svg?color=e74c3c" alt="crates.io"></a> <a href="https://pypi.org/project/chonkie-core"><img src="https://img.shields.io/pypi/v/chonkie-core.svg?color=e67e22" alt="PyPI"></a> <a href="https://www.npmjs.com/package/@chonkiejs/chunk"><img src="https://img.shields.io/npm/v/@chonkiejs/chunk.svg?color=2ecc71" alt="npm"></a> <a href="https://docs.rs/chunk"><img src="https://img.shields.io/docsrs/chunk?color=3498db" alt="docs.rs"></a> <a href="LICENSE-MIT"><img src="https://img.shields.io/badge/license-MIT%2FApache--2.0-9b59b6.svg" alt="License"></a> </p>

you know how every chunking library claims to be fast? yeah, we actually meant it.

chunk splits text at semantic boundaries (periods, newlines, the usual suspects) and does it stupid fast. we're talking "chunk the entire english wikipedia in 120ms" fast.

want to know how? read the blog post where we nerd out about SIMD instructions and lookup tables.

[!NOTE] chunk was previously known as memchunk. It still contains the chunk method but now also includes other basic operations that power chonkie core.

<p align="center"> <img src="assets/benchmark.png" alt="Benchmark comparison" width="700"> </p> <p align="center"> <em>See <a href="benches/">benches/</a> for detailed benchmarks.</em> </p>

๐Ÿ“ฆ Installation

cargo add chunk

looking for python or javascript?

๐Ÿš€ Usage

use chunk::chunk;

let text = b"Hello world. How are you? I'm fine.\nThanks for asking.";

// With defaults (4KB chunks, split at \n . ?)
let chunks: Vec<&[u8]> = chunk(text).collect();

// With custom size
let chunks: Vec<&[u8]> = chunk(text).size(1024).collect();

// With custom delimiters
let chunks: Vec<&[u8]> = chunk(text).delimiters(b"\n.?!").collect();

// With multi-byte pattern (e.g., metaspace โ– for SentencePiece tokenizers)
let metaspace = "โ–".as_bytes();
let chunks: Vec<&[u8]> = chunk(text).pattern(metaspace).prefix().collect();

// With consecutive pattern handling (split at START of runs, not middle)
let chunks: Vec<&[u8]> = chunk(b"word   next")
    .pattern(b" ")
    .consecutive()
    .collect();

// With forward fallback (search forward if no pattern in backward window)
let chunks: Vec<&[u8]> = chunk(text)
    .pattern(b" ")
    .forward_fallback()
    .collect();

๐Ÿ“ Citation

If you use chunk in your research, please cite it as follows:

@software{chunk2025,
  author = {Minhas, Bhavnick},
  title = {chunk: The fastest text chunking library},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/chonkie-inc/chunk}},
}

๐Ÿ“„ License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

Related Skills

View on GitHub
GitHub Stars328
CategoryDevelopment
Updated2d ago
Forks21

Languages

Rust

Security Score

95/100

Audited on Apr 6, 2026

No findings