Chunkipy
chunkipy segments long texts into smaller chunks based on either a character count or a token count. Customizable chunk sizes and splitting strategies give you control over how text is split for a wide range of text processing tasks.
chunkipy is a modular and extensible text chunking library for Python, built for NLP and LLM pipelines.
Why Chunkipy?
- ✅ Lightweight core with optional extras
- ✅ Configurable overlap support via overlap_ratio
- ✅ Composable architecture (chunkers + splitters + size estimators + language detectors)
- ✅ Practical defaults with customizable behavior
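The "chunkers + splitters + size estimators" composition can be pictured as plain functions handed to a chunker. The sketch below is a hypothetical, self-contained illustration of that idea, not chunkipy's API: `pack`, `sentence_split`, `char_count` and `word_count` are invented names, and whitespace word counting stands in for a real tokenizer such as tiktoken. Swapping only the size estimator is enough to switch between character-based and token-based chunking.

```python
import re
from typing import Callable, List

# Hypothetical component types for illustration; chunkipy's real interfaces may differ.
SizeEstimator = Callable[[str], int]
Splitter = Callable[[str], List[str]]


def char_count(text: str) -> int:
    """Estimate size as a character count."""
    return len(text)


def word_count(text: str) -> int:
    """Estimate size as a whitespace word count (a stand-in for a real tokenizer)."""
    return len(text.split())


def sentence_split(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack(text: str, chunk_size: int, split: Splitter, size: SizeEstimator) -> List[str]:
    """Greedily join segments while the estimated size stays within chunk_size.

    A single segment that is already larger than chunk_size becomes its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for segment in split(text):
        candidate = f"{current} {segment}" if current else segment
        if current and size(candidate) > chunk_size:
            chunks.append(current)  # close the current chunk
            current = segment  # start a new chunk with this segment
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "One two three. Four five. Six seven eight nine."
# Same splitter, different size estimators: the chunk boundaries change.
print(pack(text, 5, sentence_split, word_count))
print(pack(text, 20, sentence_split, char_count))
```

With `word_count` and a budget of 5, the first two sentences fit together (exactly 5 words). With `char_count` and a budget of 20, every sentence ends up alone, and the 21-character last sentence exceeds the budget because this naive splitter never breaks inside a sentence.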
Quick Example
from chunkipy import FixedSizeTextChunker
text = "Chunkipy makes text processing modular, flexible, and powerful!"
chunker = FixedSizeTextChunker(chunk_size=20, overlap_ratio=0.2)
chunks = chunker.chunk(text)
for i, c in enumerate(chunks):
    print(f"Chunk {i + 1}: {c}")
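One way to read `overlap_ratio` is as the fraction of `chunk_size` that consecutive chunks share. The sketch below applies that reading at the character level: with `chunk_size=20` and `overlap_ratio=0.2`, each window advances by 16 characters and repeats the previous window's last 4. This is an assumption for illustration only: `sliding_chunks` is a hypothetical helper, and chunkipy's `FixedSizeTextChunker` may measure size in tokens or words and respect word boundaries, so its actual chunks can differ.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, overlap_ratio: float) -> List[str]:
    """Hypothetical character-level sliding windows; not chunkipy's implementation."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)  # characters shared by neighbouring chunks
    step = chunk_size - overlap  # positive because overlap_ratio < 1
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):  # the last window reached the end of the text
            break
        start += step
    return chunks


text = "Chunkipy makes text processing modular, flexible, and powerful!"
for i, c in enumerate(sliding_chunks(text, chunk_size=20, overlap_ratio=0.2)):
    print(f"Chunk {i + 1}: {c!r}")
```

For this 63-character text the loop prints four chunks, each starting with the last 4 characters of the previous one; dropping those 4 repeated characters from every chunk after the first reconstructs the original text.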
Implemented vs Roadmap
| Status | Strategy |
| --- | --- |
| ✅ Implemented | FixedSizeTextChunker |
| ✅ Implemented | RecursiveTextChunker |
| 🚧 Roadmap | Document-based chunking |
| 🚧 Roadmap | Semantic chunker |
| 🚧 Roadmap | LLM-based chunker |
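Recursive chunking, in its common generic form, tries coarse separators first (paragraphs, then sentences, then words) and descends to a finer separator only for pieces that are still too long. The sketch below illustrates that generic technique with character counts; it is not chunkipy's `RecursiveTextChunker`, and the `recursive_split` name and the separator list are assumptions. Separators are kept attached to the preceding piece, so joining the chunks gives back the original text.

```python
from typing import List, Sequence

# Assumed separator order, coarsest first.
DEFAULT_SEPARATORS = ("\n\n", ". ", " ")


def _split_keep(text: str, sep: str) -> List[str]:
    """Split on sep, keeping sep at the end of each piece and dropping empty pieces."""
    pieces = text.split(sep)
    return [p for p in [p + sep for p in pieces[:-1]] + [pieces[-1]] if p]


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into pieces of at most chunk_size characters (generic sketch)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to hard character slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in _split_keep(text, sep):
        if len(current) + len(part) <= chunk_size:
            current += part  # the piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # Piece is too long on its own: descend to the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph here.\n\nSecond one is a bit longer. It has two sentences."
for chunk in recursive_split(text, 30):
    print(repr(chunk))
```

Here the paragraph break is tried first; the second paragraph is still longer than 30 characters, so it is split again on sentence boundaries.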
Semantic sentence splitters and language detectors are already available and can be used today.
Installation
Install core package:
pip install chunkipy
Install optional feature groups:
pip install "chunkipy[language-detection]" # Language detection (langdetect + fasttext)
pip install "chunkipy[nlp]" # NLP backends (spacy + stanza)
pip install "chunkipy[ai]" # LLM integration (openai + tiktoken)
pip install "chunkipy[all]" # All optional dependencies
Or install individual packages:
pip install "chunkipy[spacy]"
pip install "chunkipy[stanza]"
pip install "chunkipy[langdetect]"
pip install "chunkipy[fasttext]"
pip install "chunkipy[openai]"
pip install "chunkipy[tiktoken]"
Documentation
Full guides and API reference: 👉 https://gioelecrispo.github.io/chunkipy
Examples: 👉 https://github.com/gioelecrispo/chunkipy/tree/main/examples
Contributing
Issues and pull requests are welcome: 👉 https://github.com/gioelecrispo/chunkipy/issues
For local setup, see CONTRIBUTING.md.
License
chunkipy is released under the MIT License.
