Chunkpad
Chunkpad is designed to prepare documents for Retrieval-Augmented Generation (RAG) pipelines and AI applications.
Install / Use
/learn @gidea/Chunkpad
A desktop application for parsing documents (PDF, DOCX, PPTX), chunking them into RAG-optimized segments, and exporting them in multiple formats. Built with Electron for cross-platform support.
Features
Document Processing
- Multi-format Support: Parse PDF, DOCX, and PPTX files
- Intelligent Parsing: Preserves document structure, headings, and metadata
- Page/Slide Tracking: Maintains page numbers (PDF) and slide numbers (PPTX) in chunk metadata
Chunking Strategies
- Fixed Size: Token-based chunking with configurable size and overlap (default)
- Heading-Aware: Groups content under headings, preserving document hierarchy
- Paragraph-Aware: Recursive splitting (paragraph → sentence → token) maintaining paragraph coherence
- Sliding Window: Overlapping chunks with configurable windows for maximum context preservation
- Per-File Configuration: Each document can use its own chunking strategy and settings
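The Fixed Size and Sliding Window strategies both come down to walking a token list with a window and a step. The sketch below is one possible way to do that, not Chunkpad's actual implementation. It assumes a whitespace `tokenize` helper as a stand-in for a real tokenizer such as tiktoken, and the names `Chunk` and `chunkFixedSize` are hypothetical.

```typescript
// Hypothetical stand-in for a real tokenizer: splits on whitespace only.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Hypothetical chunk shape used only for this illustration.
interface Chunk {
  index: number;
  text: string;
  tokenCount: number;
}

// Splits text into windows of `size` tokens; consecutive windows share `overlap` tokens.
function chunkFixedSize(text: string, size: number, overlap: number): Chunk[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be an integer in [0, size)");
  }

  const tokens = tokenize(text);
  const step = size - overlap; // how far the window advances each time
  const chunks: Chunk[] = [];

  for (let start = 0; start < tokens.length; start += step) {
    const slice = tokens.slice(start, start + size);
    chunks.push({
      index: chunks.length,
      text: slice.join(" "),
      tokenCount: slice.length,
    });
    // Stop once a window has reached the end, so no redundant tail chunk is emitted.
    if (start + size >= tokens.length) break;
  }
  return chunks;
}

const demoChunks = chunkFixedSize("a b c d e f g h i j", 4, 1);
console.log(demoChunks.map((c) => c.text));
```

With an overlap of 0 this behaves as plain fixed-size chunking. A larger overlap repeats more tokens between neighbouring chunks, which keeps more context across boundaries at the cost of a larger export.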
Editing & Management
- Rich Text Editor: TipTap-powered markdown editor with formatting tools
- Metadata Management: Document-level and chunk-level metadata with YAML front matter
- Drag-and-Drop: Reorder chunks by dragging them into place
- Real-time Token Counting: Live token counts for each chunk
Export Options
- Multiple Formats: JSON, Markdown, or Plain Text
- Export Modes: Single combined file or multiple individual files
- Clean Markdown: HTML-to-Markdown conversion with proper formatting
- RAG-Ready: Includes strategy metadata, section paths, and source information
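To make the export options concrete, here is a rough sketch of what an exported chunk record might carry and how it could be written as JSON or as Markdown with YAML front matter. The README does not document the export schema, so every field name here (`sectionPath`, `source`, and so on) and both function names are assumptions for illustration only.

```typescript
// Hypothetical export record; Chunkpad's real schema may differ.
interface ExportedChunk {
  id: string;
  text: string;
  strategy: "fixed-size" | "heading-aware" | "paragraph-aware" | "sliding-window";
  sectionPath: string[]; // heading trail, outermost first
  source: { fileName: string; page?: number; slide?: number };
}

// Combined JSON export: one array holding every chunk.
function toJson(chunks: ExportedChunk[]): string {
  return JSON.stringify(chunks, null, 2);
}

// Markdown export for a single chunk, with metadata as YAML front matter.
// String values are JSON-encoded, which is also valid YAML flow syntax.
function toMarkdown(chunk: ExportedChunk): string {
  const frontMatter = [
    "---",
    `id: ${JSON.stringify(chunk.id)}`,
    `strategy: ${chunk.strategy}`,
    `section_path: ${JSON.stringify(chunk.sectionPath)}`,
    `source_file: ${JSON.stringify(chunk.source.fileName)}`,
    ...(chunk.source.page !== undefined ? [`page: ${chunk.source.page}`] : []),
    ...(chunk.source.slide !== undefined ? [`slide: ${chunk.source.slide}`] : []),
    "---",
  ];
  return `${frontMatter.join("\n")}\n\n${chunk.text}\n`;
}

const sample: ExportedChunk = {
  id: "chunk-001",
  text: "Run npm install to fetch dependencies.",
  strategy: "heading-aware",
  sectionPath: ["Guide", "Install"],
  source: { fileName: "manual.pdf", page: 3 },
};

console.log(toJson([sample]));
console.log(toMarkdown(sample));
```

Keeping the section path and source location next to the text lets a retrieval pipeline cite where a chunk came from without reopening the original file.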
Desktop Features
- Native File Dialogs: System file pickers for opening and saving
- File Associations: Double-click .docx, .pptx, .pdf, or .chunkpad files to open
- Project Persistence: Save and load entire workspace state
- Keyboard Shortcuts: Full keyboard support (see Help menu)
Quick Start
Prerequisites
- Node.js v18 or higher
- npm or yarn
Local Development
- Clone and install
  git clone <YOUR_GIT_URL>
  cd chunkpad
  npm install
- Run Electron app
  npm run electron:dev
- Build for production
  npm run electron:dist
  Distributables will be in the release/ directory.
Web Version (Optional)
npm run dev
Navigate to http://localhost:8080 (or the port shown in the terminal).
Usage
- Load Documents: Open PDF, DOCX, or PPTX files via File menu or drag-and-drop
- Configure Chunking: Use the toolbar below the document title to select a chunking strategy and adjust settings
- Edit Chunks: Click on chunks to edit content, titles, and metadata
- Export: Use the Export button to save chunks in your preferred format
Chunking Strategies Guide
- Fixed Size: Best for general purpose, unstructured documents
- Heading-Aware: Best for structured documents with clear headings (manuals, reports, technical docs)
- Paragraph-Aware: Best for narrative content (blog posts, articles, stories)
- Sliding Window: Best for technical documents where context spanning boundaries is critical
See docs/user-chunking-modes.md for detailed strategy documentation.
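As a rough illustration of why Heading-Aware chunking suits structured documents, the sketch below groups body text under its nearest heading and records the full heading trail for each group. It assumes the parsed document is available as Markdown-style text with `#` headings, which is a simplification; `Section` and `groupByHeadings` are hypothetical names, and the sketch does not skip `#` lines inside code fences.

```typescript
// Hypothetical section shape: heading trail plus the text found under it.
interface Section {
  path: string[];
  body: string;
}

function groupByHeadings(markdown: string): Section[] {
  const sections: Section[] = [];
  const stack: string[] = []; // open headings, outermost first
  let currentPath: string[] = [];
  let bodyLines: string[] = [];

  // Emit the text collected so far under the current heading trail.
  const flush = (): void => {
    const body = bodyLines.join("\n").trim();
    if (body.length > 0) sections.push({ path: currentPath, body });
    bodyLines = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (match) {
      flush();
      const level = (match[1] ?? "").length;
      const title = match[2] ?? "";
      // Close headings at the same or a deeper level, then open the new one.
      stack.length = Math.min(stack.length, level - 1);
      stack.push(title);
      currentPath = [...stack];
    } else {
      bodyLines.push(line);
    }
  }
  flush();
  return sections;
}

const doc = [
  "# Guide",
  "intro",
  "## Install",
  "run npm",
  "## Usage",
  "open file",
  "# FAQ",
  "q1",
].join("\n");

console.log(groupByHeadings(doc));
```

Each resulting section keeps its place in the hierarchy, so a chunk from "Install" can still be traced to "Guide". On a document with no headings this grouping yields a single section with an empty path, which is where Fixed Size or Paragraph-Aware chunking is the better fit.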
Technology Stack
- Frontend: React 18 + TypeScript
- Desktop: Electron
- Build: Vite
- UI: shadcn/ui (Radix UI) + Tailwind CSS
- Editor: TipTap (ProseMirror)
- Parsing: pdfjs-dist, mammoth, jszip
- Token Counting: tiktoken
- Markdown: Turndown
Documentation
- Architecture - System architecture and design
- Chunking Architecture - Multi-strategy chunking system
- User Guide - Chunking strategies guide
- Technical Review - Implementation review
Troubleshooting
PDF Parsing Issues
- Ensure PDF contains selectable text (not scanned images)
- Check DevTools console for errors (View > Toggle Developer Tools)
Large Files
- Files over 50MB will show a warning
- Processing may take 30-60 seconds for large documents
- Progress indicators show during processing
Build Issues
# Clear and reinstall
rm -rf node_modules package-lock.json
npm install
# Clear caches
rm -rf .vite dist dist-electron
npm run electron:dev
License
Open source and available for use and modification.
