EBookReaderFullStack
A local-first EPUB reader with high-fidelity neural text-to-speech, word-level synchronization, and Next.js/FastAPI/ONNX stack.
eBookBot: Full-Stack Neural EPUB Reader
A premium, local-first EPUB reader with high-fidelity "Direct Neural" text-to-speech. Built with Next.js, FastAPI, and ONNX.
🚀 Overview
eBookBot converts your EPUB books into immersive audio experiences. It uses a flow-matching-based TTS engine (ReaderAudioEngine) to generate natural speech with precise word-level synchronization.
TTS model in use: Supertone/supertonic-2.

Highlights
- Local-first pipeline with fast, responsive playback.
- Word-sync highlighting aligned with neural audio.
- Fine-grained reading controls for layout and tempo.
- Modular architecture: Next.js UI + FastAPI API + ONNX TTS engine.
Stack
| Layer | Technology |
| --- | --- |
| Frontend | Next.js (App Router) |
| Backend | FastAPI |
| TTS Engine | ONNX Runtime + ReaderAudioEngine |
| TTS Model | Supertone/supertonic-2 |
📖 How to Use
Requirements
- Python 3.10+
- Node.js 18+
- ONNX Runtime (CUDA recommended for GPU acceleration, works on CPU too)
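The CUDA-or-CPU choice above can be sketched as a small helper. This is an illustration only: `pick_providers` is a hypothetical name, not a function from this repository, and how ReaderAudioEngine actually selects its execution provider is not documented here.

```python
from typing import List, Sequence


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer CUDA when it is available and keep CPU as the fallback.

    Hypothetical helper; `available` is the list of provider names that
    ONNX Runtime reports on the current machine.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    # If neither name is reported, fall back to the CPU provider name.
    return chosen or ["CPUExecutionProvider"]
```

In practice, `available` would come from `onnxruntime.get_available_providers()`, and the returned list could be passed as the `providers` argument of `onnxruntime.InferenceSession`.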
⚡ Quick Start (Recommended)
The easiest way to run the backend and frontend together is the run.py script:
python run.py
This will:
- Start the FastAPI backend.
- Start the Next.js frontend.
- Handle clean shutdown of both services.
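For readers curious how a launcher like this can work, here is a minimal sketch of the start-both-then-shut-down pattern. It is not the contents of run.py: the function names and the `ServiceSpec` type are assumptions made for illustration, and the commented service mapping simply mirrors the manual setup commands in this README.

```python
import subprocess
import time
from typing import Dict, List, Optional, Tuple

# name -> (argv, working directory); cwd=None inherits the current directory.
ServiceSpec = Tuple[List[str], Optional[str]]

# Hypothetical mapping based on the manual setup steps:
# SERVICES = {
#     "backend": (["python", "-m", "uvicorn", "app.main:app"], "ReaderAudioAPI"),
#     "frontend": (["npm", "run", "dev"], "reader-frontend"),
# }


def start_services(specs: Dict[str, ServiceSpec]) -> Dict[str, subprocess.Popen]:
    """Start one child process per named service."""
    return {name: subprocess.Popen(argv, cwd=cwd) for name, (argv, cwd) in specs.items()}


def stop_services(procs: Dict[str, subprocess.Popen], grace_seconds: float = 5.0) -> None:
    """Ask every running child to terminate, then kill any that do not exit in time."""
    for proc in procs.values():
        if proc.poll() is None:
            proc.terminate()
    for proc in procs.values():
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def run_until_interrupted(specs: Dict[str, ServiceSpec]) -> None:
    """Run all services until one exits or the user presses Ctrl+C."""
    procs = start_services(specs)
    try:
        while all(proc.poll() is None for proc in procs.values()):
            time.sleep(0.5)
    except KeyboardInterrupt:
        pass
    finally:
        stop_services(procs)
```

Stopping every child in the `finally` block means that when one service exits or crashes, the other is shut down too instead of being left running in the background.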
🔧 Manual Setup
If you prefer to run services separately:
1. Backend Setup (ReaderAudioAPI)
cd ReaderAudioAPI
pip install -r requirements.txt
python -m uvicorn app.main:app --reload
2. Frontend Setup (reader-frontend)
cd reader-frontend
npm install
npm run dev
Adding Books
- Open http://localhost:3000.
- Click the + (Plus) icon in the sidebar.
- Upload an EPUB file and wait for processing.
  - Tip: You can purchase high-quality EPUBs from official bookstores or find catalogs on community sites like Free Media Collection.
- Select the book and click Play.
Storage Notes
An average book requires about 400 MB of local storage (audio plus cache). Reducing this footprint is planned; see TODO below.
Data Directory
By default, runtime data is stored in ReaderAudioAPI/oas_assets/ (uploads, audio, metadata).
You can override this location by setting EBOOKBOT_DATA_DIR before starting the backend.
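The override described above follows the usual "environment variable, else default" pattern. The sketch below is an assumption about how such a lookup could be written. `resolve_data_dir` and `DEFAULT_DATA_DIR` are hypothetical names, and the backend's real resolution logic may differ, for example in how it treats relative paths.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in this README, relative to the repository root.
DEFAULT_DATA_DIR = Path("ReaderAudioAPI") / "oas_assets"


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return EBOOKBOT_DATA_DIR if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    override = env.get("EBOOKBOT_DATA_DIR", "").strip()
    return Path(override).expanduser() if override else DEFAULT_DATA_DIR
```

Passing the environment as a parameter keeps the function easy to test without changing the real process environment.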
Performance & Resource Management
- TTS worker pool: defaults to 3 GPU workers.
  - EBOOKBOT_TTS_WORKERS (default: 3)
  - EBOOKBOT_TTS_MAX_INFLIGHT (default: workers * 2)
  - EBOOKBOT_TTS_TASK_TIMEOUT_SECONDS (default: 600)
- Idle GPU cleanup: when you pause generation and no work remains, the TTS worker processes shut down to free VRAM. Workers will auto-resume on the next queued task.
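To make the defaults concrete, the following sketch shows how the three worker-pool variables relate, including the derived `workers * 2` default for the in-flight limit. The `TTSPoolSettings` class and `load_tts_settings` function are hypothetical and only illustrate the documented defaults, not the backend's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TTSPoolSettings:
    workers: int
    max_inflight: int
    task_timeout_seconds: int


def load_tts_settings(env: Optional[Mapping[str, str]] = None) -> TTSPoolSettings:
    """Read the pool settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    workers = int(env.get("EBOOKBOT_TTS_WORKERS", "3"))
    # The in-flight limit defaults to twice the worker count unless set explicitly.
    max_inflight = int(env.get("EBOOKBOT_TTS_MAX_INFLIGHT", str(workers * 2)))
    timeout = int(env.get("EBOOKBOT_TTS_TASK_TIMEOUT_SECONDS", "600"))
    return TTSPoolSettings(workers, max_inflight, timeout)
```

For example, setting only `EBOOKBOT_TTS_WORKERS=1` on a machine with limited VRAM would, under this reading of the defaults, also lower the in-flight limit to 2.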
⚙️ Features
- Dynamic Controls
  - Precise sliders for Reading Size, Line Height, Word Spacing, and Chunk Gaps
  - Tempo control (0.5x to 3.0x)
- Instant Playback: Iterative chunking lets you start instantly while the rest builds in the background.
- Word-Sync: Visual highlighting tracks the neural audio in real time.
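The word-sync and tempo features can be illustrated with a short sketch. It is written in Python for consistency with the backend examples, although the real highlighting runs in the Next.js frontend. Both function names are hypothetical, and the sketch assumes the playback position is measured on the audio's own timeline, so changing tempo does not require rescaling the timestamps.

```python
from bisect import bisect_right
from typing import Sequence


def active_word_index(word_starts: Sequence[float], position: float) -> int:
    """Return the index of the word being spoken at `position` seconds.

    `word_starts` must be sorted ascending. Returns -1 if playback has not
    yet reached the first word.
    """
    return bisect_right(word_starts, position) - 1


def clamp_tempo(value: float, low: float = 0.5, high: float = 3.0) -> float:
    """Keep a requested tempo inside the 0.5x to 3.0x range listed above."""
    return max(low, min(high, value))
```

A binary search keeps each lookup at O(log n), which matters when the highlight is refreshed many times per second over a long chapter.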
🛠 ReaderAudioEngine (Submodule)
The core engine is included as a submodule. It is responsible for:
- Auto-downloading models from HuggingFace.
- Low-latency ONNX inference.
- Estimating precise word timestamps for highlighting.
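As a rough illustration of what timestamp estimation involves, the sketch below spreads a chunk's audio duration across its words in proportion to their character counts. This is a deliberately naive heuristic chosen for clarity, not the method ReaderAudioEngine uses, and `estimate_word_times` is a hypothetical name.

```python
from typing import List, Tuple


def estimate_word_times(text: str, duration: float) -> List[Tuple[str, float, float]]:
    """Assign each word a (start, end) time proportional to its length.

    Assumes speech time scales with character count and ignores pauses,
    punctuation, and phoneme length, so results are only approximate.
    """
    words = text.split()
    if not words or duration <= 0:
        return []
    total_chars = sum(len(word) for word in words)
    timings = []
    elapsed_chars = 0
    for word in words:
        start = duration * elapsed_chars / total_chars
        elapsed_chars += len(word)
        end = duration * elapsed_chars / total_chars
        timings.append((word, start, end))
    return timings
```

An engine that can read durations from the TTS model itself would produce tighter alignment than this length-based approximation.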
To contribute or find more details about the engine, visit the ReaderAudioEngine/ directory.
TODO
- [ ] Optimize per-book storage size (target below ~100 MB).
  - Currently about 200 MB per book.
- [x] Add audio compression or streaming for long books.
- [ ] Provide a cleanup tool for cached audio.
License
MIT (see LICENSE).
Note: the TTS model and the ReaderAudioEngine submodule may be governed by their own separate licenses/terms.
Contact
| Type | Details |
| --- | --- |
| Author | Izzet Sezer |
| Email | sezer@imsezer.com |
