eBookBot: Full-Stack Neural EPUB Reader

A premium, local-first EPUB reader with high-fidelity "Direct Neural" text-to-speech. Built with Next.js, FastAPI, and ONNX.

🚀 Overview

eBookBot converts your EPUB books into immersive audio experiences. It uses a flow-matching-based TTS engine (ReaderAudioEngine) to generate natural speech with precise word-level synchronization.

TTS model in use: Supertone/supertonic-2.

Screenshot: Reader

Highlights

Local-first pipeline with fast, responsive playback.
Word-sync highlighting aligned with neural audio.
Fine-grained reading controls for layout and tempo.
Modular architecture: Next.js UI + FastAPI API + ONNX TTS engine.

Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js (App Router) |
| Backend | FastAPI |
| TTS Engine | ONNX Runtime + ReaderAudioEngine |
| TTS Model | Supertone/supertonic-2 |

📖 How to Use

Requirements

Python 3.10+
Node.js 18+
ONNX Runtime (CUDA recommended for GPU acceleration, works on CPU too)

⚡ Quick Start (Recommended)

The easiest way to run the backend and frontend together is the run.py script:

python run.py

This will:

Start the FastAPI backend.
Start the Next.js frontend.
Handle clean shutdown of both services.
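
For readers curious how a launcher of this kind can work, here is a minimal sketch. It is not the actual run.py: the SERVICES list, start_services, stop_services, and main are hypothetical names, and the commands are simply copied from the Manual Setup steps below.

```python
import subprocess
import sys
import time

# Hypothetical service table mirroring the manual setup commands.
# The real run.py may use different commands, ports, or options.
# On Windows, "npm" may need to be "npm.cmd".
SERVICES = [
    {"name": "backend", "cwd": "ReaderAudioAPI",
     "cmd": [sys.executable, "-m", "uvicorn", "app.main:app", "--reload"]},
    {"name": "frontend", "cwd": "reader-frontend",
     "cmd": ["npm", "run", "dev"]},
]


def stop_services(procs, timeout=10.0):
    """Ask every child to terminate, then kill any that do not exit in time."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def start_services(services):
    """Start each service as a child process and return the Popen handles."""
    procs = []
    try:
        for svc in services:
            procs.append(subprocess.Popen(svc["cmd"], cwd=svc.get("cwd")))
    except Exception:
        # If one service fails to start, do not leave the others running.
        stop_services(procs)
        raise
    return procs


def main():
    procs = start_services(SERVICES)
    try:
        # Wait until any service exits or the user presses Ctrl+C.
        while all(proc.poll() is None for proc in procs):
            time.sleep(0.5)
    except KeyboardInterrupt:
        pass
    finally:
        stop_services(procs)
```

Calling main() from the repository root would start both services and stop them together on Ctrl+C. Terminating first and killing only after a timeout gives each service a chance to shut down cleanly.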

🔧 Manual Setup

If you prefer to run services separately:

1. Backend Setup (ReaderAudioAPI)

cd ReaderAudioAPI
pip install -r requirements.txt
python -m uvicorn app.main:app --reload

2. Frontend Setup (reader-frontend)

cd reader-frontend
npm install
npm run dev

Adding Books

Open http://localhost:3000.
Click the + (Plus) icon in the sidebar.
Upload an EPUB file and wait for processing.
- Tip: You can purchase high-quality EPUBs from official bookstores or find catalogs on community sites like Free Media Collection.
Select the book and click Play.

Storage Notes

An average book requires about 400 MB of local storage (audio + cache). We will optimize this in the future; see TODO below.

Data Directory

By default, runtime data is stored in ReaderAudioAPI/oas_assets/ (uploads, audio, metadata).

You can override this location by setting EBOOKBOT_DATA_DIR before starting the backend.
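
As an illustration of the override, the lookup could be sketched like this. The function name resolve_data_dir, the relative default path, and the handling of blank values and "~" are assumptions for this example, not the backend's confirmed behavior.

```python
import os
from pathlib import Path

# Default location taken from the paragraph above, relative to the repo root.
DEFAULT_DATA_DIR = Path("ReaderAudioAPI") / "oas_assets"


def resolve_data_dir(env=None):
    """Return the data directory, preferring EBOOKBOT_DATA_DIR when it is set."""
    env = os.environ if env is None else env
    override = env.get("EBOOKBOT_DATA_DIR", "").strip()
    if override:
        # Assumption: a leading "~" is expanded to the user's home directory.
        return Path(override).expanduser()
    return DEFAULT_DATA_DIR
```

For example, resolve_data_dir({"EBOOKBOT_DATA_DIR": "/mnt/books"}) returns Path("/mnt/books"), while an unset or blank variable falls back to the default.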

Performance & Resource Management

TTS worker pool: defaults to 3 GPU workers. Tunable through environment variables:
- EBOOKBOT_TTS_WORKERS (default: 3)
- EBOOKBOT_TTS_MAX_INFLIGHT (default: workers * 2)
- EBOOKBOT_TTS_TASK_TIMEOUT_SECONDS (default: 600)
Idle GPU cleanup: when you pause generation and no work remains, the TTS worker processes shut down to free VRAM. Workers will auto-resume on the next queued task.
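
The defaults listed above, including the derived workers * 2 value, can be sketched as a small loader. TTSPoolConfig and load_tts_pool_config are hypothetical names; only the variable names and default values come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSPoolConfig:
    workers: int
    max_inflight: int
    task_timeout_seconds: int


def load_tts_pool_config(env=None):
    """Read the worker-pool settings, applying the documented defaults."""
    env = os.environ if env is None else env
    workers = int(env.get("EBOOKBOT_TTS_WORKERS", "3"))
    # The in-flight limit defaults to twice the worker count.
    max_inflight = int(env.get("EBOOKBOT_TTS_MAX_INFLIGHT", str(workers * 2)))
    timeout = int(env.get("EBOOKBOT_TTS_TASK_TIMEOUT_SECONDS", "600"))
    return TTSPoolConfig(workers, max_inflight, timeout)
```

With no variables set this yields 3 workers, 6 tasks in flight, and a 600-second timeout. Setting only EBOOKBOT_TTS_WORKERS=2 lowers the in-flight limit to 4.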

⚙️ Features

Dynamic Controls
- Precise sliders for Reading Size, Line Height, Word Spacing, and Chunk Gaps
- Tempo control (0.5x to 3.0x)
Instant Playback: Iterative chunking lets playback start as soon as the first chunk is ready, while the rest of the book is generated in the background.
Word-Sync: Visual highlighting tracks the neural audio in real time.
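
To illustrate the idea behind iterative chunking, here is a minimal sketch of a lazy, sentence-aligned chunker. The function iter_chunks, the 200-character limit, and the sentence-splitting rule are assumptions for this example. The actual chunking logic may differ.

```python
import re


def iter_chunks(text, max_chars=200):
    """Lazily yield sentence-aligned chunks of roughly max_chars or fewer.

    A single sentence longer than max_chars is yielded whole.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Emit the finished chunk so synthesis can begin right away.
            yield current
            current = sentence
        else:
            current = candidate
    if current:
        yield current
```

Because the function is a generator, a consumer can hand the first chunk to the TTS engine and start playback before later chunks have been produced.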

🛠 ReaderAudioEngine (Submodule)

The core engine is included as a submodule. It is responsible for:

Auto-downloading models from HuggingFace.
Low-latency ONNX inference.
Estimating precise word timestamps for highlighting.
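
This README does not describe how the engine estimates timestamps. As a simple stand-in to show how estimated timestamps can drive highlighting, the sketch below spreads a chunk's duration across its words in proportion to character count, then looks up the active word with a binary search. Both functions are hypothetical and are not the engine's method.

```python
from bisect import bisect_right


def estimate_word_starts(words, duration):
    """Spread duration (seconds) across words in proportion to their lengths."""
    total = sum(len(w) for w in words)
    starts, elapsed = [], 0.0
    for w in words:
        starts.append(elapsed)
        elapsed += duration * len(w) / total if total else 0.0
    return starts


def word_index_at(starts, t):
    """Return the index of the word being spoken at playback time t."""
    # bisect_right counts the words that have already started at time t.
    return max(0, bisect_right(starts, t) - 1)
```

A UI would call word_index_at with the audio element's current time on each frame and highlight the returned word.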

To contribute or find more details about the engine, visit the ReaderAudioEngine/ directory.

TODO

[ ] Optimize per-book storage size (target below ~100 MB).
- Currently 200 MB.
[x] Add audio compression or streaming for long books.
[ ] Provide a cleanup tool for cached audio.

License

MIT (see LICENSE).

Note: the TTS model and the ReaderAudioEngine submodule may be governed by their own separate licenses/terms.

Contact

| Type | Details |
| --- | --- |
| Author | Izzet Sezer |
| Email | sezer@imsezer.com |
