VoxNovel

📋 Overview

VoxNovel uses BookNLP to analyze literature and attribute quotations to specific characters, then uses Coqui TTS to generate an audiobook in which each character has a distinct voice. The result is a more immersive and engaging listening experience than single-voice narration.

🗣️ Included TTS Models

All Coqui TTS models (Tacotron, Tacotron2, Glow-TTS, Speedy-Speech, Align-TTS, FastPitch, FastSpeech, FastSpeech2, SC-GlowTTS, Capacitron, OverFlow, Neural HMM TTS, Delightful TTS, ⓍTTS, VITS, 🐸 YourTTS, 🐢 Tortoise, 🐶 Bark), plus StyleTTS2.

<details> <summary> 🌍🎙️ Accents you can give each character with the default cloning model (XTTS) </summary>

- XTTS can also make characters speak these languages, but quotation attribution only works correctly for English text.

English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko)

</details>
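
The language codes listed above can be kept in a small lookup table so that a per-character accent choice is checked before it reaches the TTS step. This is only an illustrative sketch: the names `XTTS_LANGUAGES` and `resolve_accent` are hypothetical and not taken from VoxNovel's code; only the codes themselves come from the list above.

```python
from typing import Optional

# Language codes exactly as listed above for the default XTTS cloning model.
XTTS_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
}


def resolve_accent(choice: Optional[str], default: str = "en") -> str:
    """Return a valid language code, using `default` when no choice was made.

    Hypothetical helper: raises ValueError for codes not in the list above.
    """
    code = (choice or default).strip().lower()
    if code not in XTTS_LANGUAGES:
        raise ValueError(f"Unsupported accent code: {code!r}")
    return code
```

A character with no accent selected would resolve to the default (`resolve_accent(None)` gives `"en"`), which mirrors the "Optional, Default is English" behavior described for the accent dropdown.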

Outputs an .m4b file with all book metadata and chapters. Example output file in an audiobook player app:

Example_of_output_in_audiobook_program

(It also produces a folder of individual mp4 chapter files with the ebook image embedded in them, if you want that.)
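
One common way to get chapter markers and book metadata into an .m4b is an FFmpeg metadata file (the `FFMETADATA1` format) with one `[CHAPTER]` block per chapter. Whether VoxNovel builds its output this way is not stated here, so treat the following as a sketch of the general technique; `build_ffmetadata` and its parameters are hypothetical names.

```python
from typing import List, Tuple


def _escape(value: str) -> str:
    """Backslash-escape the characters that are special in FFMETADATA files."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first to avoid double escaping
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(title: str, author: str,
                     chapters: List[Tuple[str, int]]) -> str:
    """Build FFmpeg metadata text from (chapter_title, duration_ms) pairs."""
    lines = [";FFMETADATA1", f"title={_escape(title)}", f"artist={_escape(author)}"]
    start = 0
    for name, duration_ms in chapters:
        end = start + duration_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={_escape(name)}",
        ]
        start = end  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"
```

The resulting text can be saved to a file and given to FFmpeg as a second input, for example `ffmpeg -i audio.m4a -i chapters.txt -map_metadata 1 -codec copy book.m4b` (file names here are placeholders).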

🔊 DEMOS

High Quality XTTS V2 Demos

https://github.com/DrewThomasson/VoxNovel-OLD-/assets/126999465/9e10b36d-b2e9-4462-8bad-13a3d8fce192

<details> <summary> 🔊🎉 More Demo Audio files :) </summary>

High Quality Tortoise Demos

https://github.com/DrewThomasson/VoxNovel-OLD-/assets/126999465/94e23918-b7e1-4399-935e-179dd12212c3

Super fast audio Balacoon Demos

https://github.com/DrewThomasson/VoxNovel-OLD-/assets/126999465/5e3c5501-4c87-462b-a11b-a15f546e51f4

https://github.com/DrewThomasson/VoxNovel-OLD-/assets/126999465/f4e57afe-53df-485c-81ff-65d7dcf29cb5

**Super high quality testing with fine-tuned models**

https://github.com/DrewThomasson/bark/assets/126999465/5da79b9d-2974-471e-a564-31a180ba2833

</details>

You can fine-tune your own XTTS models for free with about 6 or more minutes of audio using this Colab: ~~https://colab.research.google.com/drive/1GiI4_X724M8q2W-zZ-jXo7cWTV7RfaH-~~

Edit: that Colab no longer works. Use my version, which includes a fix: https://colab.research.google.com/drive/1sqQqzupo2pdjgggkrbM60sU6sBFYo3su?usp=sharing

🤖 Headless VoxNovel Google Colab

Explore and run the interactive version of the Headless VoxNovel project directly on Google Colab! Get started here.

GUI

<img width="200" alt="gui_1_select_file" src="https://github.com/DrewThomasson/VoxNovel/blob/e39b5e742c57cc3f88aa7549a5ce5517f392103e/readme_files/gui_1_select_file.png">
<details> <summary> GUI Part 1 (BookNLP Processor) Info/Features </summary>

- "Process File" button: Click it and you'll be asked to select an ebook file.

</details>
<img width="1000" alt="gui_2_finetune" src="https://github.com/DrewThomasson/VoxNovel/blob/main/readme_files/GUI_1.5_.png">
<details> <summary>Manual Speaker Assignment Correction Tool (GUI 1.5) </summary>

This GUI lets you manually correct speaker assignments when BookNLP attributes quotes to the wrong speaker. It reads the book.csv file, which contains the book's extracted quotes and speaker information, and lets you visually inspect and modify speaker assignments before they are passed to the TTS step.

Key Features:

- Scrollable Text Display: Allows users to view the book's text with color-coded speaker assignments.
- Speaker Selection: Users can select a new speaker from a dropdown menu to reassign specific lines.
- Checkable Quotes: Lines from the book are displayed with checkboxes, enabling the selection of multiple lines for speaker reassignment.
- Speaker Color Coding: Each speaker is assigned a unique color for easy identification.
- Buttons for Action:
  - Update Selected Speakers: Apply the selected speaker to all checked lines.
  - Deselect All: Uncheck all selected lines.
  - Continue: Save changes and exit the tool.

How to Use:

1. Select Lines: Check the boxes next to the lines you want to change.
2. Choose Speaker: Select the desired speaker from the dropdown menu.
3. Apply Changes: Click "Update Selected Speakers" to apply the changes.
4. Review: The text will update to reflect the changes.
5. Deselect: Click "Deselect All" to clear your selections.
6. Finish: Once satisfied with the corrections, click "Continue" to save and exit.
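
Under the hood, the correction workflow amounts to rewriting the speaker field for the checked rows. Here is a minimal sketch of that idea, assuming each quote is a row with `Text` and `Speaker` columns. The real column names in book.csv are not given here, so those names, the sample data, and `reassign_speakers` are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def reassign_speakers(rows: List[Dict[str, str]], selected: Iterable[int],
                      new_speaker: str) -> List[Dict[str, str]]:
    """Return a copy of `rows` with Speaker replaced at the selected indices."""
    chosen = set(selected)
    return [
        dict(row, Speaker=new_speaker) if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]


# In-memory stand-in for a quotes file (assumed layout, not the real book.csv).
sample = (
    "Text,Speaker\n"
    '"Hello there.",Narrator\n'
    '"Who goes?",Alice\n'
    '"It is I.",Alice\n'
)
rows = list(csv.DictReader(io.StringIO(sample)))

# "Check" the third line and assign it to a different speaker.
fixed = reassign_speakers(rows, selected=[2], new_speaker="Bob")
```

Returning a new list rather than editing in place keeps the original assignments available until the user confirms with "Continue".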

</details> <img width="1000" alt="gui_2_finetune" src="https://github.com/DrewThomasson/VoxNovel/blob/e39b5e742c57cc3f88aa7549a5ce5517f392103e/readme_files/gui_3_finetune.png"> <details> <summary> GUI Part 2 (Coqui TTS GUI) Info/Features </summary>

- Select TTS Model Dropdown: Selects the TTS model that will be used for voice cloning.
- Include fast Voice Models Checkbox: (Fast generation at the cost of audio quality.) Check this to see every other model and single voice supported by Coqui TTS.
  - It updates the "Select TTS Model" dropdown for voice cloning models to also include (List of values to be added).
  - It updates the voice dropdown for each character to also include (List of values to be added).
- Make all audio generate with Narrator voice Checkbox: Generates every character's audio with the voice you have selected for the Narrator when you click the "Generate Audio" button.
- Clone new voice Button: Click this to add a new voice you can clone (make sure you have a reference audio file on hand).
- Add Fine-tuned XTTS model to voice actor Button: If you have a folder containing all the parameters of an XTTS model fine-tuned on a specific voice, click this to make that voice actor clone with the fine-tuned model, which gives much better voice cloning results.
- Character voices Dropdowns: The dropdowns for selecting the Voice Actor (and the Accent of each character if using XTTS).
  - (1): The voice actors available for this character. (The default is a voice chosen from the character's inferred gender: "F, M, Other".)
    - When you select a voice, it plays that voice's audio sample. If it is a fast voice model voice and no reference audio exists, one is generated and played.
  - (2): The accents available for this character. (Optional; the default is English.)
- Chapter Delimiter Field: Changes the default chapter delimiter (the string used to identify chapters).
- Silence Duration in milliseconds (ms) Field: Changes the number of milliseconds of silence between each combined chunk of audio.
- Select TTS Language Dropdown: Sets the default accent for every character whose accent has not been manually selected.
- Loading bar: Shows the approximate time remaining. (This is an estimate; you probably won't see accurate predictions until it has been running for about 5 minutes.)
- Annotated book preview Block: Shows the entire book with each character's lines color-coded.
  - You can click a line while the audiobook is being generated to hear that generated line, but only if audio has already been generated for it; otherwise nothing plays.
- Load Book Button: Reloads the color-coded annotated book view. This only re-randomizes the colors chosen for each character's lines.
- Generate Audio Button: Starts generating the full audiobook.
- Select random voices Button (only visible if the "Include fast Voice Models" checkbox is checked): Selects an auto-gender-inferred fast model voice for every character except the narrator.
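
The "Silence Duration" setting above can be pictured as padding inserted between consecutive audio chunks when they are combined. The sketch below works on raw 16-bit mono PCM bytes using only the standard library. The function name, the 24000 Hz default sample rate, and the choice of raw PCM are assumptions for illustration, not VoxNovel's actual implementation.

```python
from typing import List


def join_with_silence(chunks: List[bytes], silence_ms: int,
                      sample_rate: int = 24000, sample_width: int = 2) -> bytes:
    """Concatenate mono PCM chunks with `silence_ms` of silence between them.

    Hypothetical helper: assumes signed PCM, where zero bytes mean silence.
    """
    if silence_ms < 0:
        raise ValueError("silence_ms must be non-negative")
    gap_samples = sample_rate * silence_ms // 1000
    gap = b"\x00" * (gap_samples * sample_width)
    # bytes.join places the gap only between chunks, not before or after.
    return gap.join(chunks)
```

With the assumed defaults, a 500 ms setting adds 12000 samples (24000 bytes) of silence between every pair of chunks.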

</details>
<img width="585" alt="gui_3_run" src="https://github.com/DrewThomasson/VoxNovel/blob/e39b5e742c57cc3f88aa7549a5ce5517f392103e/readme_files/gui_2_run.png">
<details> <summary> GUI Part 3 (Book Viewer) Info/Features </summary>

- It's hard to explain; it's more of a playground, and if you mess around with it you should get how it works. It can be used to fine-tune the audiobook.
- Close the window when you're done with it.

</details>

📦 SetUp Install

<details> <summary> 🤖 Headless VoxNovel Google Colab</summary>

Explore and run the interactive version of the Headless VoxNovel project directly on Google Colab! Get started here.

</details> <details> <summary> 🐳 Docker (Sound not working in gui yet) </summary> <details> <summary>🐳 Headless Docker</summary> <details> <summary>Docker headless m1 🍏Mac</summary>

cd ~
git clone https://github.com/DrewThomasson/VoxNovel.git
sudo docker run -v "$HOME/VoxNovel:/VoxNovel/" -it athomasson2/voxnovel:headless_m1_v2

</details> <details> <summary>Headless Docker 🐧 Linux/Intel 🍏Mac</summary>

For headless Docker on CPU only

cd ~
git clone https://github.com/DrewThomasson/VoxNovel.git
sudo docker run -v "$HOME/VoxNovel:/VoxNovel/" -it athomasson2/voxnovel:latest_headless

For headless Docker with GPU acceleration, if you have an NVIDIA GPU

cd ~
`git clone https://github.com/Dre
