Audiotext
A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.
<div id="top"></div>
<!-- PROJECT SHIELDS -->
<!--
*** I am using markdown "reference style" links for readability.
*** Reference links are enclosed in brackets [ ] instead of parentheses ( ).
*** See the bottom of this document for the declaration of the reference variables
*** for contributors-url, forks-url, etc. This is an optional, concise syntax you may use.
*** https://www.markdownguide.org/basic-syntax/#reference-style-links
-->
<!-- PROJECT LOGO -->
<div align="center">
<picture>
<source
srcset="docs/light/icon.png"
width="128"
height="128"
media="(prefers-color-scheme: light)"
/>
<source
srcset="docs/dark/icon.png"
width="128"
height="128"
media="(prefers-color-scheme: dark)"
/>
<img src="docs/light/icon.png" alt="Logo" width="128" height="128">
</picture>
<h1 align="center">Audiotext</h1>
<p align="center">A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.</p>
<p>
<a href="https://github.com/HenestrosaDev/audiotext/actions/workflows/code-quality.yml">
<img
src="https://github.com/HenestrosaDev/audiotext/actions/workflows/code-quality.yml/badge.svg"
alt="Code Quality badge status"
/>
</a>
<br>
<a href="https://github.com/HenestrosaDev/audiotext/releases/latest">
<img
src="https://img.shields.io/github/v/release/HenestrosaDev/audiotext"
alt="Version"
/>
</a>
<a href="https://github.com/HenestrosaDev/audiotext/stargazers">
<img
src="https://img.shields.io/github/stars/HenestrosaDev/audiotext"
alt="GitHub Stars"
/>
</a>
<a href="https://github.com/HenestrosaDev/audiotext/blob/main/LICENSE">
<img
src="https://img.shields.io/badge/license-BSD--4--Clause-lightgray"
alt="License"
/>
</a>
<br>
<a href="https://github.com/HenestrosaDev/audiotext/graphs/contributors">
<img
src="https://img.shields.io/github/contributors/HenestrosaDev/audiotext"
alt="GitHub Contributors"
/>
</a>
<a href="https://github.com/HenestrosaDev/audiotext/issues">
<img
src="https://img.shields.io/github/issues/HenestrosaDev/audiotext"
alt="Issues"
/>
</a>
<a href="https://github.com/HenestrosaDev/audiotext/pulls">
<img
src="https://img.shields.io/github/issues-pr/HenestrosaDev/audiotext"
alt="GitHub pull requests"
/>
</a>
</p>
<p>
<a href="https://github.com/HenestrosaDev/audiotext/issues/new/choose">
Report Bug
</a>
·
<a href="https://github.com/HenestrosaDev/audiotext/issues/new/choose">
Request Feature
</a>
·
<a href="https://github.com/HenestrosaDev/audiotext/discussions">
Ask Question
</a>
</p>
</div>
<!-- TABLE OF CONTENTS -->
Table of Contents
- About the Project
- Getting Started
- Usage
- Troubleshooting
- Roadmap
- Authors
- Contributing
- Acknowledgments
- License
- Support
About the Project

Audiotext transcribes the audio from an audio file, video file, microphone input, directory, or YouTube video into any of the 99 different languages it supports. You can transcribe using the Google Speech-to-Text API, the Whisper API, or WhisperX. The last two methods can even translate the transcription or generate subtitles!
You can also choose the theme you like best. It can be dark, light, or the one configured in the system.
<details> <summary>Dark</summary> <img src="docs/dark/from-file.png" alt="Dark theme"> </details>
<details> <summary>Light</summary> <img src="docs/light/from-file.png" alt="Light theme"> </details>
<!-- SUPPORTED LANGUAGES -->
Supported Languages
<details> <summary>Click here to display</summary>

- Afrikaans
- Albanian
- Amharic
- Arabic
- Armenian
- Assamese
- Azerbaijani
- Bashkir
- Basque
- Belarusian
- Bengali
- Bosnian
- Breton
- Bulgarian
- Burmese
- Catalan
- Chinese
- Chinese (Yue)
- Croatian
- Czech
- Danish
- Dutch
- English
- Estonian
- Faroese
- Farsi
- Finnish
- French
- Galician
- Georgian
- German
- Greek
- Gujarati
- Haitian
- Hausa
- Hawaiian
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Indonesian
- Italian
- Japanese
- Javanese
- Kannada
- Kazakh
- Khmer
- Korean
- Lao
- Latin
- Latvian
- Lingala
- Lithuanian
- Luxembourgish
- Macedonian
- Malagasy
- Malay
- Malayalam
- Maltese
- Maori
- Marathi
- Mongolian
- Nepali
- Norwegian
- Norwegian Nynorsk
- Occitan
- Pashto
- Polish
- Portuguese
- Punjabi
- Romanian
- Russian
- Sanskrit
- Serbian
- Shona
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Somali
- Spanish
- Sundanese
- Swahili
- Swedish
- Tagalog
- Tajik
- Tamil
- Tatar
- Telugu
- Thai
- Tibetan
- Turkish
- Turkmen
- Ukrainian
- Urdu
- Uzbek
- Vietnamese
- Welsh
- Yiddish
- Yoruba
</details>
Supported File Types
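<details> <summary>Audio file formats</summary>

- .aac
- .flac
- .mp3
- .mpeg
- .oga
- .ogg
- .opus
- .wav
- .wma
</details>
<details> <summary>Video file formats</summary>

- .3g2
- .3gp2
- .3gp
- .3gpp2
- .3gpp
- .asf
- .avi
- .f4a
- .f4b
- .f4v
- .flv
- .m4a
- .m4b
- .m4r
- .m4v
- .mkv
- .mov
- .mp4
- .ogv
- .ogx
- .webm
- .wmv
</details>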
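As a rough illustration of how the extension lists above could be used, here is a minimal Python sketch that classifies a path as audio or video by its suffix. The function name `classify_media_file` and the two set names are hypothetical and are not taken from the Audiotext source; treating the second list as video formats is also an assumption. The application's actual validation logic may differ.

```python
from pathlib import Path

# Extensions copied from the lists above (assumption: the second list is video).
AUDIO_EXTENSIONS = {
    ".aac", ".flac", ".mp3", ".mpeg", ".oga", ".ogg", ".opus", ".wav", ".wma",
}
VIDEO_EXTENSIONS = {
    ".3g2", ".3gp2", ".3gp", ".3gpp2", ".3gpp", ".asf", ".avi", ".f4a",
    ".f4b", ".f4v", ".flv", ".m4a", ".m4b", ".m4r", ".m4v", ".mkv", ".mov",
    ".mp4", ".ogv", ".ogx", ".webm", ".wmv",
}


def classify_media_file(path):
    """Return "audio", "video", or None for an unsupported extension.

    Hypothetical helper: only the final suffix is inspected, case-insensitively.
    """
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


if __name__ == "__main__":
    for name in ("interview.MP3", "lecture.mkv", "notes.txt"):
        print(name, "->", classify_media_file(name))
```

A check like this only looks at the file name, so it cannot tell whether the file's contents actually match its extension.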
Project Structure
<details> <summary>ASCII folder structure</summary>

│   .gitignore
│   audiotext.spec
│   LICENSE
│   README.md
│   requirements.txt
│
├───.github
│   │   CONTRIBUTING.md
│   │   FUNDING.yml
│   │
│   ├───ISSUE_TEMPLATE
│   │       bug_report_template.md
│   │       feature_request_template.md
│   │
│   └───PULL_REQUEST_TEMPLATE
│           pull_request_template.md
│
├───docs/
│
├───res
│   ├───img
│   │       icon.ico
│   │
│   └───locales
│       │   main_controller.pot
│       │   main_window.pot
│       │
│       ├───en
│       │   └───LC_MESSAGES
│       │           app.mo
│       │           app.po
│       │           main_controller.po
│       │           main_window.po
│       │
│       └───es
│           └───LC_MESSAGES
│                   app.mo
│                   app.po
│                   main_controller.po
│                   main_window.po
│
└───src
    │   app.py
    │
    ├───controllers
    │       __init__.py
    │       main_controller.py
    │
    ├───handlers
    │       file_handler.py
    │       google_api_handler.py
    │       openai_api_handler.py
    │       whisperx_handler.py
    │       youtube_handler.py
    │
    ├───interfaces
    │       transcribable.py
    │
    ├───models
    │   │   __init__.py
    │   │   transcription.py
    │   │
    │   └───config
    │           __init__.py
    │           config_subtitles.py
    │           config_system.py
    │           config_transcription.py
    │           config_whisper_api.py
    │           config_whisperx.py
    │
    ├───utils
    │       __init__.py
    │       audio_utils.py
    │       config_manager.py
    │       constants.py
    │       dict_utils.py
    │       enums.py
    │       env_keys.py
    │       path_helper.py
    │
    └───views
        │   __init__.py
        │   main_window.py
        │
        └───custom_widgets
                __init__.py
                ctk_scrollable_dropdown/
                ctk_input_dialog.py
</details>
<!-- BUILT WITH -->
Built With
- CTkScrollableDropdown for the scrollable option menu to display the full list of supported languages.
- CustomTkinter for the GUI.
- moviepy for video processing, from which the program extracts the audio to be t
