
Audiotext

A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.


<div id="top"></div>
<!-- PROJECT SHIELDS -->
<!-- PROJECT LOGO -->
<div align="center">
  <picture>
    <source srcset="docs/light/icon.png" width="128" height="128" media="(prefers-color-scheme: light)" />
    <source srcset="docs/dark/icon.png" width="128" height="128" media="(prefers-color-scheme: dark)" />
    <img src="docs/light/icon.png" alt="Logo" width="128" height="128">
  </picture>
  <p>
    <a href="https://github.com/HenestrosaDev/audiotext/actions/workflows/code-quality.yml">
      <img src="https://github.com/HenestrosaDev/audiotext/actions/workflows/code-quality.yml/badge.svg" alt="Code Quality badge status" />
    </a>
    <br>
    <a href="https://github.com/HenestrosaDev/audiotext/releases/latest">
      <img src="https://img.shields.io/github/v/release/HenestrosaDev/audiotext" alt="Version" />
    </a>
    <a href="https://github.com/HenestrosaDev/audiotext/stargazers">
      <img src="https://img.shields.io/github/stars/HenestrosaDev/audiotext" alt="GitHub Stars" />
    </a>
    <a href="https://github.com/HenestrosaDev/audiotext/blob/main/LICENSE">
      <img src="https://img.shields.io/badge/license-BSD--4--Clause-lightgray" alt="License" />
    </a>
    <br>
    <a href="https://github.com/HenestrosaDev/audiotext/graphs/contributors">
      <img src="https://img.shields.io/github/contributors/HenestrosaDev/audiotext" alt="GitHub Contributors" />
    </a>
    <a href="https://github.com/HenestrosaDev/audiotext/issues">
      <img src="https://img.shields.io/github/issues/HenestrosaDev/audiotext" alt="Issues" />
    </a>
    <a href="https://github.com/HenestrosaDev/audiotext/pulls">
      <img src="https://img.shields.io/github/issues-pr/HenestrosaDev/audiotext" alt="GitHub pull requests" />
    </a>
  </p>
  <p>
    <a href="https://github.com/HenestrosaDev/audiotext/issues/new/choose">Report Bug</a>
    ·
    <a href="https://github.com/HenestrosaDev/audiotext/issues/new/choose">Request Feature</a>
    ·
    <a href="https://github.com/HenestrosaDev/audiotext/discussions">Ask Question</a>
  </p>
</div>
<!-- TABLE OF CONTENTS -->


<!-- ABOUT THE PROJECT -->

About the Project

Main

Audiotext transcribes the audio from an audio file, video file, microphone input, directory, or YouTube video into any of the 99 different languages it supports. You can transcribe using the Google Speech-to-Text API, the Whisper API, or WhisperX. The last two methods can even translate the transcription or generate subtitles!
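The subtitle format is not specified here. Assuming SRT, a widely supported format in which each cue is a sequence number, a `start --> end` time range written as `HH:MM:SS,mmm`, and the cue text, timed transcription segments could be turned into subtitles roughly as follows. `Segment`, `srt_timestamp`, and `to_srt` are hypothetical names for illustration, not part of Audiotext's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical transcribed span of audio; times are in seconds."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

For example, `to_srt([Segment(0.0, 2.5, "Hello")])` yields a single cue numbered `1` spanning `00:00:00,000 --> 00:00:02,500`.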

You can also choose the theme you like best: dark, light, or the one configured in your system.

<details> <summary>Dark</summary> <img src="docs/dark/from-file.png" alt="Dark theme"> </details> <details> <summary>Light</summary> <img src="docs/light/from-file.png" alt="Light theme"> </details> <!-- SUPPORTED LANGUAGES -->

Supported Languages

<details> <summary>Click here to display</summary>
  • Afrikaans
  • Albanian
  • Amharic
  • Arabic
  • Armenian
  • Assamese
  • Azerbaijani
  • Bashkir
  • Basque
  • Belarusian
  • Bengali
  • Bosnian
  • Breton
  • Bulgarian
  • Burmese
  • Catalan
  • Chinese
  • Chinese (Yue)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Faroese
  • Farsi
  • Finnish
  • French
  • Galician
  • Georgian
  • German
  • Greek
  • Gujarati
  • Haitian Creole
  • Hausa
  • Hawaiian
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Indonesian
  • Italian
  • Japanese
  • Javanese
  • Kannada
  • Kazakh
  • Khmer
  • Korean
  • Lao
  • Latin
  • Latvian
  • Lingala
  • Lithuanian
  • Luxembourgish
  • Macedonian
  • Malagasy
  • Malay
  • Malayalam
  • Maltese
  • Maori
  • Marathi
  • Mongolian
  • Nepali
  • Norwegian
  • Norwegian Nynorsk
  • Occitan
  • Pashto
  • Polish
  • Portuguese
  • Punjabi
  • Romanian
  • Russian
  • Sanskrit
  • Serbian
  • Shona
  • Sindhi
  • Sinhala
  • Slovak
  • Slovenian
  • Somali
  • Spanish
  • Sundanese
  • Swahili
  • Swedish
  • Tagalog
  • Tajik
  • Tamil
  • Tatar
  • Telugu
  • Thai
  • Tibetan
  • Turkish
  • Turkmen
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Welsh
  • Yiddish
  • Yoruba
</details>

Supported File Types

<details> <summary>Audio file formats</summary>
  • .aac
  • .flac
  • .mp3
  • .mpeg
  • .oga
  • .ogg
  • .opus
  • .wav
  • .wma
</details> <details> <summary>Video file formats</summary>
  • .3g2
  • .3gp2
  • .3gp
  • .3gpp2
  • .3gpp
  • .asf
  • .avi
  • .f4a
  • .f4b
  • .f4v
  • .flv
  • .m4a
  • .m4b
  • .m4r
  • .m4v
  • .mkv
  • .mov
  • .mp4
  • .ogv
  • .ogx
  • .webm
  • .wmv
</details> <!-- PROJECT STRUCTURE -->
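Because a whole directory can be used as input, the app has to decide which files to pick up. Below is a minimal sketch of extension-based filtering that uses exactly the extensions listed above, grouped the same way. `classify_media` and `find_supported` are hypothetical helpers; Audiotext's real check may differ.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical constants mirroring the "Audio file formats" list above.
AUDIO_EXTENSIONS = frozenset({
    ".aac", ".flac", ".mp3", ".mpeg", ".oga", ".ogg", ".opus", ".wav", ".wma",
})
# Hypothetical constants mirroring the "Video file formats" list above.
VIDEO_EXTENSIONS = frozenset({
    ".3g2", ".3gp2", ".3gp", ".3gpp2", ".3gpp", ".asf", ".avi", ".f4a", ".f4b",
    ".f4v", ".flv", ".m4a", ".m4b", ".m4r", ".m4v", ".mkv", ".mov", ".mp4",
    ".ogv", ".ogx", ".webm", ".wmv",
})


def classify_media(path: str) -> Optional[str]:
    """Return "audio", "video", or None if the extension is unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive: "CLIP.MP4" -> ".mp4"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


def find_supported(directory: str) -> List[Path]:
    """List supported files directly inside `directory` (not recursive), sorted."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and classify_media(p.name) is not None
    )
```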

Project Structure

<details> <summary>ASCII folder structure</summary>
│   .gitignore
│   audiotext.spec
│   LICENSE
│   README.md
│   requirements.txt
│
├───.github
│   │   CONTRIBUTING.md
│   │   FUNDING.yml
│   │
│   ├───ISSUE_TEMPLATE
│   │       bug_report_template.md
│   │       feature_request_template.md
│   │
│   └───PULL_REQUEST_TEMPLATE
│           pull_request_template.md
│
├───docs/
│
├───res
│   ├───img
│   │       icon.ico
│   │
│   └───locales
│       │   main_controller.pot
│       │   main_window.pot
│       │
│       ├───en
│       │   └───LC_MESSAGES
│       │           app.mo
│       │           app.po
│       │           main_controller.po
│       │           main_window.po
│       │
│       └───es
│           └───LC_MESSAGES
│                   app.mo
│                   app.po
│                   main_controller.po
│                   main_window.po
│
└───src
    │   app.py
    │
    ├───controllers
    │       __init__.py
    │       main_controller.py
    │
    ├───handlers
    │       file_handler.py
    │       google_api_handler.py
    │       openai_api_handler.py
    │       whisperx_handler.py
    │       youtube_handler.py
    │
    ├───interfaces
    │       transcribable.py
    │
    ├───models
    │   │   __init__.py
    │   │   transcription.py
    │   │
    │   └───config
    │           __init__.py
    │           config_subtitles.py
    │           config_system.py
    │           config_transcription.py
    │           config_whisper_api.py
    │           config_whisperx.py
    │
    ├───utils
    │       __init__.py
    │       audio_utils.py
    │       config_manager.py
    │       constants.py
    │       dict_utils.py
    │       enums.py
    │       env_keys.py
    │       path_helper.py
    │
    └───views
        │   __init__.py
        │   main_window.py
        │
        └───custom_widgets
                __init__.py
                ctk_scrollable_dropdown/
                ctk_input_dialog.py
</details> <!-- BUILT WITH -->
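The `res/locales` tree follows the GNU gettext layout `<language>/LC_MESSAGES/<domain>.mo`, which Python's standard `gettext` module can read directly. The sketch below assumes the compiled domain is `app` (the only `.mo` file shown in the tree); `load_translator` is a hypothetical helper, not Audiotext's actual loader.

```python
import gettext
from pathlib import Path
from typing import Callable


def load_translator(locales_dir: Path, language: str, domain: str = "app") -> Callable[[str], str]:
    """Return a translate function for `language` (hypothetical helper).

    Looks for <locales_dir>/<language>/LC_MESSAGES/<domain>.mo. With
    fallback=True, a missing catalog yields NullTranslations, whose
    gettext() returns every message unchanged instead of raising.
    """
    translations = gettext.translation(
        domain,
        localedir=str(locales_dir),
        languages=[language],
        fallback=True,
    )
    return translations.gettext
```

Typical usage would be `_ = load_translator(Path("res/locales"), "es")` followed by `_("Some UI label")`; falling back to the untranslated string keeps the GUI usable when a language has no compiled catalog.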

Built With

  • CTkScrollableDropdown for the scrollable option menu to display the full list of supported languages.
  • CustomTkinter for the GUI.
  • moviepy for video processing, from which the program extracts the audio to be transcribed.
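moviepy delegates decoding to ffmpeg. As a stand-in for moviepy's own API (not shown here), the same audio extraction can be expressed as a plain ffmpeg command. The sketch below assumes a mono 16 kHz 16-bit WAV target, chosen because Whisper models work with 16 kHz audio; `ffmpeg_extract_audio_cmd` and `extract_audio` are hypothetical helpers, and Audiotext itself uses moviepy rather than this code.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_extract_audio_cmd(video: Path, output: Path, sample_rate: int = 16_000) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file without prompting
        "-i", str(video),         # input video
        "-vn",                    # disable video output
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample (assumed 16 kHz target)
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM, i.e. plain WAV
        str(output),
    ]


def extract_audio(video: Path, output: Path) -> Path:
    """Run the command if ffmpeg is on PATH; raise FileNotFoundError otherwise."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_extract_audio_cmd(video, output), check=True, capture_output=True)
    return output
```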