
TranscriptionSuite

A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.


<p align="left"> <img src="assets/logo_wide_readme.png" alt="TranscriptionSuite logo" width="680"> </p> <table width="100%"> <tr> <td valign="top"> <table> <tr> <td width="375px"> <pre> A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription. Electron dashboard + Python backend with multi-backend STT (Whisper, NVIDIA NeMo, VibeVoice-ASR), NVIDIA GPU acceleration or CPU mode. Dockerized for fast setup. </pre> </td> </tr> </table> </td> <td align="left" valign="top" width="280px"> <strong>OS Support:</strong><br> <img src="https://img.shields.io/badge/Linux-%23FCC624.svg?style=for-the-badge&logo=linux&logoColor=black" alt="Linux"> <img src="https://img.shields.io/badge/Windows%2011-%230078D4.svg?style=for-the-badge&logo=Windows%2011&logoColor=white" alt="Windows 11"><br> Experimental:<br> <img src="https://img.shields.io/badge/macOS-000000.svg?style=for-the-badge&logo=apple&logoColor=white" alt="macOS"><br><br> <strong>Hardware Acceleration:</strong><br> <img src="https://img.shields.io/badge/NVIDIA-Recommended-%2376B900.svg?style=for-the-badge&logo=nvidia&logoColor=white" alt="NVIDIA Recommended"><br> <img src="https://img.shields.io/badge/CPU-Supported-%230EA5E9.svg?style=for-the-badge" alt="CPU Supported"> </td> </tr> </table> <br> <div align="center">

Demo

https://github.com/user-attachments/assets/f63ee730-de9a-4a55-b0ab-e342b30905a4

</div>

1. Introduction

1.1 Features

  • 100% Local: Everything runs on your own computer, the app doesn't need internet beyond the initial setup*
  • Multiple Models available: WhisperX (all three sizes of the faster-whisper models), NVIDIA NeMo Parakeet v3/Canary v2, and VibeVoice-ASR models are supported
  • Speaker Diarization: Speaker identification & diarization (subtitling) for all three model families; Whisper and NeMo use PyAnnote for diarization, while VibeVoice handles it natively
  • Parallel Processing: If your VRAM budget allows it, transcribe & diarize a recording at the same time, cutting processing time significantly
  • Truly Multilingual: Whisper supports 90+ languages; NeMo Parakeet/Canary support 25 European languages; VibeVoice supports 50 languages
  • Longform Transcription: Record as long as you want and have it transcribed in seconds; either using your mic or the system audio
  • Session File Import: Import existing audio files from the Session tab; transcription results are saved directly as .txt or .srt to a folder of your choice — no Notebook entry created
  • Live Mode: Real-time sentence-by-sentence transcription for continuous dictation workflows (Whisper-only currently)
  • Global Keyboard Shortcuts: System-wide shortcuts & paste-at-cursor functionality
  • Remote Access: Securely access the desktop at home that runs the model from anywhere (via Tailscale), or share it on your local network over LAN
  • Audio Notebook: An Audio Notebook mode, with a calendar-based view, full-text search, and LM Studio integration (chat with the AI about your notes)

📌Half an hour of audio transcribed in under a minute with Whisper (RTX 3060)!

*All transcription processing runs entirely on your own computer — your audio never leaves your machine. Internet is only needed to download model weights on first use (STT models, PyAnnote diarization, and wav2vec2 alignment models); all weights are cached locally in a Docker volume and no further internet access is required after that.

1.2 Screenshots

<div align="center">

| Session Tab | Notebook Tab |
|:-----------:|:------------:|
| Session Tab | Notebook Tab |

| Audio Note View | Server Tab |
|:---------------:|:----------:|
| Audio Note View | Server Tab |

</div>

1.3 Short Tour

<div align="center">

https://github.com/user-attachments/assets/688fd4b2-230b-4e2f-bfed-7f92aa769010

</div>

2. Installation

2.1 Prerequisites

To begin with, you need to install Docker (or Podman).

Both are supported; the dashboard and shell scripts auto-detect which runtime is available (Docker is checked first, then Podman).
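The detection order can be sketched roughly like this (an illustrative shell snippet with assumed logic, not the dashboard's actual implementation):

```shell
# Sketch of the runtime auto-detection described above:
# prefer Docker if present, otherwise fall back to Podman.
detect_runtime() {
    if command -v docker >/dev/null 2>&1; then
        echo "docker"
    elif command -v podman >/dev/null 2>&1; then
        echo "podman"
    else
        echo "none"
    fi
}

echo "Detected container runtime: $(detect_runtime)"
```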

Linux (Docker):

  1. Install Docker Engine
  2. Add your user to the docker group so the app can talk to Docker without sudo:
    sudo usermod -aG docker $USER
    
    Then log out and back in (or reboot) for the change to take effect.
  3. Install NVIDIA Container Toolkit (for GPU mode)
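After installing the toolkit, you can optionally confirm that containers can actually see the GPU. This check is my suggestion rather than part of the official setup, and the CUDA image tag below is just an arbitrary example:

```shell
# Optional GPU smoke test after installing the NVIDIA Container Toolkit.
# Prints one of: docker-missing, gpu-ok, gpu-unreachable.
check_docker_gpu() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker-missing"
        return 0
    fi
    # Any CUDA base image works; this tag is only an example.
    if docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi >/dev/null 2>&1; then
        echo "gpu-ok"
    else
        echo "gpu-unreachable"
    fi
}

check_docker_gpu
```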

Linux (Podman):

  1. Install Podman (4.7+ required for podman compose support)
    • For Arch run sudo pacman -S --needed podman
    • For Fedora/RHEL: Podman is pre-installed
    • For other distros refer to the Podman documentation
  2. For GPU mode, configure CDI (Container Device Interface):
    sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
    
    • Requires nvidia-container-toolkit 1.14+
    • Not required if using CPU mode
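A quick way to confirm the CDI spec was generated correctly (my suggestion, not from the upstream docs) is to check that the file exists and actually references the GPU device kind:

```shell
# Check for the CDI spec produced by `nvidia-ctk cdi generate`.
check_cdi_spec() {
    spec=/etc/cdi/nvidia.yaml
    if [ -f "$spec" ] && grep -q "nvidia.com/gpu" "$spec" 2>/dev/null; then
        echo "cdi-present"
    else
        echo "cdi-missing"
    fi
}

check_cdi_spec
```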

Windows:

  1. Install Docker Desktop with the WSL 2 backend (during installation, if presented with the option, make sure the 'Use WSL 2 instead of Hyper-V' checkbox is enabled). After installation, verify it by running wsl --list --verbose; if the VERSION column shows 2, Docker is using the WSL 2 backend.
  2. Install NVIDIA GPU driver with WSL support (standard NVIDIA gaming drivers work fine)
    • Not required if using CPU mode

macOS:

  1. Install Docker Desktop for Mac or Podman Desktop
  2. GPU mode is not available on macOS — the server runs in CPU mode automatically

2.2 Download the Dashboard app

Before doing anything else, you need to download the Dashboard app for your platform from the Releases page. This is just the frontend, no models or packages are downloaded yet.

  • Linux and Windows builds are x64; macOS is arm64
  • Each release artifact includes a GPG signature (.sig) made with my key

2.2.1 Linux AppImage Prerequisites

AppImages require FUSE 2 (libfuse.so.2), which is not installed by default on distros that ship with GNOME (in testing, both Fedora and Arch with KDE worked out of the box). If you see dlopen(): error loading libfuse.so.2, install the appropriate package:

| Distribution | Package | Install Command |
|---|---|---|
| Ubuntu 22.04 / Debian | libfuse2 | sudo apt install libfuse2 |
| Ubuntu 24.04+ | libfuse2t64 | sudo apt install libfuse2t64 |
| Fedora | fuse-libs | sudo dnf install fuse-libs |
| Arch Linux | fuse2 | sudo pacman -S fuse2 |
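To check whether FUSE 2 is already present before installing anything, a small helper of my own (not from the project) asks the dynamic linker cache:

```shell
# Report whether libfuse.so.2 is visible to the dynamic linker.
check_libfuse2() {
    if ldconfig -p 2>/dev/null | grep -q 'libfuse\.so\.2'; then
        echo "fuse2-present"
    else
        echo "fuse2-missing"
    fi
}

check_libfuse2
```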

Sandbox note: The AppImage automatically disables Chromium's SUID sandbox (--no-sandbox) since the AppImage squashfs mount cannot satisfy its permission requirements. This is the standard approach for Electron-based AppImages and does not affect application security.

2.2.2 Verify Download with Kleopatra (optional)

  1. Download both files from the same release:
    • installer/app (.AppImage, .exe or .dmg)
    • matching signature file (.sig)
  2. Install Kleopatra: https://apps.kde.org/kleopatra/
  3. Import the public key in Kleopatra from this repository:
  4. In Kleopatra, use File -> Decrypt/Verify Files... and select the downloaded .sig signature file.
  5. If prompted, select the corresponding downloaded app file. Verification should report a valid signature.
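If you prefer the command line over Kleopatra, plain gpg can do the same verification after importing the repository's public key with gpg --import. The filenames below are placeholders; substitute the actual release artifact and its matching .sig:

```shell
# Verify a release artifact against its detached signature.
# Prints: missing-files, signature-ok, or signature-bad.
verify_release() {
    app="$1"
    sig="$2"
    if [ ! -f "$app" ] || [ ! -f "$sig" ]; then
        echo "missing-files"
        return 0
    fi
    if gpg --verify "$sig" "$app" 2>/dev/null; then
        echo "signature-ok"
    else
        echo "signature-bad"
    fi
}

# Placeholder filenames; use the real ones from the release.
verify_release "TranscriptionSuite.AppImage" "TranscriptionSuite.AppImage.sig"
```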

2.3 Setting Up the Server

We're now ready to start the server. This process has two parts: downloading the Docker image and starting a Docker container based on that image.

  1. Download the image: Using the Sidebar on the l
