DiarizationVisualization
Visualization tools for audio-only and multi-modal speaker diarization datasets
Visualization Tools for Speaker Diarization
Introduction
There is currently no robust tool for visualizing speaker diarization, even though visualization is critical for analyzing datasets and algorithm outputs. This repository offers intuitive ways to display speaker diarization results. A key criterion in choosing the visualization software was support for interactive use. These tools still have room for improvement, but they are the best options we are currently aware of.
Go to: Visualization tool for Audio-only datasets
Go to: Visualization tool for Audio-visual datasets
<p id="anchor_ao"></p>Visualization for Audio-only datasets
Step 1: Generate the Praat format:
python audio_visualized.py -rttm audio_cases/afjiv.rttm -audio_path audio_cases/afjiv.wav -praat_result audio_cases/afjiv.txt
- rttm --- the reference or system RTTM file
- audio_path --- the path to the audio file
- praat_result --- the output file to open in Praat
(The example is from VoxConverse.)
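The internals of audio_visualized.py are not described here. As a minimal sketch, assuming praat_result is a Praat TextGrid in the long text format with one interval tier per speaker, the conversion could look like the following. parse_rttm, merge, to_textgrid and wav_duration are hypothetical helper names, and wav_duration only handles PCM WAV files that Python's wave module can read.

```python
import wave
from collections import defaultdict


def parse_rttm(lines):
    """Group RTTM SPEAKER lines into {speaker: [(start, end), ...]} in seconds."""
    segments = defaultdict(list)
    for line in lines:
        fields = line.split()
        # RTTM columns: type, file-id, channel, onset, duration, ortho, stype, speaker, ...
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        segments[fields[7]].append((onset, onset + duration))
    return dict(segments)


def merge(intervals):
    """Sort intervals and merge the ones that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def wav_duration(path):
    """Length of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def to_textgrid(segments, total_duration):
    """Render one IntervalTier per speaker in Praat's long TextGrid text format."""
    out = ['File type = "ooTextFile"', 'Object class = "TextGrid"', "",
           "xmin = 0", f"xmax = {total_duration}", "tiers? <exists>",
           f"size = {len(segments)}", "item []:"]
    for tier, speaker in enumerate(sorted(segments), start=1):
        # Praat expects intervals that tile the tier, so gaps get empty labels.
        intervals, cursor = [], 0.0
        for start, end in merge(segments[speaker]):
            start, end = max(start, cursor), min(end, total_duration)
            if start >= end:
                continue
            if start > cursor:
                intervals.append((cursor, start, ""))
            intervals.append((start, end, speaker))
            cursor = end
        if cursor < total_duration:
            intervals.append((cursor, total_duration, ""))
        name = speaker.replace('"', '""')  # Praat escapes quotes by doubling them
        out += [f"    item [{tier}]:", '        class = "IntervalTier"',
                f'        name = "{name}"', "        xmin = 0",
                f"        xmax = {total_duration}",
                f"        intervals: size = {len(intervals)}"]
        for i, (start, end, text) in enumerate(intervals, start=1):
            label = text.replace('"', '""')
            out += [f"        intervals [{i}]:", f"            xmin = {start}",
                    f"            xmax = {end}", f'            text = "{label}"']
    return "\n".join(out) + "\n"
```

For example, parse audio_cases/afjiv.rttm with parse_rttm, then write to_textgrid(segments, wav_duration("audio_cases/afjiv.wav")) to audio_cases/afjiv.txt. Gaps between a speaker's segments are filled with empty intervals because each Praat interval tier must cover the whole time range.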
Step 2: Import praat_result into Praat:
- Install Praat (Mac or Windows).
- Import praat_result into Praat:
  - Open praat_result and the audio file.
    <img src='imgs/praat_import.png' width=50% />
  - Select them all.
  - Click View & Edit to open them together.
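Opening the two files can also be scripted. This is a minimal sketch that assumes a Praat version whose command line accepts the --open option (see Praat's documentation on calling it from the command line) and the install locations guessed below. default_praat_exe, praat_open_command and open_in_praat are hypothetical helpers; selecting both objects and clicking View & Edit is still done by hand.

```python
import subprocess
import sys


def default_praat_exe():
    """Guess the Praat executable for this OS (assumed install locations)."""
    if sys.platform == "darwin":
        return "/Applications/Praat.app/Contents/MacOS/Praat"
    if sys.platform.startswith("win"):
        return "Praat.exe"
    return "praat"


def praat_open_command(praat_result, audio_path, praat_exe=None):
    """Command that starts Praat with the audio and praat_result in the Objects window."""
    return [praat_exe or default_praat_exe(), "--open", audio_path, praat_result]


def open_in_praat(praat_result, audio_path, praat_exe=None):
    # Launches the Praat GUI without waiting for it to exit.
    return subprocess.Popen(praat_open_command(praat_result, audio_path, praat_exe))
```

Building the command separately from launching it keeps the path guessing easy to override, e.g. praat_open_command("audio_cases/afjiv.txt", "audio_cases/afjiv.wav", praat_exe="praat").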
Step 3: Overview

Scroll horizontally to move through the recording. Each speaker has its own timeline, labeled with the speaker name (e.g., spk00, spk01, ...).
Some useful shortcuts:
- Cmd + A (Ctrl + A on Windows): show all utterances on one screen.
- Cmd + N (Ctrl + N on Windows): zoom in to the selected area.
Visualization for Audio-visual datasets
Step 1: Generate the VIA format
python audio_visual_visualized.py -rttm audio_visual_cases/00115.rttm -mp4_path audio_visual_cases/00115.mp4 -via_json_result audio_visual_cases/00115.json
- rttm --- the reference or system RTTM file
- mp4_path --- the path to the MP4 video
- via_json_result --- the output JSON file to open in VIA
(The example is from MSDWild.)
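How audio_visual_visualized.py builds its JSON is not described here. As a minimal sketch, assuming a VIA 3 project stores temporal segments under "metadata" (keys vid, flg, z, xy, av) and a speaker dropdown under "attribute", RTTM segments could be mapped as below. The field names and values (anchor_id "FILE1_Z2_XY0" for temporal segments, type 4 for a select list) are our reading of VIA 3.0.11 project files rather than a documented API, so compare them with a project saved from via_video_annotator_3.0.11.html. rttm_to_via is a hypothetical helper.

```python
def rttm_to_via(rttm_lines, vid="1", attr_id="1"):
    """Map RTTM SPEAKER lines to VIA-3-style 'attribute' and 'metadata' entries.

    Assumed VIA 3.0.11 conventions: anchor_id "FILE1_Z2_XY0" marks a temporal
    segment, type 4 is a select list, and each segment stores [start, end] in "z".
    """
    speakers, metadata = [], {}
    for line in rttm_lines:
        fields = line.split()
        # RTTM columns: type, file-id, channel, onset, duration, ortho, stype, speaker, ...
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        if speaker not in speakers:
            speakers.append(speaker)
        # Assumed: any unique string works as a metadata key; vid + running index here.
        metadata[f"{vid}_{len(metadata)}"] = {
            "vid": vid,
            "flg": 0,
            "z": [round(onset, 3), round(onset + duration, 3)],
            "xy": [],  # no spatial region, only a time span
            "av": {attr_id: str(speakers.index(speaker))},
        }
    attribute = {
        attr_id: {
            "aname": "speaker",
            "anchor_id": "FILE1_Z2_XY0",
            "type": 4,
            "desc": "speaker label from RTTM",
            "options": {str(i): name for i, name in enumerate(speakers)},
            "default_option_id": "",
        }
    }
    return {"attribute": attribute, "metadata": metadata}
```

The returned dictionaries would still need to be merged into a full project skeleton (project, config, file and view sections) and written out with json.dump.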
If the video cannot be previewed, or previews slowly, try converting it to an HTML5-compatible MP4 (H.264 video and AAC audio, with the index moved to the start of the file for fast playback):
ffmpeg -i original.mp4 -vcodec libx264 -acodec aac -preset fast -movflags +faststart previewed.mp4
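When many videos need converting, the same ffmpeg options can be applied to a whole folder. This is a minimal sketch assuming ffmpeg is on PATH; ffmpeg_html5_command and convert_folder are hypothetical helpers, and dst_dir must differ from src_dir so the originals are not overwritten.

```python
import subprocess
from pathlib import Path


def ffmpeg_html5_command(src, dst):
    """Same options as the ffmpeg command above: H.264 video, AAC audio, faststart."""
    return ["ffmpeg", "-i", str(src), "-vcodec", "libx264", "-acodec", "aac",
            "-preset", "fast", "-movflags", "+faststart", str(dst)]


def convert_folder(src_dir, dst_dir):
    """Convert every .mp4 in src_dir into an HTML5-friendly copy in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    if src_dir.resolve() == dst_dir.resolve():
        raise ValueError("dst_dir must differ from src_dir to keep the originals")
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.mp4")):
        # check=True stops at the first file ffmpeg fails on.
        subprocess.run(ffmpeg_html5_command(src, dst_dir / src.name), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.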
Step 2: Import the VIA JSON file (via_json_result) into the VIA tool
- Download via_video_annotator.html from the VIA website, or use the online demo directly. The downloaded HTML file works as an offline client; we have tested version via-3.0.11 (see via_video_annotator_3.0.11.html in this repo).
- Import the JSON file by clicking the folder button as follows:
  <img src='imgs/via_import.png' width=90% />
- You can also modify the script so that videos are loaded from online URLs, e.g., from OSS (Object Storage Service).
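Loading videos from object storage is mentioned above only as a possible script change. As a minimal sketch, assuming that in VIA 3.0.11 project files each "file" entry has fname, loc and src fields and that loc = 2 denotes a remote HTTP(S) location (our reading of the format, to be verified against the VIA source), an existing project JSON could be repointed at a bucket. point_files_to_urls and rewrite_project are hypothetical helpers, and base_url is a placeholder for your own OSS prefix.

```python
import json
from urllib.parse import quote


def point_files_to_urls(project, base_url):
    """Make every VIA 'file' entry load from base_url instead of the local disk."""
    for entry in project.get("file", {}).values():
        entry["loc"] = 2  # assumed VIA 3 code for a remote http(s) URI
        entry["src"] = base_url.rstrip("/") + "/" + quote(entry["fname"])
    return project


def rewrite_project(in_path, out_path, base_url):
    """Load a VIA project JSON, repoint its files, and save it to out_path."""
    with open(in_path, encoding="utf-8") as f:
        project = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(point_files_to_urls(project, base_url), f)
```

File names are URL-quoted so that names with spaces or other special characters still form valid addresses.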
Step 3: Overview

Press Space to play or pause the media.
More keyboard shortcuts are listed here:
<img src='imgs/via_shortcut.png' width=20% />
References
- https://www.fon.hum.uva.nl/praat/
- https://www.robots.ox.ac.uk/~vgg/software/via/
