SpeechToSpectogram
ECSE 523 Project. Records audio, turns it into a spectrogram, and can optionally run automatic speech recognition (ASR) on it.
Installation
Unfortunately, first-time installation is a bit of a hassle.
The Easy Way:
-Install Miniconda or Anaconda. Miniconda is smaller, and packages can be added as they are needed.
Be sure to select the Python 3.5 64-bit installer. During installation, let it add conda to your PATH (this should be checked by default).
Open a Windows command prompt and create an environment:
conda create -n py3k python
It will ask whether you are okay with installing a few packages. Type y.
Activate the environment:
activate py3k
Install numpy
conda install numpy
Install matplotlib
conda install matplotlib
Install pyaudio
pip install pyaudio
Install Speech Recognition library.
pip install SpeechRecognition
Now navigate to wherever you downloaded SpeechToSpectogram (either the zip from GitHub or a git clone) and type:
python main.py
The harder way
-Install Python 3.4+ (Can be found here: https://www.python.org/downloads/)
-Install pyaudio, numpy, matplotlib and SpeechRecognition
Commands to do so:
pip install pyaudio
pip install SpeechRecognition
pip does not work so well for numpy and possibly Matplotlib as well. You can try (this probably works on Linux):
pip install numpy
pip install matplotlib
but at least for numpy, you'll probably have to work a bit harder. The NumPy website has a few instructions. SourceForge has an older version with an installer, but I couldn't get the installer to work. You might have better luck if your Python install is properly registered in the Windows registry.
Then download the zip and extract it (or git clone the repository), go into the directory, and run python main.py
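Before launching main.py, it can help to confirm that all four dependencies are importable. The following is a minimal sketch, not part of the project: the helper name missing_modules is made up for illustration, and the import names (pyaudio, speech_recognition) are the ones those pip packages are normally imported under.

```python
import importlib.util

# Import names of the packages installed above. Note that the
# SpeechRecognition package is imported as "speech_recognition".
REQUIRED = ["numpy", "matplotlib", "pyaudio", "speech_recognition"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it from inside the same environment (for example after activate py3k) so that it checks the interpreter you will actually use.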
Usage
Running python main.py should open the main settings window.

This window lets you tune the basic parameters of the spectrogram and choose which plots to show. If you haven't run the program before, you first need to create a recording: press the Record button to record some audio. While the button appears pressed, your microphone is on and recording. The duration of the recording is set by the recording length input.
After the button returns to normal, you can press the Generate button to create plots. These plots are made from a wav file called output.wav in the tmp directory.
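If you want to inspect the recording outside the program, the WAV file can be read with Python's standard library. This is a minimal sketch under stated assumptions: it assumes the file is 16-bit PCM (the project's actual sample format is not documented here), and load_wav_mono is a hypothetical helper, not a function from the project.

```python
import struct
import wave


def load_wav_mono(path):
    """Return (sample_rate, samples) from a 16-bit PCM WAV file.

    If the file has several channels, only the first one is kept.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # "<h" is a little-endian signed 16-bit integer, the PCM sample layout.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Samples are interleaved by channel, so stepping by the channel
    # count picks out the first channel.
    return sample_rate, list(values[::channels])
```

After making a recording, calling load_wav_mono("tmp/output.wav") from the project directory should return the sample rate and the raw samples that the plots are built from.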
Note: A reference for the spectrogram colormaps can be found here. Reversed colormaps are not currently supported.
If you check the Use ASR option, the program will pass the audio recording to a speech recognition service (currently only Google is supported) and display the result below the spectrogram. Note that this is a network call that takes a little while, so the program will hang for a few seconds while it completes.

Note that more than one candidate transcription is displayed (mostly for educational reasons), along with a confidence value as determined by the speech recognition service. When the confidence is low, no number is displayed because the service doesn't send one.
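The handling of a missing confidence value can be sketched as follows. This is an illustration only, not the project's code: format_alternatives is a hypothetical helper, and the response shape (a dict with an "alternative" list whose entries carry a "transcript" and sometimes a "confidence") is an assumption modeled on what the SpeechRecognition library's recognize_google returns when called with show_all=True.

```python
def format_alternatives(response):
    """Turn a recognizer response into a list of display lines."""
    # An empty response means nothing was recognized.
    if not response:
        return []
    lines = []
    for alt in response.get("alternative", []):
        text = alt.get("transcript", "")
        confidence = alt.get("confidence")
        if confidence is None:
            # The service sent no confidence for this candidate,
            # so only the text is shown.
            lines.append(text)
        else:
            lines.append("%s (confidence: %.2f)" % (text, confidence))
    return lines


# Made-up sample data in the assumed shape, for illustration only.
sample = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.92},
        {"transcript": "hello word"},
    ],
    "final": True,
}
for line in format_alternatives(sample):
    print(line)
```

Keeping the formatting separate from the network call makes it easy to try out with canned data like the sample above.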
If you select Create freq grid, a 3x3 grid of 9 plots showing the frequency spectrum at different time intervals will also be displayed. Note that the y-axis is 10 log10 of the amplitude, i.e. the traditional decibel formula (and not the 20 log10 form common in EE). The maximum frequency of each plot in the grid is determined by the Max Freq parameter (this also controls the maximum y-axis frequency of the spectrogram).
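The y-axis scaling and the Max Freq cutoff described above can be sketched with NumPy. This is a rough illustration under stated assumptions, not the project's implementation: spectrum_db is a hypothetical name, the use of a plain real FFT without windowing is an assumption, and the small eps term is added here only to avoid taking the log of zero.

```python
import numpy as np


def spectrum_db(segment, sample_rate, max_freq):
    """Return (freqs, level_db) for one segment, cut off at max_freq."""
    segment = np.asarray(segment, dtype=float)
    # Magnitude of the one-sided spectrum of a real signal.
    amplitude = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    keep = freqs <= max_freq
    eps = 1e-12  # guards against log10(0) for empty bins
    # 10 * log10 of the amplitude, as described in the text
    # (the 20 * log10 convention would double every value).
    level_db = 10.0 * np.log10(amplitude[keep] + eps)
    return freqs[keep], level_db


# A one-second 1000 Hz test tone sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
freqs, level_db = spectrum_db(tone, sample_rate, max_freq=2000)
peak_freq = freqs[np.argmax(level_db)]
print("strongest component at", peak_freq, "Hz")
```

For the test tone, the strongest bin lands at the tone's frequency, and no bin above max_freq is returned, which mirrors how the Max Freq parameter limits the plotted range.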

If you select Create time grid, a 3x3 grid of 9 plots showing time vs amplitude (raw signal, not log) will be displayed.
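One simple way to obtain nine time slices for such a grid is to cut the recording into nine consecutive segments of nearly equal length. The sketch below assumes that splitting strategy (the program may choose its intervals differently), and split_into_segments is a hypothetical helper.

```python
def split_into_segments(samples, count=9):
    """Split a sequence into `count` consecutive, nearly equal segments."""
    n = len(samples)
    # Integer boundaries 0, n/count, 2n/count, ..., n (rounded down).
    bounds = [i * n // count for i in range(count + 1)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(count)]


segments = split_into_segments(list(range(20)))
print([len(segment) for segment in segments])
```

Each segment could then be drawn in one cell of a 3x3 figure, for example one created with matplotlib.pyplot.subplots(3, 3).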

Each plot comes with a fair few Matplotlib controls in the bottom left corner. You can use these to zoom in and adjust the plots a little to your liking.
Pressing the Close button on the original window will close all the plots.
