AudioSegment

Wrapper for pydub AudioSegment objects

Generate Convert Improve

Install / Use

/learn @MaxStrange/AudioSegment

About this skill

Quality Score

0/100

README

AudioSegment

Wrapper for pydub AudioSegment objects. An audiosegment.AudioSegment object wraps a pydub.AudioSegment object. Any methods or properties it has, this also has.

Docs are hosted by GitHub Pages, but are currently hideous. I've got to do something about them as soon as I find some time. You can also try Read The Docs, though the docs there don't seem to be building for some reason.... also something I need to look into. Up-to-date docs are also built and pushed and are in the docs folder of this repository.

Notes

There is a hidden dependency on the command line program 'sox'. Pip will not install it for you. You will have to install sox by:

Debian/Ubuntu: sudo apt-get install sox
Mac OS X: brew install sox
Windows: choco install sox

Also, I use librosa and scipy, for some of the functionality. These dependencies are hefty, and I have decided to make them optional. If you do not install them, you may get warnings when using audiosegment.

So, a full installation on Debian/Ubuntu would like like this:

sudo apt-get install sox
pip3 install --user audiosegment

# To get scipy, you will need some lapack/blas resources:
sudo apt-get install libatlas-base-dev gfortran
pip3 install --user scipy

# To get librosa, you will need numba, which requires LLVMlite, which requires LLVM.
sudo apt-get install llvm
pip3 install --user librosa

Make suitable adjustments to fit your own OS's package management system.

TODO

The following is the list of items I plan on implementing.

Finish implementing auditory scene analysis (a.k.a blind source separation)
Add voice-pass filtering and make voice activity detection better
Add language classification for English and Chinese (and show how to do it for other languages)
Add more examples to README (especially filterbank)
Finish removing the SOX dependency

I am open to other suggestions. Open an issue if you have requests, or better yet, if you can do it yourself and open a pull request, I'll take a look and merge in if I think it makes sense.

Example Usage

Basic information

import audiosegment

print("Reading in the wave file...")
seg = audiosegment.from_file("whatever.wav")

print("Information:")
print("Channels:", seg.channels)
print("Bits per sample:", seg.sample_width * 8)
print("Sampling frequency:", seg.frame_rate)
print("Length:", seg.duration_seconds, "seconds")

Voice Detection

# ...
print("Detecting voice...")
seg = seg.resample(sample_rate_Hz=32000, sample_width=2, channels=1)
results = seg.detect_voice()
voiced = [tup[1] for tup in results if tup[0] == 'v']
unvoiced = [tup[1] for tup in results if tup[0] == 'u']

print("Reducing voiced segments to a single wav file 'voiced.wav'")
voiced_segment = voiced[0].reduce(voiced[1:])
voiced_segment.export("voiced.wav", format="WAV")

print("Reducing unvoiced segments to a single wav file 'unvoiced.wav'")
unvoiced_segment = unvoiced[0].reduce(unvoiced[1:])
unvoiced_segment.export("unvoiced.wav", format="WAV")

Silence Removal

import matplotlib.pyplot as plt

# ...
print("Plotting before silence...")
plt.subplot(211)
plt.title("Before Silence Removal")
plt.plot(seg.get_array_of_samples())

seg = seg.filter_silence(duration_s=0.2, threshold_percentage=5.0)
outname_silence = "nosilence.wav"
seg.export(outname_silence, format="wav")

print("Plotting after silence...")
plt.subplot(212)
plt.title("After Silence Removal")

plt.tight_layout()
plt.plot(seg.get_array_of_samples())
plt.show()

alt text

FFT

import matplotlib.pyplot as plt
import numpy as np

#...
# Do it just for the first 3 seconds of audio
hist_bins, hist_vals = seg[1:3000].fft()
hist_vals_real_normed = np.abs(hist_vals) / len(hist_vals)
plt.plot(hist_bins / 1000, hist_vals_real_normed)
plt.xlabel("kHz")
plt.ylabel("dB")
plt.show()

alt text

Spectrogram

import matplotlib.pyplot as plt

#...
freqs, times, amplitudes = seg.spectrogram(window_length_s=0.03, overlap=0.5)
amplitudes = 10 * np.log10(amplitudes + 1e-9)

# Plot
plt.pcolormesh(times, freqs, amplitudes)
plt.xlabel("Time in Seconds")
plt.ylabel("Frequency in Hz")
plt.show()

alt text

Related Skills

qqbot-channel

351.2k

QQ 频道管理技能。查询频道列表、子频道、成员、发帖、公告、日程等操作。使用 qqbot_channel_api 工具代理 QQ 开放平台 HTTP 接口，自动处理 Token 鉴权。当用户需要查看频道、管理子频道、查询成员、发布帖子/公告/日程时使用。

claude-opus-4-5-migration

110.6k

Migrate prompts and code from Claude Sonnet 4.0, Sonnet 4.5, or Opus 4.1 to Opus 4.5

docs-writer

100.5k

`docs-writer` skill instructions As an expert technical writer and editor for the Gemini CLI project, you produce accurate, clear, and consistent documentation. When asked to write, edit, or revie

model-usage

351.2k

Use CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.

MaxStrange

View profile

View on GitHub

GitHub Stars96

CategoryContent

Updated1y ago

Forks14

MaxStrange/AudioSegment

Languages

Python

Security Score

85/100

Audited on Dec 20, 2024

No findings