Audiolizr

A BentoML-powered API to transcribe audio and make sense of it

audiolizr

Audiolizr (audio and analyzer) is an API built and deployed with BentoML that transcribes YouTube videos and extracts the following metadata:

keywords and topics, using the YAKE algorithm
a generated summary, using the T5 Transformer model
named entities (people's names, locations, products, organizations, etc.), using spaCy

This API can be used to produce a summary and additional information that help you understand any YouTube video or audio content.

This service is deployed on AWS EC2 on a GPU-powered g4dn.xlarge instance (see the deployment section for details).

Demo

I've used audiolizr to process this (interesting) TEDx short video.

Here is a demo of audiolizr in Streamlit:

https://user-images.githubusercontent.com/6267065/208546956-a9e97789-2933-40b8-8081-db585b5f5c4d.mov

As shown in the following diagram, the video moves through five runners that:

1. download the audio
2. transcribe the audio into text
3. extract keywords
4. extract named entities
5. summarize the text

Note: runners 1 and 2 are executed sequentially, because transcription needs the downloaded audio; runners 3, 4 and 5 are executed concurrently, because each of them only needs the transcript.
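The execution order described in the note can be sketched with asyncio. This is only an illustration, not the project's actual service code: the five step functions are hypothetical stubs standing in for the real runners, and their names and return values are assumptions.

```python
import asyncio


# Hypothetical stand-ins for the five runners. The real ones would wrap
# an audio downloader, a transcription model, YAKE, spaCy and T5.
async def download_audio(url: str) -> str:
    return f"audio-for:{url}"


async def transcribe(audio: str) -> str:
    return f"transcript of {audio}"


async def extract_keywords(text: str) -> list:
    return ["keyword"]


async def extract_entities(text: str) -> list:
    return [{"entity_text": "example", "entity_label": "MISC"}]


async def summarize(text: str) -> str:
    return text[:20]


async def process(url: str) -> dict:
    # Runners 1 and 2 run one after the other: each needs the previous output.
    audio = await download_audio(url)
    transcript = await transcribe(audio)

    # Runners 3, 4 and 5 depend only on the transcript, so they run concurrently.
    keywords, entities, summary = await asyncio.gather(
        extract_keywords(transcript),
        extract_entities(transcript),
        summarize(transcript),
    )
    return {
        "transcript": transcript,
        "metadata": {
            "keywords": keywords,
            "entities": entities,
            "summary": summary,
        },
    }


if __name__ == "__main__":
    print(asyncio.run(process("https://example.com/video")))
```

With real models behind the stubs, the total latency of the last stage would be roughly that of the slowest of the three steps rather than their sum.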

Here's the JSON output that you'd get at the end of the pipeline:

{
  "transcript": "How much do you get paid? Don't answer that out loud. But put a number in your head. Now, how much do you think the person sitting next to you gets paid? It turns out that pay transparency, sharing salaries openly across a company, makes for a better workplace for both the employee and for the organization. You see, keeping salary secret leads to what economists call information asymmetry. This is a situation where in a negotiation, one party has loads more information than the other. And in hiring or promotion or annual raise discussions, an employer can use that secrecy to save a lot of money. Imagine how much better you could negotiate for a raise if you knew everybody's salary. Now, I realized that letting people know what you make might feel uncomfortable, but isn't it less uncomfortable than always wondering if you're being discriminated against, or if your wife or your daughter or your sister is being paid unfairly? Openness remains the best way to ensure fairness. And pay transparency does that.",
  "metadata": {
    "keywords": [
      [
        "pay transparency",
        0.19212871128874295
      ],
      [
        "information",
        0.27056834789491807
      ],
      [
        "salary",
        0.27854871121244107
      ],
      [
        "raise",
        0.2849695469393418
      ],
      [
        "uncomfortable",
        0.2997431150212997
      ],
      [
        "paid unfairly",
        0.3407466612452491
      ],
      [
        "information asymmetry",
        0.4356297000199267
      ],
      [
        "call information asymmetry",
        0.4715329551423117
      ],
      [
        "sharing salaries openly",
        0.47429797071055785
      ],
      [
        "call information",
        0.49492941874515994
      ]
    ],
    "entities": [
      {
        "entity_text": "one",
        "entity_label": "CARDINAL",
        "start": 438,
        "end": 441
      },
      {
        "entity_text": "annual",
        "entity_label": "DATE",
        "start": 521,
        "end": 527
      }
    ],
    "summary": "If you know everybody's salary, you can save a lot of money. And in hiring or promotion or annual raise discussions, an employer can use that secrecy to save money. Pay transparency, sharing salaries openly across companies, makes for better workplaces for both the employee and for the organization. I realized that letting people know what you make might feel uncomfortable, but isn't it less uncomfortable than always wondering whether your wife or your daughter is being paid unfairly?"
  }
}
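A short sketch of how a client might read this output. It relies on YAKE's convention that a lower score means a more relevant keyword, and it assumes (the source does not confirm this) that `start` and `end` are character offsets into the transcript. The helper names and the shortened `response` payload are made up for illustration; only the field names come from the output above.

```python
def top_keywords(response: dict, n: int = 3) -> list:
    """Return the n most relevant keywords (lowest YAKE score first)."""
    pairs = response["metadata"]["keywords"]
    return [kw for kw, _score in sorted(pairs, key=lambda p: p[1])[:n]]


def entity_spans(response: dict) -> list:
    """Slice each entity out of the transcript using its offsets.

    Assumes `start` and `end` are character offsets into the transcript.
    """
    text = response["transcript"]
    return [
        (e["entity_label"], text[e["start"]:e["end"]])
        for e in response["metadata"]["entities"]
    ]


# Shortened illustrative payload with the same shape as the real output.
response = {
    "transcript": "In a negotiation, one party has more information.",
    "metadata": {
        "keywords": [["information", 0.27], ["pay transparency", 0.19]],
        "entities": [
            {
                "entity_text": "one",
                "entity_label": "CARDINAL",
                "start": 18,
                "end": 21,
            }
        ],
        "summary": "",
    },
}

print(top_keywords(response))
print(entity_spans(response))
```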

Dependencies

Run locally

Run the following commands to start a fresh environment with the needed dependencies:

cd audiolizr/
pipenv install 
pipenv shell

# install whisper with pip
pip install git+https://github.com/openai/whisper.git
# install spacy language model
python -m spacy download en_core_web_md

To serve the API locally, run the following command.

cd src/
bentoml serve service:svc --reload
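Once the server is running, a client could call it over HTTP. The sketch below uses only the standard library. Port 3000 is BentoML's default, but the route name `/process` and the `{"url": ...}` request body are assumptions made for illustration; check the API definition in service.py for the real route and input format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # BentoML's default serving port
ENDPOINT = "/process"  # hypothetical route, not confirmed by the README


def build_request(video_url: str) -> urllib.request.Request:
    """Build a POST request with the video URL in a JSON body (assumed format)."""
    body = json.dumps({"url": video_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(video_url: str, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response (server must be running)."""
    with urllib.request.urlopen(build_request(video_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The generous timeout is a guess: downloading and transcribing a video can take a while, so a short client timeout could cut the request off before the pipeline finishes.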

To serve the API in production mode with multiple API workers, run the following command. Keep --api-workers low, since every additional worker uses more RAM.

cd src/
bentoml serve service:svc --production --api-workers 2

If everything works as expected, build the bento to prepare the deployment:

cd src/
bentoml build

Here's what you'll see when it's done:

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/ahmedbesbes/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Building BentoML service "speech_to_text_pipeline:m57a6etzlg4imhqa" from build context "/Users/ahmedbesbes/Documents/perso/whisper/src".

██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝

Successfully built Bento(tag="speech_to_text_pipeline:m57a6etzlg4imhqa").

Once the bento is built, create a Docker image from it with this command:

bentoml containerize speech_to_text_pipeline:m57a6etzlg4imhqa

This runs multiple steps to build the Docker image:

Building OCI-compliant image for speech_to_text_pipeline:m57a6etzlg4imhqa with docker

[+] Building 30.7s (20/20) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                         0.0s
 => => transferring dockerfile: 2.52kB                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                            0.0s
 => => transferring context: 2B                                                                                                                                              0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04                                                                                     1.1s
 => [base-container  1/15] FROM docker.io/nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04@sha256:812fe80b7123467f5d6c746bd5d7cbd3b96f385c3c6a57a532b21617ad433858              0.0s
 => [internal] load build context                                                                                                                                            0.0s
 => => transferring context: 24.82kB                                                                                                                                         0.0s
 => CACHED [base-container  2/15] RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache     0.0s
 => CACHED [base-container  3/15] RUN --mount=type=cache,target=/var/lib/apt --mount=type=cache,target=/var/cache/apt set -eux &&     apt-get update -y &&     apt-get inst  0.0s
 => CACHED [base-container  4/15] RUN --mount=type=cache,target=/var/lib/apt --mount=type=cache,target=/var/cache/apt     set -eux &&     apt-get install -y --no-install-r  0.0s
 => CACHED [base-container  5/15] RUN ln -sf /usr/bin/python3.9 /usr/bin/python3 &&     ln -sf /usr/bin/pip3.9 /usr/bin/pip3                                                 0.0s
 => CACHED [base-container  6/15] RUN curl -O https://bootstrap.pypa.io/get-pip.py &&     python3 get-pip.py &&     rm -rf get-pip.py                                        0.0s
 => CACHED [base-container  7/15] RUN groupadd -g 1034 -o bentoml && useradd -m -u 1034 -g 1034 -o -r bentoml                                                                0.0s
 => CACHED [base-container  8/15] RUN mkdir /home/bentoml/bento && chown bentoml:bentoml /home/bentoml/bento -R                                                              0.0s
 => CACHED [base-container  9/15] WORKDIR /home/bentoml/bento                                                                                                                0.0s
 => [base-container 10/15] COPY --chown=bentoml:bentoml . ./                                                                                                                 0.0s
 => [base-container 11/15] RUN --mount=type=cache,target=/root/.cache/pip bash -euxo pipefail /home/bentoml/bento/env/python/install.sh                                     20.6s
 => [base-container 12/15] RUN chmod +x /home/bentoml/bento/env/docker/setup_script                                                                                          0.2s
 => [base-container 13/15] RUN /home/bentoml/bento/env/docker/setup_script                                                                                                   6.1s
 => [base-container 14/15] RUN rm -rf /var/lib/{apt,cache,log}                                                                                                               0.2s
 => [base-container 15/15] RUN chmod +x /home/bentoml/bento/env/docker/entrypoint.sh
