formantanalyzer.js

npm GitHub branch checks state NPM

A JS Web API based spectrum analyzer for speech and music analysis. It can be used for labeling or feature extraction.

Preview

Demo

Bare minimum starter file: tabahi.github.io/formantanalyzer.js

Related projects:

Installation

Using JS Script

Load the javascript module as

<script src="https://unpkg.com/formantanalyzer@1.1.8/index.js"></script>

Then use the entry point FormantAnalyzer to use the imported javascript library.

Using Webpack Library

Prerequisites: Node.js, check versions >12 for node -v and >6 for npm -v.

For importing the JS package in webpack project: Install formantanalyzer using npm

npm i formantanalyzer
:: or
npm install formantanalyzer --save-dev

const FormantAnalyzer = require('formantanalyzer'); //after npm installation
FormantAnalyzer.configure(launch_config);
FormantAnalyzer.LaunchAudioNodes(2, webAudioElement, call_backed_function, ['file_label'], false, false);

If you are creating a webpack project from scratch by just using JS files from ./src then first initialize a new webpack project by npx webpack and create a new webconfig.js in project root directory. See instructions here.

Usage

First, configure the fomant analyzer. Pass the #SpectrumCanvas element if plot is enabled. Pass null if no need for plot. See 'index.html' for a simple example.

HTML:

<div id="canvas_div">
    <canvas id="SpectrumCanvas" width="1200" height="300" ></canvas>
</div>

In javascript:

/*Using <script src="https://unpkg.com/formantanalyzer@1.1.8/index.js"></script>
Can also import in webpack as:
const FormantAnalyzer = require('formantanalyzer');
*/

function Configure_FormantAnalyzer()
{
    const BOX_HEIGHT = 300;
    const BOX_WIDTH = window.screen.availWidth - 50;
    document.getElementById('SpectrumCanvas').width = BOX_WIDTH;    //reset the size of canvas element
    document.getElementById('SpectrumCanvas').height = BOX_HEIGHT;
    
    let launch_config = { plot_enable: true,
    spec_type: 1,       //see below
    output_level: 2, //see below
    plot_len: 200, f_min: 50, f_max: 4000,
    N_fft_bins: 256,
    N_mel_bins: 128,
    window_width: 25, window_step: 15,
    pause_length: 200, min_seg_length: 50,
    auto_noise_gate: true, voiced_min_dB: 10, voiced_max_dB: 100,
    plot_lag: 1, pre_norm_gain: 1000, high_f_emph: 0.0,
    plot_canvas: document.querySelector('#SpectrumCanvas').getContext('2d'),
    canvas_width: BOX_WIDTH,
    canvas_height: BOX_HEIGHT };

    FormantAnalyzer.configure(launch_config);
}

Initialize an Audio Element, or local audio file binary element, or null if using the mic stream. Then pass it to LaunchAudioNodes with suitable parameters. See 'index.html' for examples for local audio file and mic streaming.


var webAudioElement = new Audio("./audio_file.mp3");
/*Parameters:*/
const context_source = 2;   //1: Local file binary, 2: play from a web Audio, 3: mic
const test_mode = true; //plots only, it does not return callback
const offline = false;  //play on speakers, set true to play silently
const file_labels =[];     //array of labels that will be passed to callback after feature extraction

/* Wait for audio file to load */
webAudioElement.addEventListener("canplaythrough", event => {
    /* Launch Audio Nodes */
    FormantAnalyzer.LaunchAudioNodes(context_source, webAudioElement,
                    callback, file_labels, offline, test_mode).then(function()
        {
            console.log("Done");
        }).catch((err)=>{
            console.log(err);
        });
});

callback(seg_index, file_labels, seg_time, features)
{
//callback function to which extracted features are passed
}

`LaunchAudioNodes()`

Returns: This function returns a promise as resolve(true) after playback is finished or reject(err) if there is an error. If you want an abrupt stop, then call the FormantAnalyzer.stop_playing("no reason") function. Then this function will return resolve("no reason"). Different audio contexts are buffered/streamed differently, therefore each has a separate function in AudioNodes.js.

Parameters:

context_source (int):

1 --- Play from a locally loaded file (pass an audio binary as source_obj).
2 --- Play from an Audio element (pass an Audio object as source_obj)
3 --- Stream from mic

source_obj (object): Source audio object.

If context_source==1 (playing from a local file) then pass a binary of file. Get binary from FileReader as: FileReader.onload (e)=>(binary = e.target.result). See index.html.
If context_source==2 (playing from a web address) then pass an Audio object.
If context_source==3 (playing from mic Pass null). See index.html.

callback: It is the callback function to be called after each segment ends. It should accept 4 variables; segment_index, segment_time_array, segment_labels_array, segment_features_array. Callback is called asynchronously, so there might be a latency between audio play and it's respective callback, that's why it's important to send the labels to async segmentor function.

file_labels (Array): It is an array of labels for currently playing file. It is returned as it is to the callback function. It is used to avoid the label mismatch during slow async processing in case if a new file is playing, but the callback sends the output of the previous one. Sometimes callback is called with a delay of 2 seconds, so it helps to keep track which file was playing 2 seconds ago. e.g., file_labels=['filename.wav', 'Angry']

offline_mode (boolean): If true then the locally loaded files will be played silently in an offline buffer.

test_play (boolean) set it true to avoid calling the callback. Plots and AudioNodes will still work as it is, but there will be no call backs. It can be enabled to test plotting or re listening.

play_offset and play_duration are in seconds to play a certain part of the file, otherwise pass null.

As an example, callback function in WebSpeechAnalyzer app looks like this:


async function callback(seg_index, seg_label, seg_time, features)
{
    if(launch_config.output_level == 13) //Syllable 53x statistical features for each segment
    {
        if(settings.collect)
        for(let segment = 0; segment < features.length; segment++ ) 
        {
            storage_mod.StoreFeatures(launch_config.output_level, settings.DB_ID, (seg_index + (segment/100)), seg_label, seg_time[segment], features[segment]);
        }
        
        if(settings.plot_enable && settings.predict_en)
        {
            pred_mod.predict_by_multiple_syllables(settings.predict_type, settings.predict_label, seg_index, features, seg_time);
        }
    }
}

Different types of features are returned to the callback(seg_index, seg_label, seg_time, features) at different output levels. Variables for callback:

seg_index the index of segment since the play start. Segments are separated by significant pauses, the first segment of each file starts has seg_index = 0.
seg_label is the same as file_labels passed to LaunchAudioNodes().
seg_time is an array of two elements [start_time, total_duration] at segment level. At syllable level, it is a 2D array of shape (syllables, 2), giving [start_time, total_duration] for each syllable separately.
features is the array of extracted features that is of different shape at different output_level. Detailed descriptions for each level are given below.

`configure()`

Before playing any audio source using LaunchAudioNodes(), the audio nodes must be configured otherwise, default launch_config settings will be assumed.

let launch_config = { plot_enable: true,
    spec_type: 1,
    output_level: 4,
    plot_len: 200,
    f_min: 50,
    f_max: 4000,
    N_fft_bins: 256,
    N_mel_bins: 128,
    window_width: 25,
    window_step: 15,
    pause_length: 200,
    min_seg_length: 50,
    auto_noise_gate: true,
    voiced_min_dB: 10,
    voiced_max_dB: 100,
    plot_lag: 1,
    pre_norm_gain: 1000,
    high_f_emph: 0.0,
    plot_canvas: document.querySelector('#SpectrumCanvas').getContext('2d'),
    canvas_width: 900,
    canvas_height: 300 };

    FormantAnalyzer.configure(launch_config);

Available spec_type options:

1 = Mel-spectrum
2 = Power Spectrum
3 = Discrete FFT

Available output_level options:

1 = Bars (no spectrum, only the last filter bank)
2 = Spectrum
3 = Segments
4 = Segment Formants / segment
5 = Segment Features 53x / segment [ML]
10 = Syllable Formants / syllable
11 = Distributions 264x / file [ML]
12 = Syllable Curves 23x / syllable [ML]
13 = Syllable Features 53x / syllable [ML]

Output Level Descriptions:

Different features are returned to the callback(seg_index, seg_label, seg_time, features) at different output levels. The shape of seg_time also differs from (2) to (syllable, 2) for segment vs syllable. The shape of `feature

Formantanalyzer.js

Install / Use

README