ViSQOL

ViSQOL (Virtual Speech Quality Objective Listener) is an objective, full-reference metric for perceived audio quality. It uses a spectro-temporal measure of similarity between a reference and a test speech signal to produce a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score. MOS-LQO scores range from 1 (the worst) to 5 (the best).

Features
Build
Command Line Usage
API Usage
Dependencies
License
Papers
FAQ
Acknowledgement

Guidelines

ViSQOL can be run from the command line, or integrated into a project and used through its C++ or Python APIs. In either case, ViSQOL can run in two modes:

Audio Mode:

When running in audio mode, input signals must have a 48kHz sample rate. Input at any other sample rate should be resampled to 48kHz first.
Input signals can be multi-channel, but they will be down-mixed to mono for performing the comparison.
Audio mode uses support vector regression, with the maximum range at ~4.75.
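
The down-mix step can be pictured with a short sketch. It assumes a plain average across the channels of interleaved samples; the function name `downmix_to_mono` is hypothetical, and ViSQOL's internal down-mix may differ.

```python
from typing import List, Sequence


def downmix_to_mono(interleaved: Sequence[float], num_channels: int) -> List[float]:
    """Average interleaved multi-channel samples into one mono channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(interleaved) % num_channels != 0:
        raise ValueError("sample count is not a multiple of num_channels")
    mono = []
    for start in range(0, len(interleaved), num_channels):
        # One frame holds one sample per channel.
        frame = interleaved[start:start + num_channels]
        mono.append(sum(frame) / num_channels)
    return mono


# Two stereo frames: (L=0.5, R=-0.5) and (L=1.0, R=0.0).
print(downmix_to_mono([0.5, -0.5, 1.0, 0.0], num_channels=2))  # [0.0, 0.5]
```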

Speech Mode:

When running in speech mode, ViSQOL uses a wideband model. It therefore expects an input sample rate of 16kHz. Input at any other sample rate should be resampled to 16kHz first.
As part of the speech mode processing, root-mean-square (RMS) voice activity detection is performed on the reference signal to determine which parts of the signal contain voice activity and should therefore be included in the comparison. The signal is normalized before the voice activity detection is performed.
Input signals can be multi-channel, but they will be down-mixed to mono for performing the comparison.
Speech mode is scaled to have a maximum MOS of 5.0 to match previous version behavior.
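
To illustrate the idea of RMS-based voice activity detection, here is a minimal sketch. The peak normalization, the fixed frame length and the threshold are all assumptions made for illustration, and `rms_vad` is a hypothetical name; ViSQOL's own detector may use different choices.

```python
import math
from typing import List, Sequence


def rms_vad(signal: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Return one flag per full frame: True when the frame's RMS exceeds threshold.

    frame_len and threshold are hypothetical parameters for illustration.
    """
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        # An all-zero (or empty) signal has no activity in any frame.
        return [False] * (len(signal) // frame_len)
    normalized = [s / peak for s in signal]  # peak-normalize before detection
    flags = []
    for start in range(0, len(normalized) - frame_len + 1, frame_len):
        frame = normalized[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        flags.append(rms > threshold)
    return flags


# Silence, then a loud burst, then very quiet noise.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.01] * 4
print(rms_vad(signal, frame_len=4, threshold=0.1))  # [False, True, False]
```

Only frames flagged `True` would be carried into the comparison in this sketch.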

General guidelines for input

ViSQOL was trained with data from subjective tests that roughly follow industry standards, such as ITU-T Rec. P.863. As a result certain assumptions are made, and your input to ViSQOL should probably have these properties:

The input audio files should be approximately 8-10 seconds long, with little silence within them and around 0.5s of silence before and after the audible part.
When comparing audio from different sources, be aware of sample rate on the files. If you compare the result from a 16kHz file and a 48kHz file with very similar content, the scores can be quite different.
The reference audio should be clean and of equal or higher quality than the degraded audio.
ITU-T P.800 describes a standard listening test to measure MOS. It makes various recommendations about the audio and environment that may be useful as a reference.
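
The duration and sample rate guidelines above can be checked before running a comparison. The sketch below uses Python's standard `wave` module, which reads PCM WAV files only; `check_wav` is a hypothetical helper and is not part of ViSQOL.

```python
import wave


def check_wav(path: str, expected_rate: int) -> list:
    """Return a list of human-readable warnings for a PCM WAV file."""
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate  # length in seconds
    if rate != expected_rate:
        warnings.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if not 8.0 <= duration <= 10.0:
        warnings.append(f"duration is {duration:.1f} s, outside the 8-10 s guideline")
    return warnings
```

For example, `check_wav("ref1.wav", 48000)` would return an empty list for a 9 second, 48kHz file, and one warning per violated guideline otherwise.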

General guidelines for interpreting the output

Single scores are not very meaningful. Rather, scores should be aggregated over several samples that received the same treatment.
The choice of audio mode vs speech mode can have large effects on the output.
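
One way to aggregate is to average the MOS-LQO over all samples that share a treatment. The following is a small sketch of that idea; the treatment labels and the helper name `mean_score_by_treatment` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_treatment(scores):
    """Average MOS-LQO scores over all samples that share a treatment.

    scores is an iterable of (treatment, moslqo) pairs.
    """
    grouped = defaultdict(list)
    for treatment, moslqo in scores:
        grouped[treatment].append(moslqo)
    return {treatment: mean(values) for treatment, values in grouped.items()}


samples = [("codec_a", 4.0), ("codec_a", 3.0), ("codec_b", 2.5)]
print(mean_score_by_treatment(samples))  # {'codec_a': 3.5, 'codec_b': 2.5}
```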

Build

Linux/Mac Build Instructions

Install Bazel

Bazel can be installed following the instructions for Linux or Mac.
Tested with Bazel version 5.1.0.

Install Numpy

NumPy can be installed with pip install numpy

Build ViSQOL

Change directory to the root of the ViSQOL project (i.e. where the WORKSPACE file is) and run the following command: bazel build :visqol -c opt

Windows Build Instructions (Experimental, last tested on Windows 10 x64, 2020 August)

Install Bazel

Bazel can be installed for Windows from here.
Tested with Bazel version 5.1.0.

Install git

git for Windows can be obtained from the official git website.
When installing, select the option that allows git to be accessed from the system shells.

Install Tensorflow dependencies

Follow the instructions detailed here to install tensorflow build dependencies for windows.

Build ViSQOL:

Change directory to the root of the ViSQOL project (i.e. where the WORKSPACE file is) and run the following command: bazel build :visqol -c opt

Command Line Usage

Note Regarding Usage

When run from the command line, input signals must be in WAV format.

Flags

--reference_file

The WAV file used as the reference audio, sampled at 48k in audio mode or 16k in speech mode.

--degraded_file

The WAV file that will be compared to the reference audio. It must have the same sample rate as the reference.

--batch_input_csv

Used to specify a path to a CSV file with the format:

reference,degraded
ref1.wav,deg1.wav
ref2.wav,deg2.wav

If the batch_input_csv flag is used, the reference_file and degraded_file flags will be ignored.
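
A batch input file in this layout can be generated with a few lines of Python. This is only a sketch: `write_batch_csv` is a hypothetical helper, and the choice of plain `\n` line endings is an assumption made to match the format shown above.

```python
import csv


def write_batch_csv(path, pairs):
    """Write (reference, degraded) path pairs in the --batch_input_csv layout."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["reference", "degraded"])  # header row
        writer.writerows(pairs)
```

Calling `write_batch_csv("input.csv", [("ref1.wav", "deg1.wav"), ("ref2.wav", "deg2.wav")])` produces a file that can then be passed to `--batch_input_csv`.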

--results_csv

Used to specify a path that the similarity score results will be output to. This will be a CSV file with the format:

reference,degraded,moslqo
ref1.wav,deg1.wav,3.4
ref2.wav,deg2.wav,4.1
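
The results file can be loaded back for further analysis. The sketch below assumes the header row is exactly `reference,degraded,moslqo` as shown above; `read_results_csv` is a hypothetical helper name.

```python
import csv


def read_results_csv(path):
    """Parse a --results_csv file into (reference, degraded, moslqo) tuples."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)  # uses the first row as column names
        return [
            (row["reference"], row["degraded"], float(row["moslqo"]))
            for row in reader
        ]
```

For the example file above, `read_results_csv("results.csv")` would yield two tuples with scores 3.4 and 4.1.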

--verbose

The reference file path, degraded file path and the MOS-LQO values will be output to the console after the MOS-LQO has been calculated, along with similarity scores on a per-patch and per-frequency band basis.

--output_debug

Used to specify a file path to which debug information will be written. This debug info contains the full details of the comparison between the reference and degraded audio signals and is in JSON format. The file does not need to exist beforehand. Contents will be appended to the file if it already exists or if ViSQOL is run in batch mode.

--similarity_to_quality_model

The lattice or libsvm model to use during comparison. Use this only if you want to explicitly specify the model file location, otherwise the default model will be used.

--use_speech_mode

Use a wideband model (sensitive up to 8kHz) with voice activity detection. This mode normalizes the polynomial NSIM->MOS mapping so that a perfect NSIM score of 1.0 translates to a MOS of 5.0.

--use_unscaled_speech_mos_mapping

When used in conjunction with --use_speech_mode, this flag will prevent a perfect NSIM score of 1.0 being translated to a MOS score of 5.0. Perfect NSIM scores will instead result in MOS scores of ~4.x.
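
The difference between the scaled and unscaled mappings can be illustrated numerically. The sketch below assumes the normalization is a simple multiplicative rescale, and the polynomial coefficients are hypothetical values chosen so that a perfect NSIM maps to 4.5; they are not ViSQOL's fitted coefficients.

```python
# Hypothetical coefficients (constant term first); not ViSQOL's fitted values.
COEFFS = (1.0, 1.5, 2.0)


def unscaled_mos(nsim, coeffs=COEFFS):
    """Evaluate the polynomial NSIM -> MOS mapping with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * nsim + c
    return result


def scaled_mos(nsim, coeffs=COEFFS, max_mos=5.0):
    """Rescale the mapping so that a perfect NSIM of 1.0 yields max_mos."""
    return unscaled_mos(nsim, coeffs) * max_mos / unscaled_mos(1.0, coeffs)


print(unscaled_mos(1.0))  # 4.5 with these illustrative coefficients
print(scaled_mos(1.0))    # 5.0
```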

--use_lattice_model

(default: true) Use a deep lattice network model to map similarity to quality. This produces more accurate results for speech (audio mode is not yet supported).

Example Command Line Usage

To compare two files and output their similarity to the console:

Linux/Mac:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --verbose

Windows:

bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --verbose

To compare all reference-degraded file pairs in a CSV file, outputting the results to another file and also outputting additional "debug" information:

Linux/Mac:

./bazel-bin/visqol --batch_input_csv input.csv --results_csv results.csv --output_debug debug.json

Windows:

bazel-bin\visqol.exe --batch_input_csv "input.csv" --results_csv "results.csv" --output_debug "debug.json"

To compare two files using scaled speech mode and output their similarity to the console:

Linux/Mac:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --use_speech_mode --verbose

Windows:

bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --use_speech_mode --verbose

To compare two files using unscaled speech mode and output their similarity to the console:

Linux/Mac:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --use_speech_mode --use_unscaled_speech_mos_mapping --verbose

Windows:

bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --use_speech_mode --use_unscaled_speech_mos_mapping --verbose
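
When scripting many comparisons, it can help to assemble the argument list programmatically. The sketch below only uses the flags documented above; `build_visqol_command` is a hypothetical helper, and the default binary path assumes the Linux/Mac build location.

```python
def build_visqol_command(reference, degraded, binary="./bazel-bin/visqol",
                         speech_mode=False, unscaled=False, verbose=False):
    """Assemble an argument list from the documented command line flags."""
    if unscaled and not speech_mode:
        # The unscaled mapping flag is documented for use with speech mode.
        raise ValueError("unscaled mapping requires speech_mode=True")
    command = [binary, "--reference_file", reference, "--degraded_file", degraded]
    if speech_mode:
        command.append("--use_speech_mode")
    if unscaled:
        command.append("--use_unscaled_speech_mos_mapping")
    if verbose:
        command.append("--verbose")
    return command


print(build_visqol_command("ref1.wav", "deg1.wav", speech_mode=True, verbose=True))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the binary has been built.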

C++ API Usage

ViSQOL Integration

To integrate ViSQOL with your Bazel project:

Add ViSQOL to your WORKSPACE file as a local_repository:

local_repository (
    name = "visqol",
    path = "/path/to/visqol",
)

Then in your project's BUILD file, add the ViSQOL library as a dependency to your binary/library dependency list:
```
deps = ["@visqol//:visqol_lib"],
```
Note that Bazel does not currently resolve transitive dependencies (see issue #2391). As a workaround, you must copy the contents of the ViSQOL WORKSPACE file into your own project's WORKSPACE file until this is resolved.

Sample Program

int main(int argc, char **argv) {

  // Create an instance of the ViSQOL API configuration class.
  Visqol::VisqolConfig config;

  // Set the sample rate of the signals that are to be compared.
  // Both signals must have the same sample rate.
  config.mutable_audio()->set_sample_rate(48000);

  // When running in audio mode, a sample rate of 48k is recommended for the input signals.
  // Using non-48k input will very likely negatively affect the comparison result.
  // If, however, API users wish to run with non-48k input, set this to true.
  config.mutable_options()->set_allow_unsupported_sample_rates(false);

  // Optionally, set the location of the model file to use.
  // If not set, the default model file will be used.
  config.mutable_options()->set_model_path("visqol/model/libsvm_nu_svr_model.txt");

  // ViSQOL will run in audio mode comparison by default.
  // If speech mode comparison is desired, set to true.
  config.mutable_options()->set_use_speech_scoring(false);

  // Speech mode will scale the MOS mapping by default. This means that a
  // perfect NSIM score of 1.0 will be mapped to a perfect MOS-LQO of 5.0.
  // Set to true to use unscaled speech mode. This means that a perfect
  // NSIM score will instead be mapped to a MOS-LQO of ~4.x.
