openXBOW - the Passau Open-Source Crossmodal Bag-of-Words Toolkit
openXBOW generates a bag-of-words representation from a sequence of numeric and/or textual features, e.g., acoustic LLDs, visual features, and transcriptions of natural speech.
The tool provides a multitude of options, e.g., different modes of vector quantisation, codebook generation, term frequency weighting and methods known from natural language processing.
Below, you find a tutorial that helps you get started working with openXBOW.
The development of this toolkit has been supported by the European Union's Horizon 2020 Programme under grant agreement No. 645094 (IA SEWA) and the European Community's Seventh Framework Programme through the ERC Starting Grant No. 338164 (iHEARu).
<img src="http://sewaproject.eu/images/sewa-logo.png" alt="SEWA" width="200" /> <img src="http://www.schuller.it/iHEARu-logo.png" alt="iHEARu" width="320" />

For more information, please visit the official websites:
http://sewaproject.eu
http://ihearu.eu
(C) 2016-2021, published under GPL v3, please check the file LICENSE.txt for details.
Maximilian Schmitt, Björn Schuller: University of Passau, University of Augsburg.
Contact: maximilian.schmitt@mailbox.org
Citing
If you use openXBOW or any code from openXBOW in your research work, you are kindly asked to acknowledge the use of openXBOW in your publications.
http://www.jmlr.org/papers/v18/17-113.html
Maximilian Schmitt and Björn Schuller: "openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit", The Journal of Machine Learning Research, Volume 18, No. 96, pp. 1-5, October 2017.
Tutorial
In this tutorial, you will learn how to start using openXBOW for your research.
openXBOW is written in Java. You can either generate the jar file on your own or use the precompiled version directly.
To display a general help text for openXBOW and get a list of all available parameters (including a description), simply run openXBOW in your console:
java -jar openXBOW.jar
Make sure that the props folder containing the information about the command line options is in the same folder as the jar file. Windows users might need to call Java like this:
java.exe -jar openXBOW.jar
In the following, four examples of how to use openXBOW are given, highlighting several common use cases of BoW processing.
Example 1 - Generation of a Bag-of-Words Representation from Numeric Low-Level Descriptors
A typical use case would be the classification of audio segments or images. For each audio/image document, a certain number of feature vectors, i.e., numeric low-level descriptors (LLDs), e.g., MFCCs, SIFT, etc. are given.
From all LLDs belonging to one document/sample, a bag-of-words representation should be created.
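The core idea can be sketched in a few lines of Python (a simplified illustration, not openXBOW's actual Java implementation; the codebook and LLD values are made up):

```python
import math

def nearest_word(lld, codebook):
    """Index of the codebook vector closest to the LLD (Euclidean distance)."""
    dists = [math.dist(lld, word) for word in codebook]
    return dists.index(min(dists))

def bag_of_words(llds, codebook):
    """Count, for each codebook word, how many of the document's LLDs are assigned to it."""
    bag = [0] * len(codebook)
    for lld in llds:
        bag[nearest_word(lld, codebook)] += 1
    return bag

# Toy example: four 2-D LLDs of one document, codebook of 3 words
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
llds = [(0.1, 0.2), (0.9, 1.1), (5.2, 4.8), (4.9, 5.1)]
print(bag_of_words(llds, codebook))  # [1, 1, 2]
```

The resulting histogram (one counter per codebook word) is the document's bag-of-words feature vector, independent of the number of input LLDs.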
In the folder examples/example1, you find two files, llds.arff and llds.csv, which contain exactly the same information and differ only in format: ARFF (used by the machine learning software Weka) and CSV. You can always use either of the two formats, depending on your personal preference. The first attribute (or column) is always a string specifying the name of the sample the LLDs belong to. The file labels.csv is a list of labels for each sample. Please note that a CSV file (separator ;) with a header line is required here.
The file llds_labels.arff is another representation of the same LLDs, but here, the labels are directly included in the ARFF file. The labels can also be included in the same way as a column in the CSV file (this is not shown here). However, in case the labels are provided with the LLDs, the value for the target attribute/column must be constant throughout the LLDs of the same document. It is even possible to have multiple labels (targets) for each segment. Then, each target must have its own attribute (ARFF) or column (CSV).
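For orientation, the two files might be structured roughly like this (hypothetical names and values, not the actual contents of the example files; the first column holds the sample name):

```
name;lld1;lld2
sample_01;0.12;1.35
sample_01;0.15;1.31
sample_02;0.87;0.44
```

and a corresponding label file with the required header line:

```
name;label
sample_01;positive
sample_02;negative
```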
Now, let's generate a bag-of-words output for the sample data. The following three command lines use different input formats but should all create the same output ARFF bag-of-words file.
java -jar openXBOW.jar -i examples/example1/llds.arff -l examples/example1/labels.csv -o bow.arff
java -jar openXBOW.jar -i examples/example1/llds.csv -l examples/example1/labels.csv -o bow.arff
java -jar openXBOW.jar -i examples/example1/llds_labels.arff -o bow.arff
The command line option -i specifies the input file and -o the output file. The file format is always recognised automatically based on the file ending. Option -l specifies the label file.
Of course it is always possible to generate bag-of-words output without a target (class label), such as
java -jar openXBOW.jar -i examples/example1/llds.csv -o bow.arff
A CSV output is generated using -o bow.csv. LibSVM (Liblinear) output is also supported (-o bow.libsvm):
java -jar openXBOW.jar -i examples/example1/llds.csv -o bow.csv
java -jar openXBOW.jar -i examples/example1/llds.csv -o bow.libsvm
Note that, in case of a libsvm output, the labels must be integers.
As you can see from the examples, the default codebook size is 500. In the case of numeric input, the codebook size can be modified using the parameter -size.
The number of assignments can also be modified, i.e., the number of words in the bag whose counters are increased for each input LLD. The default number of assignments is 1; for multi assignment, use the option -a. The Euclidean distance is always used as the distance measure.
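Multi assignment can be illustrated by extending the counting step: instead of only the closest codebook word, the a closest words are incremented for each LLD (again a plain-Python sketch with made-up values, not openXBOW code):

```python
import math

def bag_multi_assignment(llds, codebook, a=1):
    """Increment the counters of the `a` closest codebook words per LLD."""
    bag = [0] * len(codebook)
    for lld in llds:
        # Rank word indices by Euclidean distance to the current LLD
        ranked = sorted(range(len(codebook)), key=lambda i: math.dist(lld, codebook[i]))
        for i in ranked[:a]:
            bag[i] += 1
    return bag

codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
llds = [(0.1, 0.2), (4.9, 5.1)]
print(bag_multi_assignment(llds, codebook, a=2))  # [1, 2, 1]
```

With a=1 this reduces to standard single assignment; larger values of a produce softer, denser histograms.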
To generate a bag-of-words representation with a codebook size of 1000, and 10 assignments per input LLD, use the following command line:
java -jar openXBOW.jar -i examples/example1/llds_labels.arff -o bow.arff -size 1000 -a 10
Until now, we have generated the codebook by random sampling of all LLDs in the input file. More precisely, by default, the initialisation step of the k-means++ clustering algorithm is executed, but the cluster centroids are not updated. In this initialisation step, the codebook vectors (words/templates) are selected one after another, favouring vectors which are farther away (in terms of Euclidean distance) from the already selected ones. Random sampling is much faster than clustering while achieving almost the same final performance. Full k-means++ clustering is employed when using -c kmeans++. Standard k-means clustering (-c kmeans) and standard random sampling (-c random) are also available.
Right now, we know how to generate different codebooks, but so far, the codebook is always learnt from the whole given input sequence. In supervised learning (as the main application of the bag-of-words representation), however, we usually need to evaluate a method on completely unseen test data.
To accomplish this, the codebook needs to be stored and then loaded when used on the test data. You can trigger those options using -B (store) and -b (load).
In the following line, a codebook of size 200 is first learnt using kmeans++ clustering and then stored in the file codebook. At the same time, a bag-of-words representation of the input using multi assignment is generated and stored (bow.arff).
java -jar openXBOW.jar -i examples/example1/llds_labels.arff -o bow.arff -a 5 -c kmeans++ -size 200 -B codebook
In the following line, the learnt codebook is loaded and applied to the same input data. (Applying the codebook to the same data does not make sense, of course; this is just to exemplify the usage.) The codebook is stored in an easily readable text format; you can have a look by opening the file codebook in a text editor.
java -jar openXBOW.jar -i examples/example1/llds_labels.arff -o bow.arff -a 5 -b codebook
Please note that, if consistency between training and test data is targeted, the parameter for multi assignment (-a 5) must be repeated when processing the test data.
After the generation of the bag-of-words representation, it can be further processed with the following options:
- Logarithmic term-frequency weighting: -log applies lg(TF+1) to each term frequency (TF) in the bag. This option is stored in the codebook file and used when the respective codebook is loaded.
- Inverse document frequency (IDF) transform: -idf multiplies each term frequency with the logarithm of the ratio of the total number of instances and the number of instances where the respective term/word is present. This option and the corresponding parameters (IDF weights) are stored in the codebook file and used when the respective codebook is loaded.
- Histogram normalisation: -norm 1, -norm 2, or -norm 3. Please see the help text for information about the differences between the options.
The histogram normalisation is a very common option and is especially required when the number of input LLDs differs between the input documents/samples.
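The effect of these weighting options can be illustrated with a few lines of Python (a simplified sketch of the general log-TF/IDF formulas described above, not necessarily the exact computation performed by openXBOW; here lg is taken as the base-2 logarithm, and normalisation divides by the histogram sum):

```python
import math

def log_tf(bags):
    """Logarithmic term-frequency weighting: TF -> lg(TF + 1)."""
    return [[math.log2(tf + 1) for tf in bag] for bag in bags]

def idf(bags):
    """Multiply each TF by log(#instances / #instances containing the word)."""
    n = len(bags)
    df = [sum(1 for bag in bags if bag[j] > 0) for j in range(len(bags[0]))]
    return [[tf * math.log(n / df[j]) if df[j] else 0.0
             for j, tf in enumerate(bag)] for bag in bags]

def norm_sum(bag):
    """Normalise a bag so its term frequencies sum to 1."""
    total = sum(bag)
    return [tf / total for tf in bag] if total else bag

bags = [[3, 0, 1], [0, 2, 2]]  # two instances, codebook of 3 words
print(log_tf(bags))
print(idf(bags))
print(norm_sum([3, 0, 1]))  # [0.75, 0.0, 0.25]
```

Note how the IDF transform zeroes out words that occur in every instance, since they carry no discriminative information.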
Usually, it is also required to standardise (or normalise) the input LLDs, as their values are in different ranges. This can be done using the options -standardizeInput or -normalizeInput. Note that the corresponding parameters derived from the data (in the case of standardisation: mean and standard deviation) are also stored in the codebook and are then applied to the test data (online approach). The same goes for the standardisation/normalisation of the resulting bag-of-words representation, done by openXBOW using -standardizeOutput or -normalizeOutput. Standardisation/normalisation of the features is required (or at least useful) for some machine learning algorithms, e.g., support vector machines.
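Standardisation with parameters learnt on the training data (and stored alongside the codebook) can be sketched like this (illustrative Python with made-up feature values):

```python
import math

def fit_standardiser(vectors):
    """Per-dimension mean and standard deviation, learnt on training data."""
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors))
            for d in range(dims)]
    return means, stds

def standardise(vectors, means, stds):
    """Apply stored parameters, e.g., to unseen test data (online approach)."""
    return [[(v[d] - means[d]) / stds[d] if stds[d] else 0.0
             for d in range(len(means))] for v in vectors]

train = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_standardiser(train)  # means [2.0, 20.0], stds [1.0, 10.0]
print(standardise([[2.0, 40.0]], means, stds))  # [[0.0, 2.0]]
```

The key point is that the test data is transformed with the training statistics, never with statistics computed on the test data itself.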
Further options can be seen in the default help text on the command line.
Example 2 - Generation of a Time-Dependent Bag-of-Words