CCAligner

Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.

CCAligner image:http://upload.wikimedia.org/wikipedia/commons/3/35/Tux.svg[Linux,25,25,link="https://travis-ci.org/saurabhshri/CCAligner"] / image:https://upload.wikimedia.org/wikipedia/commons/f/fa/Apple_logo_black.svg[macOS,25,25,link="https://travis-ci.org/saurabhshri/CCAligner"] image:https://travis-ci.org/saurabhshri/CCAligner.svg?branch=master["Linux/macOS Build Status", link="https://travis-ci.org/saurabhshri/CCAligner"] image:https://upload.wikimedia.org/wikipedia/commons/e/ee/Windows_logo_%E2%80%93_2012_%28dark_blue%29.svg[Windows,25,25,link="https://ci.appveyor.com/project/saurabhshri/ccaligner"] image:https://ci.appveyor.com/api/projects/status/pojryaxnykthuy9p/branch/master?svg=true["Windows Build Status", link="https://ci.appveyor.com/project/saurabhshri/ccaligner"]

Word by word audio subtitle synchronization (forced alignment) tool and API. Developed under Google Summer of Code 2017 with CCExtractor.

(https://saurabhshri.github.io/)

[link=https://www.youtube.com/watch?v=38_27E1PxXA]
image::https://raw.githubusercontent.com/saurabhshri/CCAligner/master/docs/demo.gif[align="center"]


The project is in its very early stage and is constantly evolving. The available functions, usage instructions, et cetera are expected to change over time. It is not production ready, but you are welcome to play with it or, better, help improve it! :)


== Using CCAligner

CCAligner can be used either as a standalone tool or as a library in your own project.

=== Installing Dependencies ===

To automatically generate language models, dictionaries and grammars, the following dependencies must be installed. The tool can generate them without these dependencies, but accuracy is not guaranteed in that case. Working with the dependencies installed is highly recommended.

  1. cmuclmtk : to generate vocab and LM. (install/dependencies/cmuclmtk-0.7.tar.gz)
  2. g2p-seq2seq : to generate dictionary. (install/dependencies/g2p-seq2seq)

You will also need to install http://www.perl.org/get.html[Perl] and move install/quick_lm.pl to the same directory as CCAligner or to a directory listed in the PATH environment variable.

Steps :

Linux/MacOS

To install cmuclmtk :

  1. Navigate to install/dependencies directory and uncompress cmuclmtk-0.7.tar.gz while preserving the permissions :

    tar xvpzf cmuclmtk-0.7.tar.gz

Original download link : (https://sourceforge.net/projects/cmusphinx/files/cmuclmtk/0.7/cmuclmtk-0.7.tar.gz/download)

  2. Navigate to cmuclmtk-0.7 directory :

    cd cmuclmtk-0.7

  3. Install :

    ./configure
    make
    sudo make install

You may have to run sudo ldconfig to fix errors such as a missing shared library.

To install g2p-seq2seq :

  1. The tool requires TensorFlow at least version 1.0.0. Please see the installation https://www.tensorflow.org/install[guide^] for details. If you are on Linux (x86_64), you may directly run the following :

    sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-cp27-none-linux_x86_64.whl

Note : To use g2p with the latest versions of TF, download the up-to-date repository from (https://github.com/cmusphinx/g2p-seq2seq). Please note that the recent g2p version brings changes that break a few things in CCAligner, so using the supplied version is recommended.

  2. Navigate to install/dependencies directory and uncompress g2p-seq2seq-master.zip while preserving the permissions :

    unzip g2p-seq2seq-master.zip

  3. Navigate to g2p-seq2seq-master directory and run :

    sudo python setup.py install

The alternate ways of generating language models, dictionaries and grammars are covered later in the docs.

Windows

To install cmuclmtk :

  1. Navigate to install/dependencies directory and uncompress cmuclmtk-0.7.tar.gz. Build it with the cmuclmtk.sln it provides.

  2. Copy the compiled files (wfreq2vocab.exe, text2wfreq.exe, text2idngram.exe, idngram2lm.exe) to the same directory as CCAligner, or add the directory that contains these files to the PATH environment variable.

To install g2p-seq2seq :

  1. First, install Python 3.5 (64-bit) and TensorFlow 1.0.0 using your preferred method.

  2. Navigate to install/dependencies directory and uncompress g2p-seq2seq-master.zip

  3. Navigate to g2p-seq2seq-master directory and run :

    python setup.py install

=== Before You Run ===

  1. Please make sure you have all the dependencies installed if you want to use the grammar tools. To disable grammar generation by CCAligner, pass --generate-grammar no.

  2. Make sure the model folder and g2p-seq2seq-cmudict are in the directory where you are compiling CCAligner.

  3. Make sure the subtitles are clean and are in proper SRT format.

  4. The wav file should be 16-bit PCM, mono, sampled at 16 kHz. To generate the wav file from a video using FFmpeg, you may run :

    ./ffmpeg -i input.video -bits_per_raw_sample 16 -ar 16000 -ac 1 output.wav
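To check a file against the audio requirement above before running the aligner, the header can be inspected directly. The following is a minimal sketch, not part of CCAligner: the names `WavFormat`, `parseWavFormat` and `meetsAlignerRequirements` are hypothetical, and it assumes a standard RIFF/WAVE layout in which plain PCM is marked with format tag 1 (files using the extensible format tag are rejected by this simplified check).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fields of the WAV "fmt " chunk that matter for the requirement above.
// All names here are illustrative and do not come from the CCAligner sources.
struct WavFormat {
    std::uint16_t audioFormat = 0;    // 1 means uncompressed PCM
    std::uint16_t channels = 0;       // 1 means mono
    std::uint32_t sampleRate = 0;     // samples per second
    std::uint16_t bitsPerSample = 0;  // sample width in bits
};

// Read a little-endian unsigned integer of `count` bytes starting at `pos`.
static std::uint32_t readLE(const std::vector<std::uint8_t>& bytes,
                            std::size_t pos, std::size_t count) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value |= static_cast<std::uint32_t>(bytes[pos + i]) << (8 * i);
    }
    return value;
}

// Walk the RIFF chunks until "fmt " is found and copy its fields into `out`.
// Returns false if the buffer is not RIFF/WAVE or has no complete "fmt " chunk.
bool parseWavFormat(const std::vector<std::uint8_t>& bytes, WavFormat& out) {
    if (bytes.size() < 12 ||
        std::memcmp(bytes.data(), "RIFF", 4) != 0 ||
        std::memcmp(bytes.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    std::size_t pos = 12;  // first chunk starts right after "RIFF<size>WAVE"
    while (pos + 8 <= bytes.size()) {
        const std::uint32_t size = readLE(bytes, pos + 4, 4);
        if (std::memcmp(bytes.data() + pos, "fmt ", 4) == 0) {
            if (size < 16 || pos + 8 + 16 > bytes.size()) return false;
            out.audioFormat = static_cast<std::uint16_t>(readLE(bytes, pos + 8, 2));
            out.channels = static_cast<std::uint16_t>(readLE(bytes, pos + 10, 2));
            out.sampleRate = readLE(bytes, pos + 12, 4);
            out.bitsPerSample = static_cast<std::uint16_t>(readLE(bytes, pos + 22, 2));
            return true;
        }
        // Skip this chunk; chunk bodies are padded to an even number of bytes.
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);
    }
    return false;
}

// True when the format is 16-bit PCM, mono, sampled at 16 kHz.
bool meetsAlignerRequirements(const WavFormat& f) {
    return f.audioFormat == 1 && f.channels == 1 &&
           f.sampleRate == 16000 && f.bitsPerSample == 16;
}
```

A caller would read the first few hundred bytes of the wav file into a `std::vector<std::uint8_t>`, pass it to `parseWavFormat`, and re-encode the audio with FFmpeg if `meetsAlignerRequirements` returns false.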

=== Installing ===

Linux/MacOS

  1. Clone the repository from Github using :

    git clone https://www.github.com/saurabhshri/CCAligner.git

  2. Navigate to install directory and run build.sh.

    cd install/
    ./build.sh

  3. Align!

    ./ccaligner <arguments>

Windows

  1. Clone the repository from Github using :

    git clone https://www.github.com/saurabhshri/CCAligner.git

  2. Use CMake to generate project files, and then build it.

  3. Align!

    .\ccaligner <arguments>

=== Quick Demo ===

The default output of CCAligner is stored as an XML file. For example, the next command will generate file.xml :

    ./ccaligner -wav /path/to/file.wav -srt /path/to/file.srt

Generated Output Snippet :

.
.
<subtitle>
    <start>12780</start>
    <dialogue>I was offered a summer research      fellowship at Princeton.    </dialogue>
    <edited_dialogue>I was offered a summer research fellowship at Princeton</edited_dialogue>
        <words>
            <word>
                <recognised>0</recognised>
                <text>I</text>
                <start>12780</start>
                <end>12911</end>
                <duration>131</duration>
            </word>
            <word>
                <recognised>1</recognised>
                <text>was</text>
                <start>13030</start>
                <end>13330</end>
                <duration>300</duration>
            </word>
            <word>
                <recognised>1</recognised>
                <text>offered</text>
                <start>13400</start>
                <end>13770</end>
                <duration>370</duration>
            </word>
            .
            .
            .
        </words>
    <end>16382</end>
</subtitle>
.
.
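For downstream use, the per-word timings can be pulled out of this output with very little code. The sketch below is an illustration under assumptions, not part of the CCAligner API: `AlignedWord`, `tagValue` and `extractWords` are hypothetical names, and the code relies on plain substring search over the tag names shown in the snippet above rather than a real XML parser, so it only suits output shaped like that snippet. The timing values are kept as the raw integers found in the file.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// One <word> entry from the output above. Names are illustrative only.
struct AlignedWord {
    std::string text;
    long start = 0;           // value of <start>, as written in the file
    long end = 0;             // value of <end>, as written in the file
    bool recognised = false;  // true when <recognised> is 1
};

// Return the text between <tag> and </tag> in `block`, or "" if absent.
static std::string tagValue(const std::string& block, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t begin = block.find(open);
    if (begin == std::string::npos) return "";
    const std::size_t valueStart = begin + open.size();
    const std::size_t valueEnd = block.find(close, valueStart);
    if (valueEnd == std::string::npos) return "";
    return block.substr(valueStart, valueEnd - valueStart);
}

// Collect every <word>...</word> entry found in `xml`, in document order.
std::vector<AlignedWord> extractWords(const std::string& xml) {
    const std::string open = "<word>";
    const std::string close = "</word>";
    std::vector<AlignedWord> words;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t stop = xml.find(close, pos);
        if (stop == std::string::npos) break;
        const std::string block = xml.substr(pos, stop - pos);
        try {
            AlignedWord w;
            w.text = tagValue(block, "text");
            w.start = std::stol(tagValue(block, "start"));
            w.end = std::stol(tagValue(block, "end"));
            w.recognised = tagValue(block, "recognised") == "1";
            words.push_back(w);
        } catch (const std::exception&) {
            // Skip entries whose <start> or <end> is missing or not a number.
        }
        pos = stop + close.size();
    }
    return words;
}
```

Because `<word>` and `<words>` differ at the closing bracket, the search for `<word>` does not match the enclosing `<words>` element. For anything beyond quick inspection, a proper XML library is the safer choice.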

=== API or Library usage ===

  1. Clone the repository from Github :

    git clone https://github.com/saurabhshri/CCAligner.git

  2. Place the CCAligner folder in an appropriate directory in your project.

  3. In your project, simply include the directories and source file you wish to use. You may refer to CMakeLists.txt in the src/ directory to get an idea. The CCAligner tool is built around the CCAligner API.

For example, to use the audio-based alignment in your project :


//include the header file
#include "recognize_using_pocketsphinx.h"

//Declare the aligner
PocketsphinxAligner * aligner = new PocketsphinxAligner(_parameters);

//Align
aligner->align();

//Print the result
aligner->printAligned("Manual_Printing.json", json);

//delete the aligner
delete(aligner);

Complete documentation of the API will be written in docs.

=== Some Previews ===

  • Click on video thumbnail or link to watch the video on YouTube.

[cols="1,5"]
|===
a|
[link=https://www.youtube.com/watch?v=38_27E1PxXA]
image::https://img.youtube.com/vi/38_27E1PxXA/0.jpg[height = "100px"]
| Word by Word Audio Subtitle Synchronization - Karaoke Demo 1

(https://www.youtube.com/watch?v=38_27E1PxXA)

[Sitcom]

a|
[link=https://www.youtube.com/watch?v=6VnhC8u_d40]
image::https://img.youtube.com/vi/6VnhC8u_d40/0.jpg[height = "100px"]
| Word by Word Audio Subtitle Synchronization - Karaoke Demo 2

(https://www.youtube.com/watch?v=6VnhC8u_d40)

[Ted Talk]

a|
[link=https://www.youtube.com/watch?v=j_zeixo-zJY]
image::https://img.youtube.com/vi/j_zeixo-zJY/0.jpg[height = "100px"]
| Word by Word Audio Subtitle Synchronization - Karaoke Demo 3

(https://www.youtube.com/watch?v=j_zeixo-zJY)

[Cartoon Show]

a|
[link=https://www.youtube.com/watch?v=8tTDX6NZGsU]
image::https://img.youtube.com/vi/8tTDX6NZGsU/0.jpg[height = "100px"]
| Word by Word Audio Subtitle Synchronization - Karaoke Demo 1

(https://www.youtube.com/watch?v=8tTDX6NZGsU)

[Discussion Video]

a|
[link=https://www.youtube.com/watch?v=tFrf0TVnqIQ]
image::https://img.youtube.com/vi/tFrf0TVnqIQ/0.jpg[height = "100px"]
| Word by Word Audio Video Transcription Demo

(https://www.youtube.com/watch?v=tFrf0TVnqIQ)

[Reality Show]

a|
[link=https://www.youtube.com/watch?v=km1iHe_mGuo]
image::https://img.youtube.com/vi/km1iHe_mGuo/0.jpg[height = "100px"]
| Approximate Word by Word Audio Subtitle Synchronization

(https://www.youtube.com/watch?v=km1iHe_mGuo)

|===

== Usage Parameters ==

The following is a complete list of available parameters that can be passed to CCAligner. Feel free to open a PR if you spot a missing parameter.

  • Input related parameters :

[cols="2,2,4"]
|===
| Parameter | Accepted Values | Description

|-wav |/path/to/wav_file |Provide path to input audio wave file. Wave file must be 16-bit PCM, mono, sampled at 16 kHz.

E.g.: ccaligner -wav tbbt.wav -srt tbbt.srt

Required : yes.

|-srt |/path/to/subtitle_file |Provide path to subtitle file in SRT format. Please ensure that the subtitle file is clean and in proper format.

E.g.: ccaligner -wav tbbt.wav -srt tbbt.srt

R
