Neuon

Neural Network for Computer Vision and Audition

Install / Use

/learn @sergeyrachev/Neuon

NEUON

Neural Network for Lip Sync detection in video streaming

## Concept

- https://arxiv.org/pdf/1706.05739.pdf
- https://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf

## Dataset

Self-made, following the VoxCeleb1 dataset description: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html

The current folder layout (as of 31.01.2019) differs from the one in place when the download and clipping scripts were written. The scripts need to be adapted before they can be used with the latest dataset description.

## Used Components

- youtube-dl: downloads video from YouTube by URL.
- FFmpeg (licensed under LGPL 2.1): used with dynamic linking to decode the media source and to perform video frame-rate conversion and audio sample-rate conversion in the example application.
- Aquila (licensed under MIT): used for audio feature extraction. The original code was patched to provide the MFEC feature alongside the original MFCC.
- Dlib (licensed under the Boost Software License 1.0, with CC0 for the pretrained model): provides routines for face landmark detection with a pre-trained model.
- Keras (licensed under MIT) plus other Python libraries (documentation is in progress).
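The relationship between the two audio features mentioned for Aquila can be sketched roughly as follows. This is a minimal NumPy illustration, not the patched Aquila code: it assumes MFEC denotes the log energies of a mel filterbank and MFCC is the DCT of those same log energies. The function names (`mel_filterbank`, `frame_features`), the filter count, and the number of coefficients are hypothetical choices made for the example.

```python
import numpy as np


def hz_to_mel(hz):
    # Common mel-scale formula (assumed here; Aquila may use a variant).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):  # rising edge of the triangle
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            bank[i, k] = (right - k) / (right - center)
    return bank


def frame_features(frame, sample_rate, n_filters=40, n_mfcc=13):
    """Return (mfec, mfcc) for a single audio frame (hypothetical helper)."""
    n_fft = len(frame)
    windowed = frame * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(windowed)) ** 2 / n_fft
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    # MFEC: log filterbank energies, with a small offset to avoid log(0).
    mfec = np.log(energies + 1e-10)
    # MFCC: unnormalized DCT-II of the log energies, truncated to n_mfcc terms.
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * k[:, None])
    mfcc = basis @ mfec
    return mfec, mfcc
```

Under these assumptions, MFEC keeps the per-band energies, so neighbouring coefficients stay local in frequency, while the DCT step in MFCC mixes all bands into each coefficient.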

## Implementation details

WIP

## Known limitations

WIP

## How to build

Refer to the ci folder to resolve dependencies and install prerequisites. Then, from a build subdirectory of the source tree, run:

cmake -DBUILD_SHARED_LIBS=On -DNEUON_PREFIX_PATH=${PWD}/shared -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCMAKE_PREFIX_PATH=${PWD}/static:${PWD}/shared -DCMAKE_INSTALL_PREFIX="neuon" -DCPACK_SET_DESTDIR=On -DCMAKE_BUILD_TYPE=Release -Dversion=0.0.0.0 -Drevision=00000000 ..
