Neuon
Neural Network for Computer Vision and Audition
NEUON
Neural Network for Lip Sync detection in video streaming
## Concept
The approach is based on the following papers:
- https://arxiv.org/pdf/1706.05739.pdf
- https://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf
## Dataset
Self-made, following the VoxCeleb1 dataset description: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html

The current folder layout of that description (as of 31.01.2019) differs from the layout in use when the download and clipping scripts were written. The scripts need to be adapted before they can be used with the latest dataset description.
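The download and clipping steps mentioned above can be sketched as a pair of command builders. This is a minimal illustration, not the project's own scripts: the function names, the 25 fps and 16 kHz targets, and the assumption that each utterance is described by a YouTube video ID plus a start and end time in seconds are all hypothetical. It uses the `ffmpeg` command-line tool rather than the dynamically linked FFmpeg libraries used by the example application.

```python
from typing import List


def download_cmd(video_id: str, out_path: str) -> List[str]:
    """Build a youtube-dl command that saves one video as MP4 (hypothetical helper)."""
    url = "https://www.youtube.com/watch?v=" + video_id
    # -f selects the format, -o sets the output file name.
    return ["youtube-dl", "-f", "mp4", "-o", out_path, url]


def clip_cmd(src: str, dst: str, start_s: float, end_s: float,
             fps: int = 25, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that cuts [start_s, end_s) out of src.

    The fps and sample_rate defaults are assumptions for illustration,
    not values taken from the project.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    return [
        "ffmpeg", "-y",                      # overwrite dst if it exists
        "-ss", f"{start_s:.3f}",             # seek to the clip start
        "-i", src,
        "-t", f"{end_s - start_s:.3f}",      # clip duration in seconds
        "-r", str(fps),                      # output video frame rate
        "-ar", str(sample_rate),             # output audio sample rate
        dst,
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them keeps the clipping logic easy to adapt when the dataset description layout changes.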
## Used Components
- youtube-dl: downloads video from YouTube by URL.
- FFmpeg (licensed under LGPL 2.1): used with dynamic linking to decode the media source and to perform video frame rate conversion and audio sample rate conversion in the example application.
- Aquila (licensed under MIT): used for audio feature extraction. The original code was patched to provide the MFEC feature alongside the original MFCC.
- Dlib (licensed under the Boost Software License 1.0, with CC0 for the pretrained model): provides routines for face landmark detection with a pre-trained model.
- Keras (licensed under MIT) plus other Python libraries (documentation is in progress).
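To illustrate how the MFEC and MFCC audio features relate, here is a minimal pure-Python sketch. It is not the patched Aquila code (Aquila is a C++ library) and does not reproduce its API. It assumes the usual definitions: MFEC is the log energy of each mel filterbank channel, and MFCC is a DCT-II applied to those log energies. The filter count, coefficient count, and log floor are illustrative assumptions.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT power spectrum of one frame (bins 0..N/2); fine for short frames."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(num_filters: int, num_bins: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters whose edges are equally spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(max_mel * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [sample_rate / 2.0 * b / (num_bins - 1) for b in range(num_bins)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfec(frame: List[float], sample_rate: float, num_filters: int = 8) -> List[float]:
    """Log mel filterbank energies: the feature before any DCT is applied."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(spec), sample_rate)
    # 1e-10 is an assumed floor that avoids log(0) for empty channels.
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]


def mfcc(frame: List[float], sample_rate: float,
         num_filters: int = 8, num_coeffs: int = 5) -> List[float]:
    """Unnormalised DCT-II of the MFEC vector."""
    energies = mfec(frame, sample_rate, num_filters)
    n = len(energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n) for i, e in enumerate(energies))
        for k in range(num_coeffs)
    ]
```

Because `mfcc` is computed from the output of `mfec`, producing both features from the same frame costs only the extra DCT step. The MFEC vector keeps each value tied to one frequency band, while the DCT in MFCC mixes all bands into every coefficient.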
## Implementation details
WIP
## Known limitations
WIP
## How to build
Refer to the ci folder to resolve dependencies and install the prerequisites. Then run the following from a build subdirectory of the source tree:
cmake -DBUILD_SHARED_LIBS=On -DNEUON_PREFIX_PATH=${PWD}/shared -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCMAKE_PREFIX_PATH=${PWD}/static:${PWD}/shared -DCMAKE_INSTALL_PREFIX="neuon" -DCPACK_SET_DESTDIR=On -DCMAKE_BUILD_TYPE=Release -Dversion=0.0.0.0 -Drevision=00000000 ..
