Avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
Project page | Paper
Self-Supervised Learning of audio-visual objects from video.<br> Triantafyllos Afouras, Andrew Owens, Joon Son Chung, Andrew Zisserman <br> In ECCV 2020.
Installing dependencies
conda env create -f environment.yml
conda activate avobjects
Demo
Download pretrained model weights
bash download_models.bash
Run the separation demo
python main.py --resume checkpoints/avobjects_loc_sep.pt --input_video demo.mp4 --output_dir demo_out
Output
<p align="center"> <img src="media/example_output/demo_output.jpg" width="100%"/> </p>
The output directory will also contain videos with the separated audio for every tracked speaker.
Optional: You can point a web browser to the output directory to view the video results. If you are working on a remote machine, you can start a web server on port 8000 by running:
cd demo_out; python3 -m http.server 8000
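The same thing can be done from Python without changing directories. This is a minimal sketch using only the standard library; `make_server` is a hypothetical helper name, not part of this repository, and the default `demo_out` directory is assumed from the demo command above.

```python
import functools
import http.server


def make_server(directory="demo_out", port=8000):
    """Build an HTTP server that serves `directory` without changing the cwd."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # An empty host binds to all interfaces, as `python3 -m http.server` does by default.
    return http.server.ThreadingHTTPServer(("", port), handler)
```

Calling `make_server().serve_forever()` blocks until interrupted, just like the shell command.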
Running on custom video
To run the model on a new video, add it to the media/ directory and select it with the --input_video argument.
You can specify the number of AV objects to track using the --n_peaks argument.
For example
python main.py --resume checkpoints/avobjects_loc_sep.pt --n_peaks 2 --input_video trump_biden_debate.mp4 --output_dir trump_biden_debate_out
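If you have several videos to process, the command can be assembled in a small script. The sketch below uses only the flags shown in this README (--resume, --n_peaks, --input_video, --output_dir). `build_command` and `run_all` are hypothetical helpers, not part of the repository, and naming the output directory `<video name>_out` is an assumption that mirrors the examples here.

```python
import subprocess
from pathlib import Path

CHECKPOINT = "checkpoints/avobjects_loc_sep.pt"  # path used in the README commands


def build_command(video, n_peaks=None, checkpoint=CHECKPOINT):
    """Return the main.py argument list for one video."""
    # Assumed naming convention, e.g. demo.mp4 -> demo_out.
    output_dir = f"{Path(video).stem}_out"
    cmd = ["python", "main.py", "--resume", checkpoint]
    if n_peaks is not None:
        # Only pass --n_peaks when a value is given; otherwise main.py's default applies.
        cmd += ["--n_peaks", str(n_peaks)]
    cmd += ["--input_video", video, "--output_dir", output_dir]
    return cmd


def run_all(jobs):
    """Run main.py once per (video, n_peaks) pair, stopping on the first failure."""
    for video, n_peaks in jobs:
        subprocess.run(build_command(video, n_peaks), check=True)
```

For instance, `run_all([("demo.mp4", None), ("trump_biden_debate.mp4", 2)])` would run the two commands from this README in sequence.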
Output
<p align="center"> <img src="media/example_output/trump_biden_debate_output.jpg" width="100%"/> </p>
Citation
If you use this code for your research, please cite:
@InProceedings{Afouras20b,
author = "Triantafyllos Afouras and Andrew Owens and Joon~Son Chung and Andrew Zisserman",
title = "Self-Supervised Learning of Audio-Visual Objects from Video",
booktitle = "European Conference on Computer Vision",
year = "2020",
}
