MOOSA
[ECCV 2024] Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
Install / Use
/learn @donghao51/MOOSAREADME
</div>
Our proposed MOOSA framework for Multimodal Open-Set Domain Generalization and Adaptation.
Update: We have a new survey paper on Multimodal Adaptation and Generalization
Code
The code was tested using Python 3.10.13, torch 2.3.1+cu121 and NVIDIA GeForce RTX 3090, more dependencies are in requirement.txt.
Environments:
mmcv-full 1.2.7
mmaction2 0.13.0
EPIC-Kitchens Dataset
Prepare
Download Pretrained Weights
-
Download Audio model link, rename it as
vggsound_avgpool.pth.tarand place under theEPIC-rgb-flow-audio/pretrained_modelsdirectory -
Download SlowFast model for RGB modality link and place under the
EPIC-rgb-flow-audio/pretrained_modelsdirectory -
Download SlowOnly model for Flow modality link and place under the
EPIC-rgb-flow-audio/pretrained_modelsdirectory
Download EPIC-Kitchens Dataset
bash download_script.sh
Download Audio files EPIC-KITCHENS-audio.zip.
Unzip all files and the directory structure should be modified to match:
<details> <summary>Click for details...</summary>├── MM-SADA_Domain_Adaptation_Splits
├── rgb
| ├── train
| | ├── D1
| | | ├── P08_01.wav
| | | ├── P08_01
| | | | ├── frame_0000000000.jpg
| | | | ├── ...
| | | ├── P08_02.wav
| | | ├── P08_02
| | | ├── ...
| | ├── D2
| | ├── D3
| ├── test
| | ├── D1
| | ├── D2
| | ├── D3
├── flow
| ├── train
| | ├── D1
| | | ├── P08_01
| | | | ├── u
| | | | | ├── frame_0000000000.jpg
| | | | | ├── ...
| | | | ├── v
| | | ├── P08_02
| | | ├── ...
| | ├── D2
| | ├── D3
| ├── test
| | ├── D1
| | ├── D2
| | ├── D3
</details>
Video and Audio
<details> <summary>Click for details...</summary>cd EPIC-rgb-flow-audio
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 5 --mask_ratio 0.7 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.7 --entropy_min_weight 1.0 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.7 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
</details>
Video and Flow
<details> <summary>Click for details...</summary>cd EPIC-rgb-flow-audio
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_flow -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --mask_ratio 0.7 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_flow -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.7 --entropy_min_weight 1.0 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_flow -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --mask_ratio 0.7 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
</details>
Flow and Audio
<details> <summary>Click for details...</summary>cd EPIC-rgb-flow-audio
python train_video_flow_audio_EPIC_MOOSA.py --use_flow --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 25 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_flow --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_flow --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/EPIC-KITCHENS/
</details>
Video and Flow and Audio
<details> <summary>Click for details...</summary>cd EPIC-rgb-flow-audio
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_flow --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --mask_ratio 0.7 --entropy_min_weight 0.001 --jigsaw_num_splits 2 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_flow --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.7 --entropy_min_weight 0.1 --jigsaw_num_splits 2 --datapath /path/to/EPIC-KITCHENS/
python train_video_flow_audio_EPIC_MOOSA.py --use_video --use_flow --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.3 --entropy_min_weight 0.1 --jigsaw_num_splits 2 --datapath /path/to/EPIC-KITCHENS/
</details>
HAC Dataset
This dataset can be downloaded at link.
Unzip all files and the directory structure should be modified to match:
<details> <summary>Click for details...</summary>HAC
├── human
| ├── videos
| | ├── ...
| ├── flow
| | ├── ...
| ├── audio
| | ├── ...
├── animal
| ├── videos
| | ├── ...
| ├── flow
| | ├── ...
| ├── audio
| | ├── ...
├── cartoon
| ├── videos
| | ├── ...
| ├── flow
| | ├── ...
| ├── audio
| | ├── ...
</details>
Download the pretrained weights similar to EPIC-Kitchens Dataset and put under the HAC-rgb-flow-audio/pretrained_models directory.
Video and Audio
<details> <summary>Click for details...</summary>cd HAC-rgb-flow-audio
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 5 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.7 --entropy_min_weight 0.001 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
</details>
Video and Flow
<details> <summary>Click for details...</summary>cd HAC-rgb-flow-audio
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_flow -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_flow -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_flow -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
</details>
Flow and Audio
<details> <summary>Click for details...</summary>cd HAC-rgb-flow-audio
python train_video_flow_audio_HAC_MOOSA.py --use_flow --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_flow --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.3 --entropy_min_weight 0.001 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_flow --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.7 --entropy_min_weight 0.001 --datapath /path/to/HAC/
</details>
Video and Flow and Audio
<details> <summary>Click for details...</summary>cd HAC-rgb-flow-audio
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_flow --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.7 --entropy_min_weight 0.001 --jigsaw_num_splits 2 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_flow --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --mask_ratio 0.7 --entropy_min_weight 0.001 --jigsaw_num_splits 2 --jigsaw_samples 64 --datapath /path/to/HAC/
python train_video_flow_audio_HAC_MOOSA.py --use_video --use_flow --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 20 --mask_ratio 0.7 --alpha_trans 0.5 --entropy_min_weight 0.001 --jigsaw_num_splits 2 --datapath /path/to/HAC/
</details>
Multimodal Open-Set Domain Adaptation
Video and Audio
<details> <summary>Click for details...</summary>cd EPIC-rg
Related Skills
node-connect
337.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
337.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.1kCommit, push, and open a PR
