IMAX
Official PyTorch code for "Imaginary-Connected Embedding in Complex Space for Unseen Attribute-Object Discrimination" (TPAMI 2024)
Imaginary-Connected Embedding in Complex Space for Unseen Attribute-Object Discrimination
- Title: Imaginary-Connected Embedding in Complex Space for Unseen Attribute-Object Discrimination
- Institutes: Nanjing University of Science and Technology, Newcastle University, Durham University, University of Chinese Academy of Sciences
- Publication Status: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
News
[2025.4] We uploaded the test code along with our trained weights, so evaluation can be run directly from the released checkpoints.
[2025.1] To improve readability, we restructured the IMAX code based on Troika and introduced an Adapter module that helps the CLIP visual encoder adapt to the test datasets. We found that this change further improves IMAX's performance on all three benchmark datasets.
[2025.1] We have open-sourced the CLIP-based IMAX code; implementations for the remaining encoders will be uploaded shortly.
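As a rough illustration of the bottleneck-adapter pattern mentioned in the news above (lightweight trainable layers attached to a frozen CLIP visual encoder), a minimal sketch follows. All names here are hypothetical and this is not the repository's actual Adapter implementation:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative bottleneck adapter (hypothetical, not IMAX's exact code):
    down-project, non-linearity, up-project, plus a residual connection."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        hidden = dim // reduction
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and does not disturb the frozen backbone at initialization.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

In this pattern only the adapter parameters are trained, while the CLIP encoder weights stay frozen.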
Setup
conda create --name imax python=3.8
conda activate imax
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip3 install git+https://github.com/openai/CLIP.git
The remaining dependencies are listed in ./requirements.txt and can be installed with pip install -r requirements.txt.
The CLIP weights can be downloaded via CLIP and should be placed in the ./clip_modules/ directory.
Datasets
The dataset splits and their attribute annotations can be found in ./utils/download_data.sh; the complete setup procedure follows CGE&CompCos.
You can download the datasets using
bash ./utils/download_data.sh
Train
To train our model from scratch, for example on UT-Zappos:
python -u train.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_UT-Zap50k> \
--save_path <path_to_logs> \
--yml_path ./config/ut-zappos.yml \
--num_workers 4 \
--seed 0 \
--adapter
MIT-States:
python -u train.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_MIT-States> \
--save_path <path_to_logs> \
--yml_path ./config/mit-states.yml \
--num_workers 2 \
--seed 0 \
--adapter
C-GQA:
python -u train.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_C-GQA> \
--save_path <path_to_logs> \
--yml_path ./config/cgqa.yml \
--num_workers 2 \
--seed 0 \
--adapter
Alternatively, you can run the shell scripts in ./run/, for example:
sh ./run/utzappos.sh
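The training commands above fix a random seed via --seed. A typical seeding routine for reproducible PyTorch runs looks like the sketch below; this is illustrative, and the repository may implement it differently:

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs so repeated runs draw the
    same random numbers (illustrative, not necessarily the repo's helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
```

Note that full determinism on GPU may additionally require deterministic cuDNN settings; fixing the seed alone makes runs comparable but not bitwise identical.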
Test
Our trained weights can be found here. To evaluate a trained model directly from these released weights, run the following command:
python -u test.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_C-GQA> \
--yml_path <path_to_yml> \
--num_workers 4 \
--seed 0 \
--adapter \
--load_model <path_to_weights>
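For reference, restoring a checkpoint for evaluation typically looks like the sketch below. The exact state-dict layout saved by train.py is an assumption here, so `load_checkpoint` and its unwrapping logic are hypothetical:

```python
import torch
import torch.nn as nn

def load_checkpoint(model: nn.Module, path: str, device: str = "cpu"):
    """Hypothetical helper: load saved weights into a model for evaluation.
    Handles both a bare state dict and a {'model': state_dict, ...} wrapper."""
    state = torch.load(path, map_location=device)
    if isinstance(state, dict) and "model" in state:
        state = state["model"]
    # strict=False tolerates auxiliary keys (e.g. adapter-only checkpoints)
    missing, unexpected = model.load_state_dict(state, strict=False)
    model.eval()
    return missing, unexpected
```

Passing strict=False returns the lists of missing and unexpected keys, which is useful for checking that the checkpoint actually matches the model you built.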
Acknowledgement
The code we publish builds on the following outstanding repositories, which helped us greatly: