IMAX
Official PyTorch code for "Imaginary-Connected Embedding in Complex Space for Unseen Attribute-Object Discrimination" (TPAMI 2024)
Imaginary-Connected Embedding in Complex Space for Unseen Attribute-Object Discrimination
- Title: Imaginary-Connected Embedding in Complex Space for Unseen Attribute-Object Discrimination
- Institutes: Nanjing University of Science and Technology, Newcastle University, Durham University, University of Chinese Academy of Sciences
- Publication Status: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
News
[2025.4] We uploaded the test code along with our trained weights, so evaluation can be run directly from the released checkpoints.
[2025.1] To improve readability, we restructured the IMAX code based on Troika and introduced an Adapter module that helps the CLIP visual encoder adapt to the test datasets. We found that this change further improves IMAX's performance on all three benchmark datasets.
[2025.1] We have open-sourced the CLIP-based IMAX code; implementations for the remaining encoders will be uploaded shortly.
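As a rough illustration of the bottleneck-adapter pattern mentioned in the news above (lightweight trainable layers attached to a frozen CLIP visual encoder), a minimal sketch follows. All names here are hypothetical and this is not the repository's actual Adapter implementation:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative bottleneck adapter (hypothetical, not IMAX's exact code):
    down-project, non-linearity, up-project, plus a residual connection."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        hidden = dim // reduction
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and does not disturb the frozen backbone at initialization.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

In this pattern only the adapter parameters are trained, while the CLIP encoder weights stay frozen.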
Setup
conda create --name imax python=3.8
conda activate imax
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip3 install git+https://github.com/openai/CLIP.git
The remaining dependencies are listed in ./requirements.txt and can be installed with pip install -r requirements.txt.
The CLIP weights can be downloaded via CLIP and should be placed in the ./clip_modules/ directory.
Datasets
The dataset splits and their attribute annotations can be found in ./utils/download_data.sh; the complete setup procedure follows CGE&CompCos.
You can download the datasets using
bash ./utils/download_data.sh
Train
To train our model from scratch, for example on UT-Zappos:
python -u train.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_UT-Zap50k> \
--save_path <path_to_logs> \
--yml_path ./config/ut-zappos.yml \
--num_workers 4 \
--seed 0 \
--adapter
MIT-States:
python -u train.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_MIT-States> \
--save_path <path_to_logs> \
--yml_path ./config/mit-states.yml \
--num_workers 2 \
--seed 0 \
--adapter
C-GQA:
python -u train.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_C-GQA> \
--save_path <path_to_logs> \
--yml_path ./config/cgqa.yml \
--num_workers 2 \
--seed 0 \
--adapter
Alternatively, you can run the shell scripts in ./run/, for example:
sh ./run/utzappos.sh
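The training commands above fix a random seed via --seed. A typical seeding routine for reproducible PyTorch runs looks like the sketch below; this is illustrative, and the repository may implement it differently:

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs so repeated runs draw the
    same random numbers (illustrative, not necessarily the repo's helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
```

Note that full determinism on GPU may additionally require deterministic cuDNN settings; fixing the seed alone makes runs comparable but not bitwise identical.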
Test
Our trained weights can be found here. To evaluate a trained model directly from these released weights, run the following command:
python -u test.py \
--clip_arch ./clip_modules/ViT-L-14.pt \
--dataset_path <path_to_C-GQA> \
--yml_path <path_to_yml> \
--num_workers 4 \
--seed 0 \
--adapter \
--load_model <path_to_weights>
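For reference, restoring a checkpoint for evaluation typically looks like the sketch below. The exact state-dict layout saved by train.py is an assumption here, so `load_checkpoint` and its unwrapping logic are hypothetical:

```python
import torch
import torch.nn as nn

def load_checkpoint(model: nn.Module, path: str, device: str = "cpu"):
    """Hypothetical helper: load saved weights into a model for evaluation.
    Handles both a bare state dict and a {'model': state_dict, ...} wrapper."""
    state = torch.load(path, map_location=device)
    if isinstance(state, dict) and "model" in state:
        state = state["model"]
    # strict=False tolerates auxiliary keys (e.g. adapter-only checkpoints)
    missing, unexpected = model.load_state_dict(state, strict=False)
    model.eval()
    return missing, unexpected
```

Passing strict=False returns the lists of missing and unexpected keys, which is useful for checking that the checkpoint actually matches the model you built.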
Acknowledgement
The code we publish builds on the following outstanding repositories, which helped us greatly: