CLVQA
[AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)
[arXiv | Data & annotation(json/npy)]
<img src="./figures/gh_teaser.png" alt="CLVQA" style="zoom:67%;" />
Preparation
Installation
conda create -n mmclvqa python=3.8
conda activate mmclvqa
git clone https://github.com/showlab/CLVQA.git
cd CLVQA
cd mmclvqa
pip install --editable .
cd ..
pip install -r extra_requirements.txt
CLOVE Dataset and Annotation
We release the datasets and annotations in JSON format (link) and npy format (link). To use our code for training, please download the npy files.
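The npy releases are pickled arrays of per-sample annotation dicts. A minimal sketch of the save/load round trip, assuming that format (the file name and toy samples here are placeholders, not the released files):

```python
import os
import tempfile

import numpy as np

# Toy samples mimicking the released annotation schema (see the
# data-sample example below for the full set of keys).
samples = [
    {"question": "What place is this?", "answer": "kiosk", "stage": "object"},
    {"question": "What color is the counter?", "answer": "brown", "stage": "attribute"},
]

# The releases are object arrays, so loading requires allow_pickle=True.
path = os.path.join(tempfile.gettempdir(), "toy_annotations.npy")
np.save(path, np.array(samples, dtype=object))

loaded = np.load(path, allow_pickle=True)
print(len(loaded), loaded[0]["answer"])
```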
- Example of a data sample:

  {
    'answer': 'kiosk',                   # answer
    'answers': ['kiosk', 'kiosk', ...],  # answers in VQAv2 format; repeated 10 times if there is only one answer in the annotation
    'feature_path': '440.npy',           # feature path to retrieve features
    'gqa_question': {                    # GQA annotations, if applicable
        'annotations': {'answer': {}, 'fullAnswer': {}, 'question': {}},
        'answer': 'kiosk',
        'entailed': ['06778810', '06778808'],
        'equivalent': ['06778808', '06778809'],
        'fullAnswer': 'It is a kiosk.',
        'groups': {'global': 'place', 'local': '02q-place'},
        'imageId': '440',
        'isBalanced': True,
        'question': 'What place is this?',
        'semantic': [
            {'argument': 'scene', 'dependencies': [], 'operation': 'select'},
            {'argument': 'place', 'dependencies': [0], 'operation': 'query'}
        ],
        'semanticStr': 'select: scene->query: place [0]',
        'types': {'detailed': 'place', 'semantic': 'global', 'structural': 'query'}
    },
    'gt_scene_graph_mask': [1, 0, 0, 0, ...],  # ground-truth SG mask aligned with `gt_scene_graph_seq`; 1 means the SG relation is related to question-answer generation
    'gt_scene_graph_seq': [                    # ground-truth SG annotated for the image in this annotation datum
        'kiosk [SEP]', 'counter [SEP]', 'lady [SEP]', 'trash can [SEP]', ...
    ],
    'image_id': '440',         # image id
    'image_source': 'vg',      # image source
    'ocr': [],                 # OCR info in the image; applicable in TextVQA
    'ocr_info': [],            # OCR info in the image; applicable in TextVQA
    'ocr_tokens': [],          # OCR tokens; applicable in TextVQA
    'pred_scene_graph_seq': [  # predicted SG extracted by an off-the-shelf model
        'building behind man [SEP]', 'building behind woman [SEP]',
        'man watching man [SEP]', 'person watching man [SEP]',
        'building behind woman [SEP]', ...
    ],
    'program': [               # program executed to generate the question
        {'argument': 'scene', 'dependencies': [], 'operation': 'select'},
        {'argument': 'place', 'dependencies': [0], 'operation': 'query'}
    ],
    'question': 'What place is this?',  # question
    'question_id': 'g06778809',         # question id
    'raw_question_type': {              # raw question type; applicable in the original GQA annotation
        'detailed': 'place', 'semantic': 'global', 'structural': 'query'
    },
    'set_name': 'train',       # set name: train/val
    'stage': 'object',         # stage name for continual learning
    'supporting_fact': []      # supporting facts; applicable in stage "knowledge"
  }
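Since `gt_scene_graph_mask` is index-aligned with `gt_scene_graph_seq`, the relations that support a given question-answer pair can be recovered by masking. A minimal sketch using toy values from the sample above:

```python
# Scene-graph entries and their ground-truth relevance mask
# (index-aligned, as in the annotation schema above).
gt_scene_graph_seq = ["kiosk [SEP]", "counter [SEP]", "lady [SEP]", "trash can [SEP]"]
gt_scene_graph_mask = [1, 0, 0, 0]  # 1 marks relations tied to the QA pair

# Keep only the entries flagged as relevant to question-answer generation.
relevant = [sg for sg, m in zip(gt_scene_graph_seq, gt_scene_graph_mask) if m == 1]
print(relevant)  # ['kiosk [SEP]']
```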
Training
Symbolic Replay Model (SRM)
The implementation of the Symbolic Replay Model can be found in SRM/. We provide training scripts for SRM here. Specifically:
cd SRM/
# training SRM under scene-incremental setting, with task order a->b->c->d->e->f, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting scene \
--task_seq abcdef \
--model_name distilgpt2 \
--model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_scene_task_token \
--add_task_tokens \
--n_train_epochs 15
# training SRM under function-incremental setting, with task order o->a->r->l->k->s, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting functional \
--task_seq oarlks \
--model_name distilgpt2 \
--model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token \
--add_task_tokens \
--n_train_epochs 15
- We release our replayed samples for the 6 task orders reported in the paper.
- For the 6 task orders, see these files: scene / function, or refer to our paper.
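SRM treats the scene graph as a prompt: serialized scene-graph relations (optionally prefixed with a task token, cf. `--add_task_tokens`) condition a distilgpt2 model that replays pseudo QA pairs for earlier tasks. The exact template lives in SRM/; this is only an illustrative sketch, where the helper name and the `[task]` token format are our assumptions (`[SEP]` comes from the data schema):

```python
def build_srm_prompt(scene_graph, task, add_task_token=True):
    """Serialize scene-graph relations into a text prompt for the
    symbolic replay model. Illustrative only: the real prompt
    template is defined in SRM/."""
    sg_text = " ".join(scene_graph)  # entries already end with '[SEP]'
    # Hypothetical task-token format, mirroring --add_task_tokens.
    prefix = f"[{task}] " if add_task_token else ""
    return prefix + sg_text

prompt = build_srm_prompt(
    ["building behind man [SEP]", "man watching man [SEP]"], task="relation"
)
print(prompt)
```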
UniVQA
Refer to the scripts in this folder for one-stop training and testing (generated by generate_run_scripts.py). Specifically, the following trains with replayed samples from SRM, with a ratio of #replayed_samples : #current_task_samples = $1.5:1$ and task order $o \rightarrow a \rightarrow r \rightarrow l \rightarrow k \rightarrow s$:
ROOT=/Users/stan  # change to your own workspace root
DEVICE=0
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_attribute_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[a].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/stand_alone/functional/unicl_object/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_relation_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_logical_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[l].npy,oarlks_REPLAY[a]_AT[l].npy,oarlks_REPLAY[r]_AT[l].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
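The `training.CL.restore_paths` values in the stages above follow a fixed naming pattern: when training a task, one replay file exists per earlier task in the order, named `<order>_REPLAY[<earlier>]_AT[<current>].npy`. A small sketch of that pattern (the helper name is ours; the file-name format is taken from the scripts above):

```python
def replay_paths(task_order, current_task):
    """List the SRM replay files consumed when training `current_task`,
    one per task that precedes it in `task_order`, following the
    `<order>_REPLAY[<earlier>]_AT[<current>].npy` pattern above."""
    idx = task_order.index(current_task)
    return [
        f"{task_order}_REPLAY[{earlier}]_AT[{current_task}].npy"
        for earlier in task_order[:idx]
    ]

# Third stage ('r') of order o->a->r->l->k->s replays 'o' and 'a'.
print(replay_paths("oarlks", "r"))
```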