CLVQA
[AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)
[arXiv | Data & annotation(json/npy)]
<img src="./figures/gh_teaser.png" alt="CLVQA" style="zoom:67%;" />
Preparation
Installation
conda create -n mmclvqa python=3.8
conda activate mmclvqa
git clone https://github.com/showlab/CLVQA.git
cd CLVQA
cd mmclvqa
pip install --editable .
cd ..
pip install -r extra_requirements.txt
CLOVE Dataset and Annotation
We release the datasets and annotations in JSON format (link) and npy format (link). To use our code for training, please download the npy files.
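The npy releases are pickled arrays of per-sample annotation dicts. A minimal sketch of the save/load round trip, assuming that format (the file name and toy samples here are placeholders, not the released files):

```python
import os
import tempfile

import numpy as np

# Toy samples mimicking the released annotation schema (see the
# data-sample example below for the full set of keys).
samples = [
    {"question": "What place is this?", "answer": "kiosk", "stage": "object"},
    {"question": "What color is the counter?", "answer": "brown", "stage": "attribute"},
]

# The releases are object arrays, so loading requires allow_pickle=True.
path = os.path.join(tempfile.gettempdir(), "toy_annotations.npy")
np.save(path, np.array(samples, dtype=object))

loaded = np.load(path, allow_pickle=True)
print(len(loaded), loaded[0]["answer"])
```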
- Example of a data sample:

  {
    'answer': 'kiosk',                   # answer
    'answers': ['kiosk', 'kiosk', ...],  # answers in VQAv2 format; repeated 10 times if there is only one answer in the annotation
    'feature_path': '440.npy',           # feature path to retrieve features
    'gqa_question': {                    # GQA annotations, if applicable
        'annotations': {'answer': {}, 'fullAnswer': {}, 'question': {}},
        'answer': 'kiosk',
        'entailed': ['06778810', '06778808'],
        'equivalent': ['06778808', '06778809'],
        'fullAnswer': 'It is a kiosk.',
        'groups': {'global': 'place', 'local': '02q-place'},
        'imageId': '440',
        'isBalanced': True,
        'question': 'What place is this?',
        'semantic': [
            {'argument': 'scene', 'dependencies': [], 'operation': 'select'},
            {'argument': 'place', 'dependencies': [0], 'operation': 'query'}
        ],
        'semanticStr': 'select: scene->query: place [0]',
        'types': {'detailed': 'place', 'semantic': 'global', 'structural': 'query'}
    },
    'gt_scene_graph_mask': [1, 0, 0, 0, ...],  # ground-truth SG mask aligned with `gt_scene_graph_seq`; 1 means the SG relation is related to question-answer generation
    'gt_scene_graph_seq': [                    # ground-truth SG annotated for the image in this annotation datum
        'kiosk [SEP]', 'counter [SEP]', 'lady [SEP]', 'trash can [SEP]', ...
    ],
    'image_id': '440',         # image id
    'image_source': 'vg',      # image source
    'ocr': [],                 # OCR info in the image; applicable in TextVQA
    'ocr_info': [],            # OCR info in the image; applicable in TextVQA
    'ocr_tokens': [],          # OCR tokens; applicable in TextVQA
    'pred_scene_graph_seq': [  # predicted SG extracted by an off-the-shelf model
        'building behind man [SEP]', 'building behind woman [SEP]',
        'man watching man [SEP]', 'person watching man [SEP]',
        'building behind woman [SEP]', ...
    ],
    'program': [               # program executed to generate the question
        {'argument': 'scene', 'dependencies': [], 'operation': 'select'},
        {'argument': 'place', 'dependencies': [0], 'operation': 'query'}
    ],
    'question': 'What place is this?',  # question
    'question_id': 'g06778809',         # question id
    'raw_question_type': {              # raw question type; applicable in the original GQA annotation
        'detailed': 'place', 'semantic': 'global', 'structural': 'query'
    },
    'set_name': 'train',       # set name: train/val
    'stage': 'object',         # stage name for continual learning
    'supporting_fact': []      # supporting facts; applicable in stage "knowledge"
  }
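Since `gt_scene_graph_mask` is index-aligned with `gt_scene_graph_seq`, the relations that support a given question-answer pair can be recovered by masking. A minimal sketch using toy values from the sample above:

```python
# Scene-graph entries and their ground-truth relevance mask
# (index-aligned, as in the annotation schema above).
gt_scene_graph_seq = ["kiosk [SEP]", "counter [SEP]", "lady [SEP]", "trash can [SEP]"]
gt_scene_graph_mask = [1, 0, 0, 0]  # 1 marks relations tied to the QA pair

# Keep only the entries flagged as relevant to question-answer generation.
relevant = [sg for sg, m in zip(gt_scene_graph_seq, gt_scene_graph_mask) if m == 1]
print(relevant)  # ['kiosk [SEP]']
```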
Training
Symbolic Replay Model (SRM)
The implementation of the Symbolic Replay Model can be found in SRM/. We provide training scripts for SRM here. Specifically:
cd SRM/
# training SRM under scene-incremental setting, with task order a->b->c->d->e->f, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting scene \
--task_seq abcdef \
--model_name distilgpt2 \
--model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_scene_task_token \
--add_task_tokens \
--n_train_epochs 15
# training SRM under function-incremental setting, with task order o->a->r->l->k->s, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting functional \
--task_seq oarlks \
--model_name distilgpt2 \
--model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token \
--add_task_tokens \
--n_train_epochs 15
- We release our replayed samples for the 6 task orders reported in the paper.
- For the 6 task orders, see these files: scene / function, or refer to our paper.
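SRM treats the scene graph as a prompt: serialized scene-graph relations (optionally prefixed with a task token, cf. `--add_task_tokens`) condition a distilgpt2 model that replays pseudo QA pairs for earlier tasks. The exact template lives in SRM/; this is only an illustrative sketch, where the helper name and the `[task]` token format are our assumptions (`[SEP]` comes from the data schema):

```python
def build_srm_prompt(scene_graph, task, add_task_token=True):
    """Serialize scene-graph relations into a text prompt for the
    symbolic replay model. Illustrative only: the real prompt
    template is defined in SRM/."""
    sg_text = " ".join(scene_graph)  # entries already end with '[SEP]'
    # Hypothetical task-token format, mirroring --add_task_tokens.
    prefix = f"[{task}] " if add_task_token else ""
    return prefix + sg_text

prompt = build_srm_prompt(
    ["building behind man [SEP]", "man watching man [SEP]"], task="relation"
)
print(prompt)
```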
UniVQA
Refer to the scripts in this folder for one-stop training and testing (generated by generate_run_scripts.py). Specifically, the following trains with replayed samples from SRM, with a ratio of #replayed_samples : #current_task_samples = $1.5:1$ and task order $o \rightarrow a \rightarrow r \rightarrow l \rightarrow k \rightarrow s$:
ROOT=/Users/stan  # change to your own workspace root
DEVICE=0
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_attribute_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[a].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/stand_alone/functional/unicl_object/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_relation_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_logical_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[l].npy,oarlks_REPLAY[a]_AT[l].npy,oarlks_REPLAY[r]_AT[l].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
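The `training.CL.restore_paths` values in the stages above follow a fixed naming pattern: when training a task, one replay file exists per earlier task in the order, named `<order>_REPLAY[<earlier>]_AT[<current>].npy`. A small sketch of that pattern (the helper name is ours; the file-name format is taken from the scripts above):

```python
def replay_paths(task_order, current_task):
    """List the SRM replay files consumed when training `current_task`,
    one per task that precedes it in `task_order`, following the
    `<order>_REPLAY[<earlier>]_AT[<current>].npy` pattern above."""
    idx = task_order.index(current_task)
    return [
        f"{task_order}_REPLAY[{earlier}]_AT[{current_task}].npy"
        for earlier in task_order[:idx]
    ]

# Third stage ('r') of order o->a->r->l->k->s replays 'o' and 'a'.
print(replay_paths("oarlks", "r"))
```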