CLVQA

[AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)


[arXiv | Data & annotation(json/npy)]

<img src="./figures/gh_teaser.png" alt="CLVQA" style="zoom:67%;" />

Preparation

Installation

conda create -n mmclvqa python=3.8
conda activate mmclvqa

git clone https://github.com/showlab/CLVQA.git
cd CLVQA
cd mmclvqa
pip install --editable .

cd ..
pip install -r extra_requirements.txt

CLOVE Dataset and Annotation

We release the datasets and annotations in json format (link) and npy format (link). To use our code for training, please download the npy files.
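For training, the npy annotation files can presumably be loaded with NumPy's pickle support, yielding an array of sample dicts with the schema shown below. A minimal round-trip sketch; the file name and exact on-disk layout here are illustrative assumptions, not taken from the repo:

```python
import numpy as np

# Toy round-trip: save a minimal sample dict the way the CLOVE .npy
# annotations are assumed to be stored, then load it back.
toy = np.array(
    [{"question": "What place is this?", "answer": "kiosk"}],
    dtype=object,
)
np.save("/tmp/clove_toy.npy", toy)

# allow_pickle=True is required because the array holds Python dicts.
samples = np.load("/tmp/clove_toy.npy", allow_pickle=True)
print(samples[0]["answer"])  # kiosk
```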

  • Example of data sample:
    { 
    'answer': 'kiosk',                                         # answer
    'answers': ['kiosk','kiosk',...],                          # answers in VQAv2 format, repeated 10 times if there is only one answer in the annotation
    'feature_path': '440.npy',                                 # feature path to retrieve features
    'gqa_question':                                            # GQA annotations, if applicable
                    { 'annotations': { 'answer': {},
                                        'fullAnswer': {},
                                        'question': {}},
                        'answer': 'kiosk',
                        'entailed': ['06778810', '06778808'],
                        'equivalent': ['06778808', '06778809'],
                        'fullAnswer': 'It is a kiosk.',
                        'groups': {'global': 'place', 'local': '02q-place'},
                        'imageId': '440',
                        'isBalanced': True,
                        'question': 'What place is this?',
                        'semantic': [ { 'argument': 'scene',
                                        'dependencies': [],
                                        'operation': 'select'},
                                    { 'argument': 'place',
                                        'dependencies': [0],
                                        'operation': 'query'}],
                        'semanticStr': 'select: scene->query: place [0]',
                        'types': { 'detailed': 'place',
                                'semantic': 'global',
                                'structural': 'query'}},
    'gt_scene_graph_mask': [1,0,0,0 ..., ],                  # Ground-truth SG mask corresponding to `gt_scene_graph_seq`; 1 indicates the SG relation is relevant to question-answer generation.
    'gt_scene_graph_seq': [                                   # Ground-truth SG annotated for the image in this annotation datum.
        'kiosk [SEP]', 'counter [SEP]', 'lady [SEP]', 'trash can [SEP]', ...
        ],
    'image_id': '440',                                        # image id
    'image_source': 'vg',                                     # image source
    'ocr': [],                                                # ocr info in the image, applicable in textvqa
    'ocr_info': [],                                           # ocr info in the image, applicable in textvqa
    'ocr_tokens': [],                                         # ocr tokens, applicable in text vqa
    'pred_scene_graph_seq': [                                 # predicted SG extracted by an off-the-shelf model
                                'building behind man [SEP]',
                                'building behind woman [SEP]',
                                'man watching man [SEP]',
                                'person watching man [SEP]',
                                'building behind woman [SEP]',
                                ...
                            ],
    'program': [                                              # program executed to generate the question
                {'argument': 'scene', 'dependencies': [], 'operation': 'select'},
                { 'argument': 'place',
                    'dependencies': [0],
                    'operation': 'query'}
                ],
    'question': 'What place is this?',                        # question
    'question_id': 'g06778809',                               # question id
    'raw_question_type': {                                    # raw question type, applicable in original GQA annotation
                            'detailed': 'place',
                            'semantic': 'global',
                            'structural': 'query'
                            },
    'set_name': 'train',                                      # set name: train/val
    'stage': 'object',                                        # stage name for continual learning
    'supporting_fact': []                                     # supporting facts, applicable in stage "knowledge"
    }
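As a quick illustration of how `gt_scene_graph_mask` lines up with `gt_scene_graph_seq`, the sketch below filters a toy sample (values abbreviated from the example above) down to the question-relevant relations:

```python
# Toy sample mirroring the annotation schema above (values abbreviated).
sample = {
    "question": "What place is this?",
    "gt_scene_graph_seq": ["kiosk [SEP]", "counter [SEP]", "lady [SEP]"],
    "gt_scene_graph_mask": [1, 0, 0],
}

def relevant_relations(sample):
    """Keep the SG relations whose mask entry is 1 (question-relevant)."""
    return [
        rel
        for rel, keep in zip(sample["gt_scene_graph_seq"],
                             sample["gt_scene_graph_mask"])
        if keep == 1
    ]

print(relevant_relations(sample))  # ['kiosk [SEP]']
```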
    

Training

Symbolic Replay Model (SRM)

The implementation of the Symbolic Replay Model can be found in SRM/. We provide training scripts for SRM here. Specifically,

cd SRM/
# training SRM under scene-incremental setting, with task order a->b->c->d->e->f, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
 --cl_setting scene \
 --task_seq abcdef \
 --model_name distilgpt2 \
 --model_dir_root  /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_scene_task_token  \
 --add_task_tokens \
 --n_train_epochs 15

# training SRM under function-incremental setting, with task order o->a->r->l->k->s, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting functional \
--task_seq oarlks \
--model_name distilgpt2 \
--model_dir_root  /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token \
--add_task_tokens \
--n_train_epochs 15
  • We release the replayed samples for the 6 task orders reported in the paper.
  • The 6 task orders can be inspected via these files: scene / function, or refer to our paper.
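The --task_seq flag encodes the continual-learning order as a string of one-letter task codes, trained left to right. A sketch of expanding it; the letter-to-task mapping below is an assumption pieced together from stage names appearing in the annotations and configs (e.g. "object", "attribute", "relation", "logical", "knowledge"), and "scene-text" for `s` is a guess — verify against the repo:

```python
# Assumed letter-to-task mapping for the functional setting; o/a/r/l/k
# match stage names seen elsewhere in this README, "scene-text" for s
# is an unverified guess.
FUNCTIONAL_TASKS = {
    "o": "object", "a": "attribute", "r": "relation",
    "l": "logical", "k": "knowledge", "s": "scene-text",
}

def expand_task_seq(seq):
    """Expand a --task_seq string into the ordered list of task names."""
    return [FUNCTIONAL_TASKS[c] for c in seq]

print(expand_task_seq("oarlks"))
# ['object', 'attribute', 'relation', 'logical', 'knowledge', 'scene-text']
```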

UniVQA

Refer to the scripts in this folder for one-stop training and testing (generated by generate_run_scripts.py). Specifically, the following trains with replayed samples from SRM at a ratio of #replayed_samples : #current_task_samples = $1.5:1$, with task order $o \rightarrow a \rightarrow r \rightarrow l \rightarrow k \rightarrow s$:

ROOT=/Users/stan
DEVICE=0
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_attribute_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[a].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/stand_alone/functional/unicl_object/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_relation_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_logical_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[l].npy,oarlks_REPLAY[a]_AT[l].npy,oarlks_REPLAY[r]_AT[l].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical \
 training.checkpoint_interval=4000 \
 training.callbacks=[]
fi
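The training.CL.restore_paths values in the commands above follow a regular pattern: when training task t, the run restores one replay file per previously learned task, named {order}_REPLAY[prev]_AT[t].npy. A small helper sketching that pattern (the naming is read off the commands above, not from the repo code):

```python
def replay_paths(order, current):
    """File names of replayed samples consumed when training `current`,
    one per task preceding it in `order` (pattern taken from the
    mmf_run commands above)."""
    idx = order.index(current)
    return [f"{order}_REPLAY[{prev}]_AT[{current}].npy"
            for prev in order[:idx]]

# Training the relation task ('r') in order oarlks restores replays
# from object ('o') and attribute ('a'):
print(replay_paths("oarlks", "r"))
# ['oarlks_REPLAY[o]_AT[r].npy', 'oarlks_REPLAY[a]_AT[r].npy']
```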
