MPA

Implementation of Model Parity Alignment

Official implementation of our EMNLP 2025 paper "When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs"


This repository contains the official code for training, inference, and evaluation of Model Parity Aligner (MPA).

News 🚀🚀🚀

  • 2025/11/19: MPA is invited for presentation at the ACM ARCS 2026 (at IIT Hyderabad).
  • 2025/11/07: MPA is selected for presentation at the 'Vision India' Session in ICVGIP 2025 (at IIT Mandi).
  • 2025/11/05: Presented MPA at EMNLP 2025 (Suzhou, China).

To set up the environment

# create a new Docker container (using the PyTorch image below)
$ docker run -it -d --name MPA --gpus=all -v <path-to-your-directory>:/workspace pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel

# create a new conda env named MPA
$ conda create -n MPA python=3.13.5

# activate MPA
$ conda activate MPA

# install dependencies
$ pip install -r requirements.txt

Dataset

We report results of MPA on four datasets: TextVQA, STVQA, ChartQA, and OK-VQA. Follow the instructions below to create the splits used for each dataset.

First, for TextVQA you can download the images and respective annotations from their official website. You can access the train, val, and test splits at the following paths:

train-split: /data/TextVQA/qwenTrainFormat_train.json
val-split: /data/TextVQA/qwenTrainFormat_eval.json
test-split: /data/TextVQA/TextVQA_0.5.1_val.json

Second, for STVQA you can download the images and respective annotations from their official website. You can access the train, val, and test splits at the following paths:

train-split: /data/STVQA/QwenTrainFormat_train_task_1_onePerImage_train.json
val-split: /data/STVQA/QwenTrainFormat_train_task_1_onePerImage_eval.json
test-split: /data/STVQA/train_task_1_onePerImage_val.json

Third, for ChartQA you can download the images and respective annotations from their official GitHub repo. You can access the train, val, and test splits at the following paths:

train-split: /data/ChartVQA/train_onePerImage_QwenFormat_train.json
val-split: /data/ChartVQA/train_onePerImage_QwenFormat_eval.json
test-split: /data/ChartVQA/test_combined.json

Fourth, for OK-VQA you can download the images and respective annotations from their official website. You can access the train, val, and test splits at the following paths:

train-split: /data/OKVQA/okvqa_QwenFormat_train.json
val-split: /data/OKVQA/okvqa_QwenFormat_eval.json
test-split: /data/OKVQA/okvqa_val_combine.json
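All four splits share the same "Qwen train format". As a rough sketch of what the loader expects, here is a hypothetical record in that format; the exact field names (`id`, `image`, `conversations`, `from`, `value`) are an assumption based on common Qwen2-VL fine-tuning conventions, so inspect the actual JSON files to confirm:

```python
import json

# Hypothetical record in the "Qwen train format" used by the split
# files above (field names are an assumption; check the real files).
record = {
    "id": "textvqa_00001",
    "image": "train_images/0054c910ddd3b1a4.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is written on the sign?"},
        {"from": "gpt", "value": "stop"},
    ],
}

def load_split(path):
    """Load one split file: assumed to be a JSON list of such records."""
    with open(path) as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON list of records")
    return data

# Round-trip the toy record to show the expected shape of a split.
split = json.loads(json.dumps([record]))
print(len(split), split[0]["conversations"][1]["value"])
```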

Pseudo Annotator (PA)

Now, to generate pseudo annotations of unlabeled images for a task 'T', run the following command. This creates a new directory (if one does not already exist) inside the scripts directory and dumps the PA JSON files into a subdirectory named after the date on which the experiment is run. Note that demo files are already present in the results directory for reference.

# change to scripts dir
$ cd scripts/

# run the bash script PA.sh
$ bash PA.sh
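The flow inside PA.sh can be sketched as follows. This is a minimal stand-in, not the paper's implementation: `pseudo_annotate`, `dump_pa`, and the stub annotator are hypothetical names, and the real script queries the large VLM rather than a lambda.

```python
import json
import os
import tempfile
from datetime import date

def pseudo_annotate(image_paths, annotator):
    """Ask the annotator (the L-VLM in the paper; a stub here) for one
    pseudo question-answer pair per unlabeled image."""
    records = []
    for path in image_paths:
        question, answer = annotator(path)
        records.append({"image": path, "question": question, "answer": answer})
    return records

def dump_pa(records, out_root):
    """Write PA records into a date-stamped directory, mirroring how
    PA.sh organises its output."""
    out_dir = os.path.join(out_root, date.today().isoformat())
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, "pa_annotations.json")
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)
    return out_path

# Stub standing in for the large VLM.
stub_lvlm = lambda path: ("What is shown in the image?", "a bar chart")
records = pseudo_annotate(["img_001.png", "img_002.png"], stub_lvlm)
out_path = dump_pa(records, tempfile.mkdtemp())
print(len(records), os.path.exists(out_path))
```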

Parity Identifier (PI)

This module is responsible for identifying samples that represent the knowledge gaps between the S-VLM and the L-VLM. Note that you must pass the path of the PA output JSON file to the respective dataloader in PI.py.

# run the bash script PI.sh
$ bash PI.sh
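One simple selection criterion, as a sketch: keep the samples where the small VLM's answer disagrees with the pseudo annotation. This is a simplified stand-in for the PI logic in PI.py (the paper's actual criterion may differ), and `identify_parity_samples` plus the stub S-VLM are hypothetical names.

```python
def identify_parity_samples(pa_records, svlm_answer):
    """Keep samples where the S-VLM's answer disagrees with the pseudo
    annotation -- a simplified stand-in for the PI criterion."""
    same = lambda a, b: a.strip().lower() == b.strip().lower()
    return [
        r for r in pa_records
        if not same(svlm_answer(r["image"], r["question"]), r["answer"])
    ]

pa_records = [
    {"image": "a.png", "question": "What colour is the bar?", "answer": "red"},
    {"image": "b.png", "question": "What is the maximum value?", "answer": "42"},
]
# Stub S-VLM: answers the first correctly, the second incorrectly.
stub_svlm = lambda image, q: "red" if image == "a.png" else "7"
gaps = identify_parity_samples(pa_records, stub_svlm)
print([r["image"] for r in gaps])
```

Only the second sample survives, since that is where the small model falls short of the pseudo label.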

Parity Leveler

The parity samples obtained by the PI module are now used to train the S-VLM to enhance it. Note that you must pass the train JSON file generated during the PI step to PL.sh to train on the parity samples. Run the following command to do so:

# run the bash script PL.sh
$ bash PL/Qwen2-VL-Finetune/scripts/PL.sh

Note: we use the Qwen2-VL-Finetune repo (vendored under PL/) to train the Qwen-family models.
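Before fine-tuning, the parity samples need to be converted into the conversation format the Qwen2-VL training scripts consume. A minimal sketch, assuming the same conversation fields as the split files (`to_qwen_train_format` is a hypothetical helper):

```python
def to_qwen_train_format(parity_samples):
    """Convert parity samples into the conversation format assumed for
    the Qwen2-VL fine-tuning scripts (field names are an assumption)."""
    return [
        {
            "id": f"parity_{i:05d}",
            "image": s["image"],
            "conversations": [
                {"from": "human", "value": "<image>\n" + s["question"]},
                {"from": "gpt", "value": s["answer"]},
            ],
        }
        for i, s in enumerate(parity_samples)
    ]

parity = [
    {"image": "b.png", "question": "What is the maximum value?", "answer": "42"}
]
train = to_qwen_train_format(parity)
print(train[0]["id"], train[0]["conversations"][1]["value"])
```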

Evaluate

To evaluate the pre-trained and MPA-trained models, run the following command:

# run the bash script evaluate.sh
$ bash evaluate.sh
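For reference, the standard VQA soft-accuracy metric used by TextVQA and OK-VQA can be sketched as below. Whether evaluate.sh applies this exact metric to every dataset is an assumption (ChartQA, for instance, is usually scored with relaxed accuracy).

```python
def vqa_soft_accuracy(pred, gt_answers):
    """Standard VQA soft accuracy: min(#matching annotators / 3, 1)."""
    pred = pred.strip().lower()
    matches = sum(1 for a in gt_answers if a.strip().lower() == pred)
    return min(matches / 3.0, 1.0)

def evaluate(predictions, ground_truths):
    """Mean soft accuracy over a list of (prediction, answers) pairs."""
    scores = [vqa_soft_accuracy(p, g)
              for p, g in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)

# Toy example: 10 annotator answers per question, as in the VQA setup.
gts = [["stop"] * 10, ["red"] * 8 + ["crimson"] * 2]
acc = evaluate(["stop", "blue"], gts)
print(acc)  # first question fully correct, second fully wrong -> 0.5
```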

License

This code and data are released under the MIT license.

Cite

If you find this data/code/paper useful for your research, please consider citing:

@inproceedings{penamakuri2025big,
  title={When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs},
  author={Penamakuri, Abhirama Subramanyam and Singh, Navlika and Arora, Piyush and Mishra, Anand},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={31632--31649},
  year={2025}
}

Acknowledgements

  1. We used the codebase and pre-trained models of Qwen2-VL.