Veagle: Advancement in Multimodal representation Learning

Rajat Chawla*, Arkajit Datta*, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chatterjee, Mukunda NS and Ishaan Bhola. *Equal Contribution

SuperAGI

<p align="center"> <a href="https://huggingface.co/SuperAGI/Veagle"><img src="docs/images/arch.png" width="70%"></a> <br> Model Architecture. </p>

Release

[1/18] 🔥 We released the training code of Veagle.
[1/18] 🔥 We released Veagle: Advancement in Multimodal representation Learning.

Installation

Clone the repository

git clone https://github.com/superagi/Veagle
cd Veagle

Run installation script

source venv/bin/activate
chmod +x install.sh
./install.sh

Inference

python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/food.jpeg \
        --question "Is the food in the image healthy or not?"

python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/dog.jpeg \
        --question "Write a poem that rhymes very well based on the above image."

python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/astronaut.jpeg \
        --question "What is the significance of this moment in history?"
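
To run several image/question pairs in one go, the commands above can be generated from a list. The following is a minimal sketch, not part of the repository: the helper name `build_command` and the `EXAMPLES` list are hypothetical, and it assumes only the `evaluate.py` flags shown above (`--answer_qs`, `--model_name`, `--img_path`, `--question`).

```python
import shlex

# Image/question pairs mirroring two of the inference examples above.
EXAMPLES = [
    ("images/dog.jpeg",
     "Write a poem that rhymes very well based on the above image."),
    ("images/astronaut.jpeg",
     "What is the significance of this moment in history?"),
]


def build_command(img_path, question, model_name="veagle_mistral"):
    """Return the evaluate.py argument list for one image/question pair.

    Hypothetical helper: it only assembles the flags shown in this README.
    """
    return [
        "python", "evaluate.py", "--answer_qs",
        "--model_name", model_name,
        "--img_path", img_path,
        "--question", question,
    ]


if __name__ == "__main__":
    # Print shell-safe command lines; pass the list to subprocess.run(...)
    # instead if you want to execute them from Python.
    for img_path, question in EXAMPLES:
        print(shlex.join(build_command(img_path, question)))
```

Keeping the command as an argument list avoids quoting problems when a question contains apostrophes or quotes.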

Train

After downloading the training datasets and specifying their paths in the dataset configs, you are ready to train. We used 8x A100 SXM GPUs in our experiments; adjust the hyperparameters in the train config file to match your GPU resources. Transformers may take around 2 minutes to load the model, so allow some time before training starts. Make sure you have completed the installation procedure before you start training. Below is an example of training Veagle.

Pretraining of Veagle's visual assistant branch

torchrun --nnodes=1 --nproc_per_node=8 \
    train.py \
    --cfg-path train_configs/pretrain_veagle_mistral.yaml
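
If you train on fewer GPUs than the 8x A100 setup, one common way to adjust hyperparameters is to keep the effective batch size constant by accumulating gradients. The sketch below only does the arithmetic. It is an assumption that the train config exposes a per-GPU batch size and an accumulation setting, and the function name, parameter names and the per-GPU batch size of 4 are all hypothetical; check the config file for the actual keys and values.

```python
def grad_accum_steps(ref_gpus, ref_batch_per_gpu, gpus, batch_per_gpu):
    """Return the accumulation steps that match a reference effective batch size.

    Effective batch size = GPUs x per-GPU batch size x accumulation steps.
    Hypothetical helper; the names do not come from the Veagle configs.
    """
    if min(ref_gpus, ref_batch_per_gpu, gpus, batch_per_gpu) < 1:
        raise ValueError("all arguments must be positive integers")
    target = ref_gpus * ref_batch_per_gpu   # samples per optimizer step (reference)
    per_pass = gpus * batch_per_gpu         # samples per forward/backward pass (yours)
    if target % per_pass != 0:
        raise ValueError(
            f"{per_pass} samples per pass does not divide the target of {target}"
        )
    return target // per_pass


if __name__ == "__main__":
    # Assumed reference: 8 GPUs with a per-GPU batch of 4 (hypothetical value).
    # On 2 GPUs with the same per-GPU batch, 4 accumulation steps are needed.
    print(grad_accum_steps(ref_gpus=8, ref_batch_per_gpu=4, gpus=2, batch_per_gpu=4))
```

Remember to change `--nproc_per_node` in the launch command to the number of GPUs you actually use.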

Instruction Finetuning Veagle

You can run finetuning after pretraining is complete. Make sure to set the pretrained model's path in the finetuning config.

torchrun --nnodes=1 --nproc_per_node=8 \
    train.py \
    --cfg-path train_configs/finetune_veagle_mistral.yaml
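
Since the two stages must run in order, it can help to check that both config files exist before launching. The following is a small sketch under stated assumptions: `torchrun_command` and `missing_configs` are hypothetical helpers, not part of the repository, and they use only the `torchrun` options and config paths shown above.

```python
import shlex
from pathlib import Path

# Config paths from the two training commands above, in the order they must run.
STAGE_CONFIGS = [
    "train_configs/pretrain_veagle_mistral.yaml",
    "train_configs/finetune_veagle_mistral.yaml",
]


def torchrun_command(cfg_path, gpus=8):
    """Return the single-node torchrun argument list for one training stage."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return [
        "torchrun", "--nnodes=1", f"--nproc_per_node={gpus}",
        "train.py", "--cfg-path", cfg_path,
    ]


def missing_configs(cfg_paths, root="."):
    """Return the config paths that do not exist under the repository root."""
    return [p for p in cfg_paths if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    missing = missing_configs(STAGE_CONFIGS)
    if missing:
        print("Missing config files:", ", ".join(missing))
    # Run the printed commands one after the other; set the pretrained
    # model's path in the finetuning config before starting the second one.
    for cfg in STAGE_CONFIGS:
        print(shlex.join(torchrun_command(cfg)))
```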

Acknowledgement

BLIP2: The model architecture of BLIVA follows BLIP-2. Check out this great open-source work if you are not already familiar with it.
BLIVA: The code base we took inspiration from.
mPLUG-Owl2: The code base we took inspiration from.

License

This repository's code is released under the BSD 3-Clause License. Much of the code is based on BLIVA and mPLUG-Owl2, which are also under the BSD 3-Clause License.
