# VLFeedback
A GPT-4V-annotated preference dataset for large vision-language models.
[Project Page] [Datasets] [Silkie Model] [Paper]
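For a quick look at the data, a preference dataset in this style can be loaded with the `datasets` library. The dataset ID and column names below are illustrative assumptions, not a documented schema; see the [Datasets] link above for the actual release.

```python
# Hypothetical sketch: loading a GPT-4V-annotated preference dataset
# from the Hugging Face Hub. The dataset ID and the column names in the
# comment are assumptions for illustration; consult the [Datasets] link
# for the actual identifier and schema.
from datasets import load_dataset

ds = load_dataset("MMInstruction/VLFeedback", split="train")  # hypothetical ID

example = ds[0]
print(example.keys())  # e.g. an instruction, an image, and annotated model responses
```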
## Annotation Framework
<img src="imgs/annotate_framework.png" width="800px">Multimodal Instruciton Source
The instructions are sampled from various domains to cover the different capabilities of LVLMs:
<img src="imgs/instruction_source.png" width="800px">Model Pool
We construct a model pool consisting of 12 LVLMs, listed below (a sketch of turning pooled responses into preference pairs follows the list):
- GPT-4V
- LLaVA series
  - LLaVA-v1.5-7B
  - LLaVA-v1.5-13B
  - LLaVA-RLHF-7b-v1.5-224
  - LLaVA-RLHF-13b-v1.5-336
- Qwen-VL-7B
- IDEFICS-9b-Instruct
- Fuyu-8B
- InstructBLIP series
  - InstructBLIP-Vicuna-7B
  - InstructBLIP-Vicuna-13B
- VisualGLM-6B
- MMICL-Vicuna-13B
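As a rough illustration of how the pool is used, the sketch below pairs a higher-scored response with a lower-scored one for each instruction. The `gpt4v_score` helper and the field names are hypothetical stand-ins for the GPT-4V annotation step, not the project's actual code.

```python
# Illustrative sketch (not the project's actual pipeline): turn responses
# sampled from the model pool into (chosen, rejected) preference pairs
# using annotator scores. `gpt4v_score` is a hypothetical helper standing
# in for the GPT-4V annotation step.
from typing import Callable

def build_preference_pair(
    instruction: str,
    responses: dict[str, str],                  # model name -> decoded response
    gpt4v_score: Callable[[str, str], float],   # (instruction, response) -> score
) -> dict[str, str]:
    # Rank the pooled responses by annotator score, best first.
    scored = sorted(
        responses.items(),
        key=lambda kv: gpt4v_score(instruction, kv[1]),
        reverse=True,
    )
    _, chosen = scored[0]
    _, rejected = scored[-1]
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}
```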
## Silkie
We select Qwen-VL-Chat as the backbone model and perform direct preference optimization (DPO) on our dataset; a minimal sketch of the objective follows.
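This is the standard DPO loss on (chosen, rejected) pairs, stated under the assumption that per-sequence log-probabilities are already computed for the policy and a frozen reference model. The actual training builds on trl (acknowledged below); the function here is illustrative, not the project's implementation.

```python
# Minimal sketch of the DPO objective. Tensor names are assumptions;
# this is not the project's training code, which builds on trl.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(chosen | prompt)
    policy_rejected_logps: torch.Tensor,  # log p_theta(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Implicit reward margin between chosen and rejected, relative to the reference.
    logits = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps
    )
    # Maximize the probability that the chosen response is preferred.
    return -F.logsigmoid(beta * logits).mean()
```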
<div align="center"> <img src="imgs/silkie.png" alt="Silkie Logo" width="128px"> <p>Generated by <a href="https://openai.com/dall-e-3">DALL·E 3</a></p> </div>The resulting model, Silkie, achieves comprehensive improvements on various benchmarks
<img src="imgs/silkie_ret.png" width="800px">Installation
To run our training scripts, first create a virtual environment and install the dependencies:
```bash
conda create -n silkie python=3.10 && conda activate silkie
pip install -r requirements.txt
```
## Training
Our training scripts support both single-node and multi-node training.
We provide a `launch_dpo.py` script that handles both cases. To launch a job locally, run:
```bash
python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR
```
To launch a job on a Slurm cluster, specify `GPUS_PER_NODE` in `launch_dpo.py` and run:
```bash
python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR --gpus $NUM_GPUS
```
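As a usage example, several configurations can be dispatched through the same entry point. Only the `--config` and `--working` flags shown above are assumed to exist; the config list and run-directory layout below are hypothetical.

```python
# Usage sketch: submit several DPO runs through launch_dpo.py.
# The config list and the runs/ directory layout are hypothetical;
# the CLI flags match the commands shown above.
import subprocess
from pathlib import Path

configs = ["dpo_config/example.yaml"]  # extend with your own configs

for cfg in configs:
    working_dir = Path("runs") / Path(cfg).stem  # hypothetical layout
    subprocess.run(
        ["python", "launch_dpo.py", "--config", cfg, "--working", str(working_dir)],
        check=True,  # stop on the first failed launch
    )
```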
## Citations
```bibtex
@article{2023vlfeedback,
  author    = {Lei Li and Zhihui Xie and Mukai Li and Shunian Chen and Peiyi Wang and Liang Chen and Yazheng Yang and Benyou Wang and Lingpeng Kong},
  title     = {Silkie: Preference Distillation for Large Visual Language Models},
  publisher = {arXiv:2312.10665},
  year      = {2023}
}
```
## Acknowledgements
We would like to thank the authors of trl and Qwen-VL for their great work.