DepictQA: Depicted Image Quality Assessment with Vision Language Models

<p align="center"> <img src="docs/logo.png" width="600"> </p> <p align="center"> <font size='4'> <a href="https://depictqa.github.io/" target="_blank">🌏 Project Page</a> • 📀 Datasets ( <a href="https://huggingface.co/datasets/zhiyuanyou/DataDepictQA" target="_blank">huggingface</a> / <a href="https://modelscope.cn/datasets/zhiyuanyou/DataDepictQA" target="_blank">modelscope</a> ) </font> </p>

Official PyTorch implementation of the papers:

  • DepictQA-Wild (DepictQA-v2), also named Enhanced DepictQA (EDQA): paper, project page.

    Zhiyuan You, Jinjin Gu, Xin Cai, Zheyuan Li, Kaiwen Zhu, Chao Dong, Tianfan Xue, "Enhancing Descriptive Image Quality Assessment with A Large-scale Multi-modal Dataset," TIP, 2025.

  • DepictQA-v1: paper, project page.

    Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, Chao Dong, "Depicting beyond scores: Advancing image quality assessment through multi-modal language models," ECCV, 2024.

<p align="center"> <img src="docs/res.png"> </p>

Update

📆 [2025.11] DepictQA-Wild (DepictQA-v2), also named Enhanced DepictQA (EDQA), was accepted to TIP.

📆 [2025.02] DeQA-Score was accepted to CVPR 2025.

📆 [2025.01] We released DeQA-Score, a distribution-based depicted image quality assessment model for score regression. Datasets, code, and model weights (full tuning / LoRA tuning) are available.

📆 [2024.07] DepictQA datasets were released in <a href="https://huggingface.co/datasets/zhiyuanyou/DataDepictQA" target="_blank">huggingface</a> / <a href="https://modelscope.cn/datasets/zhiyuanyou/DataDepictQA" target="_blank">modelscope</a>.

📆 [2024.07] DepictQA-v1 was accepted to ECCV 2024.

📆 [2024.05] We released DepictQA-Wild (DepictQA-v2): a multi-functional in-the-wild descriptive image quality assessment model.

📆 [2023.12] We released DepictQA-v1, a multi-modal image quality assessment model based on vision language models.

Installation

  • Create environment.

    # clone this repo
    git clone https://github.com/XPixelGroup/DepictQA.git
    cd DepictQA
    
    # create environment
    conda create -n depictqa python=3.10
    conda activate depictqa
    pip install -r requirements.txt
    
  • Download pretrained models.

    • CLIP-ViT-L-14. Required.
    • Vicuna-v1.5-7B. Required.
    • All-MiniLM-L6-v2. Required only for confidence estimation of detailed reasoning responses.
    • Our pretrained delta checkpoint (see Models). Optional for training. Required for demo and inference.
  • Ensure that all downloaded models are placed in the designated directories as follows.

    |-- DepictQA
    |-- ModelZoo
        |-- CLIP
            |-- clip
                |-- ViT-L-14.pt
        |-- LLM
            |-- vicuna
                |-- vicuna-7b-v1.5
        |-- SentenceTransformers
            |-- all-MiniLM-L6-v2
    

    If the models are stored in different directories, revise config.model.vision_encoder_path, config.model.llm_path, and config.model.sentence_model in config.yaml (under the experiments directory) to set the new paths.

  • Move our pretrained delta checkpoint to a specific experiment directory (e.g., DQ495K, DQ495K_QPath) as follows.

    |-- DepictQA
        |-- experiments
            |-- a_specific_experiment_directory
                |-- ckpt
                    |-- ckpt.pt
    

    If the delta checkpoint is stored in another directory, revise config.model.delta_path in config.yaml (under the experiments directory) to set the new path.
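As a quick sanity check before training or inference, you can verify that every configured model path actually exists on disk. A minimal sketch, assuming the ModelZoo layout above (the `check_model_paths` helper and the example paths are illustrative, not part of the repo):

```python
from pathlib import Path

def check_model_paths(paths: dict) -> list:
    """Return the config keys whose path does not exist on disk."""
    return [key for key, path in paths.items() if not Path(path).exists()]

# Example paths mirroring the ModelZoo layout above; adjust to your setup.
paths = {
    "vision_encoder_path": "../ModelZoo/CLIP/clip/ViT-L-14.pt",
    "llm_path": "../ModelZoo/LLM/vicuna/vicuna-7b-v1.5",
    "sentence_model": "../ModelZoo/SentenceTransformers/all-MiniLM-L6-v2",
    "delta_path": "ckpt/ckpt.pt",
}

missing = check_model_paths(paths)
if missing:
    print(f"Missing model paths: {missing}")
```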

Models

| Training Data | Tune | Hugging Face | Description |
| -------- | -------- | -------- | -------- |
| DQ-495K + KonIQ + SPAQ | Abstractor, LoRA | download | Vision abstractor to reduce the number of tokens. Trained on the DQ-495K, KonIQ, and SPAQ datasets. Able to handle images with resolutions above 1000, and to compare images with different contents. |
| DQ-495K + Q-Instruct | Projector, LoRA | download | Trained on the DQ-495K and Q-Instruct (see paper) datasets. Able to answer multiple-choice, yes-or-no, what, and how questions, but degrades on assessment and comparison tasks. |
| DQ-495K + Q-Pathway | Projector, LoRA | download | Trained on the DQ-495K and Q-Pathway (see paper) datasets. Performs well on real images, but degrades on comparison tasks. |
| DQ-495K | Projector, LoRA | download | Trained on the DQ-495K dataset. Used in our paper. |

Demos

<p align="center"> <img src="docs/demo.png"> </p>

Online Demo

We provide an online demo (coming soon) deployed on Hugging Face Spaces.

Gradio Demo

We provide a gradio demo for local test.

  • cd a specific experiment directory: cd experiments/a_specific_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded and (3) their paths are set in config.yaml.

  • Launch controller: sh launch_controller.sh

  • Launch gradio server: sh launch_gradio.sh

  • Launch DepictQA worker: sh launch_worker.sh id_of_one_gpu

You can revise the server config in serve.yaml. The URL of the deployed demo will be http://{serve.gradio.host}:{serve.gradio.port}; the default is http://0.0.0.0:12345 if you do not revise serve.yaml.

Note that multiple workers can be launched simultaneously. For each worker, serve.worker.host, serve.worker.port, serve.worker.worker_url, and serve.worker.model_name should be unique.
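Before launching multiple workers, a small helper can catch conflicting configs. A sketch of the uniqueness check described above (the `validate_workers` function and the sample values are illustrative, not part of the repo):

```python
def validate_workers(workers):
    """Raise ValueError if a per-worker field repeats across workers."""
    # Per the note above, these four serve.worker fields must be unique.
    for field in ("host", "port", "worker_url", "model_name"):
        values = [worker[field] for worker in workers]
        if len(set(values)) != len(values):
            raise ValueError(f"Duplicate serve.worker.{field}: {values}")

# Two hypothetical workers on different machines.
workers = [
    {"host": "192.168.0.10", "port": 21002,
     "worker_url": "http://192.168.0.10:21002", "model_name": "depictqa_a"},
    {"host": "192.168.0.11", "port": 21003,
     "worker_url": "http://192.168.0.11:21003", "model_name": "depictqa_b"},
]
validate_workers(workers)  # all fields unique, no error raised
```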

Datasets

  • Source code for DQ-495K (used in DepictQA-v2) dataset construction is provided here.

  • Download the MBAPPS (used in DepictQA-v1) and DQ-495K (used in DepictQA-v2) datasets from <a href="https://huggingface.co/datasets/zhiyuanyou/DataDepictQA" target="_blank">huggingface</a> / <a href="https://modelscope.cn/datasets/zhiyuanyou/DataDepictQA" target="_blank">modelscope</a>. Move the dataset to the same parent directory as this repository, as follows.

    |-- DataDepictQA
    |-- DepictQA
    

    If the dataset is stored in another directory, revise config.data.root_dir in config.yaml (under the experiments directory) to set the new path.

Training

  • cd a specific experiment directory: cd experiments/a_specific_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14 and Vicuna-v1.5-7B are downloaded and (3) their paths are set in config.yaml.

  • Run training: sh train.sh ids_of_gpus.

Inference

Inference on Our Benchmark

  • cd a specific experiment directory: cd experiments/a_specific_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded and (3) their paths are set in config.yaml.

  • Run a specific inference script (e.g., infer_A_brief.sh): sh infer_A_brief.sh id_of_one_gpu.

Inference on Custom Dataset

  • Construct *.json file for your dataset as follows.

    [
        {
            "id": unique id of each sample, required, 
            "image_ref": reference image, null if not applicable, 
            "image_A": image A, null if not applicable, 
            "image_B": image B, null if not applicable, 
            "query": input question, required, 
        }, 
        ...
    ]
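A meta file in this format can be generated with a short script; a minimal sketch (file names and queries are placeholder values, not part of the repo):

```python
import json

# Each record follows the schema above; unused image slots are None (-> null).
samples = [
    {
        "id": "sample_0001",
        "image_ref": "refs/0001.png",
        "image_A": "dist/0001_a.png",
        "image_B": "dist/0001_b.png",
        "query": "Which image, A or B, has better quality, and why?",
    },
    {
        "id": "sample_0002",
        "image_ref": None,  # non-reference assessment
        "image_A": "dist/0002_a.png",
        "image_B": None,
        "query": "Describe the quality of image A.",
    },
]

with open("custom_meta.json", "w") as f:
    json.dump(samples, f, indent=4)
```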
    
  • cd your experiment directory: cd your_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded and (3) their paths are set in config.yaml.

  • Construct your inference script as follows.

    #!/bin/bash
    src_dir=directory_of_src
    export PYTHONPATH=$src_dir:$PYTHONPATH
    export CUDA_VISIBLE_DEVICES=$1
    
    python $src_dir/infer.py \
        --meta_path json_path_1_of_your_dataset \
                    json_path_2_of_your_dataset \
        --dataset_name your_dataset_name_1 \
                       your_dataset_name_2 \
        --task_name task_name \
        --batch_size batch_size
    

    --task_name can be set as follows.

| Task Name | Description |
| -------- | -------- |
| quality_compare | A/B comparison in full-reference |
| quality_compare_noref | A/B comparison in non-reference |
| quality_single_A | Image A assessment in full-reference |
| quality_single_A_noref | Image A assessment in non-reference |
| quality_single_B | Image B assessment in full-reference |
| quality_single_B_noref | Image B assessment in non-reference |
