LabelCritic
[ISBI 2025] Design Data Before Models: Using large vision-language models to automatically enhance medical dataset annotations.
Label Critic is an automated tool for reviewing AI-generated labels. It helps users select better annotations when multiple label options exist, and identify potentially incorrect labels when only a single annotation is available.
Label Critic uses pre-trained Large Vision-Language Models (LVLMs) as label critics, comparing or assessing annotations without training any new models. In medical CT organ segmentation, it achieves 96.5% accuracy in selecting the higher-quality annotation for each scan and class.
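At a high level, the critic loop shows the LVLM renderings of the candidate annotations and parses its free-text verdict into a decision. A minimal sketch of that decision step (the reply phrases and the order-swapping heuristic here are our illustration, not necessarily the repository's actual prompt protocol):

```python
def parse_verdict(reply: str) -> int:
    """Map a free-text LVLM reply to the index (1 or 2) of the
    preferred annotation; return 0 when the model is undecided.
    The expected phrases are illustrative placeholders."""
    text = reply.lower()
    if "annotation 1" in text and "annotation 2" not in text:
        return 1
    if "annotation 2" in text and "annotation 1" not in text:
        return 2
    return 0


def pick_better(reply_forward: str, reply_swapped: str) -> int:
    """Query the critic twice with the image order swapped and accept a
    winner only when both orderings agree, to reduce position bias."""
    a = parse_verdict(reply_forward)
    # In the swapped query, "annotation 1" refers to the original annotation 2.
    b = {1: 2, 2: 1}.get(parse_verdict(reply_swapped), 0)
    if a and a == b:
        return a
    return 0  # disagreement: flag the case for human review
```

Cases where the two orderings disagree can be routed to a human annotator instead of being decided automatically.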
Paper
<b>Label Critic: Design Data Before Models</b> <br/> Pedro R. A. S. Bassi, Qilong Wu, Wenxuan Li, Sergio Decherchi, Andrea Cavalli, Alan Yuille, Zongwei Zhou <br/> International Symposium on Biomedical Imaging (ISBI), 2025 <br/> <a href="https://arxiv.org/abs/2411.02753">Read More</a> <br/>
Getting Started
Installation
We recommend using Anaconda on Linux.
```bash
git clone https://github.com/PedroRASB/AnnotationVLM
cd AnnotationVLM
conda create -n vllm python=3.12 -y
conda activate vllm
conda install -y ipykernel
conda install -y pip
pip install vllm==0.6.1.post2
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
pip install -r requirements.txt
mkdir HFCache
```
Deploy Vision–Language Model Backend
```bash
export NCCL_P2P_DISABLE=1
TRANSFORMERS_CACHE=./HFCache \
HF_HOME=./HFCache \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve "Qwen/Qwen2-VL-72B-Instruct-AWQ" \
    --dtype=half \
    --tensor-parallel-size 4 \
    --limit-mm-per-prompt image=3 \
    --gpu_memory_utilization 0.9 \
    --port 8000
```
We recommend at least 4 A40 GPUs (48GB VRAM each) for stable deployment; roughly 144GB of total GPU memory is required. You can also try other VL models, e.g. Qwen/Qwen2-VL-2B-Instruct-AWQ.
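Once the server is running, it exposes an OpenAI-compatible chat endpoint on the chosen port. A minimal sketch of how a request body for it can be assembled (the payload shape follows vLLM's OpenAI-compatible API; the image bytes and question here are placeholders):

```python
import base64
import json


def build_chat_request(image_png: bytes, question: str,
                       model: str = "Qwen/Qwen2-VL-72B-Instruct-AWQ") -> dict:
    """Build the JSON body for POST http://<host>:8000/v1/chat/completions,
    embedding the image as a base64 data URL."""
    b64 = base64.b64encode(image_png).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 256,
    }


payload = build_chat_request(b"\x89PNG...", "Which annotation is better?")
body = json.dumps(payload)  # send with e.g. requests.post(url, data=body)
```

This is only a connectivity sketch; the repository's scripts handle the prompting for you.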
Compare Two Annotations
<details> <summary><strong>Dataset format (click to expand)</strong></summary> <div style="margin-left: 25px;">

```
Dataset
├── BDMAP_A0000001
│   ├── ct.nii.gz
│   ├── predictions1
│   │   ├── liver_tumor.nii.gz
│   │   ├── kidney_tumor.nii.gz
│   │   ├── pancreas_tumor.nii.gz
│   │   ├── aorta.nii.gz
│   │   ├── gall_bladder.nii.gz
│   │   ├── kidney_left.nii.gz
│   │   ├── kidney_right.nii.gz
│   │   ├── liver.nii.gz
│   │   ├── pancreas.nii.gz
│   │   └── ...
│   └── predictions2
│       ├── liver_tumor.nii.gz
│       ├── kidney_tumor.nii.gz
│       ├── pancreas_tumor.nii.gz
│       ├── aorta.nii.gz
│       ├── gall_bladder.nii.gz
│       ├── kidney_left.nii.gz
│       ├── kidney_right.nii.gz
│       ├── liver.nii.gz
│       ├── pancreas.nii.gz
│       └── ...
...
```

</div>
</details>
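Before launching comparisons, it can save time to sanity-check that each case folder follows this layout. A pure-stdlib sketch (the organ list is a small illustrative subset):

```python
from pathlib import Path

REQUIRED = ["liver.nii.gz", "pancreas.nii.gz"]  # illustrative subset


def check_case(case_dir: Path) -> list[str]:
    """Return a list of problems found in one BDMAP_* case folder."""
    problems = []
    if not (case_dir / "ct.nii.gz").is_file():
        problems.append(f"{case_dir.name}: missing ct.nii.gz")
    for pred in ("predictions1", "predictions2"):
        pred_dir = case_dir / pred
        if not pred_dir.is_dir():
            problems.append(f"{case_dir.name}: missing {pred}/")
            continue
        for organ in REQUIRED:
            if not (pred_dir / organ).is_file():
                problems.append(f"{case_dir.name}: {pred}/{organ} not found")
    return problems


def check_dataset(root: Path) -> list[str]:
    """Collect problems across every BDMAP_* case under the dataset root."""
    return [p for case in sorted(root.glob("BDMAP_*")) for p in check_case(case)]
```

An empty result means every case has its CT scan and both prediction folders in place.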
Compare two individual labels:
```bash
python CompareOrgan.py \
    --ct Dataset/BDMAP_A0000001/ct.nii.gz \
    --mask1 Dataset/BDMAP_A0000001/predictions1 \
    --mask2 Dataset/BDMAP_A0000001/predictions2 \
    --organ pancreas \
    --port 8000 \
    --log_file ./comparison_summary.log \
    --base_url "http://vllm_server_host"
```
This command compares the pancreas segmentation between two prediction folders (predictions1 and predictions2) for a single CT case.
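To run the same comparison over every case and several organs, a small driver script can loop over the dataset. A sketch (the organ list and dataset root are illustrative; the CompareOrgan.py flags are taken from the command above):

```python
import subprocess
from pathlib import Path

ORGANS = ["pancreas", "liver", "kidney_left"]  # illustrative subset


def build_command(case_dir: Path, organ: str, port: int = 8000) -> list[str]:
    """Assemble the CompareOrgan.py invocation for one case and organ."""
    return [
        "python", "CompareOrgan.py",
        "--ct", str(case_dir / "ct.nii.gz"),
        "--mask1", str(case_dir / "predictions1"),
        "--mask2", str(case_dir / "predictions2"),
        "--organ", organ,
        "--port", str(port),
        "--log_file", "./comparison_summary.log",
        "--base_url", "http://vllm_server_host",
    ]


def run_all(dataset_root: Path) -> None:
    """Run one comparison per (case, organ) pair, stopping on failure."""
    for case_dir in sorted(dataset_root.glob("BDMAP_*")):
        for organ in ORGANS:
            subprocess.run(build_command(case_dir, organ), check=True)
```

Results for each pair accumulate in the shared log file given by `--log_file`.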
Citation
```bibtex
@misc{bassi2024labelcriticdesigndata,
  title={Label Critic: Design Data Before Models},
  author={Pedro R. A. S. Bassi and Qilong Wu and Wenxuan Li and Sergio Decherchi and Andrea Cavalli and Alan Yuille and Zongwei Zhou},
  year={2024},
  eprint={2411.02753},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.02753},
}
```
<p align="center">
<img src="https://github.com/PedroRASB/Cerberus/blob/main/misc/LabelCriticLogos.png" alt="Project Logo" width="900"/>
</p>
