AnomalyGPT

[AAAI 2024 Oral] AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

🌐 <a href="https://anomalygpt.github.io" target="_blank">Project Page</a> • 🤗 <a href="https://huggingface.co/spaces/FantasticGNU/AnomalyGPT" target="_blank">Online Demo</a> • 📃 <a href="https://arxiv.org/abs/2308.15366" target="_blank">Paper</a> • 🤖 <a href="https://huggingface.co/FantasticGNU/AnomalyGPT" target="_blank">Model</a> • 📹 <a href="https://www.youtube.com/watch?v=lcxBfy0YnNA" target="_blank">Video</a>

Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang

Catalogue:

<a href='#introduction'>1. Introduction</a>
<a href='#environment'>2. Running AnomalyGPT Demo</a>
- <a href='#install_environment'>2.1 Environment Installation</a>
- <a href='#download_imagebind_model'>2.2 Prepare ImageBind Checkpoint</a>
- <a href='#download_vicuna_model'>2.3 Prepare Vicuna Checkpoint</a>
- <a href='#download_anomalygpt'>2.4 Prepare Delta Weights of AnomalyGPT</a>
- <a href='#running_demo'>2.5 Deploying Demo</a>
<a href='#train_anomalygpt'>3. Train Your Own AnomalyGPT</a>
- <a href='#data_preparation'>3.1 Data Preparation</a>
- <a href='#training_configurations'>3.2 Training Configurations</a>
- <a href='#model_training'>3.3 Training AnomalyGPT</a>
<a href='#examples'>4. Examples</a>

<a href='#license'>License</a>
<a href='#citation'>Citation</a>
<a href='#acknowledgments'>Acknowledgments</a>

1. Introduction: <a href='#all_catelogue'>[Back to Top]</a>

AnomalyGPT is the first Large Vision-Language Model (LVLM) based Industrial Anomaly Detection (IAD) method that can detect anomalies in industrial images without manually specified thresholds. Existing IAD methods only provide anomaly scores and require thresholds to be set manually, while existing LVLMs cannot detect anomalies in images. AnomalyGPT not only indicates the presence and location of anomalies but also provides information about the image.

We leverage a pre-trained image encoder and a Large Language Model (LLM) to align IAD images and their corresponding textual descriptions via simulated anomaly data. We employ a lightweight image decoder based on visual-textual feature matching to obtain localization results, and design a prompt learner that provides fine-grained semantics to the LLM and fine-tunes the LVLM using prompt embeddings. Our method can also detect anomalies in previously unseen items given only a few normal samples.
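To make the idea of visual-textual feature matching more concrete, here is a toy sketch in plain Python. It assumes that patch-level image features and two text embeddings (one for "normal", one for "abnormal") already live in the same vector space. The function names, the toy vectors, and the temperature value are all hypothetical, and this is not the repository's decoder, only an illustration of the general technique.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def anomaly_map(patch_feats, normal_text, abnormal_text, temperature=0.07):
    """Return one anomaly probability per patch.

    Each patch feature is compared with the 'normal' and 'abnormal' text
    embeddings; a two-way softmax over the scaled similarities gives
    P(abnormal) for that patch. The temperature is an assumed value.
    """
    heatmap = []
    for row in patch_feats:
        out_row = []
        for feat in row:
            s_norm = cosine(feat, normal_text) / temperature
            s_abn = cosine(feat, abnormal_text) / temperature
            # Two-way softmax written as a sigmoid of the logit difference.
            out_row.append(1.0 / (1.0 + math.exp(s_norm - s_abn)))
        heatmap.append(out_row)
    return heatmap

# Toy 2x2 grid of 2-D patch features (hypothetical values).
patches = [[[1.0, 0.0], [1.0, 0.1]],
           [[0.9, 0.1], [0.0, 1.0]]]
normal_text = [1.0, 0.0]
abnormal_text = [0.0, 1.0]

heatmap = anomaly_map(patches, normal_text, abnormal_text)
for row in heatmap:
    print([round(p, 3) for p in row])
```

In this toy grid only the bottom-right patch points toward the "abnormal" embedding, so it is the only one that receives a probability above 0.5.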

2. Running AnomalyGPT Demo <a href='#all_catelogue'>[Back to Top]</a>

2.1 Environment Installation

Clone the repository locally:

git clone https://github.com/CASIA-IVA-Lab/AnomalyGPT.git

Install the required packages:

pip install -r requirements.txt

2.2 Prepare ImageBind Checkpoint:

You can download the pre-trained ImageBind model using this link. After downloading, put the file (imagebind_huge.pth) in the [./pretrained_ckpt/imagebind_ckpt/] directory.

2.3 Prepare Vicuna Checkpoint:

To prepare the pre-trained Vicuna model, please follow the instructions provided [here].

2.4 Prepare Delta Weights of AnomalyGPT:

We use the pre-trained parameters from PandaGPT to initialize our model. You can get the weights of PandaGPT trained with different strategies in the table below. In our experiments and online demo, we use Vicuna-7B and openllmplayground/pandagpt_7b_max_len_1024 because of limited computational resources. Better results are expected when switching to Vicuna-13B.

Please put the downloaded 7B/13B delta weights file (pytorch_model.pt) in the ./pretrained_ckpt/pandagpt_ckpt/7b/ or ./pretrained_ckpt/pandagpt_ckpt/13b/ directory.

After that, you can download AnomalyGPT weights from the table below.

After downloading, put the AnomalyGPT weights in the ./code/ckpt/ directory.

In our online demo, we use the model trained in the supervised setting by default for a better user experience. You can also try the other weights locally.
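Before launching the demo, it can help to confirm that the files from steps 2.2 to 2.4 ended up where the instructions say. The following is a minimal sketch, assuming it is run from the repository root and that the 7B delta weights are used; the helper name `missing_checkpoints` is hypothetical. The Vicuna checkpoint is not checked because its location is not stated in this section.

```python
from pathlib import Path

# Paths taken from steps 2.2-2.4, relative to the repository root.
# The 7B variant is assumed; swap "7b" for "13b" if you use the 13B weights.
EXPECTED = {
    "ImageBind checkpoint": Path("pretrained_ckpt/imagebind_ckpt/imagebind_huge.pth"),
    "PandaGPT delta weights": Path("pretrained_ckpt/pandagpt_ckpt/7b/pytorch_model.pt"),
    "AnomalyGPT weights directory": Path("code/ckpt"),
}

def missing_checkpoints(repo_root="."):
    """Return the names of expected checkpoint paths that do not exist."""
    root = Path(repo_root)
    return [name for name, rel in EXPECTED.items() if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected checkpoint paths are present.")
```

The check only tests for existence; it does not verify that the files are complete or that the weights match each other.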

2.5 Deploying Demo

After completing the previous steps, you can run the demo locally with:

cd ./code/
python web_demo.py

3. Train Your Own AnomalyGPT <a href='#all_catelogue'>[Back to Top]</a>

Prerequisites: Before training the model, make sure the environment is properly installed and the ImageBind, Vicuna, and PandaGPT checkpoints are downloaded.

3.1 Data Preparation:

You can download the MVTec-AD dataset from [this link] and VisA from [this link]. You can also download the pre-training data of PandaGPT from [here]. After downloading, put the data in the [./data] directory.

The [./data] directory should look like:

data
|---pandagpt4_visual_instruction_data.json
|---images
|-----|-- ...
|---mvtec_anomaly_detection
|-----|-- bottle
|-----|-----|----- ground_truth
|-----|-----|----- test
|-----|-----|----- train
|-----|-- capsule
|-----|-- ...
|---VisA
|-----|-- split_csv
|-----|-----|--- 1cls.csv
|-----|-----|--- ...
|-----|-- candle
|-----|-----|--- Data
|-----|-----|-----|----- Images
|-----|-----|-----|--------|------ Anomaly 
|-----|-----|-----|--------|------ Normal 
|-----|-----|-----|----- Masks
|-----|-----|-----|--------|------ Anomaly 
|-----|-----|--- image_anno.csv
|-----|-- capsules
|-----|-----|----- ...
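The layout above can be checked with a short script before training. This is a sketch under the assumption that every folder under mvtec_anomaly_detection and VisA (other than split_csv) is a category folder shaped like the bottle and candle examples in the tree; the function name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Sub-entries shown for the example categories in the tree above.
MVTEC_SUBDIRS = ("ground_truth", "test", "train")
VISA_ENTRIES = ("Data/Images/Anomaly", "Data/Images/Normal",
                "Data/Masks/Anomaly", "image_anno.csv")

def missing_entries(data_root="./data"):
    """Return expected paths (relative to data_root) that do not exist."""
    root = Path(data_root)
    expected = [
        root / "pandagpt4_visual_instruction_data.json",
        root / "images",
        root / "mvtec_anomaly_detection",
        root / "VisA" / "split_csv" / "1cls.csv",
    ]
    # Each MVTec-AD category (bottle, capsule, ...) needs three subfolders.
    mvtec = root / "mvtec_anomaly_detection"
    if mvtec.is_dir():
        for cat in sorted(p for p in mvtec.iterdir() if p.is_dir()):
            expected += [cat / sub for sub in MVTEC_SUBDIRS]
    # Each VisA category (candle, capsules, ...) needs Data/ and image_anno.csv.
    visa = root / "VisA"
    if visa.is_dir():
        for cat in sorted(p for p in visa.iterdir() if p.is_dir()):
            if cat.name != "split_csv":
                expected += [cat / entry for entry in VISA_ENTRIES]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]

if __name__ == "__main__":
    for entry in missing_entries():
        print("missing:", entry)
```

An empty result means every entry the script knows about is present; it does not validate the contents of the images or CSV files.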

3.2 Training Configurations

The table below shows the training hyperparameters used in our experiments. The hyperparameters were selected based on the constraints of our computational resources, i.e., 2 x RTX3090 GPUs.

| Base Language Model | Epoch Number | Batch Size | Learning Rate | Maximum Length |
| :---------------------: | :--------------: | :------------: | :---------------: | :----------------: |
| Vicuna-7B | 50 | 16 | 1e-3 | 1024 |
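As a rough illustration of what these settings imply, the sketch below stores the table values in a small config object and computes the number of optimizer steps for a run. It assumes that the batch size of 16 is the global batch and that one optimizer step is taken per batch, neither of which is stated above; the class name, the helper, and the sample count of 1,000 are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table above."""
    base_model: str = "Vicuna-7B"
    epochs: int = 50
    batch_size: int = 16
    learning_rate: float = 1e-3
    max_length: int = 1024

def total_steps(cfg, num_samples):
    """Optimizer steps for the whole run, assuming one step per batch."""
    return cfg.epochs * math.ceil(num_samples / cfg.batch_size)

cfg = TrainConfig()
# 1,000 is a made-up sample count, used only to show the arithmetic:
# 50 * ceil(1000 / 16) = 50 * 63 = 3150
print(total_steps(cfg, 1000))
```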

3.3 Training AnomalyGPT

To train AnomalyGPT on MVTec-AD dataset, please run the following commands:

cd ./code
bash ./scripts/train_mvtec.sh

The key arguments of the training script are as follows:

- --data_path: The path to the json file pandagpt4_visual_instruction_data.json.
- --image_root_path: The root path of the PandaGPT training images.
- --imagebind_ckpt_path: The path to the ImageBind checkpoint.
- --vicuna_ckpt_path: The directory that contains the pre-trained Vicuna checkpoints.
- --max_tgt_len: The maximum sequence length of training instances.
- --save_path: The directory where the trained delta weights are saved. This directory is created automatically.
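The arguments listed above could be parsed along the following lines. This is an illustrative sketch using Python's standard argparse module, not the repository's actual training entry point. The flag names come from the list above, while the default for --max_tgt_len (taken from the Maximum Length column, assuming the two refer to the same setting) and all example paths are assumptions.

```python
import argparse

def build_parser():
    """Build a parser for the key training arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="AnomalyGPT training arguments (sketch)")
    parser.add_argument("--data_path", required=True,
                        help="Path to pandagpt4_visual_instruction_data.json")
    parser.add_argument("--image_root_path", required=True,
                        help="Root path of the PandaGPT training images")
    parser.add_argument("--imagebind_ckpt_path", required=True,
                        help="Path to the ImageBind checkpoint")
    parser.add_argument("--vicuna_ckpt_path", required=True,
                        help="Directory with the pre-trained Vicuna checkpoints")
    # Assumed default: the 'Maximum Length' value from the hyperparameter table.
    parser.add_argument("--max_tgt_len", type=int, default=1024,
                        help="Maximum sequence length of training instances")
    parser.add_argument("--save_path", required=True,
                        help="Directory where the trained delta weights are saved")
    return parser

# Example invocation with placeholder paths (hypothetical, relative to ./code).
args = build_parser().parse_args([
    "--data_path", "../data/pandagpt4_visual_instruction_data.json",
    "--image_root_path", "../data/images/",
    "--imagebind_ckpt_path", "../pretrained_ckpt/imagebind_ckpt/imagebind_huge.pth",
    "--vicuna_ckpt_path", "path/to/vicuna_ckpt",
    "--save_path", "./ckpt/my_run/",
])
print(args.max_tgt_len, args.save_path)
```

Passing an explicit argument list to `parse_args` keeps the example independent of the command line; in a real script the list would be omitted so that the values come from the shell script that launches training.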
