Concept-level Debugging of Part-Prototype Networks

Implementation of the paper
Bontempelli, A., Teso, S., Giunchiglia, F., & Passerini, A. (2023). Concept-level Debugging of Part-Prototype Networks.
Accepted for publication at ICLR 2023 (slides, Poster).
The code in this repository is an adaptation of the code in the following repositories:
| | Repository | License |
|----------------------------------------|--------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| ProtoPNet | https://github.com/cfchen-duke/ProtoPNet | See License |
| IAIA-BL loss | https://github.com/alinajadebarnett/iaiabl | See License |
| Covid data processing and data loaders | https://github.com/suinleelab/cxr_covid | See License |
Used datasets:
- CUB-200-2011: Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2022). CUB-200-2011 (1.0) [Data set]. CaltechDATA. https://doi.org/10.22002/D1.20098
- CUB-200-2011 Segmentations: Farrell, R. (2022). CUB-200-2011 Segmentations (1.0) [Data set]. CaltechDATA. https://doi.org/10.22002/D1.20097
- ChestX-ray14: Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald M. Summers. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, IEEE CVPR, pp. 3462-3471, 2017
- GitHub-COVID: Joseph Paul Cohen, Paul Morrison, Lan Dao, Karsten Roth, Tim Q Duong, and Marzyeh Ghassemi. COVID-19 Image Data Collection: Prospective Predictions Are the Future. arXiv:2006.11988, 2020. https://github.com/ieee8023/covid-chestxray-dataset
- PadChest: Aurelia Bustos, Antonio Pertusa, Jose-Maria Salinas, and Maria de la Iglesia-Vayá. Padchest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis, 66:101797, 2020. ISSN 1361-8415. doi: https://doi.org/10.1016/j.media.2020.101797. URL https://www.sciencedirect.com/science/article/pii/S1361841520301614.
- bimcv+: Maria de la Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, Marisa Caparrós, Germán González, and Jose María Salinas. Bimcv covid-19+: a large annotated dataset of rx and ct images from covid-19 patients. 2020. doi: 10.48550/ARXIV.2006.01174. URL https://arxiv.org/abs/2006.01174.
Requirements
- Python 3.9.7
- Pytorch 1.11.0
- cudatoolkit=11.3
Installation
Clone the repository
git clone https://github.com/abonte/protopdebug ProtoPDebug
cd ProtoPDebug
Create a new environment with Conda
conda env create -f environment.yml
conda activate protopnet
Repository structure
.
├── conf // experiment configuration used by Hydra
├── data // data processing scripts
├── datasets // COVID and CUB200 datasets are preprocessed in this folder
├── extract_confound.py // give supervision about the prototypes
├── global_analysis.py // find the nearest patch to each prototype, compute activation precision
├── local_analysis.py // find the nearest prototypes to all test images
├── old_code // code of the original repository, not used here
├── plot_stat.py // plot statistics about an experiment
├── pretrained_models // pre-trained model downloaded during the training
├── saved_models // results of the experiments
├── settings.py // default values of the models and datasets parameters
├── tests // test suite
├── user_experiment // data about the experiment with real users
└── visualization // script for plotting
Data preparation
CUB-200
- Download the dataset CUB-200-2011 from https://data.caltech.edu/records/20098 (1.1 GB).
- Extract CUB_200_2011.tgz in the datasets folder.
- Download the image masks from https://data.caltech.edu/records/20097
- Extract it in datasets/CUB-200-2011
CUB-200 artificial confound
Run the following command to add the synthetic confound to the first five classes.
python data_preprocessing.py --seed 0 bird -i datasets/CUB_200_2011 --classes 001 002 003 004 005
The data pipeline performs the following operations:
1. Crop the images using information from bounding_boxes.txt (included in the dataset)
2. Split the cropped images into training and test sets, using train_test_split.txt (included in the dataset)
3. Put the cropped training images in the directory ./datasets/cub200_cropped/train_cropped/
4. Put the cropped test images in the directory ./datasets/cub200_cropped/test_cropped/
5. Augment the training set and create an augmented training set in ./datasets/cub200_cropped/train_cropped_augmented/
6. Add artificial confound (colored box) on three classes
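The crop and split steps can be illustrated with a minimal sketch. It is not the code of data_preprocessing.py. It assumes the standard CUB-200-2011 annotation formats (`<image_id> <x> <y> <width> <height>` in bounding_boxes.txt and `<image_id> <is_training_image>` in train_test_split.txt), and the function names are hypothetical.

```python
from typing import Dict, Tuple

# (left, upper, right, lower), the box layout expected by PIL's Image.crop
Box = Tuple[int, int, int, int]


def parse_bounding_boxes(text: str) -> Dict[int, Box]:
    """Parse lines of the form '<image_id> <x> <y> <width> <height>'."""
    boxes: Dict[int, Box] = {}
    for line in text.strip().splitlines():
        image_id, x, y, w, h = line.split()
        left, upper = int(float(x)), int(float(y))
        boxes[int(image_id)] = (left, upper, left + int(float(w)), upper + int(float(h)))
    return boxes


def parse_split(text: str) -> Dict[int, bool]:
    """Parse lines of the form '<image_id> <is_training_image>' (1 = training)."""
    split: Dict[int, bool] = {}
    for line in text.strip().splitlines():
        image_id, is_train = line.split()
        split[int(image_id)] = is_train == "1"
    return split


def plan_crops(boxes_text: str, split_text: str) -> Dict[str, Dict[int, Box]]:
    """Assign each image's crop box to the train or test output directory."""
    boxes = parse_bounding_boxes(boxes_text)
    split = parse_split(split_text)
    plan: Dict[str, Dict[int, Box]] = {"train_cropped": {}, "test_cropped": {}}
    for image_id, box in boxes.items():
        target = "train_cropped" if split[image_id] else "test_cropped"
        plan[target][image_id] = box
    return plan
```

Each box in the resulting plan could then be passed to an image library's crop function before saving the image under the matching directory.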
CUB-200 natural confound
python data_preprocessing.py cub200_shuffled_bg
./copy_top_20.sh
In addition to operations 1 to 5 of the previous section, the scripts
change the backgrounds of the test set images by shifting the background of one class to
the next one (e.g., the background of class 1 becomes the background of class 2).
At the end of the process, the modified images are placed in the directory
./datasets/cub200_cropped/test_cropped_shuffled.
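The background shift can be sketched as follows. This is an illustration, not the repository's implementation. It assumes that the shift wraps around (the first class receives the background of the last one) and that a binary segmentation mask marks the bird pixels. Images are plain nested lists to keep the example dependency-free, and all names are hypothetical.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # toy single-channel image as rows of pixel values


def background_donor(classes: Sequence[str]) -> Dict[str, str]:
    """Map each class to the class whose background it receives.

    The background of class i moves to class i + 1, so class i takes the
    background of class i - 1. Index -1 wraps to the last class, which is
    an assumption of this sketch.
    """
    return {cls: classes[i - 1] for i, cls in enumerate(classes)}


def swap_background(image: Image, mask: Image, donor_background: Image) -> Image:
    """Keep pixels where the mask is 1 (foreground), take donor pixels elsewhere."""
    return [
        [fg if m else bg for fg, m, bg in zip(img_row, mask_row, bg_row)]
        for img_row, mask_row, bg_row in zip(image, mask, donor_background)
    ]
```

In practice the mask would come from the CUB-200-2011 segmentations and the donor background from an image of the preceding class.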
At the end of the data preparation, you should have the following structure
in the dataset folder
./datasets/
    cub200_cropped/
        clean_5_classes/
            ...
        clean_top_20/
            ...
        confound_artificial/
            ...
COVID datasets
Data processing of the COVID data set is based on the code of the paper
"AI for radiographic COVID-19 detection selects shortcuts over signal".
Follow the instructions in the README.md of the repository https://github.com/suinleelab/cxr_covid.
Use the scripts make_csv_bimcv_negative.py and make_h5.py of this repository
instead of the ones in the original repository.
Put the resulting *.h5 files in the corresponding folders in the datasets/covid folder.
Only a subset of the data has been used. The following parts were downloaded:
- ChestX-ray14: images_001.tar.gz, images_002.tar.gz
- GitHub-COVID: the complete repository
- PadChest: 0.zip
- BIMCV-COVID+: bimcv_covid19_posi_subjects_<1-10>.tgz
./datasets/
    covid/
        ChestX-ray14/
            ...
        GitHub-COVID/
            ...
        PadChest/
            ...
        bimcv+/
            ...
Running
Training
Configurations are managed by Hydra. A basic tutorial
can be found in the Hydra documentation.
The structure of the configuration is in settings.py, which contains the default values.
The actual hyperparameters used for each experiment are in the conf folder.
Run main.py passing the following arguments:
- experiment_name=<<NAME_OF_THE_EXPERIMENT>>: choose an experiment name
- +experiment=natural_base: load a configuration file from the conf directory
Example:
python main.py experiment_name=firstExperiment +experiment=natural_base
You can override the loaded configuration values from the command line. For instance,
python main.py experiment_name=firstExperiment +experiment=natural_base epochs=30
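The effect of a command-line override can be pictured with a toy model. The sketch below is not Hydra's implementation and does not load configuration groups such as +experiment=natural_base. It only shows a flat `key=value` override replacing a default. The default values and the function name are hypothetical.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(defaults: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `defaults` with flat key=value overrides applied."""
    config = dict(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Interpret numbers and other Python literals, e.g. "30" -> 30
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else is kept as a plain string
            value = raw
        config[key] = value
    return config


# Hypothetical defaults, standing in for the values defined in settings.py
defaults = {"experiment_name": "unnamed", "epochs": 10}
config = apply_overrides(defaults, ["experiment_name=firstExperiment", "epochs=30"])
```

Values given on the command line take precedence over those loaded from the configuration files, which is the behaviour the toy merge mimics.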
To see the current configuration
python main.py --cfg job
The experiments can be tracked on Weights and Biases. To enable this
feature, add wandb=true to the command line when running the script.
Evaluation
./plot.sh <PATH-TO-MODEL> "<LIST-OF-CLASSES>"
Substitute PATH-TO-MODEL with the path to the model you want to analyze.
Specify the list of (0-based) indices of the classes to use for the evaluation
(Cub200: "0 8 14 6 15", Covid dataset: "0 1").
The script performs the following operations:
- plot the statistics of the experiment;
- plot the prototypes projected at the same epoch of the supplied model;
- find the nearest patches to each prototype;
- plot a grid of the nearest patches.
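The nearest-patch search can be sketched as a ranking of latent patches by their distance to a prototype. This is a minimal illustration, not the code of global_analysis.py. It assumes squared L2 distance in latent space, as used by ProtoPNet-style models, and the patch identifiers and function names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_patches(
    prototype: Vector, patches: Dict[str, Vector], k: int = 3
) -> List[Tuple[float, str]]:
    """Return the k patches closest to the prototype as (distance, patch_id) pairs."""
    scored = ((squared_l2(prototype, vec), patch_id) for patch_id, vec in patches.items())
    # nsmallest sorts by distance first, then by patch_id to break ties
    return heapq.nsmallest(k, scored)
```

Repeating the search for every prototype gives the patches that are then arranged in a grid for inspection.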
Reproducibility
Experiment 1
./run_artificial_confound.sh
Experiment 2
ProtoPDebug
Since human intervention is required in the debugging loop, follow these steps to run ProtoPDebug:
- First round, without any human supervision
  python main.py experiment_name=\"first_round\" +experiment=natural_base
- Give supervision to the learned prototypes.
  a) find the nearest patches to each prototype, substitute <PATH-TO-MODEL> with the path to the model of the previous round
  python global_analysis.py <PATH-TO-MODEL>
  b) manually select the patches that represent the confounds you want to forbid
  python extract_confound.py interactive <PATH-TO-MODEL> -classes 0 8 14 6 15 -n-img 10
  Only the
