
<p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/dilyabareeva/quanda/refs/heads/main/assets/readme/quanda_panda_black_bg.png"> <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/dilyabareeva/quanda/refs/heads/main/assets/readme/quanda_panda_no_bg.png"> <img width="400" alt="quanda" src="https://raw.githubusercontent.com/dilyabareeva/quanda/refs/heads/main/assets/readme/quanda_panda_day_n_night.png"> </picture> </p> <p align="center"> Interpretability toolkit for <b>quan</b>titative evaluation of <b>d</b>ata <b>a</b>ttribution methods in <b>PyTorch</b>. </p>


quanda is currently under active development. Note the release version to ensure reproducibility of your work. Expect changes to the API.

📑 Shortcut to paper!

🐼 Library overview

Training data attribution (TDA) methods attribute a model's output on a specific test sample to the dataset it was trained on, revealing which training datapoints are responsible for the model's decisions. Existing methods achieve this by estimating the counterfactual effect of removing datapoints from the training set (Koh and Liang, 2017; Park et al., 2023; Bae et al., 2024), tracking the contributions of training points to the loss reduction throughout training (Pruthi et al., 2020), using interpretable surrogate models (Yeh et al., 2018), or finding training samples that the model deems similar to the test sample (Caruana et al., 1999; Hanawa et al., 2021). Beyond model understanding, TDA has been used in a variety of applications such as debugging model behavior (Koh and Liang, 2017; Yeh et al., 2018; K and Søgaard, 2021; Guo et al., 2021), data summarization (Khanna et al., 2019; Marion et al., 2023; Yang et al., 2023), dataset selection (Engstrom et al., 2024; Chhabra et al., 2024), fact tracing (Akyurek et al., 2022) and machine unlearning (Warnecke et al., 2023).
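To make the similarity-based family concrete, here is a minimal plain-PyTorch sketch (not quanda's API) that ranks training samples by the dot product of their penultimate-layer features with those of a single test sample; `feature_extractor` and `train_loader` are assumed to be user-provided.

```python
import torch

@torch.no_grad()
def similarity_attributions(feature_extractor, train_loader, test_x, device="cpu"):
    """Rank training samples by feature similarity to a single test input.

    Assumes `feature_extractor` maps a batch of inputs to a 2-D feature tensor
    (e.g. the model's penultimate layer) and `train_loader` yields (inputs, targets).
    """
    feature_extractor.eval().to(device)
    test_feat = feature_extractor(test_x.unsqueeze(0).to(device))   # (1, d)

    scores = []
    for inputs, _ in train_loader:
        train_feats = feature_extractor(inputs.to(device))          # (b, d)
        scores.append(train_feats @ test_feat.squeeze(0))           # (b,)
    scores = torch.cat(scores)                                      # (n_train,)

    # Higher score = training point the model treats as more similar.
    ranking = torch.argsort(scores, descending=True)
    return ranking, scores
```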

Although there are various demonstrations of TDA’s potential for interpretability and practical applications, the critical question of how TDA methods should be effectively evaluated remains open. Several approaches have been proposed by the community, which can be categorized into three groups:

<details> <summary><b><big>Ground Truth</big></b></summary>As some of the methods are designed to approximate leave-one-out (LOO) effects, ground truth can often be computed for TDA evaluation. However, this counterfactual ground truth approach requires retraining the model multiple times on different subsets of the training data, which quickly becomes computationally expensive. Additionally, this ground truth is shown to be dominated by noise in practical deep learning settings, due to the inherent stochasticity of a typical training process (<a href="https://openreview.net/forum?id=xHKVVHGDOEk" target="_blank">Basu et al., 2021</a>; <a href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/ca774047bc3b46cc81e53ead34cd5d5a-Abstract-Conference.html" target="_blank">Nguyen et al., 2023</a>). </details>

<details> <summary><b><big>Downstream Task Evaluators</big></b></summary>To remedy the challenges associated with ground truth evaluation, the literature proposes assessing the utility of a TDA method within the context of an end-task, such as model debugging or data selection (<a href="https://proceedings.mlr.press/v70/koh17a.html" target="_blank">Koh and Liang, 2017</a>; <a href="https://proceedings.mlr.press/v89/khanna19a.html" target="_blank">Khanna et al., 2019</a>; <a href="https://arxiv.org/abs/2111.04683" target="_blank">Karthikeyan et al., 2021</a>). </details>

<details> <summary><b><big>Heuristics</big></b></summary>Finally, the community has also used heuristics (desirable properties or sanity checks) to evaluate the quality of TDA techniques. These include comparing the attributions of a trained model and a randomized model (<a href="https://openreview.net/forum?id=9uvhpyQwzM_" target="_blank">Hanawa et al., 2021</a>) and measuring the amount of overlap between the attributions for different test samples (<a href="http://proceedings.mlr.press/v108/barshan20a/barshan20a.pdf" target="_blank">Barshan et al., 2020</a>). </details>

<br> <b>quanda</b> is designed to meet the need for a comprehensive and systematic evaluation framework, allowing practitioners and researchers to obtain a detailed view of the performance of TDA methods in various contexts.

Library Features

  • Unified TDA Interface: quanda provides a unified interface for various TDA methods, allowing users to easily switch between different methods.
  • Metrics: quanda provides a set of metrics to evaluate the effectiveness of TDA methods. These metrics are based on the latest research in the field.
  • Benchmarking: quanda provides a benchmarking tool to evaluate the performance of TDA methods on a given model, dataset, and problem. As many TDA evaluation strategies require access to ground truth, our benchmarking tools allow generating a controlled setting with known ground truth and then comparing the performance of different TDA methods in that setting.
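Conceptually, these pieces combine into a simple loop: an explainer produces per-test-sample attribution scores over the training set, and a metric accumulates them into a single number. The sketch below illustrates that loop in generic form; it is not quanda's actual API, and the `explainer`/`metric` objects and their methods are hypothetical placeholders.

```python
import torch

def evaluate_attributor(explainer, metric, test_loader, device="cpu"):
    """Generic TDA evaluation loop (illustrative only, not quanda's API).

    Assumes `explainer.explain(inputs)` returns a (batch, n_train) tensor of
    attribution scores, `metric.update(...)` accumulates per-batch results,
    and `metric.compute()` reduces them to a final score.
    """
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        attributions = explainer.explain(inputs)       # (batch, n_train)
        metric.update(attributions, targets=targets)   # accumulate per batch
    return metric.compute()                            # final scalar score
```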

Supported TDA Methods

| Method Name                  | Repository                   | Reference                                  |
|------------------------------|------------------------------|--------------------------------------------|
| Similarity Influence         | Captum                       | Caruana et al., 1999                       |
| Arnoldi Influence Function   | Captum                       | Schioppa et al., 2022; Koh and Liang, 2017 |
| TracIn                       | Captum                       | Pruthi et al., 2020                        |
| TRAK                         | TRAK                         | Park et al., 2023                          |
| Representer Point Selection  | Representer Point Selection  | Yeh et al., 2018                           |
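To illustrate one row of the table, the following sketch computes a TracIn-style score from scratch in PyTorch: the sum, over saved checkpoints, of the dot product between the training-loss gradient and the test-loss gradient, scaled by the learning rate (Pruthi et al., 2020). It is an illustration of the formula, not the Captum implementation that quanda wraps; `checkpoints` (a list of state dicts) and `lr` are assumed inputs.

```python
import torch
import torch.nn.functional as F

def flat_grad(model, x, y):
    """Flattened loss gradient w.r.t. all trainable parameters for one sample.
    `x` is a single input tensor, `y` a scalar class-index tensor."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.view(1))
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def tracin_score(model, checkpoints, lr, train_x, train_y, test_x, test_y):
    """TracIn-style influence of one training sample on one test sample:
    sum over checkpoints of lr * <grad_train, grad_test> (Pruthi et al., 2020)."""
    score = 0.0
    for state_dict in checkpoints:        # checkpoints saved during training
        model.load_state_dict(state_dict)
        g_train = flat_grad(model, train_x, train_y)
        g_test = flat_grad(model, test_x, test_y)
        score += lr * torch.dot(g_train, g_test).item()
    return score
```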

Metrics

  • Linear Datamodeling Score (Park et al., 2023): Measures the correlation between the (grouped) attribution scores and the actual outputs of models trained on different subsets of the training set. For each subset, the linear datamodeling score compares the actual model output to the sum of attribution scores from the subset using Spearman rank correlation (a minimal sketch of this computation follows this list).

  • Identical Class / Identical Subclass (Hanawa et al., 2021): Measures the proportion of test samples for which the top-1 attributed training sample shares the class (or subclass) of the test sample. If the attributions reflect the similarity perceived by the model, the top-attributed training sample is expected to be predictive of the test datapoint's class, as well as its subclass when several subclasses are grouped under a single label.

  • Model Randomization (Hanawa et al., 2021): Measures the correlation between the original TDA and the TDA of a model with randomized weights. Since the attributions are expected to depend on model parameters, the correlation between original and randomized attributions should be low.

  • Top-K Cardinality (Barshan et al., 2020): Measures the cardinality of the union of the top-K training samples across the test set. Since the attributions are expected to depend on the test input, they are expected to vary heavily for different test samples, resulting in a large union of top-K attributed training samples.
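As a concrete illustration of the Linear Datamodeling Score from the first bullet above, the function below is a from-scratch sketch of the definition in Park et al. (2023), not quanda's implementation; it assumes the training subsets and the outputs of models retrained on them have already been computed.

```python
import numpy as np
from scipy.stats import spearmanr

def linear_datamodeling_score(attributions, subset_indices, subset_outputs):
    """LDS for a single test sample (Park et al., 2023).

    attributions   : (n_train,) attribution scores for one test sample
    subset_indices : list of M index arrays, the training subsets used
    subset_outputs : length-M array of actual model outputs on the test
                     sample, one per model retrained on the matching subset
    """
    # Predicted counterfactual output: sum of attribution scores over each subset.
    predicted = np.array([attributions[idx].sum() for idx in subset_indices])
    # Spearman rank correlation between predicted and actual outputs.
    rho, _ = spearmanr(predicted, np.asarray(subset_outputs))
    return rho
```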
