Valda
A Python Data Valuation Package
Introduction
Valda is a Python package for data valuation in machine learning. It may be useful if you want to
- analyze the contribution of individual training examples to the final classification performance, or
- identify noisy examples in the training set.
Valda supports all classifiers from scikit-learn for valuation, as well as user-defined classifiers built with PyTorch. The current version supports five data valuation methods:
- Leave-one-out (LOO),
- Data Shapley with the TMC algorithm (TMC-Shapley) from Ghorbani and Zou (2019),
- Beta Shapley from Kwon and Zou (2022),
- Class-wise Shapley (CS-Shapley) from Schoch et al. (2022),
- Influence Function (IF) from Koh and Liang (2017).
  - IF only works with classifiers built with PyTorch, because it requires gradient computation.
  - The current version only supports first-order gradient computation; we will add second-order computation soon.
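To make the first method in the list concrete, here is a minimal sketch of leave-one-out valuation: the value of a training example is the development-set accuracy with all training data minus the accuracy after removing that example. This sketch does not use Valda's API. The function `loo_values`, the toy classifier `NearestCentroid1D`, and the data are hypothetical names and values introduced only for illustration. The function assumes a classifier that exposes `fit` and `score` in the scikit-learn style.

```python
from statistics import mean


class NearestCentroid1D:
    """Toy classifier (hypothetical), with scikit-learn-style fit/score methods."""

    def fit(self, X, y):
        # Store the mean feature value of each class.
        self.centroids_ = {
            label: mean(x[0] for x, t in zip(X, y) if t == label)
            for label in set(y)
        }
        return self

    def predict(self, X):
        # Assign each point to the class with the closest centroid.
        return [
            min(self.centroids_, key=lambda c: abs(x[0] - self.centroids_[c]))
            for x in X
        ]

    def score(self, X, y):
        # Accuracy on (X, y).
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


def loo_values(make_model, X_trn, y_trn, X_dev, y_dev):
    """LOO value of example i: dev accuracy with all data minus dev accuracy without i."""
    full_score = make_model().fit(X_trn, y_trn).score(X_dev, y_dev)
    values = []
    for i in range(len(X_trn)):
        # Retrain from scratch on the training set with example i removed.
        X_rest = X_trn[:i] + X_trn[i + 1:]
        y_rest = y_trn[:i] + y_trn[i + 1:]
        score_without_i = make_model().fit(X_rest, y_rest).score(X_dev, y_dev)
        values.append(full_score - score_without_i)
    return values


# Toy data: the last training label is deliberately wrong (a noisy example).
X_trn = [[0.0], [1.0], [9.0], [10.0], [9.5]]
y_trn = [0, 0, 1, 1, 0]
X_dev = [[0.5], [2.0], [6.0], [8.0], [9.0]]
y_dev = [0, 0, 1, 1, 1]

values = loo_values(NearestCentroid1D, X_trn, y_trn, X_dev, y_dev)
for i, v in enumerate(values):
    print(f"example {i}: value {v:+.2f}")
```

On this toy data the mislabeled example (index 4) receives a negative value, because removing it improves development accuracy. That is the signal used to flag noisy examples. LOO needs one retraining per training example, so the sketch is only practical for small datasets or cheap classifiers.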
Tutorial
Please check out the simple tutorial on Google Colab to see how to use this package.
