# libcll: Complementary Label Learning Benchmark
<img src="docs/libcll-cover.png" alt="libcll" style="zoom:25%;" />

libcll is a Python library designed to simplify complementary-label learning (CLL) for researchers tackling real-world challenges. The package implements a wide range of popular CLL strategies, including CPE, the state-of-the-art algorithm as of 2023. Additionally, it includes unique datasets like CLCIFAR and ACLCIFAR, which feature complementary labels collected from human annotators and vision-language model (VLM) annotators. To foster extensibility, libcll provides a unified interface for integrating additional strategies, datasets, and models, making it a versatile tool for advancing CLL research. For more details, refer to the associated technical report on arXiv.
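In CLL, each training example is paired with a complementary label: a class the example does *not* belong to. As a quick illustration (plain NumPy, independent of libcll's API), uniform complementary labels can be generated from ordinary labels like this:

```python
import numpy as np

def uniform_complementary_labels(labels, num_classes, rng=None):
    """Sample one complementary label per example, uniformly from the
    num_classes - 1 classes that are NOT the true label."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    # Offsets in [1, num_classes - 1] shift the label to a different class.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return (labels + offsets) % num_classes

true_labels = np.array([0, 3, 7, 9])
cl = uniform_complementary_labels(true_labels, num_classes=10, rng=0)
assert np.all(cl != true_labels)  # never equal to the true label
```

The learner only ever sees `cl`, not `true_labels`; CLL strategies differ in how they recover an ordinary classifier from such weak supervision.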
## Installation

- Python version >= 3.8, <= 3.12
- PyTorch version >= 1.11, <= 2.0
- PyTorch Lightning version >= 2.0

To install `libcll` and develop locally:

```bash
git clone git@github.com:ntucllab/libcll.git
cd libcll
pip install -e .
```
## Running

### Supported Strategies
| Strategy | Type | Description |
| --- | --- | --- |
| PC | None | Pairwise-Comparison Loss |
| SCL | NL, EXP | Surrogate Complementary Loss with the negative log loss (NL) or the exponential loss (EXP) |
| URE | NN, GA, TNN, TGA | Unbiased Risk Estimator, optionally with gradient ascent (GA) and/or an empirical transition matrix (T) |
| FWD | None | Forward Correction |
| DM | None | Discriminative Models with Weighted Loss |
| CPE | I, F, T | Complementary Probability Estimates with different transition matrices (I, F, T) |
| MCL | MAE, EXP, LOG | Multiple Complementary Label learning with different error functions (MAE, EXP, LOG) |
| OP | None | Order-Preserving Loss |
| SCARCE | None | Selected-Completely-At-Random Complementary-label learning |
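For instance, the SCL-NL strategy used throughout the quick start below minimizes -log(1 - p(ȳ|x)), pushing probability mass away from the complementary class ȳ. A minimal NumPy sketch of that loss (illustrative only, not libcll's internal implementation):

```python
import numpy as np

def scl_nl_loss(logits, comp_labels):
    """SCL-NL: mean of -log(1 - p_cl), where p_cl is the softmax
    probability the model assigns to the complementary label."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    p_cl = probs[np.arange(len(comp_labels)), comp_labels]
    return -np.log(1.0 - p_cl + 1e-12).mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
comp = np.array([1, 2])  # a class each example does NOT belong to
loss = scl_nl_loss(logits, comp)
```

The loss is small when the model already assigns little probability to the complementary class, which is exactly the behavior the supervision signal supports.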
### Supported Datasets
| Dataset | Number of Classes | Input Size | Description |
| --- | --- | --- | --- |
| MNIST | 10 | 28 x 28 | Grayscale images of handwritten digits (0 to 9). |
| FMNIST | 10 | 28 x 28 | Grayscale images of fashion items. |
| KMNIST | 10 | 28 x 28 | Grayscale images of cursive Japanese ("Kuzushiji") characters. |
| Yeast | 10 | 8 | Features of different localization sites of protein. |
| Texture | 11 | 40 | Features of different textures. |
| Dermatology | 6 | 130 | Clinical attributes of different diseases. |
| Control | 6 | 60 | Features of synthetically generated control charts. |
| CIFAR10 | 10 | 3 x 32 x 32 | Colored images of different objects. |
| CIFAR20 | 20 | 3 x 32 x 32 | Colored images of different objects. |
| Micro ImageNet10 | 10 | 3 x 64 x 64 | Images of 10 classes designed for computer vision research. |
| Micro ImageNet20 | 20 | 3 x 64 x 64 | Images of 20 classes designed for computer vision research. |
| CLCIFAR10 | 10 | 3 x 32 x 32 | Colored images of distinct objects with complementary labels annotated by humans. |
| CLCIFAR20 | 20 | 3 x 32 x 32 | Colored images of distinct objects with complementary labels annotated by humans. |
| CLMicro ImageNet10 | 10 | 3 x 64 x 64 | Images of 10 classes with complementary labels annotated by humans. |
| CLMicro ImageNet20 | 20 | 3 x 64 x 64 | Images of 20 classes with complementary labels annotated by humans. |
| ACLCIFAR10 | 10 | 3 x 32 x 32 | Colored images of distinct objects with complementary labels annotated by vision-language models. |
| ACLCIFAR20 | 20 | 3 x 32 x 32 | Colored images of distinct objects with complementary labels annotated by vision-language models. |
| ACLMicro ImageNet10 | 10 | 3 x 64 x 64 | Images of 10 classes with complementary labels annotated by vision-language models. |
| ACLMicro ImageNet20 | 20 | 3 x 64 x 64 | Images of 20 classes with complementary labels annotated by vision-language models. |
### Quick Start: Complementary Label Learning on MNIST
To reproduce training results with the SCL-NL method on MNIST for each distribution:
#### Uniform Distribution

```bash
python scripts/train.py \
    --do_train \
    --do_predict \
    --strategy SCL \
    --type NL \
    --model MLP \
    --dataset MNIST \
    --lr 1e-4 \
    --batch_size 256 \
    --valid_type Accuracy
```
#### Biased Distribution (Weak Deviation)

```bash
python scripts/train.py \
    --do_train \
    --do_predict \
    --strategy SCL \
    --type NL \
    --model MLP \
    --dataset MNIST \
    --lr 1e-4 \
    --batch_size 256 \
    --valid_type Accuracy \
    --transition_matrix weak
```
#### Biased Distribution (Strong Deviation)

```bash
python scripts/train.py \
    --do_train \
    --do_predict \
    --strategy SCL \
    --type NL \
    --model MLP \
    --dataset MNIST \
    --lr 1e-4 \
    --batch_size 256 \
    --valid_type Accuracy \
    --transition_matrix strong
```
#### Noisy Distribution

```bash
python scripts/train.py \
    --do_train \
    --do_predict \
    --strategy SCL \
    --type NL \
    --model MLP \
    --dataset MNIST \
    --lr 1e-4 \
    --batch_size 256 \
    --valid_type Accuracy \
    --transition_matrix noisy \
    --noise 0.1
```
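Conceptually, `--transition_matrix` selects the matrix T with T[y][ȳ] = P(ȳ | y) that governs how complementary labels are generated. A sketch of the uniform case and one plausible reading of the noisy case (the exact matrices and the precise semantics of `--noise` are defined by the repository's scripts; the noisy variant below is an assumption for illustration):

```python
import numpy as np

def uniform_transition(num_classes):
    """Uniform CLL: every non-true class is equally likely
    to be drawn as the complementary label."""
    T = np.full((num_classes, num_classes), 1.0 / (num_classes - 1))
    np.fill_diagonal(T, 0.0)
    return T

def noisy_transition(num_classes, noise=0.1):
    """Illustrative noisy CLL: with probability `noise`, the
    'complementary' label is in fact the true label (assumed reading)."""
    T = (1.0 - noise) * uniform_transition(num_classes)
    np.fill_diagonal(T, noise)
    return T

T = noisy_transition(10, noise=0.1)
assert np.allclose(T.sum(axis=1), 1.0)  # each row is a distribution over labels
```

Strategies such as FWD and CPE consume a transition matrix of exactly this shape to correct the training loss.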
#### Multiple Complementary Label Learning

```bash
python scripts/train.py \
    --do_train \
    --do_predict \
    --strategy SCL \
    --type NL \
    --model MLP \
    --dataset MNIST \
    --lr 1e-4 \
    --batch_size 256 \
    --valid_type Accuracy \
    --num_cl 3
```
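Here `--num_cl 3` attaches three complementary labels to each example. Sampling k distinct complementary labels can be sketched as follows (plain NumPy, not libcll's data loader):

```python
import numpy as np

def sample_multiple_cl(label, num_classes, k, rng=None):
    """Pick k distinct classes, none of which equals the true label."""
    rng = np.random.default_rng(rng)
    candidates = [c for c in range(num_classes) if c != label]
    return rng.choice(candidates, size=k, replace=False)

cl_set = sample_multiple_cl(label=4, num_classes=10, k=3, rng=0)
assert len(set(cl_set.tolist())) == 3 and 4 not in cl_set
```

With more complementary labels per example, each instance rules out more classes, so strategies such as MCL can learn from a strictly more informative signal.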
### Run all the settings in the survey paper

The following scripts reproduce the results for one strategy presented in the survey paper. They include a grid search over learning rates from {1e-3, 5e-4, 1e-4, 5e-5, 1e-5}, followed by training with the best learning rate using four different random seeds.

```bash
./scripts/uniform.sh <strategy> <type>
./scripts/biased.sh <strategy> <type>
./scripts/noisy.sh <strategy> <type>
./scripts/multi.sh <strategy> <type>
./scripts/multi_hard.sh <strategy> <type>
```
For example:

```bash
./scripts/uniform.sh SCL NL
./scripts/biased.sh SCL NL
./scripts/noisy.sh SCL NL
./scripts/multi.sh SCL NL
./scripts/multi_hard.sh SCL NL
```
## Documentation

The documentation for the latest release is available on Read the Docs. Feedback, questions, and suggestions are highly encouraged. Contributions to improve the documentation are warmly welcomed and greatly appreciated!
## Citing
If you find this package useful, please cite both the original works associated with each strategy and the following:
```bibtex
@techreport{libcll2024,
  author      = {Nai-Xuan Ye and Tan-Ha Mai and Hsiu-Hsuan Wang and Wei-I Lin and Hsuan-Tien Lin},
  title       = {libcll: an Extendable Python Toolkit for Complementary-Label Learning},
  institution = {National Taiwan University},
  url         = {https://github.com/ntucllab/libcll},
  note        = {available as arXiv preprint \url{https://arxiv.org/abs/2411.12276}},
  month       = nov,
  year        = 2024
}
```
## Acknowledgment
We would like to express our gratitude to the following repositories for sharing their code, which greatly facilitated the development of libcll:
