OKDDip
[AAAI-2020] Official implementation for "Online Knowledge Distillation with Diverse Peers".
Online Knowledge Distillation with Diverse Peers (AAAI-2020) https://arxiv.org/abs/1912.00350
This is a PyTorch-1.0 implementation of the OKDDip algorithm together with the compared approaches (such as classic KD and online variants like ONE, CL-ILR, DML).
This paper attempts to alleviate the homogenization problem that arises when student models are trained together. Specifically, OKDDip performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights, generated with an attention-based mechanism, and uses them to derive its own targets from the predictions of the other auxiliary peers. The second-level distillation then transfers the knowledge in the ensemble of auxiliary peers to the group leader, i.e., the model used for inference.
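The two levels described above can be illustrated with a small, framework-free sketch. This is not the repository's code: the dot-product attention over per-peer embeddings, the plain average used as the group leader's target, the temperature, and all names (`peer_targets`, `ensemble_target`, `queries`, `keys`) are assumptions made for illustration only.

```python
import math

def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kl_divergence(target, pred):
    """KL(target || pred) for two discrete distributions."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)

def peer_targets(queries, keys, peer_probs):
    """First level: each peer mixes all peers' predictions with its own weights.

    queries[a] and keys[b] are hypothetical embeddings of peers a and b;
    peer_probs[b] is the softened class distribution predicted by peer b.
    """
    n_classes = len(peer_probs[0])
    targets, weights = [], []
    for q in queries:
        # Attention weights of this peer over all peers (they sum to 1).
        alpha = softmax([dot(q, k) for k in keys])
        target = [sum(alpha[b] * peer_probs[b][c] for b in range(len(peer_probs)))
                  for c in range(n_classes)]
        weights.append(alpha)
        targets.append(target)
    return targets, weights

def ensemble_target(peer_probs):
    """Second level: target for the group leader (assumed: plain average of peers)."""
    n_peers = len(peer_probs)
    return [sum(p[c] for p in peer_probs) / n_peers
            for c in range(len(peer_probs[0]))]

# Toy data: three auxiliary peers, three classes (all values are made up).
T = 3.0
peer_logits = [[2.0, 0.5, -1.0], [0.2, 1.5, 0.1], [1.0, 1.0, 0.0]]
peer_probs = [softmax(z, T) for z in peer_logits]
queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
keys = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

targets, weights = peer_targets(queries, keys, peer_probs)
first_level = sum(kl_divergence(t, p) for t, p in zip(targets, peer_probs))

leader_probs = softmax([1.2, 0.8, -0.3], T)
leader_target = ensemble_target(peer_probs)
second_level = kl_divergence(leader_target, leader_probs)
```

Because every peer has its own row of weights, the peers are pulled toward different targets, which is the intuition behind keeping them diverse; only the group leader is distilled from the ensemble as a whole.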
pip install -r requirements.txt
Basic Usage
The default experimental settings are:
--num_epochs 300 --batch_size 128 --lr 0.1 --schedule 150 225 --wd 5e-4
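As a rough illustration of what these defaults imply, the sketch below computes a step-decayed learning rate. It assumes that `--schedule` lists the epochs at which the learning rate is reduced and that each reduction multiplies it by 0.1; the decay factor is not stated in this README, and `lr_at_epoch` is a hypothetical helper, not a function from the repository.

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(150, 225), gamma=0.1):
    """Learning rate at a 0-indexed epoch under step decay.

    gamma=0.1 is an assumption; check the training scripts for the real factor.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# Learning rate at a few points of a 300-epoch run.
for epoch in (0, 149, 150, 224, 225, 299):
    print(epoch, lr_at_epoch(epoch))
```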
1. Baseline
Train resnet32 model on CIFAR10 dataset.
python train.py --model resnet32 --dataset CIFAR10
2. Classic KD
Train student resnet32 model with teacher resnet110 model on CIFAR10 dataset.
python train_kd.py --model resnet32 --T_model resnet110 --T_model_path ./CIFAR10/resnet110 --dataset CIFAR10
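For readers unfamiliar with classic KD, the objective commonly used combines cross-entropy on the true label with a temperature-softened KL term against the teacher. The sketch below follows that standard formulation; the temperature, the weighting `alpha`, and the function name `kd_loss` are illustrative assumptions and are not taken from `train_kd.py`.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=3.0, alpha=0.5):
    """(1 - alpha) * CE(student, label) + alpha * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target term on a scale comparable to the hard-label term.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl

# Toy example for one sample with three classes (values are made up).
student = [1.0, 0.2, -0.5]
teacher = [2.5, 0.1, -1.0]
loss = kd_loss(student, teacher, label=0)
```

When the student's logits equal the teacher's, the KL term vanishes and only the cross-entropy part remains.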
3. OKDDip
Train resnet32 model on CIFAR10 dataset.
python train_GL.py --model resnet32 --dataset CIFAR10
4. ONE
Train resnet32 model on CIFAR10 dataset.
python train_one.py --model resnet32 --dataset CIFAR10
5. CL-ILR
Train resnet32 model on CIFAR10 dataset.
python train_one.py --model resnet32 --dataset CIFAR10 --avg --bpscale
6. Individual Branch
Train resnet32 model on CIFAR10 dataset.
python train_one.py --model resnet32 --dataset CIFAR10 --ind
Notes: The code in this repository was merged from different sources and has not been tested thoroughly. If you have any questions, please do not hesitate to contact us.
Results may vary slightly across environments; if they do, just run the experiment again. (Thanks to Zheng Li for the feedback.)
To reproduce the OKDDip results in Table 4, besides setting choose_E to True, the group leader needs to be replaced by an auxiliary peer so that the number of base learners is equal.
Email: defchern α t zju dot edu d ο t cn
Reference
If you find this repository useful, please consider citing the following paper:
@inproceedings{chen2020online,
title={Online Knowledge Distillation with Diverse Peers.},
author={Chen, Defang and Mei, Jian-Ping and Wang, Can and Feng, Yan and Chen, Chun},
booktitle={AAAI},
pages={3430--3437},
year={2020}
}
