Dropout
Code release for "Dropout Reduces Underfitting"
Dropout Reduces Underfitting
Official PyTorch implementation for Dropout Reduces Underfitting
<p align="center"> <img src="https://user-images.githubusercontent.com/8370623/222586143-3500fa5b-c294-48c9-a5cf-5fac2659e519.png" width=50% height=50% class="center"> </p>

Dropout Reduces Underfitting, ICML 2023<br>
Zhuang Liu*, Zhiqiu Xu*, Joseph Jin, Zhiqiang Shen, Trevor Darrell (* equal contribution)<br>
Meta AI, UC Berkeley and MBZUAI<br>
Figure: We propose early dropout and late dropout. Early dropout helps underfitting models fit the data better and achieve lower training loss. Late dropout helps improve the generalization performance of overfitting models.
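To make the two variants concrete, here is a minimal sketch of how a per-epoch drop rate could follow the `--drop_mode`, `--drop_schedule`, and `--cutoff_epoch` flags used in the training commands. The function `drop_rate_schedule` is a hypothetical helper written for illustration, not this repository's API. It assumes that the `linear` schedule decays from the base rate to zero over the early phase, that late mode uses only a constant rate, and that rates change per epoch; the released code may differ, for example by updating per iteration.

```python
def drop_rate_schedule(base_rate, epochs, cutoff_epoch, mode="standard", schedule="constant"):
    """Return a list with one drop rate per epoch (illustrative sketch, hypothetical helper)."""
    if mode not in ("standard", "early", "late"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "standard":
        # Standard dropout: the same rate for the whole run.
        return [base_rate] * epochs
    if mode == "early":
        if schedule == "constant":
            active = [base_rate] * cutoff_epoch
        elif schedule == "linear":
            # Assumption: decay linearly from base_rate to 0 across the early phase.
            steps = max(cutoff_epoch - 1, 1)
            active = [base_rate * (1 - e / steps) for e in range(cutoff_epoch)]
        else:
            raise ValueError(f"unknown schedule: {schedule}")
        # Early dropout: active before the cutoff, disabled afterwards.
        return active + [0.0] * (epochs - cutoff_epoch)
    # Late dropout: disabled before the cutoff, constant rate afterwards.
    if schedule != "constant":
        raise ValueError("this sketch only supports a constant schedule in late mode")
    return [0.0] * cutoff_epoch + [base_rate] * (epochs - cutoff_epoch)


early = drop_rate_schedule(0.1, epochs=300, cutoff_epoch=50, mode="early", schedule="linear")
late = drop_rate_schedule(0.4, epochs=300, cutoff_epoch=50, mode="late", schedule="constant")
print(early[0], early[50], late[49], late[50])
```

With these settings, the early schedule starts at the base rate and is zero from epoch 50 onward, while the late schedule is zero until epoch 50 and constant afterwards.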
Results on ImageNet-1K
Model weights are released as links on results.
Early Dropout
results with basic recipe (s.d. = stochastic depth)
| model | ViT-T | Mixer-S | Swin-F | ConvNeXt-F |
|:---|:---:|:---:|:---:|:---:|
| no dropout | 73.9 | 71.0 | 74.3 | 76.1 |
| standard dropout | 67.9 | 67.1 | 71.6 | - |
| standard s.d. | 72.6 | 70.5 | 73.7 | 75.5 |
| early dropout | 74.3 | 71.3 | 74.7 | - |
| early s.d. | 74.4 | 71.7 | 75.2 | 76.3 |
results with improved recipe
| model | ViT-T | Swin-F | ConvNeXt-F |
|:------------|:-----:|:------:|:----------:|
| no dropout | 76.3 | 76.1 | 77.5 |
| standard dropout | 71.5 | 73.5 | - |
| standard s.d. | 75.6 | 75.6 | 77.4 |
| early dropout | 76.7 | 76.6 | - |
| early s.d. | 76.7 | 76.6 | 77.7 |
Late Dropout
results with basic recipe
| model | ViT-B | Mixer-B |
|:------------:|:-----:|:-------:|
| standard s.d. | 81.6 | 78.0 |
| late s.d. | 82.3 | 78.6 |
Installation
Please check INSTALL.md for installation instructions.
Training
Basic Recipe
We list commands for early dropout and early stochastic depth on ViT-T, and for late stochastic depth on ViT-B.
- For training other models, change --model accordingly, e.g., to vit_tiny, mixer_s32, convnext_femto, mixer_b16, vit_base.
- Our results were produced with 4 nodes, each with 8 GPUs. Below we give example commands for both multi-node and single-machine setups.
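The multi-node and single-machine commands differ in `--update_freq` (1 versus 4). The sketch below shows why the two setups would be equivalent. It assumes that `--batch_size` is the per-GPU batch size and that `--update_freq` is the number of gradient accumulation steps, as in the ConvNeXt codebase this repository builds on. The helper `effective_batch_size` is hypothetical and only illustrates the arithmetic.

```python
def effective_batch_size(nodes, gpus_per_node, batch_size_per_gpu, update_freq):
    """Samples contributing to one optimizer step (hypothetical helper)."""
    return nodes * gpus_per_node * batch_size_per_gpu * update_freq


# Multi-node command: 4 nodes x 8 GPUs, --batch_size 128, --update_freq 1
multi_node = effective_batch_size(4, 8, 128, 1)

# Single-machine command: 1 node x 8 GPUs, --batch_size 128, --update_freq 4
single_machine = effective_batch_size(1, 8, 128, 4)

print(multi_node, single_machine)  # 4096 4096
```

Under these assumptions, running on a different number of GPUs means adjusting `--update_freq` (or `--batch_size`) so that the product stays the same.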
Early dropout
multi-node
python run_with_submitit.py --nodes 4 --ngpus 8 \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 1 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
single-machine
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Early stochastic depth
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Late stochastic depth
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_base --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.4 --drop_mode late --drop_schedule constant --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Standard dropout / no dropout (replace $p with 0.1 / 0.0)
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout $p --drop_mode standard \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Improved Recipe
Our improved recipe extends training from 300 to 600 epochs and reduces both mixup and cutmix to 0.3.
Early dropout
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Early stochastic depth
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Evaluation
single-GPU
python main.py --model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data
multi-GPU
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data
Acknowledgement
This repository is built using the timm library and the ConvNeXt codebase.
License
This project is released under the CC-BY-NC 4.0 license. Please see the LICENSE file for more information.
Citation
If you find this repository helpful, please consider citing:
@inproceedings{liu2023dropout,
title={Dropout Reduces Underfitting},
author={Zhuang Liu and Zhiqiu Xu and Joseph Jin and Zhiqiang Shen and Trevor Darrell},
year={2023},
booktitle={International Conference on Machine Learning},
}
