# UNIC
PyTorch code and pretrained weights for the UNIC models.
Mert Bulent Sariyildiz · Philippe Weinzaepfel · Thomas Lucas · Diane Larlus · Yannis Kalantidis
NAVER LABS Europe
ECCV 2024

## Installation
For training UNIC models on ImageNet-1K (by distilling from the four teachers we used in the paper), you need some Python packages, pretrained weights for the teacher models and the ImageNet-1K dataset.
### Conda environment
- Create a conda environment with all the necessary packages for training and evaluation:

```bash
env_name="unic"
conda create -n ${env_name}
conda activate ${env_name}
conda install pytorch=2.1.1 pytorch-cuda=12.1 torchvision \
    timm transformers einops torchmetrics optuna \
    tensorboard matplotlib pandas scikit-learn-intelex omegaconf \
    -c pytorch -c nvidia -c conda-forge
```
- Set the path of your conda installation in scripts/setup_env.sh, i.e. update the `conda_dir` variable. Your environment will then be used automatically by both the training and evaluation scripts.
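For reference, the relevant fragment of scripts/setup_env.sh would look something like the sketch below. Only the `conda_dir` variable name comes from this README; the surrounding lines are assumptions about a typical conda setup script.

```shell
# Hypothetical fragment of scripts/setup_env.sh -- only the conda_dir
# variable name is taken from this README; the rest is an assumption.
conda_dir="/path/to/your/conda"  # e.g. ${HOME}/miniconda3
source "${conda_dir}/etc/profile.d/conda.sh"
conda activate unic
```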
### Teacher models
- Download the teacher models we used in our work. We provide bash scripts to automate this process under the scripts/teachers folder. To download all teachers at once, use scripts/teachers/_prepare_all.sh:

```bash
(cd scripts/teachers && ./_prepare_all.sh <path_to_download_directory>)
```
- Once the teacher checkpoints are downloaded, update the `TEACHER_CFG` variable in teachers/config.py to point to the correct paths.
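The exact schema of `TEACHER_CFG` is defined in teachers/config.py; as a rough sketch (the key `ckpt_path` and the placeholder paths are assumptions — only `TEACHER_CFG` and the four teacher names come from this README), it maps each teacher identifier to its checkpoint location:

```python
# Hypothetical sketch of TEACHER_CFG after updating paths.
# Consult teachers/config.py for the actual schema; the inner key
# name "ckpt_path" and the paths below are illustrative only.
TEACHER_CFG = {
    "dino_vitbase_16": {"ckpt_path": "/path/to/teachers/dino_vitbase_16.pth"},
    "deit3_vitbase_16": {"ckpt_path": "/path/to/teachers/deit3_vitbase_16.pth"},
    "ibot_vitbase_16": {"ckpt_path": "/path/to/teachers/ibot_vitbase_16.pth"},
    "dbotft_vitbase_16": {"ckpt_path": "/path/to/teachers/dbotft_vitbase_16.pth"},
}

if __name__ == "__main__":
    for name, cfg in TEACHER_CFG.items():
        print(name, "->", cfg["ckpt_path"])
```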
### Distillation dataset
- Download the ImageNet-1K dataset (ILSVRC-2012). Check out the official website for details.
## Training UNIC models
- Use the main_unic.py script to train UNIC models.
By default, it distills the following four teachers into a ViT-Base/16 student:
  - DINO (`dino_vitbase_16`)
  - DeiT-III (`deit3_vitbase_16`)
  - iBOT (`ibot_vitbase_16`)
  - dBOT fine-tuned on ImageNet-1K classification (`dbotft_vitbase_16`)
So make sure to download the teacher models (see the Teacher models section).
The architecture of the student encoder is compatible with DINOv2.
We trained our UNIC models on 4 GPUs, each with at least 32GB of memory. The default batch size is 128 per GPU; adjust it according to your GPU memory (the learning rate will be scaled accordingly).
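The exact scaling rule is not spelled out here; a common convention in distributed training (an assumption, not necessarily what main_unic.py implements, and with a hypothetical base learning rate) is linear scaling with the global batch size:

```python
def scaled_lr(base_lr: float, batch_per_gpu: int, n_gpus: int,
              base_batch: int = 256) -> float:
    """Linear learning-rate scaling: lr grows with the global batch size.

    This is the widely used linear-scaling heuristic, shown here for
    illustration; check main_unic.py for the rule actually applied.
    """
    return base_lr * (batch_per_gpu * n_gpus) / base_batch

# Default setup from this README: 4 GPUs x 128 images each = global batch 512.
# With a hypothetical base lr of 5e-4 per 256 images, the lr doubles.
print(scaled_lr(5e-4, 128, 4))
```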
- To train a UNIC model, use the following commands (available in scripts/train_unic.sh):

```bash
# - Initialize the conda environment
# - Set ${MASTER_ADDR}, ${MASTER_PORT}, ${N_GPUS} for distributed training
source ./scripts/setup_env.sh

dataset_dir="/path/to/imagenet-1k"
output_dir="/path/to/output_dir"
mkdir -p ${output_dir}

torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc_per_node=${N_GPUS} main_unic.py \
    --data_dir=${dataset_dir} \
    --output_dir=${output_dir} \
    --seed=${RANDOM}
```
## Pretrained models
### Distilled from teachers pretrained on ImageNet-1K
We provide a pretrained UNIC model with the ViT-Base/16 architecture, distilled from the four teachers mentioned above.
<table> <thead> <tr> <th>Model</th> <th style="white-space: nowrap;">Teachers</th> <th>Distillation<br>Dataset</th> <th>Distillation<br>Resolution</th> <th>Student<br>Architecture</th> <th>ImageNet‑1K<br>Classification</th> <th>ADE20K<br>Segmentation</th> <th>Model<br>Checkpoint</th> <th>Training<br>Arguments</th> </tr> </thead> <tbody> <tr> <td>UNIC</td> <td>DINO‑B/16<br>iBOT‑B/16<br>DeiT‑III‑B/16<br>dBOT‑ft‑B/16</td> <td style="text-align:center;">ImageNet‑1K</td> <td style="text-align:center;">224</td> <td style="text-align:center;">ViT‑Base/16</td> <td style="text-align:center;">83.8</td> <td style="text-align:center;">39.6<br>(<a href="https://download.europe.naverlabs.com/ComputerVision/unic/unic_head_ade20k.pth">Linear head link</a>)</td> <td style="text-align:center;"><a href="https://download.europe.naverlabs.com/ComputerVision/unic/unic.pth">Link<br>(870MB)</a></td> <td style="text-align:center;"><a href="https://download.europe.naverlabs.com/ComputerVision/unic/unic_args.json">Link</a></td> </tr> </tbody> </table>

The relative performance of UNIC over the four teachers is shown below.
<div align="center"> <img src="assets/unic-performance.png" alt="Relative performance of UNIC over teachers" width="400"/> </div>

### Distilled from teachers pretrained on arbitrary datasets
We also provide a pretrained UNIC-L model with the ViT-Large/14 architecture distilled from DINOv2-G/14 and MetaCLIP-H/14 teachers.
<table> <thead> <tr> <th>Model</th> <th style="white-space: nowrap;">Teachers</th> <th>Distillation<br>Dataset</th> <th>Distillation<br>Resolution</th> <th>Student<br>Architecture</th> <th>ImageNet‑1K<br>k-NN (k=20)</th> <th>ImageNet‑1K<br>Zero‑shot</th> <th>ADE20K<br>Segmentation</th> <th>Model<br>Checkpoint</th> <th>Training<br>Arguments</th> </tr> </thead> <tbody> <tr> <td>UNIC‑L</td> <td>DINOv2‑G/14<br>MetaCLIP‑H/14</td> <td style="text-align:center;">ImageNet‑1K</td> <td style="text-align:center;">224/336</td> <td style="text-align:center;">ViT‑Large/14</td> <td style="text-align:center;">85.6</td> <td style="text-align:center;">81.4</td> <td style="text-align:center;">48.3<br>(<a href="https://download.europe.naverlabs.com/ComputerVision/unic/unic_l_head_ade20k.pth">Linear head link</a>)</td> <td style="text-align:center;"><a href="https://download.europe.naverlabs.com/ComputerVision/unic/unic_l.pth">Link<br>(2.2GB)</a></td> <td style="text-align:center;"><a href="https://download.europe.naverlabs.com/ComputerVision/unic/unic_l_args.json">Link</a></td> </tr> </tbody> </table>

A comparison of UNIC-L to the teachers and the recent AM-RADIO model is shown below.
<div align="center"> <img src="assets/unic-l-performance.png" alt="Relative performance of UNIC-L over teachers" width="550"/> </div>

## Evaluating UNIC models
### Transfer learning tasks
The evaluation protocol for transfer learning tasks involves two steps:
- Extracting features from the encoder of a pretrained UNIC model
- Training logistic regression classifiers on top of the extracted features
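In outline, the two steps look like this. This is a toy sketch with a fixed random projection standing in for the frozen UNIC encoder and random data in place of a real dataset — not the t-ReX implementation used below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Step 1: "extract features" -- a fixed random projection stands in
# for the frozen UNIC encoder applied to toy inputs.
images = rng.normal(size=(200, 64))      # toy stand-in for images
labels = rng.integers(0, 2, size=200)    # toy binary labels
projection = rng.normal(size=(64, 16))   # frozen "encoder"
features = images @ projection

# Step 2: train a logistic-regression classifier on the frozen features.
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(f"train accuracy: {clf.score(features, labels):.2f}")
```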
We use the implementation from t-ReX, which is available at https://github.com/naver/trex. For convenience, the evaluation code is copied into the eval_transfer folder of this repository.
First, download the transfer datasets following the instructions in the t-ReX repository. Once the download finishes, update the hardcoded dataset paths in the `eval_transfer/data/__init__.py` file. Then use the following command to evaluate a pretrained UNIC model on, e.g., the ImageNet-1K dataset (with labels):
```bash
source scripts/setup_env.sh

##########
# extract features
dataset="in1k"
image_size=224

pretrained="/path/to/unic/checkpoint.pth"
features_dir=$(dirname "${pretrained}")
features_dir=${features_dir}/transfer/features_${dataset}_${image_size}

if [ ! -f "${features_dir}/features_trainval.pth" ] || [ ! -f "${features_dir}/features_test.pth" ]; then
    echo "Extracting features..."
    python eval_transfer/main_ft_extract.py \
        --output_dir="${features_dir}" \
        --pretrained="${pretrained}" \
        --dataset="${dataset}" \
        --image_size="${image_size}"
fi

##########
# train logreg classifier using extracted features
features_norm="none"
clf_type="logreg_sklearn"

if [[ "${dataset}" == "in1k" ]] || [[ "${dataset}" == cog_* ]] || [[ "${dataset}" == inat* ]]; then
    # for large datasets, we use SGD implemented in PyTorch
    # and l2-normalize the features
    features_norm="l2"
    clf_type="logreg_torch"
fi

echo ""
echo "Training classifier ..."
python -m sklearnex eval_transfer/main_clf.py \
    --features_dir="${features_dir}" \
    --features_norm=${features_norm} \
    --clf_type=${clf_type}
```
See the `--dataset` argument in main_ft_extract.py for the list of available datasets.
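The `features_norm="l2"` branch in the script above rescales each feature vector to unit Euclidean norm before the classifier is trained. A minimal sketch of what l2 normalization does (illustrative helper, not the repository's implementation):

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale each row (one feature vector) to unit Euclidean norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)  # eps guards against zero rows

features = np.array([[3.0, 4.0], [0.0, 2.0]])
print(l2_normalize(features))  # rows become [0.6, 0.8] and [0.0, 1.0]
```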
### Dense prediction tasks

#### Semantic segmentation on ADE20K
First, download the ADE20K dataset from [the official webs
