<p align="center"> <h1 align="center"><strong>UniRepLKNet: CVPR 2024, TPAMI 2025</strong></h1> <p align="center"> </p> </p>

arXiv · Hugging Face Models · Website · <a href="#LICENSE--citation"> <img alt="License: Apache2.0" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue.svg"/> </a>

<p align="center" width="100%"> <img src="assets/UniRepLKNet.png" width="100%" height="60%"> </p>

🌟🌟🌟 News: the journal version, Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations, has been accepted by IEEE TPAMI.

Motivation

  • We note that the architectures of most existing large-kernel ConvNets simply follow other models; architectural design specific to large-kernel ConvNets remains under-explored.
  • The universal perception ability of Transformers is emerging across multimodal research areas (image, audio, video, time-series, etc.). We are curious whether ConvNets can also deliver universal perception across multiple modalities with a unified architecture.

Highlights

A single ConvNet unifies multiple modalities and outperforms modality-specific models. This paper summarizes architectural guidelines for building large-kernel CNNs, which work remarkably well on images and other modalities. It is the latest contribution to two influential areas: Structural Re-parameterization (since RepVGG, Ding et al. 2021) and very-large-kernel ConvNets (since RepLKNet, Ding et al. 2022).

  • ImageNet accuracy of 88.0%, COCO AP of 56.4, and ADE20K mIoU of 55.6 with only ImageNet-22K pretraining.
  • Higher actual speed and performance than recent models such as ConvNeXt V2 and InternImage.
  • With a unified architecture and extremely simple modality-specific preprocessing, achieves state-of-the-art performance on audio recognition and, most remarkably, global temperature & wind speed forecasting (a challenging, huge-scale time-series forecasting task), outperforming the existing global forecasting system.

More specifically, we contribute from two aspects:

  • We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels - they can see wide without going deep. Following such guidelines, our proposed large-kernel ConvNet shows leading performance in image recognition.
  • We discover that large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient. With certain modality-related preprocessing approaches, the proposed model achieves state-of-the-art performance on time-series forecasting and audio recognition tasks even without modality-specific customization to the architecture.
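The "see wide without going deep" point can be made concrete with standard receptive-field arithmetic (this illustrative snippet is not from the paper): each stride-1 convolution of kernel size k grows the receptive field by k - 1, so matching one large kernel with stacked 3x3 layers requires many layers of depth.

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions:
    each layer of kernel size k adds (k - 1) to the field."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# A single large-kernel layer sees 31 positions at once:
print(receptive_field([31]))  # 31

# Depth of stacked 3x3 layers needed to match that field:
depth = 0
while receptive_field([3] * depth) < 31:
    depth += 1
print(depth)  # 15
```

So one 31x31 layer covers the same spatial extent as a 15-layer stack of 3x3 convolutions, which is the essential characteristic the guidelines exploit.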

UniRepLKNet not only signifies a "comeback" for ConvNet in its original domain but also showcases large-kernel ConvNet’s potential to "conquer" new territories, highlighting further adaptability and broad utility across different modalities and tasks.

Code design

  1. There is some MMDetection- and MMSegmentation-related code in unireplknet.py so that you can directly copy-paste it into your MMDetection or MMSegmentation, e.g., here and here. If you do not want to use it with MMDetection or MMSegmentation, you can safely delete those lines of code.
  2. We have provided code to automatically build our models and load our released weights. See the functions here. You can also use timm.create_model to build the models. For example, model = timm.create_model('unireplknet_l', num_classes=num_classes_of_your_task, in_22k_pretrained=True) will call the function unireplknet_l defined here, which will build a UniRepLKNet-L and automatically download our checkpoints and load the weights.
    # The simplest way to use our model in your project is to copy-paste
    # unireplknet.py into your working directory and create models, e.g.,
    import timm
    from unireplknet import *  # registers the unireplknet_* models with timm
    model = timm.create_model('unireplknet_l', num_classes=num_classes_of_your_task, in_22k_pretrained=True)
    
  3. As UniRepLKNet also uses the Structural Re-parameterization methodology, we provide a function reparameterize_unireplknet() that converts a trained UniRepLKNet into the inference structure, which equivalently removes the parallel branches in Dilated Reparam Blocks, the Batch Norm layers, and the bias term in GRN. Pseudo-code for the full pipeline looks like this:
    training_model = unireplknet_l(...,  deploy=False)
    train(training_model)
    trained_results = evaluate(training_model)
    training_model.reparameterize_unireplknet()
    inference_results = evaluate(training_model)
    # you will see inference_results are identical to trained_results
    save(training_model, 'converted_weights.pth')
    # use the converted model
    deploy_model = unireplknet_l(..., deploy=True)
    load_weights(deploy_model, 'converted_weights.pth')
    deploy_results = evaluate(deploy_model)
    # you will see deploy_results == inference_results == trained_results
    
  4. You may want to read this if you are familiar with the timm library. We sincerely thank timm for providing a convenient re-parameterize function. The code design of UniRepLKNet is compatible with it. That is, calling some_unireplknet_model.reparameterize_unireplknet() is equivalent to calling timm.utils.reparameterize_model(some_unireplknet_model). So if you use our code with timm's codebase, e.g., timm's evaluation code, just add --reparam to your command so that timm.utils.reparameterize_model will be called (see here).
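The equivalence that re-parameterization relies on is just linearity of convolution: a small parallel branch can be zero-padded to the large kernel's size and added into it, leaving one branch with identical outputs. A minimal 1-D, stride-1 sketch of this idea in plain Python (not the repo's actual implementation, which also fuses Batch Norm and the GRN bias):

```python
def conv1d(x, kernel, bias=0.0):
    """'same'-padded 1-D convolution (cross-correlation), stride 1."""
    k = len(kernel)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k)) + bias
            for i in range(len(x))]

# Training-time structure: a 5-tap branch plus a parallel 3-tap branch.
k5 = [0.1, -0.2, 0.5, 0.3, -0.1]
k3 = [0.4, -0.3, 0.2]
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
two_branch = [a + b for a, b in zip(conv1d(x, k5), conv1d(x, k3))]

# Inference-time structure: zero-pad the small kernel to the large size
# and add it in; a single branch now produces identical outputs.
merged = [k5[0], k5[1] + k3[0], k5[2] + k3[1], k5[3] + k3[2], k5[4]]
one_branch = conv1d(x, merged)

assert all(abs(a - b) < 1e-9 for a, b in zip(two_branch, one_branch))
```

Because the transform is exact, the converted model matches the trained one output-for-output, which is why inference_results, deploy_results, and trained_results above coincide.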

Models

We have provided five ways to download our checkpoints.

  1. Download via the Google Drive links shown below.
  2. Visit our huggingface repo at https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main and click the download icons.
  3. Use huggingface-hub in your python code. First, install huggingface_hub
pip install huggingface_hub

Then use huggingface_hub in your Python code, for example:

import torch
from huggingface_hub import hf_hub_download

repo_id = 'DingXiaoH/UniRepLKNet'
cache_file = hf_hub_download(repo_id=repo_id, filename=FILE_NAME)
checkpoint = torch.load(cache_file, map_location='cpu')
model.load_state_dict(checkpoint)

See our huggingface repo or our code for FILE_NAME (e.g., unireplknet_xl_in22k_pretrain.pth).

  4. Use the huggingface CLI. Check the tutorial.

  5. Automatically download our checkpoints by passing in_1k_pretrained=True, in_22k_pretrained=True, or in_22k_to_1k=True while calling our provided functions. See the code here.

ImageNet-1K Pretrained Weights

| name | resolution | acc@1 | #params | FLOPs | Weights |
|:---:|:---:|:---:|:---:|:---:|:---:|
| UniRepLKNet-A | 224x224 | 77.0 | 4.4M | 0.6G | ckpt |
| UniRepLKNet-F | 224x224 | 78.6 | 6.2M | 0.9G | ckpt |
| UniRepLKNet-P | 224x224 | 80.2 | 10.7M | 1.6G | ckpt |
| UniRepLKNet-N | 224x224 | 81.6 | 18.3M | 2.8G | ckpt |
| UniRepLKNet-T | 224x224 | 83.2 | 31M | 4.9G | ckpt |
| UniRepLKNet-S | 224x224 | 83.9 | 56M | 9.1G | ckpt |

ImageNet-22K Pretrained Weights

| name | resolution | #params | FLOPs | ckpt |
|:---:|:---:|:---:|:---:|:---:|
| UniRepLKNet-S | 224x224 | 56M | 26.7G | ckpt |
| UniRepLKNet-B | 224x224 | 98M | 47.2G | ckpt |
| UniRepLKNet-L | 192x192 | 218M | 105.4G | ckpt |
| UniRepLKNet-XL | 192x192 | 386M | 187G | ckpt |

Pretrained on ImageNet-22K then finetuned on ImageNet-1K

| name | resolution | acc@1 | #params | FLOPs | ckpt |
|:---:|:---:|:---:|:---:|:---:|:---:|
| UniRepLKNet-S | 384x384 | 86.4 | 56M | 26.7G | ckpt |
| UniRepLKNet-B | 384x384 | 87.4 | 98M | 47.2G | ckpt |
| UniRepLKNet-L | 384x384 | 87.9 | | | |
