HorNet
[NeurIPS 2022] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Install / Use
/learn @raoyongming/HorNetREADME
HorNet <img width="32" alt="HorNet Icon" src="figs/hornet_icon.png">
Created by Yongming Rao*, Wenliang Zhao*, Yansong Tang, Jie Zhou, Ser-Nam Lim†, Jiwen Lu†
This repository contains PyTorch implementation for HorNet (NeurIPS 2022).
HorNet is a family of generic vision backbones that perform explicit high-order spatial interactions based on Recursive Gated Convolution.

Model Zoo
ImageNet-1K trained models:
| name | arch | Params | FLOPs | Top-1 | url |
| --- | --- | --- | --- | --- | --- |
| HorNet-T (7x7) | hornet_tiny_7x7 | 22M | 4.0G | 82.8 | Tsinghua Cloud|
| HorNet-T (GF) | hornet_tiny_gf | 23M | 3.9G | 83.0 | Tsinghua Cloud|
| HorNet-S (7x7) | hornet_small_7x7 | 50M | 8.8G | 83.8 | Tsinghua Cloud|
| HorNet-S (GF) | hornet_small_gf | 50M | 8.7G | 84.0 | Tsinghua Cloud|
| HorNet-B (7x7) | hornet_base_7x7 | 87M | 15.6G | 84.2 | Tsinghua Cloud|
| HorNet-B (GF) | hornet_base_gf | 88M | 15.5G | 84.3 | Tsinghua Cloud|
ImageNet-22K trained models:
| name | arch | Params | FLOPs | url |
| --- | --- | --- | --- | --- |
| HorNet-L (7x7) | hornet_large_7x7 | 209M | 34.8G | Tsinghua Cloud|
| HorNet-L (GF) | hornet_large_gf | 211M | 34.7G | Tsinghua Cloud|
| HorNet-L (GF)* | hornet_large_gf_img384 | 216M | 101.8G | Tsinghua Cloud|
*indicate the model is finetuned to 384x384 resolution on ImageNet-22k.
ImageNet Classification
Requirements
- torch==1.8.0
- torchvision==0.9.0
- timm==0.4.12
- tensorboardX
- six
- submitit (multi-node training)
Data preparation: download and extract ImageNet images from http://image-net.org/. The directory structure should be
│ILSVRC2012/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
Evaluation
To evaluate a pre-trained HorNet model on the ImageNet validation set with 8 GPUs, run:
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model hornet_tiny_7x7 --eval true --input_size 224 \
--resume /path/to/checkpoint \
--data_path /path/to/imagenet-1k
Training
To train HorNet models on ImageNet from scratch on a single machine, run:
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model hornet_tiny_7x7 --drop_path 0.2 --clip_grad 5\
--batch_size 128 --lr 4e-3 --update_freq 4 \
--model_ema true --model_ema_eval true \
--data_path /path/to/imagenet-1k \
--output_dir ./logs/hornet_tiny_7x7
We provide detailed training commands for our models in TRAINING.md.
Downstream Tasks
Please check the object_detection.md and semantic_segmentation.md for training and evaluation instructions on dense prediction tasks.
HorNet also achieves state-of-the-art performance on 3D object classification with our new framework (P2P) to leverage pre-trained image models for point cloud understanding.
License
MIT License
Acknowledgements
Our code is based on pytorch-image-models, DeiT and ConvNeXt. We would like to thank High-Flyer AI Research for their generous support of partial computational resources used in this project.
Citation
If you find our work useful in your research, please consider citing:
@article{liu2025hornet,
title={Efficient high-order spatial interactions for visual perception},
author={Liu, Zuyan and Rao, Yongming and Zhao, Wenliang and Zhou, Jie and Lu, Jiwen},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2025},
publisher={IEEE}
}
@article{rao2022hornet,
title={HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions},
author={Rao, Yongming and Zhao, Wenliang and Tang, Yansong and Zhou, Jie and Lim, Ser-Lam and Lu, Jiwen},
journal={Advances in Neural Information Processing Systems (NeurIPS)},
year={2022}
}
Related Skills
node-connect
349.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
109.5kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
349.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
349.2kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
