TransXNet
[TNNLS 2025] TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
This is an official PyTorch implementation of "TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition".
📝 Paper: Journal Version | arXiv Version
Introduction
TransXNet is a CNN-Transformer hybrid vision backbone that can model both global and local dynamics with a Dual Dynamic Token Mixer (D-Mixer), achieving superior performance over both CNN and Transformer-based models.
<center> <img src="assets/architecture.png" width="70%" height="auto"> </center>

Image Classification
1. Requirements
We strongly recommend using the dependency versions listed below to ensure reproducibility:
# Environments:
cuda==11.6
python==3.8.15
# Packages:
mmcv==1.7.1
timm==0.6.12
torch==1.13.1
torchvision==0.14.1
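To confirm that an environment matches the pinned packages, a small check like the following can help. It is a minimal sketch and not part of this repository: the `check_versions` helper is hypothetical, and it assumes each package is installed under the distribution name shown in the list (some `mmcv` installs are published under a different distribution name, in which case the key needs adjusting).

```python
from importlib import metadata

# Pinned versions copied from the package list above.
EXPECTED = {
    "mmcv": "1.7.1",
    "timm": "0.6.12",
    "torch": "1.13.1",
    "torchvision": "0.14.1",
}


def check_versions(expected):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed in this environment
        # A local build tag such as "1.13.1+cu116" is treated as matching "1.13.1".
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.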
2. Data Preparation
Prepare ImageNet with the following folder structure. You can extract ImageNet with this script.
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
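Before launching a long training run, it may be worth checking that the extracted dataset follows the layout above. The sketch below is illustrative only: `check_imagenet_layout` is a hypothetical helper, not a function from this repository, and it only checks that `train/` and `val/` exist and contain the same set of class folders.

```python
import os


def check_imagenet_layout(root):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    classes = {}
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each class is a sub-folder named by its WordNet ID, e.g. n01440764.
        classes[split] = {
            d for d in os.listdir(split_dir)
            if os.path.isdir(os.path.join(split_dir, d))
        }
        if not classes[split]:
            problems.append(f"no class folders in: {split_dir}")
    # Only compare the two splits when both directories were found.
    if len(classes) == 2 and classes["train"] != classes["val"]:
        only_one_side = classes["train"] ^ classes["val"]
        problems.append(f"{len(only_one_side)} class folder(s) not in both splits")
    return problems
```

Calling `check_imagenet_layout("/path/to/imagenet")` returns an empty list when the structure looks as expected.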
3. Main Results on ImageNet with Pretrained Models
| Models | Input Size | FLOPs (G) | Params (M) | Top-1 Acc. (%) | Download |
|:-----------:|:----------:|:---------:|:----------:|:----------:|:----------:|
| TransXNet-T | 224x224 | 1.8 | 12.8 | 81.6 | model |
| TransXNet-S | 224x224 | 4.5 | 26.9 | 83.8 | model |
| TransXNet-B | 224x224 | 8.3 | 48.0 | 84.6 | model |
| TransXNet-B | 384x384 | 24.2 | 48.0 | 85.5 | model |
4. Train
To train TransXNet models on ImageNet-1K with 8 GPUs (single node), run:
bash scripts/train_tiny.sh # train TransXNet-T
bash scripts/train_small.sh # train TransXNet-S
bash scripts/train_base.sh # train TransXNet-B
5. Validation
To evaluate TransXNet on ImageNet-1K, run:
MODEL=transxnet_t # transxnet_{t, s, b}
python3 validate.py \
/path/to/imagenet \
--model $MODEL -b 128 \
--pretrained # or --checkpoint /path/to/checkpoint
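For reference, the Top-1 accuracy reported in the results table is the percentage of images whose highest-scoring class matches the ground-truth label. The following is a plain-Python sketch of that metric on toy data; it is not the code used by `validate.py`, and `top1_accuracy` is a hypothetical name chosen for illustration.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Toy example: 4 samples, 3 classes; the third sample is misclassified.
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 0, 1, 0]
print(top1_accuracy(scores, labels))  # 75.0
```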
Object Detection and Semantic Segmentation
Citation
If you find this project useful for your research, please consider citing:
@article{lou2023transxnet,
title={TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition},
author={Meng Lou and Shu Zhang and Hong-Yu Zhou and Sibei Yang and Chuan Wu and Yizhou Yu},
journal={IEEE Transactions on Neural Networks and Learning Systems},
year={2025}
}
Acknowledgment
Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.
Contact
If you have any questions, please feel free to create issues or contact me at lmzmm.0921@gmail.com.
