HART
The official implementation of "Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer"
<div align="center"> <img width="80%", src="./hart-poster.png"> </div><a href="https://arxiv.org/pdf/2501.01023" target='_blank'><img src="https://img.shields.io/badge/arXiv-PDF-f5cac3?logo=adobeacrobatreader&logoColor=red"/></a> <a href="https://kns.cnki.net/kcms2/article/abstract?v=VkQzIsHyPdiXOZ6uXUyfivU9sw0L6aCigQddB8XY3kQv2xxDD_PZgfE930ZrL792l6Ja8IZ4Q_rHF8P3ZJmixyHK5a8qnFYkwDoNMPsQqWXyV9Onp09yYpnB13Ge2IcnPc0ZBSK2p02CFOWrMTXb3KGkB7IK42dsaQcZl4PJP1pd7ZqVhcObiu58-VpKHCNl&uniplatform=NZKPT&language=CHS" target='_blank'><img src="https://img.shields.io/badge/中文版-PDF-f5cac3?logo=adobeacrobatreader&logoColor=red"/></a>
Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer <br> Ziyang Chen, Wenting Li, Yongjun Zhang✱, Bingshu Wang, Yabo Wu, Yong Zhao, C. L. Philip Chen <br> arXiv Report <br> Contact us: ziyangchen2000@gmail.com; zyj6667@126.com✱
```bibtex
@article{chen2025hart,
  title={Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer},
  author={Chen, Ziyang and Zhang, Yongjun and Li, Wenting and Wang, Bingshu and Wu, Yabo and Zhao, Yong and Chen, CL},
  journal={arXiv preprint arXiv:2501.01023},
  year={2025}
}
```
Requirements
Python = 3.8
CUDA = 11.3
```shell
conda create -n hart python=3.8
conda activate hart
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
```
Dataset
To evaluate/train our HART, you will need to download the required datasets.
- Sceneflow (Includes FlyingThings3D, Driving, Monkaa)
- Middlebury
- ETH3D
- KITTI
- TartanAir
- Falling Things (fat.zip)
- CARLA
- CREStereo Dataset
- InStereo2K
- Sintel Stereo
By default, `stereo_datasets.py` searches for the datasets in the locations below. You can create symbolic links in the `datasets` folder pointing to wherever the datasets were downloaded.
```
├── datasets
    ├── FlyingThings3D
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2015
            ├── testing
            ├── training
        ├── KITTI_2012
            ├── testing
            ├── training
    ├── Middlebury
        ├── MiddEval3
            ├── trainingF
            ├── trainingH
            ├── trainingQ
        ├── official_train.txt
        ├── 2005
        ├── 2006
        ├── 2014
        ├── 2021
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
        ├── two_view_testing
    ├── TartanAir
    ├── fat
    ├── crestereo
    ├── HR-VS
    ├── carla-highres
    ├── InStereo2K
```
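One way to populate the layout above without moving data is to symlink each dataset into `datasets`. The sketch below assumes the datasets were downloaded under `/data/stereo` (a hypothetical location; substitute your own download paths):

```shell
# Link already-downloaded datasets into the layout stereo_datasets.py expects.
# /data/stereo/* are hypothetical download locations, not part of the repo.
mkdir -p datasets
ln -sfn /data/stereo/FlyingThings3D datasets/FlyingThings3D
ln -sfn /data/stereo/Monkaa datasets/Monkaa
ln -sfn /data/stereo/Driving datasets/Driving
```

`ln -sfn` replaces any stale link from a previous setup, so the script can be re-run safely after moving the data.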
"official_train.txt" is available at here.
Training
```shell
bash ./scripts/train.sh
```
Evaluation
To evaluate a trained model on a validation set (e.g. Middlebury full resolution), run:
```shell
python evaluate_stereo.py --restore_ckpt models/hart_sceneflow.pth --dataset middlebury_F
```
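Middlebury also ships half- and quarter-resolution splits, so a sweep over all three is a natural next step. A hedged sketch, assuming the other splits follow the same `middlebury_F/H/Q` naming as the command above (printed as a dry run; drop the `echo` to execute):

```shell
# Dry-run sketch: print one evaluation command per Middlebury resolution.
# The middlebury_H and middlebury_Q dataset names are assumed by analogy
# with middlebury_F; check evaluate_stereo.py for the exact choices.
for res in F H Q; do
  echo "python evaluate_stereo.py --restore_ckpt models/hart_sceneflow.pth --dataset middlebury_${res}"
done
```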
The pretrained weights are available here.
Acknowledgements
<ul> <li>This project borrows code from <a href="https://github.com/mli0603/stereo-transformer">STTR</a>, <a href="https://github.com/David-Zhao-1997/High-frequency-Stereo-Matching-Network">DLNR</a>, <a href="https://github.com/gangweiX/IGEV">IGEV</a>, and <a href="https://github.com/ZYangChen/MoCha-Stereo">MoCha-Stereo</a>. We thank the original authors for their excellent work!</li> <li>This project is supported by the Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (QianKeHe[2024]Key001).</li> <li>This project is supported by the Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (Project No. [2023]159).</li> </ul>