HART
The official implementation of "Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer"
<div align="center"> <img width="80%", src="./hart-poster.png"> </div><a href="https://arxiv.org/pdf/2501.01023" target='_blank'><img src="https://img.shields.io/badge/arXiv-PDF-f5cac3?logo=adobeacrobatreader&logoColor=red"/></a> <a href="https://kns.cnki.net/kcms2/article/abstract?v=VkQzIsHyPdiXOZ6uXUyfivU9sw0L6aCigQddB8XY3kQv2xxDD_PZgfE930ZrL792l6Ja8IZ4Q_rHF8P3ZJmixyHK5a8qnFYkwDoNMPsQqWXyV9Onp09yYpnB13Ge2IcnPc0ZBSK2p02CFOWrMTXb3KGkB7IK42dsaQcZl4PJP1pd7ZqVhcObiu58-VpKHCNl&uniplatform=NZKPT&language=CHS" target='_blank'><img src="https://img.shields.io/badge/中文版-PDF-f5cac3?logo=adobeacrobatreader&logoColor=red"/></a>
Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer <br> Ziyang Chen, Wenting Li, Yongjun Zhang✱, Bingshu Wang, Yabo Wu, Yong Zhao, C. L. Philip Chen <br> arXiv Report <br> Contact us: ziyangchen2000@gmail.com; zyj6667@126.com✱
```bibtex
@article{chen2025hart,
  title={Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer},
  author={Chen, Ziyang and Zhang, Yongjun and Li, Wenting and Wang, Bingshu and Wu, Yabo and Zhao, Yong and Chen, CL},
  journal={arXiv preprint arXiv:2501.01023},
  year={2025}
}
```
Requirements
Python = 3.8
CUDA = 11.3
```shell
conda create -n hart python=3.8
conda activate hart
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
```
Dataset
To evaluate/train our HART, you will need to download the required datasets.
- Sceneflow (Includes FlyingThings3D, Driving, Monkaa)
- Middlebury
- ETH3D
- KITTI
- TartanAir
- Falling Things (fat.zip)
- CARLA
- CREStereo Dataset
- InStereo2K
- Sintel Stereo
By default, `stereo_datasets.py` searches for the datasets in the locations below. You can create symbolic links in the `datasets` folder pointing to wherever the datasets were downloaded.
```
├── datasets
    ├── FlyingThings3D
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2015
            ├── testing
            ├── training
        ├── KITTI_2012
            ├── testing
            ├── training
    ├── Middlebury
        ├── MiddEval3
            ├── trainingF
            ├── trainingH
            ├── trainingQ
        ├── official_train.txt
        ├── 2005
        ├── 2006
        ├── 2014
        ├── 2021
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
        ├── two_view_testing
    ├── TartanAir
    ├── fat
    ├── crestereo
    ├── HR-VS
    ├── carla-highres
    ├── InStereo2K
```
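One way to populate the layout above without moving data is to symlink each dataset into `datasets`. The sketch below assumes the datasets were downloaded under `/data/stereo` (a hypothetical location; substitute your own download paths):

```shell
# Link already-downloaded datasets into the layout stereo_datasets.py expects.
# /data/stereo/* are hypothetical download locations, not part of the repo.
mkdir -p datasets
ln -sfn /data/stereo/FlyingThings3D datasets/FlyingThings3D
ln -sfn /data/stereo/Monkaa datasets/Monkaa
ln -sfn /data/stereo/Driving datasets/Driving
```

`ln -sfn` replaces any stale link from a previous setup, so the script can be re-run safely after moving the data.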
"official_train.txt" is available at here.
Training
```shell
bash ./scripts/train.sh
```
Evaluation
To evaluate a trained model on a validation set (e.g. Middlebury full resolution), run:
```shell
python evaluate_stereo.py --restore_ckpt models/hart_sceneflow.pth --dataset middlebury_F
```
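Middlebury also ships half- and quarter-resolution splits, so a sweep over all three is a natural next step. A hedged sketch, assuming the other splits follow the same `middlebury_F/H/Q` naming as the command above (printed as a dry run; drop the `echo` to execute):

```shell
# Dry-run sketch: print one evaluation command per Middlebury resolution.
# The middlebury_H and middlebury_Q dataset names are assumed by analogy
# with middlebury_F; check evaluate_stereo.py for the exact choices.
for res in F H Q; do
  echo "python evaluate_stereo.py --restore_ckpt models/hart_sceneflow.pth --dataset middlebury_${res}"
done
```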
The pretrained weights are available here.
Acknowledgements
<ul> <li>This project borrows code from <a href="https://github.com/mli0603/stereo-transformer">STTR</a>, <a href="https://github.com/David-Zhao-1997/High-frequency-Stereo-Matching-Network">DLNR</a>, <a href="https://github.com/gangweiX/IGEV">IGEV</a>, and <a href="https://github.com/ZYangChen/MoCha-Stereo">MoCha-Stereo</a>. We thank the original authors for their excellent work!</li> <li>This project is supported by the Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (QianKeHe[2024]Key001).</li> <li>This project is supported by the Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (Project No. [2023]159).</li> </ul>