# DUO

Code for the ICCV 2025 paper "Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts".
This is the official project repository for *Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts* by Zixuan Hu, Dongxiao Li, Xinzhu Ma, Shixiang Tang, Xiaotong Li, Wenhan Yang and Ling-Yu Duan (ICCV 2025 Highlight).
## Introduction
Accurate monocular 3D object detection (M3OD) is pivotal for safety-critical applications like autonomous driving, yet its reliability deteriorates significantly under real-world domain shifts caused by environmental or sensor variations. To address these shifts, Test-Time Adaptation (TTA) methods have emerged, enabling models to adapt to target distributions during inference. While prior TTA approaches recognize the positive correlation between low uncertainty and high generalization ability, they fail to address the dual uncertainty inherent to M3OD: semantic uncertainty (ambiguous class predictions) and geometric uncertainty (unstable spatial localization). To bridge this gap, we propose Dual Uncertainty Optimization (DUO), the first TTA framework designed to jointly minimize both uncertainties for robust M3OD. Through a convex optimization lens, we introduce an innovative convex structure of the focal loss and further derive a novel unsupervised version, enabling label-agnostic uncertainty weighting and balanced learning for high-uncertainty objects. In parallel, we design a semantic-aware normal field constraint that preserves geometric coherence in regions with clear semantic cues, reducing uncertainty from the unstable 3D representation. This dual-branch mechanism forms a complementary loop: enhanced spatial perception improves semantic classification, and robust semantic predictions further refine spatial understanding. Extensive experiments demonstrate the superiority of DUO over existing methods across various datasets and domain shift types.
## Method: Dual Uncertainty Optimization (DUO) Pipeline

<p align="center"> <img src="figures/pipeline.png" alt="method" width="100%" align=center /> </p>

## Getting Started
- Clone the repo:

```bash
git clone https://github.com/hzcar/DUO.git
```
### Requirements

DUO depends on:

- CUDA 11.8
- Python 3.9
- PyTorch 2.3.1 (cu118)

Create a new environment:

```bash
conda create -n duo python=3.9 -y
conda activate duo
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
- Compile and set up:

```bash
cd model/backbone/DCNv2
. make.sh
cd ../../..
python setup.py develop
```
ATTENTION: If you compile DCNv2 and run the setup step in another project, you will need to recompile and set it up again when you return to this project.
## Data preparation
This repository provides code for training and evaluation on KITTI-C and nuScenes using MonoFlex and MonoGround.
You are also welcome to use your own datasets and models!
- 📥 Download the KITTI-C and nuScenes validation sets and all source model checkpoints from Hugging Face
- 🔗 https://huggingface.co/datasets/hzcar/duo
- 📂 Place the downloaded checkpoints under:

```
DUO/[MonoFlex-KITTI | MonoFlex-nuscenes | MonoGround-KITTI | MonoGround-nuscenes]/checkpoint
```
⚠️ Note: The corruption process for KITTI-C involves some randomness, which may cause fluctuations in detection performance. If you use the version provided in this repository, please make sure to acknowledge it properly in your citation.
### Example 1: DUO on corruption shifts with MonoFlex

```bash
cd MonoFlex-KITTI
python tools/tta_monotta.py --config runs/monoflex.yaml --ckpt checkpoint/model_moderate_best_soft.pth --eval --output kitti-c/dir --domain [gaussian_noise_5/shot_noise_5/.../saturate_5] --method [tent/deyo/monotta/duo]
```

You can test all corruption types at severity level 5 by directly running

```bash
. test_tta.sh
```

If you want to test the source model without test-time adaptation, please run

```bash
. test_source.sh
```
### Example 2: DUO on real-world shifts with MonoFlex

```bash
cd MonoFlex-nuscenes
python tools/tta_monotta.py --config runs/monoflex.yaml --ckpt checkpoint/day_monoflex.pth --eval --output nuscenes/night/ --domain night/val/ --method [tent/deyo/monotta/duo]
```

You can test all four real-world adaptation tasks in nuScenes by directly running

```bash
. test_tta.sh
```

If you want to test the source model without test-time adaptation, please run

```bash
. test_source.sh
```
## 🔧 Extending DUO to More Detection Tasks

To extend DUO to broader detection tasks, we provide a modular, easily pluggable implementation of our unsupervised focal loss:
```python
import torch
import torch.nn.functional as F


class CFLoss(torch.nn.Module):
    """Unsupervised (convex-structured) focal loss used in DUO."""

    def __init__(self, alpha, gamma, temp, reduction):
        super(CFLoss, self).__init__()
        self.alpha = alpha          # loss scaling factor
        self.gamma = gamma          # focusing parameter
        self.temp = temp            # softmax temperature
        self.reduction = reduction  # 'mean', 'sum', or 'none'

    def forward(self, logits):
        # Temperature-scaled class probabilities and their (safe) logs.
        p = F.softmax(logits / self.temp, dim=-1)
        log_p = torch.log(p + 1e-12)

        # Per-sample matrix M_ij = δ_ij + γ(1 - log p_i) p_i p_j - γ δ_ij p_i log p_i.
        outer_pp = torch.einsum('bi,bj->bij', p, p)
        I = torch.eye(p.size(1), device=p.device).unsqueeze(0)
        term1 = self.gamma * (1 - log_p).unsqueeze(2) * outer_pp
        term2 = self.gamma * torch.diag_embed(p * log_p)
        M = I + term1 - term2

        # Solve M x = p to obtain label-agnostic pseudo-labels.
        pse_label = torch.linalg.solve(M, p.unsqueeze(2)).squeeze(2)

        # Focal-style weighting of the pseudo-labelled cross-entropy.
        weight = (1 - p) ** self.gamma
        loss = -self.alpha * torch.sum(weight * log_p * pse_label, dim=-1)

        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        return loss
```
⚠️ Note: In our experiments, we set $\alpha = 4$ and $\gamma = 2$, consistent with the focal loss configuration used in supervised training.
To avoid numerical singularities during matrix inversion, one can scale the identity term or use a broadcasted all-one matrix, both of which improve conditioning and ensure stable optimization.
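As a hedged illustration of the identity-scaling idea (the `eps` value and the `stabilized_solve` helper below are illustrative choices for demonstration, not settings or names from the paper or this repository):

```python
import torch

def stabilized_solve(M, rhs, eps=1e-2):
    # Inflate the identity component of each matrix in the batch before
    # solving M x = rhs. `eps` is an illustrative constant: larger values
    # improve conditioning at the cost of slightly biasing the solution.
    eye = torch.eye(M.size(-1), device=M.device, dtype=M.dtype)
    return torch.linalg.solve(M + eps * eye, rhs)

# A nearly singular batch of matrices that a plain solve handles poorly.
M = torch.ones(1, 3, 3) + torch.eye(3) * 1e-8
x = stabilized_solve(M, torch.ones(1, 3, 1))
```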
This core loss function is designed as a drop-in replacement for the original loss, making it straightforward to integrate into existing unsupervised or semi-supervised pipelines.
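As a rough usage sketch, the loss needs only raw classification logits from unlabeled target data. The functional restatement below mirrors the `CFLoss` forward pass (with `reduction='none'`) so the snippet is self-contained; the batch size, class count, and hyperparameter values are illustrative, not taken from the repository:

```python
import torch

def cf_loss(logits, alpha=4.0, gamma=2.0, temp=1.0):
    # Functional restatement of the CFLoss forward pass above.
    p = torch.softmax(logits / temp, dim=-1)
    log_p = torch.log(p + 1e-12)
    outer_pp = torch.einsum('bi,bj->bij', p, p)
    I = torch.eye(p.size(1), device=p.device).unsqueeze(0)
    M = (I + gamma * (1 - log_p).unsqueeze(2) * outer_pp
           - gamma * torch.diag_embed(p * log_p))
    pse_label = torch.linalg.solve(M, p.unsqueeze(2)).squeeze(2)
    return -alpha * torch.sum(((1 - p) ** gamma) * log_p * pse_label, dim=-1)

torch.manual_seed(0)
logits = torch.randn(8, 3, requires_grad=True)  # e.g. 8 detections, 3 classes
loss = cf_loss(logits).mean()
loss.backward()  # gradients reach the logits without any labels
```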
## Correspondence

Please contact Zixuan Hu at [hzxuan at pku.edu.cn] if you have any questions. 📬
## Acknowledgment

This repository is built on the MonoTTA and ReCAP repositories. Thanks for these excellent projects!
## Citation

If our DUO method is helpful in your research, please consider citing our paper:

```bibtex
@article{hu2025adaptive,
  title={Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts},
  author={Hu, Zixuan and Li, Dongxiao and Ma, Xinzhu and Tang, Shixiang and Li, Xiaotong and Yang, Wenhan and Duan, Ling-Yu},
  journal={arXiv preprint arXiv:2508.20488},
  year={2025}
}
```
