# DUO

Code for the ICCV 2025 paper "Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts".
This is the official project repository for *Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts* by Zixuan Hu, Dongxiao Li, Xinzhu Ma, Shixiang Tang, Xiaotong Li, Wenhan Yang and Ling-Yu Duan (ICCV 2025 Highlight).
## Introduction
Accurate monocular 3D object detection (M3OD) is pivotal for safety-critical applications like autonomous driving, yet its reliability deteriorates significantly under real-world domain shifts caused by environmental or sensor variations. To address these shifts, Test-Time Adaptation (TTA) methods have emerged, enabling models to adapt to target distributions during inference. While prior TTA approaches recognize the positive correlation between low uncertainty and high generalization ability, they fail to address the dual uncertainty inherent to M3OD: semantic uncertainty (ambiguous class predictions) and geometric uncertainty (unstable spatial localization). To bridge this gap, we propose Dual Uncertainty Optimization (DUO), the first TTA framework designed to jointly minimize both uncertainties for robust M3OD. Through a convex optimization lens, we introduce an innovative convex structure of the focal loss and further derive a novel unsupervised version, enabling label-agnostic uncertainty weighting and balanced learning for high-uncertainty objects. In parallel, we design a semantic-aware normal field constraint that preserves geometric coherence in regions with clear semantic cues, reducing uncertainty from the unstable 3D representation. This dual-branch mechanism forms a complementary loop: enhanced spatial perception improves semantic classification, and robust semantic predictions further refine spatial understanding. Extensive experiments demonstrate the superiority of DUO over existing methods across various datasets and domain shift types.
## Method: Dual Uncertainty Optimization (DUO) Pipeline

<p align="center"> <img src="figures/pipeline.png" alt="method" width="100%" align=center /> </p>

## Getting Started
- Clone the repo:

```bash
git clone https://github.com/hzcar/DUO.git
```
### Requirements

DUO depends on:

- CUDA 11.8
- Python 3.9
- PyTorch 2.3.1 (cu118)

Create a new environment:

```bash
conda create -n duo python=3.9 -y
conda activate duo
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
- Compile and set up:

```bash
cd model/backbone/DCNv2
. make.sh
cd ../../..
python setup.py develop
```
ATTENTION: If you compile DCNv2 and run the setup step in another project, you will need to recompile and set it up again when you return to this project.
## Data preparation
This repository provides code for training and evaluation on KITTI-C and nuScenes using MonoFlex and MonoGround.
You are also welcome to use your own datasets and models!
- 📥 Download the KITTI-C and nuScenes validation sets and all source model checkpoints from Hugging Face
- 🔗 https://huggingface.co/datasets/hzcar/duo
- 📂 Place the downloaded checkpoints under:

```
DUO/[MonoFlex-KITTI | MonoFlex-nuscenes | MonoGround-KITTI | MonoGround-nuscenes]/checkpoint
```
⚠️ Note: The corruption process for KITTI-C involves some randomness, which may cause fluctuations in detection performance. If you use the version provided in this repository, please make sure to acknowledge it properly in your citation.
### Example 1: DUO on corruption shifts with MonoFlex

```bash
cd MonoFlex-KITTI
python tools/tta_monotta.py --config runs/monoflex.yaml --ckpt checkpoint/model_moderate_best_soft.pth --eval --output kitti-c/dir --domain [gaussian_noise_5/shot_noise_5/.../saturate_5] --method [tent/deyo/monotta/duo]
```

You can test all corruption types at severity level 5 by directly running

```bash
. test_tta.sh
```

If you want to test the source model without test-time adaptation, please run

```bash
. test_source.sh
```
### Example 2: DUO on real-world shifts with MonoFlex

```bash
cd MonoFlex-nuscenes
python tools/tta_monotta.py --config runs/monoflex.yaml --ckpt checkpoint/day_monoflex.pth --eval --output nuscenes/night/ --domain night/val/ --method [tent/deyo/monotta/duo]
```

You can test all four real-world adaptation tasks in nuScenes by directly running

```bash
. test_tta.sh
```

If you want to test the source model without test-time adaptation, please run

```bash
. test_source.sh
```
## 🔧 Extending DUO to More Detection Tasks

To extend DUO to broader detection tasks, we provide a modular, easily pluggable implementation of our unsupervised focal loss:
```python
import torch
import torch.nn.functional as F


class CFLoss(torch.nn.Module):
    """Unsupervised (convex-structured) focal loss used in DUO."""

    def __init__(self, alpha, gamma, temp, reduction):
        super(CFLoss, self).__init__()
        self.alpha = alpha          # loss scaling factor
        self.gamma = gamma          # focusing parameter
        self.temp = temp            # softmax temperature
        self.reduction = reduction  # 'mean', 'sum', or 'none'

    def forward(self, logits):
        # Temperature-scaled class probabilities and their (safe) logs.
        p = F.softmax(logits / self.temp, dim=-1)
        log_p = torch.log(p + 1e-12)

        # Per-sample matrix M_ij = δ_ij + γ(1 - log p_i) p_i p_j - γ δ_ij p_i log p_i.
        outer_pp = torch.einsum('bi,bj->bij', p, p)
        I = torch.eye(p.size(1), device=p.device).unsqueeze(0)
        term1 = self.gamma * (1 - log_p).unsqueeze(2) * outer_pp
        term2 = self.gamma * torch.diag_embed(p * log_p)
        M = I + term1 - term2

        # Solve M x = p to obtain label-agnostic pseudo-labels.
        pse_label = torch.linalg.solve(M, p.unsqueeze(2)).squeeze(2)

        # Focal-style weighting of the pseudo-labelled cross-entropy.
        weight = (1 - p) ** self.gamma
        loss = -self.alpha * torch.sum(weight * log_p * pse_label, dim=-1)

        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        return loss
```
⚠️ Note: In our experiments, we set $\alpha = 4$ and $\gamma = 2$, consistent with the focal loss configuration used in supervised training.
To avoid numerical singularities during matrix inversion, one can scale the identity term or use a broadcasted all-one matrix, both of which improve conditioning and ensure stable optimization.
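As a hedged illustration of the identity-scaling idea (the `eps` value and the `stabilized_solve` helper below are illustrative choices for demonstration, not settings or names from the paper or this repository):

```python
import torch

def stabilized_solve(M, rhs, eps=1e-2):
    # Inflate the identity component of each matrix in the batch before
    # solving M x = rhs. `eps` is an illustrative constant: larger values
    # improve conditioning at the cost of slightly biasing the solution.
    eye = torch.eye(M.size(-1), device=M.device, dtype=M.dtype)
    return torch.linalg.solve(M + eps * eye, rhs)

# A nearly singular batch of matrices that a plain solve handles poorly.
M = torch.ones(1, 3, 3) + torch.eye(3) * 1e-8
x = stabilized_solve(M, torch.ones(1, 3, 1))
```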
This core loss function is designed as a drop-in replacement for the original loss, making it straightforward to integrate into existing unsupervised or semi-supervised pipelines.
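As a rough usage sketch, the loss needs only raw classification logits from unlabeled target data. The functional restatement below mirrors the `CFLoss` forward pass (with `reduction='none'`) so the snippet is self-contained; the batch size, class count, and hyperparameter values are illustrative, not taken from the repository:

```python
import torch

def cf_loss(logits, alpha=4.0, gamma=2.0, temp=1.0):
    # Functional restatement of the CFLoss forward pass above.
    p = torch.softmax(logits / temp, dim=-1)
    log_p = torch.log(p + 1e-12)
    outer_pp = torch.einsum('bi,bj->bij', p, p)
    I = torch.eye(p.size(1), device=p.device).unsqueeze(0)
    M = (I + gamma * (1 - log_p).unsqueeze(2) * outer_pp
           - gamma * torch.diag_embed(p * log_p))
    pse_label = torch.linalg.solve(M, p.unsqueeze(2)).squeeze(2)
    return -alpha * torch.sum(((1 - p) ** gamma) * log_p * pse_label, dim=-1)

torch.manual_seed(0)
logits = torch.randn(8, 3, requires_grad=True)  # e.g. 8 detections, 3 classes
loss = cf_loss(logits).mean()
loss.backward()  # gradients reach the logits without any labels
```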
## Correspondence

Please contact Zixuan Hu at [hzxuan at pku.edu.cn] if you have any questions. 📬
## Acknowledgment

This repository is built on the MonoTTA and ReCAP repositories. Thanks for these excellent projects!
## Citation

If our DUO method is helpful in your research, please consider citing our paper:

```bibtex
@article{hu2025adaptive,
  title={Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts},
  author={Hu, Zixuan and Li, Dongxiao and Ma, Xinzhu and Tang, Shixiang and Li, Xiaotong and Yang, Wenhan and Duan, Ling-Yu},
  journal={arXiv preprint arXiv:2508.20488},
  year={2025}
}
```
