MINTO
Official Implementation of MINTO 🌿, introduced in "Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning" [ICLR 2026].
TL;DR
🌿 MINTO is a simple, yet effective target bootstrapping method for temporal-difference RL that enables faster, more stable learning and consistently improves performance across algorithms and benchmarks.
🌿 MINTO computes the bootstrap target as the MINimum of the Target and Online network estimates, introducing fresher, more recent value estimates in a stable manner 🛡️ by mitigating the overestimation bias that can arise from bootstrapping with the online network.
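The target computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation; the function and argument names are ours, and `q_online_next` / `q_target_next` stand for the next-state value estimates (e.g., `max_a Q(s', a)` in a DQN-style method):

```python
import numpy as np

def minto_target(reward, discount, q_online_next, q_target_next, done):
    """MINTO bootstrap target: take the minimum of the online and
    target network value estimates for the next state, then form the
    usual one-step TD target (no bootstrap at terminal states)."""
    bootstrap = np.minimum(q_online_next, q_target_next)
    return reward + discount * (1.0 - done) * bootstrap
```

Because the minimum is at most the target-network estimate, MINTO can only lower the bootstrap value relative to standard target-network bootstrapping, which is how it injects fresher online estimates without amplifying overestimation.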

Code Structure
MINTO integrates easily into value-based and actor-critic methods with minimal overhead. Hence, we evaluate it across diverse benchmarks, spanning online and offline RL, as well as discrete and continuous action spaces. To conduct our experiments, we utilized variants of three different repositories:
- Online RL (discrete): Based on slimDQN.
- Offline RL: Based on slimCQL.
- Online RL (continuous): Based on SimbaV2.
To reproduce the main results in the paper, see the corresponding subfolders and their installation guides.
Subfolders:
- online_rl_discrete/ for online RL (Atari, discrete).
- offline_rl/ for offline RL (Atari, discrete).
- online_rl_continuous/ for continuous control (e.g., MuJoCo).
Quick Start
Example (online RL and Discrete Control):
```bash
cd online_rl_discrete
conda create -n minto python=3.10
conda activate minto
pip install --upgrade pip setuptools wheel
pip install -e ".[dev,gpu]"
bash run_dqn.sh min Breakout
```
Citation
If you use this codebase or find our work helpful, please consider citing our paper as follows:
```bibtex
@inproceedings{hendawy2025use,
  title={Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning},
  author={Hendawy, Ahmed and Metternich, Henrik and Vincent, Th{\'e}o and Kallel, Mahdi and Peters, Jan and D'Eramo, Carlo},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```
