MINTO
Official Implementation of MINTO 🌿, introduced in "Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning" [ICLR 2026].
TL;DR
🌿 MINTO is a simple, yet effective target bootstrapping method for temporal-difference RL that enables faster, more stable learning and consistently improves performance across algorithms and benchmarks.
🌿 MINTO computes the bootstrap target as the MINimum of the Target and Online network estimates, introducing fresher, more recent value estimates in a stable manner 🛡️ by mitigating the overestimation bias that can arise from bootstrapping with the online network.
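The target computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation; the function and argument names are ours, and `q_online_next` / `q_target_next` stand for the next-state value estimates (e.g., `max_a Q(s', a)` in a DQN-style method):

```python
import numpy as np

def minto_target(reward, discount, q_online_next, q_target_next, done):
    """MINTO bootstrap target: take the minimum of the online and
    target network value estimates for the next state, then form the
    usual one-step TD target (no bootstrap at terminal states)."""
    bootstrap = np.minimum(q_online_next, q_target_next)
    return reward + discount * (1.0 - done) * bootstrap
```

Because the minimum is at most the target-network estimate, MINTO can only lower the bootstrap value relative to standard target-network bootstrapping, which is how it injects fresher online estimates without amplifying overestimation.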

Code Structure
MINTO integrates easily into value-based and actor-critic methods with minimal overhead. Hence, we evaluate it across diverse benchmarks, spanning online and offline RL, as well as discrete and continuous action spaces. To conduct our experiments, we utilized variants of three different repositories:
- Online RL (discrete): Based on slimDQN.
- Offline RL: Based on slimCQL.
- Online RL (continuous): Based on SimbaV2.
To reproduce the main results in the paper, see the corresponding subfolders and their installation guides.
Subfolders:
- online_rl_discrete/ for online RL (Atari, discrete).
- offline_rl/ for offline RL (Atari, discrete).
- online_rl_continuous/ for continuous control (e.g., MuJoCo).
Quick Start
Example (online RL and Discrete Control):
```bash
cd online_rl_discrete
conda create -n minto python=3.10
conda activate minto
pip install --upgrade pip setuptools wheel
pip install -e ".[dev,gpu]"
bash run_dqn.sh min Breakout
```
Citation
If you use this codebase or find our work helpful, please consider citing our paper as follows:
```bibtex
@inproceedings{hendawy2025use,
  title={Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning},
  author={Hendawy, Ahmed and Metternich, Henrik and Vincent, Th{\'e}o and Kallel, Mahdi and Peters, Jan and D'Eramo, Carlo},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```
