CrossQ

Official code release for "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity"

<img src="http://adityab.github.io/CrossQ/static/images/crossq-fancy.png" align="center" width="300px"/>

[🌏 Webpage] [📕 Paper] [💬 ICLR 2024 OpenReview (top 5% spotlight)]

Official code release for the ICLR 2024 paper 👇

CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

Bhatt A.*, Palenicek D.*, Belousov B., Argus M., Amiranashvili A., Brox T., Peters J.

Setup

Execute the following commands to set up a conda environment for running experiments:

conda create -n crossq python=3.11.5
conda activate crossq
conda install -c nvidia cuda-nvcc=12.3.52

pip install -e .
pip install "jax[cuda12_pip]==0.4.19" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
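
After installation, it can be useful to confirm that JAX actually sees the GPU. The snippet below is a minimal sanity check, not part of the repository: the helper name `describe_jax_backend` is hypothetical, and it only relies on the public `jax.devices()` and `jax.default_backend()` calls. If it reports a `cpu` backend, the CUDA wheel was probably not picked up.

```python
def describe_jax_backend():
    """Return a one-line summary of the JAX install (hypothetical helper)."""
    try:
        import jax
    except ImportError:
        return "jax is not installed in this environment"
    try:
        # Lists the devices of the default backend (GPU if CUDA was found).
        devices = jax.devices()
    except RuntimeError as err:
        return f"jax {jax.__version__} failed to initialise a backend: {err}"
    return (
        f"jax {jax.__version__}, "
        f"backend={jax.default_backend()}, devices={devices}"
    )


print(describe_jax_backend())
```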

Running Experiments

The main entry point for running experiments is train.py. You can configure experiments with the appropriate environment and agent flags. For more info run python train.py --help.

To train with WandB logging, run the following command. It trains a CrossQ agent on the Humanoid-v4 environment with seed 9 and logs the results to your WandB entity and project:

python train.py -algo crossq -env Humanoid-v4 -seed 9 -wandb_mode 'online' -wandb_entity my_team -wandb_project crossq

To train without WandB logging, run the following command, and in a different terminal run tensorboard --logdir logs to visualize training progress:

python train.py -algo crossq -env Humanoid-v4 -seed 9 -wandb_mode 'disabled'
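
To repeat the command above over several seeds on a single machine, a small launcher can help. The following is a sketch under stated assumptions, not a script shipped with the repository: `build_command` and `run_seeds` are hypothetical helpers, and they use only the flags shown in the commands above (`-algo`, `-env`, `-seed`, `-wandb_mode`). By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def build_command(env, seed, algo="crossq", wandb_mode="disabled"):
    """Build the train.py invocation as an argument list (hypothetical helper)."""
    return [
        "python", "train.py",
        "-algo", algo,
        "-env", env,
        "-seed", str(seed),
        "-wandb_mode", wandb_mode,
    ]


def run_seeds(env, seeds, dry_run=True):
    """Print one training command per seed; run them sequentially unless dry_run."""
    commands = []
    for seed in seeds:
        cmd = build_command(env, seed)
        print(shlex.join(cmd))
        if not dry_run:
            # Assumes the current directory is the repository root.
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


# Dry run: prints the commands for seeds 0..2 without starting any training.
run_seeds("Humanoid-v4", range(3))
```

Calling `run_seeds("Humanoid-v4", range(3), dry_run=False)` from the repository root would run the three jobs one after another.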

To train on a cluster, we provide example Slurm scripts in /slurm that run the experiments, baselines, and ablations from the paper. These configurations are very cluster-specific and will probably need to be adjusted for your cluster. However, they should serve as a starting point.

Citing this Project and the Paper

To cite our paper and/or this repository in publications:

@inproceedings{
  bhatt2024crossq,
  title={CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity},
  author={Aditya Bhatt and Daniel Palenicek and Boris Belousov and Max Argus and Artemij Amiranashvili and Thomas Brox and Jan Peters},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=PczQtTsTIX}
}

Acknowledgements

The implementation is built upon code from Stable Baselines JAX.
