Fastpbrl
Vectorization techniques for fast population-based training.
Fast Population-Based Reinforcement Learning
This repository contains the code for the InstaDeep paper "Fast Population-Based Reinforcement Learning on a Single Machine" (Flajolet et al., 2022) :computer::zap:.
First-time setup
Install Docker
This code requires Docker to run. To install Docker, please follow the online instructions here. To enable the code to run on a GPU, please install nvidia-docker as well as the latest NVIDIA driver available for your GPU.
Build and run a docker image
Once Docker and nvidia-docker are installed, you can build the Docker image with the following command:
make build
and, once the image is built, start the container with:
make dev_container
Inside the container, you can run the nvidia-smi command to verify that your GPU is detected.
Run preconfigured scripts
Replicate the experiments from the paper
We provide scripts and commands to replicate the experiments discussed in the paper. All these commands are defined in the Makefile at the root of the repository.
To replicate the experiments corresponding to Figure 2 (where we measure the runtime of a population-wide update step with various implementations), run:
make run_timing_sactd3
make run_timing_dqn
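The speedup measured in these timing experiments comes from vectorizing the per-agent update over the whole population with `jax.vmap` and compiling the result with `jax.jit`, so all agents update in a single device call instead of a Python loop. A minimal, self-contained sketch of that pattern (the loss, parameter shapes, and learning rate below are illustrative stand-ins, not the repo's actual agents):

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Toy quadratic loss standing in for an RL actor/critic loss.
    x, y = batch
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def single_update(params, batch, lr=1e-3):
    # One SGD step for a single agent's parameters.
    grads = jax.grad(loss_fn)(params, batch)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Vectorize the per-agent update over a leading population axis, then
# JIT-compile so the whole population updates in one compiled call.
population_update = jax.jit(jax.vmap(single_update))

pop_size, obs_dim, batch_size = 8, 4, 32
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    "w": jax.random.normal(k1, (pop_size, obs_dim)),
    "b": jnp.zeros((pop_size,)),
}
batch = (
    jax.random.normal(k2, (pop_size, batch_size, obs_dim)),  # per-agent inputs
    jax.random.normal(k3, (pop_size, batch_size)),           # per-agent targets
)
new_params = population_update(params, batch)
print(new_params["w"].shape)  # (8, 4): one updated parameter set per agent
```

Timing this call against a Python loop over `pop_size` separate `single_update` calls is, in spirit, what the Figure 2 experiments measure at scale.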
To replicate the experiments discussed in Section 5 (which correspond to full training runs), run the following:
make run_td3_cemrl
make run_td3_dvd
make run_td3_pbt
make run_sac_pbt
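The PBT runs above periodically have poorly performing population members copy the weights of top performers and perturb their hyperparameters. A minimal, repo-agnostic sketch of that exploit/explore step (the member fields and perturbation factors are hypothetical, not the repo's actual API):

```python
import random

def pbt_exploit_explore(population, frac=0.25, perturb=0.2):
    """One PBT step on a list of member dicts with 'score', 'weights', 'lr'.
    Bottom performers copy a top performer's weights (exploit) and
    perturb its learning rate (explore). Illustrative only."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    n_cut = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n_cut], ranked[-n_cut:]
    for member in bottom:
        parent = random.choice(top)
        member["weights"] = dict(parent["weights"])  # exploit: copy weights
        factor = 1 + perturb if random.random() < 0.5 else 1 - perturb
        member["lr"] = parent["lr"] * factor         # explore: perturb lr
    return population

pop = [{"score": s, "weights": {"w": s}, "lr": 1e-3} for s in (0.1, 0.5, 0.9, 0.3)]
pop = pbt_exploit_explore(pop)
```

With a population of 4 and `frac=0.25`, the single worst member inherits the best member's weights and a learning rate perturbed by +/-20%.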
Note that DvD training runs are unstable and sometimes crash early due to NaNs.
We use TensorBoard to log metrics during training. The tensorboard command to run to visualize them is printed when the experiment starts.
Launch a test script
Run the following command to start a short test that validates that the training scripts work as expected.
make test_training_scripts
Contributors
<a href="https://github.com/thomashirtz" title="Thomas Hirtz"><img src="https://github.com/thomashirtz.png" height="auto" width="50" style="border-radius:50%"></a> <a href="https://github.com/flajolet" title="Arthur Flajolet"><img src="https://github.com/flajolet.png" height="auto" width="50" style="border-radius:50%"></a> <a href="https://github.com/cibeah" title="Claire Bizon Monroc"><img src="https://github.com/cibeah.png" height="auto" width="50" style="border-radius:50%"></a> <a href="https://github.com/ranzenTom" title="Thomas Pierrot"><img src="https://github.com/ranzenTom.png" height="auto" width="50" style="border-radius:50%"></a>
Citing this work
If you use the code or data in this package, please cite:
@inproceedings{flajolet2022fast,
title={Fast Population-Based Reinforcement Learning on a Single Machine},
author={Flajolet, Arthur and Monroc, Claire Bizon and Beguir, Karim and Pierrot, Thomas},
booktitle={International Conference on Machine Learning},
pages={6533--6547},
year={2022},
organization={PMLR}
}