Population-Guided Parallel Policy Search (P3S)
The algorithm is based on the paper "Population-Guided Parallel Policy Search for Reinforcement Learning", submitted to ICLR 2020. The P3S code is modified from the Soft Actor-Critic (SAC) code (https://github.com/haarnoja/sac).
Getting Started
To get the environment installed correctly, you will first need to clone rllab and add its path to your PYTHONPATH environment variable.
- Clone rllab
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
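As a quick sanity check (a sketch, not part of the original instructions; `/tmp/rllab_check` stands in for your actual rllab checkout path), you can confirm that an exported PYTHONPATH entry is picked up by the interpreter:

```shell
# Sketch: verify a PYTHONPATH entry shows up in Python's import path.
# /tmp/rllab_check is a stand-in for your real rllab checkout directory.
export PYTHONPATH=/tmp/rllab_check:${PYTHONPATH}
python3 -c "import sys; print('/tmp/rllab_check' in sys.path)"   # prints True
```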
- Download and copy MuJoCo files to the rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the .dylib files instead of the .so files.
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
- Copy your Mujoco license key (mjkey.txt) to rllab path:
cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
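Before moving on, it may help to confirm the three copied files are where rllab expects them. This is a sketch; `RLLAB_DIR` is an assumed variable that you should point at your own rllab checkout:

```shell
# Sketch: check that the MuJoCo binaries and license key are in place.
# RLLAB_DIR is a stand-in; set it to your rllab checkout path.
RLLAB_DIR=${RLLAB_DIR:-$HOME/rllab}
for f in libmujoco131.so libglfw.so.3 mjkey.txt; do
    if [ -e "$RLLAB_DIR/vendor/mujoco/$f" ]; then
        echo "ok: $f"
    else
        echo "missing: $f"
    fi
done
```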
- Go to the "p3s" directory
cd <p3s_folder>
- Create and activate conda environment
conda env create -f environment.yml
source activate p3s
The environment should now be ready to run. See the Examples section below for how to train and simulate the agents.
Finally, to deactivate and remove the conda environment:
source deactivate
conda remove --name p3s --all
Examples
Training and simulating an agent
python ./examples/mujoco_all_p3s_td3.py --env=ant
python ./examples/mujoco_all_p3s_td3.py --env=half-cheetah
python ./examples/mujoco_all_p3s_td3.py --env=hopper
python ./examples/mujoco_all_p3s_td3.py --env=walker
python ./examples/mujoco_all_p3s_td3.py --env=delayed_ant
python ./examples/mujoco_all_p3s_td3.py --env=delayed_half-cheetah
python ./examples/mujoco_all_p3s_td3.py --env=delayed_hopper
python ./examples/mujoco_all_p3s_td3.py --env=delayed_walker
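The commands above differ only in their `--env` flag, so a small loop (a sketch using the eight environment names listed above) can queue all runs sequentially; uncomment the python line to actually train:

```shell
# Sketch: iterate over the eight --env values from the commands above.
for env in ant half-cheetah hopper walker \
           delayed_ant delayed_half-cheetah delayed_hopper delayed_walker; do
    echo "would train on: $env"
    # python ./examples/mujoco_all_p3s_td3.py --env="$env"
done
```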
