PPO-DAP: Combining Diffusion Models with PPO to Improve Sample Efficiency and Exploration in Reinforcement Learning
Overview
PPO-DAP is a reinforcement learning framework that integrates diffusion models with Proximal Policy Optimization (PPO) to enhance sample efficiency and exploration capabilities. This project, implemented using the robomimic framework, utilizes the D4RL dataset for experiments, demonstrating improved performance in environments with limited data.
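The core idea, using a diffusion model to synthesize extra trajectories that augment PPO's training data, can be sketched as follows. This is a minimal, hypothetical illustration in plain NumPy, not the project's actual API: the `diffusion_sample` stand-in and the buffer layout are assumptions for exposition (a real diffusion model would run a learned reverse denoising process).

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_sample(real_states: np.ndarray, n_synthetic: int,
                     noise_scale: float = 0.1) -> np.ndarray:
    """Toy stand-in for a trained diffusion model: draws synthetic
    states near the real-data manifold by perturbing randomly chosen
    real states with Gaussian noise."""
    idx = rng.integers(0, len(real_states), size=n_synthetic)
    noise = rng.standard_normal((n_synthetic, real_states.shape[1]))
    return real_states[idx] + noise_scale * noise

def augment_buffer(states: np.ndarray, synth_ratio: float = 0.5):
    """Append diffusion-generated states to a rollout buffer and return
    a mask so a PPO update could down-weight the synthetic samples."""
    n_synth = int(round(len(states) * synth_ratio))
    synth = diffusion_sample(states, n_synth)
    mixed = np.concatenate([states, synth], axis=0)
    is_synthetic = np.concatenate(
        [np.zeros(len(states), dtype=bool), np.ones(n_synth, dtype=bool)]
    )
    return mixed, is_synthetic

# Usage: 64 real 11-dimensional states, augmented with 50% synthetic ones.
real = rng.standard_normal((64, 11))
mixed, mask = augment_buffer(real, synth_ratio=0.5)
print(mixed.shape, int(mask.sum()))  # (96, 11) 32
```

The `is_synthetic` mask is one plausible design choice: it lets the policy-gradient update treat generated transitions differently from real ones (e.g., with a smaller loss weight) rather than trusting them equally.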

Training Artifacts
All training datasets, pretrained models, training logs, and videos can be accessed through the following Google Drive link:
Google Drive - PPO-DAP Training Artifacts
Project Structure
├── datasets/               # Directory for storing datasets
├── models/                 # Pretrained models
├── scripts/                # Scripts for training, evaluation, and visualization
│   ├── train.py            # Script for training the model
│   ├── evaluate.py         # Script for evaluating the model
│   └── visualize_results.py  # Script for visualizing results
├── notebooks/              # Jupyter Notebooks for analysis and visualization
├── configs/                # Configuration files
│   └── PPO.json            # Configuration for the PPO algorithm
├── README.md               # Project documentation
└── requirements.txt        # Python dependencies
Getting Started
Prerequisites
To get started with PPO-DAP, ensure that you have the following software installed:
- Python 3.8
- Conda (optional, but recommended for managing environments)
Installation
1. Clone the repository:
   git clone https://github.com/yourusername/PPO-DAP.git
   cd PPO-DAP
2. Create and activate a Python virtual environment:
   conda create -n ppo-dap_env python=3.8
   conda activate ppo-dap_env
3. Install the required dependencies:
   pip install -r requirements.txt
Dataset
The project utilizes the D4RL dataset. You can download the dataset using the provided script:
bash scripts/download_dataset.sh
Alternatively, you can refer to the D4RL documentation for more details.
Usage
Training
To train the model, use the following command:
python scripts/train.py --config configs/PPO.json
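Since the project is built on robomimic, PPO.json follows robomimic's JSON config convention. A hypothetical minimal fragment is shown below for orientation only; the key names and values here are illustrative assumptions, and configs/PPO.json in the repository is the authoritative version:

```json
{
  "algo_name": "ppo",
  "train": {
    "data": "datasets/your_dataset.hdf5",
    "batch_size": 256,
    "num_epochs": 200
  },
  "algo": {
    "gamma": 0.99,
    "clip_ratio": 0.2,
    "lr": 3e-4
  }
}
```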
Evaluation
After training, evaluate the model's performance using:
python scripts/evaluate.py --model-path models/my_trained_model.pth
Visualization
Visualize the training results with:
python scripts/visualize_results.py --log-dir logs/
Results
The experiments conducted in this project demonstrate that integrating diffusion models to generate synthetic trajectories significantly enhances the sample efficiency and exploration capabilities of the PPO algorithm. Cumulative-reward curves for the evaluated tasks are included in the training artifacts linked above.
Contribution
We welcome contributions to PPO-DAP. If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b new-feature).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin new-feature).
- Create a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
