GOODRL
[ICLR 2025] Graph Assisted Offline-Online Deep Reinforcement Learning (GOODRL) for Dynamic Workflow Scheduling (DWS)
This repository contains the implementation of Graph Assisted Offline-Online Deep Reinforcement Learning (GOODRL) for Dynamic Workflow Scheduling (DWS). The paper introduces three key innovations: <font color="green"><b>dynamic graph representations</b></font>, <font color="green"><b>decoupled Actor-Critic encoders for RL stability</b></font>, and <font color="green"><b>training methods for unpredictable changes</b></font>.
🤩 Citation
If you find GOODRL helpful for your research or applied projects, please cite:

```bibtex
@InProceedings{yang2025graph,
  title={Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling},
  author={Yang, Yifan and Chen, Gang and Ma, Hui and Zhang, Cong and Cao, Zhiguang and Zhang, Mengjie},
  booktitle={International Conference on Learning Representations},
  year={2025}
}
```
🖥️ Requirements
- Python >= 3.11.5
- PyTorch >= 2.4.1
- PyTorch Geometric >= 2.5.3
Install dependencies:
```bash
pip install --upgrade pip
pip install rl-zoo3 deap torch_geometric gym joblib openpyxl
```
Note: The current implementation is CPU-based; GPU adaptations may require modifications. Multi-CPU parallel execution is supported by replacing the sequential validation loop with a joblib `Parallel` call:

Non-parallel version:

```python
meanFlowTimes = []
for t in range(configs.valid_num):
    meanFlowTime = validation_H(t, configs)
    meanFlowTimes.append(meanFlowTime)
```

Parallel version:

```python
meanFlowTimes = Parallel(n_jobs=-1)(delayed(validation_H)(t, configs) for t in range(configs.valid_num))
```
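As a self-contained illustration of the same joblib pattern (the `validation_H` and `configs` below are hypothetical stand-ins for the repository's real validation routine and argument object, not the actual implementations):

```python
from joblib import Parallel, delayed

def validation_H(t, configs):
    # Stand-in for the repository's validation routine:
    # score one validation instance and return its mean flow time.
    return configs["base_flow_time"] + t

configs = {"valid_num": 4, "base_flow_time": 10.0}

# Sequential version: evaluate each validation instance in turn.
seq = [validation_H(t, configs) for t in range(configs["valid_num"])]

# Parallel version: one job per validation instance, across all CPUs.
par = Parallel(n_jobs=-1)(
    delayed(validation_H)(t, configs) for t in range(configs["valid_num"])
)

# joblib preserves the order of the input iterable, so both agree.
assert seq == par
```

Because joblib preserves input order, the parallel call is a drop-in replacement for the loop.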
🤔 How to Use the Code
Offline Phase – Imitation Learning
Run the following command for offline imitation learning:
```bash
python step1.py --vm_types 6 --each_vm_type_num 4 --arr_rate 5.4 --lr_a 0.0001 --log_interval 1 --max_updates 10
```
- Execute multiple independent runs and select the best-performing actor for the next stage: offline PPO.
- The trained actors from this stage are saved in `./validation_data/step1`.
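Selecting the best-performing actor across runs amounts to picking the checkpoint with the lowest validation mean flow time (lower is better). A minimal sketch, with hypothetical run names and scores standing in for the actual saved checkpoints:

```python
# Hypothetical mapping: run identifier -> validation mean flow time.
# In practice these scores would come from evaluating each saved actor.
run_scores = {
    "run_0": 12.7,
    "run_1": 11.9,
    "run_2": 13.4,
}

# Mean flow time is minimized, so the best run has the smallest score.
best_run = min(run_scores, key=run_scores.get)
best_score = run_scores[best_run]
```

The actor from `best_run` would then be carried forward to the offline PPO stage.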
Offline Phase – PPO
Run the following command for offline PPO training:
```bash
python step2.py --vm_types 6 --each_vm_type_num 4 --arr_rate 5.4 --lr_a 0.0003 --lr_c 0.001 --warmup_critic 200
```
- Execute multiple independent runs and select the best-performing actor for offline testing and as the initialization for the online PPO phase.
- The trained actor and critic models from this stage are saved in `./validation_data/step2`.
Online Phase – Online PPO
Run the following command for online PPO training:
```bash
python step3.py --vm_types 6 --each_vm_type_num 4 --arr_rate 9 --online_start_ac 5_5_5.4 --wf_num 10000 --max_updates 500 --warmup_steps 50000 --lr_a 0.00005 --lr_c 0.0001 --n_epochs 5
```
- In each run, the final result represents the online performance after processing `wf_num` workflows.
Additionally:
- `mainGP.py` is the training script for GPHH.
- `mainESRL.py` is the training script for ESRL.
✨ Future Updates
The code and documentation will be continuously updated, including multi-objective versions and applications in flexible job shop scheduling (FJSS) environments. For any inquiries, feel free to contact us via 💌.
🫡 Acknowledgements
The implementation references the following works: