[ICLR 2025] Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling

This repository contains the implementation of Graph Assisted Offline-Online Deep Reinforcement Learning (GOODRL) for Dynamic Workflow Scheduling (DWS). The paper introduces three key innovations: **dynamic graph representations**, **decoupled actor-critic encoders for RL stability**, and **training methods for unpredictable changes**.

🤩 Citation

If you find GOODRL helpful for your research or applied projects, please cite:

@InProceedings{yang2025graph,
    title={Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling},
    author={Yang, Yifan and Chen, Gang and Ma, Hui and Zhang, Cong and Cao, Zhiguang and Zhang, Mengjie},
    booktitle={International Conference on Learning Representations},
    year={2025}
}

🖥️ Requirements

  • Python >= 3.11.5
  • PyTorch >= 2.4.1
  • PyTorch Geometric >= 2.5.3

Install dependencies:

pip install --upgrade pip
pip install rl-zoo3 deap torch_geometric gym joblib openpyxl
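
A quick sanity check after installation confirms the versions listed above (this snippet is illustrative, not part of the repository):

import sys
import torch
import torch_geometric

# Compare against the minimum versions listed above.
print("Python:", sys.version.split()[0])                  # expect >= 3.11.5
print("PyTorch:", torch.__version__)                      # expect >= 2.4.1
print("PyTorch Geometric:", torch_geometric.__version__)  # expect >= 2.5.3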

Note: The current implementation is CPU-based; GPU adaptations may require modifications. Multi-CPU parallel execution is supported by replacing the sequential validation loop with a joblib Parallel call, as shown below.

Non-parallel version:

# Evaluate each validation instance in turn (validation_H and configs come from the repository).
meanFlowTimes = []
for t in range(configs.valid_num):
    meanFlowTime = validation_H(t, configs)
    meanFlowTimes.append(meanFlowTime)

Parallel version:

from joblib import Parallel, delayed

meanFlowTimes = Parallel(n_jobs=-1)(delayed(validation_H)(t, configs) for t in range(configs.valid_num))
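
The same pattern can be tried outside the repository before changing its code; in this sketch, slow_eval is a hypothetical stand-in for validation_H and valid_num replaces configs.valid_num:

import time
from joblib import Parallel, delayed

def slow_eval(t):
    # Hypothetical stand-in for validation_H(t, configs): pretend each
    # validation instance takes half a second and returns a mean flow time.
    time.sleep(0.5)
    return 100.0 + t

valid_num = 8
start = time.time()
# n_jobs=-1 uses one worker per CPU core; results come back in submission order.
results = Parallel(n_jobs=-1)(delayed(slow_eval)(t) for t in range(valid_num))
print(results)
print(f"elapsed: {time.time() - start:.1f}s")  # well under the ~4 s a sequential loop needs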

🤔 How to Use the Code

Offline Phase – Imitation Learning

Run the following command for offline imitation learning:

python step1.py --vm_types 6 --each_vm_type_num 4 --arr_rate 5.4 --lr_a 0.0001 --log_interval 1 --max_updates 10
  • Execute multiple independent runs and select the best-performing actor for the next stage, offline PPO (a launcher sketch follows this list).
  • The trained actors from this stage can be found in ./validation_data/step1.
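
One way to automate those independent runs is a small launcher like the sketch below; it reuses the command above verbatim and assumes each invocation draws its own random seed (NUM_RUNS is illustrative):

import subprocess

# Command taken verbatim from the step above; each invocation is one independent run.
CMD = ("python step1.py --vm_types 6 --each_vm_type_num 4 --arr_rate 5.4 "
       "--lr_a 0.0001 --log_interval 1 --max_updates 10").split()

NUM_RUNS = 5  # illustrative; use as many independent runs as compute allows

for run in range(NUM_RUNS):
    print(f"--- imitation-learning run {run} ---")
    subprocess.run(CMD, check=True)  # trained actors are saved under ./validation_data/step1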

Offline Phase – PPO

Run the following command for offline PPO training:

python step2.py --vm_types 6 --each_vm_type_num 4 --arr_rate 5.4 --lr_a 0.0003 --lr_c 0.001 --warmup_critic 200
  • Execute multiple independent runs and select the best-performing actor for offline testing and as the initialization for the online PPO phase (a selection sketch follows this list).
  • The trained actor and critic models from this stage are saved in ./validation_data/step2.
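
Picking the best actor then reduces to an argmin over each run's validation score; in this sketch the scores dict is hypothetical, with mean flow time as the metric (lower is better):

# Hypothetical validation results: run label -> mean flow time (lower is better).
scores = {
    "run_0": 131.7,
    "run_1": 128.4,
    "run_2": 129.9,
}

best_run = min(scores, key=scores.get)  # argmin over mean flow time
print(f"best actor: {best_run} (mean flow time {scores[best_run]:.1f})")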

Online Phase – Online PPO

Run the following command for online PPO training:

python step3.py --vm_types 6 --each_vm_type_num 4 --arr_rate 9 --online_start_ac 5_5_5.4 --wf_num 10000 --max_updates 500 --warmup_steps 50000 --lr_a 0.00005 --lr_c 0.0001 --n_epochs 5
  • In each run, the final result represents the online performance, i.e. the mean flow time after processing wf_num workflows (illustrated below).
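
For reference, flow time is the span between a workflow's arrival and its completion, and the reported figure averages it over all processed workflows; a toy illustration with made-up times:

# Toy (arrival_time, completion_time) pairs for three finished workflows.
workflows = [(0.0, 12.5), (2.0, 18.0), (5.0, 16.5)]

# Flow time of one workflow = completion time - arrival time.
flow_times = [done - arrived for arrived, done in workflows]

mean_flow_time = sum(flow_times) / len(flow_times)
print(f"mean flow time: {mean_flow_time:.2f}")  # (12.5 + 16.0 + 11.5) / 3 = 13.33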

Additionally:

  • mainGP.py is the training script for GPHH.
  • mainESRL.py is the training script for ESRL.

✨ Future Updates

The code and documentation will be continuously updated, including multi-objective versions and applications in FJSS (flexible job shop scheduling) environments. For any inquiries, feel free to contact us via 💌.

🫡 Acknowledgements

The implementation builds on several prior works.
