Ofcourse
Order Fulfillment by Multi-Agent Reinforcement Learning
OFCOURSE
OFCOURSE is a simulated environment that enables multi-agent reinforcement learning for order fulfillment.

Installation
This repository requires Python >= 3.7. Miniconda/Anaconda is our recommended Python distribution.
To get started:
- Clone this repository and move to the OFCOURSE directory:
>>> git clone https://github.com/GitYiheng/ofcourse.git && cd ofcourse
- Install the dependencies:
>>> pip install -r requirements.txt
Reproducing Paper Results
Task 1 — Fulfillment of Physical and Virtual Orders in One System
>>> sh ./run_exp/exp1/run_exp1_ppo.sh
>>> sh ./run_exp/exp1/run_exp1_happo.sh
>>> sh ./run_exp/exp1/run_exp1_ippo.sh
>>> sh ./run_exp/exp1/run_exp1_clo.sh
Task 2 — Cross-Border Order Fulfillment
>>> sh ./run_exp/exp2/run_exp2_ppo.sh
>>> sh ./run_exp/exp2/run_exp2_happo.sh
>>> sh ./run_exp/exp2/run_exp2_ippo.sh
>>> sh ./run_exp/exp2/run_exp2_clo.sh
For these two tasks, the fulfillment agents are defined in env/define_exp1_env.py and env/define_exp2_env.py.
Training
# file name: main.py
from algo.runner import Runner # import runner
from algo.arguments import get_args # import argument parser
args = get_args() # parse arguments
runner = Runner(args) # create a runner instance with specified arguments
runner.run() # start learning or evaluation
Train happo on exp1:
>>> python main.py --env=exp1 --algo=happo --mode=learn --log_dir=runs/exp1_happo --seed=10
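The flags in the command above can be varied to train other environment and algorithm combinations. As a rough illustration of how such flags are typically parsed, here is a minimal sketch using Python's standard `argparse`. The function `build_parser` and all default values are hypothetical; the repository's own `get_args` in `algo/arguments.py` may define more options and different defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Toy launcher arguments")
    # Defaults below are assumptions chosen to match the example command.
    parser.add_argument("--env", type=str, default="exp1", help="environment name")
    parser.add_argument("--algo", type=str, default="happo", help="algorithm name")
    parser.add_argument("--mode", type=str, default="learn", help="run mode")
    parser.add_argument("--log_dir", type=str, default="runs/exp1_happo", help="log directory")
    parser.add_argument("--seed", type=int, default=10, help="random seed")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--env=exp2", "--algo=ippo", "--mode=learn", "--log_dir=runs/exp2_ippo", "--seed=3"]
)
print(args.env, args.algo, args.seed)
```

Writing each run to its own subdirectory of `runs`, as in the example, keeps the curves of different runs separate in TensorBoard.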
Monitor the training progress with TensorBoard:
>>> tensorboard --logdir=runs
Import Existing Environment
OFCOURSE follows the OpenAI Gym interface, the standard API for communication between reinforcement learning algorithms and environments.
from env.exp1_env import Exp1Env # import env
env = Exp1Env() # create an env instance
obs = env.reset() # start a new episode
num_steps = 10 # number of steps
for _t in range(num_steps):
    sampled_actions = env.action_space.sample() # sample actions (not from algo)
    obs, rewards, dones, _ = env.step(sampled_actions) # interact with env
    if all(dones):
        obs = env.reset() # start a new episode when current one ends
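To try the same interaction loop without the repository installed, the sketch below defines a toy stand-in environment. `ToyActionSpace` and `ToyMultiAgentEnv` are hypothetical and are not part of OFCOURSE; they only assume what the loop above implies, namely that `step` returns per-agent observations, rewards, and done flags plus an info dictionary.

```python
import random
from typing import List, Tuple


class ToyActionSpace:
    """Hypothetical action space: one discrete action per agent."""

    def __init__(self, num_agents: int, num_actions: int):
        self.num_agents = num_agents
        self.num_actions = num_actions

    def sample(self) -> List[int]:
        # Draw one random action index for each agent.
        return [random.randrange(self.num_actions) for _ in range(self.num_agents)]


class ToyMultiAgentEnv:
    """Hypothetical two-agent environment with a Gym-style reset/step interface."""

    def __init__(self, num_agents: int = 2, episode_len: int = 4):
        self.num_agents = num_agents
        self.episode_len = episode_len
        self.action_space = ToyActionSpace(num_agents, num_actions=3)
        self.t = 0

    def reset(self) -> List[int]:
        self.t = 0
        return [self.t] * self.num_agents  # one observation per agent

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool], dict]:
        self.t += 1
        obs = [self.t] * self.num_agents
        rewards = [-float(a) for a in actions]  # toy cost: larger action index costs more
        dones = [self.t >= self.episode_len] * self.num_agents
        return obs, rewards, dones, {}


env = ToyMultiAgentEnv()
obs = env.reset()
episodes_finished = 0
for _t in range(10):
    obs, rewards, dones, _ = env.step(env.action_space.sample())
    if all(dones):
        episodes_finished += 1
        obs = env.reset()  # start a new episode when the current one ends
print(episodes_finished)  # prints 2: ten steps cover two complete four-step episodes
```

Checking `all(dones)` rather than a single flag resets the environment only once every agent has finished its episode.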
Customize Environment
Customized fulfillment systems can be constructed in OFCOURSE. Here, we use Task 1 (Fulfillment of Physical and Virtual Orders in One System) from the paper as an example.
<p align="center"><img src="figs/physical_virtual.png" height="120"></p>
Import Modules
import numpy as np
from env.resource import Resource
from env.order import Order
from env.container import Buffer, Inventory
from env.operation import OpStore, OpRoute, OpConsoRoute, OpDispatch
from env.fulfillment_unit import FulfillmentUnit
from env.agent import Agent
from env.order_source import OrderSource
System Variables
Before defining the fulfillment system, we first define the buffer length and inventory capacity.
# ---------- PARAMS ---------- #
buffer_len = 5
inventory_limit = 32
Agents
There are two agents in the fulfillment system. Agent 0 consists of six fulfillment units and agent 1 consists of four. The two agents share the first three stages.
# ---------- AGENT 0 ---------- #
agent0 = Agent()
agent0.add_fulfillment_unit(agent0_layer5)
agent0.add_fulfillment_unit(agent0_layer4)
agent0.add_fulfillment_unit(agent0_layer3)
agent0.add_fulfillment_unit(agent0_layer2)
agent0.add_fulfillment_unit(agent0_layer1)
agent0.add_fulfillment_unit(agent0_layer0)
# ---------- AGENT 1 ---------- #
agent1 = Agent()
agent1.add_fulfillment_unit(agent1_layer3)
agent1.add_fulfillment_unit(agent1_layer2)
agent1.add_fulfillment_unit(agent1_layer1)
agent1.add_fulfillment_unit(agent1_layer0)
Fulfillment Stage
Take the third stage of agent 0 (the consolidation warehouse) as an example. It has two Containers and three Operations. Each Container has an associated Resource, which is defined first and then attached to the Container. Here, one Container is an Inventory and the other is a Buffer. As for the Operations, one stores incoming Orders in the Inventory, and the other two consolidate Orders and dispatch them toward their destination Buffers.
# 3RD STAGE IN AGENT 0
agent0_layer3 = FulfillmentUnit()
agent0_layer3_inventory_resource = Resource(constraint=32, normal_price=0.6, overage_price=2.0, occupied=0)
agent0_layer3_buffer0_resource = Resource(constraint=-1, normal_price=0.0, overage_price=0.0, occupied=0)
agent0_layer3_inventory = Inventory(resource=agent0_layer3_inventory_resource, inventory_limit=inventory_limit)
agent0_layer3_buffer0 = Buffer(resource=agent0_layer3_buffer0_resource, buffer_len=buffer_len)
agent0_layer3.add_container(container=agent0_layer3_inventory)
agent0_layer3.add_container(container=agent0_layer3_buffer0)
agent0_layer3_op0 = OpStore(buffers_orig=[agent0_layer3_buffer0], inventory_dest=agent0_layer3_inventory, op_price=0.1, op_time=1)
agent0_layer3_op1 = OpConsoRoute(buffers_orig=[agent0_layer3_buffer0], inventory_orig=agent0_layer3_inventory, buffer_dest=agent0_layer4_buffer0, op_price=4.0, op_time=3)
agent0_layer3_op2 = OpConsoRoute(buffers_orig=[agent0_layer3_buffer0], inventory_orig=agent0_layer3_inventory, buffer_dest=agent0_layer4_buffer1, op_price=8.0, op_time=2)
agent0_layer3.add_operation(operation=agent0_layer3_op0)
agent0_layer3.add_operation(operation=agent0_layer3_op1)
agent0_layer3.add_operation(operation=agent0_layer3_op2)
Order Source Management
The order source is a mechanism that takes in the simulation step as its input and generates a set of order instances as its output. Currently, orders are placed according to a prescribed repeating pattern. External order source management will be added soon.
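A repeating-pattern order source of this kind can be sketched as follows. This is a minimal illustration, not the repository's `OrderSource`: the names `ToyOrder`, `RepeatingOrderSource`, `generate`, and the `quantity` field are all hypothetical, and the pattern is assumed to be a list giving the number of orders placed at each step of the cycle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyOrder:
    """Hypothetical stand-in for an order; not the repository's Order class."""
    step_placed: int  # simulation step at which the order was placed
    quantity: int     # hypothetical order size


class RepeatingOrderSource:
    """Hypothetical order source that cycles through a fixed pattern."""

    def __init__(self, pattern: List[int]):
        if not pattern:
            raise ValueError("pattern must contain at least one entry")
        self.pattern = pattern  # number of orders to place at each step of the cycle

    def generate(self, step: int) -> List[ToyOrder]:
        # The pattern repeats every len(pattern) steps.
        num_orders = self.pattern[step % len(self.pattern)]
        return [ToyOrder(step_placed=step, quantity=1) for _ in range(num_orders)]


source = RepeatingOrderSource(pattern=[2, 0, 1])
counts = [len(source.generate(t)) for t in range(6)]
print(counts)  # prints [2, 0, 1, 2, 0, 1]
```

Because the output depends only on the step index, the same order stream is reproduced in every episode, which keeps runs comparable across algorithms and seeds.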
Data Collection and Generation
The fulfillment systems presented in the paper are inspired by practical problems: experiment 1 (fulfillment of physical and virtual orders in one system) originates from Cainiao's domestic fulfillment business, and experiment 2 (cross-border order fulfillment) stems from the fulfillment business of AliExpress. Because of the company's data disclosure regulations, synthetic data is used for demonstration. It can be found in exp1 and exp2.
Action Space and Observation Space
See docs/act_obs.md.
Citation
@inproceedings{zhu2023ofcourse,
title={OFCOURSE: A Multi-Agent Reinforcement Learning Environment for Order Fulfillment},
author={Yiheng Zhu and Yang Zhan and Xuankun Huang and Yuwei Chen and Yujie Chen and Jiangwen Wei and Wei Feng and Yinzhi Zhou and Haoyuan Hu and Jieping Ye},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023},
url={https://openreview.net/forum?id=0RSQEh9lRG}
}
