OFCOURSE

OFCOURSE is a simulated environment that enables multi-agent reinforcement learning for order fulfillment.

[Figure: OFCOURSE teaser]

Installation

This repository requires Python >= 3.7. Miniconda/Anaconda is our recommended Python distribution.

To get started:

Clone this repository and move to the OFCOURSE directory:

>>> git clone https://github.com/GitYiheng/ofcourse.git && cd ofcourse

Install the dependencies:

>>> pip install -r requirements.txt

Reproducing Paper Results

Task 1 — Fulfillment of Physical and Virtual Orders in One System

>>> sh ./run_exp/exp1/run_exp1_ppo.sh
>>> sh ./run_exp/exp1/run_exp1_happo.sh
>>> sh ./run_exp/exp1/run_exp1_ippo.sh
>>> sh ./run_exp/exp1/run_exp1_clo.sh

Task 2 — Cross-Border Order Fulfillment

>>> sh ./run_exp/exp2/run_exp2_ppo.sh
>>> sh ./run_exp/exp2/run_exp2_happo.sh
>>> sh ./run_exp/exp2/run_exp2_ippo.sh
>>> sh ./run_exp/exp2/run_exp2_clo.sh

For these two tasks, the fulfillment agents are defined in env/define_exp1_env.py and env/define_exp2_env.py.

Training

# file name: main.py
from algo.runner import Runner                          # import runner
from algo.arguments import get_args                     # import argument parser
args = get_args()                                       # parse arguments
runner = Runner(args)                                   # create a runner instance with specified arguments
runner.run()                                            # start learning or evaluation

Train happo on exp1:

>>> python main.py --env=exp1 --algo=happo --mode=learn --log_dir=runs/exp1_happo --seed=10
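
The flags in the command above could be parsed roughly as follows. This is only a sketch built from the flags shown in this README: the real parser lives in algo/arguments.py and may define more options, and the default values and help strings here are assumptions.

```python
import argparse


def get_args(argv=None):
    """Hypothetical stand-in for algo.arguments.get_args (README flags only)."""
    parser = argparse.ArgumentParser(description="OFCOURSE runner arguments (sketch)")
    parser.add_argument("--env", default="exp1", help="environment name, e.g. exp1 or exp2")
    parser.add_argument("--algo", default="happo", help="algorithm name, e.g. ppo, happo, ippo, clo")
    parser.add_argument("--mode", default="learn", help="run mode; 'learn' is the value used above")
    parser.add_argument("--log_dir", default="runs/exp1_happo", help="directory for training logs")
    parser.add_argument("--seed", type=int, default=10, help="random seed")
    return parser.parse_args(argv)


# Parse the same flags as the training command above.
args = get_args(["--env=exp1", "--algo=happo", "--mode=learn",
                 "--log_dir=runs/exp1_happo", "--seed=10"])
print(args.env, args.algo, args.mode, args.log_dir, args.seed)
```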

Monitor the training progress with TensorBoard:

>>> tensorboard --logdir=runs

Import Existing Environment

OFCOURSE follows the OpenAI Gym interface, a standard API for communication between reinforcement learning algorithms and environments.

from env.exp1_env import Exp1Env                       # import env
env = Exp1Env()                                        # create an env instance
obs = env.reset()                                      # start a new episode
num_steps = 10                                         # number of steps
for _t in range(num_steps):
    sampled_actions = env.action_space.sample()        # sample actions (not from algo)
    obs, rewards, dones, _ = env.step(sampled_actions) # interact with env
    if all(dones):
        obs = env.reset()                              # start a new episode when current one ends
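
To try the same reset/step loop without installing the repository, a toy stand-in can be used. The class below is hypothetical and is not part of OFCOURSE; it only mimics the per-agent lists of observations, rewards, and done flags that the loop above expects, and its dynamics and reward are made up for illustration.

```python
import random


class ToyMultiAgentEnv:
    """Hypothetical two-agent environment with a Gym-style reset/step interface."""

    def __init__(self, num_agents=2, num_actions=3, episode_len=4, seed=0):
        self.num_agents = num_agents
        self.num_actions = num_actions
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.t = 0

    def sample_actions(self):
        # One random discrete action per agent (stand-in for env.action_space.sample()).
        return [self.rng.randrange(self.num_actions) for _ in range(self.num_agents)]

    def reset(self):
        # Start a new episode and return one observation per agent.
        self.t = 0
        return [[0.0] for _ in range(self.num_agents)]

    def step(self, actions):
        assert len(actions) == self.num_agents
        self.t += 1
        obs = [[float(self.t)] for _ in range(self.num_agents)]
        rewards = [-float(a) for a in actions]  # toy cost: a larger action index costs more
        dones = [self.t >= self.episode_len] * self.num_agents
        return obs, rewards, dones, {}


env = ToyMultiAgentEnv()
obs = env.reset()
episodes_finished = 0
for _t in range(10):
    obs, rewards, dones, _ = env.step(env.sample_actions())
    if all(dones):
        episodes_finished += 1
        obs = env.reset()  # start a new episode when the current one ends
print(episodes_finished)  # 2 (episodes end at steps 4 and 8)
```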

Customize Environment

Customized fulfillment systems can be constructed in OFCOURSE. Here, we use Task 1 (Fulfillment of Physical and Virtual Orders in One System) from the paper as an example.

Import Modules

import numpy as np
from env.resource import Resource
from env.order import Order
from env.container import Buffer, Inventory
from env.operation import OpStore, OpRoute, OpConsoRoute, OpDispatch
from env.fulfillment_unit import FulfillmentUnit
from env.agent import Agent
from env.order_source import OrderSource

System Variables

Before defining the fulfillment system, we first define the buffer length and inventory capacity.

# ---------- PARAMS ---------- #
buffer_len = 5
inventory_limit = 32

Agents

There are two agents in the fulfillment system. Agent 0 consists of six fulfillment units and agent 1 consists of four. The two agents share the first three stages.

# ---------- AGENT 0 ---------- #
agent0 = Agent()
agent0.add_fulfillment_unit(agent0_layer5)
agent0.add_fulfillment_unit(agent0_layer4)
agent0.add_fulfillment_unit(agent0_layer3)
agent0.add_fulfillment_unit(agent0_layer2)
agent0.add_fulfillment_unit(agent0_layer1)
agent0.add_fulfillment_unit(agent0_layer0)

# ---------- AGENT 1 ---------- #
agent1 = Agent()
agent1.add_fulfillment_unit(agent1_layer3)
agent1.add_fulfillment_unit(agent1_layer2)
agent1.add_fulfillment_unit(agent1_layer1)
agent1.add_fulfillment_unit(agent1_layer0)

Fulfillment Stage

Take the third stage of agent 0 (the consolidation warehouse) as an example. It has two Containers and three Operations. Each Container has an associated Resource, so we define the Resource first and then attach it to its Container. Here, one Container is an Inventory and the other is a Buffer. As for Operations, one stores incoming Orders in the Inventory, and the other two consolidate Orders and dispatch them toward their destination Buffers.

# 3RD STAGE IN AGENT 0
agent0_layer3 = FulfillmentUnit()
agent0_layer3_inventory_resource = Resource(constraint=32, normal_price=0.6, overage_price=2.0, occupied=0)
agent0_layer3_buffer0_resource = Resource(constraint=-1, normal_price=0.0, overage_price=0.0, occupied=0)
agent0_layer3_inventory = Inventory(resource=agent0_layer3_inventory_resource, inventory_limit=inventory_limit)
agent0_layer3_buffer0 = Buffer(resource=agent0_layer3_buffer0_resource, buffer_len=buffer_len)
agent0_layer3.add_container(container=agent0_layer3_inventory)
agent0_layer3.add_container(container=agent0_layer3_buffer0)
agent0_layer3_op0 = OpStore(buffers_orig=[agent0_layer3_buffer0], inventory_dest=agent0_layer3_inventory, op_price=0.1, op_time=1)
agent0_layer3_op1 = OpConsoRoute(buffers_orig=[agent0_layer3_buffer0], inventory_orig=agent0_layer3_inventory, buffer_dest=agent0_layer4_buffer0, op_price=4.0, op_time=3)
agent0_layer3_op2 = OpConsoRoute(buffers_orig=[agent0_layer3_buffer0], inventory_orig=agent0_layer3_inventory, buffer_dest=agent0_layer4_buffer1, op_price=8.0, op_time=2)
agent0_layer3.add_operation(operation=agent0_layer3_op0)
agent0_layer3.add_operation(operation=agent0_layer3_op1)
agent0_layer3.add_operation(operation=agent0_layer3_op2)
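
The Resource arguments above suggest a two-tier pricing scheme. The sketch below shows one possible reading and is not taken from env/resource.py: it assumes that `constraint` is a capacity, that usage up to the capacity is charged at `normal_price` per unit, that usage beyond it is charged at `overage_price` per unit, and that `constraint=-1` means unlimited capacity. The function name `holding_cost` is hypothetical.

```python
def holding_cost(occupied, constraint, normal_price, overage_price):
    """Per-step resource cost under the ASSUMED two-tier pricing described above."""
    if constraint < 0:
        # Assumption: a negative constraint means the resource is unlimited.
        return occupied * normal_price
    within = min(occupied, constraint)    # units inside the capacity
    over = max(occupied - constraint, 0)  # units exceeding the capacity
    return within * normal_price + over * overage_price


# Inventory resource from the example: capacity 32, prices 0.6 and 2.0.
cost_under = holding_cost(occupied=20, constraint=32, normal_price=0.6, overage_price=2.0)
cost_over = holding_cost(occupied=40, constraint=32, normal_price=0.6, overage_price=2.0)
# Buffer resource from the example: unlimited and free.
cost_buffer = holding_cost(occupied=5, constraint=-1, normal_price=0.0, overage_price=0.0)
print(cost_under, cost_over, cost_buffer)
```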

Order Source Management

The order source is a mechanism that takes in the simulation step as its input and generates a set of order instances as its output. Currently, orders are placed according to a prescribed repeating pattern. External order source management will be added soon.
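
A repeating-pattern order source of this kind can be sketched as below. The names `ToyOrder`, `PatternOrderSource`, and `generate` are hypothetical and do not reflect the actual interface of env/order_source.py; the sketch only illustrates mapping a simulation step to a set of order instances.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyOrder:
    """Hypothetical minimal order record."""
    order_id: int
    created_at: int  # simulation step at which the order was placed


class PatternOrderSource:
    """Hypothetical order source that repeats a fixed per-step order count."""

    def __init__(self, pattern: Sequence[int]):
        if not pattern:
            raise ValueError("pattern must contain at least one entry")
        self.pattern = list(pattern)
        self._next_id = 0

    def generate(self, step: int) -> List[ToyOrder]:
        # The pattern wraps around, so step k and step k + len(pattern) match.
        count = self.pattern[step % len(self.pattern)]
        orders = [ToyOrder(self._next_id + i, step) for i in range(count)]
        self._next_id += count
        return orders


source = PatternOrderSource(pattern=[2, 0, 1])
counts = [len(source.generate(t)) for t in range(6)]
print(counts)  # [2, 0, 1, 2, 0, 1]
```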

Data Collection and Generation

The fulfillment systems presented in the paper are inspired by practical problems: experiment 1 (fulfillment of physical and virtual orders in one system) originates from Cainiao's domestic fulfillment business, and experiment 2 (cross-border order fulfillment) stems from the fulfillment business of AliExpress. Because of the company's data disclosure regulations, synthetic data is used for demonstration; it can be found in exp1 and exp2.

Action Space and Observation Space

See docs/act_obs.md.

Citation

@inproceedings{zhu2023ofcourse,
    title={OFCOURSE: A Multi-Agent Reinforcement Learning Environment for Order Fulfillment},
    author={Yiheng Zhu and Yang Zhan and Xuankun Huang and Yuwei Chen and Yujie Chen and Jiangwen Wei and Wei Feng and Yinzhi Zhou and Haoyuan Hu and Jieping Ye},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2023},
    url={https://openreview.net/forum?id=0RSQEh9lRG}
}
