Text2reward

[ICLR 2024 Spotlight] Text2Reward: Reward Shaping with Language Models for Reinforcement Learning

Generate Convert Improve

Install / Use

/learn @xlang-ai/Text2reward

About this skill

Quality Score

0/100

README

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning

Code for paper Text2Reward: Reward Shaping with Language Models for Reinforcement Learning. Please refer to our project page for more demonstrations and up-to-date related resources.

Updates

2023-10-09: We released our code.
2023-09-20: We release the paper and website of text2reward.

Dependencies

To establish the environment, run this code in the shell:

# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0

TroubleShooting

If you have not installed mujoco yet, please follow the instructions from here to install it. After that, please try the following commands to confirm the successful installation:

$ python3
>>> import mujoco_py

If you encounter the following errors when running ManiSkill2, we refer you to read the documents here.
- RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
- Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
- Segmentation fault (core dumped)

Usage

Reimplement

To reimplement our experiment results, you can run the following scripts:

ManiSkill2:

bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh

It's normal to encounter the following warnings:

[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.

MetaWorld:

bash run_oracle.sh
bash run_zero_shot.sh

Generate new reward code

Firstly please add the following environment variable to your .bashrc (or .zshrc, etc.).

export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward

Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:

# generate reward code for Maniskill
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh

Run new experiment

By default, the run_oracle.sh script above uses the expert-written rewards provided by the environment; the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards used in our experiments. If you want to run a new experiment based on the reward you provide, just follow the bash script above and modify the --reward_path parameter to the path of your own reward.

Citation

If you find our work helpful, please cite us:

@inproceedings{xietext2reward,
  title={Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle={The Twelfth International Conference on Learning Representations}
}

Contributors

Related Skills

proje

Interactive vocabulary learning platform with smart flashcards and spaced repetition for effective language acquisition.

groundhog

398

Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).

last30days-skill

17.5k

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

sec-edgar-agentkit

AI agent toolkit for accessing and analyzing SEC EDGAR filing data. Build intelligent agents with LangChain, MCP-use, Gradio, Dify, and smolagents to analyze financial statements, insider trading, and company filings.

xlang-ai

View profile

View on GitHub

GitHub Stars203

CategoryEducation

Updated3d ago

Forks12

xlang-ai/text2reward

Languages

Jupyter Notebook

Security Score

85/100

Audited on Mar 30, 2026

No findings