
LEGION

Official implementation of paper on Nature Machine Intelligence: "Preserving and Combining Knowledge in Robotic Lifelong Reinforcement Learning"


License: MIT Python 3.6+ Code style: black Zulip Chat DOI

Preserving and Combining Knowledge in Robotic Lifelong Reinforcement Learning

<div align="center">

Official implementation of LEGION: A Language Embedding based Generative Incremental Off-policy Reinforcement Learning Framework with Non-parametric Bayes

[Project Website] [Paper]

Yuan Meng<sup>1,*</sup>, Zhenshan Bing<sup>1,2,*,†</sup>, Xiangtong Yao<sup>1,*</sup>, Kejia Chen<sup>1</sup>,

Kai Huang<sup>3,†</sup>, Yang Gao<sup>2,†</sup>, Fuchun Sun<sup>4,†</sup>, Alois Knoll<sup>1</sup>.

</div> <p align="center"> <small><sup>1</sup>School of Computation, Information and Technology, Technical University of Munich, Germany</small> <br><small><sup>2</sup>State Key Laboratory for Novel Software Technology, Nanjing University, China</small> <br><small><sup>3</sup>Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, China</small> <br><small><sup>4</sup>Department of Computer Science and Technology, Tsinghua University, China</small> <small><br><sup>*</sup>Indicates Equal Contribution</small> <small><br><sup>&dagger;</sup>To whom correspondence should be addressed; E-mail: zhenshan.bing@tum.de, huangk36@mail.sysu.edu.cn, gaoy@nju.edu.cn, fcsun@tsinghua.edu.cn</small> </p>

Repository Agenda

  1. Introduction

  2. Installation & Setup

  3. Training and Evaluation

  4. Repository Structure

  5. Data Availability

  6. Acknowledgements

Introduction

Humans can continually accumulate knowledge and develop increasingly complex behaviors and skills throughout their lives, a capability known as lifelong learning. Although this capability is considered an essential mechanism of generalized intelligence, recent advancements in artificial intelligence predominantly excel in narrow, specialized domains and generally lack it. Our study introduces a robotic lifelong reinforcement learning framework that addresses this gap by incorporating a non-parametric Bayesian model into the knowledge space. Additionally, we enhance the agent's semantic understanding of tasks by integrating language embeddings into the framework. Our proposed embodied agent can consistently accumulate knowledge from a continuous stream of one-time feeding tasks. Furthermore, our agent can tackle challenging real-world long-horizon tasks by combining and reapplying the knowledge it acquired from the original task stream. The proposed framework advances our understanding of the robotic lifelong learning process and may inspire the development of more broadly applicable intelligence.

LEGION long horizon task demonstration

<!-- [![Movie1](/docs/static/images/movie_cover.png "Long horzion task demonstration")](https://www.cit.tum.de/cit/startseite/) -->

Movie1

LEGION Framework for Training

  • Training: Our framework receives language semantic information and environment observations as input to make policy decisions and output action patterns. It trains on only one task at a time. L represents the loss functions, which are explained in the Method section "Upstream task inference".

LEGION Framework for Deployment

  • Deployment: In the real-world demonstration, the agent parameters remain frozen. The agent receives input signals from real-world hardware and outputs the corresponding action signals. The sim2real and real2sim modules both process the data to bridge the gap between simulation and the real world.

Installation

To set up the repository, follow the steps below:

  • Clone the repository: git clone https://github.com/Ghiara/LEGION.git

  • Please refer to INSTALL.md for detailed installation steps and environment setup.

[!TIP] This project works best with the following versions: mujoco200, mujoco-py==2.0.2.8, gym==0.20.0, protobuf==3.20.0, cython<3. We recommend installing these dependencies manually before installing the MetaWorld environment to avoid compatibility issues.
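As a quick sanity check before installing MetaWorld, the pins from the tip above can be compared against what is actually installed. The snippet below is only a sketch: the helper names (`find_mismatches`, `cython_is_pre_3`, `installed_versions`) are hypothetical and not part of this repository, and the lookup relies on `importlib.metadata`, which requires Python 3.8 or newer. mujoco200 is a binary release rather than a pip package, so it is not checked here.

```python
# Pins copied from the tip above; helper names are hypothetical, not part of LEGION.
PINNED = {
    "mujoco-py": "2.0.2.8",
    "gym": "0.20.0",
    "protobuf": "3.20.0",
}


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (expected, found)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pinned.items():
        found = installed.get(name)  # None when the package is missing
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def cython_is_pre_3(version):
    """True when the Cython major version is below 3, as the tip recommends."""
    return int(version.split(".")[0]) < 3


def installed_versions(names):
    """Look up installed versions (assumes Python 3.8+ for importlib.metadata)."""
    from importlib import metadata

    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = installed_versions(list(PINNED) + ["cython"])
    print("mismatched pins:", find_mismatches(found))
    if found["cython"] is not None:
        print("cython<3:", cython_is_pre_3(found["cython"]))
```

An empty dictionary from `find_mismatches` means all three pinned packages match; INSTALL.md remains the authoritative reference for the full setup.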

Train

To reproduce the results reported in our paper, we provide a separate file containing the exact training command lines used during our experiments.

[!IMPORTANT] Please follow the instructions in TRAIN_EVAL.md to run the code properly.

For example, to train the LEGION framework in lifelong learning mode:

python3 -u main.py \
setup=continuouslearning \
env=metaworld-mt10 \
env.use_kuka_env=True \
agent=sac_dpmm \
agent.encoder.type_to_select=vae \
agent.encoder.vae.should_reconstruct=True \
agent.multitask.should_use_task_encoder=True \
agent.multitask.should_use_disentangled_alpha=True \
agent.multitask.encoder_input_setup=context_obs \
agent.multitask.dpmm_cfg.dpmm_update_start_step=10000 \
agent.multitask.dpmm_cfg.dpmm_update_freq=100000 \
agent.multitask.dpmm_cfg.kl_div_update_freq=50 \
agent.multitask.dpmm_cfg.sF=0.00001 \
agent.multitask.dpmm_cfg.beta_kl_z=0.001 \
experiment.training_mode=crl_queue \
experiment.should_reset_optimizer=True \
experiment.should_reset_critics=False \
experiment.should_reset_vae=False \
experiment.eval_freq=10000 \
experiment.num_eval_episodes=10 \
experiment.num_train_steps=1000000 \
agent.multitask.num_envs=10 \
experiment.save_video=False \
setup.seed=0 \
setup.device=cuda:0
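When the same run has to be repeated for several seeds, the command above can be assembled programmatically. The following is a minimal sketch, not a script shipped with the repository: `build_command` and `seed_sweep` are hypothetical helpers, only a subset of the overrides is reproduced for brevity, and the seed values other than 0 are illustrative assumptions. TRAIN_EVAL.md lists the exact commands used for the paper.

```python
import shlex

# A subset of the overrides from the command above; the remaining ones are
# omitted here for brevity and would be added in the same key/value form.
BASE_OVERRIDES = {
    "setup": "continuouslearning",
    "env": "metaworld-mt10",
    "agent": "sac_dpmm",
    "experiment.training_mode": "crl_queue",
    "experiment.num_train_steps": 1000000,
    "setup.device": "cuda:0",
}


def build_command(overrides, seed):
    """Assemble the `python3 -u main.py key=value ...` argument list for one seed."""
    args = ["python3", "-u", "main.py"]
    for key, value in overrides.items():
        args.append("{}={}".format(key, value))
    args.append("setup.seed={}".format(seed))
    return args


def seed_sweep(overrides, seeds):
    """Return one shell-quoted command string per seed."""
    return [
        " ".join(shlex.quote(arg) for arg in build_command(overrides, seed))
        for seed in seeds
    ]


# Seeds 1 and 2 are illustrative; the command above only shows seed 0.
for line in seed_sweep(BASE_OVERRIDES, seeds=[0, 1, 2]):
    print(line)
```

Each printed line can be pasted into a shell or passed to a job scheduler; the argument list from `build_command` can also be handed directly to `subprocess.run`.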

File Structure

We use Hydra to manage the training process.

  • The configs for all instances can be found under the config folder.
  • The agent implementation can be found under the mtrl/agent folder.
  • The environments can be found under mtrl/env.
  • The training script is implemented under mtrl/experiment.
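To relate the command-line overrides to the config folder, it can help to see how `key=value` arguments break down. The sketch below is a deliberate simplification written in plain Python, not Hydra's actual parser: `split_overrides` is a hypothetical helper, it treats every key without a dot as a config-group selection (which in Hydra's usual convention picks a file from the matching subfolder of the config directory), and it leaves all values as strings.

```python
def split_overrides(overrides):
    """Split overrides into group selections and nested value overrides.

    Simplifying assumption: a key without a dot (e.g. `agent=sac_dpmm`) selects
    a config group option, and a dotted key (e.g. `setup.seed=0`) overrides a
    single value inside the composed config. Values are kept as strings.
    """
    groups, values = {}, {}
    for item in overrides:
        key, value = item.split("=", 1)
        if "." not in key:
            groups[key] = value
            continue
        node = values
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # walk or create nested levels
        node[parts[-1]] = value
    return groups, values


groups, values = split_overrides([
    "agent=sac_dpmm",
    "agent.encoder.type_to_select=vae",
    "agent.multitask.dpmm_cfg.sF=0.00001",
    "setup.seed=0",
])
print(groups)
print(values)
```

Under this reading, `agent=sac_dpmm` chooses which agent config is loaded, while `agent.multitask.dpmm_cfg.sF=0.00001` changes one field inside it.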

The detailed structure of this project is shown as follows:

LEGION
    |- config                               -- config files folder
    |- metadata                             -- language embedding folder
    |- mtrl                                 -- implementation of our agent
        |- agent
            |- components
                |- actor.py                 -- downstream SAC actor
                |- bnp_model.py             -- Bayesian nonparametric model
                |- critic.py                -- downstream SAC critic
                |- decoder.py               -- upstream decoder for dynamic/semantic rebuild
                |- encoder.py               -- upstream encoder for task inference
                |- task_encoder.py          -- upstream encoder for language processing
            ...
            |- sac_dpmm.py                  -- our LEGION agent implementation
            ...
        |- env                              -- environment builder utils
        |- experiment
            ...
            |- continuouslearning.py        -- our implementation of training script
            ...           
    |- scripts
        |- INSTALL.md                       -- package installation guidelines
        |- TRAIN_EVAL.md                    -- training & evaluation CLIs
    |- source (after following INSTALL.md)
        |- bnpy                             -- third party Bayesian non-parametric library
        |- Metaworld-KUKA-IIWA-R800         -- third party metaworld environment
        |- mtenv                            -- third party environment manager library
    main.py                                 -- main entry of repository
    README.md                               -- this file

Data

The original training and evaluation data presented in the paper are available here.

Acknowledgements

  • Project files (pre-commit, mypy config, towncrier config, CircleCI, etc.) are based on the corresponding files from Hydra.

  • The implementation is inherited from the MTRL library.

  • For documentation of the MTRL repository, refer to: https://mtrl.readthedocs.io.

Citation

To cite this article:

@article{meng2025preserving,
  title={Preserving and combining knowledge in robotic lifelong reinforcement learning},
  author={Meng, Yuan and Bing, Zhenshan and Yao, Xiangtong and Chen, Kejia and Huang, Kai and Gao, Yang and Sun, Fuchun and Knoll, Alois},
  journal={Nature Machine Intelligence},
  pages={1--14},
  year={2025},
  publisher={Nature Publishing Group UK London}
}