Language Models for Reinforcement Learning - Lamorel

Lamorel is a Python library designed for people eager to use Large Language Models (LLMs) in interactive environments (e.g. RL setups).


News

  • 2025/12/11 - Benchmark: We benchmarked lamorel's inference efficiency against transformers and vLLM. Find the results in our wiki.
  • 2025/10/20 - V0.3 in beta:
    • A new version of lamorel (V0.3) is available in beta on the branch V0.3
    • It introduces important changes such as:
      • lamorel can now deploy multiple and different LLMs within the same experiment
      • an unsloth backend has been introduced
      • more control on the distributed setup from the config, including which GPU each process can use
      • lamorel now directly relies on torch.distributed instead of Accelerate
    • Don't hesitate to test it and report any issue!
  • 2023/11/21 - V0.2:
    • The support of Decoder-Only models has been largely improved.
    • Optimizations:
      • contexts sent to lamorel are automatically padded, easing the use of custom modules (see examples).
      • batching has been improved.
      • pre_encode_inputs: true now works for all models, allowing one to cache contexts.
      • quantization has been added (please use pip install .[quantization] and set load_in_4bit: true in your config) using bitsandbytes through Accelerate and Transformers.
    • Simply setting load_in_4bit: true in the config of the PPO_LoRA_finetuning example results in using QLoRA.
    • Tests have been added to ensure scoring and training properly work.
    • PPO_finetuning and PPO_LoRA_finetuning have been improved:
      • gradient accumulation has been fixed.
      • you can now load finetuned weights with loading_path.
      • the environment is now vectorized for faster training.
    • A new example shows how to consider tokens as actions as in an RLHF setup (this example can be used for RLHF purposes by modifying the reward).
  • 2023/07/12: an example showing how to use LoRA through the Peft library for lightweight finetuning has been added.

Why Lamorel?

What is the difference between Lamorel and RLHF libs?

Lamorel was initially designed to make it easy to use LLMs in interactive environments. It is especially made for high throughput using a distributed architecture. The philosophy of Lamorel is to be very permissive and support as many uses of LLMs as possible while maintaining scaling: the application should run with 1 or N LLMs.

For this reason, it is specialised neither in RL nor, in particular, in RLHF. Our examples illustrate how Lamorel can be used for various applications, including RLHF-like finetuning. However, one must understand that Lamorel's philosophy means that users must implement what they want to do with the LLM(s) themselves.

This is why we advise users who know in advance that they want to do RLHF, especially without modifying classic implementations, to use libraries specialised in RLHF that already come with RL implementations (e.g. RL4LMs, TRL). On the other hand, users more inclined to experiment with implementations, or looking for an LLM library they can use across different projects, may prefer Lamorel.

Lamorel's key features

  1. Abstracts the use of LLMs (e.g. tokenization, batching) into simple calls
lm_server.generate(contexts=["This is an example prompt, continue it with"])
lm_server.score(contexts=["This is an example prompt, continue it with"], candidates=[["a sentence", "another sentence"]])
  2. Provides a method to compute the log probability of token sequences (e.g. action commands) given a prompt
  3. Is made for scaling up your experiments by deploying multiple instances of the LLM and dispatching the computation thanks to a simple configuration file
  distributed_setup_args:
    n_rl_processes: 1
    n_llm_processes: 1
  4. Provides access to open-source LLMs from the Hugging Face Hub, along with Model Parallelism to use multiple GPUs for an LLM instance
  llm_args:
    model_type: seq2seq
    model_path: t5-small
    pretrained: true
    minibatch_size: 4
    pre_encode_inputs: true
    load_in_4bit: false
    parallelism:
      use_gpu: true
      model_parallelism_size: 2
      synchronize_gpus_after_scoring: false
      empty_cuda_cache_after_scoring: false
  5. Allows one to give their own PyTorch modules to compute custom operations (e.g. to add new heads on top of the LLM)
  6. Allows one to train the LLM (or part of it) thanks to a Data Parallelism setup where the user provides their own update method

Distributed and scalable

Lamorel relies on a client-server(s) architecture where your RL script acts as the client sending requests to the LLM server(s). To match the computation requirements of RL, Lamorel can deploy multiple LLM servers and dispatch the requests to them without any modification of your code.

[Figure: Distributed scoring]
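The dispatching idea can be illustrated with a small, library-agnostic sketch (the function and names here are hypothetical, not lamorel's internals): a batch of requests is split across the available LLM servers, each of which processes its sub-batch independently.

```python
# Conceptual sketch of dispatching a batch of requests across N LLM servers.
# This is NOT lamorel's actual implementation; names are illustrative only.

def dispatch(contexts, n_servers):
    """Split a batch of contexts into one sub-batch per server (round-robin)."""
    batches = [[] for _ in range(n_servers)]
    for i, ctx in enumerate(contexts):
        batches[i % n_servers].append(ctx)
    return batches

# Example: 5 contexts dispatched over 2 servers
batches = dispatch(["c0", "c1", "c2", "c3", "c4"], n_servers=2)
# batches[0] holds ["c0", "c2", "c4"], batches[1] holds ["c1", "c3"]
```

The point is that the client-side code only ever sees one batched call; how the sub-batches are routed is a configuration concern.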

Installation

  1. cd lamorel
  2. pip install .
  3. Use pip install .[quantization] if you want access to 4-bit loading

Examples

We provide a set of examples that use Lamorel in interactive environments (see the examples folder of the repository).

How to use Lamorel

Lamorel is built of three main components:

  • a Python API to interact with LLMs
  • a configuration file to set the LLM servers
  • a launcher deploying the multiple LLM servers and launching your RL script

Instantiating the server in your RL script

Lamorel leverages hydra for its configuration file. Because of this, you need to add the hydra decorator on top of your main function. Then, you must instantiate the Caller class from Lamorel, which creates the object through which you interact with the LLM servers. Do not forget to call lamorel_init() right after importing Lamorel to set up the communication with the servers.

import hydra
from lamorel import Caller, lamorel_init
lamorel_init()

@hydra.main(config_path='../config', config_name='config')
def main(config_args):
    lm_server = Caller(config_args.lamorel_args)
    # Do whatever you want with your LLM
    lm_server.close()
if __name__ == '__main__':
    main()

Do not forget to close the connection with servers at the end of your script.
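The config_path above points at a hydra config file. A minimal sketch of its contents, assembled from the fields shown earlier in this README (the exact layout may differ between lamorel versions):

```yaml
lamorel_args:
  distributed_setup_args:
    n_rl_processes: 1
    n_llm_processes: 1
  llm_args:
    model_type: seq2seq
    model_path: t5-small
    pretrained: true
    minibatch_size: 4
    pre_encode_inputs: true
    load_in_4bit: false
    parallelism:
      use_gpu: true
      model_parallelism_size: 2
      synchronize_gpus_after_scoring: false
      empty_cuda_cache_after_scoring: false
```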

Using the Caller

Once instantiated, you can use the different methods of the Caller object to send requests to your LLMs.

Scoring

First, we provide the score method to compute the log probability of a sequence of tokens (a candidate) given a prompt (context). Lamorel allows you to provide multiple candidates for a single context, and also to batch this computation over multiple contexts (along with their associated candidates). Using this, one can implement a classic vectorized RL setup where, at each step, multiple environments running in parallel return their current state and expect an action.

lm_server.score(contexts=["This is an example prompt, continue it with"],
                candidates=[["a sentence", "another sentence"]])
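Conceptually, scoring a candidate amounts to summing the log probabilities of its tokens given the context. A self-contained toy illustration of that arithmetic (the per-token probabilities below are made up; a real LLM supplies them):

```python
import math

# Toy illustration of candidate scoring: the log probability of a candidate
# is the sum of the log probabilities of its tokens under the model.
# The probabilities below are made up for the example, not from a real model.

def score_candidate(token_probs):
    """Return the log probability of a token sequence from per-token probs."""
    return sum(math.log(p) for p in token_probs)

# "a sentence" tokenized into two tokens with model probabilities 0.5 and 0.25:
logp = score_candidate([0.5, 0.25])
print(round(logp, 4))  # log(0.5) + log(0.25) = log(0.125) ≈ -2.0794
```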

Generation

Lamorel also provides a method for text generation. As with the score method, one can give multiple prompts (contexts). Our generate method accepts any keyword argument from the Transformers API. In addition to the generated texts, it also returns the probability (or log probability, if return_logprobs=True is passed) of each generated sequence.

lm_server.generate(contexts=["This is an example prompt, continue it with"])
lm_server.generate(contexts=["This is an example prompt, continue it with"], temperature=0.1, max_length=25)
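To see what a keyword like temperature does, here is a self-contained sketch of temperature-scaled softmax over next-token logits (the logits are made up; this is standard sampling math, not lamorel-specific code):

```python
import math

# Illustration of the temperature keyword: lower temperature sharpens the
# next-token distribution, higher temperature flattens it. Logits are made up.

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, scaled by a sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=0.1)   # near-greedy
flat = softmax_with_temperature(logits, temperature=10.0)   # near-uniform
```

At temperature=0.1 the top logit takes almost all the probability mass, which is why low temperatures make generation close to deterministic.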

Custom modules

While Lamorel provides two main uses of LLMs (i.e. scoring and generating), we also allow users to provide their own modules to compute custom operations on top of the LLM (e.g. to add new heads).
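As a library-agnostic illustration of the idea (this is not lamorel's module API; names, shapes, and values are hypothetical), a custom head is just a function mapping the LLM's hidden state to a new output, such as a scalar value estimate. In practice the head would be a trainable PyTorch module fed by the model's hidden states.

```python
# Hypothetical sketch of a custom head on top of an LLM's hidden state.
# Not lamorel's API: in practice the hidden state comes from the model and
# the head would be a trainable PyTorch module rather than plain Python.

def value_head(hidden_state, weights, bias=0.0):
    """Map a hidden-state vector to a scalar value estimate (a linear head)."""
    return sum(h * w for h, w in zip(hidden_state, weights)) + bias

hidden = [0.2, -0.1, 0.4]   # made-up hidden state of size 3
w = [1.0, 0.5, -0.25]       # made-up head weights
print(round(value_head(hidden, w), 6))  # 0.2 - 0.05 - 0.1 = 0.05
```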
