Language Models for Reinforcement Learning - Lamorel
Lamorel is a Python library designed for people eager to use Large Language Models (LLMs) in interactive environments (e.g. RL setups).
News
- 2025/12/11 - Benchmark: We benchmarked lamorel's inference efficiency against transformers and vLLM. Find the results in our wiki.
- 2025/10/20 - V0.3 in beta:
- A new version of lamorel (V0.3) is available in beta on the branch V0.3
- It introduces important changes such as:
- lamorel can now deploy multiple and different LLMs within the same experiment
- an unsloth backend has been introduced
- more control on the distributed setup from the config, including which GPU each process can use
- lamorel now directly relies on torch.distributed instead of Accelerate
- Don't hesitate to test it and report any issue!
- 2023/11/21 - V0.2:
- The support of Decoder-Only models has been largely improved.
- Optimizations:
- contexts sent to lamorel are automatically padded, easing the use of custom modules (see examples).
- batching has been improved.
- `pre_encode_inputs: true` now works for all models, allowing one to cache contexts.
- quantization has been added (use `pip install .[quantization]` and set `load_in_4bit: true` in your config); it relies on bitsandbytes through Accelerate and Transformers. Simply setting `load_in_4bit: true` in the config of the PPO_LoRA_finetuning example results in using QLoRA.
- Tests have been added to ensure scoring and training properly work.
- PPO_finetuning and PPO_LoRA_finetuning have been improved:
- gradient accumulation has been fixed.
- you can now load finetuned weights with `loading_path`.
- the environment is now vectorized for faster training.
- A new example shows how to consider tokens as actions as in an RLHF setup (this example can be used for RLHF purposes by modifying the reward).
- 2023/07/12: an example showing how to use LoRA through the Peft library for lightweight finetuning has been added.
Why Lamorel?
What is the difference between Lamorel and RLHF libs?
Lamorel was initially designed to easily use LLMs in interactive environments. It is especially made for high throughput using a distributed architecture. The philosophy of Lamorel is to be very permissive: it allows as many uses of LLMs as possible while maintaining scalability, so the same application can run with 1 or N LLMs.
For this reason, it is specialised neither in RL in general nor in RLHF in particular. Our examples illustrate how Lamorel can be used for various applications, including RLHF-like finetuning. However, one must understand that Lamorel's philosophy means users must implement what they want to do with the LLM(s) themselves.
This is why we advise users who know in advance that they want to do RLHF, especially without modifying classic implementations, to use libraries specialised in RLHF that already come with RL implementations (e.g. RL4LMs, TRL). On the other hand, users more inclined to experiment with implementations, or looking for an LLM library they can use across different projects, may prefer Lamorel.
Lamorel's key features
- Abstracts the use of LLMs (e.g. tokenization, batching) into simple calls:
```python
lm_server.generate(contexts=["This is an example prompt, continue it with"])
lm_server.score(contexts=["This is an example prompt, continue it with"],
                candidates=[["a sentence", "another sentence"]])
```
- Provides a method to compute the log probability of token sequences (e.g. action commands) given a prompt
- Is made for scaling up your experiments by deploying multiple instances of the LLM and dispatching the computation thanks to a simple configuration file
```yaml
distributed_setup_args:
  n_rl_processes: 1
  n_llm_processes: 1
```
- Provides access to open-source LLMs from the Hugging Face hub, along with Model Parallelism to use multiple GPUs for a single LLM instance
```yaml
llm_args:
  model_type: seq2seq
  model_path: t5-small
  pretrained: true
  minibatch_size: 4
  pre_encode_inputs: true
  load_in_4bit: false
  parallelism:
    use_gpu: true
    model_parallelism_size: 2
    synchronize_gpus_after_scoring: false
    empty_cuda_cache_after_scoring: false
```
- Allows one to give their own PyTorch modules to compute custom operations (e.g. to add new heads on top of the LLM)
- Allows one to train the LLM (or part of it) thanks to a Data Parallelism setup where the user provides their own update method
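The training loop itself stays in user code. As a standalone sketch of what a user-supplied update method boils down to (pure Python for illustration; in practice the user's update would receive the LLM's outputs and apply a real optimizer, and this is not lamorel's actual updater API):

```python
# Toy update method: one gradient-descent step on a scalar parameter
# minimising the squared error (theta - target)^2.
def update_step(theta, target, lr=0.1):
    grad = 2 * (theta - target)  # d/dtheta of (theta - target)^2
    return theta - lr * grad

theta = 5.0
for _ in range(50):
    theta = update_step(theta, target=0.0)
# theta has converged close to the target
```

In lamorel, such an update runs on each LLM server process under Data Parallelism, with gradients synchronised across processes.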
Distributed and scalable
Lamorel relies on a client-server(s) architecture where your RL script acts as the client sending requests to the LLM server(s). In order to match the computation requirements of RL, Lamorel can deploy multiple LLM servers and dispatch the requests on them without any modification to your code.
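As a toy illustration of the dispatching idea (the function name and the round-robin policy here are illustrative only, not lamorel's actual internals):

```python
# Split a batch of contexts across N LLM server processes, round-robin.
# Each shard would be sent to one server; results are gathered back in order.
def dispatch(contexts, n_servers):
    shards = [[] for _ in range(n_servers)]
    for i, ctx in enumerate(contexts):
        shards[i % n_servers].append(ctx)
    return shards

shards = dispatch(["s0", "s1", "s2", "s3", "s4"], 2)
print(shards)  # → [['s0', 's2', 's4'], ['s1', 's3']]
```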

Installation
```
cd lamorel
pip install .
```
Use `pip install .[quantization]` if you want access to 4-bit loading.
Examples
We provide a set of examples that use Lamorel in interactive environments:
- SayCan: A SayCan implementation that controls a robot hand in a simulated PickPlace environment.
- PPO_finetuning: A lightweight implementation of the PPO approach introduced in "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning" to finetune an LLM policy in BabyAI-Text.
- PPO_LoRA_finetuning: A lightweight implementation of the PPO approach introduced in "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning", but using LoRA through the Peft library for lightweight finetuning in BabyAI-Text.
- RLHF-like_PPO_LoRA_finetuning: A simple example of PPO finetuning of an LLM that is asked to generate specific token sequences (as in RLHF, each token is a different action).
How to use Lamorel
Lamorel is built of three main components:
- a Python API to interact with LLMs
- a configuration file to set the LLM servers
- a launcher deploying the multiple LLM servers and launching your RL script
Instantiating the server in your RL script
Lamorel leverages hydra for its configuration file. Because of this, you need to add the hydra decorator on top of your main function.
Then, you must instantiate the Caller class from Lamorel which will create the object allowing you to interact with the LLM servers.
Do not forget to initialize Lamorel once imported with lamorel_init() to initialize the communication with the servers.
```python
import hydra
from lamorel import Caller, lamorel_init
lamorel_init()

@hydra.main(config_path='../config', config_name='config')
def main(config_args):
    lm_server = Caller(config_args.lamorel_args)
    # Do whatever you want with your LLM
    lm_server.close()

if __name__ == '__main__':
    main()
```
Do not forget to close the connection with servers at the end of your script.
Using the Caller
Once instantiated, you can use the different methods of the Caller object to send requests to your LLMs.
Scoring
First, we provide the score method to compute the log probability of a sequence of tokens (a candidate) given a prompt (context).
Lamorel allows you to provide multiple candidates for a single context, but also to batch this computation over multiple contexts (each with its associated candidates). Using this, one can use a classic vectorized RL setup where, at each step, multiple environments running in parallel return their current state and expect an action.
```python
lm_server.score(contexts=["This is an example prompt, continue it with"],
                candidates=[["a sentence", "another sentence"]])
```
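Conceptually, the score of a candidate is the sum of the per-token conditional log probabilities of its tokens given the context. A self-contained sketch (the per-token probabilities below are made up for illustration, not produced by a model):

```python
import math

# log p(candidate | context) = sum_i log p(token_i | context, token_<i)
per_token_probs = [0.5, 0.25, 0.8]  # made-up values for a 3-token candidate
log_prob = sum(math.log(p) for p in per_token_probs)
print(round(log_prob, 4))  # → -2.3026
```

This is why longer candidates tend to get lower scores: each extra token adds a negative term to the sum.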
Generation
Lamorel also provides a method for text generation. Similarly to the score method, one can give multiple prompts (contexts).
Our generate method accepts any keyword argument of the Transformers generation API.
In addition to the generated texts, it also returns the probability (or the log probability if return_logprobs=True is passed) of each generated sequence.
```python
lm_server.generate(contexts=["This is an example prompt, continue it with"])
lm_server.generate(contexts=["This is an example prompt, continue it with"],
                   temperature=0.1, max_length=25)
```
Custom modules
While Lamorel provides two main uses of LLMs (i.e. scoring and generating), we also allow users to provide their own PyTorch modules to compute custom operations on top of the LLM.
