LaMP
Codes for papers on Large Language Models Personalization (LaMP)
LaMP: When Large Language Models Meet Personalization
This paper highlights the importance of personalization in the current state of natural language understanding and generation and introduces the LaMP benchmark --- a novel benchmark for training and evaluating language models for producing personalized outputs. LaMP offers a comprehensive evaluation framework with diverse language tasks and multiple entries for each user profile. It consists of seven personalized tasks, spanning across three classification and four text generation tasks. We further propose a retrieval augmentation approach that retrieves personalized items from user profiles to construct personalized prompts for large language models. The experiments conducted to establish fine-tuned and zero-shot baseline results for the benchmark conclude that LMs utilizing profile augmentation outperform their counterparts that do not factor in profile information.
@misc{salemi2023lamp,
title={La{MP}: When Large Language Models Meet Personalization},
author={Alireza Salemi and Sheshera Mysore and Michael Bendersky and Hamed Zamani},
year={2023},
eprint={2304.11406},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation
This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which potentially have a substantial impact on various applications and domains. We propose the first attempt to optimize the retrieval models that deliver a limited number of personal documents to large language models for the purpose of personalized generation. We develop two optimization algorithms that solicit feedback from the downstream personalized generation tasks for retrieval optimization--one based on reinforcement learning whose reward function is defined using any arbitrary metric for personalized generation and another based on knowledge distillation from the downstream LLM to the retrieval model. This paper also introduces a pre- and post-generation retriever selection model that decides what retriever to choose for each LLM input. Extensive experiments on diverse tasks from the language model personalization (LaMP) benchmark reveal statistically significant improvements in six out of seven datasets.
@misc{salemi2024optimization,
title={Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation},
author={Alireza Salemi and Surya Kallumadi and Hamed Zamani},
year={2024},
eprint={2404.05970},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models
Privacy-preserving methods for personalizing large language models (LLMs) are relatively under-explored. There are two schools of thought on this topic: (1) generating personalized outputs by personalizing the input prompt through retrieval augmentation from the user's personal information (RAG-based methods), and (2) parameter-efficient fine-tuning of LLMs per user that considers efficiency and space limitations (PEFT-based methods). This paper presents the first systematic comparison between the two approaches on a wide range of personalization tasks using seven diverse datasets. Our results indicate that RAG-based and PEFT-based personalization methods on average yield 14.92% and 1.07% improvements over the non-personalized LLM, respectively. We find that combining RAG with PEFT elevates these improvements to 15.98%. Additionally, we identify a positive correlation between the amount of user data and PEFT's effectiveness, indicating that RAG is a better choice for cold-start users (i.e., users with limited personal data).
@misc{salemi2024comparingretrievalaugmentationparameterefficientfinetuning,
title={Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models},
author={Alireza Salemi and Hamed Zamani},
year={2024},
eprint={2409.09510},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.09510},
}
Data
You can download all the datasets from the links provided here. The exception is Personalized Email Subject Generation: its source dataset is not publicly accessible, so we provide only the sample ids and the code needed to regenerate it. See the following section for how to generate that dataset.
LaMP 6: Personalized Email Subject Generation (Avocado dataset)
The Avocado dataset is not publicly accessible. However, we provide the sample ids and the code we used to generate our dataset. If you have access to Avocado, you can generate the dataset in the same format as the other LaMP datasets using the following code:
python data/avocado/create_avocado_dataset.py \
--avocado_files_dir /*Address to the directory containing zip files for avocado dataset 'avocado-1.0.2/data/text'*/ \
--extract_addr /*A temp dir to extract the files for creating dataset*/ \
--output_dir /*The directory to generate the final dataset*/ \
--input_question_file_train /*The address to the train_questions.json file we provided in LaMP*/ \
--input_question_file_dev /*The address to the dev_questions.json file we provided in LaMP*/ \
--input_question_file_test /*The address to the test_questions.json file we provided in LaMP*/
Evaluation
The instructions for evaluating your results on the test set are provided here. To evaluate your results on the dev set, use the evaluation scripts below.
Evaluate all tasks together:
python eval/eval_all.py \
--golds_zip /*Address to all gold labels for all tasks zipped in a file*/ \
--preds_zip /*Address to all predictions for all tasks zipped in a file*/ \
--temp_dir /*Address to a temp dir for extracting files*/ \
--output_file /*Address to the results file*/
Evaluate one task:
python eval/eval_task.py \
--golds_json /*Address to gold labels for the task as a json file*/ \
--preds_json /*Address to predictions for the task as a json file*/ \
--task_name /*Name of the task [LaMP_1, LaMP_2, LaMP_3, LaMP_4, LaMP_5, LaMP_6, LaMP_7]*/ \
--output_file /*Address to the results file*/
The pred files should follow the exact same format as the gold files:
{
"task" : "/*task name*/",
"golds" : [
{
"id" : "/*sample 1 id*/",
"output" : "/*output of the model for the first sample*/"
},
...,
{
"id" : "/*sample n id*/",
"output" : "/*output of the model for the n'th sample*/"
}
]
}
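The layout above can be produced and sanity-checked with a few lines of Python. This is a minimal sketch, not the repository's evaluation code: the helper names (`build_predictions`, `write_predictions`, `accuracy`) are hypothetical, and the exact-match accuracy shown here is only an illustration for classification-style outputs. Use `eval/eval_task.py` for any reported numbers.

```python
import json
from pathlib import Path


def build_predictions(task_name, outputs_by_id):
    """Wrap a {sample_id: model_output} mapping in the file layout shown above."""
    return {
        "task": task_name,
        "golds": [
            {"id": sample_id, "output": output}
            for sample_id, output in outputs_by_id.items()
        ],
    }


def write_predictions(path, predictions):
    """Serialize the predictions dictionary to a JSON file."""
    Path(path).write_text(json.dumps(predictions, indent=2), encoding="utf-8")


def accuracy(gold, pred):
    """Exact-match accuracy over gold ids (illustrative only).

    Samples that are missing from the prediction file count as wrong.
    """
    pred_by_id = {item["id"]: item["output"] for item in pred["golds"]}
    gold_items = gold["golds"]
    if not gold_items:
        return 0.0
    correct = sum(
        1 for item in gold_items if pred_by_id.get(item["id"]) == item["output"]
    )
    return correct / len(gold_items)
```

Matching on `id` rather than on list position means the order of entries in the prediction file does not matter in this sketch.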
Personalizing LLMs with RAG (LaMP)
First, create an environment using the following script:
python3 -m venv lamp_venv
source lamp_venv/bin/activate
pip install -r LaMP/requirements.txt
Ranking Profiles based on the Input
The first step is to sort items in each user profile based on the input for the task:
cd LaMP
python rank_profiles.py \
--input_data_addr /*input questions for one of the LaMP tasks*/ \
--output_ranking_addr /*output address for the generated ranking file*/ \
--task /*name of the task [LaMP-1, LaMP-2, ..., LaMP-7]*/ \
--ranker /*the ranking model to be used [bm25, contriever, recency]*/ \
[optional] --batch_size /*the batch size for ranking*/ \
[optional] --use_date \ /*if used, it adds time to the text of each profile item*/
[optional] --contriever_checkpoint /*address to the Contriever checkpoint to be used*/
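To make the idea of ranking a profile against the task input concrete, here is a toy sketch. It is not what `rank_profiles.py` does: the term-overlap scorer is a simple stand-in for BM25 or Contriever, and the profile item fields (`id`, `text`, `date`) and the ISO date format are assumptions made for illustration.

```python
def rank_by_recency(profile):
    """Newest items first, assuming ISO-formatted date strings (YYYY-MM-DD)."""
    return sorted(profile, key=lambda item: item["date"], reverse=True)


def rank_by_overlap(query, profile):
    """Toy lexical ranker: score is the number of distinct query terms in the item text.

    Python's sort is stable, so items with equal scores keep their original order.
    """
    query_terms = set(query.lower().split())

    def score(item):
        return len(query_terms & set(item["text"].lower().split()))

    return sorted(profile, key=score, reverse=True)


# Hypothetical profile used only to demonstrate the two rankers.
example_profile = [
    {"id": "p1", "text": "deep learning for retrieval", "date": "2020-01-05"},
    {"id": "p2", "text": "cooking pasta at home", "date": "2022-03-01"},
    {"id": "p3", "text": "neural retrieval models", "date": "2021-07-15"},
]

recency_order = [item["id"] for item in rank_by_recency(example_profile)]
overlap_order = [
    item["id"] for item in rank_by_overlap("neural retrieval", example_profile)
]
```

A recency ranker ignores the input entirely, while a lexical or dense ranker scores each profile item against it; that difference is what the `--ranker` choice controls.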
After that, use the following script to sort the profiles in the dataset based on the ranking file:
cd LaMP
python utils/merge_with_rank.py \
--lamp_questions_addr /*address to the LaMP task inputs file*/ \
--lamp_output_addr /*address to the LaMP task outputs file*/ \
--profile_ranking_addr /*address to the generated ranking file from the previous script*/ \
--merged_output_addr /*address to the sorted dataset using the provided ranking file*/
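Conceptually, the merge step reorders each sample's profile according to the ranking file. The sketch below illustrates that idea under assumed data shapes: each sample is a dictionary with an `id` and a `profile` list of items that carry their own `id`, and the ranking maps a sample id to an ordered list of profile item ids. The actual file formats used by `utils/merge_with_rank.py` may differ, and `sort_profiles` is a hypothetical name.

```python
def sort_profiles(samples, ranking):
    """Return copies of the samples with each profile reordered by the ranking.

    Profile items that do not appear in the ranking are placed at the end,
    keeping their original relative order.
    """
    merged = []
    for sample in samples:
        ranked_ids = ranking.get(sample["id"], [])
        position = {item_id: pos for pos, item_id in enumerate(ranked_ids)}
        profile = sorted(
            sample["profile"],
            key=lambda item: position.get(item["id"], len(position)),
        )
        # Build a new dictionary so the input samples are left unmodified.
        merged.append({**sample, "profile": profile})
    return merged
```

Sorting once up front means later steps can take the first few profile items directly instead of re-running the retriever.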
Training LLM with RAG
The next step is to train the LLM on a LaMP task:
cd LaMP
python train_llm.py \
--train_data /*address to sorted training data using the previous step*/ \
--validation_data /*address to sorted validation data using the previous step*/ \
[optional] --test_data /*address to sorted test data using the previous step*/ \
--model_name /*address to the model that should be used for initialization of the LLM*/ \
--task /*name of the task [LaMP-1, LaMP-2, ..., LaMP-7]*/ \
--output_dir /*output directory to save results and checkpoints*/ \
--retriever /*the ranking model to be used [bm25, contriever, recency]*/ \
--use_profile \ /*used to perform personalization with RAG*/
--is_ranked \ /*used if you pre-ranked the profiles based on the provided retrieval model*/
--num_retrieved /*number of items to be retrieved from the user profile*/
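The role of `--num_retrieved` can be illustrated with a small sketch of prompt construction: the top-k items of an already ranked profile are prepended to the task input. The prompt template, the `text` field, and the function name `build_prompt` are assumptions for illustration; the repository builds its own prompts for each task.

```python
def build_prompt(task_input, ranked_profile, num_retrieved):
    """Prepend the top `num_retrieved` profile items to the task input.

    Falls back to the plain task input when no profile items are selected,
    which corresponds to the non-personalized setting.
    """
    top_items = ranked_profile[:num_retrieved]
    if not top_items:
        return task_input
    context = "\n".join(f"- {item['text']}" for item in top_items)
    return f"User history:\n{context}\n\nTask: {task_input}"
```

A larger `num_retrieved` gives the model more user context but lengthens the prompt, so the value is bounded in practice by the model's input length.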
Zero-shot Evaluation of LLM with RAG
You can also evaluate the LLMs with the following script:
cd LaMP
python evaluate_llm.py \
--validation_data /*address to sorted validation data using the previous step*/ \
--model_addr /*address to the model that should be used for initialization of the LLM*/ \
--task /*name of the task [LaMP-1, LaMP-2, ..., LaMP-7]*/ \
--output_dir /*output directory to save results */ \
--use_profile \ /*used to perform personalization with RAG*/
