AmazonReviews2023
Scripts for processing the Amazon Reviews 2023 dataset; implementations and checkpoints of BLaIR: "Bridging Language and Items for Retrieval and Recommendation".
Amazon Reviews 2023
[🌐 Website] · [🤗 Huggingface Datasets] · [📑 Paper] · [🔬 McAuley Lab]
This repository contains:
- Scripts for processing the Amazon Reviews 2023 dataset into recommendation benchmarks;
- Checkpoints & implementations for BLaIR: "Bridging Language and Items for Retrieval and Recommendation";
- Scripts for constructing Amazon-C4, a new dataset for evaluating product search performance under complex contexts.
Recommendation Benchmarks
Based on the released Amazon Reviews 2023 dataset, we provide scripts that preprocess the raw data into standard train/validation/test splits, to facilitate benchmarking of recommendation models.
More details here -> [datasets & processing scripts]
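The splitting strategy is not spelled out here. As a rough illustration only, the sketch below shows one common scheme for sequential recommendation: a per-user chronological leave-last-out split. The function name `leave_last_out_split`, the `(user_id, item_id, timestamp)` input format, and the rule that users with fewer than three interactions go entirely to training are assumptions for this example. The repository's actual scripts may split differently.

```python
from collections import defaultdict


def leave_last_out_split(interactions):
    """Hypothetical helper: split each user's time-ordered history so that
    the last item goes to test, the second-to-last to validation, and all
    earlier items to train.

    `interactions` is an iterable of (user_id, item_id, timestamp) tuples.
    Users with fewer than 3 interactions contribute only to train
    (an assumed convention for this sketch)."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))

    train, valid, test = [], [], []
    for user, events in by_user.items():
        # Sort chronologically; ties are broken by item id.
        items = [item for _, item in sorted(events)]
        if len(items) < 3:
            train.extend((user, item) for item in items)
            continue
        train.extend((user, item) for item in items[:-2])
        valid.append((user, items[-2]))
        test.append((user, items[-1]))
    return train, valid, test
```

Because the split is per user and ordered by timestamp, no user's test item precedes any of that user's training items.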
BLaIR
BLaIR, which is short for "Bridging Language and Items for Retrieval and Recommendation", is a series of language models pre-trained on the Amazon Reviews 2023 dataset.
<center> <img src="assets/blair.png" style="width: 75%;"> </center>
BLaIR is grounded in pairs of (item metadata, language context), which enables the models to:
- derive strong item text representations for both recommendation and retrieval;
- predict the most relevant item given a simple or complex language context.
More details here -> [checkpoints & code]
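Learning from (item metadata, language context) pairs is often done with an in-batch contrastive (InfoNCE) objective, the style of loss used by SimCSE, which the pre-training scripts build on. The following is a minimal sketch under that assumption, not BLaIR's exact training code. Row i of the context embeddings is paired with row i of the item embeddings, and every other item in the batch acts as a negative. The function names and the temperature of 0.05 are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def in_batch_contrastive_loss(context_embs, item_embs, temperature=0.05):
    """Illustrative InfoNCE loss over aligned (context, item) embedding pairs.

    context_embs[i] is the positive pair of item_embs[i]; all other items in
    the batch serve as negatives for context i. Returns the mean loss."""
    n = len(context_embs)
    total = 0.0
    for i in range(n):
        # Similarity of context i to every item in the batch, scaled by temperature.
        logits = [cosine(context_embs[i], item_embs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate items.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching item (index i) as the target.
        total += log_denominator - logits[i]
    return total / n
```

With well-aligned pairs the loss approaches zero; if contexts are matched to the wrong items it grows roughly as 1/temperature, which is what pushes matching pairs together and non-matching pairs apart.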
Amazon-C4
Amazon-C4, which is short for "Complex Contexts Created by ChatGPT", is a new dataset for the complex product search task.
<center> <img src="assets/amazon-c4-example.png" style="width: 50%;"> </center>
Amazon-C4 is designed to assess a model's ability to comprehend complex language contexts and retrieve relevant items.
More details here -> [datasets & code]
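Product search with a text encoder typically embeds the language context and each item's metadata, ranks items by similarity, and scores the ranking with metrics such as Recall@K and NDCG@K. The sketch below assumes precomputed embeddings (toy vectors here, standing in for encoder outputs such as those from a BLaIR checkpoint), cosine similarity, and exactly one ground-truth item per query. The helper names are hypothetical, and the paper and the reproduction scripts define the exact evaluation protocol used for Amazon-C4.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_items(query_emb, item_embs):
    """Hypothetical helper: return item ids (keys of item_embs) sorted by
    descending cosine similarity to the query embedding."""
    return sorted(item_embs, key=lambda item_id: cosine(query_emb, item_embs[item_id]), reverse=True)


def recall_and_ndcg_at_k(ranked_ids, target_id, k):
    """Recall@K and NDCG@K assuming exactly one relevant item per query."""
    top_k = ranked_ids[:k]
    if target_id not in top_k:
        return 0.0, 0.0
    rank = top_k.index(target_id)  # 0-based position of the relevant item
    # With a single relevant item, the ideal DCG is 1, so NDCG = 1 / log2(rank + 2).
    return 1.0, 1.0 / math.log2(rank + 2)
```

Averaging these two numbers over all queries gives the dataset-level scores; a relevant item ranked first yields 1.0 for both metrics.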
Reproduction
- Please refer to seq_rec_results for scripts that can reproduce our results on sequential recommendation.
- Please refer to product_search_results for scripts that can reproduce our results on product search.
Contact
Please let us know if you encounter a bug or have any suggestions or questions by filing an issue or emailing Yupeng Hou (@hyp1231) at yphou@ucsd.edu.
Acknowledgement
If you find the Amazon Reviews 2023 dataset, the BLaIR checkpoints, the Amazon-C4 dataset, or our scripts/code helpful, please cite the following paper:
@article{hou2024bridging,
title={Bridging Language and Items for Retrieval and Recommendation},
author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
journal={arXiv preprint arXiv:2403.03952},
year={2024}
}
The recommendation experiments in the BLaIR paper are implemented using the open-source recommendation library RecBole.
The pre-training scripts draw heavily on the Hugging Face language-modeling examples and SimCSE.
