<div align="center">

<img src="./assets/logo.png" width="500">

An Open-Source Framework for Scaling Generative Recommendation

Python License <a href="https://arxiv.org/abs/2510.24431"><img src="https://img.shields.io/static/v1?label=arXiv&message=Paper&color=red"></a>

<a href="https://arxiv.org/abs/2510.24431">📄 Technical Report</a> | <a href="https://huggingface.co/kkknight/MiniOneRec">🤗 Huggingface</a> | <a href="https://modelscope.cn/models/k925238839/MiniOneRec">🤖 Modelscope</a>

</div>

MiniOneRec is the first fully open-source generative recommendation framework, which provides an end-to-end workflow spanning SID construction, supervised fine-tuning (SFT), and recommendation-oriented reinforcement learning (RL).


📢 Announcement

  • 2026-01-04 — If results reproduced with an Instruct model diverge from our reported metrics, check whether the CC metric in the evaluation log is non-zero (see calc.py). A non-zero value means the model is still generating a large number of invalid items and constrained decoding has not taken effect. We suspect this is related to the versions of dependencies such as the transformers library; we are still investigating the cause to provide a universal fix. In the meantime, you can switch from the Instruct model to a base model, such as Qwen2.5-base, to avoid the problem.

  • 2025-12-04 — We added new scripts to support processing the Amazon23 dataset.

  • 2025-12-01 — We fixed a bug in data.py that could let the SID–item alignment task see the answers in advance. It stemmed from an earlier attempt to use partial trajectories to guide full SID–item generation and does not affect model performance.

  • 2025-11-20 — The RQ-Kmeans+ SID construction method has been updated (first proposed in GPR; this is the first open-source reproduction).

  • 2025-11-19 — We implemented a multi-GPU parallel text-to-embedding pipeline based on Accelerate (rq/text2emb/amazon_text2emb.py), which is significantly more efficient than the original version.

  • 2025-11-19 — The SID construction method in constrained-RQ-Kmeans has been updated.

  • 2025-11-07 — Thank you for submitting issues! Based on your feedback, we have released a new implementation. If you encounter any problems while running the code, please update to the latest version first.

  • 2025-11-07 — You can now choose to freeze the LLM parameters during the SFT stage and train only the embeddings for the newly added SID vocabulary.

  • 2025-10-31 — You can now directly download the checkpoints of our MiniOneRec model.

  • 2025-10-31 — The SID construction method in RQ-Kmeans has been updated.
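The embedding-only SFT option mentioned above (freezing the LLM and training just the newly added SID vocabulary) boils down to masking gradient updates for the original vocabulary rows. Below is a framework-agnostic numpy sketch of that masking idea; in PyTorch this is typically done with a gradient hook on the embedding weight. All sizes and names here are illustrative, not taken from the repo:

```python
import numpy as np

# Illustrative sizes: original LLM vocabulary plus newly added SID tokens.
OLD_VOCAB, NEW_SIDS, DIM = 100, 8, 16
weight = np.random.randn(OLD_VOCAB + NEW_SIDS, DIM)

# 0 for frozen LLM rows, 1 for trainable SID rows.
trainable_mask = np.zeros((OLD_VOCAB + NEW_SIDS, 1))
trainable_mask[OLD_VOCAB:] = 1.0

def masked_sgd_step(weight, grad, lr=0.1):
    """Apply an SGD update only to the new SID embedding rows."""
    return weight - lr * (grad * trainable_mask)
```

After a step with this mask, the first OLD_VOCAB rows are bit-identical to the original weights while only the SID rows move.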


🛠️ Key Techniques

<div align="center"> <img src="./assets/minionerec_framework.png" width="100%"> </div>
  • SID Construction: MiniOneRec begins by transforming every product into a compact, semantically meaningful token. It concatenates an item’s title and description, feeds this sentence through a frozen text encoder, and then quantises the resulting embedding with a three-level RQ-VAE.

  • SFT: With all items rewritten as SIDs, the model is first trained in a supervised fashion. It views the chronologically ordered user history as a token sequence and learns, via next-token prediction, to generate the SID of the next product the user is likely to consume. Crucially, this stage is co-trained with a set of language-alignment objectives that map back and forth between natural language and SID space, allowing the recommender to inherit the world knowledge embedded in large language models while grounding that knowledge in discrete item codes.

  • Recommendation-Oriented RL: After SFT, MiniOneRec is further polished with a recommendation-oriented RL phase based on GRPO. Multiple candidate recommendations are generated for each prompt, their rewards are normalised within the group to stabilise gradients, and a KL penalty keeps the updated policy close to its reference. Because the action space is a closed list of item SIDs, the system switches to constrained beam search, which guarantees that every beam is unique and valid, greatly improving sampling efficiency and diversity. The reward signal itself blends a binary correctness term with a rank-aware component that penalises high-probability yet incorrect items more heavily, and can be augmented with collaborative-filtering scores. Together, this pipeline couples dense linguistic knowledge with collaborative signals, yielding a high-performance, lightweight generative recommendation system.
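As a rough illustration of the reward shaping described above, the snippet below sketches GRPO-style group-wise reward normalisation and a rank-aware penalty. The exact weighting MiniOneRec uses is not reproduced here; this is a hedged approximation with made-up function names:

```python
import math

def group_normalized_advantages(rewards):
    """GRPO-style: normalise rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std if std > 0 else 1.0) for r in rewards]

def rank_aware_reward(candidate, target, rank, beam_size):
    """1.0 for the correct item; a wrong item sampled at a better
    (more probable) rank gets a larger penalty. Illustrative weighting."""
    if candidate == target:
        return 1.0
    return -(beam_size - rank) / beam_size  # rank 0 = most probable
```

For example, a group of rewards [1.0, 0.0, 0.0, 1.0] normalises to advantages [1.0, -1.0, -1.0, 1.0], so correct and incorrect samples pull the policy in opposite directions with equal force.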


📊 Evaluation

<div align="center"> <img src="./assets/minionerec_main_result.png" width="100%"> </div>
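The Top-K metrics reported above, HR@K and NDCG@K (computed in evaluate.py), reduce to the following for a single ground-truth item per user. This is a minimal sketch of the standard definitions, not the repo's exact code:

```python
import math

def hit_rate_at_k(ranked, target, k):
    """1.0 if the target item appears in the top-k ranked list."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k):
    """Single-target NDCG: the ideal DCG is 1 (target at rank 1),
    so the score is the discounted gain at the target's position."""
    for i, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(i + 2)
    return 0.0
```

Both metrics are averaged over all test users; NDCG additionally rewards placing the target higher in the list rather than merely inside the top K.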

🗂️ Repository Overview

| File / Directory | Description |
| --- | --- |
| sft.sh | Shell script to start the Supervised Fine-Tuning (SFT) stage |
| sft.py | Python implementation of the SFT training loop |
| sft_gpr.py | GPR-inspired SFT with Value-Aware Fine-Tuning (VAFT): implements weighted loss based on simulated item value |
| rl.sh | Shell script to start the Reinforcement Learning (RL) stage |
| rl.py | Python implementation of the RL training loop |
| rl_gpr.py | GPR-inspired RL with Hierarchy Enhanced Policy Optimization (HEPO) |
| minionerec_trainer.py | MiniOneRec trainer: GRPO-based trainer specialized for generative recommendation |
| configs/ | YAML configuration files |
| evaluate.sh | One-click offline Top-K evaluation script |
| evaluate.py | Evaluation utilities for computing HR@K and NDCG@K |
| LogitProcessor.py | Logit processor for constrained decoding (Python implementation) |
| data.py | Data pipeline for SFT and RL training |
| convert_dataset.py | Converts an RQ-trained dataset to the SFT-then-RL format |
| convert_dataset_gpr.py | GPR-inspired dataset converter: injects simulated heterogeneous tokens (U/E/I/O) to emulate unified input representation |
| data/amazon18_data_process.sh | Shell script to filter and preprocess Amazon18 data into an RQ-ready format |
| data/amazon18_data_process.py | Python implementation of the Amazon18 data preprocessing pipeline |
| data/amazon18_data_process_gpr.py | GPR-inspired Amazon18 preprocessing: extracts heterogeneous features for unified input representation |
| data/amazon23_data_process.sh | Shell script to filter and preprocess Amazon23 data into an RQ-ready format |
| data/amazon23_data_process.py | Python implementation of the Amazon23 data preprocessing pipeline |
| rq/text2emb/amazon_text2emb.sh | Shell script to generate item embeddings (title + description) via emb_model for the Amazon dataset |
| rq/text2emb/amazon_text2emb.py | Python implementation of the above embedding generation |
| rq/text2emb/amazon_text2emb_gpr.py | GPR-inspired text-to-embedding |
| rq/generate_indices.py | Generates the SID file after training an RQ-VAE model |
| rq/rqvae.sh | Shell script to train RQ-VAE on Amazon item embeddings |
| rq/rqvae.py | Python implementation of RQ-VAE training |
| rq/rqkmeans_faiss.py | Python implementation of RQ-Kmeans training based on faiss |
| rq/rqkmeans_constrained.py | Python implementation of constrained RQ-Kmeans |
| rq/rqkmeans_constrained.sh | Shell script to train constrained RQ-Kmeans on Amazon item embeddings |
| rq/rqkmeans_plus.py | Python implementation of RQ-Kmeans+ |
| rq/rqkmeans_plus.sh | Shell script to train RQ-Kmeans+ on Amazon item embeddings |
| rq/generate_indices_plus.py | Generates the SID file after training an RQ-Kmeans+ model |
| rq/generate_indices_plus.sh | Shell script to generate the SID file after training an RQ-Kmeans+ model |
| requirements.txt | List of Python dependencies |
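Constrained decoding (LogitProcessor.py) keeps generation inside the closed set of valid item SIDs. Independent of the repo's actual implementation, the core mechanism can be sketched as a prefix trie over SID token sequences that yields the allowed next tokens at each decoding step; in a real logit processor these allowed tokens would mask the logits. The SID values below are made up for illustration:

```python
# Each item SID is a fixed-length tuple of code tokens, e.g. three RQ levels.
VALID_SIDS = [(1, 4, 2), (1, 4, 7), (3, 0, 5)]

def build_trie(sids):
    """Nested-dict prefix trie over the SID token sequences."""
    trie = {}
    for sid in sids:
        node = trie
        for tok in sid:
            node = node.setdefault(tok, {})
    return trie

def allowed_next_tokens(trie, prefix):
    """Tokens that keep the partial sequence on a path to a valid SID."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []  # prefix already invalid: nothing is allowed
    return sorted(node.keys())
```

During beam search, masking every token outside `allowed_next_tokens` guarantees each finished beam is a real item, which is why the CC (invalid-generation) metric should be zero when constrained decoding works.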


🚀 Quickstart
