SlateQ
A comparison of Google's SlateQ algorithm with traditional reinforcement learning algorithms
Install / Use
/learn @collinprather/SlateQ
Reinforcement Learning for Recommender Systems
Summary
Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behaviour. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items—which may have interacting effects on user choice—methods are required to deal with the combinatorics of the RL action space.
Google’s SlateQ algorithm addresses this challenge by decomposing the long-term value (LTV) of a slate into a tractable function of its component item-wise LTVs. In this repo, we compare the efficiency of SlateQ to that of other RL methods, such as vanilla Q-learning, which do not decompose the LTV of a slate into item-wise LTVs.
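The decomposition can be sketched concretely. Under a conditional choice model, SlateQ writes the Q-value of a whole slate as the choice-probability-weighted sum of item-wise LTVs: Q(s, A) = Σ_{i∈A} P(i | s, A) · Q̄(s, i). A minimal sketch in plain Python, assuming a multinomial-logit choice model; the scores, LTVs, and `null_score` below are hypothetical placeholders, not values from the paper:

```python
import math

def slate_q_value(item_scores, item_ltvs, null_score=1.0):
    """SlateQ decomposition (sketch): the LTV of a slate is the sum of
    its items' long-term values Q_bar(s, i), each weighted by the
    probability that the user chooses that item under a
    multinomial-logit choice model.

    item_scores: user-affinity logits for the items on the slate
    item_ltvs:   item-wise long-term values Q_bar(s, i)
    null_score:  logit of the "no click" option
    """
    weights = [math.exp(v) for v in item_scores]
    denom = sum(weights) + math.exp(null_score)  # includes no-click mass
    return sum(w / denom * ltv for w, ltv in zip(weights, item_ltvs))

# Naive slate Q-learning must score every possible slate, i.e.
# choose(|D|, k) actions; the decomposition needs only |D| item-wise
# values, which is what makes the RL problem tractable.
q = slate_q_value(item_scores=[0.5, 1.2], item_ltvs=[3.0, 4.0])
```

Because the no-click option absorbs some probability mass, the slate's value is always strictly below the best item's LTV, and an item with overwhelming score drives the slate value toward its own LTV.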
Results
In our simulated environment, SlateQ outperforms traditional Q-learning approaches across multiple metrics.

Environment
Here, we use the interest evolution environment from the RecSim library (GitHub repo) to train our RL agents.

Important Links
- Problem Formulation Document
- Exploratory Notebook on the interest evolution environment
- Notebook comparing RL techniques
- Presentation
Contributors
Collin Prather and Shishir Kumar are Master's students in Data Science at the University of San Francisco.
Thanks to Prof. Brian Spiering for introducing us to the wonderful world of RL.
As required by the recsim library, this repo uses Python 3.6.

