MRT

Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning".

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

This repository contains the code for our paper, "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning." The paper introduces an approach to optimizing test-time compute through meta reinforcement learning, with the aim of balancing the efficiency and discovery capabilities of Large Language Models (LLMs).

Citation

If you use our work or codebase in your research, please cite our paper:

@misc{qu2025optimizingtesttimecomputemeta,
      title={Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning},
      author={Yuxiao Qu and Matthew Y. R. Yang and Amrith Setlur and Lewis Tunstall and Edward Emanuel Beeching and Ruslan Salakhutdinov and Aviral Kumar},
      year={2025},
      eprint={2503.07572},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.07572},
}
