Selective Entropy Regularization (SIREN) 🧜‍♀️
<div align="center"> <p> </p> <a href="https://arxiv.org/abs/2509.25133"><img src="https://img.shields.io/badge/arXiv-2509.25133-b31b1b.svg?logo=arxiv&logoColor=white" alt="arXiv:2509.25133"></a> </div>
This repository contains the official implementation of Selective Entropy Regularization (SIREN), introduced in our paper: Rethinking Entropy Regularization in Large Reasoning Models.
SIREN addresses entropy collapse in Reinforcement Learning with Verifiable Rewards (RLVR) for large reasoning models, a setting in which naive entropy regularization breaks down. Built on the veRL framework, our implementation modifies how entropy is computed and aggregated, as well as the overall training objective.
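To make these terms concrete, here is a minimal pure-Python sketch of the *naive* entropy regularization baseline: per-token entropy computed from logits, aggregated over response tokens with a padding mask, and subtracted from the policy-gradient loss as an exploration bonus. It is not SIREN's selective variant. The function names, the two aggregation mode strings, and the coefficient value are assumptions chosen for illustration; a real veRL run works on batched tensors rather than Python lists.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one next-token distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy H = -sum_v p_v * log(p_v) of one next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def aggregate(values, mask, mode="token-mean"):
    """Aggregate per-token values over a batch of responses.

    values, mask: lists of per-sequence lists (mask 1 = response token, 0 = padding).
    "token-mean": average over every valid token in the batch.
    "seq-mean-token-mean": average within each sequence, then across sequences.
    (Mode names are illustrative assumptions.)
    """
    if mode == "token-mean":
        num = sum(v * m for seq_v, seq_m in zip(values, mask) for v, m in zip(seq_v, seq_m))
        den = sum(m for seq_m in mask for m in seq_m)
        return num / max(den, 1)
    if mode == "seq-mean-token-mean":
        per_seq = []
        for seq_v, seq_m in zip(values, mask):
            n = sum(seq_m)
            if n > 0:
                per_seq.append(sum(v * m for v, m in zip(seq_v, seq_m)) / n)
        return sum(per_seq) / max(len(per_seq), 1)
    raise ValueError(f"unknown mode: {mode}")


def naive_entropy_regularized_loss(pg_loss, logits, mask, coef=0.001, mode="token-mean"):
    """Naive objective: policy-gradient loss minus coef * aggregated token entropy.

    logits: [batch][seq_len][vocab] nested lists; coef is an arbitrary example value.
    """
    ent = [[token_entropy(tok) for tok in seq] for seq in logits]
    return pg_loss - coef * aggregate(ent, mask, mode)
```

The choice of aggregation matters: under `"token-mean"`, longer responses contribute more tokens and therefore more weight to the entropy term, while `"seq-mean-token-mean"` gives each response equal weight regardless of length.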
Installation ⚙️
We recommend creating a clean conda environment to avoid dependency conflicts.
conda create -n siren python=3.10
conda activate siren
pip install -r requirements.txt
# install verl
cd verl
pip install -e .
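After installation, a quick check like the following can confirm that the interpreter matches the recommended Python 3.10 and that `verl` is importable from the editable install. The function name `check_environment` and its default package list are hypothetical; extend `packages` with anything else from `requirements.txt` you want to verify.

```python
import importlib.util
import sys


def check_environment(required=(3, 10), packages=("verl",)):
    """Return a list of human-readable problems found in the current environment."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(required):
        problems.append(
            f"expected Python {required[0]}.{required[1]}, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```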
Usage 🍽️
Prepare data
huggingface-cli download --repo-type dataset --resume-download Elliott/Openr1-Math-46k-8192 --local-dir data
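If you prefer to fetch the data from Python, `snapshot_download` from the `huggingface_hub` library should perform the same download as the CLI command above. In this sketch, the helper `list_data_files`, its default suffixes, and the assumption that the dataset ships as Parquet or JSON files are illustrative and not taken from this repository.

```python
from pathlib import Path


def list_data_files(local_dir="data", suffixes=(".parquet", ".json", ".jsonl")):
    """List downloaded data files under local_dir (recursively), sorted by path."""
    root = Path(local_dir)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)


def download_dataset(local_dir="data"):
    # Imported lazily so list_data_files works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Returns the path of the local folder containing the downloaded snapshot.
    return snapshot_download(
        repo_id="Elliott/Openr1-Math-46k-8192",
        repo_type="dataset",
        local_dir=local_dir,
    )


if __name__ == "__main__":
    download_dataset()
    for path in list_data_files():
        print(path)
```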
Running
We provide example scripts for both training and evaluation.
# training
bash exp_scripts/siren.sh
# evaluation
bash exp_scripts/eval.sh
- The training script (exp_scripts/siren.sh) sets default hyperparameters; adjust them to suit your experimental setup.
Acknowledgement 🫰
We thank the open-source communities behind the following projects for their valuable contributions:
- Frameworks: veRL, vLLM, Math-Verify
- Datasets: MATH, NuminaMath, OpenR1-Math-220k
- Backbones: Qwen2.5-Math, Llama-3.1
Citation 📜
If you find our work useful in your research, please consider citing:
@misc{jiang2025rethinkingentropyregularizationlarge,
title={Rethinking Entropy Regularization in Large Reasoning Models},
author={Yuxian Jiang and Yafu Li and Guanxu Chen and Dongrui Liu and Yu Cheng and Jing Shao},
year={2025},
eprint={2509.25133},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2509.25133},
}
