Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
The official code repository for the paper of the same title.
<div align="center" style="font-family: Arial, sans-serif;"> <p> <a href="#🎉news" style="text-decoration: none; font-weight: bold;">🎉 News</a> • <a href="#📖introduction" style="text-decoration: none; font-weight: bold;">📖 Introduction</a> </p> <p> <a href="#🎈citation" style="text-decoration: none; font-weight: bold;">🎈 Citation</a> • <a href="#🌻acknowledgement" style="text-decoration: none; font-weight: bold;">🌻 Acknowledgement</a> • <a href="#📬contact" style="text-decoration: none; font-weight: bold;">📬 Contact</a> </p> </div>

🎉News
- [2025/10/09] Released our paper on arXiv (https://arxiv.org/abs/2510.08549). We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models.
📖Introduction
<div align="left"> <img src="./docs/static/images/1_all.svg" alt="main results" style="width: 92%; height: auto;"> </div>

We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models. Our approach demonstrates broad effectiveness across different domains:
1) for large language models (LLMs), boosting the AIME 2025 score for Qwen2.5-Math-7B by 37.4%;
2) for continuous control reinforcement learning agents, improving performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench;
3) for image classification, enhancing ImageNet top-1 accuracy by 0.69% for ResNet-50.

These gains are achieved with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
🚀 Large Language Models
Entropy comparison and pass@k results for GRPO with ERA (ours) versus GRPO. The entropy curves demonstrate that ERA mitigates entropy collapse and establishes a clear lower bound. The pass@k results further indicate that ERA enhances exploration and strengthens the model’s reasoning ability.
For large language models, we apply an activation layer to the logits $z$ to obtain a transformed set $z'$. This layer adaptively modulates the logit values based on the response entropy $H_{\text{resp}}$ and token advantage $A_t$:
$$ z' = \begin{cases} k z & H_{\text{resp}} < \omega_{\text{low}},\; A_{t}>0 \\ z & \omega_{\text{low}} \leq H_{\text{resp}} \leq \omega_{\text{high}} \\ \tfrac{1}{k} z & H_{\text{resp}} > \omega_{\text{high}},\; A_{t}>0 \end{cases} $$
To ensure the stability of the policy update, we apply an inverse scaling factor to the advantages of the modified tokens:
$$ A'_t = \begin{cases} \tfrac{1}{k} A_t & H_{\text{resp}} < \omega_{\text{low}},\; A_{t}>0 \\ A_t & \omega_{\text{low}} \leq H_{\text{resp}} \leq \omega_{\text{high}} \\ k A_t & H_{\text{resp}} > \omega_{\text{high}},\; A_{t}>0 \end{cases} $$
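A minimal sketch of this paired transformation (an illustrative helper, not the official implementation; the function name and the default k=0.5 are assumptions — with 0 < k < 1, shrinking the logits flattens the token distribution and raises entropy, while dividing by k sharpens it, and the advantages are scaled inversely to keep the policy-gradient magnitude stable):

```python
import numpy as np

def era_scale(logits, advantages, resp_entropy, omega_low, omega_high, k=0.5):
    """Hypothetical sketch of ERA's paired logit/advantage scaling.

    logits:       (T, V) per-token logits of one response
    advantages:   (T,) token advantages A_t
    resp_entropy: scalar response entropy H_resp
    """
    z = np.array(logits, dtype=float)
    A = np.array(advantages, dtype=float)
    pos = A > 0                               # only positive-advantage tokens
    if resp_entropy < omega_low:              # entropy below floor: z' = k*z, A' = A/k
        z[pos] *= k
        A[pos] /= k
    elif resp_entropy > omega_high:           # entropy above ceiling: z' = z/k, A' = k*A
        z[pos] /= k
        A[pos] *= k
    return z, A                               # unchanged when H_resp is in range
```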
This allows ERA to be integrated seamlessly into on-policy algorithms, resulting in the following GRPO objective:
$$ J(\theta) = \mathbb{E}_t \left[\mathbb{E}_{a_t\sim \pi_\theta(\cdot \mid s_t)} \log \pi_\theta'(a_t\mid s_t)\, A'_t \right] $$
🦾 Continuous Control
Main Results of ERA in Continuous Control. Aggregate normalized performance on HumanoidBench (6 tasks, with SAC), DMC (Humanoid & Dog) (6 tasks, with TD-MPC2), HumanoidBench (8 tasks, with FastSAC), and MuJoCo Gym (4 tasks, with PPO). ERA consistently accelerates learning and achieves superior asymptotic performance.
In continuous control, we enforce a minimum entropy on the final policy by constraining the underlying Gaussian's entropy to a higher value. This is achieved by adjusting the Gaussian's standard deviation, $\sigma$. Our activation function $g(\cdot)$ computes the final parameters $(\mu', \sigma')$ as:
$$ \mu' = \mu,\quad \sigma'_i = \exp\left[\max \left(\log \sigma_{\max} + \frac{\left(\mathcal{H}_0' - D\log \sqrt{2\pi e} - D \log \sigma_{\max}\right) e^{\hat{\sigma}_i}}{\sum_{j=1}^{D} e^{\hat{\sigma}_j}},\ \log \sigma_{\min}\right)\right] $$
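The softmax over the unconstrained outputs $\hat{\sigma}$ distributes an entropy "budget" across action dimensions; absent clamping at $\sigma_{\min}$, the resulting diagonal Gaussian has entropy exactly $\mathcal{H}_0'$. A sketch under that reading (illustrative, not the official code; function names are assumptions):

```python
import numpy as np

def era_gaussian_sigma(sigma_hat, target_entropy, sigma_min=1e-4, sigma_max=10.0):
    """Hypothetical sketch of the sigma activation g(.)."""
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    D = sigma_hat.size
    w = np.exp(sigma_hat - sigma_hat.max())
    w /= w.sum()                               # softmax weights over dimensions
    # Entropy budget left after starting every dimension at sigma_max.
    budget = target_entropy - D * np.log(np.sqrt(2 * np.pi * np.e)) - D * np.log(sigma_max)
    log_sigma = np.maximum(np.log(sigma_max) + budget * w, np.log(sigma_min))
    return np.exp(log_sigma)

def gaussian_entropy(sigma):
    """Entropy of a diagonal Gaussian N(mu, diag(sigma^2))."""
    sigma = np.asarray(sigma, dtype=float)
    return np.sum(np.log(sigma)) + sigma.size * np.log(np.sqrt(2 * np.pi * np.e))
```

Since the softmax weights sum to one, the per-dimension budget terms sum back to $\mathcal{H}_0' - D\log\sqrt{2\pi e} - D\log\sigma_{\max}$, which is why the entropy constraint holds with equality when no dimension hits the $\sigma_{\min}$ clamp.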
Here, $\mathcal{H}_0'$ is the target entropy plus a compensation parameter $\delta \ge 0$ to account for the bounding bias. This parameter can be a constant or automatically tuned by minimizing the following loss:
$$ L(\hat{\delta}) = \mathbb{E}_{s \sim \mathcal{D}} \left[\hat{\delta}\left(\mathcal{H}[\pi(\cdot\mid s)] - \mathcal{H}_0\right)\right] $$
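Since $\partial L / \partial \hat{\delta} = \mathbb{E}[\mathcal{H}[\pi] - \mathcal{H}_0]$, a gradient-descent step on this loss grows $\hat{\delta}$ whenever the realized entropy falls below the target and shrinks it otherwise. A one-step sketch (illustrative; the nonnegativity floor is an assumption consistent with $\delta \ge 0$):

```python
def update_delta(delta, entropies, target_entropy, lr=1e-3, floor=0.0):
    """One gradient-descent step on L(delta) = E[delta * (H - H0)].

    entropies: batch of realized policy entropies H[pi(.|s)] for s ~ D.
    """
    grad = sum(entropies) / len(entropies) - target_entropy  # dL/d(delta)
    return max(floor, delta - lr * grad)                     # keep delta >= 0
```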
Policy Visualization
<div align="center"> <table> <tr> <td align="center">Dog Run<br><img src="./docs/static/images/dog-run.gif" alt="Dog Run" width="160"></td> <td align="center">Dog Walk<br><img src="./docs/static/images/dog-walk.gif" alt="Dog Walk" width="160"></td> <td align="center">Humanoid Run<br><img src="./docs/static/images/humanoid-run.gif" alt="Humanoid Run" width="160"></td> <td align="center">Humanoid Walk<br><img src="./docs/static/images/humanoid-walk.gif" alt="Humanoid Walk" width="160"></td> </tr> <tr> <td align="center">H1 Run<br><img src="./docs/static/images/h1-run.gif" alt="H1 Run" width="160"></td> <td align="center">H1 Walk<br><img src="./docs/static/images/h1-walk.gif" alt="H1 Walk" width="160"></td> <td align="center">H1 Slide<br><img src="./docs/static/images/h1-slide.gif" alt="H1 Slide" width="160"></td> <td align="center">H1 Stand<br><img src="./docs/static/images/h1-stand.gif" alt="H1 Stand" width="160"></td> </tr> </table> </div>

🖼️ Image Classification
In discrete classification, regularizing predictive entropy is crucial for preventing overconfidence. For a softmax policy, we transform the pre-activation logits $z$ into $z'$ to ensure the policy's entropy is at least a target value $\mathcal{H}_0$:
$$ z'_i = h^{-1}\left[\max \left(\frac{\log \tau}{\tau} + \left(C_{\mathcal{H}_0} - n \frac{\log \tau}{\tau}\right)\frac{1}{D-1}\left(1 - \frac{e^{z_i}}{\sum_{j=1}^{D} e^{z_j}}\right),\ 0\right)\right] $$
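A literal sketch of this transform (illustrative only: the invertible map $h$ is defined in the paper, so an identity placeholder stands in for $h^{-1}$ here, and $C_{\mathcal{H}_0}$, $\tau$, and $n$ are passed as free parameters rather than derived):

```python
import numpy as np

def era_classify_logits(z, c_h0, tau, n, h_inv=lambda x: x):
    """Hypothetical sketch of the classification logit transform.

    h_inv is a placeholder for h^{-1} from the paper (identity here);
    c_h0, tau, n are the constants C_H0, tau, n from the formula.
    """
    z = np.asarray(z, dtype=float)
    D = z.size
    p = np.exp(z - z.max())
    p /= p.sum()                               # softmax probabilities
    base = np.log(tau) / tau
    inner = base + (c_h0 - n * base) * (1.0 - p) / (D - 1)
    return h_inv(np.maximum(inner, 0.0))       # clamp at 0, then invert h
```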
Unlike label smoothing, which applies uniform regularization, ERA allows the model to learn a structured, input-dependent uncertainty distribution, tailoring the regularization to each sample for greater expressive capacity and improved performance.
Performance on ImageNet and CIFAR-10
Top-1 and Top-5 accuracy (%) on ImageNet and CIFAR-10. We compare ERA against the original ResNet-50 baseline. Δ denotes the absolute improvement of ERA. All models are trained for 200 epochs.
Comparison with Other Regularization Methods
To investigate the effectiveness of ERA against common regularization methods, we conducted comparative experiments on CIFAR-10 against various intensities of Label Smoothing and Dropout. The results below show that increasing label smoothing intensity can harm performance, and dropout offers marginal gains. In contrast, ERA consistently and effectively enhances model performance, validating its advantage over conventional regularization methods.
🎈Citation
If you find this work useful in your research, please consider citing:
```bibtex
@misc{kang2025entropyregularizingactivationboosting,
      title={Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints},
      author={Zilin Kang and Chonghua Liao and Tingqiang Xu and Huazhe Xu},
      year={2025},
      eprint={2510.08549},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.08549},
}
```
🌻 Acknowledgement
Our code is built upon the following open-source projects. We sincerely thank the authors for their contributions to the community.
We also thank the following people for their valuable discussions and suggestions:
📬 Contact
For questions, discussion, or collaboration opportunities, feel free to contact:
- Zilin Kang (kzl22@mails.tsinghua.edu.cn)
- Chonghua Liao (lch22@mails.tsinghua.edu.cn)
- Tingqiang Xu (xtq23@mails.tsinghua.edu.cn)