
<div align="center">

ThetaEvolve: Test-time Learning on Open Problems

Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, Hao Cheng, Pengcheng He, Weizhu Chen, Shuohang Wang, Simon Shaolei Du*, Yelong Shen*

<br>

Paper · Code · X Summary

</div>

Outline

We introduce ThetaEvolve, an open-source pipeline that simplifies (e.g., by using a single LLM) and extends AlphaEvolve to efficiently scale both ❄️in-context learning and 🔥RL training at test time.

With ThetaEvolve, an 8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀:

⭕Circle packing:

  • AlphaEvolve (Gemini-2.0-Flash/Pro): 2.63586276

  • Ours (R1-Qwen3-8B): 2.63598308
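The margin over AlphaEvolve is small in absolute terms; a quick check of the gap between the two scores above:

```python
# Gap between our circle-packing score and AlphaEvolve's (values from above).
alphaevolve = 2.63586276
ours = 2.63598308
gap = ours - alphaevolve
print(f"{gap:.8f}")  # 0.00012032
```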

Figure 1

Setup

Our RL environment follows the same setup as slime and OpenEvolve. We use Docker (run in the ThetaEvolve folder):

# Reproducible setup (recommended): pin the exact image digest.
# This digest corresponds to slimerl/slime:latest at the time of writing.
SLIME_IMAGE="slimerl/slime@sha256:704eb14e1b02ef229e4ab440981aa81b543716c335e2af32cb32ffdc030e3008"
docker pull "${SLIME_IMAGE}"

# Start the container
docker run --rm --name slime-evolve \
  --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --ulimit nofile=1048576:1048576 \
  -v "$PWD":/workspace -w /workspace \
  -v /path/to/disk:/data \
  -it "${SLIME_IMAGE}" /bin/bash

If you explicitly want the newest image instead of reproducibility, you can use:

docker pull slimerl/slime:latest

latest is mutable and may change over time. For reproducible experiments, always use the pinned digest.

You can verify the exact digest pulled on your machine with:

docker inspect --format='{{join .RepoDigests "\n"}}' slimerl/slime

After entering the docker, run the installation commands:

cd /workspace
pip install -e .
cd openevolve_adapted
pip install --ignore-installed blinker
rm -rf openevolve.egg-info && pip install -e .
cd ..

Tasks

You can find our tasks in openevolve_adapted/examples. It is easy to extend the pipeline to more tasks with continuous objective values.
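A task in this style boils down to an evaluator that maps a candidate solution to a continuous objective value. As a rough, hypothetical sketch (the function name and interface are illustrative, not the exact openevolve_adapted evaluator API), a packing-style evaluator could look like:

```python
# Hypothetical evaluator sketch: names and interface are illustrative,
# not the exact openevolve_adapted API.
import math

def evaluate(centers, radii):
    """Return a continuous score (sum of radii) if the packing is valid,
    else 0.0. Circles must fit in the unit square without overlapping."""
    # Containment: each circle must lie inside [0, 1] x [0, 1].
    for (x, y), r in zip(centers, radii):
        if x - r < 0 or x + r > 1 or y - r < 0 or y + r > 1:
            return 0.0
    # Pairwise non-overlap: center distance >= sum of radii.
    n = len(radii)
    for i in range(n):
        for j in range(i + 1, n):
            dx = centers[i][0] - centers[j][0]
            dy = centers[i][1] - centers[j][1]
            if math.hypot(dx, dy) < radii[i] + radii[j]:
                return 0.0
    return sum(radii)

# A trivially valid packing: two circles of radius 0.25 in opposite corners.
score = evaluate([(0.25, 0.25), (0.75, 0.75)], [0.25, 0.25])
print(score)  # 0.5
```

Any task whose quality can be scored this way (a single real-valued objective, with 0.0 or a floor value for invalid outputs) slots into the same loop.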

Run

To run the experiments, you can change the parameters in run.sh and then run bash run.sh directly (note that the 8B model requires at least 8×80GB GPUs, e.g., A100s).

First, remember to set SAVE_PATH to store checkpoints:

export SAVE_PATH=/path/to/disk/save

Then, for example, to run prorl-v2-1.5B on circle packing with RL training and the original score as reward, set:

#### Model selection ####
SMALL_MODEL_NAME="dpsk_prorl_v2_1.5b"

#### Task configuration ####
TASK="circle_packing_modular"

#### CONFIG_POSTFIX options ####
CONFIG_POSTFIX="it_XL"

#### Training mode: True for training, False for inference-only ####
IS_TRAINING=True

#### Training parameters ####
# Options: "original_reward", "rl_normalized_reward"
REWARD_PROCESS_TYPE="original_reward"

#### Lazy output penalty ####
# 1 -> child = parent
# 2 -> child = any program in database
LAZY_OUTPUT_PENALTY=1
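The lazy-output penalty discourages the model from returning an essentially unchanged program. As a hedged illustration of the two modes above (the real pipeline's comparison, e.g., its code normalization, may differ), mode 1 checks the child against its parent, and mode 2 checks it against every program in the database:

```python
# Illustrative sketch of the two LAZY_OUTPUT_PENALTY modes; the actual
# implementation's program comparison may normalize code differently.
def normalize(code: str) -> str:
    # Crude normalization: strip surrounding whitespace on each line.
    return "\n".join(line.strip() for line in code.strip().splitlines())

def is_lazy(child: str, parent: str, database: list[str], mode: int) -> bool:
    if mode == 1:  # penalize child == parent
        return normalize(child) == normalize(parent)
    if mode == 2:  # penalize child == any program in the database
        return any(normalize(child) == normalize(p) for p in database)
    raise ValueError("mode must be 1 or 2")

db = ["x = 1", "x = 2"]
print(is_lazy("x = 1 ", "y = 3", db, 1))  # False: differs from its parent
print(is_lazy("x = 1 ", "y = 3", db, 2))  # True: matches a database program
```

Mode 2 is the stricter setting: a child can dodge mode 1 by copying a sibling, but not mode 2.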

Finally set the wandb configurations:

WANDB_API_KEY=aaa
WANDB_ENTITY=bbb
WANDB_PROJECT=ccc

Then you can directly run

bash run.sh

Recommended logging for future reference

Use a fixed data root and keep per-run metadata + logs:

export SAVE_PATH=/data/thetaevolve
mkdir -p "${SAVE_PATH}"/{runs,logs}

RUN_TS=$(date +%Y%m%d_%H%M%S)
RUN_LOG_DIR="${SAVE_PATH}/runs/${RUN_TS}"
mkdir -p "${RUN_LOG_DIR}"

# Save reproducibility info
git rev-parse HEAD > "${RUN_LOG_DIR}/git_commit.txt"
cp run.sh "${RUN_LOG_DIR}/run.sh.snapshot"
cp scripts_evolve/Nemotron-Research-Reasoning-Qwen-1.5B/general.sh "${RUN_LOG_DIR}/general.sh.snapshot"

# Launch and tee logs
bash run.sh 2>&1 | tee "${RUN_LOG_DIR}/train.log"

This preserves the exact run script/config used for each experiment.

You can also adjust more parameters in scripts_evolve/Nemotron-Research-Reasoning-Qwen-1.5B/general.sh, such as the checkpoint saving frequency (default 10), the number of evaluation threads (default 16), and the number of GPUs (default 8).

Results

Some of the results we obtained are available in Results. You can run python vis.py to see the verification results in each sub-task directory.

For example, we have our best-known solution for circle packing (with zero tolerance) in Results/CirclePacking/figs/8B-w_RL@65-Formal.png and AlphaEvolve's solution in Results/CirclePacking/figs/AlphaEvolve.png:

<div align="center"> <img src="Results/CirclePacking/figs/8B-w_RL@65-Formal.png" width="49%"> <img src="Results/CirclePacking/figs/AlphaEvolve.png" width="47%"> </div>

We point out that our solution is better than AlphaEvolve’s, and that our configuration is asymmetric, whereas AlphaEvolve’s solution is symmetric.

The program that found it (with the 1e-6 tolerance used in OpenEvolve verification, detailed in the paper) is shown in Results/CirclePacking/programs/8B-w_RL@65.py. For the formal one (with zero tolerance, as in AlphaEvolve), the program is shown in Results/CirclePacking/programs/8B-w_RL@65-Formal.py. The latter has a dedicated function for determining how much to shrink the radii, but in general you can get close results by shrinking the radii by a small value such as 1e-9.
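The idea of turning a within-tolerance solution into a zero-tolerance one by shrinking radii can be sketched as follows (a simplified illustration of uniform shrinking, not the specific shrinking function in 8B-w_RL@65-Formal.py):

```python
# Simplified sketch: uniformly shrink all radii by a small epsilon so that
# constraints that held only within tolerance now hold exactly.
import math

def shrink(radii, eps=1e-9):
    return [r - eps for r in radii]

def is_valid(centers, radii):
    # Exact (zero-tolerance) checks: containment in the unit square, no overlap.
    for (x, y), r in zip(centers, radii):
        if x - r < 0 or x + r > 1 or y - r < 0 or y + r > 1:
            return False
    for i in range(len(radii)):
        for j in range(i + 1, len(radii)):
            d = math.hypot(centers[i][0] - centers[j][0],
                           centers[i][1] - centers[j][1])
            if d < radii[i] + radii[j]:
                return False
    return True

centers = [(0.25, 0.5), (0.75, 0.5)]
radii = [0.25 + 5e-10, 0.25]  # violates constraints by ~5e-10 (well within 1e-6)
print(is_valid(centers, radii))          # False under zero tolerance
print(is_valid(centers, shrink(radii)))  # True after shrinking by 1e-9
```

Shrinking costs at most n·eps of objective value, which is negligible at eps = 1e-9.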

We also provide results from other tasks for visualization.

If you want to run these programs or the initial program, you can set the parameters from the config file:

TASK="circle_packing_modular"
CONFIG_POSTFIX="it_XL"

# test command with verifier
OPENEVOLVE_CONFIG_PATH=$PWD/examples/${TASK}/configs/config_${TASK}_${CONFIG_POSTFIX}.yaml \
PYTHONPATH=$PWD \
python $PWD/examples/${TASK}/evaluators/evaluator_modular.py \
$PWD/examples/${TASK}/initial_programs/initial_program.py

Or you can just replace the program path in the command above with one of these programs and rerun it directly.

Citation

If you find our work useful, please consider citing:

@article{wang2025thetaevolve,
  title={ThetaEvolve: Test-time Learning on Open Problems},
  author={Wang, Yiping and Su, Shao-Rong and Zeng, Zhiyuan and Xu, Eva and Ren, Liliang and Yang, Xinyu and Huang, Zeyi and He, Xuehai and Ma, Luyao and Peng, Baolin and Cheng, Hao and He, Pengcheng and Chen, Weizhu and Wang, Shuohang and Du, Simon Shaolei and Shen, Yelong},
  journal={arXiv preprint arXiv:2511.23473},
  year={2025}
}
