<div align="center">

# 🎲 MeepleLM

**A Virtual Playtester Simulating Diverse Subjective Experiences in Board Games**

</div>

<p align="center"> <img src="./assets/overview.png" alt="MeepleLM Framework Overview" width="850"/> </p>

<p align="center"> <img alt="Python" src="https://img.shields.io/badge/Python-3.8+-blue?logo=python&logoColor=white"> <img alt="License" src="https://img.shields.io/badge/License-MIT-green"> <a href="https://github.com/hiyouga/LLaMA-Factory"><img alt="Training" src="https://img.shields.io/badge/Training-LLaMA--Factory-orange"></a> <a href="https://docs.vllm.ai/en/latest/"><img alt="Inference" src="https://img.shields.io/badge/Inference-vLLM-blueviolet"></a> <a href="https://arxiv.org/abs/2601.07251"><img alt="Paper" src="https://img.shields.io/badge/arXiv-2601.07251-red"></a> </p>

## 📖 Table of Contents
- 📜 Abstract
- 📂 File Structure
- 💾 Datasets
- 🤖 Models & Checkpoints
- 🚀 Training
- ⚡ Inference & Evaluation
- 📄 Citation
## 📜 Abstract

Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental for harmonizing Human-AI collaboration, as it empowers designers to refine their creations via external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics connecting rules to gameplay without an explicit engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms the latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.
## 📂 File Structure

```text
.
├── assets/            # Project images and figures
├── data/
│   ├── metadata/      # Meta-info (Game IDs, names, BGG stats, splits)
│   ├── finetuning/    # Alpaca-formatted datasets
│   ├── reviews/       # Filtered review data
│   └── rulebooks/     # Structured Markdown rulebooks
├── checkpoints/       # LoRA adapters for MeepleLM & ablations
├── training/          # YAML configurations for LLaMA-Factory
├── inference/         # Inference scripts (vLLM example)
└── results/           # Generated critiques
```
## 💾 Datasets

We provide the complete pipeline data, from raw sources to instruction-tuning-ready files.

- `data/metadata/`:
  - `game_info.json`: mappings of Game ID to metadata (Name, Rank, Weight, Year).
  - `test_games_list.json`: the official evaluation split (207 games) used in the paper.
- `data/finetuning/`: ready-to-use Alpaca-format datasets for SFT. Each folder contains `_train.json` and `_test.json` files.
  - `MeepleLM/`: full dataset with MDA CoT reasoning chains.
  - `wo_MDA/`: ablation without reasoning chains.
  - `wo_Persona/`: ablation without persona profiles.
  - `wo_Rulebook/`: ablation without rule context.
- `data/rulebooks/`: the corpus of 1,727 processed rulebooks in Markdown format.
- `data/reviews/`: the filtered high-quality review corpus used to construct the training data.
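An Alpaca-format record pairs an instruction with optional input context and a target output. As a rough illustration of what one MeepleLM training example might look like (the field contents below are invented placeholders, not taken from the released dataset):

```python
import json

# Hypothetical Alpaca-format record; the real files in data/finetuning/
# follow the same three-field schema consumed by LLaMA-Factory.
record = {
    "instruction": "You are the following player persona. Read the rulebook "
                   "and write this player's review of the game.",
    "input": "Persona: <persona profile>\nRulebook: <rulebook markdown>",
    "output": "<MDA reasoning chain followed by the persona's review>",
}

# Alpaca files are JSON arrays of such records.
dataset = [record]
print(json.dumps(dataset, indent=2))
```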
## 🤖 Models & Checkpoints
We provide LoRA adapters trained on Qwen3-8B. These can be loaded easily using vLLM.
| Model Variant | Description | Path |
| --- | --- | --- |
| MeepleLM (Ours) | Full model with Persona-conditioning and MDA reasoning. | ./checkpoints/MeepleLM/ |
| w/o MDA | Ablation removing Chain-of-Thought reasoning. | ./checkpoints/wo_MDA/ |
| w/o Persona | Ablation using a generic player prompt. | ./checkpoints/wo_Persona/ |
| w/o Rulebook | Ablation relying solely on internal knowledge. | ./checkpoints/wo_Rulebook/ |
### Serving with vLLM

You can serve the model with the LoRA adapter enabled. For example, to serve MeepleLM:

```bash
vllm serve Qwen/Qwen3-8B \
    --enable-lora \
    --lora-modules MeepleLM=checkpoints/MeepleLM \
    --served-model-name MeepleLM \
    --port 8000
```
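Once the server is up, the adapter can be queried through vLLM's OpenAI-compatible API. A minimal request sketch using only the standard library (the persona and rulebook strings are placeholders; the endpoint is the standard `/v1/chat/completions` route):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # local vLLM server

# OpenAI-style chat request targeting the served LoRA adapter by name.
payload = {
    "model": "MeepleLM",  # matches --served-model-name / --lora-modules above
    "messages": [
        {"role": "user",
         "content": "Persona: <persona profile>\nRulebook: <rulebook text>\n"
                    "Write this player's review of the game."},
    ],
    "temperature": 0.7,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(request)  # uncomment with a running server
```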
## 🚀 Training

All models were trained using the LLaMA-Factory framework. We provide the exact YAML configurations used for our experiments in the `training/` directory.

To reproduce the training process:

- **Install LLaMA-Factory:** refer to the official repository for installation instructions.
- **Register datasets:** add the paths from `data/finetuning/` to LLaMA-Factory's `data/dataset_info.json`.
- **Run training:**

```bash
llamafactory-cli train training/train_meeplelm.yaml
```

(Note: config files for the ablation studies are also provided in the `training/` folder.)
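The registration step adds one entry per training file to `dataset_info.json`. A hedged sketch of a minimal entry (the dataset key and file path here are illustrative; check the actual file names under `data/finetuning/` and LLaMA-Factory's data documentation for additional fields):

```python
import json

# Hypothetical dataset_info.json entry for the full MeepleLM training file.
# LLaMA-Factory resolves "file_name" relative to its data/ directory.
entry = {
    "meeplelm_train": {
        "file_name": "finetuning/MeepleLM/_train.json",
    }
}
print(json.dumps(entry, indent=2))
```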
## ⚡ Inference

The `inference/` directory contains scripts to generate virtual playtest results.

- `playtest_inference.py`: a sample script designed to work with the MeepleLM checkpoint served via vLLM. It iterates through the test-set games, applying the persona constraints to generate reviews.
- `results/`: stores the output JSON files generated by the model (e.g., `results/inference_meeplelm/`).

Note: the provided inference script is configured for the MeepleLM LoRA adapter and a local vLLM server. If you wish to evaluate other models or use different API endpoints, please modify the `API_URL` and `MODEL_NAME` parameters in the script accordingly.
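The per-game loop described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual `playtest_inference.py`; `generate_review` stands in for the chat-completion request sent to the vLLM server:

```python
import json

API_URL = "http://localhost:8000/v1/chat/completions"  # local vLLM server
MODEL_NAME = "MeepleLM"                                # served adapter name


def generate_review(persona: str, rulebook: str) -> str:
    """Stand-in for the HTTP call to API_URL with MODEL_NAME."""
    return f"[{MODEL_NAME}] review as {persona}: ..."


def run_playtest(games: list, personas: list) -> list:
    """Iterate test-set games, producing one persona-conditioned review each."""
    results = []
    for game in games:
        for persona in personas:
            results.append({
                "game_id": game["id"],
                "persona": persona,
                "review": generate_review(persona, game["rulebook"]),
            })
    return results


games = [{"id": 1, "rulebook": "<markdown rulebook text>"}]
out = run_playtest(games, ["strategy gamer", "casual player"])
print(json.dumps(out, indent=2))  # written to a results/ JSON file in practice
```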
## 📄 Citation

If you use MeepleLM, the rulebook dataset, or the persona taxonomy in your research, please cite our paper:

```bibtex
@article{li2026meeplelm,
  title={MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences},
  author={Li, Zizhen and Li, Chuanhao and Wang, Yibin and Feng, Yukang and Sun, Jianwen and Ai, Jiaxin and Zhang, Fanrui and Sun, Mingzhu and Huang, Yifei and Zhang, Kaipeng},
  journal={arXiv preprint arXiv:2601.07251},
  year={2026}
}
```