<div align="center">

# 🎲 MeepleLM

**A Virtual Playtester Simulating Diverse Subjective Experiences in Board Games**

</div>

<p align="center"> <img src="./assets/overview.png" alt="MeepleLM Framework Overview" width="850"/> </p>

<p align="center"> <img alt="Python" src="https://img.shields.io/badge/Python-3.8+-blue?logo=python&logoColor=white"> <img alt="License" src="https://img.shields.io/badge/License-MIT-green"> <a href="https://github.com/hiyouga/LLaMA-Factory"><img alt="Training" src="https://img.shields.io/badge/Training-LLaMA--Factory-orange"></a> <a href="https://docs.vllm.ai/en/latest/"><img alt="Inference" src="https://img.shields.io/badge/Inference-vLLM-blueviolet"></a> <a href="https://arxiv.org/abs/2601.07251"><img alt="Paper" src="https://img.shields.io/badge/arXiv-2601.07251-red"></a> </p>

## 📖 Table of Contents
- 📜 Abstract
- 📂 File Structure
- 💾 Datasets
- 🤖 Models & Checkpoints
- 🚀 Training
- ⚡ Inference & Evaluation
- 📄 Citation
## 📜 Abstract

Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental for harmonizing Human-AI collaboration, as it empowers designers to refine their creations via external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics connecting rules to gameplay without an explicit engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms the latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.
## 📂 File Structure

```text
.
├── assets/            # Project images and figures
├── data/
│   ├── metadata/      # Meta-info (Game IDs, names, BGG stats, splits)
│   ├── finetuning/    # Alpaca-formatted datasets
│   ├── reviews/       # Filtered review data
│   └── rulebooks/     # Structured Markdown rulebooks
├── checkpoints/       # LoRA adapters for MeepleLM & ablations
├── training/          # YAML configurations for LLaMA-Factory
├── inference/         # Inference scripts (vLLM example)
└── results/           # Generated critiques
```
## 💾 Datasets

We provide the complete pipeline data, from raw sources to instruction-tuning-ready files.

- `data/metadata/`:
  - `game_info.json`: mappings of Game ID to metadata (Name, Rank, Weight, Year).
  - `test_games_list.json`: the official evaluation split (207 games) used in the paper.
- `data/finetuning/`: ready-to-use Alpaca-format datasets for SFT. Each folder contains `_train.json` and `_test.json` files.
  - `MeepleLM/`: full dataset with MDA CoT reasoning chains.
  - `wo_MDA/`: ablation without reasoning chains.
  - `wo_Persona/`: ablation without persona profiles.
  - `wo_Rulebook/`: ablation without rule context.
- `data/rulebooks/`: the corpus of 1,727 processed rulebooks in Markdown format.
- `data/reviews/`: the filtered high-quality review corpus used to construct the training data.
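An Alpaca-format record pairs an instruction with optional input context and a target output. As a rough illustration of what one MeepleLM training example might look like (the field contents below are invented placeholders, not taken from the released dataset):

```python
import json

# Hypothetical Alpaca-format record; the real files in data/finetuning/
# follow the same three-field schema consumed by LLaMA-Factory.
record = {
    "instruction": "You are the following player persona. Read the rulebook "
                   "and write this player's review of the game.",
    "input": "Persona: <persona profile>\nRulebook: <rulebook markdown>",
    "output": "<MDA reasoning chain followed by the persona's review>",
}

# Alpaca files are JSON arrays of such records.
dataset = [record]
print(json.dumps(dataset, indent=2))
```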
## 🤖 Models & Checkpoints
We provide LoRA adapters trained on Qwen3-8B. These can be loaded easily using vLLM.
| Model Variant | Description | Path |
| --- | --- | --- |
| MeepleLM (Ours) | Full model with Persona-conditioning and MDA reasoning. | ./checkpoints/MeepleLM/ |
| w/o MDA | Ablation removing Chain-of-Thought reasoning. | ./checkpoints/wo_MDA/ |
| w/o Persona | Ablation using a generic player prompt. | ./checkpoints/wo_Persona/ |
| w/o Rulebook | Ablation relying solely on internal knowledge. | ./checkpoints/wo_Rulebook/ |
### Serving with vLLM

You can serve the model with the LoRA adapter enabled. For example, to serve MeepleLM:

```bash
vllm serve Qwen/Qwen3-8B \
    --enable-lora \
    --lora-modules MeepleLM=checkpoints/MeepleLM \
    --served-model-name MeepleLM \
    --port 8000
```
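Once the server is up, the adapter can be queried through vLLM's OpenAI-compatible API. A minimal request sketch using only the standard library (the persona and rulebook strings are placeholders; the endpoint is the standard `/v1/chat/completions` route):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # local vLLM server

# OpenAI-style chat request targeting the served LoRA adapter by name.
payload = {
    "model": "MeepleLM",  # matches --served-model-name / --lora-modules above
    "messages": [
        {"role": "user",
         "content": "Persona: <persona profile>\nRulebook: <rulebook text>\n"
                    "Write this player's review of the game."},
    ],
    "temperature": 0.7,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(request)  # uncomment with a running server
```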
## 🚀 Training

All models were trained using the LLaMA-Factory framework. We provide the exact YAML configurations used for our experiments in the `training/` directory.

To reproduce the training process:

- **Install LLaMA-Factory:** refer to the official repository for installation instructions.
- **Register datasets:** add the paths from `data/finetuning/` to LLaMA-Factory's `data/dataset_info.json`.
- **Run training:**

```bash
llamafactory-cli train training/train_meeplelm.yaml
```

(Note: config files for the ablation studies are also provided in the `training/` folder.)
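The registration step adds one entry per training file to `dataset_info.json`. A hedged sketch of a minimal entry (the dataset key and file path here are illustrative; check the actual file names under `data/finetuning/` and LLaMA-Factory's data documentation for additional fields):

```python
import json

# Hypothetical dataset_info.json entry for the full MeepleLM training file.
# LLaMA-Factory resolves "file_name" relative to its data/ directory.
entry = {
    "meeplelm_train": {
        "file_name": "finetuning/MeepleLM/_train.json",
    }
}
print(json.dumps(entry, indent=2))
```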
## ⚡ Inference

The `inference/` directory contains scripts to generate virtual playtest results.

- `playtest_inference.py`: a sample script designed to work with the MeepleLM checkpoint served via vLLM. It iterates through the test-set games, applying the persona constraints to generate reviews.
- `results/`: stores the output JSON files generated by the model (e.g., `results/inference_meeplelm/`).

Note: the provided inference script is configured for the MeepleLM LoRA adapter and a local vLLM server. If you wish to evaluate other models or use different API endpoints, please modify the `API_URL` and `MODEL_NAME` parameters in the script accordingly.
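The per-game loop described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual `playtest_inference.py`; `generate_review` stands in for the chat-completion request sent to the vLLM server:

```python
import json

API_URL = "http://localhost:8000/v1/chat/completions"  # local vLLM server
MODEL_NAME = "MeepleLM"                                # served adapter name


def generate_review(persona: str, rulebook: str) -> str:
    """Stand-in for the HTTP call to API_URL with MODEL_NAME."""
    return f"[{MODEL_NAME}] review as {persona}: ..."


def run_playtest(games: list, personas: list) -> list:
    """Iterate test-set games, producing one persona-conditioned review each."""
    results = []
    for game in games:
        for persona in personas:
            results.append({
                "game_id": game["id"],
                "persona": persona,
                "review": generate_review(persona, game["rulebook"]),
            })
    return results


games = [{"id": 1, "rulebook": "<markdown rulebook text>"}]
out = run_playtest(games, ["strategy gamer", "casual player"])
print(json.dumps(out, indent=2))  # written to a results/ JSON file in practice
```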
## 📄 Citation

If you use MeepleLM, the rulebook dataset, or the persona taxonomy in your research, please cite our paper:

```bibtex
@article{li2026meeplelm,
  title={MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences},
  author={Li, Zizhen and Li, Chuanhao and Wang, Yibin and Feng, Yukang and Sun, Jianwen and Ai, Jiaxin and Zhang, Fanrui and Sun, Mingzhu and Huang, Yifei and Zhang, Kaipeng},
  journal={arXiv preprint arXiv:2601.07251},
  year={2026}
}
```