ChessHarness
Pit LLM models against each other in chess. Configure any combination of OpenAI, OpenAI ChatGPT/Codex, Google Gemini, Anthropic, Kimi, or GitHub Copilot Chat models as White and Black — or run a full knockout tournament — then watch them play with move validation, check/checkmate detection, PGN export, and a live reasoning feed showing each model's thinking.

I got the idea after watching GothamChess's series where he makes AI models play each other. I noticed that most of the problems came from the models lacking context on the board state and from weak move validation, so I built this to fix both.
Features
- Multi-provider — OpenAI, OpenAI ChatGPT/Codex, Google Gemini, Anthropic, Kimi, GitHub Copilot Chat, OpenRouter
- Rich context per turn — FEN + ASCII board, or PNG image for vision models; per-player chat history so models can plan across turns; optional valid-move injection (details)
- Live reasoning panel — see each model's chain-of-thought as it streams in
- Move history — click any move to replay the game from that position
- Knockout tournaments — bracket view, byes, configurable draw handling
- PGN export — optionally annotated with model reasoning
- Custom starting position — pass any FEN to start mid-game
- Reconnecting WebSocket — survives network blips
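The custom-starting-position feature hinges on the supplied FEN actually parsing. A minimal sketch of such a check using the python-chess library (the function name is illustrative; the harness may validate differently):

```python
import chess

def is_valid_fen(fen: str) -> bool:
    """Return True if `fen` parses as a FEN position string."""
    try:
        chess.Board(fen)  # raises ValueError on a malformed FEN
        return True
    except ValueError:
        return False

print(is_valid_fen(chess.STARTING_FEN))       # True
print(is_valid_fen("not a real fen string"))  # False
```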
Screenshots
Game Setup
Pick your models, set board input mode, token limits, and reasoning effort before starting.

Live Game
Board, move history, player panels, and real-time reasoning — all in one view.

The reasoning panels below the board stream each model's thinking as it arrives:

Tournament Setup
Seed up to 16 models into a knockout bracket, choose draw-handling rules, and launch.

Setup
cp config.example.yaml config.yaml # add your API keys
uv run web_main.py # backend on :8000
cd frontend && npm run dev # Vite dev server on :5173
Then open http://localhost:5173.
Configuration
Edit config.yaml to define which models are available. At startup the UI loads all connected providers automatically.
providers:
  openai:
    api_key: "sk-..."
    models:
      - id: gpt-5.2
        name: "GPT-5.2"
        supports_vision: true
  google:
    api_key: "AIza..."
    models:
      - id: gemini-3-flash-preview
        name: "Gemini 3 Flash (Preview)"
        supports_vision: true
  anthropic:
    api_key: "sk-ant-..."
    models:
      - id: claude-sonnet-4-6
        name: "Claude Sonnet 4.6"
        supports_vision: true
Additional providers (openai_chatgpt / Codex, copilot_chat, openrouter) follow the same pattern; see config.example.yaml for full details.
Notes:
- max_output_tokens is a per-move/per-response setting, not a full-game budget.
- For openai_chatgpt (Codex endpoint), max_output_tokens may be ignored because some Codex deployments reject a max-token parameter.
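A config in this shape is easy to consume. A hypothetical sketch (assuming PyYAML; the embedded YAML mirrors the example above and this is not the harness's actual loader):

```python
import yaml

CONFIG = """
providers:
  openai:
    api_key: "sk-..."
    models:
      - id: gpt-5.2
        name: "GPT-5.2"
        supports_vision: true
"""

config = yaml.safe_load(CONFIG)

# Flatten every configured model into (provider, model id) pairs.
available = [
    (provider, model["id"])
    for provider, settings in config["providers"].items()
    for model in settings["models"]
]
print(available)  # [('openai', 'gpt-5.2')]
```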
How It Works
The core insight is that the model isn't the bottleneck — the scaffolding is. Every turn each player model receives:
- FEN + ASCII board, or a PNG image for vision-capable models (last move highlighted)
- The full move history of the game
- An optional list of every legal move in the position
- Their own persistent conversation thread across the whole game, so they can build and execute multi-move plans rather than responding in isolation
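The per-turn context above can be assembled with a few python-chess calls. A minimal sketch; the function name and exact formatting are illustrative, not the harness's real prompt template:

```python
import chess

def build_turn_context(board: chess.Board, history_san: list[str]) -> str:
    """Assemble the text context one player sees for a single turn."""
    legal = sorted(board.san(m) for m in board.legal_moves)
    return "\n".join([
        f"FEN: {board.fen()}",
        "Board:",
        str(board),  # ASCII diagram, White at the bottom
        f"Moves so far: {' '.join(history_san) or '(game start)'}",
        f"Legal moves: {', '.join(legal)}",
    ])

print(build_turn_context(chess.Board(), []))
```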
If a model returns an illegal move it gets a specific error and a correction prompt injected into the next attempt (up to max_retries, default 3). Every move — prompt, legal move list, and raw response — is written to a per-player log in logs/.
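The retry loop can be sketched without any chess library at all; `ask_model` stands in for a provider call, and all names here are hypothetical:

```python
def get_legal_move(ask_model, legal_moves, max_retries=3):
    """Ask the model for a move, re-prompting with a specific error on failure."""
    prompt = "Your move?"
    for _ in range(max_retries):
        move = ask_model(prompt).strip()
        if move in legal_moves:
            return move
        # Inject a targeted correction into the next attempt.
        prompt = (f"'{move}' is illegal in this position. "
                  f"Choose one of: {', '.join(sorted(legal_moves))}. Your move?")
    raise RuntimeError(f"No legal move after {max_retries} attempts")

# A fake model that answers wrongly once, then correctly.
replies = iter(["Qh5#", "e4"])
print(get_legal_move(lambda p: next(replies), {"e4", "d4", "Nf3"}))  # e4
```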
See docs/context-handling.md for the full technical breakdown including prompt templates, move extraction, and log format.
Output
| Path | Contents |
|---|---|
| ./games/ | PGN file per game |
| ./logs/ | Full conversation log (prompts + raw responses) per player |
Press Stop Game or Ctrl+C to end a game early — the partial PGN is saved automatically.
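Exporting a partial game as PGN is straightforward with python-chess; a sketch of the idea (the harness's actual export, including reasoning annotations, may differ):

```python
import chess
import chess.pgn

board = chess.Board()
for san in ["e4", "e5", "Nf3"]:  # a game stopped after three plies
    board.push_san(san)

game = chess.pgn.Game.from_board(board)  # builds the PGN tree from the move stack
game.headers["Event"] = "ChessHarness demo"
print(game)  # headers plus movetext ending in "*" (unfinished game)
```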
Testing
uv run python -m pip install ".[test]"
uv run pytest -q
GitHub Actions runs the same suite on every push and pull request.
Auth
Providers can be connected in two ways:
- config.yaml — add api_key or bearer_token before starting the server
- Setup screen — paste a token in the Providers panel at runtime (saved to .chessharness_auth.json)
GitHub Copilot Chat supports a device-flow sign-in ("Sign in with GitHub") directly from the setup screen.
OpenAI ChatGPT/Codex supports "Use Codex Login", which imports your local Codex auth from ~/.codex/auth.json (run codex login first). This is separate from regular OpenAI API-key auth.
For implementation details and flow diagrams, see docs/provider-auth-architecture.md.