ArcadeAI
Tempest AI — Teaching a Neural Network to Play a 1981 Arcade Classic
Tempest AI is a reinforcement learning system that learns to play Atari's Tempest (1981) by watching the game run inside the MAME arcade emulator. A Lua script reads the game's memory every frame, a Python application trains a neural network on the GPU, and the network's decisions are fed back to the game controls — all in real time, at thousands of frames per second.
This README explains the architecture for programmers who may not be deep-learning specialists. No prior RL knowledge is assumed.
Table of Contents
- What This Project Does
- High-Level Architecture
- The MAME Side (Lua)
- The Python Side
- The Socket Bridge
- The Neural Network
- Training Loop
- The Expert System
- Exploration and Learning
- Reward Design
- Live Dashboard
- Running the System
- Project Structure
- Key Configuration
- Keyboard Controls
What This Project Does
The goal is to train an AI agent that plays Tempest better than its own teacher. It does this through a combination of:
- An expert system — a hand-coded rule engine that knows basic Tempest strategy (aim at the nearest enemy, avoid pulsars, dodge fuseballs).
- A deep neural network — a Rainbow DQN variant that starts by imitating the expert, then gradually takes over and discovers strategies the expert never knew.
- Reinforcement learning — the network learns from its own experience: what actions led to high scores and long survival, and what led to death.
High-Level Architecture
┌────────────────────────────────────────────────────────────────────────┐
│  MAME Emulator (one or more instances)                                 │
│                                                                        │
│  ┌──────────┐    reads    ┌──────────┐   serializes   ┌────────────┐   │
│  │ Tempest  │───memory───▶│   Lua    │───195 floats──▶│    TCP     │   │
│  │   ROM    │             │ Scripts  │   + rewards    │   Socket   │   │
│  │          │◀──controls──│          │◀──3 bytes──────│            │   │
│  └──────────┘ (fire, zap, └──────────┘  (fire, zap,   └─────┬──────┘   │
│                spinner)                 spinner)            │          │
│                                                             │          │
└─────────────────────────────────────────────────────────────┼──────────┘
                                                              │ TCP
┌─────────────────────────────────────────────────────────────┼──────────┐
│  Python Application                                         │          │
│                                                             ▼          │
│  ┌────────────┐  frames  ┌──────────────┐  batches  ┌──────────────┐   │
│  │   Socket   │─────────▶│    Replay    │──────────▶│   Training   │   │
│  │   Server   │          │    Buffer    │           │    Thread    │   │
│  │            │◀─action──│    (15M)     │           │    (GPU)     │   │
│  └────────────┘          └──────────────┘           └──────┬───────┘   │
│        │                                                   │           │
│        │  inference   ┌──────────────┐    weight sync      │           │
│        └─────────────▶│   Rainbow    │◀────────────────────┘           │
│                       │  DQN Model   │                                 │
│                       └──────────────┘                                 │
│                                                                        │
│  ┌────────────────┐                                                    │
│  │ Web Dashboard  │  http://localhost:8765                             │
│  └────────────────┘                                                    │
└────────────────────────────────────────────────────────────────────────┘
Multiple MAME instances can connect simultaneously. Each one is a separate "client" that plays its own game, generating training data in parallel. They can run on different client machines if desired.
The MAME Side (Lua)
MAME has a built-in Lua scripting engine. When you launch MAME with -autoboot_script, it runs a Lua script that can:
- Read and write the game's memory (the 6502 CPU's address space)
- Override input controls (fire button, zap button, spinner dial)
- Register a callback that runs every frame
What Happens Each Frame
The Lua code (Scripts/main.lua + modules) performs these steps every game frame:
- Read game state from memory. The script reads ~80 memory addresses to extract everything a human player could see: player position, enemy positions and types, enemy depths, shot positions, spike heights, level geometry, score, lives, and more.
- Compute the expert action. A rule-based expert system (Scripts/logic.lua) analyzes the game state and decides what it would do: which segment to aim for, whether to fire or use the superzapper, which direction to spin.
- Calculate rewards. Two reward signals are computed:
  - Objective reward: based on score changes (points from killing enemies)
  - Subjective reward: based on positioning quality (are you aimed at a threat? are you avoiding danger?)
- Serialize everything into a binary packet. The 195 game-state values are normalized to [-1, +1] floats. Along with rewards, expert recommendations, and metadata, they're packed into a ~780-byte binary message.
- Send the packet over a TCP socket. The Lua script connects to the Python server at startup and streams frames continuously.
- Receive a 3-byte action reply. The Python side responds with:
  - fire (0 or 1)
  - zap (0 or 1)
  - spinner (signed byte: -32 to +31, controlling rotation speed and direction)
- Apply the action to the game controls. The fire and zap buttons are set via MAME's I/O port API, and the spinner value is written directly to the memory address that the game reads for dial input.
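To make the 3-byte action reply concrete, here is a minimal Python sketch of how such a reply could be packed and unpacked. The function names `encode_action` and `decode_action` are hypothetical, and the layout (one signed byte each for fire, zap, and spinner, in that order) is an assumption based on the step list above rather than a copy of the project's code.

```python
import struct

# Spinner range taken from the step list above.
SPINNER_MIN, SPINNER_MAX = -32, 31


def encode_action(fire: bool, zap: bool, spinner: int) -> bytes:
    """Pack an action into 3 bytes: fire, zap, spinner (assumed order)."""
    # Clamp so an out-of-range network output can never overflow the byte.
    clamped = max(SPINNER_MIN, min(SPINNER_MAX, int(spinner)))
    # "b" is a signed 8-bit integer; three of them give exactly 3 bytes.
    return struct.pack("bbb", int(bool(fire)), int(bool(zap)), clamped)


def decode_action(reply: bytes) -> tuple:
    """Inverse of encode_action: returns (fire, zap, spinner)."""
    fire, zap, spinner = struct.unpack("bbb", reply)
    return bool(fire), bool(zap), spinner
```

Clamping on the sending side keeps the spinner inside the documented range even if the policy proposes a larger rotation.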
Lua Module Breakdown
| File | Purpose |
|----------------|----------------------------------------------------------------------------------|
| main.lua | Entry point: frame callback, socket I/O, binary serialization |
| state.lua | Game state classes: reads ~80 memory addresses into structured objects |
| logic.lua | Expert system: rule-based target selection, threat avoidance, reward calculation |
| display.lua | Optional on-screen debug overlay (game state, enemy tables) |
State Vector (195 Features)
The 195 normalized float values sent each frame include:
| Category             | Count | Examples                                                                     |
|----------------------|-------|------------------------------------------------------------------------------|
| Game state           | 5     | gamestate, game mode, countdown, lives, level                                |
| Player               | 23    | position, alive, depth, zap uses, 8 shot positions, 8 shot segments          |
| Level geometry       | 35    | level number, open/closed, shape, 16 spike heights, 16 tube angles           |
| Enemy global         | 23    | counts by type, spawn slots, speeds, pulsar state                            |
| Enemy per-slot (×7)  | 42    | type, direction, between-segments, moving away, can shoot, split behavior    |
| Enemy spatial (×7)   | 49    | segments, depths, top-of-tube flags, shot positions, pulsar lanes, top-rail  |
| Danger proximity     | 3     | nearest threat depth in player's lane, left, and right                       |
| Enemy velocity (×7)  | 14    | per-slot segment delta and depth delta from previous frame                   |
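As a rough illustration of what "normalized" means here, the sketch below maps a raw value from a known range onto [-1, +1] and packs a full 195-value vector. It is written in Python for consistency with the other examples, although the real normalization happens on the Lua side. The helper names `normalize` and `pack_state`, the clamping behavior, and the use of big-endian 32-bit floats are all assumptions; 32-bit floats are merely consistent with the ~780-byte message size mentioned earlier (195 × 4 = 780).

```python
import struct

NUM_FEATURES = 195  # state vector length stated in this README


def normalize(value: float, lo: float, hi: float) -> float:
    """Map value from [lo, hi] onto [-1, +1], clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = 2.0 * (value - lo) / (hi - lo) - 1.0
    return max(-1.0, min(1.0, scaled))


def pack_state(features) -> bytes:
    """Pack the feature vector as big-endian float32 (assumed wire format)."""
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
    return struct.pack(f">{NUM_FEATURES}f", *features)
```

For example, a hypothetical segment index in the range 0 to 15 would map 0 to -1.0 and 15 to +1.0, so every feature reaches the network on a comparable scale.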
The Python Side
The Python application (Scripts/main.py) is the brain of the system. It:
- Creates the neural network and loads any previously saved weights
- Starts a TCP socket server to accept connections from MAME instances
- Runs a background training thread that continuously improves the network
- Serves a live web dashboard for monitoring training progress
- Handles keyboard commands for real-time tuning
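The socket server in the list above can be pictured with a small sketch like the following. It uses only the Python standard library. The names `serve` and `handle_client` are hypothetical, and the reply is a placeholder 3-byte no-op action rather than a real network decision; a real handler would parse the framed packets and consult the model.

```python
import socket
import threading


def handle_client(conn: socket.socket, client_id: int) -> None:
    """Per-client loop: read data, answer with a placeholder action."""
    with conn:
        while True:
            data = conn.recv(4096)  # a real handler would parse framed packets
            if not data:
                break  # client disconnected
            conn.sendall(b"\x00\x00\x00")  # placeholder: no fire, no zap, spinner 0


def serve(server: socket.socket, stop: threading.Event) -> None:
    """Accept connections and spawn one handler thread per client."""
    server.settimeout(0.2)  # wake up periodically to check the stop flag
    next_id = 0
    while not stop.is_set():
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            continue
        threading.Thread(
            target=handle_client, args=(conn, next_id), daemon=True
        ).start()
        next_id += 1
```

A caller could create the listening socket with `socket.create_server(("127.0.0.1", 0))`, run `serve` in a background thread, and set the `stop` event at shutdown. One thread per client keeps each game's frame loop independent of the others.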
Threading Model
| Thread | Role |
|----------------------|-----------------------------------------------------------------------------|
| Main thread | Startup, periodic autosave, shutdown coordination |
| Socket server | Accepts MAME connections, spawns per-client handler threads |
| Per-client threads | Receive frames, request inference, send actions, store transitions |
| Training thread | Samples from replay buffer, runs gradient updates on GPU |
| Inference batcher | Collects inference requests across clients, runs batched GPU forward passes |
| Async replay buffer | Queues step() calls so client threads don't block on buffer writes |
| Stats reporter | Prints formatted metrics to the terminal every 30 seconds |
| Dashboard server | HTTP server for the live web UI |
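The inference batcher row is the least obvious entry in the table, so here is a minimal sketch of the idea: client threads submit single states, and one background thread groups them into a batch before calling the model once. The class name `InferenceBatcher`, its parameters, and the `model_fn` callable are hypothetical stand-ins, not the project's actual API.

```python
import queue
import threading
import time
from concurrent.futures import Future


class InferenceBatcher:
    """Collects single-state requests and answers them in batches."""

    def __init__(self, model_fn, max_batch: int = 32, max_wait_s: float = 0.002):
        self.model_fn = model_fn  # callable: list of states -> list of results
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.requests: queue.Queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def infer(self, state):
        """Called from a client thread; blocks until its result is ready."""
        future: Future = Future()
        self.requests.put((state, future))
        return future.result()

    def _run(self) -> None:
        while True:
            batch = [self.requests.get()]  # block until at least one request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break  # waited long enough; run with what we have
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            states = [state for state, _ in batch]
            try:
                results = self.model_fn(states)  # one call for the whole batch
                for (_, future), result in zip(batch, results):
                    future.set_result(result)
            except Exception as exc:
                for _, future in batch:
                    future.set_exception(exc)  # surface errors to the callers
```

The short wait window trades a little latency per frame for fewer, larger forward passes, which is usually the better deal on a GPU when many clients are connected.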
The Socket Bridge
The communication between Lua and Python uses a simple custom binary protocol over TCP:
Lua → Python (per frame)
[2 bytes: payload length (big-endian uint16)]
[payload]:
[2 bytes: num_values (uint16)]
[8 bytes: subjective reward (float64)]
[8 bytes: obj
