Tempest AI — Teaching a Neural Network to Play a 1981 Arcade Classic

Tempest AI is a reinforcement learning system that learns to play Atari's Tempest (1981) by watching the game run inside the MAME arcade emulator. A Lua script reads the game's memory every frame, a Python application trains a neural network on the GPU, and the network's decisions are fed back to the game controls — all in real time, at thousands of frames per second.

This README explains the architecture for programmers who may not be deep-learning specialists. No prior RL knowledge is assumed.

What This Project Does
High-Level Architecture
The MAME Side (Lua)
The Python Side
The Socket Bridge
The Neural Network
Training Loop
The Expert System
Exploration and Learning
Reward Design
Live Dashboard
Running the System
Project Structure
Key Configuration
Keyboard Controls

What This Project Does

The goal is to train an AI agent that plays Tempest better than its own teacher. It does this through a combination of:

An expert system — a hand-coded rule engine that knows basic Tempest strategy (aim at the nearest enemy, avoid pulsars, dodge fuseballs).
A deep neural network — a Rainbow DQN variant that starts by imitating the expert, then gradually takes over and discovers strategies the expert never knew.
Reinforcement learning — the network learns from its own experience: what actions led to high scores and long survival, and what led to death.

High-Level Architecture

┌────────────────────────────────────────────────────────────────────────┐
│  MAME Emulator (one or more instances)                                 │
│                                                                        │
│  ┌──────────┐    reads    ┌──────────┐   serializes   ┌────────────┐   │
│  │ Tempest  │───memory───▶│  Lua     │───195 floats──▶│  TCP       │   │
│  │ ROM      │             │  Scripts │   + rewards    │  Socket    │   │
│  │          │◀──controls──│          │◀──3 bytes──────│            │   │
│  └──────────┘   (fire,    └──────────┘   (fire,zap,   └─────┬──────┘   │
│                  zap,                     spinner)           │         │
│                  spinner)                                    │         │
└──────────────────────────────────────────────────────────────┼─────────┘
                                                               │ TCP
┌──────────────────────────────────────────────────────────────┼────────┐
│  Python Application                                          │        │
│                                                              ▼        │
│  ┌────────────┐  frames  ┌──────────────┐  batches  ┌──────────────┐  │
│  │  Socket    │─────────▶│   Replay     │──────────▶│  Training    │  │
│  │  Server    │          │   Buffer     │           │  Thread      │  │
│  │            │◀─action──│   (15M)      │           │  (GPU)       │  │
│  └────────────┘          └──────────────┘           └──────┬───────┘  │
│        │                                                   │          │
│        │ inference    ┌──────────────┐    weight sync      │          │
│        └─────────────▶│  Rainbow     │◀────────────────────┘          │
│                       │  DQN Model   │                                │
│                       └──────────────┘                                │
│                                                                       │
│  ┌────────────────┐                                                   │
│  │  Web Dashboard │  http://localhost:8765                            │
│  └────────────────┘                                                   │
└───────────────────────────────────────────────────────────────────────┘

Multiple MAME instances can connect simultaneously. Each one is a separate "client" that plays its own game, generating training data in parallel. The instances can run on separate client machines if desired.
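To make the replay buffer box in the diagram more concrete, here is a minimal sketch of a fixed-capacity ring buffer with uniform sampling. It is an illustration only: the class name `RingReplayBuffer` and its methods are hypothetical, and the project's actual buffer (labeled 15M in the diagram) may use a different layout or sampling scheme.

```python
import random
from typing import Any, List


class RingReplayBuffer:
    """Fixed-capacity buffer that overwrites the oldest transition when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: List[Any] = []
        self._next = 0  # index of the slot that will be written next

    def __len__(self) -> int:
        return len(self._items)

    def add(self, transition: Any) -> None:
        """Store one transition, replacing the oldest entry once the buffer is full."""
        if len(self._items) < self._capacity:
            self._items.append(transition)
        else:
            self._items[self._next] = transition
        self._next = (self._next + 1) % self._capacity

    def sample(self, batch_size: int) -> List[Any]:
        """Return batch_size distinct transitions chosen uniformly at random."""
        return random.sample(self._items, batch_size)
```

Because every client writes into the same buffer, a real implementation would also need a lock or a queue in front of `add`; that is left out here to keep the sketch short.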

The MAME Side (Lua)

MAME has a built-in Lua scripting engine. When you launch MAME with -autoboot_script, it runs a Lua script that can:

Read and write the game's memory (the 6502 CPU's address space)
Override input controls (fire button, zap button, spinner dial)
Register a callback that runs every frame

What Happens Each Frame

The Lua code (Scripts/main.lua + modules) performs these steps every game frame:

Read game state from memory — The script reads ~80 memory addresses to extract everything a human player could see: player position, enemy positions and types, enemy depths, shot positions, spike heights, level geometry, score, lives, and more.
Compute the expert action — A rule-based expert system (Scripts/logic.lua) analyzes the game state and decides what it would do: which segment to aim for, whether to fire or use the superzapper, which direction to spin.
Calculate rewards — Two reward signals are computed:
- Objective reward: based on score changes (points from killing enemies)
- Subjective reward: based on positioning quality (are you aimed at a threat? are you avoiding danger?)
Serialize everything into a binary packet — The 195 game-state values are normalized to [-1, +1] floats. Along with rewards, expert recommendations, and metadata, they're packed into a ~780-byte binary message.
Send the packet over a TCP socket — The Lua script connects to the Python server at startup and streams frames continuously.
Receive a 3-byte action reply — The Python side responds with:
- fire (0 or 1)
- zap (0 or 1)
- spinner (signed byte: -32 to +31, controlling rotation speed and direction)
Apply the action to the game controls — The fire and zap buttons are set via MAME's I/O port API, and the spinner value is written directly to the memory address that the game reads for dial input.
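The 3-byte action reply described in the steps above can be sketched in Python as follows. This assumes the bytes are sent in the order fire, zap, spinner and that the spinner is a two's-complement signed byte; the function names are hypothetical and are not taken from the project's code.

```python
import struct
from typing import Tuple

# Assumed wire layout: three signed bytes in the order fire, zap, spinner.
ACTION_FORMAT = "bbb"
SPINNER_MIN, SPINNER_MAX = -32, 31


def encode_action(fire: bool, zap: bool, spinner: int) -> bytes:
    """Pack an action into 3 bytes, clamping the spinner to its documented range."""
    spinner = max(SPINNER_MIN, min(SPINNER_MAX, int(spinner)))
    return struct.pack(ACTION_FORMAT, int(bool(fire)), int(bool(zap)), spinner)


def decode_action(reply: bytes) -> Tuple[bool, bool, int]:
    """Unpack a 3-byte reply into (fire, zap, spinner)."""
    if len(reply) != struct.calcsize(ACTION_FORMAT):
        raise ValueError("action reply must be exactly 3 bytes")
    fire, zap, spinner = struct.unpack(ACTION_FORMAT, reply)
    return bool(fire), bool(zap), spinner
```

Clamping on the sending side means an out-of-range network output can never reach the game as an invalid dial value.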

Lua Module Breakdown

| File        | Purpose                                                                          |
|-------------|----------------------------------------------------------------------------------|
| main.lua    | Entry point: frame callback, socket I/O, binary serialization                    |
| state.lua   | Game state classes: reads ~80 memory addresses into structured objects           |
| logic.lua   | Expert system: rule-based target selection, threat avoidance, reward calculation |
| display.lua | Optional on-screen debug overlay (game state, enemy tables)                      |

State Vector (195 Features)

The 195 normalized float values sent each frame include:

| Category            | Count | Examples                                                                  |
|---------------------|-------|---------------------------------------------------------------------------|
| Game state          | 5     | gamestate, game mode, countdown, lives, level                             |
| Player              | 23    | position, alive, depth, zap uses, 8 shot positions, 8 shot segments       |
| Level geometry      | 35    | level number, open/closed, shape, 16 spike heights, 16 tube angles        |
| Enemy global        | 23    | counts by type, spawn slots, speeds, pulsar state                         |
| Enemy per-slot (×7) | 42    | type, direction, between-segments, moving away, can shoot, split behavior |
| Enemy spatial (×7)  | 49    | segments, depths, top-of-tube flags, shot positions, pulsar lanes, top-rail |
| Danger proximity    | 3     | nearest threat depth in player's lane, left, and right                    |
| Enemy velocity (×7) | 14    | per-slot segment delta and depth delta from previous frame                |
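As an illustration of what "normalized" means here, the sketch below linearly maps a raw value from a known range onto [-1, +1]. The helper name `normalize` and the example range (a segment index from 0 to 15) are assumptions for illustration; the actual per-feature ranges are defined in the Lua scripts.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Linearly map value from [lo, hi] onto [-1.0, +1.0], clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = max(lo, min(hi, value))
    return 2.0 * (clamped - lo) / (hi - lo) - 1.0


# Hypothetical example: a segment index in the range 0..15.
print(normalize(0, 0, 15))    # -1.0
print(normalize(15, 0, 15))   # 1.0
print(normalize(7.5, 0, 15))  # 0.0
```

Keeping every feature on the same scale stops large-valued inputs such as scores from dominating small-valued ones such as flags.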

The Python Side

The Python application (Scripts/main.py) is the brain of the system. It:

Creates the neural network and loads any previously saved weights
Starts a TCP socket server to accept connections from MAME instances
Runs a background training thread that continuously improves the network
Serves a live web dashboard for monitoring training progress
Handles keyboard commands for real-time tuning

Threading Model

| Thread              | Role                                                                        |
|---------------------|-----------------------------------------------------------------------------|
| Main thread         | Startup, periodic autosave, shutdown coordination                           |
| Socket server       | Accepts MAME connections, spawns per-client handler threads                 |
| Per-client threads  | Receive frames, request inference, send actions, store transitions          |
| Training thread     | Samples from replay buffer, runs gradient updates on GPU                    |
| Inference batcher   | Collects inference requests across clients, runs batched GPU forward passes |
| Async replay buffer | Queues step() calls so client threads don't block on buffer writes          |
| Stats reporter      | Prints formatted metrics to the terminal every 30 seconds                   |
| Dashboard server    | HTTP server for the live web UI                                             |
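The inference batcher row is the least obvious entry in the table, so here is a minimal sketch of the idea using only the standard library. The class `InferenceBatcher`, its parameters, and the `model_fn` callable are all hypothetical stand-ins; in the real system the batched call would be a GPU forward pass, and the batch size and wait time here are arbitrary.

```python
import queue
import threading
import time
from concurrent.futures import Future


class InferenceBatcher:
    """Gathers single-state requests from many threads and evaluates them in batches."""

    def __init__(self, model_fn, max_batch=32, max_wait_s=0.002):
        self._model_fn = model_fn      # takes a list of states, returns one output per state
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s  # how long to wait for more requests to arrive
        self._requests = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def infer(self, state):
        """Called from a client thread; blocks until the batched result is ready."""
        future = Future()
        self._requests.put((state, future))
        return future.result()

    def close(self):
        """Stop the worker thread after it finishes its current batch."""
        self._stop.set()
        self._worker.join()

    def _run(self):
        while not self._stop.is_set():
            try:
                batch = [self._requests.get(timeout=0.05)]
            except queue.Empty:
                continue
            # Keep collecting until the batch is full or the wait budget is spent.
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._requests.get(timeout=remaining))
                except queue.Empty:
                    break
            try:
                outputs = self._model_fn([state for state, _ in batch])
            except Exception as exc:
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)
```

Each per-client thread simply calls `batcher.infer(state)`. The short wait trades a little latency for fewer, larger forward passes, which is usually the better deal on a GPU.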

The Socket Bridge

The communication between Lua and Python uses a simple custom binary protocol over TCP:
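Each Lua → Python message starts with a 2-byte big-endian length, as laid out in the frame format below. A receiver therefore has to read exactly that many bytes before parsing, since a single TCP read may return only part of a frame. The sketch below shows one way to do that; the helper names are hypothetical, and it deliberately stops at the raw payload rather than guessing at the field layout.

```python
import struct
from typing import BinaryIO


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream: BinaryIO) -> bytes:
    """Read one length-prefixed frame and return its payload bytes."""
    (length,) = struct.unpack(">H", read_exact(stream, 2))  # big-endian uint16
    return read_exact(stream, length)


def write_frame(payload: bytes) -> bytes:
    """Prefix a payload with its 2-byte big-endian length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit in a uint16 length prefix")
    return struct.pack(">H", len(payload)) + payload
```

With a real connection, `sock.makefile("rb")` gives a file-like object that can be passed to `read_frame` directly.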

Lua → Python (per frame)

[2 bytes: payload length (big-endian uint16)]
[payload]:
    [2 bytes: num_values (uint16)]
    [8 bytes: subjective reward (float64)]
    [8 bytes: obj
