# Backtest Platform
A distributed quantitative backtesting platform that supports high-throughput, priority-scheduled strategy backtesting with real-time progress tracking.
## Architecture

The platform consists of three main services communicating through Kafka and Redis:

- `backtest-console` — Go HTTP API server that accepts backtest requests, manages the task lifecycle, and routes jobs to the engine via Kafka priority queues.
- `backtest-engine` — Python distributed execution engine powered by Ray, consuming tasks from Kafka and running strategy backtests across a cluster of workers.
- `backtest-datasource` — Go ETL module that converts raw CSV market data into Parquet format for efficient columnar reads by the engine.
### Data Flow
1. A client submits a backtest task via the REST API (`backtest-console`).
2. The Task Controller updates the task state in MongoDB and produces the job to a Kafka priority topic (`tasks.High` / `tasks.Normal` / `tasks.Low`).
3. The `backtest-engine-scheduler` consumes from Kafka and dispatches jobs to the Ray Executor.
4. The Ray Executor fans out the workload across Ray Distribution Workers (one Actor per machine), reading market data from the shared Object Store with zero-copy DataFrame access.
5. Results are written back via Kafka (`tasks.Results`) and Redis (live progress and checkpoints).
6. The Task Controller consumes the results and notifies the original caller.
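The priority routing in step 2 amounts to a topic lookup keyed by the task's priority. A minimal sketch (illustrative names; the real producer lives in `backtest-console`'s `api/queue` package):

```python
# Map a task priority to the Kafka topic it should be produced to.
# The topic names match the ones created by `make init-topics`.
PRIORITY_TOPICS = {
    "high": "tasks.High",
    "normal": "tasks.Normal",
    "low": "tasks.Low",
}

def topic_for(priority: str) -> str:
    """Resolve which Kafka topic a submitted task is produced to."""
    try:
        return PRIORITY_TOPICS[priority.lower()]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}")

topic_for("High")  # -> "tasks.High"
```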
## Project Structure
```
backtest/
├── app/
│   ├── backtest-console/          # Go HTTP API server (Hertz)
│   │   ├── api/handler/           # REST handlers: task, strategy, data, dashboard
│   │   ├── api/service/           # Business logic
│   │   ├── api/store/             # MongoDB / Redis persistence
│   │   ├── api/queue/             # Kafka producer/consumer
│   │   ├── api/model/             # Domain models
│   │   └── conf/                  # YAML configuration
│   ├── backtest-engine/           # Python distributed engine (Ray)
│   │   ├── service/scheduler/     # Scheduler Register: Dispatcher, Scheduler, Collector
│   │   ├── service/engine/        # Ray Executor: Actor Pool, Data Manager, Fan-out
│   │   ├── components/mq/         # Kafka consumer wrapper
│   │   ├── components/strategies/ # Strategy interface & examples
│   │   └── conf/                  # Engine / scheduler / Ray cluster config
│   └── backtest-datasource/       # Go ETL: CSV → Parquet
│       └── datasource/            # CSVSource → Transformers → ParquetSink
├── data/
│   ├── csv/                       # Raw market data (input)
│   ├── parquet/                   # Converted columnar data
│   └── results/                   # Backtest output
├── deployments/
│   ├── docker-compose.yaml        # Full stack (console + engine + infra)
│   └── docker-compose.infra.yaml  # Infrastructure only (Kafka, MongoDB, Redis)
├── docs/
│   └── architecture.png
├── scripts/                       # Kafka topic init, data seeding helpers
└── Makefile
```
## Tech Stack
| Layer | Technology |
|---|---|
| API Server | Go 1.23, Hertz |
| Distributed Execution | Python 3.11+, Ray 2.9+ |
| Data Format | Apache Parquet (parquet-go / PyArrow) |
| Message Queue | Apache Kafka (KRaft mode, Confluent 7.6) |
| State / Progress Cache | Redis 7 |
| Task / Strategy Store | MongoDB 7 |
| Config | Viper + YAML (BACKTEST_ env var overrides) |
| Container | Docker Compose |
## Getting Started

### Prerequisites

- Docker & Docker Compose
- Go 1.23+
- Python 3.11+
### 1. Start Infrastructure

```bash
make infra-up
```

Starts Kafka (KRaft), MongoDB, and Redis locally.
### 2. Initialize Kafka Topics

```bash
make init-topics
```

Creates the required topics: `tasks.High`, `tasks.Normal`, `tasks.Low`, `tasks.Cancel`, `tasks.Results`.
### 3. Prepare Market Data

Place CSV files under `data/csv/`, then convert them to Parquet:

```bash
make csv-to-parquet
```
### 4. Build & Run

Local development:

```bash
# Terminal 1 — API server (default port: 8020)
make run-console

# Terminal 2 — backtest engine scheduler
make run-engine
```

Full Docker stack:

```bash
make docker-up
```
### 5. Run Tests

```bash
make test
```
## Configuration

`backtest-console` reads `app/backtest-console/conf/conf.local.yaml` by default. All keys can be overridden with environment variables prefixed by `BACKTEST_`:
```yaml
server:
  host: "0.0.0.0"
  port: 8020

mongodb:
  uri: "mongodb://localhost:27017"
  database: "backtest"

redis:
  addr: "localhost:6379"

kafka:
  brokers:
    - "localhost:9092"
  group_id: "backtest-console"

queue:
  type: "kafka"
  max_pending_tasks: 200

data:
  csv_dir: "../../data/csv"
  parquet_dir: "../../data/parquet"
  result_dir: "../../data/results"
```
`backtest-engine` uses `app/backtest-engine/conf/scheduler.yaml` and `engine.yaml`. For Ray cluster deployment, see `conf/ray_cluster.yaml`.
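The `BACKTEST_` override convention maps a dotted YAML key to an upper-cased, underscore-joined variable name (e.g. `server.port` → `BACKTEST_SERVER_PORT`). A simplified Python sketch of that lookup (Viper handles this in the actual Go console):

```python
# Simplified model of Viper-style env overrides: the environment wins
# over the YAML default when the derived variable name is set.
import os


def resolve(key: str, yaml_default, prefix: str = "BACKTEST"):
    """Return the env override for a dotted config key, else the YAML default."""
    env_name = f"{prefix}_{key.replace('.', '_').upper()}"
    return os.environ.get(env_name, yaml_default)


os.environ["BACKTEST_SERVER_PORT"] = "9000"
resolve("server.port", 8020)  # -> "9000" (env override wins, as a string)
resolve("mongodb.uri", "mongodb://localhost:27017")  # -> the YAML default,
# unless BACKTEST_MONGODB_URI is set
```

Note that env overrides arrive as strings; Viper coerces them to the target type on read.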
## API Overview
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/tasks | Submit a backtest task |
| GET | /api/v1/tasks/:id | Get task status & result |
| DELETE | /api/v1/tasks/:id | Cancel a pending task |
| GET | /api/v1/strategies | List available strategies |
| POST | /api/v1/data/convert | Trigger CSV → Parquet conversion |
| GET | /api/v1/dashboard | Aggregated platform metrics |
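As an illustration, a task-submission request can be built with the standard library. The body fields shown are assumptions — check the console's task handler for the real schema:

```python
# Hypothetical client for POST /api/v1/tasks; field names in the payload
# are assumed, not taken from the actual handler.
import json
import urllib.request


def build_submit_request(base_url: str, task: dict) -> urllib.request.Request:
    """Build (but do not send) a task-submission request."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_submit_request("http://localhost:8020", {
    "strategy": "BuyAndHold",  # assumed field names
    "symbol": "AAPL",
    "priority": "High",
})
# urllib.request.urlopen(req) would submit it once the console is running.
```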
## Kafka Topics
| Topic | Direction | Description |
|---|---|---|
| tasks.High | console → engine | High-priority backtest jobs |
| tasks.Normal | console → engine | Normal-priority backtest jobs |
| tasks.Low | console → engine | Low-priority backtest jobs |
| tasks.Cancel | console → engine | Task cancellation signals |
| tasks.Results | engine → console | Completed result callbacks |
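The scheduler's priority handling can be sketched as strict-priority polling over the three job topics, simulated here with in-memory deques instead of Kafka consumers:

```python
# Strict-priority dispatch: always drain higher-priority topics before
# lower ones. In the real scheduler these are Kafka consumers, not deques.
from collections import deque

queues = {"tasks.High": deque(), "tasks.Normal": deque(), "tasks.Low": deque()}


def next_job():
    """Return the next job, preferring High over Normal over Low."""
    for topic in ("tasks.High", "tasks.Normal", "tasks.Low"):
        if queues[topic]:
            return queues[topic].popleft()
    return None  # nothing pending


queues["tasks.Low"].append("low-1")
queues["tasks.High"].append("high-1")
next_job()  # -> "high-1", despite "low-1" arriving first
```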
## Ray Cluster Deployment

For multi-machine distributed backtesting, the Ray Executor spawns one Actor per node. Each Actor reads Parquet data from the Ray Object Store (zero-copy shared memory) and runs the strategy independently. Results are merged by the fan-out/split component before being written back.
See `app/backtest-engine/conf/ray_cluster.yaml` for cluster configuration.
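Stripped of Ray specifics, the fan-out/merge pattern described above reduces to sharding the workload and merging per-shard results. A pure-Python sketch (in production each shard runs inside a Ray Actor and the merge happens on collected futures):

```python
# Fan-out/merge skeleton: shard the symbol universe across workers, run
# each shard independently, then merge. `run_shard` is a stand-in for a
# real per-Actor backtest.
def run_shard(symbols: list[str]) -> dict:
    # Placeholder result: one metric per symbol.
    return {s: len(s) for s in symbols}


def fan_out(symbols: list[str], n_workers: int) -> dict:
    """Round-robin shard the symbols, run each shard, merge the results."""
    shards = [symbols[i::n_workers] for i in range(n_workers)]
    results: dict = {}
    for shard in shards:  # conceptually: ray.get over parallel actor calls
        results.update(run_shard(shard))
    return results


fan_out(["AAPL", "MSFT", "GOOG"], 2)
```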
