# Backtest Platform
A distributed quantitative backtesting platform that supports high-throughput, priority-scheduled strategy backtesting with real-time progress tracking.
## Architecture

The platform consists of three main services communicating through Kafka and Redis:

- `backtest-console` — Go HTTP API server that accepts backtest requests, manages the task lifecycle, and routes jobs to the engine via Kafka priority queues.
- `backtest-engine` — Python distributed execution engine powered by Ray, consuming tasks from Kafka and running strategy backtests across a cluster of workers.
- `backtest-datasource` — Go ETL module that converts raw CSV market data into Parquet format for efficient columnar reads by the engine.
### Data Flow
1. A client submits a backtest task via the REST API (`backtest-console`).
2. The Task Controller updates the task state in MongoDB and produces the job to a Kafka priority topic (`tasks.High` / `tasks.Normal` / `tasks.Low`).
3. The `backtest-engine-scheduler` consumes from Kafka and dispatches jobs to the Ray Executor.
4. The Ray Executor fans out the workload across Ray Distribution Workers (one Actor per machine), reading market data from the shared Object Store with zero-copy DataFrame access.
5. Results are written back via Kafka (`tasks.Results`) and Redis (live progress and checkpoints).
6. The Task Controller consumes the results and notifies the original caller.
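The priority routing in step 2 amounts to a topic lookup keyed by the task's priority. A minimal sketch (illustrative names; the real producer lives in `backtest-console`'s `api/queue` package):

```python
# Map a task priority to the Kafka topic it should be produced to.
# The topic names match the ones created by `make init-topics`.
PRIORITY_TOPICS = {
    "high": "tasks.High",
    "normal": "tasks.Normal",
    "low": "tasks.Low",
}

def topic_for(priority: str) -> str:
    """Resolve which Kafka topic a submitted task is produced to."""
    try:
        return PRIORITY_TOPICS[priority.lower()]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}")

topic_for("High")  # -> "tasks.High"
```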
## Project Structure
```
backtest/
├── app/
│   ├── backtest-console/          # Go HTTP API server (Hertz)
│   │   ├── api/handler/           # REST handlers: task, strategy, data, dashboard
│   │   ├── api/service/           # Business logic
│   │   ├── api/store/             # MongoDB / Redis persistence
│   │   ├── api/queue/             # Kafka producer/consumer
│   │   ├── api/model/             # Domain models
│   │   └── conf/                  # YAML configuration
│   ├── backtest-engine/           # Python distributed engine (Ray)
│   │   ├── service/scheduler/     # Scheduler Register: Dispatcher, Scheduler, Collector
│   │   ├── service/engine/        # Ray Executor: Actor Pool, Data Manager, Fan-out
│   │   ├── components/mq/         # Kafka consumer wrapper
│   │   ├── components/strategies/ # Strategy interface & examples
│   │   └── conf/                  # Engine / scheduler / Ray cluster config
│   └── backtest-datasource/       # Go ETL: CSV → Parquet
│       └── datasource/            # CSVSource → Transformers → ParquetSink
├── data/
│   ├── csv/                       # Raw market data (input)
│   ├── parquet/                   # Converted columnar data
│   └── results/                   # Backtest output
├── deployments/
│   ├── docker-compose.yaml        # Full stack (console + engine + infra)
│   └── docker-compose.infra.yaml  # Infrastructure only (Kafka, MongoDB, Redis)
├── docs/
│   └── architecture.png
├── scripts/                       # Kafka topic init, data seeding helpers
└── Makefile
```
## Tech Stack
| Layer | Technology |
|---|---|
| API Server | Go 1.23, Hertz |
| Distributed Execution | Python 3.11+, Ray 2.9+ |
| Data Format | Apache Parquet (parquet-go / PyArrow) |
| Message Queue | Apache Kafka (KRaft mode, Confluent 7.6) |
| State / Progress Cache | Redis 7 |
| Task / Strategy Store | MongoDB 7 |
| Config | Viper + YAML (BACKTEST_ env var overrides) |
| Container | Docker Compose |
## Getting Started

### Prerequisites

- Docker & Docker Compose
- Go 1.23+
- Python 3.11+
### 1. Start Infrastructure

```bash
make infra-up
```

Starts Kafka (KRaft), MongoDB, and Redis locally.
### 2. Initialize Kafka Topics

```bash
make init-topics
```

Creates the required topics: `tasks.High`, `tasks.Normal`, `tasks.Low`, `tasks.Cancel`, `tasks.Results`.
### 3. Prepare Market Data

Place CSV files under `data/csv/`, then convert them to Parquet:

```bash
make csv-to-parquet
```
### 4. Build & Run

Local development:

```bash
# Terminal 1 — API server (default port: 8020)
make run-console

# Terminal 2 — backtest engine scheduler
make run-engine
```

Full Docker stack:

```bash
make docker-up
```
### 5. Run Tests

```bash
make test
```
## Configuration

`backtest-console` reads `app/backtest-console/conf/conf.local.yaml` by default. All keys can be overridden with environment variables prefixed by `BACKTEST_`:
```yaml
server:
  host: "0.0.0.0"
  port: 8020

mongodb:
  uri: "mongodb://localhost:27017"
  database: "backtest"

redis:
  addr: "localhost:6379"

kafka:
  brokers:
    - "localhost:9092"
  group_id: "backtest-console"

queue:
  type: "kafka"
  max_pending_tasks: 200

data:
  csv_dir: "../../data/csv"
  parquet_dir: "../../data/parquet"
  result_dir: "../../data/results"
```
`backtest-engine` uses `app/backtest-engine/conf/scheduler.yaml` and `engine.yaml`. For Ray cluster deployment, see `conf/ray_cluster.yaml`.
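The `BACKTEST_` override convention maps a dotted YAML key to an upper-cased, underscore-joined variable name (e.g. `server.port` → `BACKTEST_SERVER_PORT`). A simplified Python sketch of that lookup (Viper handles this in the actual Go console):

```python
# Simplified model of Viper-style env overrides: the environment wins
# over the YAML default when the derived variable name is set.
import os


def resolve(key: str, yaml_default, prefix: str = "BACKTEST"):
    """Return the env override for a dotted config key, else the YAML default."""
    env_name = f"{prefix}_{key.replace('.', '_').upper()}"
    return os.environ.get(env_name, yaml_default)


os.environ["BACKTEST_SERVER_PORT"] = "9000"
resolve("server.port", 8020)  # -> "9000" (env override wins, as a string)
resolve("mongodb.uri", "mongodb://localhost:27017")  # -> the YAML default,
# unless BACKTEST_MONGODB_URI is set
```

Note that env overrides arrive as strings; Viper coerces them to the target type on read.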
## API Overview
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/tasks | Submit a backtest task |
| GET | /api/v1/tasks/:id | Get task status & result |
| DELETE | /api/v1/tasks/:id | Cancel a pending task |
| GET | /api/v1/strategies | List available strategies |
| POST | /api/v1/data/convert | Trigger CSV → Parquet conversion |
| GET | /api/v1/dashboard | Aggregated platform metrics |
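As an illustration, a task-submission request can be built with the standard library. The body fields shown are assumptions — check the console's task handler for the real schema:

```python
# Hypothetical client for POST /api/v1/tasks; field names in the payload
# are assumed, not taken from the actual handler.
import json
import urllib.request


def build_submit_request(base_url: str, task: dict) -> urllib.request.Request:
    """Build (but do not send) a task-submission request."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_submit_request("http://localhost:8020", {
    "strategy": "BuyAndHold",  # assumed field names
    "symbol": "AAPL",
    "priority": "High",
})
# urllib.request.urlopen(req) would submit it once the console is running.
```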
## Kafka Topics
| Topic | Direction | Description |
|---|---|---|
| tasks.High | console → engine | High-priority backtest jobs |
| tasks.Normal | console → engine | Normal-priority backtest jobs |
| tasks.Low | console → engine | Low-priority backtest jobs |
| tasks.Cancel | console → engine | Task cancellation signals |
| tasks.Results | engine → console | Completed result callbacks |
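The scheduler's priority handling can be sketched as strict-priority polling over the three job topics, simulated here with in-memory deques instead of Kafka consumers:

```python
# Strict-priority dispatch: always drain higher-priority topics before
# lower ones. In the real scheduler these are Kafka consumers, not deques.
from collections import deque

queues = {"tasks.High": deque(), "tasks.Normal": deque(), "tasks.Low": deque()}


def next_job():
    """Return the next job, preferring High over Normal over Low."""
    for topic in ("tasks.High", "tasks.Normal", "tasks.Low"):
        if queues[topic]:
            return queues[topic].popleft()
    return None  # nothing pending


queues["tasks.Low"].append("low-1")
queues["tasks.High"].append("high-1")
next_job()  # -> "high-1", despite "low-1" arriving first
```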
## Ray Cluster Deployment

For multi-machine distributed backtesting, the Ray Executor spawns one Actor per node. Each Actor reads Parquet data from the Ray Object Store (zero-copy shared memory) and runs the strategy independently. Results are merged by the fan-out/split component before being written back.
See `app/backtest-engine/conf/ray_cluster.yaml` for cluster configuration.
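Stripped of Ray specifics, the fan-out/merge pattern described above reduces to sharding the workload and merging per-shard results. A pure-Python sketch (in production each shard runs inside a Ray Actor and the merge happens on collected futures):

```python
# Fan-out/merge skeleton: shard the symbol universe across workers, run
# each shard independently, then merge. `run_shard` is a stand-in for a
# real per-Actor backtest.
def run_shard(symbols: list[str]) -> dict:
    # Placeholder result: one metric per symbol.
    return {s: len(s) for s in symbols}


def fan_out(symbols: list[str], n_workers: int) -> dict:
    """Round-robin shard the symbols, run each shard, merge the results."""
    shards = [symbols[i::n_workers] for i in range(n_workers)]
    results: dict = {}
    for shard in shards:  # conceptually: ray.get over parallel actor calls
        results.update(run_shard(shard))
    return results


fan_out(["AAPL", "MSFT", "GOOG"], 2)
```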
