X2strategy
Extract structured strategy specifications from quantitative finance research papers — Agent Skill for GitHub Copilot & Claude Code
X2Strategy
Any Research Input → Strategy Spec → Executable Code → Backtest → Diagnosis
Getting Started · How It Works · Examples · Docs · 简体中文
Turn quantitative finance research — papers, drafts, reports, or strategy ideas — into validated, executable trading strategies. Automatically.
</div>

Highlights
- 🔬 Multi-Format Input — PDF papers, Markdown drafts, DOCX reports, plain text. Auto-detected.
- 🧠 5-Layer LLM Extraction — Multi-strategy detection → indicators → signal logic → execution plan → risk controls.
- ✅ Verified Code Generation — AST validation + Backtrader structural checks + indicator registry, not just "generate and hope".
- 📊 Automated Backtesting — Execute, extract metrics, and diagnose against paper-reported performance.
- 🤖 Agent-Native — Works as an Agent Skill (`/x2strategy`) in VS Code Copilot, Claude Code, or any compatible agent.
- 💰 ~$0.1 per paper — DeepSeek-powered. Any LiteLLM-supported provider works.
How It Works

                           ┌─────────────────────────────────────────────────────────────────┐
                           │                           X2Strategy                            │
                           │                                                                 │
    PDF / MD / DOCX / TXT  │  ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐  │
    ─────────────────────► │  │   Parse   ├──►│  Extract  ├──►│ Generate  ├──►│ Backtest  │  │
                           │  │ (parser)  │   │ (L0 → L4) │   │  (code)   │   │ + Diagnose│  │
                           │  └───────────┘   └───────────┘   └───────────┘   └───────────┘  │
                           │        ▼               ▼               ▼               ▼        │
                           │  PaperContent    StrategySpec    Backtrader.py     Report.md    │
                           └─────────────────────────────────────────────────────────────────┘

| Stage | Input | Output | What Happens |
|:------|:------|:-------|:-------------|
| Parse | Any document | PaperContent | Format-aware extraction (PyMuPDF / direct read / python-docx) |
| Extract | PaperContent | StrategySpec[] | 5-layer LLM: detect strategies → extract indicators, logic, execution, risk |
| Generate | StrategySpec | strategy.py | Data module → signal module → backtest module → integration |
| Validate | strategy.py | Pass / Fail | AST syntax + Backtrader structure + indicator existence checks |
| Backtest | strategy.py | Metrics | Subprocess execution with timeout, metric extraction |
| Diagnose | Metrics | report.md | Compare against paper-reported results, flag deviations |
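
The Validate row can be illustrated with a small sketch. This is not the project's `validator.py`; it assumes only Python's standard `ast` module and covers two of the checks named in the table: that the source parses, and that it defines a class deriving from `Strategy` with a `next()` method (the hook Backtrader calls on each bar). The function names are hypothetical, and the indicator-existence check is left out.

```python
import ast


def _base_name(node: ast.expr) -> str:
    """Return the rightmost name of a base class expression (bt.Strategy -> 'Strategy')."""
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def validate_strategy_source(source: str) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    # Check 1: the file must be syntactically valid Python.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error at line {exc.lineno}: {exc.msg}"]

    # Check 2: at least one class must derive from Strategy and define next().
    strategy_classes = [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        and any(_base_name(base) == "Strategy" for base in node.bases)
    ]
    if not strategy_classes:
        return ["no class deriving from Strategy found"]

    problems = []
    for cls in strategy_classes:
        methods = {n.name for n in cls.body if isinstance(n, ast.FunctionDef)}
        if "next" not in methods:
            problems.append(f"class {cls.name} does not define next()")
    return problems
```

Because the check works on the syntax tree, the generated file is never imported or executed at this stage, so a broken strategy cannot crash the validator itself.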
Getting Started
Option A: As an Agent Skill (Recommended)
<table> <tr><td><b>GitHub Copilot</b></td><td>Agent Skills is an open standard. Clone the repository into the agent's skill directory; the agent auto-discovers <code>SKILL.md</code> and registers the <code>/x2strategy</code> slash command.
git clone https://github.com/ALAGENT-HKU/x2strategy.git ~/.copilot/skills/x2strategy
</td></tr>
<tr><td><b>Claude Code</b></td><td>
git clone https://github.com/ALAGENT-HKU/x2strategy.git ~/.claude/skills/x2strategy
</td></tr>
<tr><td><b>Project-scoped</b></td><td>
git clone https://github.com/ALAGENT-HKU/x2strategy.git .github/skills/x2strategy
</td></tr>
</table>
Then install dependencies:
cd ~/.copilot/skills/x2strategy # or wherever you cloned
# if you haven't installed uv, run `pip install uv`
uv sync --extra codegen # core + backtrader + yfinance + akshare
[!IMPORTANT] The directory name must be `x2strategy` (matching the `name` field in `SKILL.md`). Once installed, type `/x2strategy` in chat, or let the agent auto-activate when relevant.
Option B: Standalone CLI
git clone https://github.com/ALAGENT-HKU/x2strategy.git && cd x2strategy
uv sync --extra codegen # core + backtest
uv sync --extra agent # + FAISS semantic search (for 100+ page papers)
uv sync --extra dev # + pytest
<details>
<summary>pip alternative</summary>
python -m venv .venv && source .venv/bin/activate
pip install -e ".[codegen,agent,dev]"
</details>
Quick Start
# 1. Configure
cp .env.example .env # add your API key (DEEPSEEK_API_KEY recommended)
# 2. Extract strategy specs from any input format
uv run python scripts/analyze.py paper.pdf -o library/my_paper/
uv run python scripts/analyze.py strategy_draft.md -o library/my_draft/
uv run python scripts/analyze.py report.docx -o library/my_report/
# 3. Generate Backtrader code from spec
uv run python scripts/generate.py library/my_paper/spec.json --strategy-index 0
# 4. Validate + backtest
uv run python scripts/validate_strategy.py library/my_paper/strategy.py
uv run python scripts/backtest.py library/my_paper/strategy.py -o library/my_paper/results/
Or use the agent skill — just say:
"Analyze this paper and implement the main strategy" + attach a PDF
The agent handles everything: parsing, extraction, code generation, validation, backtesting, and diagnosis.
Supported Input Formats
| Format | Extensions | Parser | Notes |
|:-------|:-----------|:-------|:------|
| PDF | .pdf | PyMuPDF → Mode A (direct) or Mode B (FAISS) | Full support, covering 95%+ of papers |
| Markdown | .md .markdown | Direct text read | Ideal for strategy drafts and notes |
| Word | .docx | python-docx (uv sync --extra docx) | Internal research reports |
| Plain text | .txt | Direct read | Raw strategy descriptions |
Format is auto-detected from file extension. No configuration needed.
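
Extension-based detection of this kind can be sketched as a simple lookup. The mapping below mirrors the table above, but the function name, the format labels, and the error behaviour are assumptions for illustration, not the project's `parser.py`.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a format label, following the table above.
FORMAT_BY_SUFFIX = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".markdown": "markdown",
    ".docx": "docx",
    ".txt": "text",
}


def detect_format(path: str) -> str:
    """Return the format label for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive: "Paper.PDF" -> ".pdf"
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(f"unsupported input format: {suffix or '(no extension)'}")
    return FORMAT_BY_SUFFIX[suffix]
```

A caller would then dispatch on the returned label, for example routing `"pdf"` to the PyMuPDF path and `"docx"` to python-docx.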
Examples
Pre-generated outputs from real papers are available in examples/:
| Paper | Strategies Detected | Artifacts |
|:------|:-------------------|:----------|
| Tactical Asset Allocation (Faber 2007) | 1 — GTAA with SMA timing | spec + code |
| Pairs Trading (Goncalves-Pinto et al.) | 3 — Distance, Stationarity, Cointegration | spec |
| Value and Momentum (Asness et al.) | 2 — Value Factor, Momentum Factor | spec |
<details> <summary>Example output structure</summary>

library/tactical_aa/
├── content.json # Parsed paper content
├── content.md # Human-readable paper summary
├── spec.json # Structured strategy specification
├── spec.md # Human-readable spec
├── metadata.json # Run metadata (model, timing, etc.)
├── strategy.py # Generated Backtrader code
├── validation_report.md # AST + structural validation results
└── results/
    ├── backtest_output.txt
    └── diagnosis_report.md
</details>
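
The `diagnosis_report.md` file in the structure above records how backtest metrics compare with the figures reported in the paper. A minimal sketch of such a comparison follows, assuming both sides are plain name-to-number dictionaries and that a deviation is flagged when the relative difference exceeds a tolerance. The function name, the metric names, and the 20% default are hypothetical, not the behaviour of `analyzer.py`.

```python
def diagnose(backtest: dict, reported: dict, rel_tol: float = 0.2) -> list:
    """Return findings for metrics that are missing or deviate from the paper."""
    findings = []
    for name, paper_value in reported.items():
        if name not in backtest:
            findings.append(f"{name}: missing from backtest output")
            continue
        ours = backtest[name]
        # Relative deviation; the floor avoids division by zero for a reported 0.
        deviation = abs(ours - paper_value) / max(abs(paper_value), 1e-12)
        if deviation > rel_tol:
            findings.append(
                f"{name}: backtest {ours:.4g} vs paper {paper_value:.4g} "
                f"({deviation:.0%} off)"
            )
    return findings


if __name__ == "__main__":
    # Illustrative numbers only, not results from any paper.
    for line in diagnose({"sharpe": 0.5}, {"sharpe": 1.0, "cagr": 0.1}):
        print(line)
```

A relative tolerance is only one possible choice; metrics that hover near zero may be better served by an absolute threshold.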
Project Structure
x2strategy/
├── paper2spec/ # Phase 1: Document → Structured Spec
│ ├── parser.py # Multi-format parser (PDF / MD / DOCX / TXT)
│ ├── extractor.py # PaperContent → ExtractionResult (L0-L4)
│ ├── models.py # Data models (PaperContent, StrategySpec, etc.)
│ ├── prompts.py # 5-layer extraction prompt templates
│ ├── llm.py # LiteLLM unified interface
│ ├── render.py # JSON → Markdown rendering
│ └── search.py # arXiv + SSRN paper search
│
├── spec2code/ # Phase 2: Spec → Code → Backtest → Diagnosis
│ ├── prompts.py # Data / Signal / Backtest / Integration templates
│ ├── validator.py # AST + structural + indicator validation
│ ├── executor.py # Subprocess-based backtest execution
│ ├── analyzer.py # Result comparison + diagnosis report
│ └── models.py # CodeModules, ValidationResult
│
├── references/ # Verified domain knowledge (not LLM hallucinations)
│ ├── backtrader_patterns.md # Source-verified Backtrader patterns
│ ├── indicator_cookbook.md # Official indicator params (from bt source code)
│ ├── data_sources.md # yfinance + akshare API docs
│ ├── paper2spec.md # Paper2Spec deep-dive guide
│ └── spec2code.md # Spec2Code deep-dive guide
│
├── scripts/ # CLI entry points
│ ├── analyze.py # Full paper2spec pipeline
│ ├── generate.py # Full spec2code pipeline
│ └── validate_strategy.py # Standalone validation
│
├── schemas/ # JSON Schema definitions
├── examples/ # Pre-generated reference outputs
├── tests/ # 180+ unit & integration tests
├── SKILL.md # Agent Skill entry point
└── pyproject.toml # Project config & dependencies
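
The `executor.py` entry above mentions subprocess-based backtest execution. The sketch below shows one way this could look with the standard library: run the generated script in a child interpreter, enforce a timeout, and pull metrics out of its standard output. The `Name: value` output format, the function names, and the 300-second default are assumptions for illustration, not the project's actual implementation.

```python
import re
import subprocess
import sys

# Assumed output format: one metric per line, e.g. "Sharpe Ratio: 1.25".
METRIC_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z_ ]+):[ \t]*(?P<value>-?\d+(?:\.\d+)?)[ \t]*$",
    re.MULTILINE,
)


def parse_metrics(stdout: str) -> dict:
    """Extract 'Name: number' lines from captured output; other lines are ignored."""
    return {
        m.group("name").strip(): float(m.group("value"))
        for m in METRIC_PATTERN.finditer(stdout)
    }


def run_backtest(script_path: str, timeout_s: float = 300.0) -> dict:
    """Run a strategy script in a separate process and collect its metrics."""
    try:
        proc = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "metrics": {}, "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "metrics": parse_metrics(proc.stdout), "stderr": proc.stderr}
```

Running the strategy in a child process keeps a crash or an infinite loop in generated code from taking down the pipeline, which is the usual motivation for this design.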
Key Design Decisions
<table> <tr> <td width="50%">Why Reference Docs, Not Prompts?
LLMs frequently hallucinate Backtrader API details:
- SMA default `period` is `30`, not `20`
- RSI uses `SmoothedMovingAverage`, not EMA
- BollingerBands lines are `.top`/`.mid`/`.bot`, not `.upper`/`.lower`
Our `references/` directory contains source-code-verified knowledge. The agent reads these docs on demand, so API details come from verified source rather than from model recall.
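
To show how verified facts like the three above could back an automated check, here is a minimal sketch of a lookup table and two helper functions. The table holds only the facts listed above; its layout, the key names (including `smoothing`), and the function names are hypothetical and do not describe the project's indicator registry.

```python
from typing import Optional

# Facts taken from the list above; the structure itself is an assumption.
VERIFIED_DEFAULTS = {
    "SMA": {"period": 30},
    "RSI": {"smoothing": "SmoothedMovingAverage"},
}
VERIFIED_LINES = {
    "BollingerBands": ("top", "mid", "bot"),
}


def check_default(indicator: str, param: str, assumed) -> Optional[str]:
    """Return a message if an assumed default contradicts the verified one."""
    verified = VERIFIED_DEFAULTS.get(indicator, {}).get(param)
    if verified is not None and verified != assumed:
        return f"{indicator} default {param} is {verified}, not {assumed}"
    return None


def check_line(indicator: str, line: str) -> Optional[str]:
    """Return a message if generated code accesses a line the indicator lacks."""
    lines = VERIFIED_LINES.get(indicator)
    if lines is not None and line not in lines:
        return f"{indicator} has no line '{line}'; known lines: {', '.join(lines)}"
    return None
```

Both helpers return `None` when nothing is known about an indicator, so an incomplete table does not raise false alarms.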
