regimetry

Mapping latent regimes in financial time series.

MIT License · Python 3.11+ · Built with TensorFlow · Visualize with Dash · Managed with Poetry · Made With ❤️



📘 Overview

regimetry is a modular, unsupervised regime detection engine for financial time series — originally developed as a personal research project to explore latent structure and behavioral transitions in markets.

It combines transformer-based embeddings with clustering and regime structure analysis to help identify and label recurring phases such as trends, reversals, and volatility shifts.

While built for exploratory analysis, regimetry may evolve into a foundational component of my broader trading strategy stack.

⚙️ Tech Highlights:

  • Transformer encoder with positional encoding
  • Attention-based temporal modeling (windowed)
  • Spectral clustering on learned embeddings
  • Regime structure modeling via Markov transitions, stickiness, and entropy
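The regime-structure metrics above can be sketched in a few lines of numpy. This is an illustrative implementation, not the regimetry internals: the function name, the row-normalized transition estimate, and the base-2 entropy are all assumptions.

```python
import numpy as np

def regime_structure(labels: np.ndarray, n_regimes: int):
    """Estimate a Markov transition matrix, stickiness, and occupancy entropy
    from a sequence of regime labels."""
    counts = np.zeros((n_regimes, n_regimes))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    # Row-normalize transition counts; empty rows become uniform to stay stochastic.
    rows = counts.sum(axis=1, keepdims=True)
    trans = np.where(rows > 0, counts / np.maximum(rows, 1), 1.0 / n_regimes)
    stickiness = np.diag(trans)                      # P(stay in the same regime)
    occ = np.bincount(labels, minlength=n_regimes) / len(labels)
    entropy = -np.sum(occ[occ > 0] * np.log2(occ[occ > 0]))  # bits
    return trans, stickiness, entropy

labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 0])
trans, stick, H = regime_structure(labels, n_regimes=3)
```

High stickiness (large diagonal mass) indicates persistent regimes; high entropy indicates the series spreads its time across many regimes.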

🔍 What is a Regime?

In regimetry, a regime is a latent, temporally structured pattern in market behavior — characterized by combinations of volatility, trend strength, momentum shifts, and signal alignment. These are not defined by hand, but emerge from patterns discovered in the data.

Formally:

  • Regimes are clusters in the embedding space of overlapping market windows (e.g., 30 bars).
  • Each embedding is generated via a Transformer encoder that learns internal structure within each window using attention over time.
  • Spectral clustering then groups these embeddings into recurring behavioral states the market tends to revisit.

🧠 How It Works

1. Data Ingestion

  • Load daily bar data per instrument
  • Normalize features (Close, AHMA, LP, LC, etc.)
  • Features are typically sourced from ConvolutionLab,
    but regimetry is not dependent on that specific pipeline — any compatible feature set can be used.
  • Slice into overlapping windows (default: 30 bars, stride 1)
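The windowing step above can be sketched with numpy's `sliding_window_view`, which produces overlapping views without copying. The function name and the `(T, F)` feature-matrix layout are assumptions for illustration.

```python
import numpy as np

def make_windows(features: np.ndarray, window_size: int = 30, stride: int = 1):
    """Slice a (T, F) feature matrix into overlapping (N, window_size, F) windows."""
    if window_size > len(features):
        return np.empty((0, window_size, features.shape[1]))
    view = np.lib.stride_tricks.sliding_window_view(features, window_size, axis=0)
    # view has shape (T - window_size + 1, F, window_size); reorder to (N, W, F)
    return view.transpose(0, 2, 1)[::stride]

X = np.random.rand(100, 5)                    # 100 daily bars, 5 features
windows = make_windows(X, window_size=30, stride=1)
# windows.shape == (71, 30, 5): one window per bar from bar 30 onward
```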

2. Embedding Pipeline

  • Each rolling window is passed through a Transformer encoder that uses positional encoding to preserve temporal structure and self-attention to learn nonlinear dependencies within the window.
  • This produces a dense, contextualized embedding that reflects local market dynamics.
  • The architecture is modular and can be swapped with alternatives such as autoencoders, SimCLR, or CNN-based encoders.

3. Clustering

  • Standardize the embeddings
  • Cluster them using Spectral Clustering (or another method)
  • Assign each window a regime_id
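The clustering step above can be sketched with scikit-learn; `StandardScaler` and `SpectralClustering` stand in for regimetry's internals, and the synthetic blobs stand in for real transformer embeddings.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for (N, D) window embeddings: three well-separated blobs in 16-D.
embeddings = np.vstack([rng.normal(c, 0.3, size=(40, 16)) for c in (-2.0, 0.0, 2.0)])

scaled = StandardScaler().fit_transform(embeddings)   # standardize the embeddings
regime_ids = SpectralClustering(
    n_clusters=3, affinity="nearest_neighbors", random_state=0
).fit_predict(scaled)
# regime_ids[i] is the regime label assigned to window i
```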

4. Visualization & Interpretation

  • Use t-SNE or UMAP to project embeddings
  • Visualize regime transitions over time
  • Map regimes back to chart or signal data for strategy insights
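The projection step can be sketched with scikit-learn's t-SNE; random embeddings stand in for real ones, and the parameter choices are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(80, 16))        # stand-in for (N, D) window embeddings
xy = TSNE(n_components=2, perplexity=10, random_state=1).fit_transform(embeddings)
# xy holds one 2-D point per window; color points by regime_id to see transitions
```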

🚀 Getting Started

See the full step-by-step guide: 📖 docs/GETTING_STARTED_README.md

Includes:

  • Git clone instructions
  • Poetry or manual install
  • Data ingestion
  • Embedding generation
  • Regime clustering
  • Optional Dash dashboard launch

📘 Regime Detection Window Delay

📄 See: docs/REGIME_DETECTION_README.md

Because regime labels are assigned per rolling window, a bar only receives a cluster ID once a complete window of data ending at that bar exists.

For example, with a window_size = 30:

  • The first 29 bars never receive a regime ID, since no complete 30-bar window ends within them
  • The labels on the most recent bars cannot reflect a regime change still in progress, since there are no forward windows yet to reclassify them

This introduces a natural lag in regime detection:

  • New regimes will only appear after enough time has passed for the model to “observe” a full window in the new market condition.

👉 For more details, see the full explanation: REGIME_DETECTION_README.md


📚 Documentation


📟 Command Line Usage

Run regimetry pipelines directly from the command line with optional overrides.

🔹 Ingest Data

python launch_host.py ingest \
  --signal-input-path examples/EUR_USD_processed_signals.csv

This will:

  • Parse the input CSV
  • Normalize and structure features
  • Save the result to artifacts/data/processed/

🔹 Generate Embeddings

python launch_host.py embed \
  --signal-input-path examples/EUR_USD_processed_signals.csv \
  --output-name EUR_USD_embeddings.npy \
  --window-size 30 \
  --stride 1 \
  --encoding-method sinusoidal \
  --encoding-style interleaved

This will:

  • Apply a rolling window (default: 30 bars, stride: 1 unless overridden)
  • Use positional encoding and Transformer to generate embeddings
  • Save the result to embeddings/EUR_USD_embeddings.npy

⚠️ Note: Ensure that window_size is smaller than your dataset length. If window_size >= len(data), no embeddings will be produced.



🛠 Available CLI Arguments for embed

| Argument | Description |
| --------------------- | ------------------------------------------------------------------------------- |
| `--signal-input-path` | Path to the CSV file with feature-enriched signal data |
| `--output-name` | Optional output file name for the `.npy` embeddings (default: `embeddings.npy`) |
| `--window-size` | Number of time steps per rolling window (default: 30) |
| `--stride` | Step size between rolling windows (default: 1) |
| `--encoding-method` | Positional encoding method: `sinusoidal` (default) or `learnable` |
| `--encoding-style` | Sinusoidal encoding format: `interleaved` (default) or `stacked` |
| `--embedding-dim` | Embedding dimension to use for both sinusoidal and learnable encodings |
| `--config` | Optional YAML config path to override pipeline settings |
| `--debug` | Enable debug logging |

ℹ️ Note: --embedding-dim applies to both sinusoidal and learnable encodings. For sinusoidal, it sets the generated frequency embedding size. For learnable, it defines the trainable positional embedding dimension.
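The sinusoidal option selected by `--encoding-method`, and the two `--encoding-style` layouts, can be sketched as below. This follows the standard Transformer formulation; the exact layout regimetry uses is an assumption.

```python
import numpy as np

def sinusoidal_pe(seq_len: int, dim: int, style: str = "interleaved"):
    """Sinusoidal positional encoding of shape (seq_len, dim)."""
    pos = np.arange(seq_len)[:, None]                          # (seq_len, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
    angles = pos * freqs[None, :]                              # (seq_len, dim/2)
    if style == "interleaved":                                 # sin, cos, sin, cos, ...
        pe = np.zeros((seq_len, dim))
        pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)
    else:                                                      # "stacked": all sins, then all cos
        pe = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    return pe

pe = sinusoidal_pe(seq_len=30, dim=64)    # dim would come from --embedding-dim
```

Both styles encode the same frequencies; they differ only in how the sine and cosine channels are laid out along the embedding dimension.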

🔹 Cluster Regimes

python launch_host.py cluster \
  --embedding-path embeddings/EUR_USD_embeddings.npy \
  --regime-data-path data/processed/regime_input.csv \
  --output-dir reports/EUR_USD \
  --window-size 30 \
  --n-clusters 3

This will:

  • Load precomputed transformer embeddings
  • Apply spectral clustering to assign regime IDs
  • Align cluster labels with the original time-series data
