Regimetry
Unsupervised regime detection for financial time series using embeddings and clustering.
- regimetry
- 📘 Overview
- 🔍 What is a Regime?
- 🧠 How It Works
- 🚀 Getting Started
- 📚 Documentation
- 📟 Command Line Usage
- 🧪 Example Dataset
- 🛠️ Configuration Files
  - Example Config
- 🖥️ Interactive Dashboard
- 🛠 Project Structure
- 🧭 Orientation Going Forward
- ✅ Status
- 🔗 Related Projects
- 📖 Further Reading
- 📄 License
- 👤 Author
📘 Overview
regimetry is a modular, unsupervised regime detection engine for financial time series — originally developed as a personal research project to explore latent structure and behavioral transitions in markets.
It combines transformer-based embeddings with clustering and regime structure analysis to help identify and label recurring phases such as trends, reversals, and volatility shifts.
While built for exploratory analysis, regimetry may evolve into a foundational component of my broader trading strategy stack.
⚙️ Tech Highlights:
- Transformer encoder with positional encoding
- Attention-based temporal modeling (windowed)
- Spectral clustering on learned embeddings
- Regime structure modeling via Markov transitions, stickiness, and entropy
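The regime-structure metrics above (transition matrix, stickiness, entropy) can be sketched in a few lines of NumPy. This is a toy illustration of the general idea, not regimetry's actual implementation; the `labels` sequence is a made-up stand-in for real regime IDs:

```python
import numpy as np

def transition_matrix(labels, n_regimes):
    """Row-normalized counts of regime i -> regime j transitions."""
    counts = np.zeros((n_regimes, n_regimes))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

# Hypothetical regime_id sequence
labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2]
T = transition_matrix(labels, n_regimes=3)

# Stickiness: probability of staying in the same regime (diagonal of T)
stickiness = np.diag(T)

# Per-regime entropy: how unpredictable the next regime is (0 = deterministic)
row_entropy = np.array([-(p[p > 0] * np.log2(p[p > 0])).sum() for p in T])
```

In this toy sequence regime 2 only ever transitions to itself, so its stickiness is 1 and its row entropy is 0.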
🔍 What is a Regime?
In regimetry, a regime is a latent, temporally structured pattern in market behavior — characterized by combinations of volatility, trend strength, momentum shifts, and signal alignment. These are not defined by hand, but emerge from patterns discovered in the data.
Formally:
- Regimes are clusters in the embedding space of overlapping market windows (e.g., 30 bars).
- Each embedding is generated via a Transformer encoder that learns internal structure within each window using attention over time.
- Spectral clustering then groups these embeddings into recurring behavioral states the market tends to revisit.
🧠 How It Works
1. Data Ingestion
- Load daily bar data per instrument
- Normalize features (Close, AHMA, LP, LC, etc.)
- Features are typically sourced from ConvolutionLab, but regimetry is not dependent on that specific pipeline — any compatible feature set can be used
- Slice into overlapping windows (default: 30 bars, stride 1)
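The windowing step can be sketched with NumPy's `sliding_window_view`. This is a minimal sketch on random stand-in data; regimetry's own slicing code may differ:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy feature matrix: 100 bars x 4 normalized features
# (stand-ins for Close, AHMA, LP, LC)
data = np.random.default_rng(0).random((100, 4))

window_size, stride = 30, 1
# sliding_window_view yields (n_windows, n_features, window_size);
# transpose to (n_windows, window_size, n_features)
windows = sliding_window_view(data, window_size, axis=0)[::stride].transpose(0, 2, 1)
# -> 71 overlapping windows of shape (30, 4)
```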
2. Embedding Pipeline
- Each rolling window is passed through a Transformer encoder that uses positional encoding to preserve temporal structure and self-attention to learn nonlinear dependencies within the window.
- This produces a dense, contextualized embedding that reflects local market dynamics.
- The architecture is modular and can be swapped with alternatives such as autoencoders, SimCLR, or CNN-based encoders.
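For reference, the standard interleaved sinusoidal positional encoding (the default encoding method and style) can be generated as follows. This is a sketch of the textbook Transformer formula, not necessarily regimetry's exact code:

```python
import numpy as np

def sinusoidal_encoding(window_size, d_model):
    """Interleaved sin/cos positional encoding (Transformer-style):
    even columns get sin, odd columns get cos."""
    pos = np.arange(window_size)[:, None]    # (window_size, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((window_size, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(window_size=30, d_model=64)
# pe[t] is added to the feature vector at time step t before attention
```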
3. Clustering
- Standardize the embeddings
- Cluster them using Spectral Clustering (or another method)
- Assign each window a regime_id
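The three steps above map directly onto scikit-learn. This is a minimal sketch using random, well-separated stand-in embeddings rather than real transformer output:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for transformer embeddings: two well-separated blobs,
# shape (n_windows, embedding_dim)
embeddings = np.vstack([rng.normal(0.0, 0.1, (40, 8)),
                        rng.normal(3.0, 0.1, (40, 8))])

X = StandardScaler().fit_transform(embeddings)    # 1. standardize
model = SpectralClustering(n_clusters=2,          # 2. cluster
                           affinity="nearest_neighbors",
                           random_state=0)
regime_ids = model.fit_predict(X)                 # 3. one regime_id per window
```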
4. Visualization & Interpretation
- Use t-SNE or UMAP to project embeddings
- Visualize regime transitions over time
- Map regimes back to chart or signal data for strategy insights
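A minimal t-SNE projection sketch (again on random stand-in embeddings; in practice you would scatter-plot the result and color each point by its regime_id):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in embeddings: (n_windows, embedding_dim)
embeddings = np.random.default_rng(0).random((100, 16))
proj = TSNE(n_components=2, perplexity=10,
            random_state=0).fit_transform(embeddings)
# proj has shape (100, 2): one 2-D point per window
```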
🚀 Getting Started
See the full step-by-step guide:
📖 docs/GETTING_STARTED_README.md
Includes:
- Git clone instructions
- Poetry or manual install
- Data ingestion
- Embedding generation
- Regime clustering
- Optional Dash dashboard launch
📘 Regime Detection Window Delay
Because regime labels are assigned based on rolling windows, the cluster ID for the final bars of a dataset cannot be known until the full window is complete.
For example, with a window_size = 30:
- The first 29 bars will not receive a regime ID
- The last 29 bars also do not reflect any future regime change, since there are no forward windows to reclassify them
This introduces a natural lag in regime detection:
- New regimes will only appear after enough time has passed for the model to “observe” a full window in the new market condition.
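The arithmetic behind this lag can be written out directly (a toy calculation, assuming each window's label is assigned to its final bar):

```python
window_size, stride, n_bars = 30, 1, 250

# The first window_size - 1 bars can never complete a window,
# so they receive no regime ID
first_labeled_bar = window_size - 1              # 0-indexed bar 29

# Total number of complete windows (= labeled bars at stride 1)
n_labeled = (n_bars - window_size) // stride + 1
```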
👉 For more details, see the full explanation: REGIME_DETECTION_README.md
📚 Documentation
- 📘 Getting Started Step-by-step setup, from ingestion to visualization.
- 🧠 Regime Detection Window Logic Explains the natural lag from using rolling windows in clustering.
- 🧭 Regime Assignment & Label Alignment Details how Spectral Clustering labels are aligned across runs using the Hungarian algorithm, with persistent baseline mapping and cluster color stability.
📟 Command Line Usage
Run regimetry pipelines directly from the command line with optional overrides.
🔹 Ingest Data
```bash
python launch_host.py ingest \
  --signal-input-path examples/EUR_USD_processed_signals.csv
```
This will:
- Parse the input CSV
- Normalize and structure features
- Save the result to artifacts/data/processed/
🔹 Generate Embeddings
```bash
python launch_host.py embed \
  --signal-input-path examples/EUR_USD_processed_signals.csv \
  --output-name EUR_USD_embeddings.npy \
  --window-size 30 \
  --stride 1 \
  --encoding-method sinusoidal \
  --encoding-style interleaved
```
This will:
- Apply a rolling window (default: 30 bars, stride: 1 unless overridden)
- Use positional encoding and Transformer to generate embeddings
- Save the result to embeddings/EUR_USD_embeddings.npy
⚠️ Note: Ensure that window_size is smaller than your dataset length. If window_size >= len(data), no embeddings will be produced.
🛠 Available CLI Arguments for embed
| Argument | Description |
| --------------------- | ------------------------------------------------------------------------------- |
| --signal-input-path | Path to the CSV file with feature-enriched signal data |
| --output-name | Optional output file name for the .npy embeddings (default: embeddings.npy) |
| --window-size | Number of time steps per rolling window (default: 30) |
| --stride | Step size between rolling windows (default: 1) |
| --encoding-method | Positional encoding method: sinusoidal (default) or learnable |
| --encoding-style | Sinusoidal encoding format: interleaved (default) or stacked |
| --embedding-dim | Embedding dimension to use for both sinusoidal and learnable encodings |
| --config | Optional YAML config path to override pipeline settings |
| --debug | Enable debug logging |
ℹ️ Note: --embedding-dim applies to both sinusoidal and learnable encodings. For sinusoidal, it sets the generated frequency embedding size. For learnable, it defines the trainable positional embedding dimension.
🔹 Cluster Regimes
```bash
python launch_host.py cluster \
  --embedding-path embeddings/EUR_USD_embeddings.npy \
  --regime-data-path data/processed/regime_input.csv \
  --output-dir reports/EUR_USD \
  --window-size 30 \
  --n-clusters 3
```
This will:
- Load precomputed transformer embeddings
- Apply spectral clustering to assign regime IDs
- Align cluster labels with the original time-series data