# XLR8
High-performance read-acceleration layer for MongoDB. XLR8 decomposes large range queries into parallel chunks and executes them with a memory-bounded execution model, using a Rust backend for CPU-intensive processing. It streams compressed Parquet output for analytics and data-lake ingestion while keeping the familiar PyMongo API.
## Minimal Code Changes

```python
# Before: PyMongo
df = pd.DataFrame(collection.find(query))

# After: XLR8 - just wrap and go!
xlr8_collection = accelerate(collection, schema, mongodb_uri)  # mongodb_uri: Union(str, callback)
df = xlr8_collection.find(query).to_dataframe()
```

That's it. Same query syntax, same DataFrame output, just faster.
## The Problem

When running analytical queries over large MongoDB collections, you encounter two fundamental bottlenecks:
```mermaid
flowchart LR
    subgraph Bottleneck1["I/O Bottleneck"]
        A1[Python] -->|"Single cursor"| B1[MongoDB]
        B1 -->|"Network RTT"| C1[Wait...]
        C1 -->|"Next batch"| A1
    end
    subgraph Bottleneck2["CPU Bottleneck"]
        A2[Python GIL] -->|"Holds lock"| B2[BSON decode]
        B2 -->|"Still locked"| C2[Build dict]
        C2 -->|"Still locked"| D2[Next doc]
    end
```
**I/O bound:** PyMongo uses a single cursor, fetching documents one batch at a time. Your CPU sits idle waiting on network round trips.

**CPU/GIL bound:** Even with the data in hand, Python's Global Interpreter Lock (GIL) means BSON decoding and DataFrame construction happen on a single core.

These aren't PyMongo limitations; they're inherent to Python's design. XLR8 provides a way around both.
## How XLR8 Solves It
```mermaid
flowchart LR
    subgraph Solution["XLR8: Rust Backend (GIL-Free) + Tokio Async + Cache-First"]
        direction LR
        Q["Your Query<br/>cursor.to_dataframe(...)"] --> PLAN["Execution plan<br/>chunking + worker count + RAM budget"]
        PLAN --> GIL["Python releases GIL<br/>(py.allow_threads)"]
        GIL --> RT["Rust Backend<br/>Tokio async runtime"]
        RT --> W1["Worker 1<br/>async fetch + BSON→Arrow"]
        RT --> W2["Worker 2<br/>async fetch + BSON→Arrow"]
        RT --> W3["Worker 3<br/>async fetch + BSON→Arrow"]
        RT --> WN["Worker N<br/>async fetch + BSON→Arrow"]
        W1 --> M1{"RAM limit reached?<br/>flush_ram_limit_mb"}
        W2 --> M2{"RAM limit reached?<br/>flush_ram_limit_mb"}
        W3 --> M3{"RAM limit reached?<br/>flush_ram_limit_mb"}
        WN --> MN{"RAM limit reached?<br/>flush_ram_limit_mb"}
        M1 -->|flush| C1["Write Parquet shard<br/>.cache/<hash>/part_0001.parquet"]
        M2 -->|flush| C2["Write Parquet shard<br/>.cache/<hash>/part_0002.parquet"]
        M3 -->|flush| C3["Write Parquet shard<br/>.cache/<hash>/part_0003.parquet"]
        MN -->|flush| CN["Write Parquet shard<br/>.cache/<hash>/part_00NN.parquet"]
        C1 --> READ["Read shards (Arrow/DuckDB)"]
        C2 --> READ
        C3 --> READ
        CN --> READ
        READ --> DF["Assemble final DataFrame"]
    end
```
XLR8 releases Python's GIL and hands execution to a Rust backend powered by Tokio's async runtime. Multiple workers fetch from MongoDB in parallel, convert BSON to Arrow, and write Parquet shards, all without touching the GIL.

The result? Your analytical queries run up to ~4x faster, especially for large result sets.
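The chunking step in the execution plan can be sketched in plain Python. This is an illustrative reconstruction, not XLR8's actual planner: it splits a time-range filter into `chunking_granularity`-sized sub-ranges, each of which a worker can fetch independently.

```python
from datetime import datetime, timedelta, timezone

def plan_chunks(start, end, granularity):
    """Split [start, end) into contiguous sub-ranges of at most `granularity`.

    Each sub-range becomes an independent MongoDB range filter that one
    worker can fetch in parallel with the others.
    """
    chunks = []
    lo = start
    while lo < end:
        hi = min(lo + granularity, end)
        chunks.append({"timestamp": {"$gte": lo, "$lt": hi}})
        lo = hi
    return chunks

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 2, 1, tzinfo=timezone.utc)
chunks = plan_chunks(start, end, timedelta(days=7))
# 31 days at 7-day granularity -> 5 chunks (the last one covers 3 days)
print(len(chunks))  # 5
```

Because the sub-ranges are disjoint and cover the full interval, the per-worker results can be concatenated without deduplication.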
## Installation

```bash
pip install xlr8
```

XLR8 requires Python 3.11+ and includes pre-compiled Rust extensions.
## Quick Start
```python
from pymongo import MongoClient
from xlr8 import accelerate, Schema, Types
from datetime import datetime, timezone, timedelta
from bson import ObjectId

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017")
collection = client["iot"]["sensor_readings"]

# Define your schema
schema = Schema(
    time_field="timestamp",
    fields={
        "timestamp": Types.Timestamp("ms", tz="UTC"),
        "device_id": Types.ObjectId(),
        "reading": Types.Any(),  # Handles int, float, string dynamically
    },
    avg_doc_size_bytes=200,
)

# Wrap the collection with XLR8
xlr8_col = accelerate(collection, schema=schema, mongo_uri="mongodb://localhost:27017")

# Query like normal PyMongo
cursor = xlr8_col.find({
    "device_id": ObjectId("507f1f77bcf86cd799439011"),
    "timestamp": {"$gte": datetime(2024, 1, 1, tzinfo=timezone.utc),
                  "$lt": datetime(2024, 6, 1, tzinfo=timezone.utc)},
}).sort("timestamp", 1)

# Get a DataFrame - parallel fetch, cached for reuse
df = cursor.to_dataframe(
    chunking_granularity=timedelta(days=7),
    max_workers=8,
)
```
## Key Features
<table>
<tr>
<td width="50%" valign="top">

**🦀 GIL-Free Rust Backend**

Python's GIL is released via `py.allow_threads()`. Rust's Tokio runtime handles async I/O and CPU-intensive work across all cores.

</td>
<td width="50%" valign="top">

**⚡ Parallel MongoDB Fetching**

Queries are split into time-based chunks. Each worker maintains its own MongoDB connection, fetching in parallel.

</td>
</tr>
<tr>
<td width="50%" valign="top">

**💾 Query-Aware Cache**

Shards are stored in a per-query-hash folder, and cursors can be given a start and end date to read only the matching slice of the cache.

</td>
<td width="50%" valign="top">

**🧩 Parallel `$or` Brackets**

`$or` queries are automatically split into independent "brackets" that execute in parallel: each branch becomes its own bracket, while shared filters are kept as global constraints. `$in` stays intact within each bracket, since MongoDB handles it efficiently with index scans. Before execution, XLR8 builds a plan that detects overlapping brackets (cases where multiple brackets could match the same document) and keeps results correct and deterministic; this behavior is covered by extensive tests to prevent duplicate or missing rows.

</td>
</tr>
<tr>
<td width="50%" valign="top">

**🔀 DuckDB K-Way Merge**

When sorting is required, DuckDB performs a GIL-free K-way merge across sorted shards, with O(N log K) complexity.

</td>
<td width="50%" valign="top">

**🐻❄️ Pandas & Polars Support**

`to_dataframe()` returns pandas. `to_polars()` returns native Polars. Choose based on your downstream analytics.

</td>
</tr>
<tr>
<td width="50%" valign="top">

**📊 Memory-Controlled Execution**

Set `flush_ram_limit_mb` to control RAM per worker. Process large datasets without OOM errors.

</td>
<td width="50%" valign="top">

**📤 Stream to Data Lakes**

`stream_to_callback()` partitions data by time and custom fields, perfect for S3/GCS ingestion pipelines.

</td>
</tr>
</table>
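The `$or` bracket splitting described above can be modeled with a few lines of Python. This is a hedged sketch of the idea, not XLR8's internals: each `$or` branch is merged with the filters shared by all branches.

```python
def split_brackets(query: dict) -> list[dict]:
    """Illustrative sketch of $or "bracket" splitting (not XLR8's code).

    Each $or branch becomes its own bracket; top-level filters shared by
    all branches are applied as global constraints on every bracket.
    """
    branches = query.get("$or")
    if not branches:
        return [query]  # no $or: the whole query is a single bracket
    shared = {k: v for k, v in query.items() if k != "$or"}
    return [{**shared, **branch} for branch in branches]

query = {
    "timestamp": {"$gte": "2024-01-01"},          # shared constraint
    "$or": [{"device_id": {"$in": ["a", "b"]}},   # $in stays intact
            {"site": "london"}],
}
brackets = split_brackets(query)
print(len(brackets))  # 2 brackets, each keeping the timestamp filter
```

A real planner must additionally detect overlapping brackets (two brackets matching the same document) to avoid duplicates, which is the correctness check the feature table mentions.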
## Cloud & Container Benefits

XLR8's architecture provides specific advantages in cloud environments:
```mermaid
flowchart TB
    subgraph Benefits["Compute savings"]
        direction LR
        subgraph Speed["Faster Queries"]
            S1[Parallel fetch] --> S2[Reduced container uptime]
            S2 --> S3[Lower cloud billable time]
        end
        subgraph Memory["Memory Control"]
            M1[Predictable memory usage]
            M1 --> M2[Smaller container instances]
        end
    end
```
| Benefit | How XLR8 Helps |
|---------|----------------|
| Reduced container runtime | Parallel execution finishes faster → lower billable seconds |
| Cache-first processing | Fetch once, process many times without hitting MongoDB |
| Smaller instances | Memory control via flush_ram_limit_mb allows smaller container sizes |
| Predictable costs | Consistent memory footprint = consistent billing |
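The memory control the table attributes to `flush_ram_limit_mb` can be modeled simply. The sketch below is an illustrative approximation, not XLR8's implementation: a worker buffers rows and flushes a shard whenever the estimated buffer size crosses the budget, so peak RAM stays bounded no matter how large the result set is.

```python
class BoundedBuffer:
    """Illustrative model of per-worker flushing (not XLR8's code).

    Rows accumulate in memory; once the estimated size crosses the
    budget, the buffer is flushed to a shard and emptied, keeping peak
    RAM per worker near flush_ram_limit_mb regardless of result size.
    """
    def __init__(self, flush_ram_limit_mb: int, avg_doc_size_bytes: int):
        self.limit_bytes = flush_ram_limit_mb * 1024 * 1024
        self.avg_doc_size_bytes = avg_doc_size_bytes
        self.rows = []
        self.flushes = 0

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) * self.avg_doc_size_bytes >= self.limit_bytes:
            self.flush()

    def flush(self):
        if self.rows:
            # A real worker writes .cache/<hash>/part_NNNN.parquet here.
            self.flushes += 1
            self.rows.clear()

buf = BoundedBuffer(flush_ram_limit_mb=1, avg_doc_size_bytes=200)
for i in range(10_000):  # ~2 MB of 200-byte docs against a 1 MB budget
    buf.add({"i": i})
buf.flush()  # flush the remainder
print(buf.flushes)  # 2
```

This is why a container can be sized to `max_workers * flush_ram_limit_mb` plus overhead rather than to the full dataset.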
## Performance Benchmarks

Real-world benchmarks comparing XLR8 against vanilla PyMongo + pandas on a production-like workload.
### Test Environment
| Component | Specification |
|-----------|---------------|
| MongoDB | Atlas M30 (General), GCP europe-west2 (London) |
| Compute | GCP Cloud Run Jobs, 8 vCPU / 32 GB RAM, europe-west2 |
| Dataset | Forex candlestick data, 27 currency pairs, ~54K docs/day |
| Query | Time-range filter + $in on 27 instruments |
### Methodology

- **PyMongo baseline:** stream the cursor → build DataFrames in 300k-row batches → `pd.concat()`
- **XLR8:** `cursor.to_dataframe(max_workers=14, chunking_granularity=timedelta(days=4), cache_read=False)`
- Each test runs sequentially to avoid database contention
### Results
| Period | Rows | PyMongo Time | XLR8 Time | Speedup |
|--------|-----:|-------------:|----------:|:-------:|
| 3 months | 4.8M | 89.5s | 31.1s | 2.9x |
| 6 months | 9.8M | 177.4s | 54.1s | 3.3x |
| 1 year | 19.7M | 371.2s | 109.3s | 3.4x |
| 1.5 years | 29.8M | 555.5s | 157.4s | 3.5x |
| 2 years | 39.7M | 760.7s | 204.0s | 3.7x |
| 2.5 years | 49.7M | 949.5s | 252.6s | 3.8x |
### Visualization

<p align="center"> <img src="https://raw.githubusercontent.com/XLR8-DB/xlr8/main/.github/benchmark_results.png" alt="XLR8 Benchmark Results" width="900"/> </p>

### Key Takeaways
- **Consistent 3-4x speedup** across all data sizes
- **Throughput:** XLR8 sustains ~180-195K rows/sec vs PyMongo's ~52-55K rows/sec
- **Scales well:** speedup improves with larger datasets as parallelism amortizes overhead
- **Memory bounded:** RAM stays within the configured `flush_ram_limit_mb` per worker, even for multi-year ranges