py13x

Python 3.13+ · Free-threaded · License: MIT

A high-performance Python package built specifically for Python 3.13+ and true free-threaded (no-GIL) execution. py13x provides utilities for parallel processing, memory management, profiling, and thread-safe operations optimized for GIL-free Python.

🚀 Features

  • True Parallelism: Take full advantage of Python 3.13's free-threaded runtime
  • High-Performance Thread Pool: Lightweight pool optimized for CPU-bound parallel work
  • Parallel Patterns: Built-in map, fanout, and pipeline patterns
  • Memory Management: Thread-local arenas, reusable buffers, and Rust-inspired ownership
  • Profiling Tools: Lock contention detection, performance counters, and execution tracing
  • Thread-Safe Primitives: Atomic integers, barriers, and synchronization utilities
  • Runtime Detection: Verify free-threaded mode and system capabilities
  • Graceful Shutdown: Signal handling for clean application termination

📋 Requirements

  • Python 3.13+ with free-threaded runtime enabled
  • Use the python3.13t executable, or build Python with --disable-gil

📦 Installation

pip install py13x

Or install from source:

git clone https://github.com/yourusername/py13x.git
cd py13x
pip install -e .

🔍 Quick Start

Verify Free-Threaded Runtime

from py13x.runtime import assert_free_threaded

# Ensure you're running in free-threaded mode
assert_free_threaded()
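For reference, a stdlib-only version of this check can be sketched without py13x. CPython 3.13 exposes sys._is_gil_enabled() on free-threaded-capable builds; the gil_disabled helper below is illustrative, not a py13x API:

```python
import sys
import sysconfig

def gil_disabled() -> bool:
    """Best-effort check for a free-threaded CPython build.

    sys._is_gil_enabled() exists only on 3.13+ builds; on older
    interpreters, fall back to the Py_GIL_DISABLED build flag.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    if check is not None:
        return not check()
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

print("free-threaded:", gil_disabled())
```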

Parallel Map

from py13x.parallel import pmap
from py13x.runtime import cpu_count

def compute_heavy(x):
    return x ** 2 + x ** 0.5

data = range(1000)
results = pmap(compute_heavy, data, workers=cpu_count())
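If you want to see roughly what pmap is doing, the stdlib equivalent is concurrent.futures.ThreadPoolExecutor.map, which also preserves input order (this sketch assumes pmap behaves the same way):

```python
from concurrent.futures import ThreadPoolExecutor

def compute_heavy(x):
    return x ** 2 + x ** 0.5

# On a free-threaded build these threads run CPU-bound work in parallel;
# with the GIL enabled they merely interleave.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compute_heavy, range(1000)))

print(results[:3])  # [0.0, 2.0, 5.414213562373095]
```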

Thread Pool

from py13x.threading import ParallelThreadPool
from py13x.runtime import cpu_count

pool = ParallelThreadPool(workers=cpu_count())

# Submit tasks
futures = [pool.submit(expensive_function, arg) for arg in args]

# Collect results
results = [f.result() for f in futures]

pool.shutdown()

Atomic Operations

from py13x.threading import AtomicInt
from threading import Thread

counter = AtomicInt(0)

def worker():
    for _ in range(1000):
        counter.inc()

threads = [Thread(target=worker) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()

print(f"Final count: {counter.get()}")  # 10000
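Python's stdlib has no atomic integer, so a lock-based counter is the usual fallback. The LockedInt below is an assumed stand-in for AtomicInt, not py13x's actual implementation:

```python
import threading

class LockedInt:
    """Minimal lock-based counter; an illustrative stand-in for AtomicInt."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def inc(self, delta=1):
        with self._lock:
            self._value += delta
            return self._value

    def get(self):
        with self._lock:
            return self._value

counter = LockedInt()
threads = [threading.Thread(target=lambda: [counter.inc() for _ in range(1000)])
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.get())  # 10000
```

A true atomic would avoid the lock entirely; the point of the sketch is only the interface.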

Performance Tracing

from py13x.profiling import trace

with trace("data processing"):
    result = process_large_dataset()
# Output: data processing: 0.123456s
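py13x's trace internals aren't shown here, but one way such a context manager can be built from the stdlib (the Span record and its elapsed field are illustrative additions):

```python
import time
from contextlib import contextmanager

class Span:
    elapsed = None

@contextmanager
def trace(label):
    """Time a block with perf_counter and report the elapsed wall time on exit."""
    span = Span()
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.elapsed = time.perf_counter() - start
        print(f"{label}: {span.elapsed:.6f}s")

with trace("sleep") as span:
    time.sleep(0.01)
```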

Graceful Shutdown

from py13x.utils import shutdown

# Install signal handlers
shutdown.install()

# In worker loops
while not shutdown.should_exit():
    process_batch()
    
cleanup()
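A possible stdlib shape for this pattern, assuming the shutdown helper is essentially a signal-driven flag (ShutdownFlag is illustrative, not the py13x implementation):

```python
import signal
import threading

class ShutdownFlag:
    """A signal-driven exit flag: handlers set an Event, workers poll it."""
    def __init__(self):
        self._event = threading.Event()

    def install(self):
        # Route SIGINT/SIGTERM to the flag instead of raising KeyboardInterrupt.
        for sig in (signal.SIGINT, signal.SIGTERM):
            try:
                signal.signal(sig, lambda signum, frame: self._event.set())
            except ValueError:
                pass  # handlers can only be installed from the main thread

    def should_exit(self):
        return self._event.is_set()

shutdown = ShutdownFlag()
shutdown.install()
# A worker loop would poll shutdown.should_exit() between batches.
```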

📚 Module Reference

py13x.runtime

Runtime detection and system information:

  • assert_free_threaded(): Verify Python is running in free-threaded mode
  • cpu_count(): Get available CPU cores
  • set_thread_affinity(cpu): Pin thread to specific CPU core (Linux only)

py13x.threading

Thread-safe primitives and pools:

  • ParallelThreadPool: High-performance thread pool
  • AtomicInt: Thread-safe integer with atomic operations
  • barrier(parties): Synchronization barrier for coordinating threads

py13x.parallel

High-level parallel patterns:

  • pmap(fn, data, workers): Parallel map operation
  • fanout(fn, items, workers): Fan-out pattern for fire-and-forget tasks
  • PipelineStage: Multi-stage pipeline processing

py13x.memory

Memory management utilities:

  • ThreadArena: Thread-local storage arenas
  • ReusableBuffer: Pre-allocated reusable buffers
  • Owned[T]: Rust-inspired move semantics for ownership tracking

py13x.profiling

Performance profiling tools:

  • trace(label): Context manager for execution time tracing
  • Counter: Throughput counter with rate calculation
  • ContentionProbe: Lock contention measurement

py13x.utils

Utility functions:

  • shutdown.install(): Install graceful shutdown signal handlers
  • shutdown.should_exit(): Check if shutdown was requested
  • BoundedQueue: Queue with automatic backpressure
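The backpressure idea behind BoundedQueue maps directly onto the stdlib: queue.Queue(maxsize=...) makes put() block once the queue is full, slowing fast producers down to consumer speed:

```python
import queue

# put() blocks once the queue holds maxsize items, so a fast producer
# is throttled to the pace of the consumer.
q = queue.Queue(maxsize=2)
q.put("a")
q.put("b")
print(q.full())  # True; a third put() would now block
print(q.get())   # "a"
```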

🎯 Use Cases

CPU-Bound Parallel Processing

from py13x.parallel import pmap
from py13x.runtime import cpu_count, assert_free_threaded

assert_free_threaded()

def cpu_intensive(data):
    # Heavy computation without GIL interference
    return complex_calculation(data)

results = pmap(cpu_intensive, large_dataset, workers=cpu_count())

Pipeline Processing

from py13x.parallel import PipelineStage
from queue import Queue

# Create pipeline stages
input_queue = Queue()
stage1_output = Queue()
stage2_output = Queue()

stage1 = PipelineStage(parse_data, input_queue, stage1_output)
stage2 = PipelineStage(transform_data, stage1_output, stage2_output)

stage1.start()
stage2.start()

# Feed data into pipeline
for item in data_source:
    input_queue.put(item)
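PipelineStage's internals aren't shown above. A minimal stdlib sketch of the same pattern — each stage is a thread that applies a function to items from its input queue and forwards a sentinel to shut down cleanly (the Stage class here is illustrative):

```python
import queue
import threading

SENTINEL = object()

class Stage(threading.Thread):
    """Apply fn to items from inq, push results to outq, forward the sentinel."""
    def __init__(self, fn, inq, outq):
        super().__init__(daemon=True)
        self.fn, self.inq, self.outq = fn, inq, outq

    def run(self):
        while (item := self.inq.get()) is not SENTINEL:
            self.outq.put(self.fn(item))
        self.outq.put(SENTINEL)

inq, mid, out = queue.Queue(), queue.Queue(), queue.Queue()
Stage(str.upper, inq, mid).start()
Stage(lambda s: s + "!", mid, out).start()

for word in ("a", "b"):
    inq.put(word)
inq.put(SENTINEL)

results = []
while (item := out.get()) is not SENTINEL:
    results.append(item)
print(results)  # ['A!', 'B!']
```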

Lock Contention Analysis

from py13x.profiling import ContentionProbe

probe = ContentionProbe()

# Use instead of regular Lock
probe.acquire()
try:
    # Critical section
    shared_data.update()
finally:
    probe.release()

print(f"Lock wait time: {probe.wait_time:.3f}s")
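A contention probe can be approximated with a lock wrapper that times how long callers spend blocked in acquire(). WaitTimedLock is an illustrative sketch, not py13x's implementation (note the += on wait_time is itself unsynchronized, which is acceptable for a sketch):

```python
import threading
import time

class WaitTimedLock:
    """A lock that accumulates the time callers spend blocked in acquire()."""
    def __init__(self):
        self._lock = threading.Lock()
        self.wait_time = 0.0

    def acquire(self):
        start = time.perf_counter()
        self._lock.acquire()
        self.wait_time += time.perf_counter() - start

    def release(self):
        self._lock.release()

probe = WaitTimedLock()
probe.acquire()
probe.release()
print(f"waited {probe.wait_time:.6f}s")
```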

Thread-Local Caching

from py13x.memory import ThreadArena

def worker():
    arena = ThreadArena.get()
    
    # Each thread gets its own cache
    if 'cache' not in arena:
        arena['cache'] = {}
    
    cache = arena['cache']
    # Use thread-local cache without locks
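The same effect is available from the stdlib via threading.local, which gives every thread its own attribute namespace (get_cache and worker are illustrative helpers):

```python
import threading

_local = threading.local()

def get_cache():
    # Each thread lazily creates its own dict; no locking needed.
    if not hasattr(_local, "cache"):
        _local.cache = {}
    return _local.cache

def worker(results):
    cache = get_cache()
    cache["hits"] = cache.get("hits", 0) + 1
    results.append(cache["hits"])

results = []
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # [1, 1, 1] -- each thread saw a fresh cache
```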

⚡ Performance Tips

  1. Worker Count: For CPU-bound work, use cpu_count(). For I/O-bound, you can exceed core count.
  2. Batch Size: Process data in batches to reduce task submission overhead
  3. Memory Reuse: Use ReusableBuffer and ThreadArena to minimize allocations
  4. Profiling: Use ContentionProbe to identify locking bottlenecks
  5. Affinity: Consider pinning threads to cores for cache-sensitive workloads
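Tip 2 (batching) can be implemented with a small chunking helper; itertools.batched (3.12+, yields tuples) does the same job, but the reimplementation below makes the idea explicit:

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items; submitting one task per batch
    cuts scheduling overhead compared with one task per item."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```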

🔬 Benchmarking

Compare performance with and without the GIL:

import time
from py13x.parallel import pmap
from py13x.runtime import assert_free_threaded, cpu_count

assert_free_threaded()

def benchmark():
    start = time.perf_counter()
    results = pmap(compute_intensive, data, workers=cpu_count())
    elapsed = time.perf_counter() - start
    print(f"Processed {len(results)} items in {elapsed:.3f}s")
    print(f"Throughput: {len(results)/elapsed:.1f} items/sec")

benchmark()

🐛 Debugging

Enable detailed logging:

import logging
logging.basicConfig(level=logging.DEBUG)

# Use tracing for performance debugging
from py13x.profiling import trace

with trace("suspicious operation"):
    potentially_slow_function()

🤝 Contributing

Contributions are welcome! Please ensure:

  • Code works with Python 3.13+ free-threaded runtime
  • All functions have type hints and docstrings
  • Tests pass with python3.13t -m pytest

📄 License

MIT License - see LICENSE file for details.

⚠️ Notes

  • This package requires Python 3.13+ with free-threaded runtime enabled
  • Use python3.13t or build Python with --disable-gil flag
  • Performance benefits are most significant for CPU-bound workloads
  • Some operations may still require synchronization (locks, atomics)

Built for the future of parallel Python 🐍⚡
