
Snakepit

High-performance, generalized process pooler and session manager for external language integrations. Orchestrates and supervises external runtimes such as Python and JavaScript from Elixir.

Install / Use

/learn @nshkrdotcom/Snakepit

README

Snakepit

<div align="center"> <img src="assets/snakepit-logo.svg" alt="Snakepit Logo" width="200" height="200"> </div>

A high-performance, generalized process pooler and session manager for external language integrations in Elixir


Features

  • High-performance process pooling with concurrent worker initialization
  • Session affinity for stateful operations across requests (hint by default, strict modes available)
  • gRPC streaming for real-time progress updates and large data transfers
  • Bidirectional tool bridge allowing Python to call Elixir functions and vice versa
  • Production-ready process management with automatic orphan cleanup
  • Hardware detection for ML accelerators (CUDA, MPS, ROCm)
  • Fault tolerance with circuit breakers, retry policies, and crash barriers
  • Comprehensive telemetry with OpenTelemetry support
  • Dual worker profiles (process isolation or threaded parallelism)
  • Zero-copy data interop via DLPack and Arrow

Installation

Add snakepit to your dependencies in mix.exs:

def deps do
  [
    {:snakepit, "~> 0.13.0"}
  ]
end

Then run:

mix deps.get
mix snakepit.setup    # Install Python dependencies and generate gRPC stubs
mix snakepit.doctor   # Verify environment is correctly configured

Using with SnakeBridge (Recommended)

For higher-level Python integration with compile-time type generation, use SnakeBridge instead of snakepit directly. SnakeBridge handles Python environment setup automatically at compile time.

def deps do
  [{:snakebridge, "~> 0.16.0"}]
end

def project do
  [
    ...
    compilers: [:snakebridge] ++ Mix.compilers()
  ]
end

Quick Start

# Execute a command on any available worker
{:ok, result} = Snakepit.execute("ping", %{})

# Execute with session affinity (prefer the same worker for related requests)
{:ok, result} = Snakepit.execute_in_session("session_123", "process_data", %{input: data})

# Stream results for long-running operations
Snakepit.execute_stream("batch_process", %{items: items}, fn chunk ->
  IO.puts("Progress: #{chunk["progress"]}%")
end)

Configuration

Simple Configuration

# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  adapter_args: ["--adapter", "your_adapter_module"],
  pool_size: 10,
  log_level: :error

In legacy single-pool mode, if both the top-level `:pool_size` and `pool_config.pool_size` are configured, the top-level `:pool_size` value wins.
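As a sketch of this precedence rule, the following legacy single-pool configuration starts 8 workers; the nested value is ignored:

```elixir
# config/config.exs (legacy single-pool mode)
config :snakepit,
  pooling_enabled: true,
  pool_size: 8,                     # top-level value wins
  pool_config: %{pool_size: 4}      # ignored when the top-level key is set
```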

Multi-Pool Configuration (v0.6+)

config :snakepit,
  pools: [
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 10,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "my_app.adapters.MainAdapter"]
    },
    %{
      name: :compute,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 8,
      adapter_args: ["--adapter", "my_app.adapters.ComputeAdapter"]
    }
  ]

If your adapter defines `command_timeout/2`, timeout selection is resolved from the checked-out worker's pool `adapter_module`. The global `:adapter_module` value is used only as a fallback when a pool does not declare one.
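As an illustration, a hypothetical adapter module (the module name, commands, and timeout values below are placeholders, not part of Snakepit) might declare per-command timeouts like this:

```elixir
defmodule MyApp.Adapters.MainAdapter do
  # Sketch: return a per-command timeout in milliseconds.
  # Pools that set this module as their adapter_module resolve timeouts
  # here rather than from the global :adapter_module fallback.
  def command_timeout("batch_process", _params), do: 300_000
  def command_timeout("ping", _params), do: 5_000
  def command_timeout(_command, _params), do: 30_000
end
```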

Logging Configuration

Snakepit is silent by default (errors only):

config :snakepit, log_level: :error          # Default - errors only
config :snakepit, log_level: :info           # Include info messages
config :snakepit, log_level: :debug          # Verbose debugging
config :snakepit, log_level: :none           # Complete silence

# Filter to specific categories
config :snakepit, log_level: :debug, log_categories: [:grpc, :pool]

gRPC Listener Configuration

By default, Snakepit runs an internal-only gRPC listener on an ephemeral port and publishes the assigned port to Python workers at runtime:

config :snakepit,
  grpc_listener: %{
    mode: :internal
  }

Explicit external bindings are opt-in and require host/port configuration:

config :snakepit,
  grpc_listener: %{
    mode: :external,
    host: "localhost",
    bind_host: "0.0.0.0",
    port: 50051
  }

For multi-instance deployments sharing a host, use the pooled external mode:

config :snakepit,
  grpc_listener: %{
    mode: :external_pool,
    host: "localhost",
    bind_host: "0.0.0.0",
    base_port: 50051,
    pool_size: 32
  }

To isolate process registry state when sharing a deployment directory, set an explicit instance name, instance token, and data directory:

config :snakepit,
  instance_name: "my-app-a",
  instance_token: "node-a-01",
  data_dir: "/var/lib/snakepit"

`instance_name` identifies an environment (for example, `prod-us-east-1`). `instance_token` identifies one running instance inside that environment. When running multiple Snakepit VMs from the same checkout or host at the same time, each VM must use a unique `instance_token` so cleanup logic never targets another live instance.

Environment variables are also supported:

SNAKEPIT_INSTANCE_NAME=my-app SNAKEPIT_INSTANCE_TOKEN=job_1 mix run --no-start script_a.exs
SNAKEPIT_INSTANCE_NAME=my-app SNAKEPIT_INSTANCE_TOKEN=job_2 mix run --no-start script_b.exs

Runtime Configurable Defaults

All hardcoded timeout and sizing values are now configurable via Application.get_env/3. Values are read at runtime, allowing configuration changes without recompilation.

# config/runtime.exs - Example customization
config :snakepit,
  # Timeouts (all in milliseconds)
  default_command_timeout: 30_000,       # Default timeout for commands
  pool_request_timeout: 60_000,          # Pool execute timeout
  pool_streaming_timeout: 300_000,       # Pool streaming timeout
  pool_startup_timeout: 10_000,          # Worker startup timeout
  pool_queue_timeout: 5_000,             # Queue timeout
  checkout_timeout: 5_000,               # Worker checkout timeout
  grpc_worker_execute_timeout: 30_000,   # GRPCWorker execute timeout
  grpc_worker_stream_timeout: 300_000,   # GRPCWorker streaming timeout
  grpc_worker_health_check_timeout_ms: 5_000, # Periodic worker health RPC timeout
  graceful_shutdown_timeout_ms: 6_000,   # Python process shutdown timeout

  # Pool sizing
  pool_max_queue_size: 1000,             # Max pending requests in queue
  pool_max_workers: 150,                 # Maximum workers per pool
  pool_startup_batch_size: 10,           # Workers started per batch
  pool_startup_batch_delay_ms: 500,      # Delay between startup batches

  # Pool recovery
  pool_reconcile_interval_ms: 1_000,     # Reconcile worker count interval (0 disables)
  pool_reconcile_batch_size: 2,          # Max workers respawned per tick

  # Worker supervisor restart intensity
  worker_starter_max_restarts: 3,
  worker_starter_max_seconds: 5,
  worker_supervisor_max_restarts: 3,
  worker_supervisor_max_seconds: 5,

  # Retry policy
  retry_max_attempts: 3,
  retry_backoff_sequence: [100, 200, 400, 800, 1600],
  retry_max_backoff_ms: 30_000,
  retry_jitter_factor: 0.25,

  # Circuit breaker
  circuit_breaker_failure_threshold: 5,
  circuit_breaker_reset_timeout_ms: 30_000,
  circuit_breaker_half_open_max_calls: 1,

  # Crash barrier
  crash_barrier_taint_duration_ms: 60_000,
  crash_barrier_max_restarts: 1,
  crash_barrier_backoff_ms: [50, 100, 200],

  # Health monitor
  health_monitor_check_interval: 30_000,
  health_monitor_crash_window_ms: 60_000,
  health_monitor_max_crashes: 10,

  # Heartbeat
  heartbeat_ping_interval_ms: 2_000,
  heartbeat_timeout_ms: 10_000,
  heartbeat_max_missed: 3,

  # Session store
  session_cleanup_interval: 60_000,
  session_default_ttl: 3600,
  session_max_sessions: 10_000,
  session_warning_threshold: 0.8,

  # gRPC listener
  grpc_listener: %{mode: :internal},
  grpc_internal_host: "127.0.0.1",
  grpc_port_pool_size: 32,
  grpc_listener_ready_timeout_ms: 5_000,
  grpc_listener_port_check_interval_ms: 25,
  grpc_listener_reuse_attempts: 3,
  grpc_listener_reuse_wait_timeout_ms: 500,
  grpc_listener_reuse_retry_delay_ms: 100,
  grpc_num_acceptors: 20,
  grpc_max_connections: 1000,
  grpc_socket_backlog: 512

See Snakepit.Defaults module documentation for the complete list of configurable values.
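Because these values are read at runtime rather than baked in at compile time, you can also inspect or adjust them from a running node with the standard `Application` functions; a minimal sketch (the `30_000` fallback here mirrors the documented default):

```elixir
# Read the current value; the third argument is the fallback default.
timeout = Application.get_env(:snakepit, :default_command_timeout, 30_000)

# Adjust at runtime without recompiling; subsequent reads see the new value.
Application.put_env(:snakepit, :pool_queue_timeout, 10_000)
```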

Core API

Basic Execution

# Simple command execution
{:ok, result} = Snakepit.execute("command_name", %{param: "value"})

# With timeout
{:ok, result} = Snakepit.execute("slow_command", %{}, timeout: 30_000)

# Target specific pool
{:ok, result} = Snakepit.execute("ml_inference", %{}, pool: :compute)

Session Affinity

Sessions route related requests to the same worker when possible, enabling stateful operations:

session_id = "user_#{user.id}"

# First call establishes worker affinity
{:ok, _} = Snakepit.execute_in_session(session_id, "load_model", %{model: "gpt-4"})

# Subsequent calls prefer the same worker
{:ok, result} = Snakepit.execute_in_session(session_id, "generate", %{prompt: "Hello"})
{:ok, result} = Snakepit.execute_in_session(session_id, "generate", %{prompt: "Continue"})

By default, affinity is a hint. If the preferred worker is busy or tainted, Snakepit can fall back to another worker. For strict pinning, configure affinity modes at the pool level:

config :snakepit,
  pools: [
    %{name: :default, pool_size: 4, affinity: :strict_queue},
    %{name: :latency_sensitive, pool_size: 4, affinity: :strict_fail_fast}
  ]
  • :strict_queue queues requests for the preferred worker when it is busy.
  • :strict_fail_fast returns an error immediately when the preferred worker is unavailable, instead of queueing or falling back.
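With a strict affinity mode, callers should handle requests the preferred worker cannot serve. A hedged sketch, assuming `execute_in_session/4` accepts the same options keyword list as `execute/3` and that failures surface as a generic `{:error, reason}` tuple (the exact error term is not specified here):

```elixir
case Snakepit.execute_in_session("session_123", "generate",
       %{prompt: "Hello"}, pool: :latency_sensitive) do
  {:ok, result} ->
    {:ok, result}

  {:error, reason} ->
    # Under :strict_fail_fast, a busy preferred worker surfaces as an
    # error rather than queueing; retry, degrade, or route elsewhere.
    {:error, reason}
end
```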
