
Snakepit

High-performance, generalized process pooler and session manager for external language integrations. Orchestrates and supervises external runtimes such as Python and JavaScript from Elixir.

Install / Use

/learn @nshkrdotcom/Snakepit

README

Snakepit

<div align="center"> <img src="assets/snakepit-logo.svg" alt="Snakepit Logo" width="200" height="200"> </div>

A high-performance, generalized process pooler and session manager for external language integrations in Elixir


Features

  • High-performance process pooling with concurrent worker initialization
  • Session affinity for stateful operations across requests (hint by default, strict modes available)
  • gRPC streaming for real-time progress updates and large data transfers
  • Bidirectional tool bridge allowing Python to call Elixir functions and vice versa
  • Production-ready process management with automatic orphan cleanup
  • Hardware detection for ML accelerators (CUDA, MPS, ROCm)
  • Fault tolerance with circuit breakers, retry policies, and crash barriers
  • Comprehensive telemetry with OpenTelemetry support
  • Dual worker profiles (process isolation or threaded parallelism)
  • Zero-copy data interop via DLPack and Arrow

Installation

Add snakepit to your dependencies in mix.exs:

def deps do
  [
    {:snakepit, "~> 0.13.0"}
  ]
end

Then run:

mix deps.get
mix snakepit.setup    # Install Python dependencies and generate gRPC stubs
mix snakepit.doctor   # Verify environment is correctly configured

Using with SnakeBridge (Recommended)

For higher-level Python integration with compile-time type generation, use SnakeBridge instead of snakepit directly. SnakeBridge handles Python environment setup automatically at compile time.

def deps do
  [{:snakebridge, "~> 0.16.0"}]
end

def project do
  [
    ...
    compilers: [:snakebridge] ++ Mix.compilers()
  ]
end

Quick Start

# Execute a command on any available worker
{:ok, result} = Snakepit.execute("ping", %{})

# Execute with session affinity (prefer the same worker for related requests)
{:ok, result} = Snakepit.execute_in_session("session_123", "process_data", %{input: data})

# Stream results for long-running operations
Snakepit.execute_stream("batch_process", %{items: items}, fn chunk ->
  IO.puts("Progress: #{chunk["progress"]}%")
end)

Configuration

Simple Configuration

# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  adapter_args: ["--adapter", "your_adapter_module"],
  pool_size: 10,
  log_level: :error

In legacy single-pool mode, if both the top-level `:pool_size` and `pool_config.pool_size` are configured, the top-level `:pool_size` value wins.
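As a sketch of this precedence rule, the following legacy single-pool configuration starts 8 workers; the nested value is ignored:

```elixir
# config/config.exs (legacy single-pool mode)
config :snakepit,
  pooling_enabled: true,
  pool_size: 8,                     # top-level value wins
  pool_config: %{pool_size: 4}      # ignored when the top-level key is set
```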

Multi-Pool Configuration (v0.6+)

config :snakepit,
  pools: [
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 10,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "my_app.adapters.MainAdapter"]
    },
    %{
      name: :compute,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 8,
      adapter_args: ["--adapter", "my_app.adapters.ComputeAdapter"]
    }
  ]

If your adapter defines `command_timeout/2`, timeout selection is resolved from the checked-out worker's pool `adapter_module`. The global `:adapter_module` value is used only as a fallback when a pool does not declare one.
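As an illustration, a hypothetical adapter module (the module name, commands, and timeout values below are placeholders, not part of Snakepit) might declare per-command timeouts like this:

```elixir
defmodule MyApp.Adapters.MainAdapter do
  # Sketch: return a per-command timeout in milliseconds.
  # Pools that set this module as their adapter_module resolve timeouts
  # here rather than from the global :adapter_module fallback.
  def command_timeout("batch_process", _params), do: 300_000
  def command_timeout("ping", _params), do: 5_000
  def command_timeout(_command, _params), do: 30_000
end
```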

Logging Configuration

Snakepit is silent by default (errors only):

config :snakepit, log_level: :error          # Default - errors only
config :snakepit, log_level: :info           # Include info messages
config :snakepit, log_level: :debug          # Verbose debugging
config :snakepit, log_level: :none           # Complete silence

# Filter to specific categories
config :snakepit, log_level: :debug, log_categories: [:grpc, :pool]

gRPC Listener Configuration

By default, Snakepit runs an internal-only gRPC listener on an ephemeral port and publishes the assigned port to Python workers at runtime:

config :snakepit,
  grpc_listener: %{
    mode: :internal
  }

Explicit external bindings are opt-in and require host/port configuration:

config :snakepit,
  grpc_listener: %{
    mode: :external,
    host: "localhost",
    bind_host: "0.0.0.0",
    port: 50051
  }

For multi-instance deployments sharing a host, use the pooled external mode:

config :snakepit,
  grpc_listener: %{
    mode: :external_pool,
    host: "localhost",
    bind_host: "0.0.0.0",
    base_port: 50051,
    pool_size: 32
  }

To isolate process registry state when sharing a deployment directory, set an explicit instance name, instance token, and data directory:

config :snakepit,
  instance_name: "my-app-a",
  instance_token: "node-a-01",
  data_dir: "/var/lib/snakepit"

`instance_name` identifies an environment (for example, `prod-us-east-1`). `instance_token` identifies one running instance inside that environment. When running multiple Snakepit VMs from the same checkout or host at the same time, each VM must use a unique `instance_token` so cleanup logic never targets another live instance.

Environment variables are also supported:

SNAKEPIT_INSTANCE_NAME=my-app SNAKEPIT_INSTANCE_TOKEN=job_1 mix run --no-start script_a.exs
SNAKEPIT_INSTANCE_NAME=my-app SNAKEPIT_INSTANCE_TOKEN=job_2 mix run --no-start script_b.exs

Runtime Configurable Defaults

All hardcoded timeout and sizing values are now configurable via Application.get_env/3. Values are read at runtime, allowing configuration changes without recompilation.

# config/runtime.exs - Example customization
config :snakepit,
  # Timeouts (all in milliseconds)
  default_command_timeout: 30_000,       # Default timeout for commands
  pool_request_timeout: 60_000,          # Pool execute timeout
  pool_streaming_timeout: 300_000,       # Pool streaming timeout
  pool_startup_timeout: 10_000,          # Worker startup timeout
  pool_queue_timeout: 5_000,             # Queue timeout
  checkout_timeout: 5_000,               # Worker checkout timeout
  grpc_worker_execute_timeout: 30_000,   # GRPCWorker execute timeout
  grpc_worker_stream_timeout: 300_000,   # GRPCWorker streaming timeout
  grpc_worker_health_check_timeout_ms: 5_000, # Periodic worker health RPC timeout
  graceful_shutdown_timeout_ms: 6_000,   # Python process shutdown timeout

  # Pool sizing
  pool_max_queue_size: 1000,             # Max pending requests in queue
  pool_max_workers: 150,                 # Maximum workers per pool
  pool_startup_batch_size: 10,           # Workers started per batch
  pool_startup_batch_delay_ms: 500,      # Delay between startup batches

  # Pool recovery
  pool_reconcile_interval_ms: 1_000,     # Reconcile worker count interval (0 disables)
  pool_reconcile_batch_size: 2,          # Max workers respawned per tick

  # Worker supervisor restart intensity
  worker_starter_max_restarts: 3,
  worker_starter_max_seconds: 5,
  worker_supervisor_max_restarts: 3,
  worker_supervisor_max_seconds: 5,

  # Retry policy
  retry_max_attempts: 3,
  retry_backoff_sequence: [100, 200, 400, 800, 1600],
  retry_max_backoff_ms: 30_000,
  retry_jitter_factor: 0.25,

  # Circuit breaker
  circuit_breaker_failure_threshold: 5,
  circuit_breaker_reset_timeout_ms: 30_000,
  circuit_breaker_half_open_max_calls: 1,

  # Crash barrier
  crash_barrier_taint_duration_ms: 60_000,
  crash_barrier_max_restarts: 1,
  crash_barrier_backoff_ms: [50, 100, 200],

  # Health monitor
  health_monitor_check_interval: 30_000,
  health_monitor_crash_window_ms: 60_000,
  health_monitor_max_crashes: 10,

  # Heartbeat
  heartbeat_ping_interval_ms: 2_000,
  heartbeat_timeout_ms: 10_000,
  heartbeat_max_missed: 3,

  # Session store
  session_cleanup_interval: 60_000,
  session_default_ttl: 3600,
  session_max_sessions: 10_000,
  session_warning_threshold: 0.8,

  # gRPC listener
  grpc_listener: %{mode: :internal},
  grpc_internal_host: "127.0.0.1",
  grpc_port_pool_size: 32,
  grpc_listener_ready_timeout_ms: 5_000,
  grpc_listener_port_check_interval_ms: 25,
  grpc_listener_reuse_attempts: 3,
  grpc_listener_reuse_wait_timeout_ms: 500,
  grpc_listener_reuse_retry_delay_ms: 100,
  grpc_num_acceptors: 20,
  grpc_max_connections: 1000,
  grpc_socket_backlog: 512

See Snakepit.Defaults module documentation for the complete list of configurable values.
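Because these values are read at runtime rather than baked in at compile time, you can also inspect or adjust them from a running node with the standard `Application` functions; a minimal sketch (the `30_000` fallback here mirrors the documented default):

```elixir
# Read the current value; the third argument is the fallback default.
timeout = Application.get_env(:snakepit, :default_command_timeout, 30_000)

# Adjust at runtime without recompiling; subsequent reads see the new value.
Application.put_env(:snakepit, :pool_queue_timeout, 10_000)
```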

Core API

Basic Execution

# Simple command execution
{:ok, result} = Snakepit.execute("command_name", %{param: "value"})

# With timeout
{:ok, result} = Snakepit.execute("slow_command", %{}, timeout: 30_000)

# Target specific pool
{:ok, result} = Snakepit.execute("ml_inference", %{}, pool: :compute)

Session Affinity

Sessions route related requests to the same worker when possible, enabling stateful operations:

session_id = "user_#{user.id}"

# First call establishes worker affinity
{:ok, _} = Snakepit.execute_in_session(session_id, "load_model", %{model: "gpt-4"})

# Subsequent calls prefer the same worker
{:ok, result} = Snakepit.execute_in_session(session_id, "generate", %{prompt: "Hello"})
{:ok, result} = Snakepit.execute_in_session(session_id, "generate", %{prompt: "Continue"})

By default, affinity is a hint. If the preferred worker is busy or tainted, Snakepit can fall back to another worker. For strict pinning, configure affinity modes at the pool level:

config :snakepit,
  pools: [
    %{name: :default, pool_size: 4, affinity: :strict_queue},
    %{name: :latency_sensitive, pool_size: 4, affinity: :strict_fail_fast}
  ]
  • :strict_queue queues requests for the preferred worker when it is busy.
  • :strict_fail_fast returns an error immediately when the preferred worker is unavailable, instead of queueing or falling back.
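With a strict affinity mode, callers should handle requests the preferred worker cannot serve. A hedged sketch, assuming `execute_in_session/4` accepts the same options keyword list as `execute/3` and that failures surface as a generic `{:error, reason}` tuple (the exact error term is not specified here):

```elixir
case Snakepit.execute_in_session("session_123", "generate",
       %{prompt: "Hello"}, pool: :latency_sensitive) do
  {:ok, result} ->
    {:ok, result}

  {:error, reason} ->
    # Under :strict_fail_fast, a busy preferred worker surfaces as an
    # error rather than queueing; retry, degrade, or route elsewhere.
    {:error, reason}
end
```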
