📦Kubetorch🔥
A Fast, Pythonic, "Serverless" Interface for Running ML Workloads on Kubernetes
Kubetorch lets you programmatically build, iterate, and deploy ML applications on Kubernetes at any scale - directly from Python.
It brings your cluster's compute power into your local development environment, enabling extremely fast iteration (1-2 seconds). Logs, exceptions, and hardware faults are automatically propagated back to you in real time.
Since Kubetorch has no local runtime or code serialization, you can access large-scale cluster compute from any Python environment - your IDE, notebooks, CI pipelines, or production code - just like you would use a local process pool.
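The "local process pool" comparison can be made concrete. As a point of reference (plain standard library, no Kubetorch involved), this is the familiar pattern that Kubetorch mirrors: wrap a function, dispatch calls to workers, and collect results, except that with Kubetorch the "pool" is freshly launched compute on your Kubernetes cluster rather than local processes.

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    return x * x


if __name__ == "__main__":
    # A local pool: submit work, get results back transparently.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(square, range(4)))
    print(results)  # [0, 1, 4, 9]
```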
Hello World
```python
import kubetorch as kt


def hello_world():
    return "Hello from Kubetorch!"


if __name__ == "__main__":
    # Define your compute
    compute = kt.Compute(cpus=".1")

    # Send the local function to freshly launched remote compute
    remote_hello = kt.fn(hello_world).to(compute)

    # Runs remotely on your Kubernetes cluster
    result = remote_hello()
    print(result)  # "Hello from Kubetorch!"
```
What Kubetorch Enables
- 100x faster iteration, from 10+ minutes down to 1-3 seconds, for complex ML applications like RL and distributed training
- 50%+ compute cost savings through intelligent resource allocation, bin-packing, and dynamic scaling
- 95% fewer production faults through built-in fault handling, with programmatic error recovery and resource adjustment
Installation
1. Python Client
```shell
pip install "kubetorch[client]"
```
2. Kubernetes Deployment (Helm)
```shell
# Option 1: Install directly from the OCI registry
helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version 0.5.0 -n kubetorch --create-namespace

# Option 2: Download the chart locally first
helm pull oci://ghcr.io/run-house/charts/kubetorch --version 0.5.0 --untar
helm upgrade --install kubetorch ./kubetorch -n kubetorch --create-namespace
```
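After installing the chart, a quick sanity check is to confirm the release deployed and its pods came up. These are standard `helm` and `kubectl` commands run against your cluster; the exact pod names will depend on the chart version.

```shell
# Confirm the Helm release is deployed
helm status kubetorch -n kubetorch

# Check that the Kubetorch pods are running
kubectl get pods -n kubetorch
```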
For detailed setup instructions, see our Installation Guide.
Source Layout
This repo now includes the customer-facing OSS deployment components that were previously split across internal and OSS repos:
- `python_client/` for the SDK
- `charts/kubetorch/` for the Helm chart
- `services/` for the controller and data store sources
- `release/default_images/` for the workload base images
- `release/` for release scripts and version sync
Kubetorch Serverless
Contact us (email, Slack) to try out Kubetorch on our fully managed serverless platform.
Learn More
- Documentation - API Reference, concepts, and guides
- Examples - Real-world usage patterns and tutorials
- Join our Slack - Connect with the community and get support
🏃‍♀️ Built by Runhouse 🏠