Lazycluster
🎛 Distributed machine learning made simple.
lazycluster is a Python library intended to liberate data scientists and machine learning engineers by abstracting away cluster management and configuration so that they can focus on their actual tasks. In particular, it emphasizes easy and convenient cluster setup in Python for various distributed machine learning frameworks.
Highlights
- High-Level API for starting clusters:
- Lower-level API for:
  - Managing Runtimes or RuntimeGroups to:
    - A-/synchronously execute RuntimeTasks by leveraging the power of ssh
    - Expose services (e.g. a DB) from or to a Runtime or in a whole RuntimeGroup
- Command line interface (CLI)
  - List all available Runtimes
  - Add a Runtime configuration
  - Delete a Runtime configuration

Concept Definition: Runtime <a name="runtime"></a>
A Runtime is the logical representation of a remote host. Typically, the host is another server or a virtual machine / container on another server. This Python class provides several methods for utilizing remote resources, such as port exposure from / to a Runtime and the execution of RuntimeTasks. A Runtime has a working directory. Unless another path is given explicitly, a RuntimeTask is executed relative to this directory. The working directory can be set manually during initialization. Otherwise, a temporary directory gets created that might eventually be removed.
Concept Definition: RuntimeGroup
A RuntimeGroup is the representation of logically related Runtimes and provides convenient methods for managing those related Runtimes. Most methods are wrappers around their counterparts in the Runtime class. Typical usage examples are exposing a port (i.e. a service such as a DB) in the RuntimeGroup, transferring files, or executing a RuntimeTask on the Runtimes. Additionally, all concrete RuntimeCluster implementations (e.g. the HyperoptCluster) rely on RuntimeGroups.
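The wrapper relationship between a group and its members can be illustrated with a small, purely local sketch. It does not use lazycluster: `SketchRuntime` and `SketchGroup` are hypothetical stand-ins invented for this illustration, and the real classes connect to remote hosts via ssh instead of returning strings.

```python
from typing import Dict, List


class SketchRuntime:
    """Hypothetical stand-in for one remote host (not the lazycluster class)."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.executed: List[str] = []

    def execute(self, task_name: str) -> str:
        # A real Runtime would run the task on the remote host via ssh.
        self.executed.append(task_name)
        return f"{task_name} ran on {self.host}"


class SketchGroup:
    """Hypothetical stand-in for a group of logically related hosts."""

    def __init__(self, runtimes: List[SketchRuntime]) -> None:
        self._runtimes = {runtime.host: runtime for runtime in runtimes}

    def execute(self, task_name: str) -> Dict[str, str]:
        # The group method is a thin wrapper: it calls the counterpart
        # method on every member and collects the results per host.
        return {host: rt.execute(task_name) for host, rt in self._runtimes.items()}


group = SketchGroup([SketchRuntime("host-1"), SketchRuntime("host-2")])
results = group.execute("prepare-data")
print(results)
```

The point of the sketch is only the delegation pattern: callers address the group once, and the group fans the call out to each member.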
Concept Definition: Manager<a name="manager"></a>
The manager refers to the host where you are actually using the lazycluster library, since all desired lazycluster entities are managed from here. Caution: It is not to be confused with the RuntimeManager class.
Concept Definition: RuntimeTask <a name="task"></a>
A RuntimeTask is a composition of multiple elementary task steps, namely send file, get file, run command (shell), and run function (python). A RuntimeTask can be executed on a remote host either by handing it over to a Runtime object or standalone by handing over a fabric Connection object to the execute method of the RuntimeTask. In both cases, all individual task steps are executed sequentially. Moreover, a RuntimeTask object captures the output (stdout/stderr) of the remote execution in its execution log. An example of a RuntimeTask could be to send a csv file to a Runtime, execute a Python function that transforms the csv file, and finally get the file back.
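To make the idea of sequentially executed steps with a captured log more concrete, here is a minimal local sketch. It is not the lazycluster implementation: `SketchTask` is a hypothetical class, it runs everything on the local machine, and it covers only the run command and run function step types.

```python
import io
import subprocess
from contextlib import redirect_stdout
from typing import Any, Callable, Iterator, List


class SketchTask:
    """Hypothetical local stand-in for a task composed of sequential steps."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._steps: List[Callable[[], None]] = []
        self._returns: List[Any] = []
        self.execution_log: List[str] = []

    def run_command(self, command: str) -> "SketchTask":
        def step() -> None:
            # Runs locally here; a real task would run on the remote host.
            result = subprocess.run(
                command, shell=True, capture_output=True, text=True
            )
            self.execution_log.append(result.stdout + result.stderr)

        self._steps.append(step)
        return self  # returning self allows chaining of steps

    def run_function(self, function: Callable[..., Any], **kwargs: Any) -> "SketchTask":
        def step() -> None:
            buffer = io.StringIO()
            with redirect_stdout(buffer):
                self._returns.append(function(**kwargs))
            self.execution_log.append(buffer.getvalue())

        self._steps.append(step)
        return self

    def execute(self) -> "SketchTask":
        # Steps run strictly in the order in which they were added.
        for step in self._steps:
            step()
        return self

    @property
    def function_returns(self) -> Iterator[Any]:
        # One return value per run_function step, in execution order.
        yield from self._returns


def hello(name: str) -> str:
    return "Hello " + name + "!"


task = SketchTask("demo").run_command("echo step-1").run_function(hello, name="World")
task.execute()
first_return = next(task.function_returns)
print(task.execution_log, first_return)
```

Each step appends one entry to the execution log, so the log order mirrors the step order.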
Getting started
Installation
pip install lazycluster
# Most up-to-date development version
pip install --upgrade git+https://github.com/ml-tooling/lazycluster.git@develop
Prerequisites
For lazycluster usage on the manager:
- Unix based OS
- Python >= 3.6
- ssh client (e.g. openssh-client)
- Passwordless ssh access to the Runtime hosts (recommended) <a name="passwordless-ssh"></a>
  Configure passwordless ssh access:
  - Create a key pair on the manager as described here or use an existing one
  - Install lazycluster on the manager
  - Create the ssh configuration for each host to be used as Runtime by using the lazycluster CLI command lazycluster add-runtime as described here and do not forget to specify the --id-file argument.
  - Finally, enable the passwordless ssh access by copying the public key to each Runtime as described here
Runtime host requirements:
- Unix based OS
- Python >= 3.6
- ssh server (e.g. openssh-server)
Note:
For the most convenient user experience, passwordless ssh needs to be set up for the hosts to be used as Runtimes. Otherwise, you need to pass the connection details to Runtime.__init__ via connection_kwargs. These parameters will be passed on to the fabric.Connection.
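As a rough illustration of the second option, the sketch below assembles such a dictionary. The helper `build_connection_kwargs`, the user name, and the key path are hypothetical; `user`, `port`, and `connect_kwargs` are parameters of fabric.Connection, and it is an assumption here that lazycluster forwards them unchanged.

```python
from typing import Any, Dict, Optional


def build_connection_kwargs(
    user: str,
    key_filename: Optional[str] = None,
    password: Optional[str] = None,
    port: int = 22,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for fabric.Connection."""
    # connect_kwargs is handed by fabric to the underlying ssh client.
    connect_kwargs: Dict[str, Any] = {}
    if key_filename is not None:
        connect_kwargs["key_filename"] = key_filename
    if password is not None:
        connect_kwargs["password"] = password
    return {"user": user, "port": port, "connect_kwargs": connect_kwargs}


# Example values only; replace them with your own user and key file.
connection_kwargs = build_connection_kwargs(
    "alice", key_filename="/home/alice/.ssh/id_rsa"
)
print(connection_kwargs)

# With a reachable host, the dictionary would then be handed to the Runtime:
# from lazycluster import Runtime
# runtime = Runtime("host-1", connection_kwargs=connection_kwargs)
```

The host itself is not part of the dictionary, since it is given to the Runtime directly.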
Usage example high-level API
Start a Dask cluster.
from lazycluster import RuntimeManager
from lazycluster.cluster.dask_cluster import DaskCluster
# Automatically generate a group based on the ssh configuration
runtime_manager = RuntimeManager()
runtime_group = runtime_manager.create_group()
# Start the Dask cluster instances using the RuntimeGroup
dask_cluster = DaskCluster(runtime_group)
dask_cluster.start()
# => Now, you can start using the running Dask cluster
# Get Dask client to interact with the cluster
# Note: This will give you a dask.distributed.Client which is not
# a lazycluster cluster but a Dask one instead
client = dask_cluster.get_client()
Usage example lower-level API
Execute a Python function on a remote host and access the return data.
from lazycluster import RuntimeTask, Runtime
# Define a Python function which will be executed remotely
def hello(name: str):
    return 'Hello ' + name + '!'
# Compose a `RuntimeTask`
task = RuntimeTask('my-first_task').run_command('echo Hello World!') \
                                   .run_function(hello, name='World')
# Actually execute it remotely in a `Runtime`
task = Runtime('host-1').execute_task(task, execute_async=False)
# The stdout from the executing `Runtime` can be accessed
# via the execution log of the `RuntimeTask`
task.print_log()
# Print the return of the `hello()` call
generator = task.function_returns
print(next(generator))
Support
The lazycluster project is maintained by Jan Kalkan. Please understand that we won't be able to provide individual support via email. We also believe that help is much more valuable if it's shared publicly so that more people can benefit from it.
| Type | Channel |
| --- | --- |
| 🚨 Bug Reports | <a href="https://github.com/ml-tooling/lazycluster/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+label%3Abug+sort%3Areactions-%2B1-desc+" title="Open Bug Report"><img src="https://img.shields.io/github/issues/ml-tooling/lazycluster/bug.svg"></a> |
| 🎁 Feature Requests | <a href="https://github.com/ml-tooling/lazycluster/issues?q=is%3Aopen+is%3Aissue+label%3Afeature-request+sort%3Areactions-%2B1-desc" title="Open Feature Request"><img src="https://img.shields.io/github/issues/ml-tooling/lazycluster/feature-request.svg?label=feature%20requests"></a> |
