HPK
HPK allows running Kubernetes applications within HPC by translating deployments to Slurm and Singularity/Apptainer
High-Performance Kubernetes
High-Performance Kubernetes (HPK) allows HPC users to run their own private "mini Clouds" on a typical HPC cluster. HPK uses a single container to run the Kubernetes control plane and a Virtual Kubelet Provider implementation to translate Kubernetes-native container lifecycle management commands into their Slurm/Apptainer equivalents.
To allow users to run HPK, the HPC environment should have Apptainer configured so that:
- It allows users to run containers with --fakeroot.
- It uses a CNI plug-in that hands over private IPs to containers, which are routable across cluster hosts (we use flannel and the flannel CNI plug-in).
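A quick way to check the first requirement is to start a throwaway container with `--fakeroot` and confirm that the user inside it is root. The sketch below assumes that the `apptainer` binary is on `PATH` and that the node can pull `docker://alpine` (an image picked only for illustration). The function name `check_fakeroot` is hypothetical and not part of HPK.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of HPK: verifies that the current user can
# start an Apptainer container with --fakeroot.
# Assumption: the image defaults to docker://alpine, which must be pullable
# from this node. Pass another image as the first argument if needed.
check_fakeroot() {
    local image="${1:-docker://alpine}"
    local uid

    if ! command -v apptainer >/dev/null 2>&1; then
        echo "apptainer not found in PATH" >&2
        return 1
    fi

    # Inside a fakeroot container, `id -u` should report 0.
    if ! uid="$(apptainer exec --fakeroot "$image" id -u 2>/dev/null)"; then
        echo "apptainer exec --fakeroot failed" >&2
        return 1
    fi

    if [ "$uid" != "0" ]; then
        echo "container user is uid $uid, expected 0" >&2
        return 1
    fi

    echo "fakeroot OK"
}
```

Calling `check_fakeroot` on a login node prints `fakeroot OK` when the check passes and returns a non-zero status otherwise. It does not test the CNI requirement.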
In contrast to a typical Kubernetes installation in the Cloud:
- HPK uses a pass-through scheduler, which assigns all pods to the single hpk-kubelet that represents the cluster. In practice, this means that all scheduling is delegated to Slurm.
- All Kubernetes services are converted to headless. This avoids the need for internal, virtual cluster IPs that would need special handling at the network level. As a side effect, HPK services that map to multiple pods are load-balanced at the DNS level if clients support it.
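For reference, in standard Kubernetes a headless Service is one whose `spec.clusterIP` is set to `None`, so DNS returns the pod IPs directly instead of a single virtual IP. The sketch below prints such a manifest to illustrate what a converted service looks like. Since HPK performs the conversion itself, writing `clusterIP: None` by hand should not be required. The function name, the `app` label selector and the example values are assumptions made for illustration.

```shell
# Hypothetical helper, not part of HPK: prints a standard Kubernetes
# headless Service manifest (clusterIP: None) for the given name and port.
# Assumption: the backing pods carry the label app=<name>.
headless_service_manifest() {
    local name="$1"
    local port="$2"
    cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  clusterIP: None
  selector:
    app: ${name}
  ports:
    - port: ${port}
EOF
}
```

For example, `headless_service_manifest web 80 | kubectl apply -f -` would create the service. A client that resolves the service name then receives one address record per matching pod and can spread its connections across them.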
HPK is a continuation of the KNoC project, a Virtual Kubelet Provider implementation that can be used to bridge Kubernetes and HPC environments.
Trying it out
First you need to configure Apptainer for HPK. The install-environment.sh script showcases how we implement the requirements in a single node for testing.
Once set up, compile the hpk-kubelet using make:
make build
Then you need to start the Kubernetes Master and hpk-kubelet separately.
To run the Kubernetes Master:
make run-kubemaster
Once the master is up and running, you can start the hpk-kubelet:
make run-kubelet
Now you can configure and use kubectl:
export KUBE_PATH=~/.hpk-master/kubernetes/
export KUBECONFIG=${KUBE_PATH}/admin.conf
kubectl get nodes
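Once kubectl can reach the control plane, a short smoke test can confirm that a pod actually runs to completion through Slurm. The following is a minimal sketch, not an official HPK test. It assumes that KUBECONFIG is already exported as shown above, that the `busybox` image can be pulled from the cluster, and that kubectl is version 1.23 or newer (needed for `--for=jsonpath`). The function name `hpk_smoke_test`, the pod name and the timeouts are hypothetical.

```shell
# Hypothetical smoke test, not part of HPK.
# Assumptions: KUBECONFIG points at the HPK admin.conf, the busybox image
# is pullable, and kubectl >= 1.23 (for `kubectl wait --for=jsonpath`).
hpk_smoke_test() {
    local pod="${1:-hpk-smoke}"

    # Wait until every registered node reports the Ready condition.
    kubectl wait --for=condition=Ready node --all --timeout=120s || return 1

    # Launch a one-shot pod that prints a message and exits.
    kubectl run "$pod" --image=busybox --restart=Never -- echo "hello from HPK" || return 1

    # Block until the pod has finished successfully.
    kubectl wait --for=jsonpath='{.status.phase}'=Succeeded "pod/$pod" --timeout=300s || return 1

    echo "pod/$pod completed"
}
```

Run it as `hpk_smoke_test` and remove the pod afterwards with `kubectl delete pod hpk-smoke`. The generous pod timeout allows for time spent waiting in the Slurm queue.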
If you experience DNS issues, retry starting the Kubernetes Master with:
export EXTERNAL_DNS=<your dns server>
make run-kubemaster
The above command will set CoreDNS to forward requests for external names to your DNS server.
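If you are unsure which DNS server to use, one option is to reuse the first resolver configured on the host. The helper below is a sketch under that assumption. The name `first_nameserver` is hypothetical, and the address it returns may not be suitable everywhere: on hosts running systemd-resolved, the first entry is often the local stub 127.0.0.53, which may not be reachable from inside containers, so set EXTERNAL_DNS by hand in that case.

```shell
# Hypothetical helper, not part of HPK: prints the first "nameserver"
# address found in a resolv.conf-style file (default: /etc/resolv.conf).
first_nameserver() {
    local file="${1:-/etc/resolv.conf}"
    awk '$1 == "nameserver" { print $2; exit }' "$file"
}

# Possible usage (assumes the printed address is reachable from containers):
#   export EXTERNAL_DNS="$(first_nameserver)"
#   make run-kubemaster
```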
Publications/presentations
The latest paper on HPK is available at arXiv. A previous edition of this work was presented at WOCC'23 ("The International Workshop on Converged Computing on Edge, Cloud, and HPC", held in conjunction with ISC-HPC 2023).
The corresponding BibTeX entry is the following:
@misc{hpk,
title={Running Cloud-native Workloads on HPC with High-Performance Kubernetes},
author={Antony Chazapis and Evangelos Maliaroudakis and Fotis Nikolaidis and Manolis Marazakis and Angelos Bilas},
year={2024},
eprint={2409.16919},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2409.16919},
}
HPK was presented at FOSDEM 2025; slides and video from the event are available.
Acknowledgements
We thankfully acknowledge the support of the European Commission and the Greek General Secretariat for Research and Innovation for this project. HPK has received funding from the European Union’s Horizon Europe research and innovation programme through project RISER ("RISC-V for Cloud Services", GA-101092993), from the EuroHPC Joint Undertaking through projects EUPEX (GA-101033975) and DEEP-SEA (GA-955606), as well as from the Chips Joint Undertaking through project REBECCA ("Reconfigurable Heterogeneous Highly Parallel Processing Platform for safe and secure AI", GA-101097224). EuroHPC JU and Chips JU projects are jointly funded by the European Commission and the participating member states (including the Greek General Secretariat for Research and Innovation).
