# tuppr - Talos Linux Upgrade Controller

A Kubernetes controller for managing automated upgrades of Talos Linux and Kubernetes.
## ✨ Features

### Core Capabilities
- 🚀 Automated Talos node upgrades with intelligent orchestration
- 🎯 Kubernetes upgrades - upgrade Kubernetes to newer versions
- 🔒 Safe upgrade execution - upgrades always run from healthy nodes (never self-upgrade)
- 📊 Built-in health checks - CEL-based expressions for custom cluster validation
- 🔄 Configurable reboot modes - default or powercycle options
- 📋 Comprehensive status tracking with real-time progress reporting
- ⚡ Resilient job execution with automatic retry and pod replacement
- 📈 Prometheus metrics - detailed monitoring of upgrade progress and health
- 🎯 Per-node overrides - use annotations to specify unique versions or schematics for specific nodes
- 🏷️ Node labeling - automatic labels during upgrades for integration with remediation systems
## 🚀 Quick Start

### Prerequisites

- Talos cluster with API access configured
- Namespace for the controller (e.g., `system-upgrade`)
### Installation

Allow Talos API access from the desired namespace by applying this config to all of your nodes:

```yaml
machine:
  features:
    kubernetesTalosAPIAccess:
      allowedKubernetesNamespaces:
        - system-upgrade # or the namespace the controller will be installed to
      allowedRoles:
        - os:admin
      enabled: true
```
Install the Helm chart:

```shell
# Install via Helm
helm install tuppr oci://ghcr.io/home-operations/charts/tuppr \
  --version 0.1.0 \
  --namespace system-upgrade
```
### Basic Usage

#### Talos Node Upgrades

Create a TalosUpgrade resource:

```yaml
apiVersion: tuppr.home-operations.com/v1alpha1
kind: TalosUpgrade
metadata:
  name: cluster
spec:
  talos:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/installer
    version: v1.11.0 # Required - target Talos version

  policy:
    debug: true         # Optional, verbose logging
    force: false        # Optional, skip etcd health checks
    rebootMode: default # Optional, default|powercycle
    placement: soft     # Optional, hard|soft
    stage: false        # Optional, stage upgrade
    timeout: 30m        # Optional, per-node upgrade timeout

  # Custom health checks (optional)
  healthChecks:
    - apiVersion: v1
      kind: Node
      expr: status.conditions.exists(c, c.type == "Ready" && c.status == "True")

  # Talosctl configuration (optional)
  talosctl:
    image:
      repository: ghcr.io/siderolabs/talosctl # Optional, default
      tag: v1.11.0                            # Optional, auto-detected
      pullPolicy: IfNotPresent                # Optional, default

  # Maintenance windows (optional)
  maintenance:
    windows:
      - start: "0 2 * * 0" # Cron expression (Sunday 02:00)
        duration: "4h"     # How long window stays open
        timezone: "UTC"    # IANA timezone, default UTC

  # Node selector (optional)
  nodeSelector:
    matchExpressions:
      # Only upgrade nodes that have opted in via this label
      - {key: tuppr.home-operations.com/upgrade, operator: In, values: ["enabled"]}
      # Exclude control plane nodes from this specific plan
      - {key: node-role.kubernetes.io/control-plane, operator: DoesNotExist}

  # Configure drain behavior (optional)
  drain:
    # Continue even if there are pods using emptyDir (local data)
    deleteLocalData: true
    # Ignore DaemonSet-managed pods
    ignoreDaemonSets: true
    # Force drain even if pods do not declare a controller
    force: true
    # Optional: Force delete instead of eviction
    # disableEviction: false
    # Optional: Skip waiting for delete timeout (seconds)
    # skipWaitForDeleteTimeout: 0
```
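Everything except `spec.talos.version` is optional, so a minimal resource can be as small as:

```yaml
apiVersion: tuppr.home-operations.com/v1alpha1
kind: TalosUpgrade
metadata:
  name: cluster
spec:
  talos:
    version: v1.11.0
```

With this form, the controller falls back to its defaults for policy, talosctl image, and node selection, and upgrades start immediately since no maintenance windows are defined.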
#### Kubernetes Upgrades

Create a KubernetesUpgrade resource:

```yaml
apiVersion: tuppr.home-operations.com/v1alpha1
kind: KubernetesUpgrade
metadata:
  name: kubernetes
spec:
  kubernetes:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/kubelet
    version: v1.34.0 # Required - target Kubernetes version

  # Custom health checks (optional)
  healthChecks:
    - apiVersion: v1
      kind: Node
      expr: status.conditions.exists(c, c.type == "Ready" && c.status == "True")
      timeout: 10m

  # Talosctl configuration (optional)
  talosctl:
    image:
      repository: ghcr.io/siderolabs/talosctl # Optional, default
      tag: v1.11.0                            # Optional, auto-detected
      pullPolicy: IfNotPresent                # Optional, default

  # Maintenance windows (optional)
  maintenance:
    windows:
      - start: "0 2 * * 0" # Cron expression (Sunday 02:00)
        duration: "4h"     # How long window stays open
        timezone: "UTC"    # IANA timezone, default UTC
```
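Likewise, only `spec.kubernetes.version` is required here, so a minimal resource looks like:

```yaml
apiVersion: tuppr.home-operations.com/v1alpha1
kind: KubernetesUpgrade
metadata:
  name: kubernetes
spec:
  kubernetes:
    version: v1.34.0
```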
## 🎯 Advanced Configuration

### Health Checks

Define custom health checks using CEL expressions. These checks are evaluated before each upgrade and run concurrently.

```yaml
healthChecks:
  # Check all nodes are ready
  - apiVersion: v1
    kind: Node
    expr: |
      status.conditions.filter(c, c.type == "Ready").all(c, c.status == "True")
    timeout: 10m

  # Check specific deployment replicas
  - apiVersion: apps/v1
    kind: Deployment
    name: critical-app
    namespace: production
    expr: status.readyReplicas == status.replicas

  # Check custom resources
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster
    name: rook-ceph
    namespace: rook-ceph
    expr: status.ceph.health in ["HEALTH_OK"]
```
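The same pattern extends to any resource that exposes status fields. As an illustrative sketch (the `critical-db` StatefulSet is a hypothetical example, not part of tuppr), a check that a StatefulSet has finished rolling out could look like:

```yaml
healthChecks:
  # Hypothetical example: wait for a StatefulSet to be fully rolled out
  - apiVersion: apps/v1
    kind: StatefulSet
    name: critical-db
    namespace: production
    expr: status.readyReplicas == status.replicas && status.currentRevision == status.updateRevision
    timeout: 15m
```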
### Upgrade Policies (TalosUpgrade only)

Fine-tune upgrade behavior:

```yaml
policy:
  # Enable debug logging for troubleshooting
  debug: true
  # Force upgrade even if etcd is unhealthy (dangerous!)
  force: true
  # Controls how strictly upgrade jobs avoid the target node
  placement: hard # or "soft"
  # Use powercycle reboot for problematic nodes
  rebootMode: powercycle # or "default"
  # Stage the upgrade, then reboot to apply (2 total reboots)
  stage: false
```
### Maintenance Windows

Control when upgrades start using cron-based maintenance windows. Upgrades that are already running always complete without interruption.

```yaml
maintenance:
  windows:
    - start: "0 2 * * 0"       # Sunday 02:00
      duration: "4h"           # Max 168h, warns if <1h
      timezone: "Europe/Paris" # IANA timezone, default UTC
```

- Upgrades only start during open windows (they stay `Pending` otherwise)
- Multiple windows form a union (any open window allows a start)
- In-progress upgrades always complete (never interrupted)
- TalosUpgrade re-checks the window between nodes
- Empty config: upgrades start immediately (backwards compatible)
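Because multiple windows form a union, recurring slots can be combined. For example (illustrative schedule), short nightly weekday windows plus a longer weekend window:

```yaml
maintenance:
  windows:
    - start: "0 1 * * 1-5"      # Weekdays 01:00
      duration: "2h"
      timezone: "Europe/Paris"
    - start: "0 2 * * 6"        # Saturday 02:00
      duration: "8h"
      timezone: "Europe/Paris"
```

An upgrade may start whenever at least one of these windows is open; outside both, it waits as `Pending`.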
### Per-Node Overrides

Tuppr supports overriding the global TalosUpgrade configuration on a per-node basis using Kubernetes annotations. This is useful for testing a new version on a canary node or handling nodes with different hardware schematics.

| Annotation | Description | Example |
| ---------- | ----------- | ------- |
| `tuppr.home-operations.com/version` | Overrides the target Talos version for this node. | `v1.12.1` |
| `tuppr.home-operations.com/schematic` | Overrides the Talos schematic hash for this node. | `b55fbf...` |

Example: applying an override

```shell
# Upgrade a specific node to a different version than the global policy
kubectl annotate node worker-01 tuppr.home-operations.com/version="v1.12.1"

# Apply a custom schematic (with specific extensions) to one node
kubectl annotate node worker-02 tuppr.home-operations.com/schematic="314b18a3f89d..."
```
How it works:

- The controller compares each node's current version and schematic against the annotation values instead of the global TalosUpgrade spec.
- If a mismatch is found, an upgrade job is triggered for that node using the override values.
## ⚠️ Safe Talos Upgrade Paths
Talos Linux has specific supported upgrade paths. You should always upgrade through each minor version sequentially rather than skipping minor versions. For example, upgrading from Talos v1.0 to v1.2.4 requires:
- Upgrade from v1.0.x to the latest patch of v1.0 (e.g., v1.0.6)
- Upgrade from v1.0.6 to the latest patch of v1.1 (e.g., v1.1.2)
- Upgrade from v1.1.2 to v1.2.4
Tuppr does not automatically enforce safe upgrade paths — it will upgrade directly to whatever version you specify in the TalosUpgrade resource. It is your responsibility to ensure the target version is a valid upgrade from your current version.
### Recommended: Use Renovate for Safe Version Bumps

Renovate can automate version updates in your GitOps repository while respecting safe upgrade boundaries. Configure it to separate major/minor and minor/patch PRs so you can step through each version sequentially:
```json
{
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "matchPackageNames": ["ghcr.io/siderolabs/installer"],
      "separateMajorMinor": true,
      "separateMinorPatch": true
    }
  ]
}
```
- `separateMajorMinor`: creates separate PRs for major vs minor bumps
- `separateMinorPatch`: creates separate PRs for minor vs patch bumps
This way, Renovate will propose incremental version bumps that you can merge one at a time, ensuring you follow the supported upgrade path. Combine this with the renovate comment in your TalosUpgrade spec:

```yaml
spec:
  talos:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/installer
    version: v1.11.0
```
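If you manage Kubernetes versions with Renovate too, the same pattern applies to the kubelet image used by the KubernetesUpgrade resource (add `ghcr.io/siderolabs/kubelet` to a matching `packageRules` entry):

```yaml
spec:
  kubernetes:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/kubelet
    version: v1.34.0
```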
## 📊 Monitoring & Metrics

### Prometheus Metrics

Tuppr exposes comprehensive Prometheus metrics for monitoring upgrade progress, health check performance, and job execution:

#### Talos Upgrade Metrics

```shell
# Current phase of a Talos upgrade (state-set:
```
