tuppr - Talos Linux Upgrade Controller

A Kubernetes controller for managing automated upgrades of Talos Linux and Kubernetes.

✨ Features

Core Capabilities

  • 🚀 Automated Talos node upgrades with intelligent orchestration
  • 🎯 Kubernetes upgrades - roll the cluster's Kubernetes components to a target version
  • 🔒 Safe upgrade execution - upgrades always run from healthy nodes (never self-upgrade)
  • 📊 Built-in health checks - CEL-based expressions for custom cluster validation
  • 🔄 Configurable reboot modes - default or powercycle options
  • 📋 Comprehensive status tracking with real-time progress reporting
  • 🔁 Resilient job execution with automatic retry and pod replacement
  • 📈 Prometheus metrics - detailed monitoring of upgrade progress and health
  • 🎯 Per-node overrides - use annotations to specify unique versions or schematics for specific nodes
  • 🏷️ Node labeling - automatic labels during upgrades for integration with remediation systems

🚀 Quick Start

Prerequisites

  1. Talos cluster with API access configured
  2. Namespace for the controller (e.g., system-upgrade)

Installation

Grant Talos API access to the controller's namespace by applying this machine config patch to all of your nodes:

machine:
  features:
    kubernetesTalosAPIAccess:
      allowedKubernetesNamespaces:
        - system-upgrade # or the namespace the controller will be installed to
      allowedRoles:
        - os:admin
      enabled: true

Install the Helm chart:

# Install via Helm
helm install tuppr oci://ghcr.io/home-operations/charts/tuppr \
  --version 0.1.0 \
  --namespace system-upgrade
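
After installation, it is worth confirming the controller is healthy and its CRDs are registered. The label selector below is an assumption based on common Helm chart conventions; adjust it to whatever labels the chart actually sets:

```shell
# Check that the controller pod is running (label selector is an assumption)
kubectl -n system-upgrade get pods -l app.kubernetes.io/name=tuppr

# Confirm the tuppr CRDs were installed by the chart
kubectl get crds | grep tuppr.home-operations.com
```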

Basic Usage

Talos Node Upgrades

Create a TalosUpgrade resource:

apiVersion: tuppr.home-operations.com/v1alpha1
kind: TalosUpgrade
metadata:
  name: cluster
spec:
  talos:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/installer
    version: v1.11.0  # Required - target Talos version

  policy:
    debug: true          # Optional, verbose logging
    force: false         # Optional, skip etcd health checks
    rebootMode: default  # Optional, default|powercycle
    placement: soft      # Optional, hard|soft
    stage: false         # Optional, stage upgrade
    timeout: 30m         # Optional, per-node upgrade timeout

  # Custom health checks (optional)
  healthChecks:
    - apiVersion: v1
      kind: Node
      expr: status.conditions.exists(c, c.type == "Ready" && c.status == "True")

  # Talosctl configuration (optional)
  talosctl:
    image:
      repository: ghcr.io/siderolabs/talosctl  # Optional, default
      tag: v1.11.0                             # Optional, auto-detected
      pullPolicy: IfNotPresent                 # Optional, default

  # Maintenance windows (optional)
  maintenance:
    windows:
      - start: "0 2 * * 0"    # Cron expression (Sunday 02:00)
        duration: "4h"         # How long window stays open
        timezone: "UTC"        # IANA timezone, default UTC

  # Node selector (optional)
  nodeSelector:
    matchExpressions:
      # Only upgrade nodes that have opted-in via this label
      - {key: tuppr.home-operations.com/upgrade, operator: In, values: ["enabled"]}
      # Exclude control plane nodes from this specific plan
      - {key: node-role.kubernetes.io/control-plane, operator: DoesNotExist}

  # Configure drain behavior (optional)
  drain:
    # Continue even if there are pods using emptyDir (local data)
    deleteLocalData: true

    # Ignore DaemonSet-managed pods
    ignoreDaemonSets: true

    # Force drain even if pods do not declare a controller
    force: true

    # Optional: Force delete instead of eviction
    # disableEviction: false

    # Optional: Skip waiting for delete timeout (seconds)
    # skipWaitForDeleteTimeout: 0
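
With the manifest saved to a file (the filename `talos-upgrade.yaml` below is chosen for illustration), apply it and follow the controller's progress:

```shell
# Create the upgrade resource
kubectl apply -f talos-upgrade.yaml

# Watch overall progress (printed columns may vary by controller version)
kubectl get talosupgrade cluster -w

# Inspect detailed per-node status and events
kubectl describe talosupgrade cluster
```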

Kubernetes Upgrades

Create a KubernetesUpgrade resource:

apiVersion: tuppr.home-operations.com/v1alpha1
kind: KubernetesUpgrade
metadata:
  name: kubernetes
spec:
  kubernetes:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/kubelet
    version: v1.34.0  # Required - target Kubernetes version

  # Custom health checks (optional)
  healthChecks:
    - apiVersion: v1
      kind: Node
      expr: status.conditions.exists(c, c.type == "Ready" && c.status == "True")
      timeout: 10m

  # Talosctl configuration (optional)
  talosctl:
    image:
      repository: ghcr.io/siderolabs/talosctl  # Optional, default
      tag: v1.11.0                             # Optional, auto-detected
      pullPolicy: IfNotPresent                 # Optional, default

  # Maintenance windows (optional)
  maintenance:
    windows:
      - start: "0 2 * * 0"    # Cron expression (Sunday 02:00)
        duration: "4h"         # How long window stays open
        timezone: "UTC"        # IANA timezone, default UTC

🎯 Advanced Configuration

Health Checks

Define custom health checks using CEL expressions. These health checks are evaluated before each upgrade and run concurrently.

healthChecks:
  # Check all nodes are ready
  - apiVersion: v1
    kind: Node
    expr: |
      status.conditions.filter(c, c.type == "Ready").all(c, c.status == "True")
    timeout: 10m

  # Check specific deployment replicas
  - apiVersion: apps/v1
    kind: Deployment
    name: critical-app
    namespace: production
    expr: status.readyReplicas == status.replicas

  # Check custom resources
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster
    name: rook-ceph
    namespace: rook-ceph
    expr: status.ceph.health in ["HEALTH_OK"]

Upgrade Policies (TalosUpgrade only)

Fine-tune upgrade behavior:

policy:
  # Enable debug logging for troubleshooting
  debug: true

  # Force upgrade even if etcd is unhealthy (dangerous!)
  force: true

  # Controls how strictly upgrade jobs avoid the target node
  placement: hard  # or "soft"

  # Use powercycle reboot for problematic nodes
  rebootMode: powercycle  # or "default"

  # Stage upgrade then reboot to apply (2 total reboots)
  stage: false

Maintenance Windows

Control when upgrades start using cron-based maintenance windows. Running upgrades always complete without interruption.

maintenance:
  windows:
    - start: "0 2 * * 0"      # Sunday 02:00
      duration: "4h"           # Max 168h, warn if <1h
      timezone: "Europe/Paris" # IANA timezone, default UTC

  • Upgrades only start during open windows (the upgrade stays Pending otherwise)
  • Multiple windows create union (any open window allows start)
  • In-progress upgrades always complete (never interrupted)
  • TalosUpgrade re-checks between nodes
  • Empty config: upgrades start immediately (backwards compatible)
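
Since multiple windows form a union, you can, for example, permit upgrades on both weekend nights (the times and timezone below are illustrative):

```yaml
maintenance:
  windows:
    - start: "0 2 * * 6"      # Saturday 02:00
      duration: "4h"
      timezone: "Europe/Paris"
    - start: "0 2 * * 0"      # Sunday 02:00
      duration: "4h"
      timezone: "Europe/Paris"
```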

Per-Node Overrides

Tuppr supports overriding the global TalosUpgrade configuration on a per-node basis using Kubernetes annotations. This is useful for testing new versions on a canary node or handling nodes with different hardware schematics.

| Annotation | Description | Example |
| ---------- | ----------- | ------- |
| `tuppr.home-operations.com/version` | Overrides the target Talos version for this node. | `v1.12.1` |
| `tuppr.home-operations.com/schematic` | Overrides the Talos schematic hash for this node. | `b55fbf...` |

Example: Applying an override

# Upgrade a specific node to a different version than the global policy
kubectl annotate node worker-01 tuppr.home-operations.com/version="v1.12.1"

# Apply a custom schematic (with specific extensions) to one node
kubectl annotate node worker-02 tuppr.home-operations.com/schematic="314b18a3f89d..."

How it works:

  • The controller compares each node's version and schematic against the annotation values (when present) rather than the global TalosUpgrade spec.
  • If a mismatch is found, an upgrade job is triggered for that node using the override values.
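
To return a node to the global spec, remove the override annotation using kubectl's trailing-dash removal syntax:

```shell
# Remove the version override from worker-01
kubectl annotate node worker-01 tuppr.home-operations.com/version-

# Remove the schematic override from worker-02
kubectl annotate node worker-02 tuppr.home-operations.com/schematic-
```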

⚠️ Safe Talos Upgrade Paths

Talos Linux has specific supported upgrade paths. You should always upgrade through each minor version sequentially rather than skipping minor versions. For example, upgrading from Talos v1.0 to v1.2.4 requires:

  1. Upgrade from v1.0.x to the latest patch of v1.0 (e.g., v1.0.6)
  2. Upgrade from v1.0.6 to the latest patch of v1.1 (e.g., v1.1.2)
  3. Upgrade from v1.1.2 to v1.2.4
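
Before choosing a target version, check what each node currently runs. The node IP below is a placeholder:

```shell
# Query the Talos version running on a specific node
talosctl version --nodes 10.0.0.10

# Or read it from the Kubernetes API via each node's reported OS image
kubectl get nodes -o custom-columns=NAME:.metadata.name,OS:.status.nodeInfo.osImage
```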

Tuppr does not automatically enforce safe upgrade paths — it will upgrade directly to whatever version you specify in the TalosUpgrade resource. It is your responsibility to ensure the target version is a valid upgrade from your current version.

Recommended: Use Renovate for Safe Version Bumps

Renovate can automate version updates in your GitOps repository while respecting safe upgrade boundaries. Configure it to separate major/minor and minor/patch PRs so you can step through each version sequentially:

{
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "matchPackageNames": ["ghcr.io/siderolabs/installer"],
      "separateMajorMinor": true,
      "separateMinorPatch": true
    }
  ]
}

This way, Renovate will propose incremental version bumps that you can merge one at a time, ensuring you follow the supported upgrade path. Combine this with the renovate comment in your TalosUpgrade spec:

spec:
  talos:
    # renovate: datasource=docker depName=ghcr.io/siderolabs/installer
    version: v1.11.0

📊 Monitoring & Metrics

Prometheus Metrics

Tuppr exposes Prometheus metrics for monitoring upgrade progress, health check performance, and job execution.
