Descheduler for Kubernetes
Scheduling in Kubernetes is the process of binding pending pods to nodes, and is performed by a Kubernetes component called kube-scheduler. The scheduler's decisions about whether and where a pod can be scheduled are guided by its configurable policy, which comprises a set of rules called predicates and priorities. Those decisions reflect the scheduler's view of the cluster at the moment a new pod appears for scheduling. Because Kubernetes clusters are highly dynamic and their state changes over time, it may be desirable to move already-running pods to other nodes for various reasons:
- Some nodes are underutilized or overutilized.
- The original scheduling decision no longer holds, because taints or labels have been added to or removed from nodes, or pod/node affinity requirements are no longer satisfied.
- Some nodes failed and their pods moved to other nodes.
- New nodes are added to clusters.
Consequently, several pods might be scheduled on less desirable nodes in a cluster. The descheduler, based on its policy, finds pods that can be moved and evicts them. Note that in the current implementation, the descheduler does not schedule replacements for evicted pods; it relies on the default scheduler for that.
⚠️ Documentation Versions by Release
If you are using a published release of Descheduler (such as
registry.k8s.io/descheduler/descheduler:v0.35.0), follow the documentation in
that version's release branch, as listed below:
|Descheduler Version|Docs link|
|---|---|
|v0.35.x|release-1.35|
|v0.34.x|release-1.34|
|v0.33.x|release-1.33|
|v0.32.x|release-1.32|
|v0.31.x|release-1.31|
|v0.30.x|release-1.30|
The master branch is considered in-development, and the information presented in it may not
work for previous versions.
Quick Start
The descheduler can be run as a Job, CronJob, or Deployment inside a Kubernetes cluster. Running it
in-cluster has the advantage that it can be run multiple times without user intervention.
The descheduler pod is run as a critical pod in the kube-system namespace to avoid
being evicted by itself or by the kubelet.
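As a rough illustration of what "critical pod in the kube-system namespace" means in practice, the fragment below sketches a Job whose pod uses the built-in `system-cluster-critical` priority class. It is not a copy of the manifest in kubernetes/job/job.yaml: the Job name is hypothetical, and the service account, policy volume, and command arguments are left out.

```yaml
# Illustrative fragment only; the manifest shipped in kubernetes/job/job.yaml is authoritative.
apiVersion: batch/v1
kind: Job
metadata:
  name: descheduler-job            # hypothetical name, chosen for this sketch
  namespace: kube-system           # namespace named in the text above
spec:
  template:
    spec:
      # Built-in Kubernetes priority class that marks the pod as cluster-critical.
      priorityClassName: system-cluster-critical
      restartPolicy: Never         # Jobs require Never or OnFailure
      containers:
        - name: descheduler
          image: registry.k8s.io/descheduler/descheduler:v0.35.0
```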
Run As A Job
kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/job/job.yaml
Run As A CronJob
kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/cronjob/cronjob.yaml
Run As A Deployment
kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/deployment/deployment.yaml
Install Using Helm
Starting with release v0.18.0, there is an official Helm chart that can be used to install the descheduler. See the Helm chart README for detailed instructions.
The descheduler Helm chart is also listed on Artifact Hub.
Install Using Kustomize
You can use Kustomize to install the descheduler. See the Kustomize resources documentation for detailed instructions.
Run As A Job
kustomize build 'github.com/kubernetes-sigs/descheduler/kubernetes/job?ref=release-1.34' | kubectl apply -f -
Run As A CronJob
kustomize build 'github.com/kubernetes-sigs/descheduler/kubernetes/cronjob?ref=release-1.34' | kubectl apply -f -
Run As A Deployment
kustomize build 'github.com/kubernetes-sigs/descheduler/kubernetes/deployment?ref=release-1.34' | kubectl apply -f -
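If you would rather keep the installation in version control than pipe a one-off build into kubectl, the same remote base can be referenced from a local kustomization.yaml. The following is a minimal sketch that assumes the CronJob variant and the same release-1.34 ref used in the commands above; adjust both to match your descheduler version.

```yaml
# kustomization.yaml: a minimal sketch, not part of the descheduler repository.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Remote base; same path and ref as the CronJob command above.
  - github.com/kubernetes-sigs/descheduler/kubernetes/cronjob?ref=release-1.34
```

From the directory containing this file, `kubectl apply -k .` should build and apply it in one step, and any local patches can be added to the same file later.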
User Guide
See the user guide in the /docs directory.
Policy, Default Evictor and Strategy plugins
The Descheduler Policy is configurable and includes default strategy plugins that can be enabled or disabled. It includes a common eviction configuration at the top level, as well as configuration from the Evictor plugin (Default Evictor, if not specified otherwise). Top-level configuration and Evictor plugin configuration are applied to all evictions.
Top Level configuration
These are top-level keys in the Descheduler Policy that you can use to configure all evictions.
| Name | Type | Default Value | Description |
|------------------------------------|----------|---------------|----------------------------------------------------------------------------------------------------------------------------|
| nodeSelector | string | nil | Limits the nodes that are processed. Only used when nodeFit=true and only by the PreEvictionFilter Extension Point. |
| maxNoOfPodsToEvictPerNode | int | nil | Maximum number of pods evicted from each node (summed through all strategies). |
| maxNoOfPodsToEvictPerNamespace | int | nil | Maximum number of pods evicted from each namespace (summed through all strategies). |
| maxNoOfPodsToEvictTotal | int | nil | Maximum number of pods evicted per rescheduling cycle (summed through all strategies). |
| metricsCollector (deprecated) | object | nil | Configures collection of metrics for actual resource utilization. |
| metricsCollector.enabled | bool | false | Enables Kubernetes Metrics Server collection. |
| metricsProviders | []object | nil | Enables various metrics providers, such as the Kubernetes Metrics Server. |
| evictionFailureEventNotification | bool | false | Enables eviction failure event notification. |
| gracePeriodSeconds | int | nil | The duration in seconds before the object should be deleted. The value zero indicates delete immediately. If this value is nil, the default grace period for the specified type will be used. |
| prometheus |object| nil | Configures collection of Prometheus metrics for actual resource utilization |
| prometheus.url |string| nil | Points to a Prometheus server url |
| prometheus.authToken |object| nil | Sets the Prometheus server authentication token. If not specified, the in-cluster authentication token is read from the container's file system. |
| prometheus.authToken.secretReference |object| nil | Reads the authentication token from a Kubernetes secret (the secret is expected to contain the token under the prometheusAuthToken data key). |
| prometheus.authToken.secretReference.namespace |string| nil | Namespace of the Kubernetes secret holding the authentication token (currently, the RBAC configuration permits retrieving secrets from the kube-system namespace; if the secret needs to be accessed from a different namespace, the existing RBAC rules must be explicitly extended). |
| prometheus.authToken.secretReference.name |string| nil | Name of the Kubernetes secret holding the authentication token. |
The descheduler currently allows configuring metrics collection through the metricsProviders field.
The previous approach of setting the metricsCollector field is deprecated. There are currently two sources to configure:
- KubernetesMetrics: enables metrics collection from the Kubernetes Metrics Server
- Prometheus: enables metrics collection from a Prometheus server
In general, each plugin can consume metrics from a different provider so multiple distinct providers can be configured in parallel.
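To make the top-level keys more concrete, here is a hedged sketch of the top of a policy file that sets eviction limits and configures both metrics sources. The numeric limits, the Prometheus URL, and the secret name are hypothetical values chosen for illustration. The `source` field name and the nesting of the `prometheus` block under a `metricsProviders` entry are assumptions inferred from the two source names listed above, so verify them against the policy reference for your release. Profiles and plugin configuration are omitted.

```yaml
# Sketch of top-level policy keys only; profiles are intentionally left out.
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
maxNoOfPodsToEvictPerNode: 10          # hypothetical limit
maxNoOfPodsToEvictPerNamespace: 20     # hypothetical limit
maxNoOfPodsToEvictTotal: 50            # hypothetical limit
gracePeriodSeconds: 60                 # hypothetical value; nil uses the default grace period
metricsProviders:
  # Two distinct providers configured side by side (field layout is an assumption).
  - source: KubernetesMetrics
  - source: Prometheus
    prometheus:
      url: http://prometheus.example.svc.cluster.local   # hypothetical URL
      authToken:
        secretReference:
          namespace: kube-system           # namespace the default RBAC rules permit
          name: prometheus-auth-token      # hypothetical secret name
```

Because each plugin can consume metrics from a different provider, listing both entries does not force every plugin to use both; it only makes them available.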
Evictor Plugin configuration (Default Evictor)
The Default Evictor Plugin is used by default to filter pods before they are processed by a strategy plugin, or to apply a PreEvictionFilter to pods before eviction. You can also create your own Evictor Plugin instead of using the default one provided by the descheduler. An Evictor plugin can also sort, filter, validate, or group pods by different criteria, which is why this is handled by a plugin rather than configured in the top-level config.
| Name | Type | Default Value | Description |
|---|---|---|---|
