Deprecation Notice: Development of kube-throttler has been transferred to pfnet/kube-throttler.
kube-throttler lets you throttle your pods. It prevents pods from being scheduled when it detects that the total amount of computational resources (in terms of the resources.requests field) or the count of Running pods would exceed a threshold.
kube-throttler provides flexible, fine-grained throttle control. You specify the set of pods to throttle with a label selector, and their threshold, in a Throttle/ClusterThrottle CRD (see deploy/0-crd.yaml for the complete definition).
Throttle control is fully dynamic. Once you update a throttle's settings, kube-throttler follows the change and keeps the throttle's status up to date.
Quota returns an error when you try to create pods that request resources exceeding the quota. Throttle, in contrast, returns no error on pod creation; it simply keeps the throttled pods in the Pending state.
Quota is also based on Namespace, which is the unit of multi-tenancy in Kubernetes. Throttle provides a kind of virtual computational resource pool in a more dynamic and finer-grained way.
kube-throttler is implemented as a Kubernetes scheduler plugin using the Scheduling Framework.
There are two ways to use kube-throttler:
- Using the pre-built binary
- Integrating kube-throttler with your scheduler plugins

kube-throttler ships pre-built binaries and container images in which kube-throttler is integrated with kube-scheduler.
kubectl create -f deploy/

This creates:
- kube-throttler namespace, service accounts, RBAC entries
  - this will create a cluster role and cluster role binding. Please see deploy/2-rbac.yaml for details.
- custom kube-throttler integrated kube-scheduler deployment
  - with sample scheduler config
    - scheduler name is my-scheduler
    - throttler name is kube-throttler
You need to register kube-throttler with your scheduler by calling app.WithPlugin() like this:
...
import (
	"os"
	"time"

	kubethrottler "github.com/everpeace/kube-throttler/pkg/scheduler_plugin"
	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"
)

func main() {
	command := app.NewSchedulerCommand(
		...
		// register kube-throttler as an out-of-tree scheduler plugin
		app.WithPlugin(kubethrottler.PluginName, kubethrottler.NewPlugin),
	)

	logs.InitLogs()
	defer logs.FlushLogs()

	// exit with a non-zero status when the scheduler command fails
	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}

See these documents and repos for details of the Scheduling Framework:
- Scheduling Framework's Official Document
- Scheduler Plugins - Repository for out-of-tree scheduler plugins based on the scheduler framework.
kube-throttler requires the kube-throttler cluster roles defined in deploy/rbac.yaml.
You need to enable kube-throttler in your scheduler config. See deploy/config.yaml.
A Throttle custom resource defines three things:
- the throttler name which is responsible for this Throttle custom resource.
- the set of pods which the throttle affects, given by selector
  - please note that a throttler only counts running pods that are handled by its configured target scheduler names.
- the threshold of
  - the amount of request-ed computational resources of the throttle
  - the count of resources (currently only pod is supported)
  - those can be overridden by temporaryThresholdOverrides. Please refer to the section below.
It also has a status field. The status field contains:
- used shows the current total usage of request-ed resource amounts or counts of Running pods matching selector
- calculatedThreshold shows the calculated threshold value, which takes temporaryThresholdOverrides into account.
- throttled shows whether the throttle is active for each resource request or resource count.
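As a rough illustration of how a selector picks the pods a throttle affects, here is a minimal Go sketch. It assumes the semantics stated in the comments of example/throttle.yaml: terms are OR-ed, and the labels inside one term are AND-ed. The `SelectorTerm` type and both functions are hypothetical names for this sketch, not kube-throttler's actual types, and only matchLabels is modeled.

```go
package main

import "fmt"

// SelectorTerm is a hypothetical, simplified stand-in for one selector term.
// Only matchLabels is modeled; matchExpressions is left out.
type SelectorTerm struct {
	MatchLabels map[string]string
}

// termMatches reports whether the pod carries every label the term requires (AND).
func termMatches(term SelectorTerm, podLabels map[string]string) bool {
	for key, want := range term.MatchLabels {
		got, ok := podLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// selectorMatches reports whether at least one term matches the pod (OR).
// An empty term list matches nothing in this sketch (an assumption).
func selectorMatches(terms []SelectorTerm, podLabels map[string]string) bool {
	for _, term := range terms {
		if termMatches(term, podLabels) {
			return true
		}
	}
	return false
}

func main() {
	terms := []SelectorTerm{{MatchLabels: map[string]string{"throttle": "t1"}}}
	fmt.Println(selectorMatches(terms, map[string]string{"throttle": "t1", "app": "web"}))
	fmt.Println(selectorMatches(terms, map[string]string{"throttle": "t2"}))
}
```

The first call prints true because the pod carries throttle=t1 (extra labels are ignored); the second prints false.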
# example/throttle.yaml
apiVersion: schedule.k8s.everpeace.github.com/v1alpha1
kind: Throttle
metadata:
  name: t1
spec:
  # throttler name which is responsible for this Throttle custom resource
  throttlerName: kube-throttler
  # you can write any label selector freely
  # items under selecterTerms are OR-ed
  # conditions within each selecterTerm item are AND-ed
  selector:
    selecterTerms:
    - podSelector:
        matchLabels:
          throttle: t1
  # you can set a threshold of the throttle
  threshold:
    # limiting total count of resources
    resourceCounts:
      # limiting count of running pods
      pod: 3
    # limiting total amount of resources which running pods can `requests`
    resourceRequests:
      cpu: 200m
status:
  # 'throttled' shows throttle status defined in spec.threshold.
  # when you try to create a pod, none of its 'request'-ed resources' throttles
  # or resource counts may be throttled
  throttled:
    resourceCounts:
      pod: false
    resourceRequests:
      cpu: true
  # 'used' shows total 'request'-ed resource amount and count of 'Running' pods
  # matching spec.selector
  used:
    resourceCounts:
      pod: 1
    resourceRequests:
      cpu: 300m

Users sometimes need to increase or decrease a threshold value. You can edit spec.threshold directly. However, what if the change is only needed for a limited term? Temporary threshold overrides solve this.
Temporary threshold overrides provide a declarative way to override thresholds. An override is activated automatically when its term starts and expires automatically when its term ends, which greatly reduces operational work.
spec can have temporaryThresholdOverrides like this:
apiVersion: schedule.k8s.everpeace.github.com/v1alpha1
kind: Throttle
metadata:
  name: t1
spec:
  threshold:
    resourceCounts:
      pod: 3
    resourceRequests:
      cpu: 200m
      memory: "1Gi"
      nvidia.com/gpu: "2"
  temporaryThresholdOverrides:
  # begin/end should be a datetime string in RFC3339
  # each entry is active when t in [begin, end]
  # if multiple entries are active, all active threshold overrides
  # will be merged (the first override wins for each resource count/request).
  - begin: 2019-02-01T00:00:00+09:00
    end: 2019-03-01T00:00:00+09:00
    threshold:
      resourceRequests:
        cpu: "5"
  - begin: 2019-02-15T00:00:00+09:00
    end: 2019-03-01T00:00:00+09:00
    threshold:
      resourceRequests:
        cpu: "1"
        memory: "8Gi"

temporaryThresholdOverrides can define multiple override entries. Each entry is active when the current time is in [begin, end] (inclusive on both ends). If multiple entries are active, all active overrides are merged, and the first override wins for each resource count/request. In the example above, if the current time is '2019-02-16T00:00:00+09:00', both overrides are active and the merged threshold will be:
resourceCounts: # this is not overridden
  pod: 3
resourceRequests:
  cpu: "5" # from temporaryThresholdOverrides[0]
  memory: "8Gi" # from temporaryThresholdOverrides[1]

These calculated threshold values are recorded in the status.calculatedThreshold field. The field matters when deciding whether the throttle is active or not.
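The merge rule for active overrides can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not kube-throttler's implementation: the `Override` type, `mergeActive`, and the helper functions are hypothetical names, quantities are kept as plain strings, and only resourceRequests is modeled. It covers just the "[begin, end] inclusive" activity test and the "first active override wins per resource" rule.

```go
package main

import (
	"fmt"
	"time"
)

// Override is a hypothetical, simplified view of one temporaryThresholdOverrides entry.
type Override struct {
	Begin, End       time.Time
	ResourceRequests map[string]string
}

// activeAt reports whether now lies in [Begin, End], inclusive on both ends.
func (o Override) activeAt(now time.Time) bool {
	return !now.Before(o.Begin) && !now.After(o.End)
}

// mergeActive merges all overrides active at now.
// For each resource, the first active override that mentions it wins.
func mergeActive(overrides []Override, now time.Time) map[string]string {
	merged := map[string]string{}
	for _, o := range overrides {
		if !o.activeAt(now) {
			continue
		}
		for resource, quantity := range o.ResourceRequests {
			if _, taken := merged[resource]; !taken {
				merged[resource] = quantity
			}
		}
	}
	return merged
}

// mustParse parses an RFC3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// exampleOverrides returns the two override entries from the manifest above.
func exampleOverrides() []Override {
	return []Override{
		{
			Begin:            mustParse("2019-02-01T00:00:00+09:00"),
			End:              mustParse("2019-03-01T00:00:00+09:00"),
			ResourceRequests: map[string]string{"cpu": "5"},
		},
		{
			Begin:            mustParse("2019-02-15T00:00:00+09:00"),
			End:              mustParse("2019-03-01T00:00:00+09:00"),
			ResourceRequests: map[string]string{"cpu": "1", "memory": "8Gi"},
		},
	}
}

func main() {
	merged := mergeActive(exampleOverrides(), mustParse("2019-02-16T00:00:00+09:00"))
	fmt.Println("cpu:", merged["cpu"], "memory:", merged["memory"])
}
```

At 2019-02-16 both entries are active, so cpu comes from the first entry ("5") and memory from the second ("8Gi"), matching the merged threshold shown above. How the merged overrides are combined with the base spec.threshold is not modeled here.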
Here is a simple scenario. Note that the same scenario also holds for ClusterThrottle. The only difference is that a ClusterThrottle can target pods in multiple namespaces, while a Throttle can target pods only in its own namespace.
- define a throttle t1 which targets the throttle=t1 label with thresholds cpu=200m and memory=1Gi.
- create pod1 with the same label and requests cpu=200m
- then, t1 status will transition to throttled: cpu: true because the total amount of cpu of running pods reaches its threshold.
- create pod2 with the same label and requests cpu=300m, and see that the pod stays in the Pending state because cpu is throttled.
- create pod1m with the same label and requests memory=512Mi, and see that the pod is scheduled because t1 is throttled only on cpu and memory is not throttled.
- update the t1 threshold to cpu=700m; the throttle will open and pod2 will be scheduled.
- t1 now has 200m of cpu capacity remaining (threshold is cpu=700m and used cpu=500m).
- then, create pod3 with the same label and requests cpu=300m. kube-throttler detects that there is not enough space left for the cpu resource in t1, so pod3 stays Pending.
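The decisions in the scenario above can be traced with a small Go sketch. It is an illustration under assumptions, not kube-throttler's code: `check` and the `Result` values are hypothetical names, quantities are plain milli-CPU integers, and a single resource is checked at a time. It assumes a throttle is "active" once used reaches the threshold, "insufficient" when the remaining capacity is smaller than the request, and that a pod which does not request the resource is unaffected by it.

```go
package main

import "fmt"

// Result is a hypothetical label for the outcome of one throttle check.
type Result string

const (
	NotThrottled Result = "not-throttled"
	Active       Result = "active"
	Insufficient Result = "insufficient"
)

// check decides the outcome for one resource of one throttle.
// All quantities use the same unit (milli-CPU in the scenario).
func check(threshold, used, request int64) Result {
	switch {
	case request == 0:
		// The pod does not request this resource, so it is not affected by it.
		return NotThrottled
	case used >= threshold:
		// The throttle is already closed on this resource.
		return Active
	case used+request > threshold:
		// The throttle is open, but the remaining capacity is too small.
		return Insufficient
	default:
		return NotThrottled
	}
}

func main() {
	fmt.Println("pod1 :", check(200, 0, 200))   // threshold 200m, nothing used yet
	fmt.Println("pod2 :", check(200, 200, 300)) // cpu is throttled by pod1
	fmt.Println("pod1m:", check(200, 200, 0))   // requests only memory
	fmt.Println("pod2 :", check(700, 200, 300)) // after raising the threshold to 700m
	fmt.Println("pod3 :", check(700, 500, 300)) // only 200m of capacity remains
}
```

The "active" and "insufficient" outcomes correspond to the throttles[active] and throttles[insufficient] messages that appear in the pod events of the walkthrough.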
Let's create the Throttle first.
kubectl create -f example/throttle.yaml

After a short while, you can see the status of the throttle change:
$ kubectl get throttle t1 -o yaml
...
spec:
  throttlerName: kube-throttler
  selector:
    selecterTerms:
    - podSelector:
        matchLabels:
          throttle: t1
  threshold:
    resourceCounts:
      pod: 5
    resourceRequests:
      cpu: 200m
      memory: 1Gi
status:
  throttled:
    resourceCounts:
      pod: false
    resourceRequests:
      cpu: false
      memory: false
  used:
    resourceRequests: {}

Then, create a pod with label throttle=t1 and requests cpu=200m.
kubectl create -f example/pod1.yaml

After a while, you can see throttle t1 will be activated on cpu.
$ kubectl get throttle t1 -o yaml
...
status:
  throttled:
    resourceCounts:
      pod: false
    resourceRequests:
      cpu: true
      memory: false
  used:
    resourceCounts:
      pod: 1
    resourceRequests:
      cpu: "0.200"

Next, create another pod. You will see that the pod is throttled and kept in the Pending state by kube-throttler.
$ kubectl create -f example/pod2.yaml
$ kubectl describe pod pod2
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 14s (x9 over 1m) my-scheduler pod is unschedulable due to throttles[active]=(default,t1)

In this situation, you can run pod1m requesting memory=512Mi because t1's memory throttle is not throttled.
$ kubectl create -f example/pod1m.yaml
$ kubectl get po pod1m
NAME READY STATUS RESTARTS AGE
pod1m 1/1 Running 0 24s
$ kubectl get throttle t1 -o yaml
...
status:
  throttled:
    resourceCounts:
      pod: false
    resourceRequests:
      cpu: true
      memory: false
  used:
    resourceCounts:
      pod: 2
    resourceRequests:
      cpu: "0.200"
      memory: "536870912"

Then, update t1 threshold with cpu=700m
$ kubectl edit throttle t1
# Please edit threshold section 'cpu: 200m' ==> 'cpu: 700m'
$ kubectl describe pod pod2
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 14s (x9 over 1m) my-scheduler pod is unschedulable due to throttles[active]=(default,t1)
Normal Scheduled 7s my-scheduler Successfully assigned default/pod-r8lxq to minikube
Normal Pulling 6s kubelet, minikube pulling image "busybox"
Normal Pulled 4s kubelet, minikube Successfully pulled image "busybox"
Normal Created 3s kubelet, minikube Created container
Normal Started 3s kubelet, minikube Started container

You will also see t1 status now stays open.
$ kubectl get throttle t1 -o yaml
...
spec:
  selector:
    selecterTerms:
    - podSelector:
        matchLabels:
          throttle: t1
  threshold:
    resourceCounts:
      pod: 5
    resourceRequests:
      cpu: 700m
      memory: 1Gi
status:
  throttled:
    resourceCounts:
      pod: false
    resourceRequests:
      cpu: false
      memory: false
  used:
    resourceCounts:
      pod: 3
    resourceRequests:
      cpu: "0.500"
      memory: "536870912"

Now, t1 has cpu:200m of capacity remaining. Then, create pod3 requesting cpu:300m. pod3 stays in the Pending state because t1 does not have enough capacity on cpu resources.
$ kubectl create -f example/pod3.yaml
$ kubectl get po pod3
NAME READY STATUS RESTARTS AGE
pod3 0/1 Pending 0 5s
$ kubectl describe pod pod3
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9s (x3 over 13s) my-scheduler 0/1 nodes are available: 1 pod (default,pod3) is unschedulable due to , throttles[insufficient]=(default,t1)

kube-throttler exports Prometheus metrics. Metrics are served on kube-scheduler's metrics endpoint. kube-throttler exports the metrics below:
| metrics name | definition | example |
|---|---|---|
| throttle_status_throttled_resourceRequests | resourceRequests of the throttle is throttled or not on specific resource (1=throttled, 0=not throttled). | throttle_status_throttled_resourceRequests{name="t1", namespace="default",uuid="...",resource="cpu"} 1.0 |
| throttle_status_throttled_resourceCounts | resourceCounts of the throttle is throttled or not on specific resource (1=throttled, 0=not throttled). | throttle_status_throttled_resourceCounts{name="t1", namespace="default",uuid="...",resource="pod"} 1.0 |
| throttle_status_used_resourceRequests | used amount of resource requests of the throttle | throttle_status_used_resourceRequests{name="t1", namespace="default",uuid="...",resource="cpu"} 200 |
| throttle_status_used_resourceCounts | used resource counts of the throttle | throttle_status_used_resourceCounts{name="t1", namespace="default",uuid="...",resource="pod"} 2 |
| throttle_status_calculated_threshold_resourceRequests | calculated threshold on specific resourceRequests of the throttle | throttle_status_calculated_threshold_resourceRequests{name="t1", namespace="default",uuid="...",resource="cpu"} 200 |
| throttle_status_calculated_threshold_resourceCounts | calculated threshold on specific resourceCounts of the throttle | throttle_status_calculated_threshold_resourceCounts{name="t1", namespace="default",uuid="...",resource="pod"} 2 |
| throttle_spec_threshold_resourceRequests | threshold on specific resourceRequests of the throttle | throttle_spec_threshold_resourceRequests{name="t1", namespace="default",uuid="...",resource="cpu"} 200 |
| throttle_spec_threshold_resourceCounts | threshold on specific resourceCounts of the throttle | throttle_spec_threshold_resourceCounts{name="t1", namespace="default",uuid="...",resource="pod"} 2 |
| clusterthrottle_status_throttled_resourceRequests | resourceRequests of the clusterthrottle is throttled or not on specific resource (1=throttled, 0=not throttled). | clusterthrottle_status_throttled_resourceRequests{name="clt1",uuid="...",resource="cpu"} 1.0 |
| clusterthrottle_status_throttled_resourceCounts | resourceCounts of the clusterthrottle is throttled or not on specific resource (1=throttled, 0=not throttled). | clusterthrottle_status_throttled_resourceCounts{name="clt1",uuid="...",resource="pod"} 1.0 |
| clusterthrottle_status_used_resourceRequests | used amount of resource requests of the clusterthrottle | clusterthrottle_status_used_resourceRequests{name="clt1",uuid="...",resource="cpu"} 200 |
| clusterthrottle_status_used_resourceCounts | used resource counts of the clusterthrottle | clusterthrottle_status_used_resourceCounts{name="clt1",uuid="...",resource="pod"} 2 |
| clusterthrottle_status_calculated_threshold_resourceRequests | calculated threshold on specific resourceRequests of the clusterthrottle | clusterthrottle_status_calculated_threshold_resourceRequests{name="clt1",uuid="...",resource="cpu"} 200 |
| clusterthrottle_status_calculated_threshold_resourceCounts | calculated threshold on specific resourceCounts of the clusterthrottle | clusterthrottle_status_calculated_threshold_resourceCounts{name="clt1",uuid="...",resource="pod"} 2 |
| clusterthrottle_spec_threshold_resourceRequests | threshold on specific resourceRequests of the clusterthrottle | clusterthrottle_spec_threshold_resourceRequests{name="clt1",uuid="...",resource="cpu"} 200 |
| clusterthrottle_spec_threshold_resourceCounts | threshold on specific resourceCounts of the clusterthrottle | clusterthrottle_spec_threshold_resourceCounts{name="clt1",uuid="...",resource="pod"} 2 |
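To show how the sample lines in the table might be consumed, here is a small Go sketch that splits one Prometheus text-format sample into its name, labels, and value. `parseSample` is a hypothetical helper written for this illustration; it is deliberately simplified and assumes label values contain no commas, braces, or escaped quotes. For real use, a proper Prometheus client or parser library is the safer choice.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one sample line of the form `name{k="v",...} value`.
// Simplifying assumption: label values contain no commas, braces, or escaped quotes.
func parseSample(line string) (name string, labels map[string]string, value float64, err error) {
	start := strings.Index(line, "{")
	end := strings.LastIndex(line, "}")
	if start < 0 || end < start {
		return "", nil, 0, fmt.Errorf("no label set in %q", line)
	}
	name = strings.TrimSpace(line[:start])
	labels = map[string]string{}
	for _, pair := range strings.Split(line[start+1:end], ",") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) != 2 {
			continue // skip anything that is not key=value
		}
		labels[kv[0]] = strings.Trim(kv[1], `"`)
	}
	value, err = strconv.ParseFloat(strings.TrimSpace(line[end+1:]), 64)
	return name, labels, value, err
}

func main() {
	line := `throttle_status_throttled_resourceRequests{name="t1", namespace="default",uuid="...",resource="cpu"} 1.0`
	name, labels, value, err := parseSample(line)
	if err != nil {
		panic(err)
	}
	// For the *_throttled_* metrics, 1 means throttled and 0 means not throttled.
	fmt.Println(name, labels["namespace"], labels["name"], labels["resource"], value == 1)
}
```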
Apache License 2.0
Since 1.0.0, change logs have been published in Github releases.
- Fixed
  - fail fast the liveness probe when kubernetes api watch stopped (#23)
- Fixed
  - Watching Kubernetes events stopped when some watch source faced error (#22)
- Changed
  - upgraded skuber version to v2.2.0
  - periodic throttle reconciliation which limits those which really need to
- Fixed
  - reduced memory usage for large cluster. kube-throttler does not cache completed (status.phase=Succeeded|Failed) pods anymore.
- Added
  - support Preempt scheduler extender at /preempt_if_not_throttled endpoint in order to prevent undesired preemptions when high priority pods are throttled.
- Changed
  - status.used counts not only Running pods but all scheduled pods
    - scheduled means the pod is assigned to some node but not finished: pod.status.phase != (Succeeded or Failed) && spec.nodeName is nonEmpty
  - skip unnecessary calculation when pod changed. It reduces controller's load when pod churn rate is high.
  - temporary threshold override reconciliation is now performed asynchronously
all changes are for performance issue.
- Changed
  - now http server's request handling can be performed in isolated thread pool.
  - checking whether a pod is throttled is performed in a different actor (ThrottleRequestHandler)
  - healthcheck is now performed in a different actor, WatchActor.
- Fixed
  - can't collect dispatcher metrics.
- Fixed
  - too frequent updates on throttles/clusterthrottles calculated threshold
- Added
  - introduced status-force-update-interval (default: 15 m) parameter to update calculated threshold forcefully even if its threshold values are unchanged.
- Fixed
  - "too old resource version" on init reconciliation for clusters with large number of throttles/clusterthrottles
- Changed
  - log level for all the metrics changes is now debug.
- Added
  - temporaryThresholdOverrides introduced. Users can now define declarative threshold overrides with a finite term by using this. kube-throttler activates/deactivates those threshold overrides.
  - status.calculatedThreshold introduced. This field shows the latest calculated threshold. The field matters when deciding whether the throttle is active or not.
  - [cluster]throttle_status_calculated_threshold are also introduced.
- Changed
  - BREAKING CHANGE: change spec.selector object schema to support OR-ed multiple label selectors and namespaceSelector in clusterthrottles. (#6)
- stop kube-throttlers (recommended: set replicas to 0)
- dump all your throttles/clusterthrottles
      kubectl get clusterthrottles,throttles --all-namespaces
- replace selector.matchLabels with selector.selecterTerms[0].podSelector.matchLabels in your crs
      # before
      spec:
        selector:
          matchLabels: { some object }
          matchExpressions: [ some arrays ]
      # after
      spec:
        selector:
          selecterTerms:
          - podSelector:
              matchLabels: { some object }
              matchExpressions: [ some arrays ]
- delete all throttles/clusterthrottles
      kubectl delete clusterthrottles,throttles --all-namespaces --all
- update crds and rbacs
      kubectl apply -f deploy/0-crd.yaml; kubectl apply -f deploy/2-rbac.yaml
- start kube-throttlers (recommended: set replicas back to the original value)
- apply updated throttles/clusterthrottles crs.
- Changed
  - large refactoring #4 (moving throttle logic to model package from controller package)
  - skip un-marshalling matchFields field in NodeSelectorTerm.
    - the attribute has been supported since kubernetes v1.11.
- Changed
  - sanitize invalid characters in metrics labels
  - remove metadata.annotations from metrics labels
- Added
  - resourceCounts.pod in Throttle/ClusterThrottle so that user can throttle count of running pod.
- Changed
  - previous compute resource threshold should be defined in resourceRequests.{cpu|memory}.
- introduce ClusterThrottle which can target pods in multiple namespaces.
- make Throttle/ClusterThrottle not burstable. This means that if some throttle has cpu:200m remaining and a pod requesting cpu:300m is about to be scheduled, kube-throttler does not allow the pod to be scheduled. In that case, the message throttles[insufficient]=<throttle name> will be returned to the scheduler.
- watch-buff-size can be configurable for large pods
- properly handle initial sync error
- multi-throttler, multi-scheduler deployment support
  - throttlerName is introduced in Throttle CRD
  - throttler-name and target-scheduler-names are introduced in throttler configuration
- fixed returning filter error when normal throttled situation.
first public release.