Kubernetes Autoscaling

Guide to Kubernetes Autoscaling
Introduction to Kubernetes Autoscaling

Kubernetes offers multiple levels of capacity management control for autoscaling, so many that the multitude of knobs can confuse even the most experienced administrators.

The Kubernetes scheduler assigns pods of containers to cluster nodes, and the entire process is controllable through configuration parameters in YAML files. Using those files, Kubernetes administrators can request CPU and memory for each container within a pod and set maximum limits on their use (also known as resource requests and limits).

Sizing container CPU and memory resources
Kubernetes cluster with two nodes running pods and containers of various sizes
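
As a minimal sketch of what such a YAML file might look like, the manifest below sets a request and a limit for a single container. The pod name, image, and values are illustrative assumptions, not recommendations.

```yaml
# Hypothetical pod: the name, image, and sizes are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: sizing-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:          # what the scheduler reserves on a node for this container
          cpu: 250m        # a quarter of one CPU core
          memory: 128Mi
        limits:            # the most the container may consume at runtime
          cpu: 500m        # CPU use above this value is throttled
          memory: 256Mi    # exceeding this value can get the container OOM-killed
```

The scheduler places the pod based on the requests, while the limits are enforced on the node once the container is running.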

Administrators can also provide instructions for Kubernetes to automatically allocate more CPU and memory to a pod according to CPU and memory usage criteria (also known as vertical pod autoscaling). Furthermore, they can configure Kubernetes to automatically replicate pods for stateless application workloads (also known as horizontal pod autoscaling). Finally, they can also configure the cluster to add more nodes once the other nodes are fully used or reserved (also known as cluster autoscaler).

Vertical Pod Autoscaler (VPA)
Increases and decreases pod CPU and memory
Horizontal Pod Autoscaler (HPA)
Adds and removes pods
Cluster Autoscaler (CA)
Adds and removes cluster nodes
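
The two pod-level autoscalers above can be sketched as follows, assuming a Deployment named web already exists (a hypothetical name) and declares CPU requests. The HPA also assumes a metrics source such as metrics-server is available, and the VPA object assumes the Vertical Pod Autoscaler add-on and its custom resource definitions are installed, since VPA is not part of a default cluster. All thresholds are illustrative.

```yaml
# Horizontal Pod Autoscaler: keeps between 2 and 10 replicas of the
# hypothetical "web" Deployment, targeting 70% average CPU utilization
# (measured relative to each container's CPU request).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# Vertical Pod Autoscaler in recommendation-only mode ("Off"), so that it
# does not compete with the HPA above over the same CPU metric.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"
```

The Cluster Autoscaler is configured per cloud provider or node group rather than through a single portable manifest, so it is not sketched here.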

Additional controls exist to help administrators guide workloads to specific nodes or node groups (also known as taints and tolerations) and create logical administrative domains within a cluster for different teams (also known as namespaces), each with its maximum allowed usage of cluster CPU and memory (also known as resource quotas).

The impacts of taints and tolerations and requests and limits on a namespace
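
A minimal sketch of these controls working together, assuming a team namespace called team-a and a node that was tainted beforehand with kubectl taint nodes <node-name> dedicated=batch:NoSchedule. The namespace, taint key and value, pod name, and all sizes are hypothetical.

```yaml
# Namespace acting as the administrative domain for one team.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Resource quota capping the sum of requests and limits across the namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
# Pod that tolerates the assumed taint dedicated=batch:NoSchedule.
# With this quota in place, pods in the namespace must declare CPU and
# memory requests and limits, or they are rejected at admission.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
  namespace: team-a
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```

A toleration only permits the pod to land on the tainted node; it does not force it there. Steering the pod to that node would additionally need a node selector or node affinity.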

Despite (or perhaps because of) its sophisticated autoscaling technology, a Kubernetes cluster often accumulates financial waste and develops performance bottlenecks over time.

Balancing performance and resource utilization in K8s
Balancing performance and resource waste

The top three reasons administrators cannot escape the burden of balancing performance and efficiency are:

Excessive container requests cause waste
Users often request (or reserve) more CPU and memory for their containers than their application workloads use. Kubernetes honors such requests, so waste accumulates over time and is magnified by automated replication.
Resource configurations ignore IOPS and network
Kubernetes focuses on CPU and memory; however, many performance bottlenecks arise from limited I/O capacity for writing to disk or from the network bandwidth required to transfer data, which complicates cluster node configuration.
Static thresholds selected by humans pose a risk
Manual thresholds (e.g., pod CPU requests and limits) and autoscaling mechanisms based on moving averages of coarsely aggregated data are no match for the machine learning algorithms required to assess resource usage accurately, and upstream open-source distributions lack such algorithms.
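
To make the first of these reasons concrete, here is a hypothetical Deployment in which each replica reserves far more CPU than it is assumed to use. The names and all numbers, including the 200m usage figure, are invented for illustration and are not measurements.

```yaml
# Hypothetical numbers: each replica requests 1 CPU but is assumed to use about 200m.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioned-demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: overprovisioned-demo
  template:
    metadata:
      labels:
        app: overprovisioned-demo
    spec:
      containers:
        - name: app
          image: nginx:1.25
          resources:
            requests:
              cpu: "1"       # reserved per replica: 1000m
              memory: 1Gi
# Reserved:    10 replicas x 1000m = 10 CPUs
# Assumed use: 10 replicas x  200m =  2 CPUs
# Difference:  8 CPUs reserved on nodes but idle
```

Because the scheduler counts requests rather than actual usage, every replica added by an autoscaler reserves another full CPU under these assumptions.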

In this guide, we explain Kubernetes autoscaling and control functionality using examples and YAML configuration files, and we highlight the limitations that deserve the attention of administrators.

Automated, Intelligent Container Sizing

Kubernetes Vertical Pod Autoscaling doesn’t recommend pod limit values or consider I/O. Densify identifies mis-provisioned containers at a glance and prescribes the optimal configuration.

Densify has partnered with Intel to offer one year of free resource optimization software licensing to qualified companies.

The Chapters

Chapter 1: Vertical Pod Autoscaling (VPA)

Learn how VPA can recommend more CPU and memory for your pods.

Chapter 2: Horizontal Pod Autoscaling (HPA)

Appreciate the power of replicating pods for applications that are ready to take advantage of it.

Chapter 3: The Kubernetes Cluster Autoscaler

Learn to add nodes to expand the computing capacity of hosted clusters.

Chapter 4: How to manage Kubernetes Resource Limits

Learn how to define Kubernetes resource quotas, set limit ranges, and optimize resource usage.

Chapter 5: Kubernetes Resource Quota

Learn to use Kubernetes resource quotas in namespaces along with pod requests and limits.

Chapter 6: Taints and Tolerations

Understand how Taints and Tolerations control node assignment.

Chapter 7: Kubernetes Workload

Understand the differences between ReplicaSet, Deployment, DaemonSet, and more.

Chapter 8: Service Load Balancer

Become familiar with Kubernetes services and how to distribute service traffic.

Chapter 9: Kubernetes Namespace

Follow step-by-step instructions to configure a Kubernetes namespace.

Chapter 10: Kubernetes Affinity

Learn Kubernetes advanced scheduling using Node Affinity and Pod Affinity.

Chapter 11: Kubernetes Node Capacity

Explore all of the dimensions in planning for Kubernetes node capacity.

Chapter 12: Kubernetes Service Discovery

Understand how Kubernetes service discovery works by following examples.

Chapter 13: Kubernetes Labels

Learn Kubernetes Labels use cases and best practices by following examples.
