Component Instance Auto-scaling

The load on a component changes constantly, rising and falling, which makes it difficult to configure appropriate resources for it. Kato solves this problem with automatic scaling: the autoscaler continuously observes the metrics set by the user, and once a metric exceeds or falls below the expected threshold, the component is automatically scaled.

This article introduces component auto-scaling from the following three aspects:

  • How component auto-scaling works.
  • How to use component auto-scaling.
  • A demonstration of component auto-scaling.

The Principle of Automatic Scaling of Components

There are two types of component auto-scaling: horizontal and vertical. Horizontal scaling increases or decreases the number of replicas of a component; horizontal auto-scaling performs this automatically based on the component's CPU usage, memory usage or other custom metrics. Vertical scaling allocates more or less CPU and memory to a component; vertical auto-scaling likewise performs this automatically based on the same kinds of metrics. Currently (5.1.9), Kato only supports horizontal auto-scaling of components, so this article will not cover vertical auto-scaling in depth.

Component Horizontal Auto-scaling

Horizontal Pod Autoscaler (HPA) consists of two parts: the HPA resource and the HPA controller. The HPA resource defines the scaling behavior of the component, including the metrics, their expected values, and the maximum and minimum number of replicas. The HPA controller periodically checks the metrics configured for the component; the period is controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag and defaults to 15 seconds.

Rbd Worker is responsible for converting the metrics, expected values, maximum and minimum replica counts and other parameters you set for the component in the Kato console into an HPA resource in the Kubernetes cluster, for the HPA controller to use.
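As a sketch, the HPA resource generated in the cluster is roughly equivalent to the following manifest. This is illustrative only: the names, namespace, API version and values here are assumptions, not what Kato actually emits.

```yaml
# Illustrative HPA manifest; names and values are hypothetical.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: example-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```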

In each cycle, the HPA controller queries the metrics set by the user for each component through the metrics API. When a metric exceeds or falls below its expected threshold, the HPA controller adjusts the replica count of the corresponding Deployment/StatefulSet, and the Deployment/StatefulSet in turn increases or decreases the number of component instances.

HPA controllers generally observe metrics from three aggregated APIs: metrics.k8s.io, custom.metrics.k8s.io and external.metrics.k8s.io.

The metrics.k8s.io API is provided by metrics-server and corresponds to resource metrics, namely CPU utilization, CPU usage, memory utilization and memory usage. These are also the metric types currently supported by Kato.

custom.metrics.k8s.io corresponds to custom metrics and external.metrics.k8s.io to external metrics, for example requests per second (requests-per-second) or packets received per second (packets-per-second). These APIs are provided by adapters such as Kube Metrics Adapter or Prometheus Adapter, or by a third-party service that implements the Kubernetes metrics API definition itself. Custom metrics and external metrics are handled in roughly the same way.

Custom metrics do not support utilization; they can only be a value or a usage amount, averaged over the component's instances.

Currently (5.1.9), Kato only supports resource metrics, that is, metrics related to CPU and memory. The custom and external metrics shown in the dashed box will be implemented in future versions.

Horizontal Automatic Scaling Algorithm

From the most basic point of view, Horizontal Pod Autoscaler calculates the desired number of instances from the ratio between the actual value of the metric and its target value:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

For example, if the actual value of the metric is 200m and the target value is 100m, the number of instances will be doubled, because 200.0 / 100.0 == 2.0; if the actual value then drops to 50m, the number of instances will be halved, because 50.0 / 100.0 == 0.5. When the ratio is sufficiently close to 1.0 (within a configurable tolerance, 0.1 by default), the HPA controller skips scaling.
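The formula above, together with the tolerance check and the clamping to the configured minimum and maximum instance counts, can be sketched as follows. This is a simplified illustration, not Kato's or Kubernetes' actual code; the 0.1 tolerance mirrors the controller manager's default --horizontal-pod-autoscaler-tolerance.

```python
import math

# Default tolerance of the HPA controller (--horizontal-pod-autoscaler-tolerance).
TOLERANCE = 0.1

def desired_replicas(current_replicas, current_metric, desired_metric,
                     min_replicas=1, max_replicas=10):
    """Sketch of the HPA scaling decision for one metric."""
    ratio = current_metric / desired_metric
    # Skip scaling when the ratio is sufficiently close to 1.0.
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum instance counts.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(1, 200, 100))   # 200m actual vs 100m target -> 2
print(desired_replicas(2, 50, 100))    # 50m actual vs 100m target -> 1
print(desired_replicas(4, 105, 100))   # within tolerance -> stays at 4
```

Note how the result is always clamped to the minimum and maximum instance counts described in the next section.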

Use of Component Auto-scaling

Automatic scaling is located in the scaling tab of the component page, below manual scaling.

As mentioned in [Component level auto-scaling](/Component level auto-scaling), you can set the maximum number of replicas, the minimum number of replicas and the metrics on the auto-scaling page.

Maximum Number of Instances

When the actual value of a metric exceeds its expected value, the number of instances of the component keeps increasing, to bring the actual value back below the expected value. If the actual value is still above the expected value when the number of instances reaches the maximum, the number of instances stays fixed at the maximum and does not grow further.

Minimum Number of Instances

When the actual value of a metric is below its expected value, the number of instances of the component keeps decreasing, to bring the actual value close to the expected value. If the actual value is still not close to the expected value when the number of instances reaches the minimum, the number of instances stays fixed at the minimum and does not decrease further.

Metrics

Currently, Kato only supports resource metrics, namely CPU utilization, CPU usage, memory utilization and memory usage. The unit of CPU usage is m, where 1m equals one thousandth of a core. The unit of memory usage is Mi.

Metrics can be added and deleted, following these rules:

  • There can be only one CPU metric: CPU utilization and CPU usage cannot coexist. The same applies to memory.
  • There must be at least one metric, i.e. all metrics cannot be deleted.
  • The metric value must be an integer greater than 0.

Modifying the above three kinds of parameters does not require updating or restarting the component; the changes take effect immediately.
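The metric rules above can be sketched as a small validation function. This is a hypothetical helper for illustration, not Kato's actual validation code; the metric names used here are made up.

```python
# Hypothetical validator for the metric rules above; not Kato's actual code.
def validate_metrics(metrics):
    """metrics: list of (name, value) pairs, where name is one of
    'cpu_utilization', 'cpu_usage', 'memory_utilization', 'memory_usage'."""
    # Rule 2: at least one metric must remain.
    if not metrics:
        return False
    names = [name for name, _ in metrics]
    # Rule 1: utilization and usage for the same resource cannot coexist.
    for resource in ("cpu", "memory"):
        if f"{resource}_utilization" in names and f"{resource}_usage" in names:
            return False
    # Rule 3: every metric value must be an integer greater than 0.
    return all(isinstance(v, int) and v > 0 for _, v in metrics)

print(validate_metrics([("cpu_utilization", 50)]))                      # True
print(validate_metrics([("cpu_utilization", 50), ("cpu_usage", 500)]))  # False
print(validate_metrics([]))                                             # False
```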

A Demo Example

Create Component hpa-example

In order to demonstrate horizontal auto-scaling, we will use a custom Docker image based on the php-apache image. The content of the Dockerfile is as follows:

FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php

It defines an index.php page that performs some CPU-intensive calculations:

<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>

You can build an image from the above Dockerfile, or use our prepared image gdevs/hpa-example:latest. We use this image to create a component with a memory limit of 128M and port 80.

Configure Automatic Scaling

We set the maximum number of instances to 10, the minimum number of instances to 1, and the target CPU utilization to 50%. Roughly speaking, HPA will increase or decrease the number of replicas (through the Deployment) to maintain an average CPU utilization of 50% across all Pods. As shown below:

We use the command line to check the current status of hpa:

root@r6dxenial64:~# kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
6b981574b23d4073a226cf95faf497e3 a737ffa9edca436fadb609d5b3dab1bd Deployment/5a8e8667d96e194be248f2856dcaedda-deployment 1%/40% 1 10 1 1h

Increase Load

Now, we open a Linux terminal and send requests to the php-apache service in an infinite loop.

# Please replace http://80.grcaedda.eqfzk1ms.75b8a5.grapps.ca with your actual domain name
while true; do wget -q -O- http://80.grcaedda.eqfzk1ms.75b8a5.grapps.ca; done

One minute later, we use the following command to check the status of hpa:

root@r6dxenial64:~# kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
6b981574b23d4073a226cf95faf497e3 a737ffa9edca436fadb609d5b3dab1bd Deployment/5a8e8667d96e194be248f2856dcaedda-deployment 270%/40% 1 10 1 1h

As you can see, CPU utilization has risen to 270%, which caused the number of instances to be scaled up to 4, as shown in the following figure:

Reduce Load

Now, let’s stop the infinite loop running in the Linux terminal above to reduce the load of the component. Then, let’s check the status of hpa, as follows:

root@r6dxenial64:~# kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
6b981574b23d4073a226cf95faf497e3 a737ffa9edca436fadb609d5b3dab1bd Deployment/5a8e8667d96e194be248f2856dcaedda-deployment 3%/50% 1 10 1 1h

CPU utilization dropped back down, so the number of instances was scaled back to 1.

Horizontal Scaling Record

Let's look at the component's horizontal scaling records to see what happened during this process, as shown in the following figure:

After we configured the relevant parameters, horizontal auto-scaling took effect, but the metrics were not ready at that point, so we see two records of failing to obtain metrics. After we started sending requests to the component in an infinite loop, the autoscaler detected that CPU utilization exceeded the target value, and we see the number of instances scale to 4 and then to 6. When we stopped sending requests, the load dropped, the autoscaler detected that CPU utilization was below the target value, and the number of instances was scaled directly back to 1.
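The jump from 1 instance to 4 and then 6, rather than directly to 6, is consistent with the HPA controller's per-cycle scale-up limit, which in Kubernetes versions of this era was roughly max(2 × currentReplicas, 4). The following sketch works through the observed numbers under that assumption; it is an illustration, not the controller's actual code.

```python
import math

TARGET = 50  # target CPU utilization (%)

def next_replicas(current, utilization):
    # Desired count from the basic HPA formula...
    desired = math.ceil(current * utilization / TARGET)
    # ...capped by the per-cycle scale-up limit
    # (max(2 * current, 4) in HPA controllers of this era).
    return min(desired, max(2 * current, 4))

# First cycle: 1 replica at 270% utilization.
# Desired is ceil(270 / 50) = 6, but the cap limits it to 4.
step1 = next_replicas(1, 270)
# Second cycle: assuming total load is unchanged, the load spreads
# across 4 replicas, so per-replica utilization is 270 / 4 = 67.5%.
step2 = next_replicas(step1, 270 / step1)
print(step1, step2)  # 4 6
```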

Kato records the component's instance changes, so that operations personnel can review them at any time.