Component Health Check

The purpose of a health check is to reflect the real running status of a component's business logic. Without a health check, the running state of a component is determined by the container (process) state: once the container starts successfully, the component is considered to be in the Ready state, and an instance in the Ready state immediately starts receiving traffic. Most programs, however, need some time between the start of the process and the point where they can actually serve requests.

Generally, the more complex the component, the longer this takes. Sending business requests to a component that is not yet ready causes some of those requests to fail. This matters most during a rolling upgrade: for stateless components, the platform first starts a new instance and lets it accept traffic, and then shuts down the old instance. If the health status of the component is not reported accurately, the benefit of the rolling upgrade is greatly reduced. A mechanism is therefore needed to verify the true state of the component as closely as possible. That mechanism is component health detection.

Currently, component health detection supports the following two mechanisms:

  • TCP port detection: This method tries to establish a TCP connection to the port configured for the component. If the connection is established, the component is considered healthy.
  • HTTP service detection: A listening port does not guarantee that the business logic is working. For HTTP services, a specified route can therefore be requested, and the component health status is determined from the response status code. This mode is more precise.
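
As a rough illustration of how the two mechanisms differ, the following Python sketch implements both probes with the standard library only. It is not Kato's actual implementation: the function names, the default timeout, and the use of `urllib` are assumptions made for this example. The HTTP probe treats any status code below 400 as healthy.

```python
import http.client
import socket
import urllib.error
import urllib.request
from typing import Dict, Optional


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Healthy if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False


def http_probe(url: str, timeout: float = 1.0,
               headers: Optional[Dict[str, str]] = None) -> bool:
    """Healthy if the requested route answers with a status code below 400."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # urllib raises HTTPError (a URLError subclass) for 4xx/5xx answers.
        return False
```

A component whose port is open but whose business route returns an error would pass `tcp_probe` and fail `http_probe`, which is why the HTTP mode is the more precise of the two.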

After a component instance starts, it must pass the health check before it is considered healthy. When an instance is unhealthy, there are two ways to handle it:

Set components as unhealthy

When a component instance is marked as unhealthy, it is taken offline from the application gateway and the ServiceMesh network. Once it works normally again, it automatically goes back online. However, if the component has only one instance, Kato will not take it offline.

Restart component instance

Some components may deadlock because of blocking code or similar problems: the service is no longer available, but the process is still running. The unhealthy state of such components can only be resolved by restarting the instance.

Users can therefore choose the appropriate handling method according to how their business behaves.
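
The two handling methods, together with the single-instance exception, can be summarized as a small decision function. This is only a sketch of the behavior described above, not code from Kato; the `Action` enum, the policy strings, and the function name are hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP_ONLINE = "keep online"
    TAKE_OFFLINE = "take offline"
    RESTART = "restart"


def handle_unhealthy(policy: str, instance_count: int) -> Action:
    """Decide what to do with an unhealthy instance.

    policy is "offline" or "restart" (hypothetical identifiers for the
    two handling methods); instance_count is the number of instances of
    the component.
    """
    if policy == "restart":
        return Action.RESTART
    if instance_count <= 1:
        # A component with a single instance is not taken offline.
        return Action.KEEP_ONLINE
    return Action.TAKE_OFFLINE
```

For example, `handle_unhealthy("offline", 1)` yields `Action.KEEP_ONLINE`, while the same policy with three instances yields `Action.TAKE_OFFLINE`.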

Operating Procedures

Component health detection is configured on the component control panel/other settings page.

  1. Click the edit button of the health check. The pop-up window displays the health check configuration items.

    • Port: Select the port on which the component is probed. If the port to be probed is not among the options, add it on the port management page first.
    • Probe protocol: TCP or HTTP, as described above. The subsequent settings differ slightly depending on the protocol.
    • Unhealthy handling method: The default is “offline”; “restart” can be selected instead.
    • Setting items corresponding to HTTP protocol: After selecting the HTTP protocol, you can set the path and the request headers used for detection (for example, when the request requires a token). Note that the requested route must return a status code below 400 to be considered healthy.
    • Initialization waiting time: the time to wait after the component instance starts before detection begins. The default is 4 seconds.
    • Detection interval: the time interval between two consecutive detection tasks.
    • Detection timeout period: the maximum time a detection request may take. If the request blocks because something is wrong, the timeout takes effect and the request is cut off.
    • The number of consecutive successes: the number of consecutive successful detections required before the component instance is marked as healthy.

Fill in the above settings according to the actual situation. After saving, the component must be updated for the health detection configuration to take effect.
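
To show how the timing settings interact, the sketch below computes when an instance would first be marked healthy for a given sequence of probe results. It is a simplified model, not the platform's scheduler: only the 4-second initialization wait comes from the text above, while the other default values, the `ProbeConfig` name, and the assumption that each probe takes no time are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeConfig:
    initial_delay_s: float = 4.0   # initialization waiting time (documented default)
    interval_s: float = 3.0        # detection interval (assumed value)
    timeout_s: float = 20.0        # detection timeout; a timed-out probe counts as False
    success_threshold: int = 1     # number of consecutive successes (assumed value)


def first_healthy_time(config: ProbeConfig,
                       results: Sequence[bool]) -> Optional[float]:
    """Return the seconds after instance start at which the instance is
    marked healthy, or None if the threshold is never reached.

    results[i] is the outcome of the i-th probe; probe durations are ignored.
    """
    streak = 0
    for i, ok in enumerate(results):
        streak = streak + 1 if ok else 0  # a failure resets the streak
        if streak >= config.success_threshold:
            return config.initial_delay_s + i * config.interval_s
    return None
```

With `ProbeConfig(success_threshold=2)` and the results `[False, True, False, True, True]`, the threshold is reached on the fifth probe, that is, 4 + 4 × 3 = 16 seconds after the instance starts.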

  2. Enable/disable health detection

In special circumstances, developers may need to temporarily disable the health check so that the component stays in a healthy state. The enable/disable health check function serves this purpose. After the change, the component must be updated for it to take effect.

Common Problems

Requests fail during a component rolling update

When this problem occurs, it is strongly recommended to set more precise health detection rules, such as using HTTP mode.
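
On the component side, HTTP mode works best when the probed route reflects whether the business logic is really ready. A minimal sketch of such a route, using Python's standard `http.server`, is shown below. The `/healthz` path, the `ready` flag, and the `make_server` helper are assumptions for this example; any path can be configured in the health check as long as it returns a status code below 400 when healthy.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set this event once startup work (loading data, opening connections) is done.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # 200 passes the "below 400" rule; 503 fails it.
            status = 200 if ready.is_set() else 503
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep periodic probe requests out of the log


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create the server; call serve_forever() on it to start answering."""
    return HTTPServer((host, port), HealthHandler)
```

Until `ready.set()` is called, the route answers 503, so a new instance started during a rolling update would not be treated as healthy before it can serve requests.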

How to set up a health check for components that use other protocols

Accurate detection is currently not supported for MySQL, Redis, and other application-layer protocols; please use TCP mode. In the future, we plan to add cmd-based detection, which can better support different types of components.