General Data Initialization Plugin

Overview

Project address: https://github.com/gridworkz/data-initer-plugin

This plugin initializes data for service components. It works on any Kubernetes-based cloud platform, including Kato.

The plugin is built on Kubernetes init containers: the plugin's container runs to completion before the business container starts. It downloads a prepared initialization data package (only zip, tgz, and tar.gz are supported) and extracts it into the target directory. Interrupted downloads can be resumed. The target directory must be configured as persistent storage beforehand.
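The plugin performs the download with wget, as the DOWNLOAD_ARGS variable indicates. To show what resuming a download involves, here is a minimal Python sketch, not the plugin's code, that continues a partial file with an HTTP Range request. It assumes the server supports range requests; resume_headers and download_resumable are hypothetical names.

```python
import os
import urllib.error
import urllib.request


def resume_headers(dest_path: str) -> dict:
    """Return a Range header that requests only the bytes not yet on disk."""
    if os.path.exists(dest_path):
        offset = os.path.getsize(dest_path)
        if offset > 0:
            return {"Range": f"bytes={offset}-"}
    return {}


def download_resumable(url: str, dest_path: str, chunk_size: int = 1 << 16) -> None:
    """Download url into dest_path, continuing a partial download when possible."""
    request = urllib.request.Request(url, headers=resume_headers(dest_path))
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Range Not Satisfiable: assume the local file is already complete.
            return
        raise
    with response:
        # 206 Partial Content: the server honored the Range header, so append.
        # Any other success status carries the whole body, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the server ignores the Range header and replies with 200, the sketch rewrites the file from scratch instead of appending, which is slower but never corrupts the result.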

The plugin is configured through the following environment variables:

ENV           | VALUE                   | Description
FILE_URL      | URL                     | Download address of the initialization data package
FILE_PATH     | directory path          | For a single directory, the persistent directory to initialize; for multiple directories, /
EXTRACT_FILE  | true/false              | Whether to extract the downloaded package; extraction is enabled by default
DOWNLOAD_ARGS | e.g. -X, --xx           | Additional command-line arguments passed to wget
LOCK_PATH     | directory path          | Where to store the lock file; any existing persistent directory
DEBUG         | any value (e.g. true)   | Enables debug logging
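For reference, the sketch below reads these variables into a typed configuration object in Python. It is not the plugin's code: InitConfig and load_config are hypothetical names, treating FILE_URL, FILE_PATH, and LOCK_PATH as required is an assumption drawn from the usage steps, and splitting DOWNLOAD_ARGS with shell quoting rules is likewise an assumption.

```python
import os
import shlex
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class InitConfig:
    file_url: str
    file_path: str
    lock_path: str
    extract_file: bool = True
    download_args: List[str] = field(default_factory=list)
    debug: bool = False


# Assumed required: the usage steps always fill in these three.
REQUIRED = ("FILE_URL", "FILE_PATH", "LOCK_PATH")


def load_config(env: Optional[Mapping[str, str]] = None) -> InitConfig:
    """Build an InitConfig from environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return InitConfig(
        file_url=env["FILE_URL"],
        file_path=env["FILE_PATH"],
        lock_path=env["LOCK_PATH"],
        # Extraction stays on unless EXTRACT_FILE is explicitly "false".
        extract_file=env.get("EXTRACT_FILE", "true").strip().lower() != "false",
        # Split extra wget arguments the way a shell would.
        download_args=shlex.split(env.get("DOWNLOAD_ARGS", "")),
        # Any non-empty value turns on debug logging.
        debug=bool(env.get("DEBUG")),
    )
```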

Build Plugins in Kato

Kato's plugin mechanism natively supports init containers through initialization-type plugins.

1. Create a New Plugin

2. Fill in the Build Source Information

The key information includes:

  • Source code address: https://github.com/gridworkz/data-initer-plugin.git (the repository address required when building from a Dockerfile)
  • Code version: main

Next, click Create plug-in and wait for the build to succeed.

3. Declare the Plugin Configuration

In this step, we declare which configuration the plugin accepts. As described in the Overview, the plugin is configured through several environment variables.

Open the configuration group management page and add a configuration group that declares these environment variables.

After saving the configuration, the plugin is ready.

How to Use Plugins

1. Prerequisites

  • The service component to be initialized already has its persistent directory configured.
  • The data to be restored has been packaged (supported formats: zip, tgz, tar.gz) and uploaded to object storage.
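The format restriction can be illustrated with a minimal Python sketch that validates a package by its extension and extracts it. It uses Python's standard zipfile and tarfile modules, not whatever tooling the plugin uses internally; archive_kind and extract are hypothetical names.

```python
import tarfile
import zipfile


def archive_kind(filename: str) -> str:
    """Classify a package as 'zip' or 'tar.gz'; reject anything else."""
    name = filename.lower()
    if name.endswith(".zip"):
        return "zip"
    # tgz and tar.gz are the same gzip-compressed tar format.
    if name.endswith(".tgz") or name.endswith(".tar.gz"):
        return "tar.gz"
    raise ValueError(f"unsupported package format: {filename}")


def extract(archive: str, target_dir: str) -> None:
    """Extract a supported package into target_dir, creating it if needed."""
    if archive_kind(archive) == "zip":
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
    else:
        with tarfile.open(archive, "r:gz") as tf:
            tf.extractall(target_dir)
```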

2. Install and configure the plugin

  • Install the General Data Initialization plugin built above on the service component.
  • Open the plugin configuration, enter the download address of the initialization data package (FILE_URL), the target persistent directory (FILE_PATH), and the lock file directory (LOCK_PATH), then update the configuration.
  • Increase the memory limit. An initialization plugin exits once it finishes, so a generous value does not tie up resources for long. Set it as high as practical, ideally slightly larger than the data package, which can speed up download and extraction.

3. Build and start the service component

Check the logs. Output like the following means data initialization has started:

7b554df4b7bb:Connecting to kato-pkg.oss-us-virginia.aliyuncs.com (106.14.228.173:443)

7b554df4b7bb:data.tgz 0% | | 367k 2:45:46 ETA

Once the download and extraction complete, the service component starts normally.

4. Remove the Plugin

The plugin can read and write the service component's persistent data directory. Although it includes logic to prevent repeated initialization, we strongly recommend uninstalling the plugin once the data has been initialized successfully.

How to Initialize Multiple Directories

The basic usage above initializes data for a single directory. In practice, you may need to initialize several directories at once that do not share a common parent, for example /app, /data/a, and /root/.b.

This can be handled at packaging time: pack all the data into a single archive in a way that preserves each directory's path, so that extracting it at / puts every directory back in place.

The packaging method is as follows:

tar cvzf data.tgz /app /data/a /root/.b
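When creating this archive, GNU tar removes the leading / from member names, so it stores app/, data/a/, and root/.b/; extracting it with / as the target restores each directory in place. The Python sketch below reproduces that layout with the standard tarfile module so the idea can be tried against a scratch directory. pack_dirs and unpack_at_root are hypothetical helpers, and the root parameter stands in for / during testing.

```python
import os
import tarfile
from typing import Iterable


def pack_dirs(archive: str, dirs: Iterable[str], root: str = "/") -> None:
    """Pack several directories into one .tgz with paths relative to root.

    Mirrors GNU tar's default of dropping the leading "/", so /app is
    stored as app/ and /data/a as data/a/.
    """
    with tarfile.open(archive, "w:gz") as tf:
        for d in dirs:
            tf.add(d, arcname=os.path.relpath(d, root))


def unpack_at_root(archive: str, root: str = "/") -> None:
    """Extract the archive at root, which is what FILE_PATH=/ achieves."""
    with tarfile.open(archive, "r:gz") as tf:
        tf.extractall(root)
```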

Upload the archive to object storage to obtain its download URL, for example https://gridworkz-delivery.oss-us-virginia.aliyuncs.com/somedir/data.tgz

In Kato, the procedure is roughly as follows:

  • Make all 3 target directories persistent
  • Install the General Data Initialization plugin
  • Set FILE_URL to the URL obtained above
  • Set FILE_PATH to /. This is the most critical step
  • Set LOCK_PATH to an existing persistent storage path
  • Start the component to begin initialization

About the Lock File

After the first successful initialization, a hidden lock file is created under LOCK_PATH. On later restarts the plugin checks for this file and, if it exists, skips initialization, so the data is not initialized again.
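The skip-if-locked behavior can be sketched as follows. This is an illustration, not the plugin's source: the lock file name .data-initer.lock and the helpers run_once, already_initialized, and mark_initialized are hypothetical, and writing the lock only after a successful initialization is an assumption about the intended behavior.

```python
import os
from typing import Callable

# Hypothetical lock file name; the plugin's real name may differ.
LOCK_NAME = ".data-initer.lock"


def already_initialized(lock_path: str) -> bool:
    """True if a previous run left the lock file in lock_path."""
    return os.path.exists(os.path.join(lock_path, LOCK_NAME))


def mark_initialized(lock_path: str) -> None:
    """Create an empty, hidden lock file in lock_path."""
    with open(os.path.join(lock_path, LOCK_NAME), "w"):
        pass


def run_once(lock_path: str, initialize: Callable[[], None]) -> bool:
    """Run initialize() unless the lock exists; return True if it ran."""
    if already_initialized(lock_path):
        return False
    initialize()
    # Write the lock only after initialize() returned without raising,
    # so a failed attempt is retried on the next restart.
    mark_initialized(lock_path)
    return True
```

Because LOCK_PATH points at persistent storage, the lock survives container restarts, which is what lets the second start skip the download.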

About Object Storage

We recommend putting the initialization data package in object storage and providing a download address that the Kato platform can access.

Object storage usually comes from one of two sources:

  • Self-hosted object storage software based on the S3 protocol, such as MinIO, which can be found and installed from the open source app store once Kato is connected to it.
  • Object storage services from public cloud providers, such as Alibaba Cloud OSS.