Elasticsearch data migration and cluster disaster recovery

This article discusses how to migrate ES data across clusters and how to implement intra-city cross-computer room disaster recovery and cross-region disaster recovery for ES.

In the production practice of ES, the need to migrate data across clusters and to build disaster recovery for clusters is often encountered. For data migration, depending on business requirements, there are roughly two scenarios:

1. The old cluster can stop writing during the migration;

2. The old cluster cannot stop writing during the migration, and the data must be migrated online.

In the first scenario, writing can be stopped during the migration, and tools such as elasticsearch-dump, logstash, reindex, and snapshot can be used. These tools fall roughly into two categories: those that read documents from the source cluster (typically through the scroll API) and rewrite them into the target cluster, such as elasticsearch-dump, logstash, and reindex; and those that copy data at the file level through snapshots, i.e. the snapshot API.

In the second scenario, the old cluster cannot stop writing during the migration, and the data consistency problem must be solved according to the actual business scenario.

The following describes several tools that can be used for data migration when the old cluster can stop writing.

elasticsearch-dump is an open-source ES data migration tool; its GitHub address is /taskrabbit/elasticsearch-dump.

The following operations use the elasticdump command to migrate the companydatabase index from cluster x.x.x.1 to cluster x.x.x.2. Note that the first command migrates the settings of the index; if you migrate the mapping or data directly, the index configuration in the original cluster, such as the number of shards and the number of replicas, will be lost. Alternatively, you can create the index in the target cluster first and then synchronize the mapping and the data.
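The original commands are not reproduced here; a minimal sketch of what they might look like follows, assuming the default port 9200, no authentication, and an illustrative batch size:

```
# 1. Migrate the index settings first, so shard/replica configuration is preserved
elasticdump \
  --input=http://x.x.x.1:9200/companydatabase \
  --output=http://x.x.x.2:9200/companydatabase \
  --type=settings

# 2. Migrate the mapping
elasticdump \
  --input=http://x.x.x.1:9200/companydatabase \
  --output=http://x.x.x.2:9200/companydatabase \
  --type=mapping

# 3. Migrate the documents themselves; --limit controls the batch size
elasticdump \
  --input=http://x.x.x.1:9200/companydatabase \
  --output=http://x.x.x.2:9200/companydatabase \
  --type=data \
  --limit=10000
```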

Logstash supports reading data from one ES cluster and then writing it to another ES cluster, so logstash can be used for data migration. The specific configuration file is as follows:
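The original configuration file is not included here; a minimal sketch of such a config, assuming both clusters are reachable without authentication and that all indices should be copied, might look like this:

```
input {
    elasticsearch {
        hosts => ["http://x.x.x.1:9200"]
        index => "*"            # read all indices of the source cluster
        docinfo => true         # keep _index/_type/_id metadata for the output
        size => 1000
        scroll => "5m"
    }
}

output {
    elasticsearch {
        hosts => ["http://x.x.x.2:9200"]
        index => "%{[@metadata][_index]}"       # write to the same index name
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"    # keep the original document id
    }
}
```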

The above configuration file synchronizes all indices of the source ES cluster to the target cluster; of course, you can also configure it to synchronize only specified indices. For more logstash features, please refer to the logstash official documentation.

reindex is an API provided by Elasticsearch that can migrate data from one cluster to another.
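As a sketch, migrating an index from a remote cluster requires whitelisting the source cluster in the target cluster's elasticsearch.yml and then calling the _reindex API on the target cluster; the hosts and index name below are placeholders:

```
# In the target cluster's elasticsearch.yml, whitelist the source cluster first:
# reindex.remote.whitelist: "x.x.x.1:9200"

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://x.x.x.1:9200"
    },
    "index": "companydatabase",
    "size": 1000
  },
  "dest": {
    "index": "companydatabase"
  }
}
```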

The snapshot API is a set of APIs used by Elasticsearch to back up and restore data, and it can also be used for cross-cluster data migration. The principle is to create a data snapshot from the source ES cluster and then restore it in the target ES cluster. Pay attention to ES version compatibility: a snapshot can only be restored into a cluster of the same or the next higher major version; for example, a snapshot taken on 5.x can be restored on 6.x but not on 7.x.
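A rough sketch of the flow, assuming a shared filesystem repository that both clusters can access (the repository name, path, and index name are placeholders; the path must be listed under path.repo in elasticsearch.yml):

```
# 1. Register the snapshot repository on the source cluster
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backup"
  }
}

# 2. Create a snapshot of the indices to be migrated
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
{
  "indices": "companydatabase"
}

# 3. Register the same repository on the target cluster, then restore the snapshot
POST _snapshot/my_backup/snapshot_1/_restore
{
  "indices": "companydatabase"
}
```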

If the old cluster cannot stop writing and the data must be migrated online, data consistency between the old and new clusters needs to be guaranteed. At present, apart from the officially provided CCR function, there seems to be no mature online migration method that can strictly guarantee data consistency. In this case, you can start from the business scenario and choose an appropriate migration solution based on the characteristics of the data being written.

Generally speaking, the data written by the business has one of the following characteristics:

1. add only: data is only appended, never updated or deleted (typical of log and APM time-series data);

2. add & update: data is appended and existing data is also updated;

3. add & update & delete: data is appended, updated, and deleted.

Let's analyze in detail how to choose an appropriate data migration method for each of these write characteristics.

In log or APM scenarios, the data is time-series data, and indices are generally created on a daily basis; the data of a given day is only written to that day's index. In this case, you can first synchronize the existing indices that are no longer being written to the new cluster in one pass, then use logstash or other tools to incrementally synchronize the current day's index, and once the data in the two clusters has caught up, switch business access to ES over to the new cluster.

The specific implementation plan is:

1. Migrate the existing indices that are no longer written to in one pass (for example with elasticsearch-dump, logstash, reindex, or snapshot);

2. Incrementally synchronize the current day's index with logstash or a custom program;

3. After the data in the two clusters has caught up, switch business access to ES over to the new cluster.

With the add only write pattern, data can be pulled from the old cluster in batches in write order (sorted by _doc, or by a timestamp field if one exists) and then written to the new cluster in batches. You can write a program that uses the scroll API or the search_after parameter to pull incremental data in batches, and then uses the bulk API to write it in batches.

Use scroll to pull incremental data:
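A sketch of such a request against the old cluster, assuming the index has a timestamp field on which the last minute of data can be filtered (index and field names are placeholders):

```
# Open a scroll that covers the data written in the previous minute
POST my_index/_search?scroll=1m
{
  "size": 1000,
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1m/m",
        "lt": "now/m"
      }
    }
  }
}

# Fetch the remaining batches with the scroll_id returned by the previous request
POST _search/scroll
{
  "scroll": "1m",
  "scroll_id": "<scroll_id returned above>"
}
```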

The above operation can be executed once per minute, pulling the data newly generated in the previous minute, so that data stays synchronized between the old and new clusters with a delay of about one minute.

Use search_after to pull incremental data in batches:
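A sketch, again assuming a timestamp field; the sort needs a unique tie-breaker (here _id, though a dedicated unique field in the document is preferable), and the search_after values are taken from the sort values of the last hit of the previous batch:

```
GET my_index/_search
{
  "size": 1000,
  "query": {
    "match_all": {}
  },
  "sort": [
    { "timestamp": "asc" },
    { "_id": "asc" }
  ],
  "search_after": [1609430400000, "id_of_last_hit_in_previous_batch"]
}
```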

The above operation can be executed at a custom time interval as needed; each time it runs, the value of the search_after parameter is updated so that the batch of documents after the specified value is fetched. search_after is effectively a cursor that advances on every execution, so the latest data is always obtained.

The difference between scroll and search_after is that scroll keeps a search context on the cluster, consumes resources while it is held, and returns a view of the data as it was when the scroll was opened, whereas search_after is stateless, always queries against the latest data, and requires a sort with a unique tie-breaker field.

In addition, if you do not want to write a program to migrate the incremental data from the old cluster to the new cluster, you can use logstash combined with scroll to migrate the incremental data. A configuration file you can refer to is as follows:
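A sketch of such a configuration, assuming a timestamp field is used to select the last minute of data; the schedule and the scroll-related parameters can be tuned as described below:

```
input {
    elasticsearch {
        hosts => ["http://x.x.x.1:9200"]
        index => "my_index"
        docinfo => true
        size => 1000
        scroll => "5m"
        schedule => "* * * * *"   # run once per minute
        query => '{"query":{"range":{"timestamp":{"gte":"now-1m/m","lt":"now/m"}}}}'
    }
}

output {
    elasticsearch {
        hosts => ["http://x.x.x.2:9200"]
        index => "%{[@metadata][_index]}"
        document_id => "%{[@metadata][_id]}"
    }
}
```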

During use, you can adjust the scheduled task parameters schedule and scroll-related parameters according to actual business needs.

If the business both appends new data and updates existing data when writing to ES, the key problem becomes how to synchronize the update operations. New data can be synchronized to the new cluster with the incremental hot-index migration method described above. For updated data, if the index has a field such as updateTime that marks when a document was last updated, you can write a program or use logstash with the scroll API to pull the updated documents in batches based on the updateTime field and then write them to the new cluster.

A logstash configuration file that can be referenced is as follows:
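A sketch, assuming the index has an updateTime field; reusing the original document id means that an updated document overwrites its old version in the new cluster:

```
input {
    elasticsearch {
        hosts => ["http://x.x.x.1:9200"]
        index => "my_index"
        docinfo => true
        size => 1000
        scroll => "5m"
        schedule => "* * * * *"   # pull documents updated in the last minute
        query => '{"query":{"range":{"updateTime":{"gte":"now-1m/m","lt":"now/m"}}}}'
    }
}

output {
    elasticsearch {
        hosts => ["http://x.x.x.2:9200"]
        index => "%{[@metadata][_index]}"
        document_id => "%{[@metadata][_id]}"   # reuse the id so updates overwrite the old version
    }
}
```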

In practice, the synchronization of newly added data and of updated data can run at the same time. However, if the index has no field like updateTime that identifies which documents have been updated, there currently seems to be no better synchronization method, and only CCR can be used to ensure data consistency between the old and new clusters.

If the business writes new (add) data to ES and also updates (update) and deletes (delete) data, you can use the CCR function, available in the commercial X-Pack since version 6.5, to perform the data migration. However, CCR has some restrictions that must be paid attention to: both clusters need a license level that includes CCR (e.g. Platinum or a trial), the leader index must have soft deletes enabled (the default for indices created in 7.0 and later), and the follower index is read-only while replication is in progress.

The specific usage method is as follows:
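A minimal sketch of the steps, assuming the old cluster is registered in the new cluster as a remote cluster named "leader-cluster"; the cluster name, hosts, and index names are placeholders:

```
# 1. On the new (follower) cluster, register the old cluster as a remote cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "leader-cluster": {
          "seeds": ["x.x.x.1:9300"]
        }
      }
    }
  }
}

# 2. Create a follower index that replicates the leader index
PUT /my_index/_ccr/follow
{
  "remote_cluster": "leader-cluster",
  "leader_index": "my_index"
}
```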

If the business writes data to ES through middleware such as Kafka, you can use logstash to consume the Kafka data into the new cluster, as shown in the figure below. After the data in the old and new clusters has fully caught up, switch business queries to the new cluster and then take the old cluster offline.
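A sketch of the logstash side, assuming the business already writes JSON messages to a Kafka topic; the topic, consumer group, and hosts are placeholders:

```
input {
    kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"
        topics => ["es_write_topic"]
        group_id => "logstash_new_cluster"   # separate consumer group, so the old pipeline is unaffected
        codec => "json"
    }
}

output {
    elasticsearch {
        hosts => ["http://x.x.x.2:9200"]
        index => "my_index"
    }
}
```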

The advantages of using middleware for synchronous double writing are:

Of course, double writing can also be implemented in other ways, for example with a self-built proxy: the business writes to the proxy, and the proxy forwards the request to one or more clusters. However, this method has the following problems:

As the business scale grows, the business side's requirements for the data reliability and stability of the ES clusters it uses become higher and higher, so a better cluster disaster recovery solution is needed to support them.

If the company builds its own ES clusters on physical machines in its own IDC computer rooms, cross-computer room disaster recovery is usually implemented by deploying two ES clusters in two computer rooms, one primary and one standby, and then solving the data synchronization problem. There are generally two ways to synchronize the data. One is double writing, implemented by the business side to ensure data consistency; however, double writing is a challenge for the business side, because a write can only be considered successful if it succeeds in both clusters. The other is asynchronous replication: the business side writes only to the primary cluster, and the data is synchronized to the standby cluster in the background, but it is then harder to guarantee data consistency. A third option is to connect the two computer rooms with a dedicated line and deploy a single cluster across them, but the cost is higher.

Because of the complexity of data synchronization, when cloud vendors implement cross-computer room disaster recovery for ES clusters, they usually deploy a single cluster across computer rooms and rely on ES's own capabilities to synchronize the data.

The first characteristic of a foreign cloud vendor's cross-computer room ES deployment is that it does not force the use of dedicated master nodes. As shown in the figure above, the cluster has only two nodes, which serve as both data nodes and master-eligible nodes; primary and replica shards are distributed across two availability zones, and because of the replica shards the cluster remains available after availability zone 1 fails. However, if the network between the two availability zones is interrupted, a split-brain problem occurs. As shown in the figure below, using three dedicated master nodes eliminates the split-brain problem.

But what if a region does not have three availability zones? Then two of the dedicated master nodes can only be placed in the same availability zone, as in the solution of a domestic cloud vendor:

However, there are still problems when nodes need to be rebuilt. As shown in the figure above, the quorum of the cluster should be 2. After availability zone 1 goes down, only one dedicated master node is left in the cluster, and the quorum parameter (discovery.zen.minimum_master_nodes) must be lowered to 1 before the cluster can elect a master normally. After the two failed dedicated master nodes recover, the quorum parameter (discovery.zen.minimum_master_nodes) must be raised back to 2 to avoid split-brain.
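For reference, a sketch of the adjustment using the cluster settings API, which applies to pre-7.0 versions where discovery.zen.minimum_master_nodes is still in effect:

```
# After availability zone 1 fails, lower the quorum so the surviving master-eligible node can be elected
PUT _cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 1
  }
}

# After the two failed dedicated master nodes recover, restore the quorum to avoid split-brain
PUT _cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}
```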

Of course, there are solutions that avoid both of these potential problems, the inability to elect a master and split-brain. The idea of a domestic cloud vendor's solution is shown below:

When creating a dual-availability-zone cluster, you must choose 3 or 5 dedicated master nodes, and the backend deploys the dedicated master nodes in a hidden availability zone. The first advantage of this solution is that if one availability zone fails, the cluster can still elect a master normally, avoiding the situation where no master can be elected because quorum is not met; the second is that, because 3 or 5 dedicated master nodes must be used, split-brain is also avoided.

Compared with the approach of using two clusters, one active and one standby, for cross-computer room disaster recovery, cloud vendors solve the otherwise complicated primary-standby data synchronization problem by deploying a single cluster across computer rooms. What is more worrying, however, is whether the network latency between computer rooms or availability zones degrades cluster performance. For Tencent Cloud's dual-availability-zone clusters, standard benchmark tools were used to stress test a single-availability-zone cluster and a dual-availability-zone cluster of the same specification. The stress test results are shown in the following figure:

Judging from the query latency and write latency of the stress test results, there is no obvious difference between the two types of clusters. This is mainly due to the improvement of the underlying network infrastructure on the cloud; the network latency between availability zones is very low.

Similar to intra-city cross-computer room disaster recovery, the general solution for cross-region disaster recovery is to deploy two clusters, one primary and one standby, in two remote computer rooms. The business writes only to the primary cluster, and the data is asynchronously synchronized to the standby cluster. The implementation is more complicated, however, because data consistency between the primary and standby clusters has to be solved, and across regions the network latency is relatively high; also, when the primary cluster fails and business is switched to the standby cluster, the data on the two sides may not yet be level, leading to inconsistency and business impact. Double writing can of course be implemented with the help of middleware such as Kafka, but as the data path gets longer, the write latency also increases, and if Kafka itself has a problem, the failure may be catastrophic.

A more common asynchronous replication method is to use the snapshot backup function: a backup is taken in the primary cluster periodically, for example every hour, and then restored in the standby cluster, but this leaves up to an hour of data lag between the primary and standby clusters. Taking Tencent Cloud as an example, Tencent Cloud's ES clusters support backing data up to the COS object storage, so this can be used to synchronize data between the primary and standby clusters. For the specific steps, please refer to /document/product/845/19549.

After the official launch of the CCR function in version 6.5, the problem of data synchronization between clusters has been solved.

CCR can be used to implement cross-region disaster recovery for ES clusters:

CCR works somewhat like data subscription: the primary cluster is the Leader, the standby cluster is the Follower, and the Follower pulls data and write requests from the Leader. When a Follower index is created, it is first initialized by copying all the underlying segment files from the Leader in the form of a snapshot; after initialization is complete, it pulls write requests and replays them on the Follower side to complete data synchronization. The advantages of CCR are that UPDATE/DELETE operations can be synchronized, the data consistency problem is solved, and the synchronization delay is also low.

In addition, CCR can be combined with the cross-computer room disaster recovery cluster mentioned above to build a multi-center ES deployment across two regions: a multi-availability-zone cluster is deployed in the Shanghai region for cross-computer room high availability, and a standby cluster is deployed in the Beijing region as a Follower that synchronizes data through CCR. This takes cluster availability a step further, achieving intra-city cross-computer room disaster recovery as well as cross-region disaster recovery.

However, when a failure occurs and cluster access needs to be switched from Shanghai to Beijing, there are some restrictions, because the Follower index in CCR is read-only; it must be converted into a normal index before it can be written to, and this process is irreversible. This can be worked around on the business side, for example by writing to a new normal index after the switch and having the business query through aliases; when the Shanghai region is restored, the data is synchronized back in the reverse direction.
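A sketch of the conversion, using the CCR APIs that turn a follower index into a regular writable index (the index name is a placeholder); as noted above, this step is irreversible:

```
# Stop replication from the leader
POST /my_follower_index/_ccr/pause_follow

# Close the index, remove the follower metadata, then reopen it as a normal index
POST /my_follower_index/_close
POST /my_follower_index/_ccr/unfollow
POST /my_follower_index/_open
```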

The problem then is how to guarantee the integrity of the data in the Shanghai region. After the Shanghai region is restored, you can create a new Follower index in the Shanghai region, using the index being written to in the Beijing region as the Leader to synchronize the data. After the data has fully caught up, switch reading and writing back to the Shanghai region; note that a new Leader index needs to be created for writing at the time of the switch.

The data synchronization process is as follows:

1. The Shanghai main cluster provides services normally, and the Beijing backup cluster follows data from the main cluster

2. When the Shanghai main cluster fails, the business is switched to the Beijing backup cluster for reading and writing; after the Shanghai main cluster is restored, it follows data back from the Beijing cluster