Once upon a time our DBA team had a task: we had to move the ZooKeeper ensemble that we had been using for a ClickHouse cluster. The usual way to move an ensemble is to move its data files. That seems easy and obvious, but our ClickHouse cluster held more than 400 TB of replicated data, and all of its replication information had been stored in ZooKeeper from the very beginning. We could not afford to lose even a single row. We looked for information on the internet, but the good tutorial we found covered version 3.4.5 and did not fit our version, 3.6.2. So we decided to move our ensemble by extending it.
Work Plan
We start with a ZooKeeper ensemble consisting of 3 instances running on 3 independent servers. We were also given 3 new servers to move the ensemble to. Additionally, we found 1 more temporary server for our quorum. Why? Because a ClickHouse cluster with ReplicatedMergeTree tables accepts writes only while the ZooKeeper ensemble has a quorum; without it, the tables are read-only. For better understanding, here is the scheme:
3 (id: 1, 2, 3) +1 (id: 4) + 3 (id: 5, 6, 7)
The original ensemble is located on 3 servers (id: 1, 2, 3). We add 1 temporary server (id: 4) and the new servers (id: 5, 6, 7). Once the ensemble is synchronised, we remove the unnecessary instances (id: 1, 2, 3, 4). In the end we have an ensemble with a quorum running on the new servers (id: 5, 6, 7).
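The quorum arithmetic behind this scheme can be sketched in a few lines of Python. This is only an illustration: it assumes every instance is a voting participant and that a quorum is a strict majority of voters. The function names and stage labels are made up for the example.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters` voting members."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters may be down while a quorum still exists."""
    return voters - quorum_size(voters)


# Ensemble sizes at each stage of the scheme 3 + 1 + 3 (labels are illustrative).
STAGES = [
    ("original (id 1-3)", 3),
    ("plus temporary (id 1-4)", 4),
    ("fully extended (id 1-7)", 7),
    ("after removals (id 5-7)", 3),
]


def stage_report():
    """Return (label, voters, quorum, tolerated failures) for every stage."""
    return [
        (label, n, quorum_size(n), tolerated_failures(n))
        for label, n in STAGES
    ]


for label, n, quorum, spare in stage_report():
    print(f"{label}: {n} voters, quorum {quorum}, tolerates {spare} down")
```

The point of the report is to see, for every intermediate size, how many instances must stay up for ClickHouse to keep accepting writes.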
Extending the original ensemble
The official documentation describes how to extend an ensemble, but it is not a tutorial.
The original ensemble runs on 3 independent servers with CentOS 7:
server.1=zk-1:2888:3888
server.2=zk-2:2888:3888
server.3=zk-3:2888:3888
Check your configuration file and, if the line
reconfigEnabled=true
is absent, add it. You will also run into problems with dynamic reconfiguration, since the reconfig command is subject to access control. The workaround we used is to set the JVM flag
-Dzookeeper.skipACL=yes
by adding the line
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dzookeeper.skipACL=yes"
inside the $ZK_HOME/bin/zkEnv.sh file. Now you are ready to restart the ensemble. Make sure you stop the ClickHouse cluster first.
It’s time to prepare new servers:
Upload the apache-zookeeper tarball to the new servers (in our case apache-zookeeper-3.6.2-bin.tar.gz);
Create an OS user for ZooKeeper and extract the tarball into that user's home directory;
Create a directory for the ZooKeeper data files and, inside it, a myid file that identifies the instance. For example, we already have 3 instances in the original ensemble (server.1=zk-1:2888:3888, server.2=zk-2:2888:3888, server.3=zk-3:2888:3888), so the myid files of the new instances contain the numbers 4, 5, 6 and 7, one per server;
Add firewall rules (ports: 2888, 3888, 2181, 7000);
Create a service file.
When everything is ready, run $ZK_HOME/bin/zkCli.sh and enter
reconfig -add server.4=zk-4:2888:3888:participant;2181
If zoo.conf.dynamic.100000000 appears in the $ZK_HOME/conf directory on all 3 servers, you are on the right track. Then start the ZooKeeper server on the 4th server; zoo.conf.dynamic.100000000 must appear in its $ZK_HOME/conf as well. After every subsequent reconfig the number after zoo.conf.dynamic (100000000) changes. Repeat this step for instances 5, 6 and 7. At the end, the ensemble must contain all 7 instances, synchronised.
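One way to confirm that every instance has joined is to ask each server for its role. The sketch below is not part of our original procedure: it assumes the srvr four-letter command is permitted on your servers, that clients connect on port 2181, and that the hosts are named zk-1 to zk-7 as in this article. The helper names are hypothetical.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line (leader, follower, ...) or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts):
    """Map every host to its reported mode, or to an 'unreachable' note."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host))
        except OSError as exc:
            modes[host] = f"unreachable ({exc})"
    return modes


def looks_healthy(modes, expected_size):
    """True if there is exactly one leader and all other members are followers."""
    values = list(modes.values())
    return (
        len(values) == expected_size
        and values.count("leader") == 1
        and values.count("follower") == expected_size - 1
    )
```

For the fully extended ensemble you could call ensemble_modes([f"zk-{i}" for i in range(1, 8)]) and pass the result to looks_healthy(modes, 7).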
Removing unnecessary instances from the ensemble
Instances are removed with the same $ZK_HOME/bin/zkCli.sh. Just execute
reconfig -remove 1
reconfig -remove 2
reconfig -remove 3
reconfig -remove 4
After removing them, stop and disable the ZooKeeper services on servers 1, 2, 3 and 4, and check $ZK_HOME/conf/zoo.conf.dynamic.XXXXXXX on the remaining servers. The files must be identical.
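Comparing the dynamic configuration files by eye is error-prone, so the check can be sketched in Python. This is a minimal illustration, assuming you have already copied the contents of each file into a dictionary keyed by host name. The function names are made up, and the parser only looks at server.N=... lines.

```python
import re

# Matches membership lines such as "server.5=zk-5:2888:3888:participant;2181".
SERVER_LINE = re.compile(r"^server\.(\d+)=(.+)$")


def parse_dynamic_config(text):
    """Return {server id: address spec} for every server.N line in the text."""
    members = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SERVER_LINE.match(line)
        if match:
            members[int(match.group(1))] = match.group(2)
    return members


def check_migrated(configs, expected_ids):
    """Return a list of problems; an empty list means the files agree.

    `configs` maps a host name to the text of its dynamic config file.
    """
    problems = []
    reference = None
    for host, text in configs.items():
        members = parse_dynamic_config(text)
        if set(members) != set(expected_ids):
            problems.append(
                f"{host}: has ids {sorted(members)}, "
                f"expected {sorted(expected_ids)}"
            )
        if reference is None:
            reference = members
        elif members != reference:
            problems.append(f"{host}: differs from the first file")
    return problems
```

Calling check_migrated(configs, {5, 6, 7}) with the three files from zk-5, zk-6 and zk-7 should return an empty list once the migration is complete.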
Summary
In the end we moved the ensemble from servers zk-1, zk-2, zk-3 to zk-5, zk-6, zk-7. Additionally, we recommend adding
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
to zoo.conf and opening port 7000 in the firewall. This enables the metrics exporter for Prometheus.