Add etcd nodes

A healthy etcd cluster must have an odd number of nodes to provide consensus. etcd can be run with a single node, but fault tolerance requires at least three nodes.

Caution

Adding an etcd node is a delicate operation; an error during the process may render the cluster unusable. If possible, perform this operation during periods of Deephaven system downtime (e.g., at night or after trading hours for a system supporting trading operations) to mitigate the risk.

Prerequisites

New servers or VMs to run the new etcd processes must already be in place and running. They must be in a state that is ready to install Deephaven.

The files, directories, and credentials as specified in general installation prerequisites are also required.

A Deephaven system backup or an etcd backup is strongly recommended.
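For example, a point-in-time etcd snapshot can be taken from one of the existing etcd nodes. This is only a sketch: it assumes etcdctl.sh passes the standard etcdctl snapshot save subcommand through, and the output path is just a suggestion.

# Run on an existing etcd node, as root, after sourcing dh_users (see the note below).
# snapshot save requires a single endpoint; override ETCDCTL_ENDPOINTS if more than one is configured.
sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh snapshot save /tmp/dh-etcd-backup-$(date +%Y%m%d).db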

Important

Since the configuration of etcd on new nodes requires root privileges, the steps in this procedure assume the user performing most of the actions is running as root (sudo su). If you are not running as root, you will get access denied errors when attempting to run some of these commands. If you have sufficient sudo rights, running the commands with sudo -u $DH_ADMIN_USER or just sudo may be sufficient, but an admin with root access will likely need to perform some of the steps.

Note

Some commands below reference a DH_ADMIN_USER environment variable. This variable is configured by sourcing dh_users:

source /usr/illumon/latest/bin/dh_users

This should be done as a prerequisite whenever starting operations in this section or changing user contexts (e.g., after running sudo su).
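For example, a quick way to confirm the variable is set after sourcing the file:

source /usr/illumon/latest/bin/dh_users
echo "DH_ADMIN_USER is set to: $DH_ADMIN_USER"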

Procedure to add etcd nodes

The process documented here uses the Deephaven installer to update the Deephaven configuration and deploy Deephaven binaries to the new nodes. When increasing the number of etcd nodes, the initial execution of the upgrade will fail because the installer does not have the logic to make these changes. This is expected.

The overall process is:

  1. "Upgrade" the cluster to add the new Deephaven/etcd nodes.
  2. After the upgrade fails, manually reconfigure the etcd cluster to add the new etcd nodes.
  3. Once the etcd cluster is healthy, re-run the installer to complete the install and ensure a consistent state for the cluster.

The procedure that follows details how to add two nodes to a cluster. Since an etcd cluster always needs an odd number of nodes, nodes will almost always be added in pairs.

1. Edit the cluster.cnf

Start with the cluster.cnf that was used the previous time the installer was run; if you don't already have this file, you can find it on one of the existing nodes: /etc/sysconfig/deephaven/cluster.cnf.
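If you need to retrieve the file, copying it from an existing node into your installer directory is usually enough; the hostname below is a placeholder for one of your existing servers.

scp root@dh-infra-1:/etc/sysconfig/deephaven/cluster.cnf ./cluster.cnf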

Remove the lower part of the file generated by the installer jar. This typically starts with DH_NODE_COUNT=.

Add DH_NODE_n entries for the new servers. Set properties like DH_NODE_n_ROLE_QUERY and DH_NODE_n_ROLE_MERGE as desired for your environment, but ensure each new node has DH_NODE_n_ROLE_ETCD=true.

The relevant section of the cluster.cnf may look like this:

#Infra
DH_NODE_1_ROLE_INFRA=true
DH_NODE_1_ROLE_ETCD=true
DH_NODE_1_ROLE_QUERY=false

#Query1
DH_NODE_2_ROLE_ETCD=true
DH_NODE_2_ROLE_QUERY=true

#Query2
DH_NODE_3_ROLE_ETCD=true
DH_NODE_3_ROLE_QUERY=true

2. Run the upgrade

Place the edited cluster.cnf in the installer directory set up in the prerequisites (general installation prerequisites).

Run the installer jar: java -jar ./Installer-<version-number>.jar.

Run the master installer: ./master_install.sh.

This should eventually fail, but not until it has copied Deephaven binaries to the new servers. The failure should have errors related to etcd_configure.sh.setup_check. If other (non-etcd related) errors have occurred, they must be resolved before continuing this process. See Installation troubleshooting.

3. Manually create the new etcd configuration

Important

You should first sudo to root (sudo su) on each node for this and the following steps. Then, run source /usr/illumon/latest/bin/dh_users to set the needed environment variables.

Get the cluster ID by running:

basename $(readlink /etc/etcd/dh/latest)

The output is a short alphanumeric string. This cluster ID will be needed in later steps.

[irisadmin@dh-infra-1 ~]$ basename $(readlink /etc/etcd/dh/latest)
c93d43794

To re-generate a configuration file for the Deephaven etcd cluster with the new nodes added, run config_generator.sh, providing the IP addresses of all existing nodes and of the new nodes being added. This creates a new /etc/sysconfig/deephaven/etcd/dh_etcd_config.tgz etcd configuration package file. It is recommended to first back up the old package in case it is needed to revert the add-nodes operation.
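For example, a simple way to preserve the existing package before regenerating it (the backup filename is only a suggestion):

cp -v /etc/sysconfig/deephaven/etcd/dh_etcd_config.tgz \
   /etc/sysconfig/deephaven/etcd/dh_etcd_config.tgz.bak.$(date +%Y%m%d-%H%M%S)

Then run the generator: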

/usr/illumon/latest/install/etcd/config_generator.sh --self-signed --servers <IP address 1> <IP address 2> <IP address 3> --cluster-id "<cluster_ID>" --root-user-dhadmin

Create a temporary directory in which to make changes to the new configuration archive. Copy the new .tgz to that temporary directory and unpack it.

mkdir /tmp/etcdconfig
cp /etc/sysconfig/deephaven/etcd/dh_etcd_config.tgz /tmp/etcdconfig
cd /tmp/etcdconfig
tar xf dh_etcd_config.tgz

The configuration archive contains etcd service configuration that will go under /etc/etcd/dh/, and etcd client configuration (accounts and credentials) that will go under /etc/sysconfig/deephaven/etcd/client/. Since this configuration is being created for a cluster that already has data and credentials set up, replace the newly generated passwords in the archive with the previously existing ones from the cluster. The command line below automates this process:

for d in /etc/sysconfig/deephaven/etcd/client/*/ ; do \
    user=$(basename "$d"); \
    command cp -vf "${d}password" "dh-etcd/client/$user/"; \
done

Then re-tar the configuration archive, replacing the previous temp file:

tar -czf ./dh_etcd_config.tgz dh-etcd/
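Optionally, sanity-check the rebuilt archive before distributing it, for example by listing its contents:

tar tzf ./dh_etcd_config.tgz | head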

4. Copy and deploy the configuration

Copy the new dh_etcd_config.tgz file to each etcd node server in the Deephaven cluster. scp is typically used. Place the file somewhere convenient for the root user to access, like /tmp.
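For example, the archive could be pushed to each etcd node with a short loop; the hostnames below are placeholders for your own servers, and this assumes root ssh access to them.

for host in dh-infra-1 dh-query-1 dh-query-2; do
    scp /tmp/etcdconfig/dh_etcd_config.tgz root@${host}:/tmp/
done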

From the directory where the dh_etcd_config.tgz file has been placed, unpack each node's configuration (including the nodes that already exist):

/usr/illumon/latest/install/config_packager.sh etcd unpackage-server <node number>
/usr/illumon/latest/install/config_packager.sh etcd unpackage-client

For the above commands, the node number is the index of the IP address as it was entered when calling config_generator.sh in step 3. The server whose IP address is first is node number 1, and so on.
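For example, if 10.128.1.150 was the second IP address passed to config_generator.sh, then on that server (with the archive in /tmp) the commands would be:

cd /tmp
/usr/illumon/latest/install/config_packager.sh etcd unpackage-server 2
/usr/illumon/latest/install/config_packager.sh etcd unpackage-client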

5. Configure the new nodes

The new nodes must be configured one at a time.

6. Add the first new node

On all existing nodes, and the first new node to be configured, edit the etcd config.yaml to list only the existing nodes and the one new node in the initial-cluster property value. The IP addresses and token in your file should not be changed; the only changes should be to remove the endpoint entries for any nodes not yet being configured, and, on the new node, to change the initial-cluster-state value to existing.

For example, on a new node being added to a cluster that initially had only one etcd node, this section of /etc/etcd/dh/latest/config.yaml looks like this:

...
initial-cluster-state: existing
initial-cluster-token: c2ca0c898
initial-cluster: etcd-1=https://10.128.1.149:2380,etcd-2=https://10.128.1.150:2380
...

Restart etcd on existing nodes using:

systemctl restart dh-etcd

Add the first new node as a learner. A learner will pull Raft database state information from the existing nodes in the cluster. Once it is in sync with the cluster, the learner will be promoted to regular member status. When adding the new node, set the ETCDCTL_ENDPOINTS environment variable in-line with the call to etcdctl.sh to override the endpoints from the etcd config.yaml, since not all of the endpoints listed there will currently be accessible. In the example below, there is one active node in the cluster (10.128.1.149), and 10.128.1.150 (a.k.a. etcd-2) is the new node being added. Use the IP address of one of your already existing nodes for ETCDCTL_ENDPOINTS and the IP address of your new node, with the appropriate etcd-n name, for the add arguments in your environment.

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://10.128.1.149:2379 /usr/illumon/latest/bin/etcdctl.sh member add etcd-2 --peer-urls=https://10.128.1.150:2380 --learner

Check that the new node has been correctly added by running sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://<IP Address of first existing node>:2379 /usr/illumon/latest/bin/etcdctl.sh member list -w table:

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://10.128.1.149:2379 /usr/illumon/latest/bin/etcdctl.sh member list -w table
+------------------+-----------+--------+---------------------------+---------------------------+------------+
|        ID        |  STATUS   |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+-----------+--------+---------------------------+---------------------------+------------+
|  ffa7e976c04084c | unstarted |        |                           | https://10.128.1.150:2379 |       true |
| 774b09d3a6ab0f18 |   started | etcd-1 | https://10.128.1.149:2380 | https://10.128.1.149:2379 |      false |
+------------------+-----------+--------+---------------------------+---------------------------+------------+

This should show the new node as unstarted and with IS LEARNER set to true.

Start the new node by running the following command on the new node:

/usr/illumon/latest/install/etcd/enable_dh_etcd_systemd.sh

To check that the new node is started, use:

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://<IP Address of first existing node>:2379 /usr/illumon/latest/bin/etcdctl.sh member list -w table

The progress of the synchronization process can be viewed on the preexisting node(s) with:

sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table

This should return a table like the following:

+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.128.1.149:2379 | 774b09d3a6ab0f18 |  3.5.12 |  2.6 MB |      true |      false |        19 |       3000 |               3000 |        |
| https://10.128.1.150:2379 |  ffa7e976c04084c |  3.5.12 |  2.5 MB |     false |       true |        19 |       2999 |               2999 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Note

endpoint status may throw an error about rpc being unavailable for learners. In these cases, it will still normally return the status for the learner node, but may omit other nodes. To check the status of omitted nodes, use:

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://<IP address of node to check>:2379 /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table

Once the new node's RAFT INDEX is within one step of the indexes of other active nodes, it can be promoted. To promote a new node, execute the etcdctl.sh member promote command against a preexisting node from the cluster.

For the example nodes above, this would be done with:

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://10.128.1.149:2379 /usr/illumon/latest/bin/etcdctl.sh member promote ffa7e976c04084c

Final verification with sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh member list -w table should now show the new node is no longer a learner.

+------------------+---------+--------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+---------------------------+---------------------------+------------+
|  ffa7e976c04084c | started | etcd-2 | https://10.128.1.150:2380 | https://10.128.1.150:2379 |      false |
| 774b09d3a6ab0f18 | started | etcd-1 | https://10.128.1.149:2380 | https://10.128.1.149:2379 |      false |
+------------------+---------+--------+---------------------------+---------------------------+------------+

Finally, edit the new node's /etc/etcd/dh/latest/config.yaml to change initial-cluster-state back to new. After saving the change, restart etcd using systemctl restart dh-etcd.
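If you prefer to script the revert, a minimal sketch (assuming initial-cluster-state appears exactly once in the file) is:

sed -i 's/^initial-cluster-state: existing$/initial-cluster-state: new/' /etc/etcd/dh/latest/config.yaml
systemctl restart dh-etcd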

7. Add the second new node

On all existing nodes, and the second new node to be configured, edit the etcd config.yaml to list all the existing nodes and the new nodes in the initial-cluster property value. This should already be the case for the config file on the new node, but the previous nodes may still have shorter lists from step 6, when the first new node was being added. The IP addresses and token in your file should not be changed; the only changes should be to add the endpoint entry for the second new node, if needed, and, on the second new node, to change the initial-cluster-state value to existing.

For example, on a second new node being added to a cluster that initially had only one etcd node, this section of /etc/etcd/dh/latest/config.yaml looks like this:

...
initial-cluster-state: existing
initial-cluster-token: c2ca0c898
initial-cluster: etcd-1=https://10.128.1.149:2380,etcd-2=https://10.128.1.150:2380,etcd-3=https://10.128.1.151:2380
...

Restart etcd on existing nodes using:

systemctl restart dh-etcd

Add the second new node as a learner. A learner will pull Raft database state information from the existing nodes in the cluster. Once it is in sync with the cluster, the learner will then be promoted to regular member status. When adding the new node, set the ETCDCTL_ENDPOINTS environment variable in-line with the call to etcdctl.sh to override the endpoints from the etcd config.yaml, since not all of the endpoints listed there will currently be accessible. In the example below, there are two active nodes in the cluster (10.128.1.149 and 10.128.1.150). 10.128.1.151 (a.k.a. etcd-3) is the new node being added. Use the IP address of your already existing nodes for ETCDCTL_ENDPOINTS and the IP address of your new node, with the appropriate etcd-n name, for the add arguments in your environment.

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://10.128.1.149:2379,https://10.128.1.150:2379 /usr/illumon/latest/bin/etcdctl.sh member add etcd-3 --peer-urls=https://10.128.1.151:2380 --learner

Check that the new node has been correctly added by running sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://<IP Address of first existing node>:2379 /usr/illumon/latest/bin/etcdctl.sh member list -w table:

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://10.128.1.149:2379 /usr/illumon/latest/bin/etcdctl.sh member list -w table
+------------------+-----------+--------+---------------------------+---------------------------+------------+
|        ID        |  STATUS   |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+-----------+--------+---------------------------+---------------------------+------------+
| 8b39a0235f6e1417 | unstarted |        |                           | https://10.128.1.151:2379 |       true |
|  ffa7e976c04084c |   started | etcd-2 | https://10.128.1.150:2380 | https://10.128.1.150:2379 |      false |
| 774b09d3a6ab0f18 |   started | etcd-1 | https://10.128.1.149:2380 | https://10.128.1.149:2379 |      false |
+------------------+-----------+--------+---------------------------+---------------------------+------------+

This should show the new node as unstarted and IS LEARNER set to true.

Start the new node by running the following command on the new node:

/usr/illumon/latest/install/etcd/enable_dh_etcd_systemd.sh

Checking sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://<IP Address of first existing node>:2379 /usr/illumon/latest/bin/etcdctl.sh member list -w table should now show the new node is started.

The progress of the synchronization process can be viewed on the preexisting node(s) with:

sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table

This should return a table like this:

+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.128.1.149:2379 | 774b09d3a6ab0f18 |  3.5.12 |  2.6 MB |      true |      false |        19 |       3000 |               3000 |        |
| https://10.128.1.150:2379 |  ffa7e976c04084c |  3.5.12 |  2.5 MB |     false |      false |        19 |       2999 |               2999 |        |
| https://10.128.1.151:2379 | 8b39a0235f6e1417 |  3.5.12 |  2.5 MB |     false |       true |        19 |       2998 |               2998 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Note

endpoint status may throw an error about rpc being unavailable for learners. In these cases, it will still normally return the status for the learner node, but may omit other nodes. To check the status of omitted nodes, use: sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://<IP address of node to check>:2379 /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table.

Once the new node's RAFT INDEX is within one step of the indexes of other active nodes, it can be promoted. To promote a new node, execute the etcdctl.sh member promote command against a preexisting node from the cluster.

For the example nodes above, this would be done with:

sudo -u $DH_ADMIN_USER ETCDCTL_ENDPOINTS=https://10.128.1.149:2379,https://10.128.1.150:2379 /usr/illumon/latest/bin/etcdctl.sh member promote 8b39a0235f6e1417

Final verification with sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh member list -w table should now show the new node is no longer a learner.

+------------------+---------+--------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+---------------------------+---------------------------+------------+
|  ffa7e976c04084c | started | etcd-2 | https://10.128.1.150:2380 | https://10.128.1.150:2379 |      false |
| 774b09d3a6ab0f18 | started | etcd-1 | https://10.128.1.149:2380 | https://10.128.1.149:2379 |      false |
| 8b39a0235f6e1417 | started | etcd-3 | https://10.128.1.151:2380 | https://10.128.1.151:2379 |      false |
+------------------+---------+--------+---------------------------+---------------------------+------------+

As a final step for the new node, edit its /etc/etcd/dh/latest/config.yaml to change initial-cluster-state back to new. After saving the change, restart etcd using systemctl restart dh-etcd.

Checking the status of the cluster should now return a complete table showing all nodes with similar RAFT INDEX values and no errors or warnings.

sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.128.1.149:2379 | 774b09d3a6ab0f18 |  3.5.12 |  2.6 MB |     false |      false |        20 |       3018 |               3018 |        |
| https://10.128.1.150:2379 |  ffa7e976c04084c |  3.5.12 |  2.5 MB |      true |      false |        20 |       3019 |               3019 |        |
| https://10.128.1.151:2379 | 8b39a0235f6e1417 |  3.5.12 |  2.5 MB |     false |      false |        20 |       3020 |               3020 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
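As an additional check, each endpoint can be asked whether it is healthy. This assumes etcdctl.sh forwards standard etcdctl subcommands such as endpoint health unchanged:

sudo -u $DH_ADMIN_USER /usr/illumon/latest/bin/etcdctl.sh endpoint health -w table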

8. Finish the installation process

Return to the installer directory set up and used in steps 1 and 2.

Run ./master_install.sh to restart the installation. This should now run to completion, performing the remaining post-etcd-configuration installation steps.

Troubleshooting

etcd fails to start when starting a learner node

If /usr/illumon/latest/install/etcd/enable_dh_etcd_systemd.sh fails to start a new learner node, two common causes are that the process was previously started without the correct configuration in place, or that the configuration for the new node does not have initial-cluster-state: existing.

In the case where initial-cluster-state is set to new, an attempt to start etcd manually with systemctl start dh-etcd will usually hang. After killing it with Ctrl-C, you can check logs with journalctl -xe. In this situation, the log will generally contain a large number of entries about a cluster ID mismatch.

Regardless of whether initial-cluster-state needs to be corrected for the new node, it is also possible that etcd was started on the new node with otherwise incorrect configuration, causing data to be written to the data directory (/var/lib/etcd/dh/<cluster ID>); etcd then cannot start because of incorrect or incorrectly owned files in that path.

If /var/lib/etcd/dh/<cluster ID> is not empty, either rename or delete its contents (for instance rm -rf /var/lib/etcd/dh/<cluster ID>/*). Also verify that the ownership of /var/lib/etcd/dh/<cluster ID> and /var/lib/etcd/dh is etcd:$DH_ADMIN_USER.

ls -l /var/lib/etcd/dh/

drwxr-xr-x. 3 etcd irisadmin 20 May  6 12:58 c2ca0c898
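If the ownership is wrong, it can be corrected with chown; the cluster ID below is the example value used elsewhere in this section, so substitute your own:

chown etcd:$DH_ADMIN_USER /var/lib/etcd/dh /var/lib/etcd/dh/c2ca0c898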

Once the above are correct, you can retry starting etcd with systemctl start dh-etcd.

Problems running etcdctl.sh

Verify that echo $DH_ADMIN_USER returns a user name. If not, run source /usr/illumon/latest/bin/dh_users and retry the etcdctl.sh command.

Verify that the full path to etcdctl.sh is being used: /usr/illumon/latest/bin/etcdctl.sh