Chapter 19: Redundancy Setup

The InCenter system can be configured as a redundant high availability (HA) cluster so that if the InCenter server running on one computer fails, an automatic failover occurs, allowing processing to continue on another computer with an identical InCenter configuration.

[Note] Note: This chapter covers clusters using InCenter 1.64 or later

This chapter describes setting up redundancy clusters where the HA nodes run InCenter version 1.64.00 or later. HA clusters created prior to version 1.64 are not compatible with this arrangement but can be converted to the newer cluster type, as described at the end of this chapter.

HA cluster setup requires three separate instances of InCenter. Each instance acts as a high availability node (HA node) in an active-passive arrangement. The databases on these three HA nodes are configured to be synchronized. Located in front of the three HA nodes is a third-party load balancing proxy. The proxy polls the management IP of each HA node and sends traffic only to the active node.

If the active HA node fails, one of the other two HA nodes will automatically assume the active role. If two HA nodes were to fail, the remaining HA node will not accept traffic and the entire HA cluster will no longer function. This HA cluster arrangement is illustrated below.

Figure 19.1. A Redundant Cluster Configuration

Various proxy software products are available to perform the load balancing task; setting one up is not described in this document, which covers only the base InCenter cluster setup. However, the ports that must be forwarded by the load balancer are listed later in this chapter.

The rc Command

The main tool for setting up redundancy is the HA cluster configuration command rc in the host Linux system CLI. This is already included as part of the InCenter installation.

[Note] Note: Root privileges are required

When setting up an HA cluster using the rc command, root privileges are required.

The rc command can take a number of options.

The parameter -help can be used to display individual help text for a particular option. For example:

$ rc gen-certs -help
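
For reference, the rc invocations that appear in this chapter follow the patterns shown below. This is only a summary of the usage in this chapter and not necessarily the complete set of options; consult the -help output for full details. The angle-bracket values are placeholders.

# Generate the cluster setup files for the named instances
$ rc gen-certs -m <hostname1> <hostname2> <hostname3>

# Join this instance to the cluster using its setup file
# (add -f on the last instance to finalize the cluster)
$ rc initiate -c <hostname>.tar.gz [-f]

# Remove this HA node from the cluster
$ rc disable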

HA Cluster Setup Steps

The following steps should be followed to set up a high availability cluster:

  1. Have three InCenter instances ready to be joined together as HA nodes in a cluster. In this example, they will have the DNS resolvable host names hostname1, hostname2 and hostname3. All three should have the same InCenter version installed (1.64.00 or later) and they should be running on separate hardware platforms. The InCenter instances need not be newly installed but note that the current database image of the last HA node added to the cluster (in this example, hostname3) will overwrite the databases of the other two.

    The three instances will need to communicate with each other, so the computers could all be on the same network. However, it may be better to have at least one in a different physical location for maximum tolerance to local risks such as a power supply failure.

  2. Log into the Linux CLI of the first InCenter instance (assume this is hostname1) and enter the following command:

    $ rc gen-certs -m hostname1 hostname2 hostname3

    This will generate three cluster setup files for the three instances with filenames of the form hostname1.tar.gz, hostname2.tar.gz and hostname3.tar.gz.

    As mentioned previously, hostname1, hostname2 and hostname3 are the DNS resolvable hostnames of the InCenter instances, for example server1.example.com. IP addresses cannot be used in place of hostnames.

    It is important that a copy of the three generated files is archived on disk storage outside of the cluster, since any of these files may be needed later when replacing a failed HA node with a new InCenter instance.

  3. Use SCP to upload two of the files to their respective instances. For example:

    $ scp -P 2222 hostname2.tar.gz administrator@hostname2:.
    $ scp -P 2222 hostname3.tar.gz administrator@hostname3:.
  4. Make the first instance an HA node with the following command:

    $ rc initiate -c hostname1.tar.gz

    Repeat this on the second instance:

    $ rc initiate -c hostname2.tar.gz

    Repeat this again on the third instance, but this time adding the -f option to finalize the cluster setup and also to make this instance the HA node that is initially active:

    $ rc initiate -c hostname3.tar.gz -f

    Note that the database image of this final instance will automatically overwrite the databases in the other two instances.

  5. The HA cluster should now be operational with three HA nodes. The final step is to start the load balancer located in front of the three instances. The load balancer should be configured to poll each instance with an HTTPS request to the following URL to determine if it is the active HA node:

    https://<ha-node-IP>:<rest-port>/rc/isPrimary

    Only the active HA node will send back a "200 OK" reply, indicating that it is the HA node that should receive traffic from the load balancer. The <rest-port> value is always 8443. A manual version of this check is sketched after these steps.
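
As a hedged illustration of this polling, the active node can be identified manually from a management computer using curl (assuming curl is installed and that a plain GET request is sufficient; the -k option skips certificate verification and may not be appropriate in production). Only the active HA node will return the 200 status code:

$ curl -k -s -o /dev/null -w "%{http_code}\n" https://hostname1:8443/rc/isPrimary
200

For the load balancer itself, the same check could be expressed in the proxy's configuration. The following HAProxy-style sketch is purely illustrative: HAProxy is only one of many possible proxy products, its setup is outside the scope of this document, and the backend name and <service-port> placeholder are hypothetical, with <service-port> standing for whichever forwarded port the backend handles (see the port forwarding list later in this chapter).

backend incenter_active_node
    mode tcp
    # Poll the REST port of each node; only the active node answers 200 OK
    option httpchk GET /rc/isPrimary
    http-check expect status 200
    server hostname1 hostname1:<service-port> check port 8443 check-ssl verify none
    server hostname2 hostname2:<service-port> check port 8443 check-ssl verify none
    server hostname3 hostname3:<service-port> check port 8443 check-ssl verify none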

Setting the Same Host SSH Keys On All HA Nodes

By default, the HA nodes in a cluster use different host SSH keys (the SSH keys in the underlying Linux system that are used for management access). To avoid warning messages from the SSH clients used for management, it can be convenient for all the HA nodes to use the same host SSH keys. These keys are not synchronized within a cluster, so making them identical requires changing them manually on each HA node by using SCP to upload new key files, with the same file names, to the host Linux system.

Changing the key files is described in more detail at the end of Section 3.1, SSH Access to the CLI. The recommended series of steps for a cluster is to first download the key files from one particular HA node using SCP and then upload those same files to the other HA nodes, also using SCP. The HA nodes that receive the uploads should then be restarted.
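
As an illustration of this key copying, the sketch below downloads the host key files from hostname1 to a management computer and then uploads them to hostname2 (the same upload would be repeated for hostname3, with each receiving node then restarted). The <key-file> placeholder stands for the host key file names described in Section 3.1, SSH Access to the CLI, and the SCP port and user follow the earlier examples in this chapter:

$ scp -P 2222 administrator@hostname1:<key-file> .
$ scp -P 2222 <key-file> administrator@hostname2:.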

Dealing with HA Node Failure

Should one of the HA nodes fail, the remaining two HA nodes will continue to function as an HA cluster. If the failed HA node was active, the load balancer will automatically send traffic to the next functioning node. However, the failed HA node should be brought back online as soon as possible because, with only two functioning HA nodes, a further node failure would leave a single remaining node, which will not accept traffic, and the entire cluster would then stop functioning.

When the failed HA node has been replaced with a functioning InCenter instance, the following steps are used to include the new instance into the cluster. This can be done without needing to stop traffic flowing to the other functioning nodes. In this example, it will be assumed that hostname2 has failed:

  1. Using SCP, upload the original setup file for the HA node to the new InCenter instance:

    $ scp -P 2222 hostname2.tar.gz administrator@hostname2:.

    This is the reason that a copy of the three setup files should have been archived on disk storage outside of the HA nodes, as described earlier.

  2. Log into the new instance and bring the HA node online with the following command:

    $ rc initiate -c hostname2.tar.gz

The database from the other HA nodes will now be automatically copied to the new HA node and the full three-node HA cluster will have been restored.

If copies of the original setup files are not available, the procedure described previously for setting up a new cluster with all three instances should be used instead. In this case, traffic processing will have to stop until the setup is complete, and the replaced HA node should not be the last HA node initialized (since the database of the last node overwrites the databases of the others).

Removing Nodes from a Cluster

It is possible to remove an HA node from the cluster by logging into Linux on the instance and using the following command:

$ rc disable

Following this command, the HA node acts as a standalone instance and no longer responds to the polling sent by the load balancing proxy. It is completely removed from the cluster.

Although a cluster should optimally have three working nodes, removing one HA node might be needed in order to replace it. The replacement procedure after manual removal would be the same as that described previously for replacing a failed HA node.

If an HA node is disabled manually and it was the active HA node, the remaining two HA nodes will decide between themselves which node will now become the active one. Using the disable option on all three HA nodes will dismantle the cluster completely.

Creating and Restoring a System Backup

It should not matter which of the three InCenter instances a system backup is taken from, since the database is synchronized across all of the working HA nodes in a cluster. The steps to take a backup are the following:

  1. Disable one of the HA nodes so it is removed from the cluster (preferably not the active node). The following command is used to do this on the node:

    $ rc disable
  2. Take a backup from the HA node. Doing this is described in Section 17.8, System Backup and Restore.
  3. Add the HA node back into the cluster. If the backup was taken from the hostname3 HA node then this would be added back with the following command:

    $ rc initiate -c hostname3.tar.gz

    It is assumed that the file hostname3.tar.gz that was generated from the initial cluster setup is still stored on the HA node.

The steps to restore a backup are the following:

  1. Take the entire cluster offline by disabling all HA nodes. This is done by entering the rc disable command on each HA node separately.

  2. Restore the backup to just one HA node. Doing this is described in Section 17.8, System Backup and Restore.

  3. Set up the whole cluster from the beginning by adding each HA node, as described previously. Provided that the original .tar.gz files are still present on each HA node, this only requires the rc initiate command to be entered on each.

    Make sure that the HA node that received the backup is the last one to be added to the cluster and that its rc initiate command has the -f option. This will force the last HA node's database to be copied to the other HA nodes. For example, if hostname3 received the backup, the last initiate command would be the following (a condensed recap of the whole restore sequence is also sketched after these steps):

    $ rc initiate -c hostname3.tar.gz -f
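
As a condensed recap of the restore procedure, assuming the backup is restored to hostname3 and that the original .tar.gz setup files are still present on each HA node, the full command sequence would be:

# On each of the three HA nodes, take the node out of the cluster
$ rc disable

# Restore the backup to hostname3 (see Section 17.8, System Backup and Restore)

# On hostname1:
$ rc initiate -c hostname1.tar.gz

# On hostname2:
$ rc initiate -c hostname2.tar.gz

# On hostname3, added last with -f so its restored database overwrites the others:
$ rc initiate -c hostname3.tar.gz -f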

Node Logging in a Cluster

In the current InCenter version, the log messages sent by managed nodes are stored only on the HA node which is currently active. The logs collected are not mirrored across all HA nodes. This means that if a failover occurs, these log messages can become split across HA nodes.

[Note] Note: Communication between instances

When redundancy mode is enabled, secure communication between the HA node instances in a cluster uses TCP on port 27017, with authentication performed using the generated X.509 certificates and traffic encrypted using TLS/SSL.
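
If synchronization problems between HA nodes are suspected, basic reachability of this port can be tested from the host Linux system of one node to another, assuming an OpenBSD-style netcat (nc) utility is available (this tool is not part of InCenter itself):

$ nc -zv hostname2 27017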

Load Balancer Port Forwarding

When setting up the load balancer, the following ports need to be forwarded:

Other Notes About Load Balancer Functioning

The following should be noted about the functioning of the load balancer:

Converting an Older Pre-1.64 Redundancy Cluster

Prior to InCenter version 1.64.00, redundancy was achieved using a different approach. Such older clusters will not function with InCenter version 1.64.00 or later. When upgrading a cluster to 1.64.00 or later, the old cluster should first be dismantled and then a new cluster created based on the latest InCenter version.

The following steps should be used to dismantle a pre-1.64 cluster:

  1. Disable redundancy on the secondary HA node in the old cluster.

  2. Disable redundancy on the primary HA node in the old cluster.

  3. If required to restore the current configuration later, take a configuration and log data backup from the old primary HA node and download the files to a management computer. Doing this is described in Section 17.8, System Backup and Restore.

  4. Shut down the HA nodes.

Continue with the following steps to create a new cluster:

  1. Make sure all the HA nodes in the new cluster are running the latest InCenter version.

  2. If required, restore the configuration and log files from the old cluster to any one of the HA nodes in the new cluster.

  3. Follow the procedure described earlier in this chapter to create a new redundancy cluster.

    If step 2 above has been performed and an older HA node configuration has been restored to one of the new HA nodes, make sure that HA node is the last node added to the new HA cluster and that its rc initiate command includes the -f option. In this way, the restored configuration will also overwrite the configuration of the other HA nodes. A condensed command sequence for this case is sketched below.
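
As a condensed sketch of the cluster creation in step 3, assuming the old configuration was restored to hostname3 in step 2, the commands follow the setup procedure described earlier in this chapter:

# On hostname1, generate and distribute the cluster setup files
$ rc gen-certs -m hostname1 hostname2 hostname3
$ scp -P 2222 hostname2.tar.gz administrator@hostname2:.
$ scp -P 2222 hostname3.tar.gz administrator@hostname3:.

# On hostname1:
$ rc initiate -c hostname1.tar.gz

# On hostname2:
$ rc initiate -c hostname2.tar.gz

# On hostname3, added last with -f so the restored configuration overwrites the others:
$ rc initiate -c hostname3.tar.gz -f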