22.2. HA Mechanisms

This section discusses in more depth the mechanisms that Clavister NetShield Firewalls use to implement the HA feature.

Basic Principles

An HA cluster provides a redundant, state-synchronized hardware configuration. The state of the active unit, which includes the flow table and other vital information, is continuously copied to the inactive unit via a single Sync interface. When cluster failover occurs, the inactive unit knows which connections are active, and traffic can continue to flow after the failover with negligible disruption.
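The state synchronization described above can be pictured with a minimal Python sketch. It is not how cOS Stream implements synchronization. The `SyncMessage` and `ClusterUnit` names, the flow key layout and the message format are all hypothetical. They only illustrate the idea that every change to the active unit's flow table is also applied to the inactive unit's copy, so the copy is complete at the moment of failover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# HYPOTHETICAL flow key: (protocol, src IP, src port, dst IP, dst port)
FlowKey = Tuple[str, str, int, str, int]


@dataclass
class SyncMessage:
    """HYPOTHETICAL record sent over the Sync interface."""
    action: str          # "add" or "remove"
    key: FlowKey
    state: str = ""      # e.g. a TCP state name; empty for removals


@dataclass
class ClusterUnit:
    name: str
    flows: Dict[FlowKey, str] = field(default_factory=dict)

    def apply(self, msg: SyncMessage) -> None:
        # Apply one flow table change, whether made locally or received from the peer.
        if msg.action == "add":
            self.flows[msg.key] = msg.state
        elif msg.action == "remove":
            self.flows.pop(msg.key, None)


def replicate(active: ClusterUnit, inactive: ClusterUnit,
              changes: List[SyncMessage]) -> None:
    # The active unit applies each change locally and copies it to the
    # inactive unit, so both flow tables stay identical.
    for msg in changes:
        active.apply(msg)
        inactive.apply(msg)


active, inactive = ClusterUnit("A"), ClusterUnit("B")
k1 = ("tcp", "192.0.2.10", 50000, "198.51.100.5", 443)
k2 = ("udp", "192.0.2.11", 53000, "198.51.100.53", 53)
replicate(active, inactive, [
    SyncMessage("add", k1, "ESTABLISHED"),
    SyncMessage("add", k2, "OPEN"),
    SyncMessage("remove", k2),
])
print(inactive.flows == active.flows)  # True: the peer knows every live flow
```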

The inactive unit concludes that the active unit is no longer operational when it stops receiving a sufficient number of cluster heartbeats from it.

Heartbeats have the following characteristics:

Heartbeat Monitoring

cOS Stream sends a given number of heartbeats per second on the Sync interface and on any interfaces designated as Critical. The interval between heartbeats is long enough that ordinary delays during normal operation are not mistaken for missed heartbeats.
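The principle of declaring a peer down after too many consecutive missed heartbeats can be sketched in Python as follows. The section does not state the actual heartbeat rate or miss threshold, so the `HeartbeatMonitor` class and its `interval_ms` and `max_missed` values are hypothetical. Times are integer milliseconds to avoid floating-point rounding when counting intervals.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """HYPOTHETICAL peer monitor; the values used are illustrative, not cOS Stream defaults."""
    interval_ms: int     # time between heartbeats sent by the peer
    max_missed: int      # consecutive misses tolerated before declaring the peer down
    last_seen_ms: int    # arrival time of the most recent heartbeat

    def heartbeat_received(self, now_ms: int) -> None:
        self.last_seen_ms = now_ms

    def missed(self, now_ms: int) -> int:
        # Whole heartbeat intervals that have passed without a heartbeat arriving.
        return (now_ms - self.last_seen_ms) // self.interval_ms

    def peer_alive(self, now_ms: int) -> bool:
        return self.missed(now_ms) <= self.max_missed


# 10 heartbeats per second, peer declared down after more than 3 misses.
mon = HeartbeatMonitor(interval_ms=100, max_missed=3, last_seen_ms=1000)
print(mon.peer_alive(1250))  # True: 2 intervals missed, within tolerance
print(mon.peer_alive(1450))  # False: 4 intervals missed, failover would start
```

Raising `max_missed` or `interval_ms` makes a spurious failover caused by a transient delay less likely, at the cost of a longer time to detect a real failure.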

Both peers send heartbeats to each other and both monitor missed heartbeats in the following way:

[Note] Limits of monitored IP addresses

The MonitorTargets property has a limit of 10 IPv4 addresses and 10 IPv6 addresses. If the limit for an address type is exceeded, the interface monitoring feature will not function for that address type.

For example, if 20 IPv6 addresses and 8 IPv4 addresses are configured, none of the IPv6 addresses will be monitored but all of the IPv4 addresses will be.
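The all-or-nothing rule in the note can be expressed as a short Python sketch. Only the limit of 10 addresses per address type and the behavior when it is exceeded come from the note; the `monitored_targets` function is hypothetical and is not part of cOS Stream.

```python
import ipaddress
from typing import Dict, Iterable, List

MAX_PER_FAMILY = 10  # MonitorTargets limit: 10 IPv4 and 10 IPv6 addresses


def monitored_targets(targets: Iterable[str]) -> Dict[int, List[str]]:
    """HYPOTHETICAL helper: addresses that would actually be monitored, keyed by IP version."""
    by_version: Dict[int, List[str]] = {4: [], 6: []}
    for t in targets:
        by_version[ipaddress.ip_address(t).version].append(t)
    # Exceeding the limit disables monitoring for the whole address type,
    # not just for the addresses beyond the tenth.
    return {v: (addrs if len(addrs) <= MAX_PER_FAMILY else [])
            for v, addrs in by_version.items()}


# The example from the note: 20 IPv6 addresses and 8 IPv4 addresses.
ipv6 = [f"2001:db8::{i:x}" for i in range(1, 21)]
ipv4 = [f"192.0.2.{i}" for i in range(1, 9)]
result = monitored_targets(ipv6 + ipv4)
print(len(result[4]), len(result[6]))  # 8 0
```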

Heartbeats Over a VLAN

It is possible to configure cOS Stream to send and receive heartbeats over a specific VLAN attached to a critical Ethernet interface instead of directly on the Ethernet interface itself, which is the default. For example, the following CLI command changes the heartbeat transport for the Ethernet interface if1 from the interface itself to the VLAN vlan_10:
System:/> set Interface EthernetInterface if1 HeartbeatTransport=vlan_10
All heartbeats for the critical Ethernet interface will be transported over the single selected interface. This means that it is not possible to have heartbeats transported on, for example, two VLANs attached to the same Ethernet interface, or on both the Ethernet interface and a VLAN attached to it.
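Because the heartbeat transport is a single-valued setting, choosing a new VLAN replaces the previous choice rather than adding to it. The Python model below is a hypothetical illustration of that rule, not the cOS Stream configuration API. The `CriticalEthernetInterface` class, the assumption that only VLANs attached to the interface are accepted, and the second VLAN `vlan_20` are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CriticalEthernetInterface:
    """HYPOTHETICAL model of one critical interface's heartbeat transport."""
    name: str
    vlans: List[str] = field(default_factory=list)  # VLANs attached to this interface
    heartbeat_transport: Optional[str] = None        # None: the Ethernet interface itself (default)

    def set_heartbeat_transport(self, vlan: str) -> None:
        # ASSUMPTION: only a VLAN attached to this interface is accepted.
        # The setting holds one value, so a new choice replaces the old one.
        if vlan not in self.vlans:
            raise ValueError(f"{vlan} is not attached to {self.name}")
        self.heartbeat_transport = vlan

    def transport(self) -> str:
        return self.heartbeat_transport or self.name


if1 = CriticalEthernetInterface("if1", vlans=["vlan_10", "vlan_20"])
print(if1.transport())  # if1
if1.set_heartbeat_transport("vlan_10")
if1.set_heartbeat_transport("vlan_20")
print(if1.transport())  # vlan_20: vlan_10 no longer carries heartbeats
```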

Failover Time

Failover typically completes within a few seconds, so clients may experience it as a brief burst of packet loss. For TCP, this is well within the range of normal retransmission timeouts, so TCP retransmits the lost packets shortly afterwards and communication continues. UDP provides no retransmission of its own because it is an unreliable protocol by design; recovering lost datagrams, if required, is left to the application.
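Why a failover of a few seconds is usually absorbed by TCP can be shown with a small calculation. This Python sketch assumes a segment is lost at the instant failover begins and that the retransmission timeout starts at 1 second and doubles after each retransmission, following the backoff described in RFC 6298. Real stacks adjust the timeout from measured round-trip times, and the failover durations and `max_retries` value used here are hypothetical.

```python
from typing import List


def retransmit_times(initial_rto: float, max_retries: int) -> List[float]:
    """Seconds after the lost send at which each retransmission occurs."""
    times, t, rto = [], 0.0, initial_rto
    for _ in range(max_retries):
        t += rto
        times.append(t)
        rto *= 2  # exponential backoff: double the timer after each timeout
    return times


def first_successful_retransmit(failover_s: float, initial_rto: float = 1.0,
                                max_retries: int = 8) -> float:
    # The first retransmission sent after the new active unit has taken over.
    for t in retransmit_times(initial_rto, max_retries):
        if t >= failover_s:
            return t
    raise RuntimeError("connection would give up before failover completes")


print(retransmit_times(1.0, 4))          # [1.0, 3.0, 7.0, 15.0]
print(first_successful_retransmit(2.5))  # 3.0
```

Under these assumptions, a failover of 2.5 seconds delays the connection by about 3 seconds in total, far short of the point where TCP would abandon the connection.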

Shared IPv4 Addresses and ARP

Both master and slave units in a cluster are aware of the shared IP addresses. However, ARP queries for the shared IPv4 address, or any other IP address published via ARP configuration or through Proxy ARP, are answered by the active unit only.

The hardware addresses of the shared IPv4 address and other published addresses are not related to the actual MAC addresses of the Ethernet interfaces. Instead, cOS Stream constructs a new MAC address. The first part of the constructed address is always 10:00:00, and the second part is derived from the configuration, including the cluster ID.
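The property that matters is that both cluster members derive the same MAC address from the shared configuration. The Python sketch below shows one way such a derivation could work. Only the fixed 10:00:00 prefix and the dependence on the cluster ID come from the text; hashing the cluster ID together with an interface name using SHA-256 is a hypothetical stand-in, not the algorithm cOS Stream uses.

```python
import hashlib

PREFIX = bytes([0x10, 0x00, 0x00])  # fixed first part, as stated in the text


def shared_mac(cluster_id: int, interface: str) -> str:
    # HYPOTHETICAL derivation: the real cOS Stream algorithm is not documented here.
    # The inputs are shared configuration, so both units compute the same result.
    digest = hashlib.sha256(f"{cluster_id}:{interface}".encode()).digest()
    mac = PREFIX + digest[:3]
    return ":".join(f"{b:02x}" for b in mac)


print(shared_mac(1, "if1").startswith("10:00:00:"))  # True
```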

Because the shared IP address always has the same hardware address, hosts on the same LAN as the cluster do not need to update their ARP caches when a failover occurs, so no delay is introduced there.

When a cluster member discovers that its peer is not operational, it broadcasts gratuitous ARP queries on all interfaces using the shared MAC address as the sender. This allows switches to re-learn within milliseconds where to send packets destined for the shared address. Therefore, the only failover delay is in detecting that the active unit is down.
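To make the gratuitous ARP step concrete, this Python sketch builds the bytes of such a frame. The layout follows the standard ARP packet format (RFC 826) and the usual gratuitous ARP convention of a broadcast destination with the sender and target IP addresses both set to the shared address. Switches learn from the frame's Ethernet source address, which is why the shared MAC address is used as the sender. The function names, MAC address and IP address are placeholders; the code only builds the frame and does not send it, and it is not how cOS Stream constructs the packet internally.

```python
import ipaddress
import struct

BROADCAST = b"\xff" * 6


def mac_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp(shared_mac: str, shared_ip: str) -> bytes:
    """HYPOTHETICAL helper: Ethernet frame carrying a gratuitous ARP request."""
    src = mac_bytes(shared_mac)
    ip = ipaddress.IPv4Address(shared_ip).packed
    eth = BROADCAST + src + struct.pack("!H", 0x0806)  # dst, src, EtherType ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)    # Ethernet, IPv4, hlen, plen, request
    arp += src + ip                                    # sender: shared MAC and shared IP
    arp += b"\x00" * 6 + ip                            # target IP equals sender IP
    return eth + arp


frame = gratuitous_arp("10:00:00:12:34:56", "192.0.2.1")
print(len(frame))          # 42
print(frame[:14].hex())    # ffffffffffff1000001234560806
```

The frame is shorter than the 60-byte Ethernet minimum; the network interface normally pads it on transmission.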

ARP queries are also broadcast periodically to ensure that switches do not forget where to send packets destined for the shared hardware address.