22.2. HA Mechanisms

cOS Stream 4.00.05 Administration Guide

This section discusses in more depth the mechanisms that Clavister NetShield Firewalls use to implement the HA feature.

Basic Principles

An HA cluster provides a redundant, state-synchronized hardware configuration. The state of the active unit, which includes the flow table and other vital information, is continuously copied to the inactive unit via a single Sync interface. When cluster failover occurs, the inactive unit knows which connections are active, and traffic can continue to flow after the failover with negligible disruption.

The inactive system detects that the active system is no longer operational when it no longer detects sufficient cluster heartbeats.

Heartbeats have the following characteristics:

Heartbeats are Ethernet frames and not IP packets.
Heartbeats cannot be forwarded by a router since they do not contain an IP header.
The Ethernet source and destination addresses are based on the cluster ID and the roles of the sending and receiving units.
The Ethernet frame type is 0xC14B.
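The frame layout described in the list above can be sketched in Python. This is an illustration only. The helper names are hypothetical. Any MAC addresses passed to it are placeholders, because the document does not describe how the addresses are derived from the cluster ID and unit roles. The heartbeat payload is not documented here and is left out.

```python
import struct

# EtherType carried by cOS Stream cluster heartbeats (stated in the text).
HEARTBEAT_ETHERTYPE = 0xC14B


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def build_heartbeat_header(dst_mac: str, src_mac: str) -> bytes:
    """Build the 14-byte Ethernet II header of a heartbeat frame.

    No IP header follows it, so a router has nothing to route on.
    """
    return (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", HEARTBEAT_ETHERTYPE)
    )


def is_heartbeat(frame: bytes) -> bool:
    """Recognise a heartbeat by the EtherType field at bytes 12-13."""
    if len(frame) < 14:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ethertype == HEARTBEAT_ETHERTYPE
```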

Heartbeat Monitoring

cOS Stream sends a given number of heartbeats per second on the Sync interface and on any interfaces designated as Critical. The interval used to judge whether heartbeats are missing is long enough that a delay occurring during normal operation is not mistaken for a failure.

Both peers send heartbeats to each other and both monitor missed heartbeats in the following way:

If a critically tagged interface misses a given number of heartbeats over a given period of time, that interface enters a state known as Early Interface Failure Detection. By default, this period is 600 milliseconds; it can be changed by setting the global property IfaceEarlyDown in HASettings. The separate CLI Reference Guide provides a detailed description of this setting.

This state means that the node will send out queries (ARP for IPv4, NDP for IPv6) from the suspect interface to preconfigured IP addresses to see if any replies are received. This allows cOS Stream to determine if the failure is in the local Ethernet (or VLAN) interface or if the problem is the peer failing to send heartbeats.

Queries are sent to all the IP addresses configured in the MonitorTargets properties of the EthernetInterface and any specified VLAN. The targets can be specified as a range and can be a mixture of IPv4 and IPv6 addresses. For example, if an address book object called mon_range defines these addresses, the following CLI command assigns the range to the interface if1:

System:/> set Interface EthernetInterface if1 MonitorTargets=mon_range

	Note: MonitorTargets Supports Ethernet and VLAN Interfaces Only
	The MonitorTargets property can only be used with Ethernet or VLAN interfaces. For more information, see the following section: Monitor Targets.

If none of the critical interfaces receive heartbeats for 1900 milliseconds (the default), the peer is declared down and a PeerDead event is triggered. This causes a cluster failover if the peer declared down was the active node. The period can be changed by setting the global property PeerDead in HASettings. The separate CLI Reference Guide provides a detailed description of this setting.

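The two timers can be modelled together in a short Python sketch. It assumes the default IfaceEarlyDown (600 ms) and PeerDead (1900 ms) values. It tracks only the time of the last heartbeat per critical interface rather than counting missed heartbeats. It also assumes that a PeerDead event leads to failover only when the peer declared down was the active node. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# Defaults from the text, in milliseconds. Both are HASettings properties
# (IfaceEarlyDown and PeerDead) and may be changed by the administrator.
IFACE_EARLY_DOWN_MS = 600
PEER_DEAD_MS = 1900


@dataclass
class HeartbeatMonitor:
    """Hypothetical, simplified model of one node watching its peer.

    Only the time of the last heartbeat per critical interface is tracked;
    the exact "number of missed heartbeats" rule is not documented.
    """

    peer_is_active: bool
    last_seen_ms: dict[str, int] = field(default_factory=dict)

    def heartbeat(self, iface: str, now_ms: int) -> None:
        # Record that a heartbeat from the peer arrived on this interface.
        self.last_seen_ms[iface] = now_ms

    def early_failure_ifaces(self, now_ms: int) -> list[str]:
        # Interfaces silent for longer than IfaceEarlyDown enter
        # Early Interface Failure Detection.
        return sorted(
            iface
            for iface, seen in self.last_seen_ms.items()
            if now_ms - seen > IFACE_EARLY_DOWN_MS
        )

    def peer_dead(self, now_ms: int) -> bool:
        # The peer is declared down only if *no* critical interface has
        # seen a heartbeat within PeerDead (and at least one is tracked).
        return bool(self.last_seen_ms) and all(
            now_ms - seen > PEER_DEAD_MS for seen in self.last_seen_ms.values()
        )

    def should_fail_over(self, now_ms: int) -> bool:
        # Assumption: PeerDead leads to failover only when the silent peer
        # was the active node.
        return self.peer_is_active and self.peer_dead(now_ms)
```

For example, if the peer's last heartbeat arrived on if1 at 0 ms and on the Sync interface at 1000 ms, then at 1200 ms only if1 is in early failure detection, and at 3000 ms both have exceeded PeerDead and an inactive node would take over.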
Queries are sent to all configured MonitorTargets addresses. If no reply has been received after a given period of time, the node will consider the local interface to be malfunctioning and generate the log message HA interface offline. If replies are received but no new heartbeats are received after the time period, the other peer is considered to have a malfunctioning interface.

A failover then occurs if the active node detected the malfunction. Depending on certain internal conditions, this can be followed by an automatic restart of the now inactive node in an attempt to resolve the problem.
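The classification of a suspect interface after the queries have been sent can be written as a small decision function. This is a sketch under stated assumptions. The function and enum names are hypothetical, and the inputs only record whether any reply or any new heartbeat arrived within the period. The text does not spell out the outcome when both replies and heartbeats arrive; the sketch assumes this means the interface is healthy.

```python
from enum import Enum


class Diagnosis(Enum):
    LOCAL_INTERFACE_FAILED = "local interface malfunctioning (HA interface offline)"
    PEER_INTERFACE_FAILED = "peer has a malfunctioning interface"
    OK = "replies and heartbeats received"


def diagnose(replies_received: bool, heartbeats_received: bool) -> Diagnosis:
    """Classify a suspect critical interface after the query period ends.

    replies_received: any ARP (IPv4) or NDP (IPv6) reply arrived from a
        MonitorTargets address during the period.
    heartbeats_received: any new heartbeat from the peer arrived on the
        interface during the period.
    """
    if not replies_received:
        # Nothing on the link answers, so the fault is considered local.
        return Diagnosis.LOCAL_INTERFACE_FAILED
    if not heartbeats_received:
        # The link works (targets answer) but the peer stays silent on it.
        return Diagnosis.PEER_INTERFACE_FAILED
    # Assumption: this case is not described explicitly in the text.
    return Diagnosis.OK
```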

The administrator can investigate the failed interface further by using the ifstat command and, if possible, checking the cabling to the interface. Restarting the hardware could clear the problem.

The failure of the Sync interface means the nodes can no longer be synchronized. Since IP monitoring is not possible, the node that has the failed interface cannot be determined, and the active node continues as an independent firewall. A log event message indicating this is generated, and cluster functionality only returns when the Sync interface problem is resolved.

Limits of monitored IP addresses

The MonitorTargets property has a limit of 10 IPv4 addresses and 10 IPv6 addresses. If the limit is exceeded for an address type, interface monitoring will not function for that address type.

For example, if 20 IPv6 addresses and 8 IPv4 addresses are configured, none of the IPv6 addresses will be monitored but all the IPv4 addresses will be.
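The per-family limit can be illustrated with Python's standard ipaddress module. The sketch takes a list of individual addresses; how an address book range counts toward the limit is not described here, so ranges are not expanded. The function name is hypothetical.

```python
import ipaddress

# Per-address-family limit stated in the text.
MAX_TARGETS_PER_FAMILY = 10


def monitored_targets(targets: list[str]) -> list[str]:
    """Return the MonitorTargets addresses that would actually be monitored.

    Addresses are split by IP version. If one family exceeds the limit,
    none of that family is monitored; the other family is unaffected.
    """
    by_version: dict[int, list[str]] = {4: [], 6: []}
    for target in targets:
        by_version[ipaddress.ip_address(target).version].append(target)

    result: list[str] = []
    for version in (4, 6):
        addresses = by_version[version]
        if len(addresses) <= MAX_TARGETS_PER_FAMILY:
            result.extend(addresses)
    return result
```

Applied to the example above (8 IPv4 and 20 IPv6 addresses), the function returns only the 8 IPv4 addresses.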

Heartbeats Over a VLAN

It is possible to configure cOS Stream to send and receive heartbeats over a specific VLAN attached to a critical Ethernet interface instead of directly on the Ethernet interface itself (the default). For example, the following CLI command changes the heartbeat transport for EthernetInterface if1 from the interface itself to the VLAN vlan_10:

System:/> set Interface EthernetInterface if1 HeartbeatTransport=vlan_10

All heartbeats for the Critical Ethernet Interface will be transported over the single selected interface. This means that it is not possible to have heartbeats transported on, for example, two VLANs attached to the same Ethernet interface or on both the Ethernet interface and a VLAN attached to that interface.
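The single-transport rule can be modelled as one attribute per critical Ethernet interface. The class below is a hypothetical illustration, not the cOS Stream configuration model. It only shows that selecting a new HeartbeatTransport replaces the previous one, and that the target must be the interface itself or a VLAN attached to it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CriticalEthernetInterface:
    """Hypothetical model of the HeartbeatTransport rule."""

    name: str
    vlans: set[str] = field(default_factory=set)
    # None stands for the default: heartbeats directly on the interface.
    heartbeat_transport: Optional[str] = None

    def set_heartbeat_transport(self, target: str) -> None:
        # Only the interface itself or a VLAN attached to it is accepted.
        if target != self.name and target not in self.vlans:
            raise ValueError(f"{target} is not {self.name} or one of its VLANs")
        # One attribute holds one transport: choosing a new one replaces
        # the old one, so heartbeats never use two transports at once.
        self.heartbeat_transport = target

    def effective_transport(self) -> str:
        return self.heartbeat_transport or self.name
```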

Failover Time

Failover typically completes within a few seconds, so clients may experience it as a brief burst of packet loss. For TCP, the failover time is well within the range of normal retransmission timeouts, so TCP retransmits the lost packets shortly afterwards and communication continues. UDP provides no retransmission since it is inherently an unreliable protocol, so any recovery of lost UDP packets is left to the application.
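A small calculation shows why TCP usually rides through a failover. It assumes an initial retransmission timeout of 1 second (the value recommended by RFC 6298; real TCP stacks adapt it to the measured round-trip time), a doubling of the timeout after each unanswered retransmission, and a failover delay equal to the default PeerDead period of 1.9 seconds. None of these figures are cOS Stream measurements, and the function names are hypothetical.

```python
from typing import Optional


def retransmit_times(initial_rto_s: float, attempts: int) -> list[float]:
    """Send times (seconds after the lost segment) of successive
    retransmissions, doubling the timeout after each one (RFC 6298 backoff)."""
    times: list[float] = []
    elapsed, rto = 0.0, initial_rto_s
    for _ in range(attempts):
        elapsed += rto
        times.append(elapsed)
        rto *= 2
    return times


def first_successful_retransmit(
    failover_s: float, initial_rto_s: float = 1.0, attempts: int = 6
) -> Optional[float]:
    """Time of the first retransmission sent after failover has completed,
    or None if all attempts happen before it."""
    for t in retransmit_times(initial_rto_s, attempts):
        if t >= failover_s:
            return t
    return None
```

Under these assumptions the retransmissions go out at 1, 3, 7, 15, ... seconds after the loss, so the second retransmission, at about 3 seconds, is the first to reach the newly active unit.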

Shared IPv4 Addresses and ARP

Both master and slave units in a cluster are aware of the shared IP addresses. However, ARP queries for the shared IPv4 address, or any other IP address published via ARP configuration or through Proxy ARP, are answered by the active unit only.

The hardware addresses of the shared IPv4 address and other published addresses are not related to the actual MAC addresses of the Ethernet interfaces. Instead, a new MAC address is constructed by cOS Stream. The first part of the constructed address is always 10:00:00. The remainder is based on the configuration, including the cluster ID.

As the shared IP address always has the same hardware address, hosts attached to the same LAN as the cluster do not need to update their ARP caches when a failover occurs, so no delay is introduced.
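The construction of the shared MAC address can be sketched as follows. Only the fixed 10:00:00 prefix comes from the text; packing the cluster ID and an interface index into the remaining three bytes is a hypothetical scheme used purely for illustration. What the sketch demonstrates is that the address is a function of configuration alone, so it is identical whichever unit is active.

```python
# Fixed prefix given in the text for constructed shared MAC addresses.
SHARED_MAC_PREFIX = (0x10, 0x00, 0x00)


def shared_mac(cluster_id: int, iface_index: int) -> str:
    """Hypothetical derivation of a shared MAC address.

    Only the 10:00:00 prefix is documented. Packing the cluster ID into one
    byte and an interface index into two bytes is an illustrative assumption.
    The result depends only on configuration, never on which node is active.
    """
    if not 0 <= cluster_id <= 0xFF:
        raise ValueError("cluster_id must fit in one byte in this sketch")
    if not 0 <= iface_index <= 0xFFFF:
        raise ValueError("iface_index must fit in two bytes in this sketch")
    octets = SHARED_MAC_PREFIX + (cluster_id, iface_index >> 8, iface_index & 0xFF)
    return ":".join(f"{octet:02x}" for octet in octets)
```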

When a cluster member discovers that its peer is not operational, it broadcasts gratuitous ARP queries on all interfaces using the shared MAC address as the sender. This allows switches to re-learn within milliseconds where to send packets destined for the shared address. Therefore, the only failover delay is in detecting that the active unit is down.
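A gratuitous ARP query is an ordinary ARP request, broadcast to ff:ff:ff:ff:ff:ff, in which the sender and target protocol addresses are both the announced IP address. The sketch below only builds such a frame with the shared MAC as sender hardware address; it does not send it, and the exact frames cOS Stream emits are not documented here. The function names and the example addresses are hypothetical.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def gratuitous_arp(shared_mac: str, shared_ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing shared_ip."""
    sha = mac_to_bytes(shared_mac)                    # sender hardware address
    ip = ipaddress.IPv4Address(shared_ip).packed      # sender = target IP
    ethernet = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6, 4,          # hardware / protocol address lengths
        1,             # operation: request
        sha, ip,       # sender MAC and IP
        b"\x00" * 6,   # target MAC unknown in a request
        ip,            # target IP equals sender IP: gratuitous
    )
    return ethernet + arp
```

The resulting 42-byte frame is padded by the network interface to the 60-byte Ethernet minimum. Switches learn the source MAC on the port where the frame arrives, which is what redirects traffic for the shared address to the newly active unit.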

ARP queries are also broadcast periodically to ensure that switches do not forget where to send packets destined for the shared hardware address.