UnixPedia : HPUX / LINUX / SOLARIS: HPUX : CLUSTER : LAN Failback issues during Switch Activity

Sunday, March 23, 2014

HPUX : CLUSTER : LAN Failback issues during Switch Activity


According to this document, after an IP-level failure the subnet is not failed back to the primary LAN card automatically, and you have to re-enable the card manually with cmmodnet.

OS: HP-UX 11i.
Serviceguard: Revision 11.18 and later.

I am running a Serviceguard (SG) environment with primary and standby LAN cards configured. I have noticed that when a failure causes SG to fail a subnet over from the Primary (Pri) to the Standby (Stby) LAN card, SG sometimes fails the subnet back to the Pri card once it becomes available again. In other cases, the subnet stays configured on the Stby card even after the Pri card becomes available. NETWORK_AUTO_FAILBACK is enabled in all our environments.
What causes this, and when does SG fail the subnet back to the Pri card after it becomes available?
Also, in the cases where SG does not fail the subnet back to the Pri card, is there a manual way to do this?

Solution
To answer these questions, you first need to understand why SG fails a subnet over from the Pri LAN card to the Stby card. Since its very first version, SG has included a LAN card monitoring capability called Link Level monitoring. When this monitor detects a failure, the subnet is failed over from Pri to Stby. While the subnet is running on the Stby card after this type of failure, SG keeps checking the health of the Pri card at the link level, and when the Pri card becomes available again, SG (with NETWORK_AUTO_FAILBACK enabled) fails the subnet back from Stby to Pri.
This was the only behavior until recent versions introduced a new type of monitoring called IP Level monitoring. With IP Level monitoring, when a failure causes the subnet to fail over to the Stby card, the Pri card is no longer monitored. So even if the problem goes away, the subnet keeps running on the Stby card.
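To see whether IP Level monitoring may be in play for a subnet, one option is to dump the cluster configuration with cmgetconf and look at its network parameters. The shell function below is a minimal sketch: show_failback_settings is a hypothetical helper name, and it assumes your release names the relevant cluster parameters NETWORK_AUTO_FAILBACK, IP_MONITOR and POLLING_TARGET, so verify those names against the manual for your version.

```shell
# Hypothetical helper: print the network failover settings from a cluster
# ASCII file. Assumption: the parameter names below match your Serviceguard
# release; check them against its manual.
show_failback_settings() {
    ascii_file=$1
    if [ ! -r "$ascii_file" ]; then
        echo "cannot read: $ascii_file" >&2
        return 1
    fi
    # Drop comment lines, then keep only the parameters of interest.
    grep -v '^[[:space:]]*#' "$ascii_file" |
        grep -E '^[[:space:]]*(SUBNET|NETWORK_AUTO_FAILBACK|IP_MONITOR|POLLING_TARGET)[[:space:]]'
}

# On a cluster node (<cluster_name> is a placeholder):
#   cmgetconf -c <cluster_name> /tmp/cluster.ascii
#   show_failback_settings /tmp/cluster.ascii
```

If a SUBNET entry is followed by IP_MONITOR ON, that would suggest failures on that subnet can be detected at the IP level, which is the case where automatic failback does not apply.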
The only way to fail the subnet back to Pri is to:
1. Ensure that everything is OK and that it is safe to fail the subnet back to Pri.
2. Run the command:
   # cmmodnet -e <Pri_Lan_Name>
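The manual procedure above can be wrapped in a small shell function so the status review always comes before the cmmodnet call. This is a sketch, not an HP-supplied tool: failback_to_primary and the DRY_RUN variable are hypothetical, cmviewcl -v is used only to display status for an operator to review, and the function asks for confirmation before running cmmodnet -e.

```shell
# Hypothetical wrapper for the manual failback described above.
# DRY_RUN=1 echoes the Serviceguard commands instead of running them.
failback_to_primary() {
    pri_lan=$1
    if [ -z "$pri_lan" ]; then
        echo "usage: failback_to_primary <Pri_Lan_Name>" >&2
        return 2
    fi
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    fi

    # Step 1: display cluster and network status so the operator can
    # confirm the Pri card is healthy and it is safe to fail back.
    $run cmviewcl -v || return 1
    printf 'Fail subnet back to %s? [y/N] ' "$pri_lan"
    read -r answer
    case $answer in
        y|Y) ;;
        *) echo "aborted"; return 1 ;;
    esac

    # Step 2: run the command from the post to fail the subnet back to Pri.
    $run cmmodnet -e "$pri_lan"
}
```

For example, DRY_RUN=1 failback_to_primary lan0 prints the commands it would run; lan0 is only a placeholder for your primary LAN name.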
For more details, consult the SG manual. The most recent version at the time of writing is at:
http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c02437444/c02437444.pdf.
For example, page 97 states:

NOTE: The NETWORK_AUTO_FAILBACK setting applies only to link-level failures, not to failures at the IP level; see "Monitoring LAN Interfaces and Detecting Failure: IP Level" (page 98) for more information about such failures. For more information about the cluster configuration file, see "Cluster Configuration Parameters" (page 143).
So even if NETWORK_AUTO_FAILBACK is enabled, it does not help when the failure was detected by IP Level monitoring.
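The behavior described in this post can be summarized as a small decision function, which may help when reviewing an incident. It is a hypothetical model, not Serviceguard code: expected_failback is an illustrative name, and the answer for a link-level failure with NETWORK_AUTO_FAILBACK set to NO (manual failback) is an assumption based on the parameter's name rather than something stated above.

```shell
# Hypothetical model of the failback rules in this post; NOT Serviceguard code.
# Usage: expected_failback <link|ip> <YES|NO>
# Prints "automatic" or "manual"; returns 1 for an unknown failure level.
expected_failback() {
    level=$1
    auto=$2
    case $level in
        link)
            # Link Level failure: SG keeps watching the Pri card, and
            # NETWORK_AUTO_FAILBACK decides whether it fails back on its own.
            # Assumption: with NO, an operator must run cmmodnet -e.
            if [ "$auto" = YES ]; then
                echo automatic
            else
                echo manual
            fi
            ;;
        ip)
            # IP Level failure: the Pri card is no longer monitored and
            # NETWORK_AUTO_FAILBACK does not apply, so failback is manual.
            echo manual
            ;;
        *)
            echo "unknown failure level: $level" >&2
            return 1
            ;;
    esac
}

# The case from the question: NETWORK_AUTO_FAILBACK is YES,
# but the failure was detected at the IP level.
expected_failback ip YES
```

Under this model, the inconsistent failback in the question is explained by which monitor detected each failure, not by the NETWORK_AUTO_FAILBACK setting.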
