UnixPedia : HPUX / LINUX / SOLARIS: HPUX : cmviewcl : Cannot view the cluster configuration.

Saturday, May 3, 2014

HPUX : cmviewcl : Cannot view the cluster configuration.



Overview
The cmviewcl command fails with the message "Cannot view the cluster configuration."
Procedures

The following recommendations were made:

- Kill the long-running 'cmclconfd -p' process that was identified in the logs on both nodes.
- Fix /var/adm/inetd.sec so that the entry for the ident service looks at least like the following, then make inetd re-read its configuration with inetd -c:
ident allow 120.57.41.28 itmptr08.xyz.com 127.0.0.1 120.57.41.27 itmptr07.xyz.com
OR
ident allow 120.57.41.28 \
            itmptr08.xyz.com \
            127.0.0.1 \
            120.57.41.27 \
            itmptr07.xyz.com
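
As a rough way to confirm the inetd.sec recommendation, the sketch below lists which of the expected hosts are not named on an ident allow entry. It is a minimal POSIX shell sketch, not a Serviceguard tool: the function name ident_missing_hosts is hypothetical, and it only does literal matching, so any wildcard or range notation in inetd.sec would not be interpreted.

```shell
# Hypothetical helper: print every host given on the command line that is
# NOT listed on an "ident allow" entry of an inetd.sec-style file.
# Usage: ident_missing_hosts FILE HOST...
ident_missing_hosts() (
    file=$1
    shift
    # Join backslash-continued lines, then print the hosts of each
    # "ident allow" entry, one per line.
    allowed=$(awk '
        {
            line = $0
            while (line ~ /\\[ \t]*$/) {
                sub(/\\[ \t]*$/, " ", line)
                if ((getline nxt) <= 0) break
                line = line nxt
            }
            $0 = line
            if ($1 == "ident" && $2 == "allow")
                for (i = 3; i <= NF; i++) print $i
        }' "$file")
    # Report each requested host that has no exact match in the list.
    for host in "$@"; do
        printf '%s\n' "$allowed" | grep -Fqx "$host" || printf '%s\n' "$host"
    done
)
```

For example, ident_missing_hosts /var/adm/inetd.sec 127.0.0.1 120.57.41.27 120.57.41.28 prints nothing when all three addresses appear on an ident allow entry, and prints the missing ones otherwise.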


Serviceguard Command Problems



Problem

Sometimes Serviceguard commands fail and log messages indicating that the node is not configured into the cluster, that the binary configuration file is missing, or that there is some other basic problem. The commands usually affected are:

cmapplyconf
cmcheckconf
cmcp
cmdeleteconf
cmexec
cmgetconf
cmhaltcl
cmhaltnode
cmhaltpkg
cmmodpkg
cmquerycl
cmruncl
cmrunnode
cmrunpkg
cmviewcl
cmviewconf

Symptoms

Here are some typical examples:

# cmviewcl
cmviewcl : Cannot view the cluster configuration.
Either this node is not configured in a cluster, user doesn't have
access to view the cluster configuration, or there is some obstacle
to viewing the configuration. Check the syslog file for more information.
For a list of possible causes, see the Serviceguard manual for cmviewcl.

# cmviewconf
cmviewconf: Unable to get cluster configuration information.
Unable to open communications to configuration daemon: Connection refused
Unable to connect to configuration database.

# cmviewconf
cmviewconf: Either binary file does not exist, or the user doesn't
have access to view the cluster configuration.

# cmhaltpkg dbciSK)1
cmhaltpkg : Unable to open handle to local cluster
Either no cluster configuration file exists, the file is corrupted,
cmclconfd is unable to run, or user root on node nero
doesn't have access to view the configuration.

# cmviewcl

CLUSTER STATUS
alwayson up
Failed to get dlm configuration.

# cmapplyconf -v -C cluster.ascii
Begin cluster verification...
Checking cluster file: cluster.ascii
Checking nodes ... Done
Checking existing configuration ... Done
Node gasteropod is refusing Serviceguard communication.
Please make sure that the proper security access is configured on node
gasteropod through either file-based access (pre-A.11.16 version) or role-based
access (version A.11.16 or higher) and/or that the host name lookup
on node catou-ogsbpc5 resolves the IP address correctly.

Background

All of the commands listed above rely on the Serviceguard (SG) configuration daemon, cmclconfd, to collect information for them. If an SG command does not get a reply from the daemon, it logs a message similar to those above. There are numerous reasons why a reply might not be returned to the command. Ruling them out one after another will usually resolve the problem.

Checklist

  1.  Are you running an SG command on a node that does not have SG configured, against a node that is running in a cluster? For example, you run cmcheckconf to add a node that is not currently a member of the cluster, and you run it on the node that is to be added. For SG A.11.16 and later, run the command on a node that already has SG configured. Otherwise you would need to change the Role Based Access policies in the cluster.ascii of the existing cluster to allow external nodes to modify the configuration.
  2.  /usr/lbin/cmclconfd exists and has appropriate execute permissions. Its cksum and what string match what is documented (e.g. in the patch texts for SG).

# ls -l /usr/lbin/cmclconfd
-r-xr--r-- 1 bin bin 3725848 Mar 14 2005 /usr/lbin/cmclconfd
  3.  The hacl-cfg/tcp and hacl-cfg/udp ports are listed in /etc/services. In a NIS environment, the command "ypcat services" should list the ports.

# grep hacl-cfg /etc/services
hacl-cfg 5302/tcp # HA Cluster TCP configuration
hacl-cfg 5302/udp # HA Cluster UDP configuration
  4.  The hacl-cfg/tcp and hacl-cfg/udp ports are listed in /etc/inetd.conf on HP-UX as follows:

hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
  5.  inetd is running and registered the ports correctly when it was last restarted:

# grep hacl-cfg /var/adm/syslog/syslog.log
Aug 22 14:25:30 nero inetd[980]: hacl-cfg/udp: Added service, server /usr/lbin/cmclconfd
Aug 22 14:25:30 nero inetd[980]: hacl-cfg/tcp: Added service, server /usr/lbin/cmclconfd

netstat -an shows that inetd is listening on hacl-cfg/tcp:

# netstat -an | grep 5302 | grep LISTEN
tcp 0 0 *.5302 *.* LISTEN
  6.  Make sure that /var/adm/inetd.sec does not deny access for cluster nodes to hacl-cfg ports.
  7.  Make sure the subnet mask for a given subnet has the same value on all nodes:

# ifconfig lan0
lan0: flags=843<UP,BROADCAST,RUNNING,MULTICAST>
inet 16.25.249.67 netmask ffffff00 broadcast 16.25.249.255

If the subnet mask is incorrect, the UDP broadcast sent by most commands may not reach the other nodes, resulting in a "refusing communications" message.
  8.  Enable inetd connection logging (inetd -l) to verify that a local SG command (e.g. cmviewconf) connects to inetd which in turn starts the cmclconfd server process.

# tail -f /var/adm/syslog/syslog.log &
# inetd -l
# Apr 20 14:57:14 nero inetd[1189]: Connection logging enabled

# cmviewconf > /dev/null 2>&1
Apr 20 14:57:47 nero inetd[1395]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Fri Apr 20 14:57:47 2007
Apr 20 14:57:48 nero inetd[1396]: ident/tcp: Connection from localhost (127.0.0.1) at Fri Apr 20 14:57:47 2007

If there are no messages for connection logging, check the next step or consider restarting inetd.
  9.  Firewall software (such as IP Filter) must not block access to the hacl-cfg ports from and to all other cluster nodes, on all IP addresses the cluster nodes can potentially talk on. For the latest information on SG port requirements<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Pages/sg_ports.aspx>, refer to the SG Release Notes.
  10. Make sure to list all cluster IP addresses in /etc/hosts. Make sure the cluster IP addresses are listed at the top of the file. Use an /etc/nsswitch.conf file of the form:

hosts: files [NOTFOUND=continue] dns
  11. For SG versions A.11.15 and earlier: Make sure that /etc/cmcluster/cmclnodelist or, if this file does not exist, $HOME/.rhosts contains the IP addresses of all cluster nodes and of all subnets the nodes can potentially talk on (not only those configured in the cluster binary /etc/cmcluster/cmclconfig). Make sure these SG versions are at the most recent patch levels. If you use $HOME/.rhosts, make sure to list the cluster IP addresses at the top of the file.
  12. For SG versions A.11.16 and later:

If there is no /etc/cmcluster/cmclconfig file: Make sure /etc/cmcluster/cmclnodelist contains the IP addresses of all cluster nodes and of all subnets the cluster nodes can potentially talk on.

If /etc/cmcluster/cmclconfig already exists: Make sure /etc/hosts resolves the IP addresses of all cluster nodes and of all subnets the cluster nodes can potentially talk on. Also make sure that each of these IP addresses has an alias that matches the hostname of the host that owns the IP. Below is an example /etc/hosts file:

125.145.162.131          gryf.uksr.hp.com        gryf
120.8.0.131              gryf.uksr.hp.com        gryf
120.8.1.131              gryf.uksr.hp.com        gryf
120.8.2.131              gryf.uksr.hp.com        gryf
125.145.162.132          sly.uksr.hp.com         sly
120.8.0.132              sly.uksr.hp.com         sly
120.8.1.132              sly.uksr.hp.com         sly
120.8.2.132              sly.uksr.hp.com         sly
125.145.162.67           bit.uksr.hp.com         bit
120.30.8.8               bit.uksr.hp.com         bit
125.145.162.69           bot.uksr.hp.com         bot
120.30.8.7               bot.uksr.hp.com         bot

  13. In recent versions of SG, cmclconfd -c (and cmomd) make use of the identd service. Make sure access to port 113/tcp is possible (it is listed as the identd or auth port in /etc/services or NIS). To verify that identd works correctly, use the following test:

1)  Choose a connection from netstat -an
# netstat -an | more
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
[..]
tcp        0      0  127.0.0.1.60934        127.0.0.1.5304          ESTABLISHED

2) telnet to the identd port and enter the two port numbers, separated by a comma.

# telnet localhost 113
Trying...
Connected to localhost.
Escape character is '^]'.
60934, 5304           <---- enter port numbers here
60934 , 5304 : USERID : UNIX :root   <----- identd's reply
Connection closed by foreign host.

To disable identd usage on HP-UX, add the '-i' option to the cmclconfd -c and cmomd lines in /etc/inetd.conf on all cluster nodes and run inetd -c.

  14. If NIS is in use and PHNE_37488 is installed, identd logs debug messages beginning with "yp_bind client:" on stderr, which confuses cmclconfd and causes many SG commands to fail. This is fixed in PHNE_38906.
  15. On HP-UX 11iv2 systems with Trusted Systems make sure you have PHCO_32794 installed. PHCO_32794 solves a libsec problem that causes inetd to hang if a server process on HP-UX 11iv2 with Trusted System is started for a non-root user. As a workaround you can change the line in /etc/inetd.conf from:

auth        stream tcp6 wait   bin  /usr/lbin/identd   identd

to

auth        stream tcp6 wait   root /usr/lbin/identd   identd

and run "inetd -c". This workaround has also helped on systems that did not use Trusted Systems.
  16. For versions of SG running on HP-UX 11.11 that make use of the security enhancements, make sure you have version 2.7.4 or later of identd installed. Check by doing:

# what /usr/lbin/identd
/usr/lbin/identd:
$Revision identd 2.7.4 (PHNE_26305) $

If the version is not sufficient, you need to update to a later version of sendmail. Also make sure that you run ARPA patch PHNE_31247 or later on HP-UX 11.11, or PHNE_24715 on HP-UX 11.00.

  17. There were cases where inetd connection logging (inetd -l) caused delayed responses from identd, which in turn delayed SG commands (e.g. cmviewcl). When connection logging is enabled, inetd performs nameserver queries to look up the source of incoming connections. On HP-UX 11.23 and later, make sure the ipnodes entry in /etc/nsswitch.conf is correct so that name service lookups work properly.
  18. There was a case where identd gave invalid responses because the /etc directory had invalid permissions (444 instead of 555).

  19. a) On HP-UX 11iv1 if the Strong Random Number Generator is installed make sure you run version B.11.11.07 or later.

# swlist -l bundle | grep KRNG
KRNG11i B.11.11.09 HP-UX 11.11 Strong Random Number Generator

Also make sure the /dev/random and /dev/urandom device files have a major number matching the "rng" driver:

# ls -l /dev/*random
cr--r--r-- 1 bin bin 62 0x000000 Nov 19 18:09 /dev/random
cr--r--r-- 1 bin bin 62 0x000001 Nov 19 18:09 /dev/urandom
# lsdev | grep 62
62 -1 rng pseudo

b) On HP-UX 11iv2 the Strong Random Generator is installed by default. As in the previous step check that the device files exist and have the correct major number assigned. Also check that the kernel module is in "loaded" state.

# kcmodule rng
Module State Cause Notes
rng loaded explicit loadable, unloadable
  20. Make sure that the number of SG commands running in parallel is not too high. Often users run cmviewcl in shell scripts to automate status monitoring. This can lead to problems when the requests cannot be answered quickly enough. You can check this by determining how long cmclconfd -p (not the ones with -c!) is running already (use ps -ef).

If cmclconfd -p has already been running for a long time (days or even weeks), this is an indication that many SG commands are being executed in parallel on the cluster (not only on the local node). High CPU usage of cmclconfd -p and/or cmcld can also be an indicator of this (run top(1m) to verify).
a) Read HA Products Newsletter #50<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter50.aspx>, 1st article.
b) Also read HA Products Newsletter #59<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter59.aspx>, 3rd article, for a known problem caused by Openview Operations agents.
c) ISEE might run SG commands too often. Check /opt/runner/runner.conf:

#seconds.  The amount of time between checks of Service Guard
export RunnerSGInterval="30"
  21. If the CPU usage of cmclconfd -p is low and there is no indication of many SG commands being executed, but cmclconfd -p has already been running for a long time, cmclconfd may be hung. For example, a hang of cmclconfd -p has been seen after an inadvertent change of the hostname of a cluster member.
To kill it and get a core file of cmclconfd, send it the SIGABRT signal ( kill -SIGABRT <pid_of_cmclconfd_-p> ). cmclconfd is restarted automatically by the next SG command that requests it. Similarly, if the CPU usage of cmclconfd -p is high yet no new connections are coming into the daemon, it may be spinning. Again, kill it with SIGABRT and it will be restarted automatically when needed. In both cases please provide the core file to WTEC for analysis.
  22. If the SG command's response is delayed by only 10s, check whether the server's primary LAN is available. SG commands try the primary LAN for 10s before using other LANs.
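
Checklist items 3 and 4 compare two configuration files against fixed expectations, which lends itself to a small script. The following is a minimal POSIX shell sketch under stated assumptions: the function names check_services and check_inetd_conf are hypothetical, the expected entries are the ones quoted in items 3 and 4, and any port number is accepted as long as the hacl-cfg service name and protocol match.

```shell
# Hypothetical helpers for checklist items 3 and 4. Each prints one line per
# expected entry and fails (non-zero status) if any entry is missing.
# The function bodies are subshells, so "exit" only leaves the function.

# Usage: check_services /etc/services
check_services() (
    file=$1
    status=0
    for proto in tcp udp; do
        # Look for an uncommented "hacl-cfg <port>/<proto>" line.
        if awk -v p="$proto" '
            $1 == "hacl-cfg" && $2 ~ ("^[0-9]+/" p "$") { found = 1 }
            END { exit !found }' "$file"
        then
            echo "ok: hacl-cfg/$proto"
        else
            echo "MISSING: hacl-cfg/$proto"
            status=1
        fi
    done
    exit "$status"
)

# Usage: check_inetd_conf /etc/inetd.conf
check_inetd_conf() (
    file=$1
    status=0
    # Socket type, protocol and cmclconfd option expected for each entry.
    for spec in "dgram udp -p" "stream tcp -c"; do
        set -- $spec
        if awk -v t="$1" -v p="$2" -v o="$3" '
            $1 == "hacl-cfg" && $2 == t && $3 == p && $6 == "/usr/lbin/cmclconfd" {
                # Server arguments start at field 8; accept extra options too.
                for (i = 8; i <= NF; i++) if ($i == o) found = 1
            }
            END { exit !found }' "$file"
        then
            echo "ok: hacl-cfg $2 cmclconfd $3"
        else
            echo "MISSING: hacl-cfg $2 cmclconfd $3"
            status=1
        fi
    done
    exit "$status"
)
```

Calling check_services /etc/services && check_inetd_conf /etc/inetd.conf on each node gives a quick pass/fail. Commented-out entries count as missing because their first field starts with "#".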

Read and adhere to the Special Installation Instructions of the SG patch you are using.
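
For the subnet mask comparison in checklist item 7, it can help to convert the hexadecimal mask that ifconfig prints into dotted-decimal form and to derive the broadcast address it implies. The following is a minimal POSIX shell sketch; the function names are hypothetical and it assumes an 8-digit hexadecimal IPv4 netmask such as the one in the ifconfig example for that item.

```shell
# Hypothetical helpers for checklist item 7.

# Extract the hexadecimal netmask from "ifconfig <interface>" output on stdin.
mask_from_ifconfig() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "netmask") print $(i + 1) }'
}

# Convert an 8-digit hexadecimal netmask to dotted-decimal notation.
hexmask_to_dotted() {
    printf '%d.%d.%d.%d\n' \
        "0x$(printf '%s\n' "$1" | cut -c1-2)" "0x$(printf '%s\n' "$1" | cut -c3-4)" \
        "0x$(printf '%s\n' "$1" | cut -c5-6)" "0x$(printf '%s\n' "$1" | cut -c7-8)"
}

# Compute the broadcast address implied by an IP address and a hex netmask.
# The body is a subshell, so the IFS change does not leak to the caller.
broadcast_of() (
    ip=$1
    mask=$(hexmask_to_dotted "$2")
    IFS=.
    set -- $ip $mask        # $1..$4 = address octets, $5..$8 = mask octets
    # Broadcast octet = address octet OR inverted mask octet.
    printf '%d.%d.%d.%d\n' \
        $(( $1 | (255 - $5) )) $(( $2 | (255 - $6) )) \
        $(( $3 | (255 - $7) )) $(( $4 | (255 - $8) ))
)
```

With the values from the ifconfig example in item 7, hexmask_to_dotted ffffff00 gives 255.255.255.0 and broadcast_of 16.25.249.67 ffffff00 gives 16.25.249.255, which matches the broadcast address shown there. Running ifconfig lan0 | mask_from_ifconfig on every node and comparing the results is one way to spot a mismatch.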

Troubleshooting

If the above checklist does not help to determine the problem, it is recommended to do the following:

  *   on HP-UX enable inetd logging on all cluster nodes by running "inetd -l"
  *   enable cmclconfd logging<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Pages/sg_debug_logging.aspx#70>
  *   run the easiest SG command that shows the problem, e.g. cmviewconf.
  *   check the debug logs and elevate to HA WTEC
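
The inetd part of these steps can be sketched as a short script that records the current length of the syslog file, enables connection logging, runs one SG command, and then shows only the new inetd messages for the hacl-cfg and ident services. This is a minimal POSIX shell sketch under assumptions: the function names are hypothetical, the syslog path is the one used in checklist item 5, the two-second wait is arbitrary, and enabling cmclconfd debug logging is not covered because its procedure is described on the linked page.

```shell
# Syslog location as used in checklist item 5; override via the environment.
SYSLOG=${SYSLOG:-/var/adm/syslog/syslog.log}

# Print the inetd messages about hacl-cfg or ident that were appended to a
# log file after a given line count.
# Usage: new_inetd_lines LOGFILE LINES_BEFORE
new_inetd_lines() {
    awk -v start="$2" 'NR > start + 0 && /inetd\[/ && /(hacl-cfg|ident)\//' "$1"
}

# Run one SG command with inetd connection logging and show what inetd logged.
# Usage: trace_sg_command cmviewconf
trace_sg_command() (
    before=$(wc -l < "$SYSLOG")
    # Enable connection logging. On HP-UX this option is understood to toggle
    # logging, so skip this line if logging is already enabled.
    inetd -l
    "$@" > /dev/null 2>&1
    sleep 2                 # arbitrary pause so syslog can catch up
    new_inetd_lines "$SYSLOG" "$before"
)
```

If trace_sg_command cmviewconf prints no hacl-cfg line, the command did not reach inetd at all; a hacl-cfg line without a following ident line would point toward the identd checks in the checklist.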

Further Reading:

  *   Serviceguard October 2004 Security Patches<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Product%20Information%20Library/sgsecpatch.pdf>
  *   Serviceguard Manual and Release Notes<http://docs.hp.com/en/ha#Serviceguard>
  *   Serviceguard Patch Text and Special Installation Instructions
  *   HA Product Newsletters<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter.aspx>
  *   "SG Configuration" Module<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Product%20Information%20Library/Old%20Training/mod04_sgconfig.pdf> of the Troubleshooting SG training slideset
Keywords.
Cmviewcl
