UnixPedia : HPUX / LINUX / SOLARIS: HPUX : cmviewcl : Cannot view the cluster configuration.

cmviewcl : Cannot view the cluster configuration.

Overview

The cmviewcl command fails with the message "cmviewcl : Cannot view the cluster configuration."

Procedures

The following recommendations were made:

- Kill the long-running 'cmclconfd -p' process that was identified in the logs on both nodes.

- Fix /var/adm/inetd.sec so that the entry for the ident service at least looks like the following, then make inetd re-read its configuration with inetd -c:

ident allow 10.57.41.28 itmpsh08.xyz.com 127.0.0.1 120.157.41.27 itmptr07.xyz.com

ident allow 120.57.41.28 \
    itmptr08.xyz.com \
    127.0.0.1 \
    120.57.41.27 \
    itmptr07.xyz.com

Serviceguard Command Problems


Problem

Serviceguard commands sometimes fail and log messages indicating that the node is not configured in the cluster, that the binary configuration file is missing, or some other basic problem. The commands most commonly affected are:

cmapplyconf

cmcheckconf

cmcp

cmdeleteconf

cmexec

cmgetconf

cmhaltcl

cmhaltnode

cmhaltpkg

cmmodpkg

cmquerycl

cmruncl

cmrunnode

cmrunpkg

cmviewcl

cmviewconf

Symptoms

Here are some typical examples:

# cmviewcl

cmviewcl : Cannot view the cluster configuration.

Either this node is not configured in a cluster, user doesn't have

access to view the cluster configuration, or there is some obstacle

to viewing the configuration. Check the syslog file for more information.

For a list of possible causes, see the Serviceguard manual for cmviewcl.

# cmviewconf

cmviewconf: Unable to get cluster configuration information.

Unable to open communications to configuration daemon: Connection refused

Unable to connect to configuration database.

# cmviewconf

cmviewconf: Either binary file does not exist, or the user doesn't

have access to view the cluster configuration.

# cmhaltpkg dbciSK)1

cmhaltpkg : Unable to open handle to local cluster

Either no cluster configuration file exists, the file is corrupted,

cmclconfd is unable to run, or user root on node nero

doesn't have access to view the configuration.

# cmviewcl

CLUSTER STATUS

alwayson up

Failed to get dlm configuration.

# cmapplyconf -v -C cluster.ascii

Begin cluster verification...

Checking cluster file: cluster.ascii

Checking nodes ... Done

Checking existing configuration ... Done

Node gasteropod is refusing Serviceguard communication.

Please make sure that the proper security access is configured on node

gasteropod through either file-based access (pre-A.11.16 version) or role-based

access (version A.11.16 or higher) and/or that the host name lookup

on node catou-ogsbpc5 resolves the IP address correctly.

Background

All of the commands above use the SG (Serviceguard) configuration daemon, cmclconfd, to collect information for them. If an SG command does not get a reply from the daemon, it logs messages similar to those shown above. A reply can fail to arrive for many reasons; ruling them out one after another usually resolves the problem.

Checklist

1. Are you running an SG command on a node that does not have SG configured, against a node that is running in a cluster? For example, you run cmcheckconf on a node that is not yet a cluster member in order to add it to the cluster. For SG A.11.16 and later, run the command on a node that already has SG configured. Otherwise, you would need to change the Role Based Access Policies in the cluster.ascii file of the existing cluster to allow external nodes to modify the configuration.

2. Check that /usr/lbin/cmclconfd exists and has appropriate execute permissions, and that its cksum and what string match what is documented (e.g. in the patch text for SG):

# ls -l /usr/lbin/cmclconfd

-r-xr--r-- 1 bin bin 3725848 Mar 14 2005 /usr/lbin/cmclconfd
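As a minimal sketch of check 2, the POSIX sh function below (check_cmclconfd is a hypothetical helper, not part of Serviceguard) reads the owner execute bit from the ls -l mode string and prints the cksum value for comparison with the patch text. It assumes the usual ls -l mode layout shown above.

```shell
# check_cmclconfd: hypothetical helper for checklist item 2.
# Usage: check_cmclconfd [PATH]   (defaults to /usr/lbin/cmclconfd)
# Succeeds if PATH is a regular file whose owner execute bit is set,
# and prints its cksum output so it can be compared with the patch text.
check_cmclconfd() {
    bin=${1:-/usr/lbin/cmclconfd}
    if [ ! -f "$bin" ]; then
        echo "missing: $bin" >&2
        return 1
    fi
    # The 4th character of the ls -l mode string is the owner execute bit
    # ("x", or "s" when setuid is also set).
    owner_x=$(ls -l "$bin" | cut -c4)
    case $owner_x in
        x|s) ;;
        *) echo "not executable by owner: $bin" >&2
           return 1 ;;
    esac
    cksum "$bin"
}
```

Running check_cmclconfd with no argument checks the default path; the printed checksum, together with the output of what /usr/lbin/cmclconfd, can then be compared against the patch documentation.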

3. Check that the hacl-cfg/tcp and hacl-cfg/udp ports are listed in /etc/services. In a NIS environment, the command "ypcat services" lists the ports.

# grep hacl-cfg /etc/services

hacl-cfg 5302/tcp # HA Cluster TCP configuration

hacl-cfg 5302/udp # HA Cluster UDP configuration
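The check in item 3 can be scripted. The sketch below assumes the standard services file layout (name, port/protocol, optional aliases and comment); has_service and check_hacl_cfg are hypothetical helper names.

```shell
# has_service: hypothetical helper for checklist item 3.
# Usage: has_service FILE NAME PORT/PROTO
# Succeeds if FILE has an uncommented entry whose first field is NAME
# and whose second field is PORT/PROTO.
has_service() {
    awk -v name="$2" -v pp="$3" '
        /^[ \t]*#/ { next }                    # skip comment lines
        $1 == name && $2 == pp { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# check_hacl_cfg: report any missing hacl-cfg port (defaults to /etc/services).
check_hacl_cfg() {
    svc_file=${1:-/etc/services}
    rc=0
    for pp in 5302/tcp 5302/udp; do
        if ! has_service "$svc_file" hacl-cfg "$pp"; then
            echo "hacl-cfg $pp not found in $svc_file" >&2
            rc=1
        fi
    done
    return $rc
}
```

In a NIS environment the map can be saved first and checked the same way, e.g. ypcat services > /tmp/services.nis && check_hacl_cfg /tmp/services.nis.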

4. On HP-UX, check that the hacl-cfg/udp and hacl-cfg/tcp ports are listed in /etc/inetd.conf as follows:

hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p

hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
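A hedged sketch of check 4: the hypothetical function check_inetd_hacl looks for uncommented hacl-cfg entries whose protocol, server path and arguments match the two lines above. It assumes whitespace-separated inetd.conf fields in the order shown.

```shell
# check_inetd_hacl: hypothetical helper for checklist item 4.
# Usage: check_inetd_hacl [FILE]   (defaults to /etc/inetd.conf)
# Succeeds only if FILE has an active hacl-cfg udp line running
# "cmclconfd -p" and an active hacl-cfg tcp line running "cmclconfd -c".
check_inetd_hacl() {
    awk '
        /^[ \t]*#/ { next }                   # ignore commented-out entries
        $1 == "hacl-cfg" && $3 == "udp" && $6 == "/usr/lbin/cmclconfd" &&
            $7 == "cmclconfd" && $8 == "-p" { udp = 1 }
        $1 == "hacl-cfg" && $3 == "tcp" && $6 == "/usr/lbin/cmclconfd" &&
            $7 == "cmclconfd" && $8 == "-c" { tcp = 1 }
        END {
            if (!udp) print "missing hacl-cfg udp entry (cmclconfd -p)" > "/dev/stderr"
            if (!tcp) print "missing hacl-cfg tcp entry (cmclconfd -c)" > "/dev/stderr"
            exit((udp && tcp) ? 0 : 1)
        }
    ' "${1:-/etc/inetd.conf}"
}
```

Extra options after -c (such as the -i option mentioned in item 13) do not affect the match, because only the first argument is compared.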

5. Check that inetd is running and that it registered the ports correctly when it was last restarted:

# grep hacl-cfg /var/adm/syslog/syslog.log

Aug 22 14:25:30 nero inetd[980]: hacl-cfg/udp: Added service, server /usr/lbin/cmclconfd

Aug 22 14:25:30 nero inetd[980]: hacl-cfg/tcp: Added service, server /usr/lbin/cmclconfd

netstat -an should show that inetd is listening on hacl-cfg/tcp:

# netstat -an | grep 5302 | grep LISTEN

tcp 0 0 *.5302 *.* LISTEN
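To automate the netstat check in item 5, the hypothetical helper is_listening below parses netstat -an output on standard input. It is a sketch that assumes the local address is the 4th column, which holds for the HP-UX layout shown above (*.5302) and for Linux netstat (0.0.0.0:5302).

```shell
# is_listening: hypothetical helper for checklist item 5.
# Usage: netstat -an | is_listening PORT
# Succeeds if some TCP socket is in LISTEN state on PORT.
is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            addr = $4
            n = length(port)
            # The local address must end in ".PORT" (HP-UX) or ":PORT" (Linux),
            # so that e.g. *.15302 does not count as port 5302.
            sep = substr(addr, length(addr) - n, 1)
            if (substr(addr, length(addr) - n + 1) == port && (sep == "." || sep == ":"))
                found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example: netstat -an | is_listening 5302 && echo "hacl-cfg/tcp is listening".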

6. Make sure that /var/adm/inetd.sec does not deny cluster nodes access to the hacl-cfg ports.
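Because inetd.sec entries may be continued with a trailing backslash (as in the ident example near the top of this post), a script for item 6 needs to join continuation lines first. The sketch below does that; inetd_sec_rule and denies_service are hypothetical names, and the assumed entry format is "service allow|deny host ...". It only detects an explicit deny; an allow list that omits a cluster node also keeps that node out and still has to be reviewed by eye.

```shell
# inetd_sec_rule: hypothetical helper for checklist item 6.
# Usage: inetd_sec_rule FILE SERVICE
# Prints the rule for SERVICE from an inetd.sec-style FILE on one line,
# joining lines that end in a backslash and skipping comment lines.
inetd_sec_rule() {
    awk -v svc="$2" '
        /^[ \t]*#/ && buf == "" { next }
        {
            line = $0
            if (sub(/\\[ \t]*$/, "", line)) {   # trailing backslash: rule continues
                buf = buf line " "
                next
            }
            buf = buf line
            n = split(buf, f)                    # split on runs of blanks
            if (n >= 2 && f[1] == svc) {
                out = f[1]
                for (i = 2; i <= n; i++) out = out " " f[i]
                print out
            }
            buf = ""
        }
    ' "$1"
}

# denies_service: succeeds if the rule for SERVICE is an explicit "deny".
denies_service() {
    inetd_sec_rule "$1" "$2" | awk '$2 == "deny" { d = 1 } END { exit(d ? 0 : 1) }'
}
```

For example, denies_service /var/adm/inetd.sec hacl-cfg && echo "hacl-cfg is denied".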

7. Make sure the subnet mask of each subnet has the same value on all nodes:

# ifconfig lan0

lan0: flags=843<UP,BROADCAST,RUNNING,MULTICAST>

inet 16.25.249.67 netmask ffffff00 broadcast 16.25.249.255

If the subnet mask is incorrect, the UDP broadcast sent by most commands may not reach the other nodes, resulting in a "refusing Serviceguard communication" message.
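For item 7, the netmask in the HP-UX ifconfig output is printed in hex. The sketch below (netmask_of and hex_mask_to_dotted are hypothetical helpers) extracts it and converts it to dotted-decimal, so the values collected from each node can be compared side by side.

```shell
# netmask_of: hypothetical helper for checklist item 7.
# Reads "ifconfig <lan>" output on stdin and prints the value after "netmask",
# e.g. "ffffff00" for the HP-UX output shown above.
netmask_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "netmask") print $(i + 1) }'
}

# hex_mask_to_dotted: convert an 8-digit hex mask such as ffffff00
# (optionally prefixed with 0x) to dotted-decimal, e.g. 255.255.255.0.
hex_mask_to_dotted() {
    m=$(echo "$1" | sed 's/^0x//')
    printf '%d.%d.%d.%d\n' \
        "0x$(echo "$m" | cut -c1-2)" "0x$(echo "$m" | cut -c3-4)" \
        "0x$(echo "$m" | cut -c5-6)" "0x$(echo "$m" | cut -c7-8)"
}
```

On each node, hex_mask_to_dotted "$(ifconfig lan0 | netmask_of)" prints the mask for lan0; the results for the same subnet should be identical on all nodes.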

8. Enable inetd connection logging (inetd -l) to verify that a local SG command (e.g. cmviewconf) connects to inetd, which in turn starts the cmclconfd server process:

# tail -f /var/adm/syslog/syslog.log &

# inetd -l

# Apr 20 14:57:14 nero inetd[1189]: Connection logging enabled

# cmviewconf > /dev/null 2>&1

Apr 20 14:57:47 nero inetd[1395]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Fri Apr 20 14:57:47 2007

Apr 20 14:57:48 nero inetd[1396]: ident/tcp: Connection from localhost (127.0.0.1) at Fri Apr 20 14:57:47 2007

If no connection-logging messages appear, check the next item or consider restarting inetd.

9. Make sure that firewall software (such as IP Filter) does not block access to the hacl-cfg ports from and to all other cluster nodes, on every IP address the cluster nodes can potentially communicate on. For the latest information on SG port requirements <http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Pages/sg_ports.aspx> refer to the SG Release Notes.

10. List all cluster IP addresses in /etc/hosts, placing them at the top of the file, and use an /etc/nsswitch.conf hosts entry of the form:

hosts: files [NOTFOUND=continue] dns
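The nsswitch.conf requirement in item 10 can be checked with the sketch below, assuming the usual "database: source ..." layout. nsswitch_sources and files_before_dns are hypothetical helper names; the same helpers can also be pointed at the ipnodes database on HP-UX 11.23 and later.

```shell
# nsswitch_sources: hypothetical helper for checklist item 10.
# Usage: nsswitch_sources FILE DATABASE   (e.g. hosts or ipnodes)
# Prints the source list configured for DATABASE in an nsswitch.conf-style FILE.
nsswitch_sources() {
    awk -v db="$2" '
        /^[ \t]*#/ { next }
        $0 ~ ("^[ \t]*" db "[ \t]*:") { sub(/^[^:]*:[ \t]*/, ""); print }
    ' "$1"
}

# files_before_dns: succeeds if "files" is listed, and listed before "dns",
# for DATABASE.
files_before_dns() {
    nsswitch_sources "$1" "$2" | awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "files" && !dns) files = 1
                if ($i == "dns") dns = 1
            }
        }
        END { exit(files ? 0 : 1) }
    '
}
```

For example, files_before_dns /etc/nsswitch.conf hosts || echo "hosts: files should come before dns".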

11. For SG A.11.15 and earlier: make sure that /etc/cmcluster/cmclnodelist (or, if that file does not exist, $HOME/.rhosts) contains the IP addresses of all cluster nodes and of all subnets the nodes can potentially communicate on, not only those configured in the cluster binary /etc/cmcluster/cmclconfig. Make sure these SG versions are at the most recent patch level. If you use $HOME/.rhosts, list the cluster IP addresses at the top of the file.
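A sketch for item 11: the hypothetical helper missing_addresses prints each expected cluster address that does not appear as a field in the given file. It makes no assumption about the rest of each line, so it can be used on either /etc/cmcluster/cmclnodelist or $HOME/.rhosts; the list of expected addresses has to be supplied by the administrator.

```shell
# missing_addresses: hypothetical helper for checklist item 11.
# Usage: missing_addresses FILE ADDR...
# Prints every ADDR that does not appear as a whitespace-separated field
# on an uncommented line of FILE.
missing_addresses() {
    list_file=$1
    shift
    for addr in "$@"; do
        if ! awk -v a="$addr" '
            /^[ \t]*#/ { next }
            { for (i = 1; i <= NF; i++) if ($i == a) found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$list_file"; then
            echo "$addr"
        fi
    done
}
```

An empty result means every supplied address is present.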

12. For SG versions SG A.11.16 and later:

If there is no /etc/cmcluster/cmclconfig file: Make sure /etc/cmcluster/cmclnodelist contains the IP addresses of all cluster nodes and of all subnets the cluster nodes can potentially talk on.

If /etc/cmcluster/cmclconfig already exists: make sure /etc/hosts resolves the IP addresses of all cluster nodes and of all subnets the cluster nodes can potentially communicate on. Also make sure that each of these IP addresses has an alias that matches the hostname of the host that owns the IP. Below is an example /etc/hosts file:

125.145.162.131 gryf.uksr.hp.com gryf

120.8.0.131 gryf.uksr.hp.com gryf

120.8.1.131 gryf.uksr.hp.com gryf

120.8.2.131 gryf.uksr.hp.com gryf

125.145.162.132 sly.uksr.hp.com sly

120.8.0.132 sly.uksr.hp.com sly

120.8.1.132 sly.uksr.hp.com sly

120.8.2.132 sly.uksr.hp.com sly

125.145.162.67 bit.uksr.hp.com bit

120.30.8.8 bit.uksr.hp.com bit

125.145.162.69 bot.uksr.hp.com bot

120.30.8.7 bot.uksr.hp.com bot
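To check the alias requirement in item 12, the hypothetical helper addresses_for_host lists every address in an /etc/hosts-style file whose entry carries a given short hostname. Comparing its output with the addresses actually configured on that node (for example from netstat -in) shows which entries lack the alias. This is a sketch, not a Serviceguard tool.

```shell
# addresses_for_host: hypothetical helper for checklist item 12.
# Usage: addresses_for_host FILE SHORTNAME
# Prints the address (first field) of every uncommented line in FILE
# that lists SHORTNAME among its host names or aliases.
addresses_for_host() {
    awk -v h="$2" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) { print $1; break } }
    ' "$1"
}
```

For example, addresses_for_host /etc/hosts gryf should print all four gryf addresses from the example above.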

13. In recent versions of SG, cmclconfd -c (and cmomd) use the identd service. Make sure port 113/tcp is accessible (listed as the identd or auth port in /etc/services or NIS). To verify that identd works correctly, use the following test:

1) Choose a connection from the netstat -an output:

# netstat -an | more

Active Internet connections (including servers)

Proto Recv-Q Send-Q Local Address Foreign Address (state)

[..]

tcp 0 0 127.0.0.1.60934 127.0.0.1.5304 ESTABLISHED

2) Telnet to the identd port and enter the two port numbers, separated by a comma:

# telnet localhost 113

Trying...

Connected to localhost.

Escape character is '^]'.

60934, 5304 <---- enter port numbers here

60934 , 5304 : USERID : UNIX :root <----- identd's reply

Connection closed by foreign host.

To disable identd usage on HP-UX add the '-i' option to the line of cmclconfd -c and cmomd in /etc/inetd.conf on all cluster nodes and run inetd -c.
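The manual identd test in item 13 follows the ident protocol (RFC 1413): the query is "<port on the identd host>, <remote port>", and a successful reply has the form "ports : USERID : opsys : user". The hypothetical helpers below build the query from an HP-UX netstat -an line (where the port follows the last dot of each address) and extract the user from the reply. They only parse text and do not open any connection.

```shell
# ident_query: hypothetical helper for checklist item 13.
# Reads a "netstat -an" connection line on stdin (HP-UX address format)
# and prints the query to type into "telnet localhost 113".
ident_query() {
    awk '$1 ~ /^tcp/ {
        n = split($4, l, ".")   # local address: port is the last dot-field
        m = split($5, r, ".")   # foreign address
        print l[n] ", " r[m]
        exit
    }'
}

# ident_user: parses an ident reply such as
# "60934 , 5304 : USERID : UNIX :root" and prints the user name.
# Prints nothing and fails for ERROR replies.
ident_user() {
    awk -F: '{
        t = $2; gsub(/[ \t]/, "", t)
        if (t == "USERID") { u = $4; gsub(/[ \t\r]/, "", u); print u; ok = 1 }
    } END { exit(ok ? 0 : 1) }'
}
```

A reply whose user field is missing or an ERROR reply points to an identd problem rather than a cmclconfd problem.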

14. If NIS is used and PHNE_37488 is installed, identd logs debug messages beginning with "yp_bind client:" on stderr, which confuses cmclconfd and causes many SG commands to fail. This is fixed in PHNE_38906.

15. On HP-UX 11iv2 systems with Trusted Systems, make sure PHCO_32794 is installed. PHCO_32794 fixes a libsec problem that causes inetd to hang when a server process is started for a non-root user on an HP-UX 11iv2 Trusted System. As a workaround, you can change the line in /etc/inetd.conf from:

auth stream tcp6 wait bin /usr/lbin/identd identd

to:

auth stream tcp6 wait root /usr/lbin/identd identd

and run "inetd -c". This workaround has also helped on systems that did not use Trusted Systems.
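The item 15 workaround can be applied with a small filter. The sketch below assumes the auth line has exactly the fields shown above; auth_as_root is a hypothetical name. Note that awk rewrites the modified line with single spaces between fields, while all other lines pass through unchanged.

```shell
# auth_as_root: hypothetical helper for the item 15 workaround.
# Usage: auth_as_root FILE > NEWFILE
# Prints FILE with the user field of the "auth stream tcp6 wait bin
# /usr/lbin/identd" line changed from bin to root.
auth_as_root() {
    awk '$1 == "auth" && $3 == "tcp6" && $5 == "bin" && $6 == "/usr/lbin/identd" {
        $5 = "root"
    }
    { print }' "$1"
}
```

Keep a backup, for example cp /etc/inetd.conf /etc/inetd.conf.orig && auth_as_root /etc/inetd.conf.orig > /etc/inetd.conf, and then run inetd -c.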

16. For versions of SG running on HP-UX 11.11 that make use of the security enhancements, make sure you have version 2.7.4 or later of identd installed. Check by doing:

# what /usr/lbin/identd

/usr/lbin/identd:

$Revision identd 2.7.4 (PHNE_26305) $

If the version is older, you need to update to a later version of sendmail. Also make sure that you have ARPA patch PHNE_31247 (or a superseding patch) on HP-UX 11.11, or PHNE_24715 on HP-UX 11.00.
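For item 16, the version check can be scripted by extracting the number from the what output and comparing it field by field. identd_version and version_ge are hypothetical helpers, and the sketch assumes the "$Revision identd X.Y.Z" format shown above.

```shell
# identd_version: hypothetical helper for checklist item 16.
# Reads "what /usr/lbin/identd" output on stdin and prints the version number.
identd_version() {
    sed -n 's/.*identd \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_ge A B: succeeds if dotted version A >= B, comparing each
# dot-separated field numerically (missing fields count as 0).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0
    }'
}
```

For example: v=$(what /usr/lbin/identd | identd_version); version_ge "$v" 2.7.4 || echo "identd $v is older than 2.7.4".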

17. There were cases where inetd connection logging (inetd -l) delayed responses from identd, which in turn delayed SG commands (e.g. cmviewcl). With connection logging enabled, inetd performs name server queries to look up the source of incoming connections. On HP-UX 11.23 and later, make sure the ipnodes entry in /etc/nsswitch.conf is correct so that name service lookups work properly.

18. There was a case in which identd gave invalid responses because the /etc directory had invalid permissions (444 instead of 555).

19. a) On HP-UX 11iv1, if the Strong Random Number Generator is installed, make sure you run version B.11.11.07 or later:

# swlist -l bundle | grep KRNG

KRNG11i B.11.11.09 HP-UX 11.11 Strong Random Number Generator

Also make sure the /dev/random and /dev/urandom device files have a major number matching the "rng" driver:

# ls -l /dev/*random

cr--r--r-- 1 bin bin 62 0x000000 Nov 19 18:09 /dev/random

cr--r--r-- 1 bin bin 62 0x000001 Nov 19 18:09 /dev/urandom

# lsdev | grep 62

62 -1 rng pseudo

b) On HP-UX 11iv2 the Strong Random Number Generator is installed by default. As in the previous step, check that the device files exist and have the correct major number assigned. Also check that the kernel module is in the "loaded" state:

# kcmodule rng

Module State Cause Notes

rng loaded explicit loadable, unloadable
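For item 19, the major number comparison can be done with the sketch below. It assumes the HP-UX ls -l layout shown above, where the major number is the 5th field, and the lsdev column order shown above (character major first, driver name third); the helper names are hypothetical.

```shell
# major_from_ls: hypothetical helper for checklist item 19.
# Reads "ls -l /dev/*random" output (HP-UX layout) on stdin and prints
# the distinct major numbers of the character device files.
major_from_ls() {
    awk '$1 ~ /^c/ { print $5 }' | sort -u
}

# rng_major: reads "lsdev" output on stdin and prints the major number of rng.
rng_major() {
    awk '$3 == "rng" { print $1 }'
}

# check_rng_major DEV_MAJORS DRIVER_MAJOR: succeeds if all device files
# use exactly the major number of the rng driver.
check_rng_major() {
    [ -n "$2" ] && [ "$1" = "$2" ]
}
```

For example: check_rng_major "$(ls -l /dev/*random | major_from_ls)" "$(lsdev | rng_major)" || echo "major number mismatch".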

20. Make sure that the number of SG commands running in parallel is not too high. Users often run cmviewcl in shell scripts to automate status monitoring, which can cause problems when the requests cannot be answered quickly enough. You can check this by determining how long cmclconfd -p (not the instances started with -c) has been running (use ps -ef).

If cmclconfd -p has already been running for a long time (days or even weeks), this indicates that many SG commands are being executed in parallel across the cluster (not only on the local node). High CPU usage of cmclconfd -p and/or cmcld can also indicate this (run top(1m) to verify).

a) Read HA Products Newsletter #50<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter50.aspx>, 1st article.

b) Also read HA Products Newsletter #59<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter59.aspx>, 3rd article, for a known problem caused by OpenView Operations agents.

c) ISEE might run SG commands too often. Check /opt/runner/runner.conf:

#seconds. The amount of time between checks of Service Guard

export RunnerSGInterval="30"
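To put a number on "running for a long time" in item 20, the sketch below converts a ps elapsed time in the POSIX etime format ([[dd-]hh:]mm:ss) to seconds and reports cmclconfd -p processes older than a threshold. It assumes input lines of the form "pid etime args"; on HP-UX this would typically come from UNIX95= ps -eo pid,etime,args, which is an assumption to verify on your release. Both helper names are hypothetical.

```shell
# etime_to_seconds: hypothetical helper for checklist item 20.
# Converts a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4)      s = $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else if (NF == 3) s = $1 * 3600 + $2 * 60 + $3
        else              s = $1 * 60 + $2
        print s
    }'
}

# long_running_cmclconfd_p LIMIT: reads "pid etime args" lines on stdin and
# prints "pid seconds" for each "cmclconfd -p" older than LIMIT seconds.
# Header lines and cmclconfd -c instances are skipped.
long_running_cmclconfd_p() {
    limit=$1
    while read -r pid etime args; do
        case $args in
            *"cmclconfd -p"*) ;;
            *) continue ;;
        esac
        secs=$(etime_to_seconds "$etime")
        if [ "$secs" -gt "$limit" ]; then
            echo "$pid $secs"
        fi
    done
}
```

For example, UNIX95= ps -eo pid,etime,args | long_running_cmclconfd_p 86400 lists cmclconfd -p processes that have been running for more than a day.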

21. If the CPU usage of cmclconfd -p is low and there is no sign of many SG commands being executed, but cmclconfd -p has been running for a long time, cmclconfd may be hung. For example, a hang of cmclconfd -p has been seen after the hostname of a cluster member was changed inadvertently.

To kill it and obtain a core file of cmclconfd, send it the SIGABRT signal (kill -SIGABRT <pid_of_cmclconfd_-p>). cmclconfd is restarted automatically by the next SG command that requests it. Similarly, if the CPU usage of cmclconfd -p is high but no new connections are coming into the daemon, it may be spinning. Again, kill it with SIGABRT; it is restarted automatically when needed. In both cases, please provide the core file to WTEC for analysis.

22. If SG command responses are delayed by only about 10 seconds, check whether the server's primary LAN is available. SG commands try the primary LAN for 10 seconds before using other LANs.

Read and adhere to the Special Installation Instructions of the SG patch you are using.

Troubleshooting

If the above checklist does not help to determine the problem, it is recommended to do the following:

* on HP-UX enable inetd logging on all cluster nodes by running "inetd -l"

* enable cmclconfd logging<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Pages/sg_debug_logging.aspx#70>

* run the easiest SG command that shows the problem, e.g. cmviewconf.

* check the debug logs and elevate to HA WTEC


Saturday, May 3, 2014
