cmviewcl : Cannot view the cluster configuration.
Overview
cmviewcl : Cannot view the cluster configuration.
Procedures
The following recommendations were made:
- Kill the long-running 'cmclconfd -p' process that was identified in the
logs on both nodes.
- Fix /var/adm/inetd.sec so that the entry for the ident service looks at
least like the following, then have inetd re-read its configuration with
inetd -c:
ident allow 10.57.41.28 itmpsh08.xyz.com 127.0.0.1 120.157.41.27 itmptr07.xyz.com
OR
ident allow 120.57.41.28 \
itmptr08.xyz.com \
127.0.0.1 \
120.57.41.27 \
itmptr07.xyz.com
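To confirm that the edited file really lists a given address for the ident service, a small check along the following lines may help. This is only a sketch: the function name ident_allows is made up for illustration, it assumes the inetd.sec layout shown above (service name, "allow", then a whitespace-separated list, optionally continued with backslashes), and it compares entries literally without interpreting any wildcards or address ranges.

```shell
# ident_allows FILE ADDR   (hypothetical helper, not part of Serviceguard)
# Succeeds if an "ident allow" entry in FILE lists ADDR literally.
# Lines ending in a backslash are joined with the following line first.
# Usage on a node:  ident_allows /var/adm/inetd.sec 127.0.0.1 && echo listed
ident_allows() {
    awk -v want="$2" '
        # A trailing backslash means the entry continues on the next line.
        /\\$/ { sub(/\\$/, " "); buf = buf $0; next }
        {
            $0 = buf $0; buf = ""          # re-split the joined logical line
            if ($1 == "ident" && $2 == "allow")
                for (i = 3; i <= NF; i++)
                    if ($i == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

The check should be run for every cluster node address as well as 127.0.0.1; a non-zero exit status means the address is not listed literally.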
Serviceguard Command Problems
Problem
Sometimes Serviceguard commands fail and log messages indicating that the
node is not configured into the cluster, that the binary configuration file
is missing, or some other basic problem. The commands usually affected by
this are:
cmapplyconf
cmcheckconf
cmcp
cmdeleteconf
cmexec
cmgetconf
cmhaltcl
cmhaltnode
cmhaltpkg
cmmodpkg
cmquerycl
cmruncl
cmrunnode
cmrunpkg
cmviewcl
cmviewconf
Symptoms
Here are some typical examples:
# cmviewcl
cmviewcl : Cannot view the cluster configuration.
Either this node is not configured in a cluster, user doesn't have
access to view the cluster configuration, or there is some obstacle
to viewing the configuration. Check the syslog file for more information.
For a list of possible causes, see the Serviceguard manual for cmviewcl.
# cmviewconf
cmviewconf: Unable to get cluster configuration information.
Unable to open communications to configuration daemon:
Connection refused
Unable to connect to configuration database.
# cmviewconf
cmviewconf: Either binary file does not exist, or the user doesn't
have access to view the cluster configuration.
# cmhaltpkg dbciSK)1
cmhaltpkg : Unable to open handle to local cluster
Either no cluster configuration file exists, the file is corrupted,
cmclconfd is unable to run, or user root on node nero
doesn't have access to view the configuration.
# cmviewcl
CLUSTER STATUS
alwayson up
Failed to get dlm configuration.
# cmapplyconf -v -C cluster.ascii
Begin cluster verification...
Checking cluster file: cluster.ascii
Checking nodes ... Done
Checking existing configuration ... Done
Node gasteropod is refusing Serviceguard communication.
Please make sure that the proper security access is configured on node
gasteropod through either file-based access (pre-A.11.16 version) or role-based
access (version A.11.16 or higher) and/or that the host name lookup
on node catou-ogsbpc5 resolves the IP address correctly.
Background
All of the commands listed above query the SG configuration daemon
cmclconfd for the information they need. If an SG command does not get a
reply from the daemon, it logs a message similar to those above. There are
numerous reasons why a reply might not be returned to the command. Ruling
them out one after another will usually resolve the problem.
Checklist
1. Are you running an SG command on a node that does not have SG
configured, against a node that is running in a cluster? For example, are
you running cmcheckconf to add a node that is not currently a member of the
cluster, and running it on the node that is to be added? For SG A.11.16 and
later you should run the command on a node that already has SG configured.
Otherwise you would need to change the Role Based Access Policies in the
cluster.ascii of the existing cluster to allow external nodes to modify the
configuration.
2. Make sure /usr/lbin/cmclconfd exists and has appropriate execute
permissions. Its cksum and what string should match what is documented (e.g.
in patch texts for SG).
# ls -l /usr/lbin/cmclconfd
-r-xr--r-- 1 bin bin 3725848 Mar 14 2005 /usr/lbin/cmclconfd
3. Make sure the hacl-cfg/tcp and hacl-cfg/udp ports are listed in
/etc/services. In a NIS environment the command "ypcat services" must list
the ports.
# grep hacl-cfg /etc/services
hacl-cfg 5302/tcp # HA Cluster TCP configuration
hacl-cfg 5302/udp # HA Cluster UDP configuration
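Since both the tcp and the udp entry are needed, the grep above can be turned into a pass/fail check. The following is a sketch only; has_hacl_cfg_ports is a hypothetical helper name, and it assumes the usual services layout of "name port/protocol".

```shell
# has_hacl_cfg_ports [FILE]   (hypothetical helper)
# Succeeds only if the input defines hacl-cfg for BOTH tcp and udp.
# With no FILE argument it reads standard input, so it can also be fed
# from NIS:  ypcat services | has_hacl_cfg_ports
has_hacl_cfg_ports() {
    awk '
        $1 == "hacl-cfg" {
            split($2, pp, "/")      # "5302/tcp" -> pp[1]=port, pp[2]=protocol
            seen[pp[2]] = pp[1]
        }
        END { exit((("tcp" in seen) && ("udp" in seen)) ? 0 : 1) }
    ' "$@"
}
```

On a node this could be used as: has_hacl_cfg_ports /etc/services || echo "hacl-cfg entry missing".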
4. Make sure the hacl-cfg/tcp and hacl-cfg/udp services are listed in
/etc/inetd.conf. On HP-UX the entries are:
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
5. Make sure inetd is running and that it registered the ports correctly
when it was last restarted:
# grep hacl-cfg /var/adm/syslog/syslog.log
Aug 22 14:25:30 nero inetd[980]: hacl-cfg/udp: Added service, server /usr/lbin/cmclconfd
Aug 22 14:25:30 nero inetd[980]: hacl-cfg/tcp: Added service, server /usr/lbin/cmclconfd
netstat -an should show that inetd is listening on hacl-cfg/tcp:
# netstat -an | grep 5302 | grep LISTEN
tcp 0 0 *.5302 *.* LISTEN
6. Make sure that /var/adm/inetd.sec does not
deny access for cluster nodes to hacl-cfg ports.
7. Make sure the subnet masks for a subnet have
the same value on all nodes
# ifconfig lan0
lan0: flags=843<UP,BROADCAST,RUNNING,MULTICAST>
inet 16.25.249.67 netmask ffffff00 broadcast 16.25.249.255
If the subnet mask is incorrect, the UDP broadcast sent by most commands
may not reach the other nodes, resulting in a "refusing communications"
message.
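Because ifconfig prints the mask in hexadecimal, comparing values across nodes by eye is error-prone. The sketch below pulls the mask out of ifconfig output and converts it to dotted-decimal notation. The helper names netmask_of and hex_to_dotted are invented for this example, and the parsing assumes the "netmask <hex>" layout shown above.

```shell
# netmask_of   (hypothetical helper)
# Reads ifconfig output on stdin and prints the word following "netmask".
netmask_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "netmask") print $(i + 1) }'
}

# hex_to_dotted HEXMASK   (hypothetical helper)
# Converts an 8-digit hex mask such as ffffff00 (an optional 0x prefix is
# accepted) into dotted-decimal form.
hex_to_dotted() {
    # Split the mask into four "0xNN" words, then let printf convert them.
    set -- $(printf '%s\n' "$1" | sed 's/^0x//; s/../0x& /g')
    printf '%d.%d.%d.%d\n' "$@"
}
```

On each node, something like hex_to_dotted "$(ifconfig lan0 | netmask_of)" would then print the mask in a form that is easy to compare.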
8. Enable inetd connection logging (inetd -l) to
verify that a local SG command (e.g. cmviewconf) connects to inetd which in
turn starts the cmclconfd server process.
# tail -f /var/adm/syslog/syslog.log &
# inetd -l
# Apr 20 14:57:14 nero inetd[1189]: Connection logging enabled
# cmviewconf > /dev/null 2>&1
Apr 20 14:57:47 nero inetd[1395]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Fri Apr 20 14:57:47 2007
Apr 20 14:57:48 nero inetd[1396]: ident/tcp: Connection from localhost (127.0.0.1) at Fri Apr 20 14:57:47 2007
If there are no messages for connection logging, check the
next step or consider restarting inetd.
9. Firewall software (like IP Filter) must not block access to the hacl-cfg
ports from and to all other cluster nodes on all IP addresses the cluster
nodes can potentially talk on. For the latest information on SG port
requirements
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Pages/sg_ports.aspx>
refer to the SG Release Notes.
10. Make sure to list all cluster IP addresses in
/etc/hosts. Make sure the cluster IP addresses are listed at the top of the
file. Use an /etc/nsswitch.conf file of the form:
hosts: files [NOTFOUND=continue] dns
11. For SG versions A.11.15 and earlier: Make sure that either
/etc/cmcluster/cmclnodelist or, if this file does not exist, $HOME/.rhosts
contains the IP addresses of all cluster nodes and of all subnets the nodes
can potentially talk on (not only those configured in the cluster binary
/etc/cmcluster/cmclconfig). Make sure these SG versions are at the most
recent patch level. If you use $HOME/.rhosts make sure to list the cluster
IP addresses at the top of the file.
12. For SG versions A.11.16 and later:
If there is no /etc/cmcluster/cmclconfig file: Make sure
/etc/cmcluster/cmclnodelist contains the IP addresses of all cluster nodes
and of all subnets the cluster nodes can potentially talk on.
If there is already an /etc/cmcluster/cmclconfig file: Make sure /etc/hosts
resolves the IP addresses of all cluster nodes and of all subnets the
cluster nodes can potentially talk on. Also make sure that each of these IP
addresses has an alias that matches the hostname of the host that owns the IP.
Below is an example /etc/hosts file:
125.145.162.131 gryf.uksr.hp.com gryf
120.8.0.131     gryf.uksr.hp.com gryf
120.8.1.131     gryf.uksr.hp.com gryf
120.8.2.131     gryf.uksr.hp.com gryf
125.145.162.132 sly.uksr.hp.com sly
120.8.0.132     sly.uksr.hp.com sly
120.8.1.132     sly.uksr.hp.com sly
120.8.2.132     sly.uksr.hp.com sly
125.145.162.67  bit.uksr.hp.com bit
120.30.8.8      bit.uksr.hp.com bit
125.145.162.69  bot.uksr.hp.com bot
120.30.8.7      bot.uksr.hp.com bot
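The alias requirement is easy to miss on one of several lines for the same host. As a rough aid, the sketch below lists the addresses whose entry carries a fully qualified name for a host but not the short alias. The function name hosts_missing_alias is hypothetical, and the check assumes that the fully qualified name starts with the short hostname followed by a dot, as in the example file.

```shell
# hosts_missing_alias FILE NAME   (hypothetical helper)
# Prints every IP address in FILE (hosts format) whose entry has a name
# starting with "NAME." but no field that is exactly NAME.
hosts_missing_alias() {
    awk -v name="$2" '
        { sub(/#.*/, "") }          # ignore comments
        NF < 2 { next }             # skip blank or address-only lines
        {
            fq = 0; al = 0
            for (i = 2; i <= NF; i++) {
                if ($i == name) al = 1
                else if (index($i, name ".") == 1) fq = 1
            }
            if (fq && !al) print $1
        }
    ' "$1"
}
```

Running hosts_missing_alias /etc/hosts gryf on each node should print nothing when every address of gryf has the alias.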
13. In recent versions of SG, cmclconfd -c (and cmomd) make use of the
identd service. Make sure access to port 113/tcp is possible (listed as
identd or auth port in /etc/services or NIS). To verify that identd works
correctly, use the following test:
1) Choose a connection from netstat -an
# netstat -an | more
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
[..]
tcp        0      0  127.0.0.1.60934        127.0.0.1.5304         ESTABLISHED
2) telnet to the identd port and enter the two port numbers,
separated by comma.
# telnet localhost 113
Trying...
Connected to localhost.
Escape character is '^]'.
60934, 5304                          <---- enter port numbers here
60934 , 5304 : USERID : UNIX :root   <---- identd's reply
Connection closed by foreign host.
To disable identd usage on HP-UX add the '-i' option to the
line of cmclconfd -c and cmomd in /etc/inetd.conf on all cluster nodes and
run inetd -c.
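When the telnet test is repeated on several nodes, it can help to classify the reply line mechanically. The sketch below assumes the reply has the colon-separated shape shown in the session above (port pair, USERID, operating system, user name) and treats anything else, for example an ERROR reply, as a failure. The name ident_user is made up for this illustration.

```shell
# ident_user REPLY   (hypothetical helper)
# Prints the user name from an identd reply line and succeeds; fails if
# the reply is not a USERID reply.
ident_user() {
    printf '%s\n' "$1" | awk -F: '
        # Strip blanks (and a trailing carriage return) around each field.
        { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t\r]+$/, "", $i) }
        $2 == "USERID" && NF >= 4 { print $4; ok = 1 }
        END { exit(ok ? 0 : 1) }
    '
}
```

For the reply in the session above, ident_user '60934 , 5304 : USERID : UNIX :root' prints root.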
14. If NIS is in use and PHNE_37488 is installed, identd logs debug
messages beginning with "yp_bind client:" on stderr, which confuses
cmclconfd and causes many SG commands to fail. This is fixed in PHNE_38906.
15. On HP-UX 11iv2 systems with Trusted Systems make
sure you have PHCO_32794 installed. PHCO_32794 solves a libsec problem that
causes inetd to hang if a server process on HP-UX 11iv2 with Trusted System
is started for a non-root user. As a workaround you can change the line in
/etc/inetd.conf from:
auth stream tcp6 wait bin /usr/lbin/identd identd
to
auth stream tcp6 wait root /usr/lbin/identd identd
and run "inetd -c". This workaround has also helped
on systems that did not use Trusted Systems.
16. For versions of SG running on HP-UX 11.11 that make
use of the security enhancements, make sure you have version 2.7.4 or later
of identd installed. Check by doing:
# what /usr/lbin/identd
/usr/lbin/identd:
$Revision identd 2.7.4 (PHNE_26305) $
If the version is not sufficient you need to update to a later version of
sendmail. Also make sure that you run ARPA patch PHNE_31247 (or later) for
HP-UX 11.11, or PHNE_24715 on HP-UX 11.00.
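Comparing dotted version strings as plain text gives wrong answers (for example, "2.10" sorts before "2.7"), so a numeric comparison is safer. The following sketch assumes the what output has the form shown above, with the version as the word after "identd"; both helper names, identd_version and version_ge, are invented for this example.

```shell
# identd_version   (hypothetical helper)
# Reads `what /usr/lbin/identd` output on stdin and prints the word that
# follows the standalone word "identd".
identd_version() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "identd") { print $(i + 1); exit } }'
}

# version_ge A B   (hypothetical helper)
# Succeeds if dotted version A is numerically greater than or equal to B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[.]/); nb = split(b, y, /[.]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0                      # all components equal
    }'
}
```

A possible use is: version_ge "$(what /usr/lbin/identd | identd_version)" 2.7.4 || echo "identd too old".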
17. There were cases where inetd connection logging (inetd -l) caused
delayed responses from identd, which in turn delayed SG commands (e.g.
cmviewcl). When connection logging is enabled, inetd performs nameserver
queries to look up the source of incoming connections. On HP-UX 11.23 and
later you should make sure that /etc/nsswitch.conf has a correct ipnodes
entry so that the name service lookup is done correctly.
18. In one case identd gave invalid responses because the /etc directory
had invalid permissions (444 instead of 555).
19. a) On HP-UX 11iv1 if the Strong Random Number
Generator is installed make sure you run version B.11.11.07 or later.
# swlist -l bundle | grep KRNG
KRNG11i B.11.11.09 HP-UX 11.11 Strong Random Number Generator
Also make sure the /dev/random and /dev/urandom device files
have a major number matching the "rng" driver:
# ls -l /dev/*random
cr--r--r-- 1 bin bin 62 0x000000 Nov 19 18:09 /dev/random
cr--r--r-- 1 bin bin 62 0x000001 Nov 19 18:09 /dev/urandom
# lsdev | grep 62
62 -1 rng pseudo
b) On HP-UX 11iv2 the Strong Random Generator is installed by
default. As in the previous step check that the device files exist and have
the correct major number assigned. Also check that the kernel module is in
"loaded" state.
# kcmodule rng
Module State Cause Notes
rng loaded explicit loadable, unloadable
20. Make sure that the number of SG commands running in parallel is not too
high. Users often run cmviewcl in shell scripts to automate status
monitoring. This can lead to problems when the requests cannot be answered
quickly enough. You can check this by determining how long cmclconfd -p (not
the instances with -c!) has been running (use ps -ef).
If cmclconfd -p has been running for a long time (days or even weeks), this
is an indication that many SG commands are being executed in parallel on the
cluster (not only on the local node). High CPU usage of cmclconfd -p and/or
cmcld can also be an indicator of this (run top(1m) to verify).
a) Read HA Products Newsletter #50
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter50.aspx>,
1st article.
b) Also read HA Products Newsletter #59
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter59.aspx>,
3rd article, for a known problem caused by Openview Operations agents.
c) ISEE might run SG commands too often. Check /opt/runner/runner.conf:
#seconds. The amount of time between checks of Service Guard
export RunnerSGInterval="30"
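To spot a cmclconfd -p that has been running for more than a day, the ps -ef listing can be filtered as sketched below. This rests on an assumption about the ps output: that the fifth column (STIME) shows a clock time for processes started within the last day and a date such as "Aug 22" for older ones. The function name long_running_confd is hypothetical.

```shell
# long_running_confd   (hypothetical helper)
# Reads `ps -ef` output on stdin and prints "PID start-date" for every
# "cmclconfd -p" process whose STIME is a date rather than a clock time.
long_running_confd() {
    awk '
        /cmclconfd -p[ \t]*$/ {
            if ($5 ~ /^[0-9]+:[0-9]+/) next      # clock time: started recently
            # A date may be printed as two words ("Aug 22") or one ("Aug22").
            stime = ($5 ~ /^[A-Za-z]+$/) ? ($5 " " $6) : $5
            print $2, stime
        }
    '
}
```

Used as ps -ef | long_running_confd, any output names a process worth a closer look; no output means no cmclconfd -p older than about a day was found.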
21. If the CPU usage of cmclconfd -p is low and there is no indication of
many SG commands being executed, but cmclconfd -p has already been running
for a long time, cmclconfd might be hung. For example, a hang of
cmclconfd -p has been seen after an inadvertent change of the hostname of a
cluster member.
To terminate it and get a core file of cmclconfd, kill it with signal
SIGABRT ( kill -SIGABRT <pid_of_cmclconfd_-p> ). cmclconfd will be restarted
automatically by the next SG command requesting it. Similarly, if the CPU
usage of cmclconfd -p is high yet there are no new connections coming into
the daemon, it may be spinning. Once again kill it with SIGABRT and it will
be restarted automatically when needed. In both cases please provide the
core file to WTEC for analysis.
22. If the SG command's response is delayed by just 10s, check whether the
server's primary LAN is available. SG commands try the primary LAN for 10s
before using other LANs.
Read and adhere to the Special Installation Instructions of the SG patch you
are using.
Troubleshooting
If the above checklist does not help to determine the problem,
it is recommended to do the following:
* on HP-UX enable inetd logging on all cluster nodes by running "inetd -l"
* enable cmclconfd logging
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Pages/sg_debug_logging.aspx#70>
* run the easiest SG command that shows the problem, e.g. cmviewconf.
* check the debug logs and elevate to HA WTEC
Further Reading:
* Serviceguard October 2004 Security Patches
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Product%20Information%20Library/sgsecpatch.pdf>
* Serviceguard Manual and Release Notes <http://docs.hp.com/en/ha#Serviceguard>
* Serviceguard Patch Text and Special Installation Instructions
* HA Product Newsletters
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HA/Pages/Newsletter.aspx>
* "SG Configuration" Module of the Troubleshooting SG training slideset
<http://ent162.sharepoint.hp.com/teams/esssupport/InsideESSSupport/InsideWTEC/HAProducts/Product%20Information%20Library/Old%20Training/mod04_sgconfig.pdf>
Keywords
Cmviewcl