What are the steps to process an 11i crash dump?
ANSWER:
PROCESSING HPUX DUMPS (11.11 - 11.31)
(Please read completely before using)
===============================================================================
WHAT ARE DUMP PROCESSING TOOLS ?
===============================================================================
If HP-UX crashes, system firmware will save critical O/S state data from RAM to the swap LVOL or a dump device, then re-boots the system where the kernel copies the dump to a file system directory.
Depending on the version of HPUX, q4, crashinfo, and crashlite may be used to process the dumps to provide details about the cause of the crash.
== STEP 1 ===== WHERE IS THE DUMP? ==========================================
1.1 /var/adm/crash/ is the default destination directory for dumps. If a directory is not specified in the boot-time dump configuration file, do so now.
/etc/rc.config.d/savecrash : SAVECRASH_DIR=/var/adm/crash (or preferred location)
If an error occurs during the dump save, dump uncompress, or processing, you may need to free up space in /var or configure SAVECRASH_DIR= with a file system that can hold all of the physical memory installed on the system. When done, run the savecrash command again as follows:
# /sbin/savecrash -rvf <TARGET_DIRECTORY>
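To confirm which directory step 1.1 will actually use, the configured value can be read out of the savecrash configuration file. The following is a minimal sketch, assuming the file contains a plain SAVECRASH_DIR= assignment; the function name get_savecrash_dir is hypothetical and not part of HP-UX.

```shell
# Print the dump directory configured in a savecrash config file.
# Falls back to /var/adm/crash when SAVECRASH_DIR is unset or commented out.
# (get_savecrash_dir is an illustrative helper, not an HP-UX command.)
get_savecrash_dir() {
    conf=${1:-/etc/rc.config.d/savecrash}
    dir=""
    if [ -r "$conf" ]; then
        # Keep the last uncommented assignment; drop trailing comments and quotes.
        dir=$(sed -n 's/^[[:space:]]*SAVECRASH_DIR=//p' "$conf" \
            | sed 's/[[:space:]]*#.*//' | tail -n 1 | tr -d '"')
    fi
    echo "${dir:-/var/adm/crash}"
}
```

Running df -k "$(get_savecrash_dir)" then shows how much free space that file system has before savecrash is run again.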
1.2 Determine if a recent crash.N (11.X) directory exists in the dump directory as follows:
# ll /var/adm/crash/c* (dump directory)
"N" increments with each new dump.
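Because "N" increments with each dump, the most recent dump is the crash.N directory with the highest number. A small sketch of that lookup follows; latest_crash_dir is a hypothetical helper name, and it compares N numerically so that crash.10 sorts after crash.2.

```shell
# Print the crash.N directory with the highest N under a dump directory.
# Prints nothing and returns 1 when no crash.N directory exists.
# (latest_crash_dir is an illustrative helper, not an HP-UX command.)
latest_crash_dir() {
    base=${1:-/var/adm/crash}
    best=""
    best_n=-1
    for d in "$base"/crash.*; do
        [ -d "$d" ] || continue
        n=${d##*/crash.}
        # Skip names whose suffix is not a plain number.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$d
        fi
    done
    [ -n "$best" ] || return 1
    echo "$best"
}
```

Usage: latest_crash_dir /var/adm/crash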
1.3 If the system dump is not at the expected path, try to save it using the following command:
# savecrash -rvf <directory>
If this results in "invalid dump header", a valid dump does not exist in the swap/dump device. (Swapping may have occurred.)
1.4 /etc/shutdownlog and /var/adm/crash/c*/INDEX contain a useful crash "panic" statement. If shutdownlog does not exist, issue the following command:
# touch /etc/shutdownlog
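The panic statements mentioned in step 1.4 can be pulled from both locations in one pass. This is a sketch only: panic_lines is a hypothetical helper, and it assumes the relevant lines contain the word "panic" in either case.

```shell
# Print every line mentioning "panic" from the shutdown log and from each
# INDEX file under the dump directory, prefixed with the file it came from.
# (panic_lines is an illustrative helper, not an HP-UX command.)
panic_lines() {
    log=${1:-/etc/shutdownlog}
    dumpdir=${2:-/var/adm/crash}
    for f in "$log" "$dumpdir"/c*/INDEX; do
        [ -r "$f" ] || continue
        # The extra /dev/null argument makes grep print the file name.
        grep -i panic "$f" /dev/null
    done
}
```

Usage: panic_lines /etc/shutdownlog /var/adm/crash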
== STEP 2 ===== CD TO THE DUMP DIRECTORY ====================================
2.1 cd to the dump directory (IMPORTANT!)
Example: # cd /var/adm/crash/crash.0
2.2 gunzip the kernel file if it is zipped:
# gunzip vmunix.gz
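Step 2.2 only applies when the kernel file is compressed, so a guarded version avoids an error when vmunix is already unpacked. The sketch below assumes the file names vmunix and vmunix.gz from the step above; unzip_kernel is a hypothetical helper name.

```shell
# Uncompress vmunix.gz in the given dump directory if needed.
# Returns 0 when an uncompressed vmunix is present afterwards.
# (unzip_kernel is an illustrative helper, not an HP-UX command.)
unzip_kernel() {
    dir=${1:-.}
    if [ -f "$dir/vmunix.gz" ] && [ ! -f "$dir/vmunix" ]; then
        gunzip "$dir/vmunix.gz"
    fi
    [ -f "$dir/vmunix" ]
}
```

Usage: cd /var/adm/crash/crash.0 && unzip_kernel .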
== STEP 3 ===== USE THE LATEST TOOL TO READ THE DUMP =========================
3.1 Download the latest version of crashinfo via FTP:
Once you get an FTP prompt, type:
bin
get crashinfo.shar
quit
3.2 Unpack and run crashinfo:
sh crashinfo.shar
./crashinfo.exe
./crashinfo . > crash.txt
== STEP 4 ===== REVIEW AND SEND DATA =======================================
4.1 HPUX uses the acronyms HPMC and MCA to denote a hardware failure.
Was the crash due to a hardware failure?
Type:
# grep -e MCA -e HPMC crash.txt | grep -i event
If any of the following lines result from the grep, open a hardware repair case for the system:
"crash event was an MCA"
"crash event was an HPMC"
"Crash Event 0 (HPMC, struct crash_event_table_struct..."
The OnlineDiag software bundle captures HPMC and MCA details in the /var/tombstones/ts* files for HPMCs and the /var/tombstones/mca* files for MCAs.
Check the 'dumptime' in the INDEX file:
# grep dumptime INDEX
If an HPMC or MCA has occurred, locate the 'ts' or 'mca' file (usually ts99 or the latest mca* file) created after the "dumptime". Email the file per the instructions below.
If an HPMC did not occur, proceed to the next step.
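The hardware check in step 4.1 can be wrapped so that a script can branch on the result. This is a sketch under the assumption that crash.txt contains one of the "crash event" lines quoted above; is_hw_crash is a hypothetical name, and lowercase -i is used so that both "crash event" and "Crash Event" match.

```shell
# Print the HPMC/MCA crash-event lines from a crashinfo report.
# Returns 0 when at least one such line is found, non-zero otherwise.
# (is_hw_crash is an illustrative helper, not an HP-UX command.)
is_hw_crash() {
    report=${1:-crash.txt}
    grep -e MCA -e HPMC "$report" | grep -i event
}
```

Usage: if is_hw_crash crash.txt; then echo "open a hardware repair case"; fi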
4.2 Skip the remainder of this step if the following lines are not found in crash.txt.
MC/ServiceGuard:
Unable to maintain contact with cmcld daemon.
Performing TOC to ensure data integrity.
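Whether step 4.2 applies can be decided with a single search of crash.txt. A minimal sketch, assuming the message text appears in the report exactly as quoted above; is_sg_toc is a hypothetical helper name.

```shell
# Return 0 when the crashinfo report contains the Serviceguard TOC message.
# (is_sg_toc is an illustrative helper, not a Serviceguard command.)
is_sg_toc() {
    report=${1:-crash.txt}
    grep -q "Unable to maintain contact with cmcld daemon" "$report"
}
```

Usage: is_sg_toc crash.txt || echo "not a Serviceguard TOC, skip step 4.2"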
If these statements occur, determine the NODE_TIMEOUT value as follows:
For Serviceguard 11.18 and older use:
# cmviewconf | grep -i "node.timeout"
RETURNS: node_timeout=16000000
For Serviceguard 11.19 and newer use:
# cmviewcl -v -f line -s config | grep member_timeout
RETURNS: member_timeout=16000000
If the value returned is 2 seconds (2000000; the value is reported in microseconds), then this probably caused the crash.
When the kernel is too busy to send a Serviceguard heartbeat packet to the other nodes within the NODE_TIMEOUT period, the other nodes re-form the cluster and 'orphan' this node, causing a TOC/reboot.
Update the cluster NODE_TIMEOUT to 8 seconds and stop here.
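The timeout values shown above are in microseconds, so 16000000 means 16 seconds. The sketch below converts a returned line to seconds and flags short values; both function names are hypothetical, and treating anything at or below 2 seconds as suspect is an assumption based on the 2-second case described above.

```shell
# Convert a line such as "member_timeout=16000000" (microseconds)
# into whole seconds. Returns 1 if the value is not a plain number.
# (timeout_seconds and timeout_is_suspect are illustrative helpers.)
timeout_seconds() {
    usec=${1#*=}
    case $usec in
        ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
    esac
    echo $((usec / 1000000))
}

# Return 0 when the timeout is 2 seconds or less (assumed threshold).
timeout_is_suspect() {
    secs=$(timeout_seconds "$1") || return 2
    [ "$secs" -le 2 ]
}
```

Usage: timeout_is_suspect "node_timeout=2000000" && echo "timeout is probably too short"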
If the NODE_TIMEOUT is not the problem, include the following files when you send email for the case:
output from command "cmviewcl -v -f line -s config"
(from all of the nodes in the cluster that did not crash.)
(from the node that crashed.)
output from command: " "
(Serviceguard 'flight recorder' logs) *
4.3 Generate a list of installed patches as follows:
# /usr/sbin/swlist -l product > patchlist.txt
4.4 Zip and email the following files as requested:
crash.txt
patchlist.txt
/etc/shutdownlog
If an HPMC was detected: /var/tombstones/ts99
If an MCA was detected: /var/tombstones/mca*
If dump was the result of a hang: /var/adm/syslog/OLDsyslog.log
If dump was a Serviceguard TOC: files listed in step 4.2
EMAIL REQUIREMENTS:
To: HPSupport_global@hp.com
Cc: hpcu@atl.hp.com
Subject:<CASE:YOUR_CASE-NUM> [Note: there should be no spaces between your case ID and the ":" at the beginning and the ">" at the end.]
Example Subject:
Subject:<CASE:3601123456>
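Since stray spaces in the subject are the stated pitfall, the subject text can be generated rather than typed. A small sketch; case_subject is a hypothetical helper, and it simply removes all whitespace from the case ID before wrapping it.

```shell
# Print the subject text "<CASE:nnn>" for a case ID, with all whitespace
# removed so nothing sits between the ":" and the ">".
# (case_subject is an illustrative helper.)
case_subject() {
    case_id=$(printf '%s' "$1" | tr -d '[:space:]')
    [ -n "$case_id" ] || return 1
    printf '<CASE:%s>\n' "$case_id"
}
```

Usage: case_subject 3601123456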
EMAIL RECOMMENDATIONS:
Unless requested, DO NOT send this data to the engineer's personal email address.
Send files as attachments when possible.
Send fresh messages, not replies.
Mail size must be < 2MB. Anything greater will be denied.
After emailing the data, please notify HP that dump email has been sent for action (via callback or ITRC note).
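Because mail over 2MB is denied, it helps to check the attachment size before sending. The sketch below bundles files with tar and gzip (rather than a zip tool) and compares the result against the limit; bundle_for_email is a hypothetical helper, and reading "2MB" as 2 * 1024 * 1024 bytes is an assumption.

```shell
# Bundle the given files into a gzip-compressed tar archive and check that
# the archive is under the 2 MB mail limit. Returns 1 if it is too large.
# (bundle_for_email is an illustrative helper; the byte limit is assumed.)
bundle_for_email() {
    archive=$1
    shift
    tar cf - "$@" | gzip > "$archive"
    # Arithmetic expansion strips any padding that wc may add.
    size=$(($(wc -c < "$archive")))
    limit=$((2 * 1024 * 1024))
    if [ "$size" -ge "$limit" ]; then
        echo "$archive is $size bytes; limit is $limit" >&2
        return 1
    fi
    echo "$archive: $size bytes"
}
```

Usage, run from the dump directory with relative paths: bundle_for_email case_data.tar.gz crash.txt patchlist.txt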
If e-mail or ftp is not available, create a tar tape/CD of the crash.n files (relative pathing please) and send the media to: