UnixPedia : HPUX / LINUX / SOLARIS: 2018

Friday, December 28, 2018

How many bonding modes are defined?



The value of the bonding mode can be one of the following (a sample configuration follows the list):
  • balance-rr or 0 — Sets a round-robin policy for fault tolerance and load balancing. Transmissions are received and sent out sequentially on each bonded slave interface beginning with the first one available.
  • active-backup or 1 — Sets an active-backup policy for fault tolerance. Transmissions are received and sent out through the first available bonded slave interface. Another bonded slave interface is only used if the active bonded slave interface fails.
  • balance-xor or 2 — Transmissions are based on the selected hash policy. The default is to derive a hash from the XOR of the source and destination MAC addresses, modulo the number of slave interfaces. In this mode traffic destined for specific peers will always be sent over the same interface. As the destination is determined by the MAC addresses, this method works best for traffic to peers on the same link or local network. If traffic has to pass through a single router then this mode of traffic balancing will be suboptimal.
  • broadcast or 3 — Sets a broadcast policy for fault tolerance. All transmissions are sent on all slave interfaces.
  • 802.3ad or 4 — Sets an IEEE 802.3ad dynamic link aggregation policy. Creates aggregation groups that share the same speed and duplex settings. Transmits and receives on all slaves in the active aggregator. Requires a switch that is 802.3ad compliant.
  • balance-tlb or 5 — Sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing. The outgoing traffic is distributed according to the current load on each slave interface. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed slave. This mode is only suitable for local addresses known to the kernel bonding module and therefore cannot be used behind a bridge with virtual machines.
  • balance-alb or 6 — Sets an Adaptive Load Balancing (ALB) policy for fault tolerance and load balancing. Includes transmit and receive load balancing for IPv4 traffic. Receive load balancing is achieved through ARP negotiation. This mode is only suitable for local addresses known to the kernel bonding module and therefore cannot be used behind a bridge with virtual machines.
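On a Red Hat-style system the mode is normally set through BONDING_OPTS in the bond's ifcfg file; below is a minimal sketch of such a configuration (interface name, address and mode are placeholders, adjust them to your environment). SUSE-style systems set the mode through BONDING_MODULE_OPTS instead, as the bond0 example further down this page shows.

# /etc/sysconfig/network-scripts/ifcfg-bond0 (RHEL-style sketch)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
IPADDR=192.0.2.10
PREFIX=24
ONBOOT=yes
BOOTPROTO=none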

Wednesday, November 28, 2018

Linux : How to restart vmtoolsd cleanly.

At times we see that the vmtoolsd process does not start properly. We have to find out whether its libraries are being held open by some other process or not.

Restart vmtoolsd after killing that process.

#-> cat /var/log/messages| grep -i tool
Oct 14 14:27:43 SAS_SYSTEM ansible-command: Invoked with warn=True executable=None _uses_shell=True _raw_params=#/etc/init.d/vmware-tools status;date;uptime removes=None creates=None chdir=None
Oct 14 17:42:02 SAS_SYSTEM init: vmware-tools post-stop process (26407) terminated with status 1
Oct 14 23:15:02 SAS_SYSTEM init: vmware-tools post-stop process (30772) terminated with status 1


[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/init.d/vmware-tools status
Checking vmware-tools...
vmware-tools    start/running

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> ps -ef |grep -i vmtoolsd
root      7519  6939  0 18:00 pts/1    00:00:00 grep -i vmtoolsd
[root@SAS_SYSTEM:/var/adm/install-logs]#


[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> ps -ef |grep -i vm
root       351     2  0 Jun01 ?        00:00:00 [vmw_pvscsi_wq_2]
root       916     2  0 Jun01 ?        00:04:00 [vmmemctl]
root      7642  6939  0 18:01 pts/1    00:00:00 grep -i vm

lsof |grep -i vmtoolsd

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> lsof |grep -i vmtoolsd
[root@SAS_SYSTEM:/var/adm/install-logs]#

Is this a physical or an SDDC server?

SDDC

#/etc/vmware-tools/services.sh  status

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh status
vmtoolsd is not running
[root@SAS_SYSTEM:/var/adm/install-logs]#

#/etc/vmware-tools/services.sh  start

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh start
[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh status
vmtoolsd is not running
[root@SAS_SYSTEM:/var/adm/install-logs]#

#/etc/vmware-tools/services.sh  restart

#/etc/vmware-tools/services.sh  stop

#/etc/vmware-tools/services.sh  start

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh restart
Stopping VMware Tools services in the virtual machine:
   Guest operating system daemon:                          [  OK  ]
   VGAuthService:                                          [  OK  ]
   VMware User Agent (vmware-user):                        [  OK  ]
   Unmounting HGFS shares:                                 [  OK  ]
   Guest filesystem driver:                                [  OK  ]
   VM communication interface socket family:               [WARNING]
   VM communication interface:                             [WARNING]
[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh status
vmtoolsd is not running
[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh stop
Stopping VMware Tools services in the virtual machine:
   Guest operating system daemon:                          [  OK  ]
   VGAuthService:                                          [  OK  ]
   VMware User Agent (vmware-user):                        [  OK  ]
   Unmounting HGFS shares:                                 [  OK  ]
   Guest filesystem driver:                                [  OK  ]
   VM communication interface socket family:               [WARNING]
   VM communication interface:                             [WARNING]
[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh start
[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh status
vmtoolsd is not running


Check whether any process is holding the vmware-tools libraries open.

[root@SAS_SYSTEM:/var/adm/install-logs]#

lsof |grep -i tool

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> lsof |grep -i tool
nbdisco    2784      root  mem       REG             253,10  7876150     110784 /usr/openv/lib/libvCloudTools.so
nbdisco    2784      root  mem       REG             253,10 11624190     110746 /usr/openv/lib/libnbuVmwareTools.so
sapcimb   47221      root  DEL       REG              253,7              240710 /usr/lib/vmware-tools/lib64/libvmGuestLib.so/libvmGuestLib.so
[root@SAS_SYSTEM:/var/adm/install-logs]#


Here PID 47221 is holding the vmware-tools library open.

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> ps -ef |grep -i 47221
root      9544  6939  0 18:09 pts/1    00:00:00 grep -i 47221
root     47221  2962  0 Jun22 ?        00:00:00 /usr/sap/hostctrl/exe/sapcimb -format flat -tracelevel 1 -nonull -continue-on-error -metadata -enumi -namespace root/cimv2 -class SAP_MetricValue
[root@SAS_SYSTEM:/var/adm/install-logs]#

#kill -9 47221     

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> ps -ef |grep -i 47221
root      9739  6939  0 18:10 pts/1    00:00:00 grep -i 47221
[root@SAS_SYSTEM:/var/adm/install-logs]#

#/etc/vmware-tools/services.sh restart

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> #/etc/vmware-tools/services.sh status
vmtoolsd is running
[root@SAS_SYSTEM:/var/adm/install-logs]#

ps -ef |grep -i vmtoolsd

[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> ps -ef| grep -i tool
root     11333     1  0 18:11 ?        00:00:00 /usr/sbin/vmtoolsd
root     11580  6939  0 18:11 pts/1    00:00:00 grep -i tool
[root@SAS_SYSTEM:/var/adm/install-logs]#
#-> ps -ef| grep -i tool
root     11333     1  0 18:11 ?        00:00:00 /usr/sbin/vmtoolsd
root     11580  6939  0 18:11 pts/1    00:00:00 grep -i tool
[root@SAS_SYSTEM:/var/adm/install-logs]#
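To summarise the sequence used above (the PID will differ on each system; 47221 was the culprit in this example):

#-> /etc/vmware-tools/services.sh status     # reports "vmtoolsd is not running"
#-> lsof | grep -i tool                      # find the PID still holding a vmware-tools library open
#-> kill -9 <PID>                            # kill that process
#-> /etc/vmware-tools/services.sh restart    # restart, then confirm with "status" and ps -ef | grep vmtoolsd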

Tuesday, November 27, 2018

How to check and collect log for centrify issue.


This needs to be done while we are having the issue, and right now I can see that the agent is using 40% of the CPU.

On the Centrify Unix server, as root or sudo, please run the following commands: 

1) Run: /usr/share/centrifydc/bin/addebug on
(Switch on debug log and watch for any errors) 

2) Run: /usr/share/centrifydc/bin/addebug clear
(Will clear any previous debug log /var/log/centrifydc.log) 

3) Make sure /var/log/centrifydc.log is growing in size. 

4) Reproduce the issue -- restart adclient, monitor until %cpu by adclient goes high, mark the %cpu for our reference, then go to step 5. 

5) Run: /usr/share/centrifydc/bin/addebug off
(Switch off debug log) 

6) Run: /usr/bin/adinfo -t
7) Reply back to the email thread (include support@centrify.com) or attach the logs online with the following files (a collection sketch is shown below):
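A minimal sketch for packaging the collected output once debugging is switched off (the staging directory and tar file name are assumptions; /var/log/centrifydc.log and adinfo -t come from the steps above):

# mkdir -p /var/tmp/centrify-case
# cp /var/log/centrifydc.log /var/tmp/centrify-case/
# /usr/bin/adinfo -t > /var/tmp/centrify-case/adinfo-t.out 2>&1
# tar -cvf /var/tmp/centrify-case-`date +%Y%m%d`.tar /var/tmp/centrify-case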

Monday, November 26, 2018

How to check the open files allowed for a user

The maximum number of open files allowed for this user has reached its limit.

[root@Heiniken:/root]#
#-> ulimit -u sascmxep -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1450297
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1450297
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[root@Heiniken:/root]#
#-> lsof  -u sascmxep | wc -l
1602


[root@Heiniken:/root]#
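Note that ulimit reports the limits of the current shell, so here is a sketch of two other ways to read the limits that actually apply to the user (the user name is taken from the example above; <PID> is a placeholder for one of that user's running processes):

# su - sascmxep -c 'ulimit -a'
# cat /proc/<PID>/limits | grep -i "open files"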

Friday, November 23, 2018

HPUX : Printer Unix queue enablement and old job cleanup script

Unix : HPUX
This script is applicable for an HP-UX print setup.

#cat >SS-Printer_Enabling_and_Defucnt_jobid_check.sh
#!/usr/bin/sh
# Running the code below to find disabled printers and enable them.
#
SKDSB_Q=`lpstat -p | grep -i disable | awk '/printer/ {print $2}'`
for var in $SKDSB_Q
do
    SKPRNT_STATE=`lpstat -p$var | grep -i disabled | wc -l`

    # Removing the job IDs which do not have data associated with them and
    # cause the printer to go into a hung state, impacting the job flow.

    echo "# printer $var `date` : `lpstat -o$var | grep -i "???"`" >> /var/adm/lp/enable_printer_script.log
    lpstat -o$var | grep -i "???"
    if [ $? -eq 0 ]
    then
        echo "# printer $var `date` :`lpstat -o$var | head -10`" >> /var/adm/lp/enable_printer_script.log
        DEFUNCTJOBID=`lpstat -o$var | grep -i $var | head -1 | awk '{print $1}'`
        echo "# printer $var `date` :Canceling $DEFUNCTJOBID from printer queue $var" >> /var/adm/lp/enable_printer_script.log
        cancel $DEFUNCTJOBID
    fi

    # Enabling the printer if its queue is disabled

    if [ $SKPRNT_STATE -ne 0 ]
    then
        echo "# printer $var `date` : `lpstat -p$var | grep -i $var `" >> /var/adm/lp/enable_printer_script.log
        enable $var
        echo "# printer $var `date` : `lpstat -p$var | grep -i $var `" >> /var/adm/lp/enable_printer_script.log
    fi
done

# Remove spool request files older than 30 days (excluding remote-sending/cancel status files)
find /var/spool/lp/request/ -xdev -type f -mtime +30 -exec ls -ltr {} \; > /tmp/queuelistforremoval.txt
cat /tmp/queuelistforremoval.txt | grep -vE "remotesending|sendingstatus|cancel" | awk '{print $9}' | while read i
do
    rm $i
done

exit

--------------

Add the above script to crontab to schedule it every 15 minutes (a sample entry is shown below).
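A sample crontab entry for that schedule (the script path and output redirection are assumptions):

0,15,30,45 * * * * /usr/local/bin/SS-Printer_Enabling_and_Defucnt_jobid_check.sh >/dev/null 2>&1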

Linux : Sending Mail in HTML format

A user is not able to send HTML-format content over mail; it comes through distorted. Below is the solution proposed to resolve it.

Add the mail headers (From, To, Subject, Mime-Version and Content-Type) at the top of the HTML format file, as in the example below.

#-> cat body_text.txt
From: abct@bcs.com
To:  abct@bcs.com
Subject: MIME Test
Mime-Version: 1.0
Content-Type: text/html
<html><body><font color="#1F497D"> Hi Team,</br></br>Count of HIVE tables against Teradata staging tables is not same for validity date '''2018-11-18'''.</br></br></font><table border="1"  ><tr><td align="center" bgcolor="#2F75B5" valign="top"><font color="white">Hive Table Name</font></td><td align="left" bgcolor="#2F75B5" valign="top"><font color="white">Hive Count</font></td><td align="left" bgcolor="#2F75B5" valign="top"><font color="white">Teradata Table Name</font></td><td align="left" bgcolor="#2F75B5" valign="top"><font color="white">Teradata Count</font></td><td align="left" bgcolor="#2F75B5" valign="top"><font color="white">Status</font></td></tr><tr><tr><td align="left"  valign="top">GSGO_M09_TD_F42199_H</td><td align="left"  valign="top">317067288</td><td align="left"  valign="top">M09_TD_F42199_H</td><td align="left"  valign="top">308660931</td><td align="center" bgcolor="#C00000" valign="top"><font color="white">FAILED</font></td></tr><tr><td align="left"  valign="top">GSGO_M29_TD_BIC_OHZDP_HC045_H</td><td align="left"  valign="top">19682893</td><td align="left"  valign="top">M29_TD_BIC_OHZDP_HC045_H</td><td align="left"  valign="top">19954073</td><td align="center" bgcolor="#C00000" valign="top"><font color="white">FAILED</font></td></tr><tr><td align="left"  valign="top">GSGO_M02_TD_F3102_H</td><td align="left"  valign="top">62696104</td><td align="left"


#cat body_text.txt |sendmail -t
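For reference, a minimal self-contained sketch of the same technique (addresses and body text are placeholders; note the blank line separating the headers from the HTML body):

cat > body_text.txt <<'EOF'
From: sender@example.com
To: recipient@example.com
Subject: MIME Test
Mime-Version: 1.0
Content-Type: text/html

<html><body><p>Hi Team,</p><p>This is a <b>test</b> HTML mail.</p></body></html>
EOF
cat body_text.txt | sendmail -t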

Linux : Storage Migration plan on linux system

1. Verify the existing LUNs
# multipath -ll
2. Verify the LVM disks
# pvs;vgs;lvs
3. Verify the output of the above disk commands

4. Once the storage team assigns the new disk, scan for it and make sure it is available to the OS
# hp_rescan -a
# df -h
# multipath -ll
# fdisk -l          ------------- To check the newly attached disk ( Suppose the disk is /dev/disk/by-id/scsi-mpathc )
# pvs;vgs;lvs
# pvcreate /dev/disk/by-id/scsi-mpathc
# pvscan
# pvdisplay /dev/disk/by-id/scsi-mpathb /dev/disk/by-id/scsi-mpathc
# vgs
# vgextend vg01 /dev/disk/by-id/scsi-mpathc
# lvs -a -o +devices,size
# pvmove --background /dev/disk/by-id/scsi-mpathb /dev/disk/by-id/scsi-mpathc
# lvs -a -o +devices,size ------------ Verify the LVs are now on /dev/disk/by-id/scsi-mpathc



 ########################################################################################################################

 Remove the old LUN from the VG

 vgreduce vg01 /dev/disk/by-id/scsi-mpathb
 pvremove /dev/disk/by-id/scsi-mpathb

Remove the LUN from the system.

#dmsetup remove /dev/disk/by-id/scsi-mpathb
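Alternatively, a sketch for flushing the old multipath map and its underlying SCSI paths (confirm the map name and the sdX path devices with multipath -ll first; sdX is a placeholder):

# multipath -ll mpathb                    # note the sdX path devices behind the map
# multipath -f mpathb                     # flush/remove the multipath map
# echo 1 > /sys/block/sdX/device/delete   # repeat for each underlying path device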

Thursday, November 22, 2018

Data corruption due to FS active on multiple nodes.

On the earth system, after a data copy, the data appears corrupted; below is the output seen while running ll or du in the directory.

[root@earth:/root]#
#-> cd /sap_refresh02
[root@earth:/sap_refresh02]#
#-> ll
ls: cannot access RSE: No such device or address
ls: cannot access INCRBKP: No such device or address
total 24
?????????? ? ?      ?            ?            ? INCRBKP
drwxr-xr-x 2 root   root        96 Oct  8  2014 lost+found
drwxrwxrwx 2 oracle oinstall 24576 Nov 22 05:13 RMAN_RBE_RSE
?????????? ? ?      ?            ?            ? RSE
drwxrwxr-x 2 oracle oinstall    96 Nov 22 12:34 RSE_DB
drwxrwxr-x 2 oracle oinstall    96 Nov 22 12:34 RSE_DB_INCR
-rw------- 1 root   root         0 Nov 22 11:20 sifh
drwx------ 2 root   root        96 Nov 22 13:34 Test
[root@earth:/sap_refresh02]#
#-> ll RSE_DB
total 0
[root@earth:/sap_refresh02]#
#-> ll RSE_DB_INCR
total 0
[root@earth:/sap_refresh02]#
#-> ll RMAN_RBE_RSE
ls: cannot access RMAN_RBE_RSE/PSAPDAT_4.tf: No such device or address
ls: cannot access RMAN_RBE_RSE/PSAPDAT_11.tf: No such device or address
ls: cannot access RMAN_RBE_RSE/PSAPDAT_12.tf: No such device or address
ls: cannot access RMAN_RBE_RSE/PSAPDAT_13.tf: No such device or address
ls: cannot access RMAN_RBE_RSE/PSAPDAT_14.tf: No such device or address
ls: cannot access RMAN_RBE_RSE/PSAPDAT_18.tf: No such device or address
ls: cannot access RMAN_RBE_RSE/PSAPDAT_19.tf: No such device or address


Reason :
   The same set of LUNs is active and mounted on other nodes; active I/O from those nodes caused the corruption of the LVs.
 
Solution :
   Deactivate the volumes and remove the LVs from the other nodes (see the sketch below).
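A sketch of that clean-up on each node that should not own the data (the mount point is taken from the output above; <vg_name> is a placeholder for the affected volume group):

# umount /sap_refresh02        # if it is still mounted on that node
# vgchange -a n <vg_name>      # deactivate the VG so its LVs cannot be opened there
# lvscan                       # confirm the LVs of that VG now show as inactive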

System is rebooting abnormally due to block size issue on one of FS.





A 1 KiB block size is used for a file system, and it was expressed in a previous case, 01960120, that this block size should not be used.

o Possible mitigation activities
o As the issue may be a function of the file system block size, refrain from using a file system block size of 1KiB.

crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.17.1.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/664521057/crash/vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Sat Nov 10 16:17:01 2018
UPTIME: 11 days, 10:24:29
LOAD AVERAGE: 8.78, 9.25, 8.80
TASKS: 1093
NODENAME: ITSUSRALSP05403
RELEASE: 3.10.0-693.17.1.el7.x86_64
VERSION: #1 SMP Sun Jan 14 10:36:03 EST 2018
MACHINE: x86_64 (2397 Mhz)
MEMORY: 96 GB
PANIC: "kernel BUG at fs/jbd2/journal.c:766!

crash> mod -t
NAME TAINTS
redirfs OE
gsch OE

o Existing file system errors
crash> log | grep -i ext | grep -v gsch
[778861.340894] EXT4-fs (dm-12): error count since last fsck: 115
[778861.340898] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[778861.340901] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[865368.129522] EXT4-fs (dm-12): error count since last fsck: 115
[865368.129526] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[865368.129528] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[951874.916775] EXT4-fs (dm-12): error count since last fsck: 115
[951874.916780] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[951874.916782] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[987861.755674] RIP: 0010:[] [] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
[987861.758359] RIP [] jbd2_journal_next_log_block+0x79/0x80 [jbd2]

o Several messages relating to the third-party kernel module and ext
crash> log | grep -i ext | grep gsch_flt | awk '{for (i=2;i<=NF;i++){printf "%s ",$i ; if (i==NF) print ""}}' | sort | uniq -c | sort -rn
243 gsch_flt_add_mnt(/var/tmp @ Unknown[ef53(ext3)]) done: 0
243 gsch_flt_add_mnt(/ @ Unknown[ef53(ext3)]) done: 0
243 gsch_flt_add_mnt(/tmp @ Unknown[ef53(ext3)]) done: 0
121 gsch_flt_add_mnt(/boot @ Unknown[ef53(ext3)]) done: 0

o Processes just started and were in an uninterruptible state.
crash> ps -m | grep UN
[ 0 00:00:00.000] [UN] PID: 4073 TASK: ffff880431bdcf10 CPU: 14 COMMAND: "oracle_4073_mra"
[ 0 00:00:00.000] [UN] PID: 29624 TASK: ffff8804ee2c0000 CPU: 12 COMMAND: "ora_j000_mraq04"
[ 0 00:00:00.005] [UN] PID: 3805 TASK: ffff8806cd771fa0 CPU: 1 COMMAND: "oracle_3805_mra"
[ 0 00:00:00.017] [UN] PID: 43209 TASK: ffff8807d3e09fa0 CPU: 6 COMMAND: "oracle_43209_mr"

o Crashing process
crash> bt
PID: 2296 TASK: ffff88115d290fd0 CPU: 4 COMMAND: "jbd2/dm-12-8"
#0 [ffff88115cbcf930] machine_kexec at ffffffff8105c63b
#1 [ffff88115cbcf990] __crash_kexec at ffffffff81106922
#2 [ffff88115cbcfa60] crash_kexec at ffffffff81106a10
#3 [ffff88115cbcfa78] oops_end at ffffffff816b0aa8
#4 [ffff88115cbcfaa0] die at ffffffff8102e87b
#5 [ffff88115cbcfad0] do_trap at ffffffff816b01f0
#6 [ffff88115cbcfb20] do_invalid_op at ffffffff8102b174
#7 [ffff88115cbcfbd0] invalid_op at ffffffff816bd1ae
[exception RIP: jbd2_journal_next_log_block+121]
RIP: ffffffffc014ad99 RSP: ffff88115cbcfc88 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88115b417800 RCX: 0000000000000008
RDX: 0000000000038818 RSI: ffff88115cbcfd38 RDI: ffff88115b41782c
RBP: ffff88115cbcfca0 R8: ffff8804464fbbc8 R9: 0000000000000000
R10: 0000000000000001 R11: 0000040000000400 R12: ffff88115b417828
R13: ffff88115cbcfd38 R14: ffff88115b417800 R15: 000000000000000b
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88115cbcfc80] jbd2_journal_next_log_block at ffffffffc014ad40 [jbd2]
#9 [ffff88115cbcfca8] jbd2_journal_commit_transaction at ffffffffc01437c8 [jbd2]
#10 [ffff88115cbcfe48] kjournald2 at ffffffffc0149a79 [jbd2]
#11 [ffff88115cbcfec8] kthread at ffffffff810b270f
#12 [ffff88115cbcff50] ret_from_fork at ffffffff816b8798

crash> mount | awk 'NR == 1 || $0 ~ "vg_oraarch-lv_oraarch"'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff881159887780 ffff88115b714000 ext3 /dev/mapper/vg_oraarch-lv_oraarch /u02/oraarch

o 1 KiB blocksize again.
crash> super_block.s_blocksize ffff88115b714000
s_blocksize = 1024

Is there a reason why the 1 KiB blocksize is still being used?

### Next Steps

o State why the 1 KiB block size is being used when it was expressed previously to avoid such a small blocksize.

Resolution : 
  1. Increase the file system block size to the recommended 4 KiB (see the procedure below).


Linux : How to change the block size of the logical volume

1) Check the block size of the current device.
# tune2fs -l /dev/vg_oraarch/lv_oraarch | grep -i "Block size"
2) Unmount the filesystem to change the block size (back up the data first; see the sketch after these steps, as mkfs erases the filesystem contents).
# umount /u02/oraarch
3) Recreate the filesystem with the new block size.
# mkfs -t ext3 -b 4096 /dev/vg_oraarch/lv_oraarch
4) Mount it and check the changed block size, then restore the data.
# mount /u02/oraarch
# tune2fs -l /dev/vg_oraarch/lv_oraarch | grep -i "Block size"
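Since mkfs recreates the filesystem, the data has to be backed up and restored around the change; a minimal sketch (assuming /backup has enough free space for the archive):

# cd /u02/oraarch && tar -cvf /backup/oraarch_before_mkfs.tar .
# cd / && umount /u02/oraarch
# mkfs -t ext3 -b 4096 /dev/vg_oraarch/lv_oraarch
# mount /u02/oraarch
# cd /u02/oraarch && tar -xvf /backup/oraarch_before_mkfs.tar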

Latency issue in Bond0

Check the bond configuration on the linux server:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: transmit load balancing
Primary Slave: None
Currently Active Slave: eth10
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth9
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 6c:5b:e5:XX:1a:64
Slave queue ID: 0

Slave Interface: eth10
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 6c:Xb:X5:aa:1c:61
Slave queue ID: 0

#check bonding configuration:
#-> cat /etc/sysconfig/network/ifcfg-bond0
DEVICE='bond0'
BOOTPROTO=static
BROADCAST=
IPADDR=XX.XX.AA.VV/23
NETWORK=
STARTMODE=auto
USERCONTROL=no
#LLADDR=
#ETHTOOL_OPTIONS=
BONDING_MASTER=yes
BONDING_MODULE_OPTS='miimon=100 mode=5'
BONDING_SLAVE0='eth9'
BONDING_SLAVE1='eth10'

#check for packets dropped on the NICs
 for x in $(seq 1 20); do ip -s link show dev eth9 | grep -A1 'RX.*dropped'; sleep 2; done
 for x in $(seq 1 20); do ip -s link show dev eth10 | grep -A1 'RX.*dropped'; sleep 2; done

#->  for x in $(seq 1 20); do ip -s link show dev eth9 | grep -A1 'RX.*dropped'; sleep 2; done
    RX: bytes  packets  errors  dropped overrun mcast
    1662600144 24960287 150     67145216 0       0
    RX: bytes  packets  errors  dropped overrun mcast
    1662601168 24960302 150     67145223 0       0
    RX: bytes  packets  errors  dropped overrun mcast
    1662601584 24960308 150     67145233 0       0
    RX: bytes  packets  errors  dropped overrun mcast
    1662602246 24960318 150     67145244 0       0
    RX: bytes  packets  errors  dropped overrun mcast
    1662603078 24960331 150     67145262 0       0

Resolution :
  1. Reset the card with ifenslave.
  2. Reseat the blade in the enclosure.
  3. Bond0 can be broken (the slaves monitored individually) to isolate which NIC is dropping packets; see the sketch below.
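A sketch for points 1 and 3: a slave can be detached and re-attached with ifenslave so the drops can be watched per NIC (interface names as in the output above):

# ifenslave -d bond0 eth9      # detach eth9 from bond0
# ip -s link show dev eth9     # watch whether the RX drop counter still increases
# ifenslave bond0 eth9         # re-attach the slave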

Saturday, November 10, 2018

What is the maximum number of groups (GIDs) a user can belong to when using NFS with AUTH_UNIX / AUTH_SYS on RHEL




Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • NFS

Issue

  • GIDs of users in more than 16 groups are not recognized properly on NFS.
  • I cannot change ownership of a subdirectory of the NFS filesystem; the following is the error message:
$ chown USER:GROUP /nfs-mount-point/file
chown: changing ownership of `/nfs-mount-point/file': Operation not permitted
  • A user gets a "Permission Denied" error while creating a file on the NFS share.
# su - testuser1
$ touch example
touch: cannot touch `example`: Permission denied 
  • What limits are in place for group settings on NFS ?

Resolution

  • If the NFS environment requires a user to belong to more than 16 groups then use RPCSEC_GSS (e.g. with Kerberos) instead of AUTH_UNIX / AUTH_SYS. How to configure NFSv4 with kerberos authentication?
  • On RHEL 6 and newer the NFS-server can be instructed to discard the groups given by the NFS-client. The --manage-gids option for rpc.mountd (see man rpc.mountd) needs to be set on the NFS-server in /etc/sysconfig/nfs (see the example after this list). The flag tells the server to ignore the 16 groups sent by the client and resolve group membership locally (NFS-server side). This effectively bypasses the limit imposed by the RPC data structure and requires that the NFS-server see either the same or a superset of the groups available to the NFS-client. Note that AUTH_SYS with --manage-gids is less secure than switching to RPCSEC_GSS. If RPCSEC_GSS is an option for your environment, it is a better solution.
  • NOTE: If you intend to do file locking over NFS, there may be a limitation in NFSv3's file locking protocol (NLM) where it is unable to use RPCSEC_GSS. In that situation one should use NFSv4 with RPCSEC_GSS (e.g. with Kerberos).
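A minimal sketch of the --manage-gids change on the NFS server (RHEL 6/7; the restart command depends on the release):

# /etc/sysconfig/nfs on the NFS server:
RPCMOUNTDOPTS="--manage-gids"

# then restart the NFS services:
service nfs restart            # RHEL 6
systemctl restart nfs-server   # RHEL 7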

Root Cause

  • NFS uses the RPC protocol to authenticate users.
  • The RPC protocol's AUTH_UNIX / AUTH_SYS Credentials structure limits the number of groups to 16, as specified in RFC 5531.

         struct authsys_parms {
            unsigned int stamp;
            string machinename<255>;
            unsigned int uid;
            unsigned int gid;
            unsigned int gids<16>;
         }
    
  • NFS uses the AUTH_SYS protocol by default.

Diagnostic Steps

  1. Capture a tcpdump.
  2. Inspect the RPC request (e.g. the SETATTR Call).

    Credentials
        Flavor: AUTH_UNIX (1)
        Length: 92
        Stamp: STAMP
        Machine Name: NAME
            length: 7
            contents: NAME
            fill bytes: opaque data
        UID: 901
        GID: 901
        Auxiliary GIDs (16) [901, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965]
            GID: 901
            GID: 951
            GID: 952
            GID: 953
            GID: 954
            GID: 955
            GID: 956
            GID: 957
            GID: 958
            GID: 959
            GID: 960
            GID: 961
            GID: 962
            GID: 963
            GID: 964
            GID: 965
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    
  3. For the SETATTR example, note that the GID the file is being set to, 982, does not appear in the credentials above.

    Network File System, SETATTR Call FH:0x2713c62b
    [Program Version: 3]
    [V3 Procedure: SETATTR (2)]
    object
        length: 32
        [hash (CRC-32): 0x2713c62b]
        decode type as: unknown
        filehandle: 000000000000fe1b0009bdae6802ffffffffffffffffffff...
    new_attributes
        mode: no value
            set_it: no value (0)
        uid: value follows
            set_it: value follows (1)
            uid: 901
        gid: value follows
            set_it: value follows (1)
            gid: 982
        size: no value
            set_it: no value (0)
        atime: don't change
            set_it: don't change (0)
        mtime: don't change
            set_it: don't change (0)
    guard: no value
        check: no value (0)
    
Based on the second resolution provided, i.e. discarding the groups given by the NFS client, here are the steps to reproduce it.
  • Created a user testuser1 with same uid and gid on NFS server and client.
  • Created 20 groups with same gid on NFS server and client
  • Made user testuser1 members of these 20 groups
  • Created 20 directories and gave them respective ownerships
On NFS server :

[root@server-test test1]# id -a testuser1
uid=20362(testuser1) gid=20362(testuser1) groups=20362(testuser1),20363(group1),20364(group2),20365(group3),20366(group4),20367(group5),20368(group6),20369(group7),20370(group8),20371(group9),20372(group10),20373(group11),20374(group12),20375(group13),20376(group14),20377(group15),20378(group16),20379(group17),20380(group18),20381(group19),20382(group20)
On client :

server-test:/test1 /testnfs nfs rw,relatime,vers=3,rsize=16384,wsize=16384,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.65.210.132,mountvers=3,mountport=43813,mountproto=udp,local_lock=none,addr=10.65.210.132 0 0

[root@client-test testnfs]# usermod -G group1,group2,group3,group4,group5,group6,group7,group8,group9,group10,group11,group12,group13,group14,group15,group16,group17,group18,group19,group20 testuser1

[root@client-test testnfs]# id -a testuser1
uid=20362(testuser1) gid=20362(testuser1) groups=20362(testuser1),20363(group1),20364(group2),20365(group3),20366(group4),20367(group5),20368(group6),20369(group7),20370(group8),20371(group9),20372(group10),20373(group11),20374(group12),20375(group13),20376(group14),20377(group15),20378(group16),20379(group17),20380(group18),20381(group19),20382(group20)
  • Setup on server and client is as follows:

drwxrwxr-x. 2 testuser1 group1  4.0K Aug 19 15:15 1
drwxr-xr-x. 2 testuser1 group2  4.0K Aug 19 15:15 2
drwxr-xr-x. 2 testuser1 group3  4.0K Aug 19 15:15 3
drwxr-xr-x. 2 testuser1 group4  4.0K Aug 19 15:15 4
drwxr-xr-x. 2 testuser1 group5  4.0K Aug 19 15:15 5
drwxr-xr-x. 2 testuser1 group6  4.0K Aug 19 15:15 6
drwxr-xr-x. 2 testuser1 group7  4.0K Aug 19 15:15 7
drwxr-xr-x. 2 testuser1 group8  4.0K Aug 19 15:16 8
drwxr-xr-x. 2 testuser1 group9  4.0K Aug 19 15:16 9
drwxr-xr-x. 2 testuser1 group10 4.0K Aug 19 15:16 10
drwxr-xr-x. 2 testuser1 group11 4.0K Aug 19 15:16 11
drwxr-xr-x. 2 testuser1 group12 4.0K Aug 19 15:16 12
drwxr-xr-x. 2 testuser1 group13 4.0K Aug 19 15:16 13
drwxr-xr-x. 2 testuser1 group14 4.0K Aug 19 15:16 14
drwxr-xr-x. 2 testuser1 group15 4.0K Aug 19 15:16 15
drwxr-xr-x. 2 testuser1 group16 4.0K Aug 19 15:16 16
drwxr-xr-x. 2 testuser1 group17 4.0K Aug 19 15:16 17
drwxr-xr-x. 2 testuser1 group18 4.0K Aug 19 15:16 18
drwxr-xr-x. 2 testuser1 group19 4.0K Aug 19 15:16 19
drwxr-xr-x. 2 testuser1 group20 4.0K Aug 19 15:16 20
  • Tried changing ownership of one of the directory.

[root@client-test /]# su - testuser1
$ cd /testnfs
testnfs]$ chown testsuer1:group20 1
chown: changing ownership of `1': Operation not permitted <--------------Error reported

  • Whereas locally on the NFS server, this operation works without any problem :

[root@server-test test1]# su - testuser1                      [  OK  ]
[testuser1@server-test ~]$ cd /test1
[testuser1@server-test test1]$ chown testuser1:group20 1
[testuser1@server-test test1]$ ls -ld 1
drwxrwxr-x. 2 testuser1 group20 4096 Aug 19 15:15 1
  • Tcpdump captured during the same time also shows "Permission Error".
    Inspect the RPC request (e.g. SETATTR Call). For the SETATTR example, note that the GID the dir is being set to, 20382, does not appear in the credentials below.

Credentials
Flavor: AUTH_UNIX (1)
Length: 100
Stamp: 0x004f0cff
Machine Name: client-test
length: 14
contents: client-test
fill bytes: opaque data
UID: 20362
GID: 20362
Auxiliary GIDs (16) [20362, 20363, 20364, 20365, 20366, 20367, 20368, 20369, 20370, 20371, 20372, 20373, 20374, 20375, 20376, 20377]
GID: 20362
GID: 20363
GID: 20364
GID: 20365
GID: 20366
GID: 20367
GID: 20368
GID: 20369
GID: 20370
GID: 20371
GID: 20372
GID: 20373
GID: 20374
GID: 20375
GID: 20376
GID: 20377
Verifier
Flavor: AUTH_NULL (0)
Length: 0

Network File System, SETATTR Call FH:0x1622cb12
[Program Version: 3]
[V3 Procedure: SETATTR (2)]
object
length: 28
[hash (CRC-32): 0x1622cb12]
decode type as: unknown
filehandle: 01000601c8166eb71d244e24a1af18679aeb921c02000200...
new_attributes
mode: no value
set_it: no value (0)
uid: value follows
set_it: value follows (1)
uid: 20362
gid: value follows
set_it: value follows (1)
gid: 20382
size: no value
set_it: no value (0)
atime: don't change
set_it: don't change (0)
mtime: don't change
set_it: don't change (0)
guard: no value
check: no value (0)
  • Then, on the NFS server, I made the following entry in /etc/sysconfig/nfs and restarted the NFS service:

RPCMOUNTDOPTS="--manage-gids"
  • Unmounted the NFS share from the client and re-mounted it again.
  • Lastly, I tried to change the ownership once again as follows and this time it worked without any problems :

# mount -t nfs -o vers=3 10.65.210.132:/test1 /testnfs
# su - testuser1
$ cd /testnfs
$ chown testuser1:group18 1
$ ls -ld 1
drwxrwxr-x. 2 testsuer1 group18 4096 Aug 19 15:15 1
Thus we can conclude that setting rpc.mountd --manage-gids works around the NFS AUTH_SYS limitation of 16 groups.

Monday, May 28, 2018

HPUX : Device busy while vg deactivation

-- Target Instance: p sasd_tgt_instance_t 0xe0000001cacf2840 --
target state             = sasds_tgt_ready
current open count       = 1                                             <<<<<<<<
it_nxs_abt_cap           = TGT_IT_NXS_ABT_UNKNOWN
tgt_info:
tgt_hdl                  = 0x13
iport_hdl                = 0x0
tgt_sasaddr              = 0x5000c5003bfef6c9                   <<<<<<<<< this is c4t4d0
tgt_health               = SAS_HEALTH_ONLINE
iport_sasaddr            = 0x500605b002a6aab4
tgt_type                 = SAS_TGT_TYPE_SCSI
tgt_proto_cap            = SAS_TGT_PROTO_SSP_CAPABLE
tgt_topology             = SAS_TGT_TOPO_EXPANDER
slot                     = 11
enc_id                   = 0x2
tgt_enc type             = SAS_TGT_ENC_TYPE_EXT_SES2
-- Target statistics --
        tgt_open_cnt             = 31697
        tgt_close_cnt            = 31696
        tgt_scsi_layer_ios       = 143145362
        tgt_scsi_layer_io_success= 143132736
        tgt_scsi_layer_io_fails  = 12754
-- Target Instance: p sasd_tgt_instance_t 0xe0000001cad3f080 --
target state             = sasds_tgt_ready  
current open count       = 1                                             <<<<<<<<<
it_nxs_abt_cap           = TGT_IT_NXS_ABT_UNKNOWN
tgt_info:
tgt_hdl                  = 0x14
iport_hdl                = 0x0
tgt_sasaddr              = 0x5000c5003c0ab4b5                 <<<<<<<< this is c4t5d0
tgt_health               = SAS_HEALTH_ONLINE
iport_sasaddr            = 0x500605b002a6aab4
tgt_type                 = SAS_TGT_TYPE_SCSI
tgt_proto_cap            = SAS_TGT_PROTO_SSP_CAPABLE
tgt_topology             = SAS_TGT_TOPO_EXPANDER
slot                     = 12
enc_id                   = 0x2
tgt_enc type             = SAS_TGT_ENC_TYPE_EXT_SES2
-- Target statistics --
        tgt_open_cnt             = 31630
        tgt_close_cnt            = 31629
        tgt_scsi_layer_ios       = 3174756
        tgt_scsi_layer_io_success= 3162137
        tgt_scsi_layer_io_fails  = 12747
-- Target Instance: p sasd_tgt_instance_t 0xe0000001cad66040 --
target state             = sasds_tgt_ready
current open count       = 1                                             <<<<<<<<<<<
it_nxs_abt_cap           = TGT_IT_NXS_ABT_UNKNOWN
tgt_info:
tgt_hdl                  = 0x13
iport_hdl                = 0x0
tgt_sasaddr              = 0x5000c5003c099ddd                 <<<<<<<<<< this is c5t4d0
tgt_health               = SAS_HEALTH_ONLINE
iport_sasaddr            = 0x500605b002a697c4
tgt_type                 = SAS_TGT_TYPE_SCSI
tgt_proto_cap            = SAS_TGT_PROTO_SSP_CAPABLE
tgt_topology             = SAS_TGT_TOPO_EXPANDER
slot                     = 11
enc_id                   = 0x2
tgt_enc type             = SAS_TGT_ENC_TYPE_EXT_SES2
-- Target statistics --
        tgt_open_cnt             = 31692
        tgt_close_cnt            = 31691
        tgt_scsi_layer_ios       = 99698532
        tgt_scsi_layer_io_success= 99685901
        tgt_scsi_layer_io_fails  = 12758
-- Target Instance: p sasd_tgt_instance_t 0xe0000001cad68040 --
target state             = sasds_tgt_ready
current open count       = 1                                             <<<<<<<<<
it_nxs_abt_cap           = TGT_IT_NXS_ABT_UNKNOWN
tgt_info:
tgt_hdl                  = 0x14
iport_hdl                = 0x0
tgt_sasaddr              = 0x5000c5003c0af631                  <<<<<<<< this is c5t5d0
tgt_health               = SAS_HEALTH_ONLINE
iport_sasaddr            = 0x500605b002a697c4
tgt_type                 = SAS_TGT_TYPE_SCSI
tgt_proto_cap            = SAS_TGT_PROTO_SSP_CAPABLE
tgt_topology             = SAS_TGT_TOPO_EXPANDER
slot                     = 12
enc_id                   = 0x2
tgt_enc type             = SAS_TGT_ENC_TYPE_EXT_SES2
-- Target statistics --
        tgt_open_cnt             = 31621
        tgt_close_cnt            = 31620
        tgt_scsi_layer_ios       = 3173364
        tgt_scsi_layer_io_success= 3160744
        tgt_scsi_layer_io_fails  = 12747
From ioscan:
target      10  0/3/0/0/0/0.0.0.4            tgt          CLAIMED     DEVICE
disk         6  0/3/0/0/0/0.0.0.4.0          sdisk        CLAIMED     DEVICE       HP      EG0300FAWHV
                              /dev/dsk/c4t4d0   /dev/rdsk/c4t4d0
        Acpi(HPQ0002,PNP0A08,300)/Pci(0|0)/Pci(0|0)/Sas(Addr5000C5003BFEF6C9, Lun0)
target      11  0/3/0/0/0/0.0.0.5            tgt          CLAIMED     DEVICE
disk         7  0/3/0/0/0/0.0.0.5.0          sdisk        CLAIMED     DEVICE       HP      EG0300FAWHV
                              /dev/dsk/c4t5d0   /dev/rdsk/c4t5d0
        Acpi(HPQ0002,PNP0A08,300)/Pci(0|0)/Pci(0|0)/Sas(Addr5000C5003C0AB4B5, Lun0)
target      17  0/6/0/0/0/0/2/0/0/0.0.0.4    tgt          CLAIMED     DEVICE
disk        12  0/6/0/0/0/0/2/0/0/0.0.0.4.0  sdisk        CLAIMED     DEVICE       HP      EG0300FAWHV
                              /dev/dsk/c5t4d0   /dev/rdsk/c5t4d0
        Acpi(HPQ0002,PNP0A08,600)/Pci(0|0)/Pci(0|0)/Pci(2|0)/Pci(0|0)/Sas(Addr5000C5003C099DDD, Lun0)
target      18  0/6/0/0/0/0/2/0/0/0.0.0.5    tgt          CLAIMED     DEVICE
disk        13  0/6/0/0/0/0/2/0/0/0.0.0.5.0  sdisk        CLAIMED     DEVICE       HP      EG0300FAWHV
                              /dev/dsk/c5t5d0   /dev/rdsk/c5t5d0
        Acpi(HPQ0002,PNP0A08,600)/Pci(0|0)/Pci(0|0)/Pci(2|0)/Pci(0|0)/Sas(Addr5000C5003C0AF631, Lun0)
### strings /etc/lvmtab ###
/dev/vg00
/dev/dsk/c3t0d0s2
/dev/dsk/c3t0d1s2
/dev/vg01
/dev/dsk/c4t4d0
/dev/dsk/c5t4d0
/dev/vg04
/dev/dsk/c4t5d0
/dev/dsk/c5t5d0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
vg01 is active and the file systems on its lvols are mounted on jupiterA. Hence, it makes sense that its LUNs have a "current open count" of "1".
Interestingly, the LUNs that are part of vg04 also have their open count at 1, which means they are in use.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/4707390189-473/sysinfo_jupiterA# grep pvchange syslog.log
Aug  5 22:22:01 jupiterA LVM[29891]: pvchange -a y /dev/dsk/c4t5d0
Aug  5 22:22:16 jupiterA LVM[29920]: pvchange -a y /dev/dsk/c5t5d0
Aug  5 22:24:38 jupiterA LVM[1693]: pvchange -a n /dev/dsk/c4t5d0
Aug  5 22:25:48 jupiterA LVM[2042]: pvchange -a y /dev/dsk/c4t5d0
Aug  6 00:07:34 jupiterA LVM[7052]: pvchange -a y /dev/dsk/c4t5d0
Aug  6 00:07:47 jupiterA LVM[7070]: pvchange -a y /dev/dsk/c5t5d0
Aug  6 00:37:04 jupiterA LVM[16730]: pvchange -a N /dev/dsk/c4t5d0
Aug  6 00:37:14 jupiterA LVM[16785]: pvchange -a N /dev/dsk/c5t5d0
Aug  6 00:44:28 jupiterA LVM[12621]: pvchange -a y /dev/dsk/c4t5d0
Aug  6 00:44:36 jupiterA LVM[13366]: pvchange -a y /dev/dsk/c5t5d0
/4707390189-473/sysinfo_jupiterA# grep vgchange syslog.log
Aug  5 22:42:15 jupiterA LVM[6987]: vgchange -a r /dev/vg04
Aug  6 00:09:44 jupiterA LVM[7633]: vgchange -a r /dev/vg04
Aug  6 00:44:54 jupiterA LVM[14807]: vgchange -a r /dev/vg04
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I've noticed the following I/O related entries, which are all logged for VG 64 0x040000, which is nothing but vg04 (see the group file minor numbers below):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
crw-r-----   1 root       sys         64 0x000000 Sep 28  2011 /dev/vg00/group
crw-r--r--   1 root       sys         64 0x010000 Sep 30  2011 /dev/vg01/group
crw-r--r--   1 root       sys         64 0x040000 Sep 30  2011 /dev/vg04/group
Aug  6 00:37:04 jupiterA vmunix: LVM: VG 64 0x040000: Flushing the deferred attach list.
Aug  6 00:37:04 jupiterA vmunix: LVM: VG 64 0x040000: PVLink 31 0x045000 Detached.
Aug  6 00:37:04 jupiterA LVM[16730]: pvchange -a N /dev/dsk/c4t5d0
Aug  6 00:37:14 jupiterA vmunix: LVM: VG 64 0x040000: PVLink 31 0x055000 Detached.
Aug  6 00:37:14 jupiterA LVM[16785]: pvchange -a N /dev/dsk/c5t5d0
Aug  6 00:37:14 jupiterA vmunix: LVM: NOTICE: VG 64 0x040000: LV 1: All I/O requests to this LV that were
Aug  6 00:37:14 jupiterA vmunix: LVM: VG 64 0x040000: Flushing the deferred attach list.
Aug  6 00:37:14 jupiterA vmunix:        waiting indefinitely for an unavailable PV have now completed.
Aug  6 00:44:28 jupiterA LVM[12621]: pvchange -a y /dev/dsk/c4t5d0
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 4: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix:        indefinitely for an unavailable PV. These requests will be queued until
Aug  6 00:44:28 jupiterA vmunix:        the PV becomes available (or a timeout is specified for the LV).
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 7: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 9: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 8: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 10: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 14: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 16: Some I/O requests to this LV are waiting
Aug  6 00:44:29 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 5: Some I/O requests to this LV are waiting
Aug  6 00:44:29 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 13: Some I/O requests to this LV are waiting
Aug  6 00:44:29 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 1: Some I/O requests to this LV are waiting
Aug  6 00:44:36 jupiterA LVM[13366]: pvchange -a y /dev/dsk/c5t5d0
Aug  6 00:44:54 jupiterA LVM[14807]: vgchange -a r /dev/vg04
Aug  6 00:44:29 jupiterA vmunix:        indefinitely for an unavailable PV. These requests will be queued until
Aug  6 00:45:04 jupiterA  above message repeats 9 times
/4707390189-473/sysinfo_jupiterA# grep "LVM: NOTICE" syslog.log
Aug  6 00:37:14 jupiterA vmunix: LVM: NOTICE: VG 64 0x040000: LV 1: All I/O requests to this LV that were
hpuxftp@HPUXFTP_b8u8:/home/hpuxftp/crashdump/4707390189-473/sysinfo_jupiterA# grep "LVM: WARNING" syslog.log
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 4: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 7: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 9: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 8: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 10: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 14: Some I/O requests to this LV are waiting
Aug  6 00:44:28 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 16: Some I/O requests to this LV are waiting
Aug  6 00:44:29 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 5: Some I/O requests to this LV are waiting
Aug  6 00:44:29 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 13: Some I/O requests to this LV are waiting
Aug  6 00:44:29 jupiterA vmunix: LVM: WARNING: VG 64 0x040000: LV 1: Some I/O requests to this LV are waiting
Conclusion:
1. Although vg04 was activated read-only, we see that there are I/O requests to quite a number of its LVs.
2. You will not be able to deactivate this VG until those requests either complete or time out. If there is any application that you know may use data on this VG, shut down the application (the sketch below shows how to identify it with fuser) and try to deactivate again. Otherwise, the last option is to reboot the node (please take the necessary measures since it is a cluster node).
3. The LVM subsystem thinks that the PVs are unavailable. Bearing in mind that the I/O requests are waiting for an unavailable PV, the possibilities are that the PVs are too busy or that there is some delay at the connectivity level (between this node and the PVs).
4. Since there are two PVs on two different HW paths, the point of failure must be common to both. Could you send me the diagram of this set-up (connectivity)?
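Before resorting to a reboot, a sketch of the usual checks on the node holding the VG open (the mount point is a placeholder for whichever vg04 lvol is mounted; fuser -c/-u are standard HP-UX options):

# fuser -cu /mount_point_on_vg04    # list the processes (and owners) using the mounted file system
# umount /mount_point_on_vg04       # unmount once those processes are stopped
# vgchange -a n /dev/vg04           # then retry the deactivation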