When using Storage Foundation Cluster Volume Manager
(SFRAC/SFCFS) with shared disk groups of version 120 or higher, each disk group
has an attribute called dgfailpolicy. This attribute determines how a
node reacts if it loses access to the disks in the corresponding disk group.
If a shared disk group is left at the default dgfailpolicy of dgdisable, a
cluster-wide panic could ensue and/or the database could halt cluster-wide
should the Cluster Volume Manager (CVM) master lose connectivity to storage. To
avoid this behavior, dgfailpolicy should be set to leave for shared disk groups.
From your node
#-> uname -a
HP-UX mickey B.11.23 U ia64 2937989941 unlimited-user license
[root@mickey:/.root]#
#-> vxdg list
NAME         STATE               ID
localdg02    enabled,cds         1152277302.85.mickey
csrcppdg01   enabled,shared,cds  1152539575.189.donald
csrlindg01   enabled,shared,cds  1232465597.394.donald
racdg01      enabled,shared,cds  1152739766.65.mickey
sptrpdg01    enabled,shared,cds  1152290440.161.donald
tempdg       enabled,shared,cds  1193249181.293.donald
totalpdg01   enabled,shared,cds  1177157791.111.mickey
[root@mickey:/.root]#
#-> vxdg list csrcppdg01
Group: csrcppdg01
dgid: 1152539575.189.donald
import-id: 33792.478
flags: shared cds
version: 120
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: donald=sw mickey=sw
ssb: on
detach-policy: global
dg-fail-policy: dgdisable <--- Currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.1774 permlen=0 free=0 templen=0 loglen=0
[root@mickey:/.root]#
#-> vxdg list csrlindg01
Group: csrlindg01
dgid: 1232465597.394.donald
import-id: 33792.480
flags: shared cds
version: 120
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: donald=sw mickey=sw
ssb: on
detach-policy: global
dg-fail-policy: dgdisable <--- Currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.5260 permlen=0 free=0 templen=0 loglen=0
[root@mickey:/.root]#
#-> vxdg list racdg01
Group: racdg01
dgid: 1152739766.65.mickey
import-id: 33792.484
flags: shared cds
version: 120
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: donald=sw mickey=sw
ssb: on
detach-policy: global
dg-fail-policy: dgdisable <--- Currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.1201 permlen=0 free=0 templen=0 loglen=0
[root@mickey:/.root]#
#-> vxdg list sptrpdg01
Group: sptrpdg01
dgid: 1152290440.161.donald
import-id: 33792.474
flags: shared cds
version: 120
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: donald=sw mickey=sw
ssb: on
detach-policy: global
dg-fail-policy: dgdisable <--- Currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.22839 permlen=0 free=0 templen=0 loglen=0
[root@mickey:/.root]#
#-> vxdg list tempdg
Group: tempdg
dgid: 1193249181.293.donald
import-id: 33792.482
flags: shared cds
version: 120
alignment: 8192 (bytes)
local-activation: off
cluster-actv-modes: donald=sw mickey=off
ssb: on
detach-policy: global
dg-fail-policy: dgdisable <--- Currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.1153 permlen=0 free=0 templen=0 loglen=0
[root@mickey:/.root]#
#-> vxdg list totalpdg01
Group: totalpdg01
dgid: 1177157791.111.mickey
import-id: 33792.476
flags: shared cds
version: 120
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: donald=sw mickey=sw
ssb: on
detach-policy: global
dg-fail-policy: dgdisable <--- Currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.4742 permlen=0 free=0 templen=0 loglen=0
[root@mickey:/.root]#
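Checking each disk group by eye becomes tedious as the number of groups grows. The following is a minimal sketch, not a vendor-supplied tool, that extracts the group name and current policy from the output of vxdg list <diskgroup>. The function name dg_fail_policy is hypothetical, and the sketch assumes the output uses the "Group:" and "dg-fail-policy:" labels exactly as shown in the listings above.

```shell
# Hypothetical helper (not part of Storage Foundation).
# Reads the output of `vxdg list <diskgroup>` on standard input and
# prints "<diskgroup> <dg-fail-policy>" on one line.
dg_fail_policy() {
    awk '
        $1 == "Group:"          { group = $2 }   # disk group name
        $1 == "dg-fail-policy:" { policy = $2 }  # first word after the label
        END { if (group != "") print group, policy }
    '
}
```

It would be used as, for example: vxdg list csrcppdg01 | dg_fail_policy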
Cause:
In a CVM RAC environment where a shared disk group uses a
dgfailpolicy of dgdisable, should the master lose connectivity to all disks in
the disk group, the master will disable the disk group (dgdisable). Because this
is a CVM environment, the disk group is also disabled on all slave nodes, since
all nodes must have the same view of the configuration as the
master.
Once a disk group is dgdisabled, any new opens against volumes in
that disk group will fail. Some examples of when opens are attempted:
- When a volume containing a file system is mounted
- When an I/O is attempted against a raw volume device
This scenario can have severe implications. For
example, if Oracle RAC uses vote devices on raw volumes, then as soon as the
corresponding disk group is dgdisabled cluster-wide, no node can
perform I/O to the vote disks, so the nodes can no longer heartbeat. As a
result, all nodes are panicked by Oracle Cluster Ready Services (CRS),
causing a cluster-wide loss of service.
Solution:
To avoid this issue, all shared disk groups of version 120 and
higher should be set to use a dgfailpolicy of leave. Once this is set, should
the master lose connectivity to disks in the disk group, it will panic and leave
the cluster rather than disabling the disk group cluster-wide. One of the
surviving slave nodes can then take over the master role and, assuming that
the new master has no issues with connectivity to storage, the surviving
members of the cluster continue to function as normal.
vxdg -g <diskgroup> set dgfailpolicy=leave
This setting persists across reboots.
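With several shared disk groups, the command above has to be repeated once per group. The sketch below is one way to generate those commands from the output of vxdg list. The helper names shared_dgs and print_set_leave_cmds are hypothetical, and the sketch assumes the NAME / STATE / ID column layout shown earlier, with "shared" appearing as one of the comma-separated STATE flags. It only prints the commands so they can be reviewed before anything is changed.

```shell
# Hypothetical helpers (not part of Storage Foundation).

# Read `vxdg list` output on standard input and print the names of the
# disk groups whose STATE column contains the "shared" flag.
shared_dgs() {
    awk '$1 != "NAME" && $2 ~ /(^|,)shared(,|$)/ { print $1 }'
}

# Print, but do not run, the set command for every shared disk group.
print_set_leave_cmds() {
    shared_dgs | while read -r dg; do
        printf 'vxdg -g %s set dgfailpolicy=leave\n' "$dg"
    done
}
```

For example, vxdg list | print_set_leave_cmds prints one line per shared disk group. After reviewing the list, the same output can be piped to sh to apply the change.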
=================
In SFCFS 6.0 and later, the disk group failure policy is obsolete for disk
groups of version 170 or later. From the SFCFS 6.0 release notes:
Availability of shared disk group configuration copies
If the Cluster Volume Manager (CVM) master node loses access to a configuration
copy, CVM redirects the read or write requests over the network to another node
that has connectivity to the configuration copy. This behavior ensures that the
disk group stays available.
In previous releases, CVM handled disconnectivity according to the disk group
failure policy (dgfail_policy). This behavior still applies if the disk group
version is less than 170. The dgfail_policy is not applicable to disk groups
with a version of 170 or later.
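
Because the policy only matters below disk group version 170, it can help to check the version before changing anything. This is a minimal sketch under the assumption that vxdg list <diskgroup> prints a "version:" line as in the listings earlier in this note. The function name dgfailpolicy_applies is hypothetical.

```shell
# Hypothetical helper (not part of Storage Foundation).
# Reads the output of `vxdg list <diskgroup>` on standard input.
# Exit status 0: version is below 170, so the failure policy still applies.
# Exit status 1: version is 170 or later, or no version line was found.
dgfailpolicy_applies() {
    awk '
        $1 == "version:" { v = $2 + 0; found = 1 }
        END { exit ((found && v < 170) ? 0 : 1) }
    '
}
```

For example: if vxdg list racdg01 | dgfailpolicy_applies; then echo "policy applies"; fi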