HAIP is not starting on any node: clsu_get_private_ip_addr failed retVal:7 – GRID stack down
Errors reported in ocssd.log and other clusterware daemon logs
[ CSSD][2358634240]clssnmReadNodeInfo:clsu_get_private_ip_addr failed retVal:7
[GIPCHDEM][4160411392]gipchaDaemonCheckInterfaces: failed to read private interface information ret 1
[ CLSINET][4160411392]failed to retrieve GPnP profile, grv 13
Resource status
NAME TARGET STATE SERVER STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
ora.asm ONLINE OFFLINE
ora.cluster_interconnect.haip ONLINE OFFLINE
ora.crf ONLINE ONLINE grac41
ora.crsd ONLINE OFFLINE
ora.cssd ONLINE ONLINE grac41
ora.cssdmonitor ONLINE ONLINE grac41
ora.ctssd ONLINE ONLINE grac41 OBSERVER
ora.diskmon OFFLINE OFFLINE
ora.drivers.acfs ONLINE OFFLINE
ora.evmd ONLINE INTERMEDIATE grac41
ora.gipcd ONLINE ONLINE grac41
ora.gpnpd ONLINE ONLINE grac41
ora.mdnsd ONLINE ONLINE grac41
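The status of these lower-stack (ohasd-managed) resources can be checked with crsctl stat res -t -init (the table above is a condensed view; the raw crsctl output layout differs slightly):
[grid@grac41 ~]$ $GRID_HOME/bin/crsctl stat res -t -init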
[grid@grac41 cssd]$ grep failed ocssd.log
..
2014-11-12 15:03:15.672: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
2014-11-12 15:03:15.672: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
2014-11-12 15:03:15.744: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter auth rep (9) failed with rc 21
2014-11-12 15:03:15.744: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter diagwait (14) failed with rc 21
2014-11-12 15:03:19.857: [ CSSD][2358634240]clssscGetParameterProfile: profile fetch failed for parameter ocrid (4) with return code 5
2014-11-12 15:03:20.438: [ CLSF][2340329216]checksum failed for disk:/dev/asm_data_10G_disk3:
2014-11-12 15:03:20.439: [ CLSF][2340329216]checksum failed for disk:/dev/asm_data_10G_disk1:
2014-11-12 15:03:20.439: [ CLSF][2340329216]checksum failed for disk:/dev/asm_data_10G_disk2:
2014-11-12 15:03:38.753: [ CSSD][2358634240]clssnmReadNodeInfo:clsu_get_private_ip_addr failed retVal:7
..
2014-11-12 14:39:14.591: [UiServer][337995520] CS(0x7fc3f00be840)set Properties ( root,0x2967c60)
2014-11-12 14:39:14.596: [ GPNP][4160411392]clsgpnpm_doconnect: [at clsgpnpm.c:1210] GIPC gipcretConnectionRefused (29) gipcConnect(ipc-ipc://GPNPD_grac41)
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnpm_doconnect: [at clsgpnpm.c:1211] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "ipc://GPNPD_grac41"
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnpm_exchange: [at clsgpnpm.c:2072] Result: (13) CLSGPNP_NO_DAEMON. Failed to connect to call url "ipc://GPNPD_grac41", msg=0x7fc40c068b70 dom=0x7fc40c36f620
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2115] Result: (13) CLSGPNP_NO_DAEMON. Error in get-profile SOAP exchange to callurl "ipc://GPNPD_grac41".
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2243] Result: (13) CLSGPNP_NO_DAEMON. Error get-profile CALL to remote "ipc://GPNPD_grac41" disco ""
2014-11-12 14:39:14.597: [ CLSINET][4160411392]failed to retrieve GPnP profile, grv 13
2014-11-12 14:39:14.597: [GIPCHDEM][4160411392]gipchaDaemonCheckInterfaces: failed to read private interface information ret 1
2014-11-12 14:39:14.617: [UiServer][337995520] CS(0x7fc3f00beeb0)set Properties ( root,0x29615f0)
2014-11-12 14:39:14.638: [UiServer][337995520] CS(0x7fc3f00bf3e0)set Properties ( root,0x296be30)
2014-11-12 14:39:14.677: [ CRSCOMM][359008000] IpcL: connection to member 9 has been removed
2014-11-12 14:39:14.677: [CLSFRAME][359008000] Removing IPC Member:{Relative|Node:0|Process:9|Type:3}
2014-11-12 14:39:14.677: [CLSFRAME][359008000] Disconnected from AGENT process: {Relative|Node:0|Process:9|Type:3}
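The gipcretConnectionRefused / CLSGPNP_NO_DAEMON messages mean the GPnP profile could not be fetched because gpnpd was not reachable over its local IPC endpoint at that time. A quick sanity check whether gpnpd is up and answering (a minimal sketch; gpnptool lfind just asks the local gpnpd daemon to respond):
[grid@grac41 ~]$ ps -ef | grep -v grep | grep gpnpd.bin
[grid@grac41 ~]$ $GRID_HOME/bin/gpnptool lfind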
Verify the GPnP setup with gpnptool and cluvfy
[grid@grac41 cluvfy]$ $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | grep Network
<gpnp:Network-Profile>
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net3" Adapter="eth3" IP="192.168.3.64" Use="cluster_interconnect"/>
</gpnp:HostNetwork>
</gpnp:Network-Profile>
--> CI (cluster_interconnect) subnet 192.168.3.64 looks wrong
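To see what subnet the interconnect NIC really lives on, check the interface at OS level (a minimal sketch; the address/prefix depends on your node - an eth3 address such as 192.168.3.101/24, for example, implies subnet 192.168.3.0, not 192.168.3.64):
[grid@grac41 ~]$ /sbin/ip addr show eth3 | grep 'inet '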
[grid@grac41 cluvfy]$ ./bin/cluvfy stage -pre crsinst -n grac41,grac42 -networks eth1:192.168.1.0:PUBLIC/eth3:192.168.3.64:cluster_interconnect
..
ERROR:
PRVG-11050 : No matching interfaces "eth3" for subnet "192.168.3.64" on nodes "grac41,grac42"
PRVF-4090 : Node connectivity failed for interface "eth3"
Fix the wrong setup by following MOS Note 1094024.1
How To Repair and Start the Clusterware if the Cluster Interconnect was Removed/Modified from GPNP (Doc ID 1094024.1)
GRID stack must be shut down on every node during the manual profile update - this cannot be done in a rolling fashion.
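If the stack is still up anywhere, stop it first on every node (run as root on each node):
[root@grac41 ~]# $GRID_HOME/bin/crsctl stop crs -f
[root@grac42 ~]# $GRID_HOME/bin/crsctl stop crs -f
[root@grac43 ~]# $GRID_HOME/bin/crsctl stop crs -f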
2.1 Ensure stack is down on all nodes
[root@grac41 gipcd]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
[root@grac42 gipcd]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
[root@grac43 gipcd]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
2.2 Backup gpnp profile directly in gpnpd cache
$ cp $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml.orig
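It is a good idea to take the same backup on the remaining nodes before anything gets propagated (the node name in the path differs per node):
$ cp $GRID_HOME/gpnp/grac42/profiles/peer/profile.xml $GRID_HOME/gpnp/grac42/profiles/peer/profile.xml.orig      ( on grac42 )
$ cp $GRID_HOME/gpnp/grac43/profiles/peer/profile.xml $GRID_HOME/gpnp/grac43/profiles/peer/profile.xml.orig      ( on grac43 )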
2.3 Get the current profile sequence SEQ, either from the gpnp profile XML (attribute <GPnP-Profile ProfileSequence="xx" ...>) or with gpnptool:
$ gpnptool getpval -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -prf_sq -o-
30
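Alternatively, pull the sequence straight out of the XML (profile.xml is usually a single long line, so grep -o keeps the output readable):
$ grep -o 'ProfileSequence="[0-9]*"' $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml
ProfileSequence="30"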
2.4 Modify the profile directly in the gpnpd cache - note: the modification makes the profile invalid until it is re-signed!
a. If cluster_interconnect information is not available in the gpnp profile, temporarily use the public network as cluster_interconnect:
gpnptool edit -p=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -o=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -ovr -prf_sq=<SEQ+1> -net0:net_use=public,cluster_interconnect
b. If cluster_interconnect information is in the gpnp profile but the interface name is wrong (sample: the cluster_interconnect adapter changes from ce2 to ce5 and the network id in the gpnp profile is "net3"):
gpnptool edit -p=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -o=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -ovr -prf_sq=<SEQ+1> -net3:net_ada=ce5
c. If cluster_interconnect information is in the gpnp profile but the subnet is wrong (sample: the cluster_interconnect subnet changes from 192.168.0.0 to 192.168.20.0 and the network id in the gpnp profile is "net2"):
gpnptool edit -p=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -o=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -ovr -prf_sq=<SEQ+1> -net2:net_ip=192.168.20.0
--> We change the subnet from 192.168.3.64 to 192.168.3.0 (case c. above)
[grid@grac41 ~]$ gpnptool edit -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -o=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -ovr -prf_sq=31 -net3:net_ip=192.168.3.0
Resulting profile written to "/u01/app/11204/grid/gpnp/grac41/profiles/peer/profile.xml".
Success.
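To confirm the sequence was bumped, re-run the getpval call from step 2.3 - it should now report 31:
[grid@grac41 ~]$ gpnptool getpval -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -prf_sq -o-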
2.5 Sign the profile directly in the gpnpd cache
[grid@grac41 ~]$ gpnptool sign -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -o=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -ovr -w=cw-fs:peer
Resulting profile written to "/u01/app/11204/grid/gpnp/grac41/profiles/peer/profile.xml".
Success.
[grid@grac41 ~]$ $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | grep Network
<gpnp:Network-Profile>
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net3" Adapter="eth3" IP="192.168.3.0" Use="cluster_interconnect"/>
</gpnp:HostNetwork>
</gpnp:Network-Profile>
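At this point the earlier cluvfy check can be repeated with the corrected subnet; it should no longer raise PRVG-11050 / PRVF-4090:
[grid@grac41 cluvfy]$ ./bin/cluvfy stage -pre crsinst -n grac41,grac42 -networks eth1:192.168.1.0:PUBLIC/eth3:192.168.3.0:cluster_interconnect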
2.6 Start the stack on the node where the editing was done. If it comes up fine, start the stack on the remote nodes.
The profile may get propagated to all or some of the other nodes of the cluster (depending on profile connectivity settings and whether mdnsd and gpnpd are running on those nodes).
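A minimal start-and-verify sequence on the edited node (as root), before touching the remote nodes:
[root@grac41 ~]# $GRID_HOME/bin/crsctl start crs
[root@grac41 ~]# $GRID_HOME/bin/crsctl check crs
[root@grac41 ~]# $GRID_HOME/bin/crsctl stat res -t -init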
2.7 Check profile on every node to ensure profile propagation
run "gpnptool get" on every node
If the profile did not propagate to a node, shut down the stack on that node, copy the new profile.xml (<GRID_HOME>/gpnp/<edited_NODENAME>/profiles/peer/profile.xml
to <GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml) from the node where you did the editing to the node in question, and restart the stack on that node.
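Assuming GRID_HOME is the same path on all nodes and the stack on the target node is down, the copy can be done with scp, e.g. from grac41 to grac42:
[grid@grac41 ~]$ scp $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml grac42:$GRID_HOME/gpnp/grac42/profiles/peer/profile.xml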
Go to step 1.7 in the previous section and start there.
Reference
- How To Repair and Start the Clusterware if the Cluster Interconnect was Removed/Modified from GPNP (Doc ID 1094024.1)