HAIP is not starting on any node: clsu_get_private_ip_addr failed retVal:7 – GRID stack down
Errors reported in ocssd.log and other clusterware daemon logs
[ CSSD][2358634240]clssnmReadNodeInfo:clsu_get_private_ip_addr failed retVal:7
[GIPCHDEM][4160411392]gipchaDaemonCheckInterfaces: failed to read private interface information ret 1
[ CLSINET][4160411392]failed to retrieve GPnP profile, grv 13
Resource status
NAME TARGET STATE SERVER STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
ora.asm ONLINE OFFLINE
ora.cluster_interconnect.haip ONLINE OFFLINE
ora.crf ONLINE ONLINE grac41
ora.crsd ONLINE OFFLINE
ora.cssd ONLINE ONLINE grac41
ora.cssdmonitor ONLINE ONLINE grac41
ora.ctssd ONLINE ONLINE grac41 OBSERVER
ora.diskmon OFFLINE OFFLINE
ora.drivers.acfs ONLINE OFFLINE
ora.evmd ONLINE INTERMEDIATE grac41
ora.gipcd ONLINE ONLINE grac41
ora.gpnpd ONLINE ONLINE grac41
ora.mdnsd ONLINE ONLINE grac41
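The status of these lower-stack (ohasd-managed) resources can be checked with crsctl stat res -t -init (the table above is a condensed view; the raw crsctl output layout differs slightly):
[grid@grac41 ~]$ $GRID_HOME/bin/crsctl stat res -t -init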
[grid@grac41 cssd]$ grep failed ocssd.log
..
2014-11-12 15:03:15.672: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
2014-11-12 15:03:15.672: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
2014-11-12 15:03:15.744: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter auth rep (9) failed with rc 21
2014-11-12 15:03:15.744: [ CSSD][2358634240]clssscGetParameterOLR: OLR fetch for parameter diagwait (14) failed with rc 21
2014-11-12 15:03:19.857: [ CSSD][2358634240]clssscGetParameterProfile: profile fetch failed for parameter ocrid (4) with return code 5
2014-11-12 15:03:20.438: [ CLSF][2340329216]checksum failed for disk:/dev/asm_data_10G_disk3:
2014-11-12 15:03:20.439: [ CLSF][2340329216]checksum failed for disk:/dev/asm_data_10G_disk1:
2014-11-12 15:03:20.439: [ CLSF][2340329216]checksum failed for disk:/dev/asm_data_10G_disk2:
2014-11-12 15:03:38.753: [ CSSD][2358634240]clssnmReadNodeInfo:clsu_get_private_ip_addr failed retVal:7
..
2014-11-12 14:39:14.591: [UiServer][337995520] CS(0x7fc3f00be840)set Properties ( root,0x2967c60)
2014-11-12 14:39:14.596: [ GPNP][4160411392]clsgpnpm_doconnect: [at clsgpnpm.c:1210] GIPC gipcretConnectionRefused (29) gipcConnect(ipc-ipc://GPNPD_grac41)
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnpm_doconnect: [at clsgpnpm.c:1211] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "ipc://GPNPD_grac41"
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnpm_exchange: [at clsgpnpm.c:2072] Result: (13) CLSGPNP_NO_DAEMON. Failed to connect to call url "ipc://GPNPD_grac41", msg=0x7fc40c068b70 dom=0x7fc40c36f620
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2115] Result: (13) CLSGPNP_NO_DAEMON. Error in get-profile SOAP exchange to callurl "ipc://GPNPD_grac41".
2014-11-12 14:39:14.597: [ GPNP][4160411392]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2243] Result: (13) CLSGPNP_NO_DAEMON. Error get-profile CALL to remote "ipc://GPNPD_grac41" disco ""
2014-11-12 14:39:14.597: [ CLSINET][4160411392]failed to retrieve GPnP profile, grv 13
2014-11-12 14:39:14.597: [GIPCHDEM][4160411392]gipchaDaemonCheckInterfaces: failed to read private interface information ret 1
2014-11-12 14:39:14.617: [UiServer][337995520] CS(0x7fc3f00beeb0)set Properties ( root,0x29615f0)
2014-11-12 14:39:14.638: [UiServer][337995520] CS(0x7fc3f00bf3e0)set Properties ( root,0x296be30)
2014-11-12 14:39:14.677: [ CRSCOMM][359008000] IpcL: connection to member 9 has been removed
2014-11-12 14:39:14.677: [CLSFRAME][359008000] Removing IPC Member:{Relative|Node:0|Process:9|Type:3}
2014-11-12 14:39:14.677: [CLSFRAME][359008000] Disconnected from AGENT process: {Relative|Node:0|Process:9|Type:3}
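The gipcretConnectionRefused / CLSGPNP_NO_DAEMON messages mean the GPnP profile could not be fetched because gpnpd was not reachable over its local IPC endpoint at that time. A quick sanity check whether gpnpd is up and answering (a minimal sketch; gpnptool lfind just asks the local gpnpd daemon to respond):
[grid@grac41 ~]$ ps -ef | grep -v grep | grep gpnpd.bin
[grid@grac41 ~]$ $GRID_HOME/bin/gpnptool lfind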
Verify the GPnP setup with gpnptool and cluvfy
[grid@grac41 cluvfy]$ $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | grep Network
<gpnp:Network-Profile>
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net3" Adapter="eth3" IP="192.168.3.64" Use="cluster_interconnect"/>
</gpnp:HostNetwork>
</gpnp:Network-Profile>
--> CI (cluster_interconnect) subnet 192.168.3.64 looks wrong
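To see what subnet the interconnect NIC really lives on, check the interface at OS level (a minimal sketch; the address/prefix depends on your node - an eth3 address such as 192.168.3.101/24, for example, implies subnet 192.168.3.0, not 192.168.3.64):
[grid@grac41 ~]$ /sbin/ip addr show eth3 | grep 'inet '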
[grid@grac41 cluvfy]$ ./bin/cluvfy stage -pre crsinst -n grac41,grac42 -networks eth1:192.168.1.0:PUBLIC/eth3:192.168.3.64:cluster_interconnect
..
ERROR:
PRVG-11050 : No matching interfaces "eth3" for subnet "192.168.3.64" on nodes "grac41,grac42"
PRVF-4090 : Node connectivity failed for interface "eth3"
Fix the wrong setup by following MOS Note 1094024.1
How To Repair and Start the Clusterware if the Cluster Interconnect was Removed/Modified from GPNP (Doc ID 1094024.1)
GRID stack must be shut down on every node during the manual profile update - this cannot be done in a rolling fashion.
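If the stack is still up anywhere, stop it first on every node (run as root on each node):
[root@grac41 ~]# $GRID_HOME/bin/crsctl stop crs -f
[root@grac42 ~]# $GRID_HOME/bin/crsctl stop crs -f
[root@grac43 ~]# $GRID_HOME/bin/crsctl stop crs -f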
2.1 Ensure stack is down on all nodes
[root@grac41 gipcd]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
[root@grac42 gipcd]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
[root@grac43 gipcd]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
2.2 Backup gpnp profile directly in gpnpd cache
$ cp $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml.orig
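It is a good idea to take the same backup on the remaining nodes before anything gets propagated (the node name in the path differs per node):
$ cp $GRID_HOME/gpnp/grac42/profiles/peer/profile.xml $GRID_HOME/gpnp/grac42/profiles/peer/profile.xml.orig      ( on grac42 )
$ cp $GRID_HOME/gpnp/grac43/profiles/peer/profile.xml $GRID_HOME/gpnp/grac43/profiles/peer/profile.xml.orig      ( on grac43 )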
2.3 Get the current profile sequence SEQ, either from the gpnp profile XML (attribute <GPnP-Profile ProfileSequence="xx" ...>) or with gpnptool:
$ gpnptool getpval -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -prf_sq -o-
30
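Alternatively, pull the sequence straight out of the XML (profile.xml is usually a single long line, so grep -o keeps the output readable):
$ grep -o 'ProfileSequence="[0-9]*"' $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml
ProfileSequence="30"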
2.4 Modify the profile directly in the gpnpd cache - note: the modification makes the profile invalid until it is re-signed!
a. If cluster_interconnect information is not available in the gpnp profile, temporarily use the public network as cluster_interconnect:
gpnptool edit -p=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -o=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -ovr -prf_sq=<SEQ+1> -net0:net_use=public,cluster_interconnect
b. If cluster_interconnect information is in the gpnp profile but the interface name is wrong (sample: the cluster_interconnect adapter changes from ce2 to ce5 and the network id in the gpnp profile is "net3"):
gpnptool edit -p=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -o=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -ovr -prf_sq=<SEQ+1> -net3:net_ada=ce5
c. If cluster_interconnect information is in the gpnp profile but the subnet is wrong (sample: the cluster_interconnect subnet changes from 192.168.0.0 to 192.168.20.0 and the network id in the gpnp profile is "net2"):
gpnptool edit -p=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -o=<GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml -ovr -prf_sq=<SEQ+1> -net2:net_ip=192.168.20.0
--> We change the subnet from 192.168.3.64 to 192.168.3.0 (case c. above)
[grid@grac41 ~]$ gpnptool edit -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -o=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -ovr -prf_sq=31 -net3:net_ip=192.168.3.0
Resulting profile written to "/u01/app/11204/grid/gpnp/grac41/profiles/peer/profile.xml".
Success.
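To confirm the sequence was bumped, re-run the getpval call from step 2.3 - it should now report 31:
[grid@grac41 ~]$ gpnptool getpval -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -prf_sq -o-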
2.5 Sign the profile directly in the gpnpd cache
[grid@grac41 ~]$ gpnptool sign -p=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -o=$GRID_HOME/gpnp/grac41/profiles/peer/profile.xml -ovr -w=cw-fs:peer
Resulting profile written to "/u01/app/11204/grid/gpnp/grac41/profiles/peer/profile.xml".
Success.
[grid@grac41 ~]$ $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | grep Network
<gpnp:Network-Profile>
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net3" Adapter="eth3" IP="192.168.3.0" Use="cluster_interconnect"/>
</gpnp:HostNetwork>
</gpnp:Network-Profile>
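At this point the earlier cluvfy check can be repeated with the corrected subnet; it should no longer raise PRVG-11050 / PRVF-4090:
[grid@grac41 cluvfy]$ ./bin/cluvfy stage -pre crsinst -n grac41,grac42 -networks eth1:192.168.1.0:PUBLIC/eth3:192.168.3.0:cluster_interconnect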
2.6 Start the stack on the node where the editing was done. If it comes up fine, start the stack on the remote nodes.
The profile may get propagated to all or some of the other nodes of the cluster (depending on profile connectivity settings and whether mdnsd and gpnpd are running on those nodes).
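A minimal start-and-verify sequence on the edited node (as root), before touching the remote nodes:
[root@grac41 ~]# $GRID_HOME/bin/crsctl start crs
[root@grac41 ~]# $GRID_HOME/bin/crsctl check crs
[root@grac41 ~]# $GRID_HOME/bin/crsctl stat res -t -init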
2.7 Check profile on every node to ensure profile propagation
run "gpnptool get" on every node
If the profile did not propagate to a node, shut down the stack on that node, copy the new profile.xml (<GRID_HOME>/gpnp/<edited_NODENAME>/profiles/peer/profile.xml
to <GRID_HOME>/gpnp/<NODENAME>/profiles/peer/profile.xml) from the node where you did the editing to the node in question, and restart the stack on that node.
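Assuming GRID_HOME is the same path on all nodes and the stack on the target node is down, the copy can be done with scp, e.g. from grac41 to grac42:
[grid@grac41 ~]$ scp $GRID_HOME/gpnp/grac41/profiles/peer/profile.xml grac42:$GRID_HOME/gpnp/grac42/profiles/peer/profile.xml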
Go to step 1.7 in the previous section and start there.
Reference
- How To Repair and Start the Clusterware if the Cluster Interconnect was Removed/Modified from GPNP (Doc ID 1094024.1)