Pitfalls when changing the Public IP address in a RAC cluster environment, with detailed debugging steps

Overview

  • Changing the PUBLIC interface in a RAC environment is not that simple. You need to take into account:
    • Nameserver changes
    • DHCP server changes, including VIPs
    • /etc/hosts changes
    • GNS VIP changes
    • PUBLIC interface changes
      #  oifcfg getif  ->  eth1  192.168.5.0  global  public
  • In any case you should read: How to Modify Public Network Information including VIP in Oracle Clusterware (Doc ID 283684.1)

If you still run into problems, here are some debugging details:

  • Note: this tutorial uses the 12.1.0.2 CW logfile structure, which simplifies using the grep command
    a lot, as all traces can be found at:  $GRID_HOME/diag/crs/hract21/crs/trace
  • Download the crsi script and run it with the watch utility while your CRS stack is booting.
    This gives you a good idea which component is failing or gets restarted and finally switches
    to status OFFLINE
  • As said again and again, cluvfy is your friend to quickly identify the root problem
  • If the network adapter info in profile.xml doesn't match the ifconfig data, GIPCD will not start (this is true for both the PUBLIC and the CLUSTER INTERCONNECT info)

In this tutorial we will debug the following scenarios by reading logfiles, running OS commands and running cluvfy:

  • Case I   : Nameserver not responding –  GIPCD not starting
  • Case II  : Different  IP address in /etc/hosts and NameServer Lookup  – GIPCD not starting
  • Case III : Wrong Cluster Interconnect Address – GIPCD not starting
  • Case IV  : DHCP server sends wrong IP address – VIPs not starting
  • Case V   : Wrong GNS VIP address – GNS not starting

Potential Errors and Error types

In general we have 2 types of network related errors

  • OS related errors ( either the bind() or the getaddrinfo() call was failing )

    • If you want to find GIPCD related errors between 2015-02-03 12:00:00 and 2015-02-03 12:09:59 you may run :     $ grep "2015-02-03 12:0" *  | grep " slos "
    • In this tutorial we handle bind() and getaddrinfo() failures, but you may check your traces for
      send(), recv(), listen() and connect() system call failures too !
    • Note: only GIPCD prints OS errors with an slos printout like :  slos loc :  getaddrinfo
    • For other components like the MDNSD daemon you may grep your CW traces
      for the error strings: "Address already in use", "Connection timed out", "Cannot assign requested address"
  • Logical Errors
    • These are not easy to debug, as we need to read and understand the CW logs in more detail.
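The two grep approaches above can be combined into one small helper. This is only a sketch: scan_os_errors is a hypothetical function name (not an Oracle tool), and it assumes your trace lines start with a timestamp in the same format as the 12.1.0.2 traces shown in this tutorial.

```shell
# Sketch: list OS-level network errors from Clusterware traces for one time prefix.
# scan_os_errors is a hypothetical helper name, not part of any Oracle tooling.
#   $1   = timestamp prefix as it appears in the trace, e.g. "2015-02-03 12:0"
#   $2.. = trace files to search
scan_os_errors() {
    ts_prefix=$1
    shift
    # -H always prints the file name, even if only one file is passed.
    grep -H -- "$ts_prefix" "$@" |
        grep -E ' slos |Address already in use|Connection timed out|Cannot assign requested address'
}
```

A possible call from the trace directory would be:  scan_os_errors "2015-02-03 12:0" *.trc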

Error Details

Error I :  Name Server related Errors – getaddrinfo () was failing

 OS system call:  getaddrinfo() is failing with errno 110:   Error Connection timed out (110)
 --> see Case I
 Search all CW traces with TS 2015-02-03 09:20:00 --> 2015-02-03 09:29:59 for the failed OS call: getaddrinfo
 [grid@hract21 trace]$  grep "2015-02-03 09:2" *  | grep " getaddrinfo"
 gipcd_2.trc:2015-02-03 09:20:09.946273 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo(
 gipcd_2.trc:2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo

Error II : bind() fails as the local IP address is not available on your system ( verify with ifconfig )

OS system call:  bind () is failing with errno 99 : Error: Cannot assign requested address (99)
 --> see Case II,III
 Search all CW traces with TS 2015-02-03 15:30:00 --> 2015-02-03 15:39:59 for the failed OS call: bind
 [grid@hract21 trace]$  grep "2015-02-03 15:3" *  | grep " bind"
 gipcd_2.trc:2015-02-03 15:34:47.898380 :GIPCXCPT:2106038016:  gipcmodNetworkProcessBind: slos loc :  bind
 gipcd_2.trc:2015-02-03 16:39:43.587972 :GIPCXCPT:1288218368:  gipcmodNetworkProcessBind: slos loc :  bind

--> If the OS system call bind() is failing with errno 98 : Address already in use (98)
please read :
Troubleshooting Clusterware and Clusterware component error : Address already in use
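
As a quick reference, the errno values discussed above can be mapped to the matching case with a tiny lookup. This is just a sketch: classify_errno is a hypothetical helper name, and the mapping covers only the three errno values mentioned in this tutorial.

```shell
# Sketch: map the errno printed in a "slos dep" trace line to the case that covers it.
# classify_errno is a hypothetical helper; only errno 110, 99 and 98 are handled.
classify_errno() {
    case "$1" in
        110) echo "getaddrinfo timed out: check the nameserver, see Case I" ;;
        99)  echo "bind address not available: check ifconfig, /etc/hosts and profile.xml, see Case II and III" ;;
        98)  echo "bind address already in use: see the separate troubleshooting note" ;;
        *)   echo "errno $1 is not covered in this tutorial" ;;
    esac
}
```

Example:  classify_errno 99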

Error III: Logical Errors ( not related to OS errors )

  • Wrong DHCP Server response : see Case IV
  • Wrong GNS Server address     : see Case V

Case I:  Nameserver not responding –  GIPCD not starting

[root@hract21 Desktop]#  watch crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> ora.gipcd is in state OFFLINE, ora.evmd is in state INTERMEDIATE

As GIPCD doesn't come up, review the tracefile gipcd.trc :
2015-02-03 09:20:14.952363 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos op  :  sgipcnPopulateAddrInfo
2015-02-03 09:20:14.952373 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos dep :  Connection timed out (110)
2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo(
2015-02-03 09:20:14.952391 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos info:  server not available,try again
2015-02-03 09:20:14.952455 :GIPCXCPT:2157598464:  gipcResolveF [gipcInternalBind : gipcInternal.c : 537]: EXCEPTION[ ret gipcretFail (1) ]  failed to resolve address 0x7f035c033c10 [0000000000000311] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x4000
2015-02-03 09:20:14.952486 :GIPCXCPT:2157598464:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretFail (1) ]  failed to bind endp 0x7f035c033070 [000000000000030f] { gipcEndpoint : localAddr 'tcp://hract21.example.com', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x40008000, flags-2 0x0, usrFlags 0x240a0 }, addr 0x7f035c034890 [0000000000000316] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x200a0
2015-02-03 09:20:14.952552 :GIPCXCPT:2157598464:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretFail (1)
--> The getaddrinfo() call is failing -> Nameserver lookup issue
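
The four slos lines (op, dep, loc, info) are easier to read when they are collapsed into one line per error. The following is a minimal sketch that assumes the slos lines always appear in the order shown above and end with the info line; summarize_slos is a hypothetical helper name.

```shell
# Sketch: collapse each "slos op/dep/loc/info" group of a GIPCD trace into one line.
# Reads trace lines on stdin; assumes the group ends with the "slos info" line.
summarize_slos() {
    awk '
        / slos (op|dep|loc|info) *:/ {
            # key = op, dep, loc or info
            key = $0; sub(/.* slos /, "", key); sub(/ *:.*/, "", key)
            # val = everything after "slos <key> :"
            val = $0; sub(/.* slos [a-z]+ *: */, "", val)
            field[key] = val
            if (key == "info") {
                printf "%s %s | op=%s | dep=%s | loc=%s | info=%s\n",
                       $1, $2, field["op"], field["dep"], field["loc"], field["info"]
                delete field["op"]; delete field["dep"]
                delete field["loc"]; delete field["info"]
            }
        }
    '
}
```

Example:  grep "2015-02-03 09:2" gipcd.trc | summarize_slos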

Verify Error with OS commands
[grid@hract21 trace]$  nslookup hract21
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

Verify Error with cluvfy 
[grid@hract21 CLUVFY]$  cluvfy comp nodeapp -n hract21
PRVF-0002 : could not retrieve local node name

Fix -> Verify the Nameserver is up and running 
1) Is your nameserver running ?
[root@ns1 ~]# service named status
version: 9.9.3-RedHat-9.9.3-P1.el6
CPUs found: 4
worker threads: 4
UDP listeners per interface: 4
number of zones: 101
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid  9193) is running...

2) Can you ping your nameserver ?
[oracle@hract21 JAVA]$ ping ns1.example.com
PING ns1.example.com (192.168.5.50) 56(84) bytes of data.
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=2 ttl=64 time=0.293 ms

3) Verify that the nameserver is listening on the required IP address and port
[root@ns1 ~]# netstat -auen  | grep ":53 "
udp        0      0 192.168.5.50:53             0.0.0.0:*                               25         56734      
udp        0      0 127.0.0.1:53                0.0.0.0:*                               25         56732  
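
Step 3 can be turned into a scripted check. This sketch assumes the column layout of the netstat output shown above (local address in the 4th column); listens_on_53 is a hypothetical helper name.

```shell
# Sketch: succeed if the netstat output on stdin shows a UDP listener on <IP>:53.
#   $1 = IP address the nameserver is expected to listen on
listens_on_53() {
    awk -v want="$1:53" '
        $1 ~ /^udp/ && $4 == want { found = 1 }
        END { exit !found }
    '
}
```

Example, run on the nameserver:  netstat -auen | listens_on_53 192.168.5.50 && echo "named is listening"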

Case II  : Different  IP address in /etc/hosts and NameServer Lookup – GIPCD not starting

*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> CSSD and GIPCD remain OFFLINE: STATE_DETAILS switches from STABLE to STARTING but the resources don't come up

gipcd.trc:
2015-02-03 15:35:02.928327 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-02-03 15:35:02.928333 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos dep :  Cannot assign requested address (99)
2015-02-03 15:35:02.928337 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos loc :  bind
2015-02-03 15:35:02.928342 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos info:  addr '192.168.6.121:0'
2015-02-03 15:35:02.928391 :GIPCXCPT:937420544:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  failed to bind endp 0x7f4624027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.6.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7f4624033be0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7f4624033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 15:35:02.928405 :GIPCXCPT:937420544:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928419 :GIPCXCPT:937420544:  gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928429 :GIPCHDEM:937420544:  gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  terminating daemon thread due to exception
2015-02-03 15:35:02.928455 :GIPCXCPT:1281627904:  gipchaInternalRegister: daemon thread state invalid gipchaThreadStateFailed (5), ret gipcretFail (1)
2015-02-03 15:35:02.928477 :GIPCHGEN:1281627904:  gipchaRegisterF [gipchaInternalResolve : gipchaInternal.c : 1204]: EXCEPTION[ ret gipcretFail (1) ]  failed to register ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, name '(null)', flags 0x4000
2015-02-03 15:35:02.928544 :GIPCHGEN:1281627904:  gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 863]: EXCEPTION[ ret gipcretFail (1) ]  failed to resolve ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, host 'hract21', port 'gipcdha_hract21_', flags 0x0
2015-02-03 15:35:02.928569 :GIPCXCPT:1281627904:  gipcInternalResolve: failed to resolve addr 0x7f4638099680 [000000000000016a] { gipcAddress : name 'gipcha://hract21:gipcdha_hract21_', objFlags 0x0, addrFlags 0x4 }, ret gipcretFail (1)
 
Verify Error with OS commands
[grid@hract21 trace]$ nslookup hract21
Server:        192.168.5.50
Address:    192.168.5.50#53
Name:    hract21.example.com
Address: 192.168.5.121

[grid@hract21 trace]$ ping hract21
PING hract21 (192.168.6.121) 56(84) bytes of data.
--> Oops, why two different results for nslookup and ping ?
Verify the IP address in /etc/hosts
[grid@hract21 trace]$ grep hract21 /etc/hosts
192.168.6.121 hract21 hract21.example.com

Verify Error with cluvfy  
[grid@hract21 CLUVFY]$ cluvfy comp nodereach -n  hract21
Verifying node reachability 
Checking node reachability...
PRVF-6006 : unable to reach the IP addresses "hract21" from the local node
PRKC-1071 : Nodes "hract21" did not respond to ping in "3" seconds, 
PRKN-1035 : Host "hract21" is unreachable
Verification of node reachability was unsuccessful on all the specified nodes. 

-> Fix : Keep your /etc/hosts and your BIND server in sync
         When changing the BIND server configuration, always verify the entries in /etc/hosts too
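
The comparison of nslookup and /etc/hosts done above by hand can be scripted. This is a minimal sketch: all three function names are hypothetical, and dns_ip assumes the nslookup output format shown in this case (a "Name:" line followed by an "Address:" line).

```shell
# Sketch: compare the address of a host name in /etc/hosts with the DNS answer.

# hosts_ip NAME [FILE]: print the first address that FILE maps to NAME.
hosts_ip() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# dns_ip: read nslookup output on stdin, print the address of the answer section.
dns_ip() {
    awk '/^Name:/ { seen = 1; next } seen && /^Address/ { print $2; exit }'
}

# compare_hosts_and_dns NAME [FILE]: return 1 if both sources disagree.
# An empty value means the name was not found in that source.
compare_hosts_and_dns() {
    from_hosts=$(hosts_ip "$1" "${2:-/etc/hosts}")
    from_dns=$(nslookup "$1" | dns_ip)
    if [ "$from_hosts" = "$from_dns" ]; then
        echo "OK: $1 -> $from_hosts"
    else
        echo "MISMATCH for $1: hosts file says '$from_hosts', DNS says '$from_dns'"
        return 1
    fi
}
```

Example:  compare_hosts_and_dns hract21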

 

Case III : Wrong Cluster Interconnect Address – GIPCD not starting

[root@hract21 Desktop]#  watch crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      hract21         STARTING
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    INTERMEDIATE hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> GPNPD remains in status INTERMEDIATE GIPCD is in state OFFLINE

gipcd.trc:
2015-02-03 16:39:18.324221 :GIPCHDEM:20907776:  gipchaDaemonThread: starting daemon thread hctx 0x22d39b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'df31173e-00000000', name2 02ff-37da-c08f-50b4, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xcd60 }
2015-02-03 16:39:23.327691 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc032310 [0000000000000308] { gipcAddress : name 'tcp://192.168.5.121', objFlags 0x0, addrFlags 0x5 }
2015-02-03 16:39:23.327721 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-02-03 16:39:23.327727 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos dep :  Cannot assign requested address (99)
2015-02-03 16:39:23.327732 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos loc :  bind
2015-02-03 16:39:23.327736 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.121:0'
2015-02-03 16:39:23.327806 :GIPCXCPT:20907776:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 16:39:23.327823 :GIPCXCPT:20907776:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327838 :GIPCXCPT:20907776:  gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327851 :GIPCHDEM:20907776:  gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  terminating daemon thread due to exception
2015-02-03 16:39:23.327943 : GIPCNET:20907776:  gipcmodNetworkUnprepare: failed to unprepare waits for endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x8, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x26008000, flags-2 0x0, usrFlags 0x20020 }
--> Here the bind() system call fails with errno 99, which means the IP address 192.168.5.121 is not available on this system !
[root@hract21 Desktop]# cat /usr/include/asm-generic/errno.h | grep 99
#define    EADDRNOTAVAIL    99    /* Cannot assign requested address */

Verify Error with OS commands:
[root@hract21 Desktop]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.6.121  Bcast:192.168.6.255  Mask:255.255.255.0
[root@hract21 Desktop]#  ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 08:00:27:4E:C9:BF  
          inet addr:192.168.2.121  Bcast:192.168.2.255  Mask:255.255.255.0
[root@hract21 Desktop]#   $GRID_HOME/bin/gpnptool get 2>/dev/null  |  xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
--> GPnPD expects PUBLIC interface eth1 to be bound to IP address 192.168.5.121 ( network 192.168.5.0 ) and not to 192.168.6.121
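
The mismatch between the ifconfig address and the network stored in the GPnP profile can also be checked with plain shell arithmetic. This is only a sketch: ip_to_int and in_subnet are hypothetical helper names, and the netmask 255.255.255.0 is taken from the ifconfig output above, as the profile line shows only the network address.

```shell
# Sketch: test whether an interface address belongs to the network from the GPnP profile.

# ip_to_int A.B.C.D: print the IPv4 address as one integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet IP NETWORK MASK: succeed if IP lies inside NETWORK/MASK.
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

Example:  in_subnet 192.168.6.121 192.168.5.0 255.255.255.0 || echo "eth1 does not match the profile"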

Verify Error with cluvfy:
[grid@hract21 CLUVFY]$  cluvfy comp gpnp -n hract21
Verifying GPNP integrity 
--> cluvfy comp gpnp hangs 

Fix: Change interface eth1 back to 192.168.5.121 and restart the cluster stack

 

Case IV   :  DHCP server returns wrong IP address – VIPs not starting

Typical causes:
  • Multiple DHCP servers
  • DHCP server not available
The lower CRS stack starts:
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    ONLINE       hract21         STABLE
ora.cluster_interconnect.haip  1   ONLINE    ONLINE       hract21         STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    ONLINE       hract21         OBSERVER,STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    ONLINE       hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    ONLINE       hract21         STABLE
--> Lower CRS stack is up and running 

VIPs are in state STARTING
ora.hract21.vip                1   ONLINE       OFFLINE      hract21         STARTING  
ora.hract22.vip                1   ONLINE       ONLINE       hract22         STABLE  
ora.hract23.vip                1   ONLINE       ONLINE       hract23         STABLE  
ora.mgmtdb                     1   ONLINE       ONLINE       hract23         Open,STABLE  
ora.oc4j                       1   ONLINE       ONLINE       hract22         STABLE  
ora.scan1.vip                  1   ONLINE       OFFLINE      hract21         STARTING 

crsd_orarootagent_root.trc
2015-02-03 12:06:42.065910 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP client id = hract21-vip
2015-02-03 12:06:42.065929 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP Server Port = 67
2015-02-03 12:06:42.065940 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet from = 192.168.5.121
2015-02-03 12:06:42.065949 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet to = 255.255.255.255
2015-02-03 12:06:47.068966 :GIPCXCPT:2822174464:  gipcWaitF [clsdhcp_sendmessage : clsdhcp.c : 616]: 
       EXCEPTION[ ret (uknown) (910) ]  failed to wait on obj 0x7fcb8c04d770 [0000000000000ddf]
      { gipcEndpoint : localAddr 'udp://0.0.0.0:68', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, 
     objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fcb8c037e70, sendp 0x7fcb8c037cb0 status 13flags 0x20000002, flags-2 0x0, usrFlags 0x8000 }, reqList 0x7fcba8364658, nreq 1, creq 0x7fcba8364b20 timeout 5000 ms, flags 0x4000
--> After sending a DHCP request we fail in gipcWaitF, which means we have trouble contacting our DHCP server
    or getting the required DHCP address

Verify Error with OS commands
Download and install dhcping:
Download location:  http://pkgs.repoforge.org/dhcping  package : dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# rpm -i  /media/sf_kits/Linux/dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# dhcping -i eth1
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0 
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
no answer
--> Here we see that the answer comes from a wrong DHCP server address ( 192.168.3.50 )
[root@ns1 dhcp]# dhcping -h   08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
no answer
--> This confirms that our DHCP server is running on the wrong IP address ( 192.168.3.50 ) and
    cannot serve a DHCP request for a 192.168.5.xx address

Working dhcping output - just for reference :
[root@hract21 Desktop]#  dhcping -h   08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
Got answer from: 192.168.5.50

Verify Error with cluvfy  commands
[root@hract21 CLUVFY]#  cluvfy comp dhcp -clustername ract2 -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
PRVG-5726 : Failed to discover DHCP servers on public network listening on port "67" using command "/u01/app/121/grid/bin/crsctl discover dhcp -clientid ract2-scan1-vip "
CRS-10010: unable to discover DHCP server in the network listening on port 67 for client ID ract2-scan1-vip
CRS-4000: Command discover failed, or completed with errors.
PRVF-5704 : No DHCP server were discovered on the public network listening on port 67
Verification of DHCP Check was unsuccessful on all the specified nodes. 

Additional info about the DHCP setup
- I always used to look at /etc/dhcpd.conf, which is wrong. Use the /etc/dhcp/dhcpd.conf file instead !
- Note: if you change /etc/dhcp/dhcpd.conf you may need to change /etc/sysconfig/dhcpd too
DHCP config files:
/etc/dhcp/dhcpd.conf 
/etc/sysconfig/dhcpd

 

Case V   : Wrong GNS VIP address – GNS not starting

[root@hract21 network-scripts]#  watch 'crs | grep gns'
ora.gns                        1   ONLINE       OFFLINE      -               STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE
-> GNS VIP is ONLINE but GNS doesn't start

gnsd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
    CLSB:489064000: Argument count (argc) for this daemon is 7
    CLSB:489064000: Argument 0 is: /u01/app/121/grid/bin/gnsd.bin
    CLSB:489064000: Argument 1 is: -trace-level
    CLSB:489064000: Argument 2 is: 1
    CLSB:489064000: Argument 3 is: -ip-address
    CLSB:489064000: Argument 4 is: 192.168.6.58
    CLSB:489064000: Argument 5 is: -startup-endpoint
    CLSB:489064000: Argument 6 is: ipc://GNS_hract21_4625_9fe54b1833d5fbd2
2015-02-03 17:29:15.339039 :   CLSNS:489064000: main::clsns_SetTraceLevel:trace level set to 1.
2015-02-03 17:29:16.226261 :     GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226283 :     GNS:489064000: main::clsgndmain: GNS starting on hract21. Process ID: 29196
2015-02-03 17:29:16.226299 :     GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226338 :     GNS:489064000: main::clsgnSetTraceLevel: trace level set to 1.
..
2015-02-03 17:29:17.490335 :     GNS:489064000: main::clsgndGetInstanceInfo: version: 12.1.0.2.0 (0xc100200) 
                                 endpoints: tcp://192.168.6.58:63806 process ID: "29196" state: "Initializing".
2015-02-03 17:29:17.491219 :     GNS:489064000: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
2015-02-03 17:29:17.496441 :     GNS:349841152: Resolve::clsgndnsCreateContainerCallback: listening on port 53 address "192.168.6.58"
2015-02-03 17:29:17.499552 :  CLSDMT:351942400: PID for the Process [29196], connkey 12
2015-02-03 17:29:17.505626 :     GNS:343537408: Command #0::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.512072 :     GNS:4160747264: Command #1::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.516675 :     GNS:4156544768: Command #2::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.518326 :     GNS:4154443520: Command #3::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.747693 :     GNS:4152342272: Self-check::clsgndscRun: Name: "GNSTESTHOST.grid12c.example.com" Address: 1.2.3.4.
2015-02-03 17:29:53.882538 :     GNS:351942400: main::clsgndCLSDMExit: CLSDM request to quit received - requester: agent.
2015-02-03 17:29:53.882610 :     GNS:351942400: main::clsgndCLSDMExit: terminating GNSD on behalf of CLSDM - requester: agent.
--> Here we have a problem, as GNSD was terminated on request of the agent

crsd_orarootagent_root.trc:
2015-02-03 17:29:24.470729 :   CLSNS:292816640: main::clsnsgFind:(:CLSNS00230:):query to find 
     GNS using service name "_Oracle-GNS._tcp" failed.: 1: clskec:has:CLSNS:5 3 args[has:CLSNS:5][mod=clsns_DNSSD_FindServers][loc=(:CLSNS00152:)]
2015-02-03 17:29:24.470771 :     
     GNS:292816640: main::clsgnctrGetGNSAddressUsingCLSNS: (:CLSGN01053:) GNS address retrieval failed with 
     error CLSNS-00025 (GNS_SERV_FIND_FAIL) - throwing CLSGN-00070. 1: clskec:has:CLSNS:25 3 args[has:CLSNS:25][mod=clsnsgFind][loc=(:CLSNS00216:)]

Verify Error with OS commands:
Check the GNS configuration
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.6.58
Domain served by GNS: grid12c.example.com
Check the PUBLIC network interface 
[root@hract21 network-scripts]# ifconfig
eth1:1    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.156  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.157  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:3    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.153  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:4    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.151  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:5    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.152  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:6    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.6.58  Bcast:192.168.6.255  Mask:255.255.255.0
-->  VIPs are using 192.168.5.X as base address whereas our GNS VIP is using: 192.168.6.58
     This is not correct: the VIPs and the GNS VIP should have the same network address !

Let's check whether somebody changed the GNS base address:
[grid@hract21 trace]$ grep clsgndadvAdvertise gnsd.trc
2015-02-02 12:32:09.447471 : GNS:3141969472: main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.5.58:46453.
2015-02-03 17:22:00.410829 : GNS:4114409024: main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.5.58:25702.
2015-02-03 17:24:51.165609 : GNS:2221307456: main::clsgndadvAdvertise: 
                              Listening for commands on endpoint(s):tcp://192.168.6.58:27105.
2015-02-03 17:29:17.491219 : GNS:489064000:  main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
--> GNS base address was changed from  192.168.5.58 to 192.168.6.58 ! 
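
The same history can be reduced to the sequence of distinct listening addresses. A minimal sketch, assuming the "Listening for commands on endpoint(s): tcp://IP:port" message format shown above; gns_listen_addresses is a hypothetical helper name.

```shell
# Sketch: print each GNS listening address once per change, in trace order.
# Reads gnsd.trc lines on stdin; consecutive repeats of an address are collapsed.
gns_listen_addresses() {
    sed -n 's|.*endpoint(s): *tcp://\([0-9.]*\):.*|\1|p' | uniq
}
```

Example:  gns_listen_addresses < gnsd.trc
More than one line of output means the GNS address was changed at some point.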

Verify Error with cluvfy
[grid@hract21 CLUVFY]$  cluvfy comp gns -postcrsinst  -verbose
Verifying GNS integrity 
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid12c.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
PRVF-5213 : GNS resource configuration check failed
PRCI-1156 : The GNS VIP 192.168.6.58 does not match any of the available subnets 192.168.5.0, 192.168.2.0.
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.6.58" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid12c.example.com" are reachable
WARNING: 
PRVF-5218 : "hract21-vip.grid12c.example.com" did not resolve into any IP address
PRVF-5827 : The response time for name lookup for name "hract21-vip.grid12c.example.com" exceeded 15 seconds
Checking status of GNS resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  hract21       no                        yes                     
  hract22       no                        yes                     
  hract23       no                        yes                     
PRVF-5211 : GNS resource is not running on any node of the cluster
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  hract21       yes                       yes                     
  hract22       no                        yes                     
  hract23       no                        yes                     
GNS integrity check failed
Verification of GNS integrity was unsuccessful. 
Checks did not pass for the following node(s):
    hract21
--> Cluvfy is very helpful here, as it compares the network addresses with the GNS address.
    If the GNS and network addresses don't match, cluvfy throws the PRVF-5213 and PRCI-1156 errors.

Fix -> Change the GNS VIP back to the original address and restart GNS
[root@hract21 network-scripts]# srvctl modify gns -vip 192.168.5.58
[root@hract21 network-scripts]# srvctl config gns 
  GNS is enabled.
  GNS VIP addresses: 192.168.5.58
  Domain served by GNS: grid12c.example.com
[root@hract21 network-scripts]# srvctl start gns
[root@hract21 network-scripts]# srvctl config gns -a -l
  GNS is enabled.
  GNS is listening for DNS server requests on port 53
  GNS is using port 5353 to connect to mDNS
  GNS status: OK
  Domain served by GNS: grid12c.example.com
  GNS version: 12.1.0.2.0
  Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
  Name of the cluster where GNS is running: ract2
  Cluster type: server.
  GNS log level: 1.
  GNS listening addresses: tcp://192.168.5.58:30218.
  GNS is individually enabled on nodes: 
  GNS is individually disabled on nodes: 
