Pitfalls when changing the public IP address in a RAC cluster environment, with detailed debugging steps

Overview

  • Changing the PUBLIC interface in a RAC environment is not trivial; you need to take into account the following (a command sketch follows this list):
    • Nameserver changes
    • DHCP server changes including VIPs
    • /etc/hosts changes
    • GNS VIP changes
    • PUBLIC interface changes
      #  oifcfg getif  ->  eth1  192.168.5.0  global  public
  • In any case you should read : How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)
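A rough command sketch of the public-network part (assumptions: the public network moves from 192.168.5.0 to 192.168.6.0 on eth1; this is only an outline - verify the exact, supported procedure for your release on My Oracle Support before running anything):

[grid@hract21 ~]$ oifcfg getif                                       # record the current definition
[grid@hract21 ~]$ oifcfg setif -global eth1/192.168.6.0:public       # register the new public subnet
[grid@hract21 ~]$ oifcfg delif -global eth1/192.168.5.0              # drop the old definition
[root@hract21 ~]# srvctl modify network -subnet 192.168.6.0/255.255.255.0/eth1   # adjust the network resource used by the VIPs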

If you still run into problems, here are some debugging details:

  • Note: this tutorial uses the 12.1.0.2 CW logfile structure, which simplifies using the grep command
    a lot, as all traces can be found at:  $GRID_HOME/diag/crs/hract21/crs/trace
  • Download the crsi script and run it with the watch utility while booting your CRS stack.
    This gives you a good idea which component is failing or gets restarted and finally switches
    to status OFFLINE.
  • As said again and again: cluvfy is your friend to quickly identify the root problem.
  • If the network adapter info in profile.xml doesn't match the ifconfig data, GIPCD will not start
    (this is true for both the PUBLIC and the CLUSTER_INTERCONNECT info) - see the sketch after this list.
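A quick consistency check - a sketch using the same gpnptool and ifconfig calls that are shown later in this post:

[grid@hract21 ~]$ $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | grep 'gpnp:Network id'
[grid@hract21 ~]$ ifconfig eth1 | grep 'inet addr'      # public interface
[grid@hract21 ~]$ ifconfig eth2 | grep 'inet addr'      # cluster interconnect / ASM interface
--> The subnet of each adapter reported by ifconfig must match the IP attribute stored in profile.xml.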

In this tutorial we will debug the following scenarios by reading logfiles, running OS commands and running cluvfy:

  • Case I   : Nameserver not responding –  GIPCD not starting
  • Case II  : Different  IP address in /etc/hosts and NameServer Lookup  – GIPCD not starting
  • Case III : Wrong Cluster Interconnect Address – GIPCD not starting
  • Case IV  : DHCP server sends wrong IP address – VIPs not starting
  • Case V   : Wrong GNS VIP address – GNS not starting

Potential Errors and Error types

In general we have two types of network-related errors:

  • OS-related errors (either the bind() or the getaddrinfo() system call is failing)

    • If you want to find GIPCD-related errors between 2015-02-03 12:00:00 and 2015-02-03 12:09:50 you may run:   $ grep "2015-02-03 12:0" *  | grep " slos "
    • In this tutorial we handle bind() system call failures, but you should check your traces for
      send(), recv(), listen() and connect() system call failures too - a combined grep sketch follows this list.
    • Note: only GIPCD prints OS errors with an slos printout like:  slos loc :  getaddrinfo
    • For other components like the MDNSD daemon you may grep your CW traces
      for the error strings: "Address already in use", "Error Connection timed out", "Cannot assign requested address"
  • Logical errors
    • These are not easy to debug, as we need to read and understand the CW logs in more detail.
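A minimal grep sketch, assuming the 12.1.0.2 trace location used throughout this tutorial and a 10-minute incident window starting at 2015-02-03 12:00:

[grid@hract21 ~]$ cd $GRID_HOME/diag/crs/hract21/crs/trace
[grid@hract21 trace]$ grep "2015-02-03 12:0" * | grep " slos "       # OS errors reported by GIPCD
[grid@hract21 trace]$ grep "2015-02-03 12:0" * | egrep "Address already in use|Connection timed out|Cannot assign requested address"    # other daemons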

Error Details

Error I :  Name Server related Errors – getaddrinfo() was failing

 OS system call:  getaddrinfo() is failing with errno 110:   Error Connection timed out (110)
 --> see Case I
 Search all CW traces with TS 2015-02-03 09:20:00 --> 2015-02-03 09:29:59 for the failed OS call: getaddrinfo
 [grid@hract21 trace]$  grep "2015-02-03 09:2" *  | grep " getaddrinfo"
 gipcd_2.trc:2015-02-03 09:20:09.946273 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo(
 gipcd_2.trc:2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo

Error II : bind() fails as the local IP address is not available on your system (verify with ifconfig)

OS system call:  bind () is failing with errno 99 : Error: Cannot assign requested address (99)
 --> see Case II,III
 Search all CW traces with TS 2015-02-03 15:30:00 --> 2015-02-03 15:39:59 for the failed OS call: bind
 [grid@hract21 trace]$  grep "2015-02-03 15:3" *  | grep " bind"
 gipcd_2.trc:2015-02-03 15:34:47.898380 :GIPCXCPT:2106038016:  gipcmodNetworkProcessBind: slos loc :  bind
 gipcd_2.trc:2015-02-03 16:39:43.587972 :GIPCXCPT:1288218368:  gipcmodNetworkProcessBind: slos loc :  bind

--> If the OS system call bind() is failing with errno 98: Address already in use (98),
please read:
Troubleshooting Clusterware and Clusterware component error : Address already in use

Error III: Logical Errors (not related to OS errors)

  • Wrong DHCP Server response : see Case IV
  • Wrong GNS Server address     : see Case V

Case I:  Nameserver not responding –  GIPCD not starting

[root@hract21 Desktop]#  watch crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> ora.gipcd remains OFFLINE and ora.evmd is stuck in state INTERMEDIATE

As GIPCD doesn't come up, review the tracefile gipcd.trc:
2015-02-03 09:20:14.952363 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos op  :  sgipcnPopulateAddrInfo
2015-02-03 09:20:14.952373 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos dep :  Connection timed out (110)
2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo(
2015-02-03 09:20:14.952391 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos info:  server not available,try again
2015-02-03 09:20:14.952455 :GIPCXCPT:2157598464:  gipcResolveF [gipcInternalBind : gipcInternal.c : 537]: EXCEPTION[ ret gipcretFail (1) ]  failed to resolve address 0x7f035c033c10 [0000000000000311] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x4000
2015-02-03 09:20:14.952486 :GIPCXCPT:2157598464:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretFail (1) ]  failed to bind endp 0x7f035c033070 [000000000000030f] { gipcEndpoint : localAddr 'tcp://hract21.example.com', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x40008000, flags-2 0x0, usrFlags 0x240a0 }, addr 0x7f035c034890 [0000000000000316] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x200a0
2015-02-03 09:20:14.952552 :GIPCXCPT:2157598464:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretFail (1)
--> The getaddrinfo() system call is failing -> nameserver lookup issue

Verify Error with OS commands
[grid@hract21 trace]$  nslookup hract21
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

Verify Error with cluvfy 
[grid@hract21 CLUVFY]$  cluvfy comp nodeapp -n hract21
PRVF-0002 : could not retrieve local node name

Fix -> Verify the Nameserver is up and running 
1) Is your nameserver running ?
[root@ns1 ~]# service named status
version: 9.9.3-RedHat-9.9.3-P1.el6
CPUs found: 4
worker threads: 4
UDP listeners per interface: 4
number of zones: 101
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid  9193) is running...

2) Can you ping your nameserver ?
[oracle@hract21 JAVA]$ ping ns1.example.com
PING ns1.example.com (192.168.5.50) 56(84) bytes of data.
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=2 ttl=64 time=0.293 ms

3) Verify that the nameserver is listening on the required IP address and port 
[root@ns1 ~]# netstat -auen  | grep ":53 "
udp        0      0 192.168.5.50:53             0.0.0.0:*                               25         56734      
udp        0      0 127.0.0.1:53                0.0.0.0:*                               25         56732  
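If the nameserver process itself is healthy, also check the resolver setup on the cluster node - a short sketch (standard Linux resolver files assumed; dig is part of the bind-utils package):

[grid@hract21 ~]$ cat /etc/resolv.conf                               # should point to 192.168.5.50
[grid@hract21 ~]$ dig @192.168.5.50 hract21.example.com +short       # query the nameserver directly
[grid@hract21 ~]$ ping -c 2 ns1.example.com                          # basic reachability check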

Case II  : Different  IP address in /etc/hosts and NameServer Lookup – GIPCD not starting

*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssdmonitor                1   ONLINE     ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE     OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> CSSD and GIPCD remain OFFLINE - STATE_DETAILS switches from STABLE to STARTING but the resources don't come up

gipcd.trc:
2015-02-03 15:35:02.928327 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-02-03 15:35:02.928333 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos dep :  Cannot assign requested address (99)
2015-02-03 15:35:02.928337 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos loc :  bind
2015-02-03 15:35:02.928342 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos info:  addr '192.168.6.121:0'
2015-02-03 15:35:02.928391 :GIPCXCPT:937420544:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  failed to bind endp 0x7f4624027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.6.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7f4624033be0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7f4624033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 15:35:02.928405 :GIPCXCPT:937420544:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928419 :GIPCXCPT:937420544:  gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928429 :GIPCHDEM:937420544:  gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  terminating daemon thread due to exception
2015-02-03 15:35:02.928455 :GIPCXCPT:1281627904:  gipchaInternalRegister: daemon thread state invalid gipchaThreadStateFailed (5), ret gipcretFail (1)
2015-02-03 15:35:02.928477 :GIPCHGEN:1281627904:  gipchaRegisterF [gipchaInternalResolve : gipchaInternal.c : 1204]: EXCEPTION[ ret gipcretFail (1) ]  failed to register ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, name '(null)', flags 0x4000
2015-02-03 15:35:02.928544 :GIPCHGEN:1281627904:  gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 863]: EXCEPTION[ ret gipcretFail (1) ]  failed to resolve ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, host 'hract21', port 'gipcdha_hract21_', flags 0x0
2015-02-03 15:35:02.928569 :GIPCXCPT:1281627904:  gipcInternalResolve: failed to resolve addr 0x7f4638099680 [000000000000016a] { gipcAddress : name 'gipcha://hract21:gipcdha_hract21_', objFlags 0x0, addrFlags 0x4 }, ret gipcretFail (1)
 
Verify Error with OS commands
[grid@hract21 trace]$ nslookup hract21
Server:        192.168.5.50
Address:    192.168.5.50#53
Name:    hract21.example.com
Address: 192.168.5.121

[grid@hract21 trace]$ ping hract21
PING hract21 (192.168.6.121) 56(84) bytes of data.
--> Oops - why two different results for nslookup and ping?
Verify the IP address from /etc/hosts:
[grid@hract21 trace]$ grep hract21 /etc/hosts
192.168.6.121 hract21 hract21.example.com

Verify Error with cluvfy  
[grid@hract21 CLUVFY]$ cluvfy comp nodereach -n  hract21
Verifying node reachability 
Checking node reachability...
PRVF-6006 : unable to reach the IP addresses "hract21" from the local node
PRKC-1071 : Nodes "hract21" did not respond to ping in "3" seconds, 
PRKN-1035 : Host "hract21" is unreachable
Verification of node reachability was unsuccessful on all the specified nodes. 

-> Fix : Keep your /etc/hosts and your BIND server in sync.
         When changing the BIND server, always verify the change in /etc/hosts too (a sketch for checking this follows below).
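A small sketch to spot such mismatches quickly (getent honours the lookup order in /etc/nsswitch.conf):

[grid@hract21 ~]$ getent hosts hract21           # what the OS resolver actually returns
[grid@hract21 ~]$ nslookup hract21               # what the nameserver returns
[grid@hract21 ~]$ grep hract21 /etc/hosts        # what the local hosts file says
--> All three must report the same public IP address for the node.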

 

Case III : Wrong Cluster Interconnect Address – GIPCD not starting

[root@hract21 Desktop]#  watch crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      hract21         STARTING
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    INTERMEDIATE hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> GPNPD remains in status INTERMEDIATE, GIPCD is in state OFFLINE

gipcd.trc:
2015-02-03 16:39:18.324221 :GIPCHDEM:20907776:  gipchaDaemonThread: starting daemon thread hctx 0x22d39b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'df31173e-00000000', name2 02ff-37da-c08f-50b4, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xcd60 }
2015-02-03 16:39:23.327691 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc032310 [0000000000000308] { gipcAddress : name 'tcp://192.168.5.121', objFlags 0x0, addrFlags 0x5 }
2015-02-03 16:39:23.327721 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-02-03 16:39:23.327727 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos dep :  Cannot assign requested address (99)
2015-02-03 16:39:23.327732 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos loc :  bind
2015-02-03 16:39:23.327736 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.121:0'
2015-02-03 16:39:23.327806 :GIPCXCPT:20907776:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 16:39:23.327823 :GIPCXCPT:20907776:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327838 :GIPCXCPT:20907776:  gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327851 :GIPCHDEM:20907776:  gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  terminating daemon thread due to exception
2015-02-03 16:39:23.327943 : GIPCNET:20907776:  gipcmodNetworkUnprepare: failed to unprepare waits for endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x8, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x26008000, flags-2 0x0, usrFlags 0x20020 }
--> Here the bind() system call fails with errno 99, which means the IP address 192.168.5.121 is not available on this system!
[root@hract21 Desktop]# cat /usr/include/asm-generic/errno.h | grep 99
#define    EADDRNOTAVAIL    99    /* Cannot assign requested address */

Verify Error with OS commands:
[root@hract21 Desktop]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.6.121  Bcast:192.168.6.255  Mask:255.255.255.0
[root@hract21 Desktop]#  ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 08:00:27:4E:C9:BF  
          inet addr:192.168.2.121  Bcast:192.168.2.255  Mask:255.255.255.0
[root@hract21 Desktop]#   $GRID_HOME/bin/gpnptool get 2>/dev/null  |  xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
--> GPnPD expects the PUBLIC interface eth1 to be bound to IP address 192.168.5.121 and not 192.168.6.121

Verify Error with cluvfy:
[grid@hract21 CLUVFY]$  cluvfy comp gpnp -n hract21
Verifying GPNP integrity 
--> cluvfy comp gpnp hangs 

Fix: Change interface eth1 back to 192.168.5.121 and restart the cluster stack - a sketch follows.
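A sketch of the fix, assuming eth1 is configured through a standard RHEL/OEL ifcfg file:

[root@hract21 ~]# crsctl stop crs -f
[root@hract21 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth1       # set IPADDR back to 192.168.5.121
[root@hract21 ~]# ifdown eth1 && ifup eth1
[root@hract21 ~]# ifconfig eth1 | grep 'inet addr'                   # verify 192.168.5.121 is back
[root@hract21 ~]# crsctl start crs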

 

Case IV   :  DHCP server returns wrong IP address – VIPs not starting

Possible root causes:
  • Multiple DHCP servers on the network
  • DHCP server not available

The lower CRS stack starts:
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    ONLINE       hract21         STABLE
ora.cluster_interconnect.haip  1   ONLINE    ONLINE       hract21         STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    ONLINE       hract21         OBSERVER,STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    ONLINE       hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    ONLINE       hract21         STABLE
--> Lower CRS stack is up and running 

VIPs are in state STARTING:
ora.hract21.vip                1   ONLINE       OFFLINE      hract21         STARTING  
ora.hract22.vip                1   ONLINE       ONLINE       hract22         STABLE  
ora.hract23.vip                1   ONLINE       ONLINE       hract23         STABLE  
ora.mgmtdb                     1   ONLINE       ONLINE       hract23         Open,STABLE  
ora.oc4j                       1   ONLINE       ONLINE       hract22         STABLE  
ora.scan1.vip                  1   ONLINE       OFFLINE      hract21         STARTING 

crsd_orarootagent_root.trc
2015-02-03 12:06:42.065910 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP client id = hract21-vip
2015-02-03 12:06:42.065929 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP Server Port = 67
2015-02-03 12:06:42.065940 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet from = 192.168.5.121
2015-02-03 12:06:42.065949 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet to = 255.255.255.255
2015-02-03 12:06:47.068966 :GIPCXCPT:2822174464:  gipcWaitF [clsdhcp_sendmessage : clsdhcp.c : 616]: 
       EXCEPTION[ ret (uknown) (910) ]  failed to wait on obj 0x7fcb8c04d770 [0000000000000ddf]
      { gipcEndpoint : localAddr 'udp://0.0.0.0:68', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, 
     objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fcb8c037e70, sendp 0x7fcb8c037cb0 status 13flags 0x20000002, flags-2 0x0, usrFlags 0x8000 }, reqList 0x7fcba8364658, nreq 1, creq 0x7fcba8364b20 timeout 5000 ms, flags 0x4000
--> After sending a DHCP request we fail in gipcWaitF, which means we have trouble contacting our DHCP server
    or getting the required DHCP address.

Verify Error with OS commands
Download and install dhcping:
Download location: http://pkgs.repoforge.org/dhcping - package: dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# rpm -i  /media/sf_kits/Linux/dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# dhcping -i eth1
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0 
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
no answer
--> Here we see that the answer comes from an unexpected DHCP server (192.168.3.50)
[root@ns1 dhcp]# dhcping -h   08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
no answer
--> This confirms that our DHCP server is running on the wrong IP address (192.168.3.50) and
    cannot serve a DHCP request for a 192.168.5.xx address

Working dhcping output - just for reference :
[root@hract21 Desktop]#  dhcping -h   08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
Got answer from: 192.168.5.50

Verify Error with cluvfy  commands
[root@hract21 CLUVFY]#  cluvfy comp dhcp -clustername ract2 -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
PRVG-5726 : Failed to discover DHCP servers on public network listening on port "67" using command "/u01/app/121/grid/bin/crsctl discover dhcp -clientid ract2-scan1-vip "
CRS-10010: unable to discover DHCP server in the network listening on port 67 for client ID ract2-scan1-vip
CRS-4000: Command discover failed, or completed with errors.
PRVF-5704 : No DHCP server were discovered on the public network listening on port 67
Verification of DHCP Check was unsuccessful on all the specified nodes. 

Additional info about the DHCP setup
- I always look at /etc/dhcpd.conf, which is wrong - use the /etc/dhcp/dhcpd.conf file instead!
- Note: if changing /etc/dhcp/dhcpd.conf you may also need to change /etc/sysconfig/dhcpd
DHCP config files: 
/etc/dhcp/dhcpd.conf 
/etc/sysconfig/dhcpd
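For reference, a minimal /etc/dhcp/dhcpd.conf sketch for the public network used in this tutorial; the address range, router and lease times are assumptions and must be adapted to your environment:

# /etc/dhcp/dhcpd.conf - minimal sketch, example values only
subnet 192.168.5.0 netmask 255.255.255.0 {
    range 192.168.5.200 192.168.5.250;      # pool used for node VIPs and SCAN VIPs
    option routers 192.168.5.1;
    default-lease-time 21600;
    max-lease-time 43200;
}
# Restart the DHCP server after changes:  service dhcpd restart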

 

Case V   : Wrong GNS VIP address – GNS not starting

[root@hract21 network-scripts]#  watch 'crs | grep gns'
ora.gns                        1   ONLINE       OFFLINE      -               STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE
-> GNS VIP is ONLINE but GNS doesn't start

gnsd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
    CLSB:489064000: Argument count (argc) for this daemon is 7
    CLSB:489064000: Argument 0 is: /u01/app/121/grid/bin/gnsd.bin
    CLSB:489064000: Argument 1 is: -trace-level
    CLSB:489064000: Argument 2 is: 1
    CLSB:489064000: Argument 3 is: -ip-address
    CLSB:489064000: Argument 4 is: 192.168.6.58
    CLSB:489064000: Argument 5 is: -startup-endpoint
    CLSB:489064000: Argument 6 is: ipc://GNS_hract21_4625_9fe54b1833d5fbd2
2015-02-03 17:29:15.339039 :   CLSNS:489064000: main::clsns_SetTraceLevel:trace level set to 1.
2015-02-03 17:29:16.226261 :     GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226283 :     GNS:489064000: main::clsgndmain: GNS starting on hract21. Process ID: 29196
2015-02-03 17:29:16.226299 :     GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226338 :     GNS:489064000: main::clsgnSetTraceLevel: trace level set to 1.
..
2015-02-03 17:29:17.490335 :     GNS:489064000: main::clsgndGetInstanceInfo: version: 12.1.0.2.0 (0xc100200) 
                                 endpoints: tcp://192.168.6.58:63806 process ID: "29196" state: "Initializing".
2015-02-03 17:29:17.491219 :     GNS:489064000: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
2015-02-03 17:29:17.496441 :     GNS:349841152: Resolve::clsgndnsCreateContainerCallback: listening on port 53 address "192.168.6.58"
2015-02-03 17:29:17.499552 :  CLSDMT:351942400: PID for the Process [29196], connkey 12
2015-02-03 17:29:17.505626 :     GNS:343537408: Command #0::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.512072 :     GNS:4160747264: Command #1::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.516675 :     GNS:4156544768: Command #2::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.518326 :     GNS:4154443520: Command #3::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.747693 :     GNS:4152342272: Self-check::clsgndscRun: Name: "GNSTESTHOST.grid12c.example.com" Address: 1.2.3.4.
2015-02-03 17:29:53.882538 :     GNS:351942400: main::clsgndCLSDMExit: CLSDM request to quit received - requester: agent.
2015-02-03 17:29:53.882610 :     GNS:351942400: main::clsgndCLSDMExit: terminating GNSD on behalf of CLSDM - requester: agent.
--> Here we have a problem: GNSD was terminated

crsd_orarootagent_root.trc:
2015-02-03 17:29:24.470729 :   CLSNS:292816640: main::clsnsgFind:(:CLSNS00230:):query to find 
     GNS using service name "_Oracle-GNS._tcp" failed.: 1: clskec:has:CLSNS:5 3 args[has:CLSNS:5][mod=clsns_DNSSD_FindServers][loc=(:CLSNS00152:)]
2015-02-03 17:29:24.470771 :     
     GNS:292816640: main::clsgnctrGetGNSAddressUsingCLSNS: (:CLSGN01053:) GNS address retrieval failed with 
     error CLSNS-00025 (GNS_SERV_FIND_FAIL) - throwing CLSGN-00070. 1: clskec:has:CLSNS:25 3 args[has:CLSNS:25][mod=clsnsgFind][loc=(:CLSNS00216:)]

Verify Error with OS commands:
Check GNS and PUBLIC network interface 
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.6.58
Domain served by GNS: grid12c.example.com
Check the PUBLIC network interface 
[root@hract21 network-scripts]# ifconfig
eth1:1    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.156  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.157  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:3    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.153  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:4    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.151  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:5    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.152  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:6    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.6.58  Bcast:192.168.6.255  Mask:255.255.255.0
-->  The VIPs are using 192.168.5.X as their base address whereas our GNS VIP is using 192.168.6.58.
     This is not correct: the VIPs and the GNS VIP must be on the same network!

Let's investigate whether somebody changed the GNS base address:
[grid@hract21 trace]$ grep clsgndadvAdvertise gnsd.trc
2015-02-02 12:32:09.447471 : GNS:3141969472: main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.5.58:46453.
2015-02-03 17:22:00.410829 : GNS:4114409024: main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.5.58:25702.
2015-02-03 17:24:51.165609 : GNS:2221307456: main::clsgndadvAdvertise: 
                              Listening for commands on endpoint(s):tcp://192.168.6.58:27105.
2015-02-03 17:29:17.491219 : GNS:489064000:  main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
--> GNS base address was changed from  192.168.5.58 to 192.168.6.58 ! 

Verify Error with cluvfy
[grid@hract21 CLUVFY]$  cluvfy comp gns -postcrsinst  -verbose
Verifying GNS integrity 
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid12c.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
PRVF-5213 : GNS resource configuration check failed
PRCI-1156 : The GNS VIP 192.168.6.58 does not match any of the available subnets 192.168.5.0, 192.168.2.0.
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.6.58" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid12c.example.com" are reachable
WARNING: 
PRVF-5218 : "hract21-vip.grid12c.example.com" did not resolve into any IP address
PRVF-5827 : The response time for name lookup for name "hract21-vip.grid12c.example.com" exceeded 15 seconds
Checking status of GNS resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  hract21       no                        yes                     
  hract22       no                        yes                     
  hract23       no                        yes                     
PRVF-5211 : GNS resource is not running on any node of the cluster
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  hract21       yes                       yes                     
  hract22       no                        yes                     
  hract23       no                        yes                     
GNS integrity check failed
Verification of GNS integrity was unsuccessful. 
Checks did not pass for the following node(s):
    hract21
--> Cluvfy is very helpful here as it compares the network addresses with the GNS address.
    If the GNS and network addresses don't match, cluvfy throws the PRVF-5213 and PRCI-1156 errors.

Fix -> Change GNS VIP back to the original address  and restart GNS
[root@hract21 network-scripts]# srvctl modify gns -vip 192.168.5.58
[root@hract21 network-scripts]# srvctl config gns 
  GNS is enabled.
  GNS VIP addresses: 192.168.5.58
  Domain served by GNS: grid12c.example.com
[root@hract21 network-scripts]# srvctl start gns
[root@hract21 network-scripts]# srvctl config gns -a -l
  GNS is enabled.
  GNS is listening for DNS server requests on port 53
  GNS is using port 5353 to connect to mDNS
  GNS status: OK
  Domain served by GNS: grid12c.example.com
  GNS version: 12.1.0.2.0
  Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
  Name of the cluster where GNS is running: ract2
  Cluster type: server.
  GNS log level: 1.
  GNS listening addresses: tcp://192.168.5.58:30218.
  GNS is individually enabled on nodes: 
  GNS is individually disabled on nodes: 

Reference

Recreate GNS 12102

Backup profile.xml and OCR and gather data of current GNS setup

As of 12.1/11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the 
gpnp profile -  please take a backup of profile.xml on all cluster nodes before proceeding, as grid user:

[root@hract21 ~]# cd $GRID_HOME/gpnp/hract21/profiles/peer/
[root@hract21 peer]#  cp profile.xml profile.xml_bk-2-FEB-2015
[root@hract21 peer]#  ocrconfig -local -manualbackup
hract21     2015/02/02 09:04:23     /u01/app/121/grid/cdata/hract21/backup_20150202_090423.olr     0     
hract21     2015/01/30 12:40:51     /u01/app/121/grid/cdata/hract21/backup_20150130_124051.olr     0     
[root@hract21 peer]#  ocrconfig -local -showbackup
hract21     2015/02/02 09:04:23     /u01/app/121/grid/cdata/hract21/backup_20150202_090423.olr     0     
hract21     2015/01/30 12:40:51     /u01/app/121/grid/cdata/hract21/backup_20150130_124051.olr     0  
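The OLR backups above are node-local; to also back up the cluster-wide OCR (as the section title suggests), a short sketch run as root:

[root@hract21 ~]# ocrconfig -manualbackup           # writes a manual OCR backup on one of the cluster nodes
[root@hract21 ~]# ocrconfig -showbackup manual      # list the manual OCR backups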

[root@hract21 peer]# oifcfg getif
eth1  192.168.5.0  global  public
eth2  192.168.2.0  global  cluster_interconnect,asm

[root@hract21 peer]# crsctl status resource ora.gns.vip -f | grep USR_ORA_VIP
USR_ORA_VIP=192.168.5.58

[root@hract21 peer]#  ifconfig eth1 | egrep 'eth|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.121  Bcast:192.168.5.255  Mask:255.255.255.0
[root@hract21 peer]# ifconfig eth2  | egrep 'eth|inet addr'
eth2      Link encap:Ethernet  HWaddr 08:00:27:4E:C9:BF  
          inet addr:192.168.2.121  Bcast:192.168.2.255  Mask:255.255.255.0
[root@hract21 peer]#  ifconfig eth3   | egrep 'eth|inet addr'
eth3      Link encap:Ethernet  HWaddr 08:00:27:3B:89:BF  
          inet addr:192.168.3.121  Bcast:192.168.3.255  Mask:255.255.255.0

[root@hract21 peer]#  srvctl config gns -a -l
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: grid12c.example.com
GNS version: 12.1.0.2.0
Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
Name of the cluster where GNS is running: ract2
Cluster type: server.
GNS log level: 1.
GNS listening addresses: tcp://192.168.5.58:39839.
GNS is individually enabled on nodes: 
GNS is individually disabled on nodes: 

Stop resources and recreate GNS and the nodeapps

[root@hract21 peer]#  srvctl stop scan_listener 
[root@hract21 peer]#  srvctl stop scan
[root@hract21 peer]#  srvctl stop nodeapps -f
PRCC-1016 : ons was already stopped
PRCR-1005 : Resource ora.ons is already stopped

[root@hract21 peer]#  srvctl stop gns
[root@hract21 Desktop]#  srvctl remove gns 
Remove GNS? (y/[n]) y


[root@hract21 Desktop]# srvctl remove nodeapps
Please confirm that you intend to remove node-level applications on all nodes of the cluster (y/[n]) y
[root@hract21 Desktop]# srvctl  add gns -i 192.168.5.58 -d  grid12c.example.com
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.5.58
Domain served by GNS: grid12c.example.com
[root@hract21 Desktop]# srvctl config gns -list
CLSNS-00005: operation timed out
  CLSNS-00025: unable to locate GNS
    CLSGN-00070: Service location failed.
[root@hract21 Desktop]# srvctl start gns
[root@hract21 Desktop]# srvctl config gns -list
Oracle-GNS A 192.168.5.58 Unique Flags: 0x115
ract2.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 46453 Weight: 0 Priority: 0 Flags: 0x115
ract2.Oracle-GNS TXT CLUSTER_NAME="ract2", CLUSTER_GUID="3d7c30fc9a0eeff3ff12b79970a14c12", NODE_NAME="hract21", SERVER_STATE="RUNNING", VERSION="12.1.0.2.0", DOMAIN="grid12c.example.com" Flags: 0x115
--> No VIPs there  

Recreate Nodeapps
[root@hract21 Desktop]#  srvctl add nodeapps -S 192.168.5.0/255.255.255.0/eth1 
 [root@hract21 Desktop]#  srvctl config gns -list
Oracle-GNS A 192.168.5.58 Unique Flags: 0x115
hract21-vip A 192.168.5.246 Unique Flags: 0x1
hract22-vip A 192.168.5.239 Unique Flags: 0x1
hract23-vip A 192.168.5.244 Unique Flags: 0x1
ract2.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 46453 Weight: 0 Priority: 0 Flags: 0x115
ract2.Oracle-GNS TXT CLUSTER_NAME="ract2", CLUSTER_GUID="3d7c30fc9a0eeff3ff12b79970a14c12", NODE_NAME="hract21", SERVER_STATE="RUNNING", VERSION="12.1.0.2.0", DOMAIN="grid12c.example.com" Flags: 0x115
--> Now VIPs should be ONLINE 
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.hract21.vip                1   ONLINE       ONLINE       hract21         STABLE  
ora.hract22.vip                1   ONLINE       ONLINE       hract22         STABLE  
ora.hract23.vip                1   ONLINE       ONLINE       hract23         STABLE 

Restart SCAN and SCAN Listeners
[root@hract21 Desktop]#  srvctl start scan
--> Now SCANs should be ONLINE
ora.scan1.vip                  1   ONLINE       ONLINE       hract22         STABLE  
ora.scan2.vip                  1   ONLINE       ONLINE       hract23         STABLE  
ora.scan3.vip                  1   ONLINE       ONLINE       hract21         STABLE  

[root@hract21 Desktop]# srvctl start scan_listener
--> Now SCAN_LISTENER should be ONLINE
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.LISTENER_SCAN1.lsnr        1   ONLINE       ONLINE       hract22         STABLE  
ora.LISTENER_SCAN2.lsnr        1   ONLINE       ONLINE       hract23         STABLE  
ora.LISTENER_SCAN3.lsnr        1   ONLINE       ONLINE       hract21         STABLE  

Verify GNS
[root@hract21 Desktop]#   srvctl config gns -list
Oracle-GNS A 192.168.5.58 Unique Flags: 0x115
hract21-vip A 192.168.5.246 Unique Flags: 0x1
hract22-vip A 192.168.5.239 Unique Flags: 0x1
hract23-vip A 192.168.5.244 Unique Flags: 0x1
ract2-scan A 192.168.5.238 Unique Flags: 0x1
ract2-scan A 192.168.5.243 Unique Flags: 0x1
ract2-scan A 192.168.5.245 Unique Flags: 0x1
ract2-scan1-vip A 192.168.5.243 Unique Flags: 0x1
ract2-scan2-vip A 192.168.5.245 Unique Flags: 0x1
ract2-scan3-vip A 192.168.5.238 Unique Flags: 0x1
ract2.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 46453 Weight: 0 Priority: 0 Flags: 0x115
ract2.Oracle-GNS TXT CLUSTER_NAME="ract2", CLUSTER_GUID="3d7c30fc9a0eeff3ff12b79970a14c12", NODE_NAME="hract21", 
   SERVER_STATE="RUNNING", VERSION="12.1.0.2.0", DOMAIN="grid12c.example.com" Flags: 0x115
--> VIPS, SCAN and SCAN VIPS should be ONLINE 
    Congrats you have successfully reconfigured GNS on 12.1.0.2 !

Potential problem: PRCN-2065 / PRCN-2067 when recreating the nodeapps

Note: stopping the nodeapps should also stop ONS!
[grid@hract21 trace]$  srvctl stop nodeapps -n hract21 -f
*****  Local Resources: *****
Resource NAME                  TARGET     STATE           SERVER       STATE_DETAILS                       
-------------------------      ---------- ----------      ------------ ------------------                  
ora.ons                        OFFLINE    OFFLINE         hract21      STABLE   
ora.ons                        ONLINE     ONLINE          hract22      STABLE   
ora.ons                        ONLINE     ONLINE          hract23      STABLE   
[root@hract21 Desktop]# netstat -tapen | egrep '6100|6200'
-> ONS is stopped - ports 6100 and 6200 are not active!
Sometimes during my testing the remote ONS port was still active after:
  srvctl stop nodeapps -f
Later, when we try to recreate the nodeapps, we get the following error:
[root@hract21 Desktop]#  srvctl add nodeapps -S 192.168.5.0/255.255.255.0/eth1
PRCN-2065 : Ports 6200 are not available on the nodes given
PRCN-2067 : Port 6200 is not available on nodes: hract21,hract22,hract23

Verify the TCP port status:
[root@hract22 ~]# netstat -taupen | grep 6200
tcp        0      0 :::6200                    ..  LISTEN      501        441704     21856/ons           
tcp        0      0 ::ffff:192.168.5.122:6200  ..  ESTABLISHED 501        67450915   21856/ons           
tcp        0      0 ::ffff:192.168.5.122:6200  ..  ESTABLISHED 501        72457163   21856/ons 
ONS was still running and occupied port 6200, which caused the above error!

Workaround: use the -skip parameter (for details please read Bug 18317414).
What is this parameter really doing?
[root@hract21 Desktop]# srvctl add nodeapps -skip -help
    -skip        Skip reachability check of VIP address and port validation for ONS

Now recreate the nodeapps with the -skip parameter:
[root@hract21 Desktop]#   srvctl add nodeapps  -skip  -S 192.168.5.0/255.255.255.0/eth1
--> Worked !!
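Alternatively - instead of -skip - the lingering ONS daemons can be stopped manually on each affected node before re-adding the nodeapps; a sketch (the PID comes from the netstat output shown above):

[root@hract22 ~]# netstat -taupen | grep ":6200 "    # note the PID in the PID/Program name column, e.g. 21856/ons
[root@hract22 ~]# kill 21856                         # stop the leftover ONS daemon
[root@hract22 ~]# netstat -taupen | grep ":6200 "    # verify that port 6200 is now free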

Reference

  • Bug 18317414 : LNX64-12.1-INSTALL-SCC:RERUN ROOT.SH FAILED AT ADD NODEAPPS

Troubleshooting Clusterware and Clusterware component error : Address already in use

Generic RAC Portnumber Information

Component                                         Default Port   Port Range  Protocol  Used for 
                                                  Number                               CI only? 
Cluster Synchronization Service daemon (CSSD)     42424          Dynamic     UDP       Yes
Oracle Grid Interprocess Communication (GIPCD)    42424          Dynamic     UDP       Yes
Oracle HA Services daemon (OHASD)                 42424          Dynamic     UDP       Yes
Multicast Domain Name Service (MDNSD)              5353          Dynamic     UDP/TCP    No 
Oracle Grid Naming Service (GNSD)                    53          53 (public) UDP/TCP    No
Oracle Notification Services (ONS)                 6100 (local)  Configured  TCP        No
                                                   6200 (remote)   manually
    
Port 42424 :
CSSD  : The Cluster Synchronization Service (CSS) daemon uses a fixed port for node restart 
        advisory messages.This port is used on all interfaces that have broadcast capability. 
        Broadcast  occurs only when a node  eviction restart is imminent.
OHASD : The Oracle High Availability Services (OHAS) daemon starts the Oracle Clusterware 
         stack.
GIPCD : A support daemon that enables Redundant Interconnect Usage.

Port 5353 :
MDNSD : The mDNS process is a background process on Linux and UNIX, and a service on Windows, 
        and is necessary for Grid Plug and Play and GNS.

Port 53: 
GNSD  : The Oracle Grid Naming Service daemon provides a gateway between the cluster mDNS and 
        external DNS servers. 
        The gnsd process performs name resolution within the cluster.

Port 6100/6200 :
ONS   : Port for ONS, used to publish and subscribe service for communicating information about 
        Fast Application Notification (FAN) events. The FAN notification process uses system 
        events that Oracle Database publishes  when cluster servers become unreachable or if 
        network interfaces fail.
        Use srvctl to modify the ONS ports (see the sketch below).
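For example, a sketch of changing the ONS ports with srvctl (option names as printed by srvctl modify nodeapps -help on 12.1 - verify them on your release):

[root@hract21 ~]# srvctl modify nodeapps -onslocalport 6100 -onsremoteport 6200
[root@hract21 ~]# srvctl config nodeapps | grep -i ons       # verify the new ONS port settings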

Verify port usage at OS level
As GNS runs only on a single node of the cluster, we need to relocate GNS first:
[root@hract21 ~]# srvctl relocate gns -n hract21 

[root@hract21 Desktop]#  netstat -taupen |grep ":42424 "
udp        0      0 192.168.2.255:42424         0.0.0.0:*  0          10361774   11545/ohasd.bin     
udp        0      0 230.0.1.0:42424             0.0.0.0:*  0          10361773   11545/ohasd.bin     
udp        0      0 224.0.0.251:42424           0.0.0.0:*  0          10361772   11545/ohasd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*  501        10361732   11764/gipcd.bin     
udp        0      0 230.0.1.0:42424             0.0.0.0:*  501        10361731   11764/gipcd.bin     
udp        0      0 224.0.0.251:42424           0.0.0.0:*  501        10361730   11764/gipcd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*  501        10361722   11825/ocssd.bin     
udp        0      0 230.0.1.0:42424             0.0.0.0:*  501        10361721   11825/ocssd.bin     
udp        0      0 224.0.0.251:42424           0.0.0.0:*  501        10361720   11825/ocssd.bin 

[root@hract21 Desktop]# netstat -taupen |grep ":53 "
udp        0      0 192.168.5.58:53             0.0.0.0:*   0          46593880   5261/gnsd.bin  

[root@hract21 Desktop]#  netstat -taupen |grep ":5353 "
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378331    11724/mdnsd.bin     
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378210    11724/mdnsd.bin     
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378209    11724/mdnsd.bin     
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378208    11724/mdnsd.bin 

[root@hract21 Desktop]#  netstat -taupen |grep ":6100 "
tcp        0      0 127.0.0.1:6100     0.0.0.0:*     LISTEN  501  10419706   31762/ons    
..

 

Prepare Test program JavaUDPServer.java

Source can be found here : Simple Java example of UDP Client/Server communication

[root@hract21 JAVA]#  javac JavaUDPServer.java

Testing when a port is free and our program can successfully listen on that port: 
[root@hract21 JAVA]# java  JavaUDPServer 59
Listening on UDP Port: 59
--> press <Ctrl>-C to terminate the program

Testing the program when the port is already in use:
[root@hract21 JAVA]# java  JavaUDPServer  53
Listening on UDP Port: 53
Jan 31, 2015 4:57:29 PM JavaUDPServer main
SEVERE: null
java.net.BindException: Address already in use
    at java.net.PlainDatagramSocketImpl.bind0(Native Method)
    at java.net.PlainDatagramSocketImpl.bind(PlainDatagramSocketImpl.java:125)
    at java.net.DatagramSocket.bind(DatagramSocket.java:372)

 

Case I: Clusterware startup fails as port 42424 is in use

Start our test program to block UDP port 42424:
[root@hract21 JAVA]#  java  JavaUDPServer  42424
Listening on UDP Port: 42424

Start CRS and monitor the local CRS stack:
[root@hract21 Desktop]# crsctl start crs
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    OFFLINE      -               STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      hract21         STARTING
ora.cssdmonitor                1   ONLINE     ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE

--> The evmd process remains in status INTERMEDIATE. The local cluster stack doesn't come up!
Investigate the trace files:
alert.log: 
2015-01-31 17:14:54.492 [CSSDAGENT(22642)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:9:3} in /u01/app/grid/diag/crs/hract21/crs/trace/ohasd_cssdagent_root.trc.
Sat Jan 31 17:14:59 2015
Errors in file /u01/app/grid/diag/crs/hract21/crs/trace/ocssd.trc  (incident=1):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []

gipcd.trc:
2015-01-31 17:20:27.606277 :GIPCHTHR:812046080:  gipchaWorkerCreateInterface: created local interface for node 'hract21', haName 'gipcd_ha_name', inf 'udp://192.168.2.121:28764' inf 0x7fef0c190b30
2015-01-31 17:20:27.606350 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: failed to bind endp 0x7fef182d8230 [000000000001e71a] { gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fef182da320 status 13flags 0x20000000, flags-2 0x0, usrFlags 0xc000 }, addr 0x7fef182d8cf0 [000000000001e71c] { gipcAddress : name 'mcast://224.0.0.251:42424/192.168.2.121', objFlags 0x0, addrFlags 0x5 }
2015-01-31 17:20:27.606358 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos op  :  sgipcnMctBind
2015-01-31 17:20:27.606360 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 17:20:27.606361 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos loc :  bind
2015-01-31 17:20:27.606363 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos info:  Invalid argument
2015-01-31 17:20:27.606399 :GIPCXCPT:812046080:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed to bind endp 0x7fef182d8230 [000000000001e71a] { gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fef182da320 status 13flags 0x20000000, flags-2 0x0, usrFlags 0xc000 }, addr 0x7fef182d9a20 [000000000001e721] { gipcAddress : name 'mcast://224.0.0.251:42424/192.168.2.121', objFlags 0x0, addrFlags 0x4 }, flags 0x8000
2015-01-31 17:20:27.606408 :GIPCXCPT:812046080:  gipcInternalEndpoint: failed to bind address to endpoint name 'mcast://224.0.0.251:42424/192.168.2.121', ret gipcretAddressInUse (20)
2015-01-31 17:20:27.606426 :GIPCHTHR:812046080:  gipchaWorkerUpdateInterface: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed to create local interface 'udp://192.168.2.121', 0x7fef0c190b30 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.2.121', subnet '192.168.2.0', mask '255.255.255.0', mac '08-00-27-4e-c9-bf', ifname 'eth2', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }, hctx 0x10639b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid '8c45d6e7-00000000', name2 3aca-bf27-17d5-691e, numNode 0, numInf 1, maxPriority 0, clientMode 1, nodeIncarnation d64c9b7c-06451148 usrFlags 0x0, flags 0x2d65 }
2015-01-31 17:20:27.606432 :GIPCHGEN:812046080:  gipchaInterfaceDisable: disabling interface 0x7fef0c190b30 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.2.121', subnet '192.168.2.0', mask '255.255.255.0', mac '08-00-27-4e-c9-bf', ifname 'eth2', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }
2015-01-31 17:20:27.606438 :GIPCHDEM:812046080:  gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7fef0c190b30 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.2.121', subnet '192.168.2.0', mask '255.255.255.0', mac '08-00-27-4e-c9-bf', ifname 'eth2', numRef 0, numFail 0, idxBoot 0, flags 0x1861 }
2015-01-31 17:20:27.60

Investigate the error in more detail:
gipcmodNetworkProcessBind: slos dep :  Address already in use (98) 
[root@hract21 Desktop]# cat  /usr/include/asm-generic/errno.h | grep 98
#define    EADDRINUSE    98    /* Address already in use */

Locate the port number :
gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121 -->   42424 is the port 
--> CW   can't listen on port 42424 ! 

Locate the  blocking process at OS level
[root@hract21 Desktop]#    netstat -taupen |grep ":42424 "
udp        0      0 :::42424         ....    22338/java          
[root@hract21 Desktop]# ps -elf | grep 22338
0 S root     22338 26783  0  80   0 - 438331 futex_ 17:04 pts/12  00:00:01 java JavaUDPServer 42424

--> Yep, our java program blocks CW from coming up! Kill the java program and restart CW:
[root@hract21 Desktop]# kill -9 22338
[root@hract21 Desktop]# crsctl stop crs -f
[root@hract21 Desktop]# crsctl start crs
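Besides netstat, lsof and fuser (if installed) can also map a port back to the blocking process; a short sketch:

[root@hract21 Desktop]# lsof -i UDP:42424        # list processes bound to UDP port 42424
[root@hract21 Desktop]# fuser -v 42424/udp       # alternative: show PID and owner of the UDP port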

 

Case II: Clusterware startup fails as port 5353 is in use

Start our test program and block the MDNSD port 5353:
[root@hract21 JAVA]#  java  JavaUDPServer 5353
Listening on UDP Port: 5353

*****  Local Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE       ONLINE       hract21         STABLE
ora.cluster_interconnect.haip  1   ONLINE    ONLINE       hract21         STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssdmonitor                1   ONLINE     ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    ONLINE       hract21         OBSERVER,STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    ONLINE       hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    INTERMEDIATE hract21         STABLE
ora.storage                    1   ONLINE    ONLINE       hract21         STABLE
--> The MDNSD daemon doesn't start

mdnsd.trc :
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
    CLSB:2559100480: Argument count (argc) for this daemon is 1
    CLSB:2559100480: Argument 0 is: /u01/app/121/grid/bin/mdnsd.bin
2015-01-31 17:40:17.131516 :  CLSDMT:2554820352: PID for the Process [9863], connkey 9
2015-01-31 17:40:18.042329 :    MDNS:2559100480:  mdnsd interface eth0 (0x2 AF=2 f=0x1043 mcast=-1) 192.168.1.9 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.043191 :    MDNS:2559100480:  mdnsd interface eth1 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.046952 :    MDNS:2559100480:  mdnsd interface eth1:1 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.241 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.047574 :    MDNS:2559100480:  mdnsd interface eth1:2 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.242 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.047597 :    MDNS:2559100480:  mdnsd interface eth2 (0x4 AF=2 f=0x1043 mcast=-1) 192.168.2.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.047612 :    MDNS:2559100480:  mdnsd interface eth2:1 (0x4 AF=2 f=0x1043 mcast=-1) 169.254.213.86 mask 255.255.0.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.049171 :    MDNS:2559100480:  mdnsd interface eth3 (0x5 AF=2 f=0x1043 mcast=-1) 192.168.3.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.049222 :    MDNS:2559100480:  mdnsd interface lo (0x1 AF=2 f=0x49 mcast=-1) 127.0.0.1 mask 255.0.0.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.049236 :    MDNS:2559100480:  Error! No valid netowrk interfaces found to setup mDNS.
2015-01-31 17:40:18.049240 :    MDNS:2559100480:  Oracle mDNSResponder ver. mDNSResponder-1076 (Jun 30 2014 19:39:45) , init_rv=-65537
2015-01-31 17:40:18.049335 :    MDNS:2559100480:  stopping

--> Here we only get the error "Address already in use" but no info about the port number. 
    We need to reference the list above and remember that MDNSD is running on port 5353 

Now we can locate the blocking process, kill it and restart the clusterware
[root@hract21 Desktop]#  netstat -taupen |grep ":5353 "
udp        0      0 :::5353         ...            50111629   7252/java   
Again our Java program prevents CW from starting up. Kill that process and restart CW.
[root@hract21 Desktop]# kill -9 7252
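
If the JavaUDPServer test program is not at hand, a UDP port can be occupied just as well with netcat 
(a sketch - the exact flag syntax differs between netcat implementations):

# block UDP port 5353 with netcat instead of the Java test program
nc -u -l 5353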

 

Case III: Investigate GNS startup problem due to Error:  Address already in use

Relocate GNS to a different host
[root@hract21 Desktop]# srvctl relocate gns -n hract23
ora.gns                        1   ONLINE       ONLINE       hract23         STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract23         STABLE

Now occupy port 53 by running our JAVA program:
[root@hract21 JAVA]# java  JavaUDPServer 53
Listening on UDP Port: 53

Now try to bring back the GNS 
[root@hract21 Desktop]# srvctl relocate gns -n hract21
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.gns                        1   ONLINE       OFFLINE      hract21         STARTING
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE
--> GNS is in status STARTING but doesn't come up

gnsd.trc :
2015-01-31 18:09:13.518516 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 18:09:13.518520 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos loc :  bind
2015-01-31 18:09:13.518521 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.58:53'
2015-01-31 18:09:13.518577 :GIPCXCPT:255158016:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed to bind endp 0x7ff7000034c0 [0000000000001fc6] { gipcEndpoint : localAddr 'udp://192.168.5.58:53', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7ff7000050f0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x24000 }, addr 0x7ff7000047f0 [0000000000001fcd] { gipcAddress : name 'udp://192.168.5.58:53', objFlags 0x0, addrFlags 0x4 }, flags 0x20000
2015-01-31 18:09:13.518589 :GIPCXCPT:255158016:  gipcInternalEndpoint: failed to bind address to endpoint name 'udp://192.168.5.58:53', ret gipcretAddressInUse (20)
2015-01-31 18:09:13.518608 :GIPCXCPT:255158016:  gipcEndpointF [clsgngipcCreateEndpointInternal : clsgngipc.c : 2008]: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed endp create ctx 0x7ff7196f3c80 [0000000000001e99] { gipcContext : traceLevel 2, fieldLevel 0x0, numDead 0, numPending 0, numZombie 0, numObj 4, numWait 0, numReady 0, wobj 0x7ff7196f1c10, hgid 0000000000001e9a, flags 0x1a, objFlags 0x0 }, name 'udp://192.168.5.58:53', flags 0x24000
2015-01-31 18:09:13.518728 :     GNS:255158016: Resolve::clsgndnsCreateContainerCallback: (:CLSGN01163:) Error - Address in use: port 53 address "192.168.5.58". 1: clskec:has:CLSGN:208 2 args[has:CLSGN:208][udp://192.168.5.58:53]
2: clskec:has:gipc:20 1 args[has:gipc:20]
3: clskec:has:CLSU:910 4 args[has][mod=gipcInternalEndpoint][loc=473][msg=failed to bind address to endpoint name 'udp://192.168.5.58:53']
2015-01-31 18:09:13.518769 :     GNS:255158016: Resolve::clsgndnsCreateContainer: (:CLSGN00927:) failed to listen on all addresses - throwing error.
default:255158016: listen failed with 1 errors
1: clskec:has:CLSGN:208 3 args[has:CLSGN:208][192.168.5.58][53]

The following error messages tell us the Linux errno code and the related port number :
2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 18:09:13.518521 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.58:53'

Again locate the port number and kill the process
[root@hract21 Desktop]#  netstat -taupen |grep ":53 "
udp    16128      0 :::53          ...          51417680   23723/java
Again kill the process which holds the port number; afterwards GNS can be relocated back. 
[root@hract21 Desktop]# kill -9  23723

Now test whether relocating GNS back works again
[root@hract21 ~]#   srvctl relocate gns -n hract21
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.gns                        1   ONLINE       ONLINE       hract21         STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE

Complete port number usage of a working RAC system

[root@hract21 ~]#   netstat -taupen |grep 192.168
tcp        0      0 192.168.5.242:1521          0.0.0.0:*                   LISTEN      501        50803141   17310/tnslsnr       
tcp        0      0 192.168.5.241:1521          0.0.0.0:*                   LISTEN      501        50793916   17258/tnslsnr       
tcp        0      0 192.168.5.121:1521          0.0.0.0:*                   LISTEN      501        50793894   17258/tnslsnr       
tcp        0      0 192.168.2.121:1522          0.0.0.0:*                   LISTEN      501        50790436   17212/tnslsnr       
tcp        0      0 192.168.2.121:61020         0.0.0.0:*                   LISTEN      0          50773311   16994/osysmond.bin  
tcp        0      0 192.168.5.121:42942         0.0.0.0:*                   LISTEN      501        50724207   16454/gipcd.bin     
tcp        0      0 192.168.5.58:39839          0.0.0.0:*                   LISTEN      0          51856456   27381/gnsd.bin      
tcp        0      0 192.168.5.232:36063         0.0.0.0:*                   LISTEN      502        8145376    596/exectask        
tcp        0      0 192.168.5.121:15043         0.0.0.0:*                   LISTEN      0          50730332   16281/ohasd.bin     
tcp        0      0 192.168.5.121:42942         192.168.5.123:28657         ESTABLISHED 501        50841598   16454/gipcd.bin     
tcp        0      0 192.168.5.241:1521          192.168.5.121:55119         ESTABLISHED 501        50829166   17258/tnslsnr       
tcp        0      0 192.168.2.121:46847         192.168.2.122:1522          ESTABLISHED 0          50774509   17012/crsd.bin      
tcp        0      0 192.168.2.121:1522          192.168.2.123:60331         ESTABLISHED 501        50795614   17212/tnslsnr       
tcp        0      0 192.168.2.121:1522          192.168.2.121:16025         ESTABLISHED 501        50829535   17212/tnslsnr       
tcp        0      0 192.168.2.121:1522          192.168.2.122:54611         ESTABLISHED 501        50796842   17212/tnslsnr       
tcp        0      0 192.168.2.121:46865         192.168.2.122:1522          ESTABLISHED 501        50829527   17468/asm_lreg_+ASM 
tcp        0      0 192.168.5.242:1521          192.168.5.121:61101         ESTABLISHED 501        50838159   17310/tnslsnr       
tcp        0      0 192.168.5.121:42942         192.168.5.122:32304         ESTABLISHED 501        50841582   16454/gipcd.bin     
tcp        1      0 192.168.1.9:39471           80.150.192.73:80            CLOSE_WAIT  0          50900520   4786/clock-applet   
tcp        0      0 192.168.2.121:1522          192.168.2.121:16024         ESTABLISHED 501        50829534   17212/tnslsnr       
tcp        0      0 192.168.2.121:16024         192.168.2.121:1522          ESTABLISHED 501        50829529   17468/asm_lreg_+ASM 
tcp        0      0 192.168.2.121:16025         192.168.2.121:1522          ESTABLISHED 501        50829531   17468/asm_lreg_+ASM 
tcp        0      0 192.168.2.121:28139         192.168.2.123:1522          ESTABLISHED 501        50829525   17468/asm_lreg_+ASM 
tcp        0      0 192.168.5.121:64227         192.168.5.122:35547         ESTABLISHED 501        50790718   16454/gipcd.bin     
tcp        0      0 192.168.5.121:21046         192.168.5.123:6200          ESTABLISHED 501        50787900   17215/ons           
tcp        0      0 192.168.5.121:59844         192.168.5.50:22             ESTABLISHED 0          44509382   13726/ssh           
tcp        0      0 192.168.5.121:61101         192.168.5.242:1521          ESTABLISHED 502        50838158   17721/ora_lreg_bank 
tcp        0      0 192.168.5.58:39839          192.168.5.121:34266         TIME_WAIT   0          0          -                   
tcp        0      0 192.168.5.121:16432         192.168.5.122:6200          ESTABLISHED 501        50787901   17215/ons           
tcp        0      0 192.168.2.121:39861         192.168.2.123:61021         ESTABLISHED 0          50769440   16994/osysmond.bin  
tcp        0      0 192.168.5.121:55125         192.168.5.241:1521          ESTABLISHED 502        50837652   17721/ora_lreg_bank 
tcp        0      0 192.168.5.121:55119         192.168.5.241:1521          ESTABLISHED 501        50829165   17468/asm_lreg_+ASM 
tcp        0      0 192.168.5.241:1521          192.168.5.121:55125         ESTABLISHED 501        50837653   17258/tnslsnr       
tcp        0      0 192.168.5.121:10242         192.168.5.123:17701         ESTABLISHED 501        50790723   16454/gipcd.bin     
tcp        0      0 192.168.5.121:55728         192.168.5.123:22            ESTABLISHED 0          27679552   25184/ssh           
udp        0      0 192.168.2.121:35570         0.0.0.0:*                               0          50731287   16281/ohasd.bin     
udp        0      0 192.168.2.121:51962         0.0.0.0:*                               0          50751183   16922/octssd.bin    
udp        0      0 192.168.2.255:42424         0.0.0.0:*                               501        50734962   16537/ocssd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*                               0          50731290   16281/ohasd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*                               501        50725223   16454/gipcd.bin     
udp        0      0 192.168.2.121:15891         0.0.0.0:*                               501        50734959   16537/ocssd.bin     
udp        0      0 192.168.2.121:12075         0.0.0.0:*                               501        50782505   16408/evmd.bin      
udp        0      0 192.168.5.58:53             0.0.0.0:*                               0          51856599   27381/gnsd.bin      
udp        0      0 192.168.5.58:123            0.0.0.0:*                               38         51843931   1291/ntpd           
udp        0      0 192.168.5.242:123           0.0.0.0:*                               38         50803109   1291/ntpd           
udp        0      0 192.168.5.241:123           0.0.0.0:*                               38         50793859   1291/ntpd           
udp        0      0 192.168.3.121:123           0.0.0.0:*                               0          43573989   1291/ntpd           
udp        0      0 192.168.2.121:123           0.0.0.0:*                               0          43573987   1291/ntpd           
udp        0      0 192.168.5.121:123           0.0.0.0:*                               0          43573984   1291/ntpd           
udp        0      0 192.168.1.9:123             0.0.0.0:*                               0          43573983   1291/ntpd           
udp        0      0 192.168.2.121:53498         0.0.0.0:*                               0          50776026   17012/crsd.bin      
udp        0      0 192.168.2.121:45379         0.0.0.0:*                               501        50725220   16454/gipcd.bin
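
To boil this listing down to "which daemon listens on which local address", the netstat output can be 
condensed with a little awk (a sketch - extend the pattern list as needed for your setup):

# print PID/program and local address for the well known RAC processes
netstat -tlnpu | grep -E 'gnsd|mdnsd|gipcd|ohasd|ocssd|crsd|evmd|octssd|ons|tnslsnr|osysmond' \
   | awk '{ printf "%-22s %s\n", $NF, $4 }' | sort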

Troubleshooting hint: CW start problems due to "Address already in use" errors

Before CW startup verify that the following ports are not in use at all 
[root@hract21 Desktop]#    netstat -taupen |grep ":42424 "
[root@hract21 Desktop]#    netstat -taupen |grep ":5353 "
[root@hract21 Desktop]#    netstat -taupen |grep ":53 "
[root@hract21 Desktop]#    netstat -taupen |egrep ":6100 |:6200"
If you find any processes not belonging to the Oracle Clusterware stack you need to kill/stop 
these processes
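
The same check can be wrapped into a small loop so you don't forget a port (a sketch - extend the port 
list if your setup uses additional ones):

#!/bin/bash
# check_cw_ports.sh -- report any process holding the well known Clusterware ports
# 42424 = OHASD/OCSSD/GIPCD multicast, 5353 = MDNSD, 53 = GNS, 6100/6200 = ONS
for PORT in 42424 5353 53 6100 6200 ; do
    echo "### Port ${PORT}"
    netstat -taupen | grep ":${PORT} "
done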


If you have problems with Clusterware startup or with CW component startup ( GSD, VIPs ) you may 
check your clusterware trace files for the "Address already in use" error.

Note the tracefile location has changed for RAC 12.1.0.2 :
[grid@hract21 trace]$  grep -l "Address already in use" *
gipcd.trc
gnsd.trc
mdnsd.trc
ocssd.trc
ohasd.trc

Now find details:
# grep "Address already in use" ohasd.trc  mdnsd.trc  ocssd.trc gnsd.trc  gipcd.trc gnsd.trc | grep "2015-01-31 17"
ohasd.trc:2015-01-31 17:30:16.613432 :GIPCXCPT:2420897536:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
mdnsd.trc:2015-01-31 17:40:18.049222 :    MDNS:2559100480:  mdnsd interface lo (0x1 AF=2 f=0x49 mcast=-1) 127.0.0.1 mask 255.0.0.0 FAILED. Error 98 (Address already in use)
ocssd.trc:2015-01-31 17:04:56.013085 :GIPCXCPT:3986515712:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
gipcd.trc:2015-01-31 17:06:25.204775 :GIPCXCPT:812046080:   gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
gnsd.trc:2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)

For mdnsd.trc we already know the port number          :   5353   
For ohasd.trc, ocssd.trc, gipcd.trc the port number is :  42424
For GNS the trace file provides details about the problematic port number and IP address
[grid@hract21 trace]$  grep gipcmodNetworkProcessBind  gnsd.trc  
2015-01-31 18:09:13.518483 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: failed to bind endp 0x7ff7000034c0 [0000000000001fc6] { gipcEndpoint : localAddr 'udp://192.168.5.58:53', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7ff7000050f0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x24000 }, addr 0x7ff7000038b0 [0000000000001fc8] { gipcAddress : name 'udp://192.168.5.58:53', objFlags 0x0, addrFlags 0x5 }
2015-01-31 18:09:13.518516 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 18:09:13.518520 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos loc :  bind
2015-01-31 18:09:13.518521 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.58:53'
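
To pull the problematic address and port out of all traces in one go, the slos info lines can be 
post-processed directly in the CW trace directory (a sketch):

# extract the offending address:port from all bind failures recorded with a slos info line
grep "slos info" *.trc | grep "addr '" | sed -e "s/.*addr '//" -e "s/'.*//" | sort -u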

 

Reference

Debug Cluvfy error ERROR: PRVF-9802

ERROR: 
PRVF-9802 : Attempt to get udev information from node "hract21" failed
No UDEV rule found for device(s) specified


Checking: cv/log/cvutrace.log.0

          ERRORMSG(hract21): PRVF-9802 : Attempt to get udev information from node "hract21" failed
No UDEV rule found for device(s) specified

[Thread-757] [ 2015-01-29 15:56:44.157 CET ] [StreamReader.run:65]  OUTPUT><CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP>
</SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT>
<SLOS_OTHERINFO>No UDEV rule found for device(s) specified</SLOS_OTHERINFO>
</CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG>
<CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
</CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
  | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' 
  | sed -e 's/://' -e 's/\.\*/\*/g'</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
..
[Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:144]  runCommand: process returns 0
[Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:161]  RunTimeExec: output>

Run the exectask from OS prompt :
[root@hract21 ~]# /tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
<CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP></SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT><SLOS_OTHERINFO>No UDEV rule found for device(s)
 specified</SLOS_OTHERINFO></CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG>
<CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
</CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
 | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
 | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' 
 | sed -e 's/://' -e 's/\.\*/\*/g'
 </CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules 
 | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'
 | awk '{if ("asmdisk2_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
 | sed -e 's/://' -e 's/\.\*/\*/g'
</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>

Test the exectask in detail:
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE  
 | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
 | awk '  {if ("asmdisk1_10G" ~ $1) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
--> Here awk returns nothing !

[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
  |awk '  {  print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'   
 
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", 
   RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid"

--> The above sed script adds sd?1 as parameter $1 and @ as parameter $2. 
    Later awk searches for "asmdisk1_10G" in parameter $1:   if ("asmdisk1_10G" ~ $1) ... 
        But the string "asmdisk1_10G" can only be found in parameter $3, not in parameter $1 !!
    
Potential fix : Modify the awk search pattern and we get a record back !
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'  
  |awk  '  /asmdisk1_10G/ {  print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent",
 RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid", ..

--> It seems the way Oracle extracts UDEV data does not work for OEL 6, where UDEV records can look like: 
NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent",   
    RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c",  OWNER="grid", GROUP="asmadmin", MODE="0660"

As the ASM disks have the proper permissions I decided to ignore the warnings  
[root@hract21 rules.d]# ls -l  /dev/asm*
brw-rw---- 1 grid asmadmin 8, 17 Jan 29 09:33 /dev/asmdisk1_10G
brw-rw---- 1 grid asmadmin 8, 33 Jan 29 09:33 /dev/asmdisk2_10G
brw-rw---- 1 grid asmadmin 8, 49 Jan 29 09:33 /dev/asmdisk3_10G
brw-rw---- 1 grid asmadmin 8, 65 Jan 29 09:33 /dev/asmdisk4_10G
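
A quick sanity check that can be scripted is to verify that every ASM disk name shows up in at least 
one udev rule file (a sketch using the disk names from this setup):

# report the udev rule file(s) mentioning each ASM disk - or complain if none is found
for D in asmdisk1_10G asmdisk2_10G asmdisk3_10G asmdisk4_10G ; do
    printf "%-15s : " "$D"
    grep -l "$D" /etc/udev/rules.d/*.rules || echo "no rule found"
done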

Using datapatch in a RAC env

Overview

  • Datapatch is the new tool that enables automation of post-patch SQL actions for RDBMS patches.
  • If we have a 3-node RAC cluster, datapatch runs 3 jobs named LOAD_OPATCH_INVENTORY_1, LOAD_OPATCH_INVENTORY_2, LOAD_OPATCH_INVENTORY_3
  • This inventory update requires that all RAC nodes are available ( even for a policy-managed database )
  • Install the helper package from Note 1585814.1 : [ demo1.sql + demo2.sql ]
  • With 12c we have a SQL interface for querying patches (by reading lsinventory via PL/SQL) - see the sketch after this list
  • For patches that do not have post-patch SQL actions to be performed, calling datapatch is a no-op.
  • For patches that do have post-patch SQL instructions to be invoked on the database instance, datapatch will automatically detect ALL pending actions (from one installed patch or multiple installed patches) and complete the actions as appropriate.
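
The lsinventory query mentioned above can also be fired directly from the shell with a small wrapper 
like the sketch below (qpinv.sh is just a hypothetical name; it uses the same DBMS_QOPATCH calls that 
are demonstrated later in this article and assumes ORACLE_HOME/ORACLE_SID point to the target instance):

#!/bin/bash
# qpinv.sh -- dump the OPatch inventory of the local instance via the DBMS_QOPATCH SQL interface
sqlplus -s "/ as sysdba" <<'EOF'
set pagesize 20000 long 200000
select xmltransform(dbms_qopatch.get_opatch_lsinventory(),
                    dbms_qopatch.get_opatch_xslt()) from dual;
exit
EOF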

What should I do when the datapatch command throws an error or warning ?

Rollable VS. Non-Rollable Patches: ( From Oracle Docs )
 - Patches are designed to be applied in either rolling mode or non-rolling mode.
 - If a patch is rollable, the patch has no dependency on the SQL script. 
   The database can be brought up without issue.

 OPatchauto succeeds with a warning on datapatch/sqlpatch.
  ->  For rollable patches:
        Ignore datapatch errors on node 1 - node (n-1).
        On the last node (node n), run datapatch again. You can cut and paste this command from the log file.
        If you still encounter datapatch errors on the last node, call Oracle Support or open a Service Request.

   -> For non-rollable patches:
        Bring down all databases and stacks manually for all nodes.
        Run opatchauto apply on every node.
        Bring up the stack and databases.
        Note that the databases must be up in order for datapatch to connect and apply the SQL.
        Manually run datapatch on the last node. 
        Note that if you do not run datapatch, the SQL for the patch will not be applied and you will not 
           benefit from the bug fix. In addition, you may encounter incorrect system behavior 
           depending on the changes the SQL is intended to implement.
        If datapatch continues to fail, you must roll back the patch. 
        Call Oracle Support for assistance or open a Service Request.

 

How to check the current patch level and reinstall a SQL patch ?

[oracle@gract1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 08:55:31 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550
Currently installed C Patches: 19121550
Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  Nothing to apply
Patch installation complete.  Total patches installed: 0
SQL Patching tool complete on Sun Jan 25 08:57:14 2015

--> Patch 19121550 is installed ( both parts C layer and SQL layer are installed )

Rollback the patch
[oracle@gract1 OPatch]$  ./datapatch -rollback 19121550
SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 09:03:03 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  The following patches will be rolled back: 19121550
  Nothing to apply
Installing patches...
Patch installation complete.  Total patches installed: 1
Validating logfiles...done
SQL Patching tool complete on Sun Jan 25 09:04:51 2015

Reapply the patch
[oracle@gract1 OPatch]$  ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 09:06:55 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches:               <-- Here we can see that SQL patch is not yet installed !
Currently installed C Patches: 19121550
Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied: 19121550
Installing patches...
Patch installation complete.  Total patches installed: 1
Validating logfiles...
Patch 19121550 apply: SUCCESS
  logfile: /u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2015Jan25_09_08_51.log (no errors)
  catbundle generate logfile: /u01/app/oracle/cfgtoollogs/catbundle/catbundle_PSU_DW_dw_GENERATE_2015Jan25_09_08_51.log (no errors)
  catbundle apply logfile: /u01/app/oracle/cfgtoollogs/catbundle/catbundle_PSU_DW_dw_APPLY_2015Jan25_09_08_53.log (no errors)
SQL Patching tool complete on Sun Jan 25 09:10:31 2015

Verify the current patch status 
SQL> select * from dba_registry_sqlpatch;
  PATCH_ID ACTION       STATUS       ACTION_TIME              DESCRIPTION
---------- --------------- --------------- ------------------------------ --------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------
  19121550 APPLY       SUCCESS       26-OCT-14 12.13.19.575484 PM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2014Oct26_12_01_54.log

  19121550 ROLLBACK       SUCCESS       25-JAN-15 09.04.51.585648 AM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_rollback_DW_2015Jan25_09_04_43.log

  19121550 APPLY       SUCCESS       25-JAN-15 09.10.31.872019 AM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2015Jan25_09_08_51.log

--> Here we can identify that we re-applied the SQL part of patch  19121550 at : 25-JAN-15 09.10.31

 

Using  Queryable Patch Inventory [ DEMOQP helper package ]

Overview DEMOQP helper package 
Install the helper package from Note 1585814.1 : [ demo1.sql + demo2.sql ] 
Have a short look at the package details:
SQL> desc DEMOQP
PROCEDURE CHECK_PATCH_INSTALLED
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 BUGS                QOPATCH_LIST        IN
PROCEDURE COMPARE_CURRENT_DB
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 BUGS                QOPATCH_LIST        IN
PROCEDURE COMPARE_RAC_NODE
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 NODE                VARCHAR2        IN
 INST                VARCHAR2        IN
FUNCTION GET_BUG_DETAILS RETURNS XMLTYPE
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 PATCH                VARCHAR2        IN
FUNCTION GET_DEMO_XSLT RETURNS XMLTYPE

Script to test Queryable Patch Inventory : check_patch.sql  
/*     
        For details see : 
        Queryable Patch Inventory -- SQL Interface to view, compare, validate database patches (Doc ID 1585814.1)
*/
set echo on
set pagesize 20000
set long 200000

/* Is patch 19849140 installed  ?  */
set serveroutput on
exec DEMOQP.check_patch_installed (qopatch_list('19849140'));

/* Return details about patch 19849140 */
select xmltransform(DEMOQP.get_bug_details('19849140'), dbms_qopatch.get_opatch_xslt()) from dual;

/* As we are running on a PM managed db let's have look on host_names and instance names */
col HOST_NAME format A30
select host_name, instance_name from gv$instance;
select host_name, instance_name from v$instance;

/* check Instance ERP_1 on gract2.example.com */
exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1');
select xmltransform (dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual;

/* Compare RAC nodes - this is not working in my env ! --> Getting   ORA-06502: PL/SQL: numeric or value error */
set serveroutput on
exec demoqp.compare_rac_node('gract2.example.com','ERP_1');



1) Check whether a certain patch is installed

SQL> /* Is patch 19849140 installed    ?    */
SQL> set serveroutput on
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED


2) Check patch details for patch  19849140

SQL> /* Return details about patch 19849140 */
SQL> select xmltransform(DEMOQP.get_bug_details('19849140'), dbms_qopatch.get_opatch_xslt()) from dual;
XMLTRANSFORM(DEMOQP.GET_BUG_DETAILS('19849140'),DBMS_QOPATCH.GET_OPATCH_XSLT())
--------------------------------------------------------------------------------

Patch     19849140:   applied on 2015-01-23T16:31:09+01:00
Unique Patch ID: 18183131
  Patch Description: Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Comp
onent)
  Created on     : 23 Oct 2014, 08:32:20 hrs PST8PDT
  Bugs fixed:
     16505840  16505255  16505717  16505617  16399322  16390989  17486244  1
6168869  16444109  16505361  13866165  16505763  16208257  16904822  17299876  1
6246222  16505540  16505214  15936039  16580269  16838292  16505449  16801843  1
6309853  16505395  17507349  17475155  16493242  17039197  16196609  18045611  1
7463260  17263488  16505667  15970176  16488665  16670327  17551223
  Files Touched:

    cluvfyrac.sh
    crsdiag.pl
    lsnodes
..


3) Read the inventory from instance ERP_1 running on gract2.example.com
SQL> /* As we are running on a PM managed db let's have look on host_names and instance names */
SQL> col HOST_NAME format A30
SQL> select host_name, instance_name from gv$instance;

HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           ERP_2
gract2.example.com           ERP_1
gract3.example.com           ERP_3

SQL> select host_name, instance_name from v$instance;

HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           ERP_2

SQL> 
SQL> /* check Instance ERP_1 on gract2.example.com */
SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1');

SQL> select xmltransform (dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual;
XMLTRANSFORM(DBMS_QOPATCH.GET_OPATCH_LSINVENTORY(),DBMS_QOPATCH.GET_OPATCH_XSLT(
--------------------------------------------------------------------------------

Oracle Querayable Patch Interface 1.0
--------------------------------------------------------------------------------

Oracle Home      : /u01/app/oracle/product/121/racdb
Inventory      : /u01/app/oraInventory
--------------------------------------------------------------------------------
Installed Top-level Products (1):
Oracle Database 12c                       12.1.0.1.0
Installed Products ( 131)
..

4) Compare RAC nodes 
This very exciting feature doesn't work - sorry, no time for debugging !

SQL> /* Compare RAC nodes - this is not working in my env ! --> Getting   ORA-06502: PL/SQL: numeric or value error */
SQL> set serveroutput on
SQL> exec demoqp.compare_rac_node('gract2.example.com','ERP_1');
BEGIN demoqp.compare_rac_node('gract2.example.com','ERP_1'); END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
ORA-06512: at "SYS.DEMOQP", line 40
ORA-06512: at line 1

gract2.example.com           ERP_1

 

Why does rolling back and reapplying a SQL patch result in a NO-OP operation ?

[oracle@gract1 OPatch]$ ./datapatch -rollback 19849140 -force
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 19:39:29 2015
Copyright (c) 2014, Oracle.  All rights reserved.
Connecting to database...OK
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  The following patches will be rolled back: 19849140
  Nothing to apply
Error: prereq checks failed!
  patch 19849140: rollback script /u01/app/oracle/product/121/racdb/sqlpatch/19849140/19849140_rollback.sql does not exist
Prereq check failed!  Exiting without installing any patches
See support note 1609718.1 for information on how to resolve the above errors
SQL Patching tool complete on Sat Jan 24 19:39:29 2015

What is this ?
Let's check in dba_registry_sqlpatch whether patch 19849140 comes with any SQL changes 

SQL> col action_time format A30
SQL> col DESCRIPTION format A20
SQL> select * from dba_registry_sqlpatch ;
  PATCH_ID ACTION       STATUS       ACTION_TIME              DESCRIPTION
---------- --------------- --------------- ------------------------------ --------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------
  19121550 APPLY       SUCCESS       26-OCT-14 12.13.19.575484 PM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2014Oct26_12_01_54.log

--> The patch doesn't provide any SQL changes - so the above error is nothing more than an informational message.

What is the root cause of ORA-20006 in a RAC env?

Stop an instance 
[oracle@gract2 ~]$  srvctl stop instance -d dw -i dw_3
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.dw.db                      1   ONLINE       ONLINE       gract1          Open,STABLE  
ora.dw.db                      2   ONLINE       ONLINE       gract3          Open,STABLE  
ora.dw.db                      3   OFFLINE      OFFLINE      -               Instance Shutdown,STABLE

[oracle@gract1 OPatch]$ ./datapatch  -verbose
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 20:03:22 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550
DBD::Oracle::st execute failed: ORA-20006: Number of RAC active instances and opatch jobs configured are not same
ORA-06512: at "SYS.DBMS_QOPATCH", line 1007
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute) [for Statement "DECLARE
       x XMLType;
     BEGIN
       x := dbms_qopatch.get_pending_activity;
       ? := x.getStringVal();
     END;" with ParamValues: :p1=undef] at /u01/app/oracle/product/121/racdb/sqlpatch/sqlpatch.pm line 1293.

Note: even for a policy-managed database all instances must be up and running on all servers to apply the patch !
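
Before (re)running datapatch it is therefore worth checking that every instance is really up - for the 
dw database used here for example:

  srvctl status database -d dw

All instances should be reported as running before datapatch is started again.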

Start the instance and rerun the ./datapatch command
[oracle@gract1 OPatch]$ srvctl start instance -d dw -i dw_3
[oracle@gract1 OPatch]$ vi check_it.sql
[oracle@gract1 OPatch]$  ./datapatch  -verbose
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 20:17:33 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550

...................

ORA-20008 during datapatch installation in a RAC env

You may get ORA-20008 while running the datapatch tool or while querying the patch status 
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
BEGIN DEMOQP.check_patch_installed (qopatch_list('19849140')); END;

*
ERROR at line 1:
ORA-20008: Timed out, Job Load_opatch_inventory_3execution time is more than 120Secs
ORA-06512: at "SYS.DBMS_QOPATCH", line 1428
ORA-06512: at "SYS.DBMS_QOPATCH", line 182
ORA-06512: at "SYS.DEMOQP", line 157
ORA-06512: at line 1

SQL> set linesize 120
SQL> col NODE_NAME format A20
SQL> col JOB_NAME format A30
SQL> col START_DATE format A35
SQL> col INST_JOB   format A30
SQL> select NODE_NAME, INST_ID, INST_JOB from opatch_inst_job;

NODE_NAME        INST_ID INST_JOB
-------------------- ---------- ------------------------------
gract1.example.com          1 Load_opatch_inventory_1
gract3.example.com          2 Load_opatch_inventory_2
gract2.example.com          3 Load_opatch_inventory_3

SQL> 
SQL> select job_name,state, start_date from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';

JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_2        SUCCEEDED       24-JAN-15 11.35.41.629308 AM +01:00
LOAD_OPATCH_INVENTORY_3        SCHEDULED       24-JAN-15 11.35.41.683097 AM +01:00
LOAD_OPATCH_INVENTORY_1        SUCCEEDED       24-JAN-15 11.35.41.156565 AM +01:00
 
The job was scheduled but never succeeded ! 
--> After fixing the connection problem to gract2.example.com the job runs to completion

SQL> select job_name,state, start_date from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';
JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_2        SUCCEEDED       24-JAN-15 11.59.29.078730 AM +01:00
LOAD_OPATCH_INVENTORY_3        SUCCEEDED       24-JAN-15 11.59.29.148714 AM +01:00
LOAD_OPATCH_INVENTORY_1        SUCCEEDED       24-JAN-15 11.59.29.025652 AM +01:00

Verify the patch install on all cluster nodes
SQL> set echo on
SQL> set pagesize 20000
SQL> set long 200000
SQL> 
SQL> /* As we are running on a PM managed db let's have look on host_names and instance names */
SQL> col HOST_NAME format A30
SQL> select host_name, instance_name from gv$instance;
HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           dw_1
gract2.example.com           dw_3
gract3.example.com           dw_2
SQL> select host_name, instance_name from v$instance;
HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           dw_1

SQL> /* exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1'); */
SQL> set serveroutput on
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED

SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','dw_3');
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED

SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract3.example.com','dw_2');
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED

 

Monitor Script to track  dba_scheduler_jobs and  opatch_inst_job tables

[oracle@gract1 ~/DATAPATCH]$ cat check_it.sql
 connect / as sysdba
 alter session set NLS_TIMESTAMP_TZ_FORMAT = 'dd-MON-yyyy HH24:mi:ss';
 set linesize 120
 col NODE_NAME format A20
 col JOB_NAME format A30
 col START_DATE format A25
 col LAST_START_DATE format A25
 col INST_JOB   format A30
 select NODE_NAME, INST_ID, INST_JOB from opatch_inst_job;
 select job_name,state, start_date, LAST_START_DATE from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';
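
While datapatch is running, the script can be re-executed periodically, for example with the watch 
utility (a sketch - assumes the Oracle environment is set in the current shell):

# refresh the job and inventory status every 10 seconds
watch -n 10 "sqlplus -s /nolog @check_it.sql"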

 

How to clean up after ORA-27477 errors ?

[oracle@gract1 OPatch]$  ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Fri Jan 23 20:44:48 2015
Copyright (c) 2014, Oracle.  All rights reserved.
Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550
DBD::Oracle::st execute failed: ORA-27477: "SYS"."LOAD_OPATCH_INVENTORY_3" already exists
ORA-06512: at "SYS.DBMS_QOPATCH", line 1011
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute) [for Statement "DECLARE
x XMLType;
BEGIN
x := dbms_qopatch.get_pending_activity;
? := x.getStringVal();
END;" with ParamValues: :p1=undef] at /u01/app/oracle/product/121/racdb/sqlpatch/sqlpatch.pm line 1293.

sqlplus /nolog @check_it
NODE_NAME        INST_ID INST_JOB
-------------------- ---------- ------------------------------
gract2.example.com          1 Load_opatch_inventory_1
gract1.example.com          2 Load_opatch_inventory_2

JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_1        DISABLED        23-JAN-15 08.38.11.746811 PM +01:00
LOAD_OPATCH_INVENTORY_3        DISABLED        23-JAN-15 08.36.18.506279 PM +01:00
LOAD_OPATCH_INVENTORY_2        DISABLED        23-JAN-15 08.38.11.891360 PM +01:00

Drop the jobs and cleanup the  opatch_inst_job table
SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_1');
SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_2');
SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_3');
SQL>  delete from opatch_inst_job;
2 rows deleted.
SQL> commit;

Now rerun the ./datapatch -verbose command and monitor the progress
SQL> @check_it
Connected.
NODE_NAME        INST_ID INST_JOB
-------------------- ---------- ------------------------------
gract2.example.com          1 Load_opatch_inventory_1
gract1.example.com          2 Load_opatch_inventory_2
gract3.example.com          3 Load_opatch_inventory_3
--> All our cluster nodes are ONLINE and the required JOBS are SCHEDULED !
JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_1        SUCCEEDED       23-JAN-15 08.46.08.885038 PM +01:00
LOAD_OPATCH_INVENTORY_2        SUCCEEDED       23-JAN-15 08.46.08.933665 PM +01:00
LOAD_OPATCH_INVENTORY_3        RUNNING           23-JAN-15 08.46.09.014492 PM +01:00

Reference

  • 12.1.0.1 datapatch issue : ORA-27477: "SYS"."LOAD_OPATCH_INVENTORY_1" already exists (Doc ID 1934882.1)
  • Oracle Database 12.1 : FAQ on Queryable Patch Inventory (Doc ID 1530108.1)
  • Datapatch errors at "SYS.DBMS_QOPATCH" (Doc ID 1599479.1)
  • Queryable Patch Inventory -- SQL Interface to view, compare, validate database patches (Doc ID 1585814.1)

Manually applying CW Patch ( 12.1.0.1.5 )

Overview

  • In this tutorial we will manually apply a CW patch [ 19849140 ] without using opatchauto.
  • For that we closely follow the patch README – chapter 5 [  patches/12105/19849140/README.html ]   ->  Manual Steps for Apply/Rollback Patch

Check for conflicts

[root@gract1 CLUVFY-JAN-2015]#  $GRID_HOME/OPatch/opatchauto apply /media/sf_kits/patches/12105/19849140 -analyze 
 OPatch Automation Tool
Copyright (c) 2015, Oracle Corporation.  All rights reserved.
OPatchauto version : 12.1.0.1.5
OUI version        : 12.1.0.1.0
Running from       : /u01/app/121/grid
opatchauto log file: /u01/app/121/grid/cfgtoollogs/opatchauto/19849140/opatch_gi_2015-01-22_18-25-48_analyze.log
NOTE: opatchauto is running in ANALYZE mode. There will be no change to your system.
Parameter Validation: Successful
Grid Infrastructure home:
/u01/app/121/grid
RAC home(s):
/u01/app/oracle/product/121/racdb
Configuration Validation: Successful
Patch Location: /media/sf_kits/patches/12105/19849140
Grid Infrastructure Patch(es): 19849140 
RAC Patch(es): 19849140 
Patch Validation: Successful
Analyzing patch(es) on "/u01/app/oracle/product/121/racdb" ...
[WARNING] The local database instance 'dw_2' from '/u01/app/oracle/product/121/racdb' is not running. 
SQL changes, if any,  will not be analyzed. Please refer to the log file for more details.
[WARNING] SQL changes, if any, could not be analyzed on the following database(s): ERP ... Please refer to the log 
file for more details. 
Apply Summary:
opatchauto ran into some warnings during analyze (Please see log file for details):
GI Home: /u01/app/121/grid: 19849140
RAC Home: /u01/app/oracle/product/121/racdb: 19849140
opatchauto completed with warnings.
You have new mail in /var/spool/mail/root

If this is a GI Home, as the root user execute:
Oracle Clusterware active version on the cluster is [12.1.0.1.0]. The cluster upgrade state is [NORMAL]. 
The cluster active 
patch level is [482231859].
..
--> As this is a Clusterware patch only, ignore the WARNINGs 
    
  •  Note during the analyze step we get a first hint that all instances must run on all servers for applying the patch !

Run pre root script and apply the GRID patch

1) Stop all databases running out of this ORACLE_HOME and unmount the ACFS filesystems

2) Run the pre root script
[grid@gract1 gract1]$ $GRID_HOME/crs/install/rootcrs.pl -prepatch

3) Apply the CRS patch 
[grid@gract1 gract1]$   $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME 
                          -local /media/sf_kits/patches/12105/19849140/19849140
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/121/grid
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/121/grid/oraInst.loc
OPatch version    : 12.1.0.1.5
OUI version       : 12.1.0.1.0
Log file location : /u01/app/121/grid/cfgtoollogs/opatch/opatch2015-01-23_12-25-48PM_1.log

Applying interim patch '19849140' to OH '/u01/app/121/grid'
Verifying environment and performing prerequisite checks...
Interim patch 19849140 is a superset of the patch(es) [  17077442 ] in the Oracle Home
OPatch will roll back the subset patches and apply the given patch.
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Y
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/121/grid')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Rolling back interim patch '17077442' from OH '/u01/app/121/grid'
Patching component oracle.crs, 12.1.0.1.0...
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
RollbackSession removing interim patch '17077442' from inventory
OPatch back to application of the patch '19849140' after auto-rollback.
Patching component oracle.crs, 12.1.0.1.0...
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
Verifying the update...
Patch 19849140 successfully applied
Log file location: /u01/app/121/grid/cfgtoollogs/opatch/opatch2015-01-23_12-25-48PM_1.log
OPatch succeeded.

Verify OUI inventory
[grid@gract2 ~]$ $GRID_HOME//OPatch/opatch lsinventory
--------------------------------------------------------------------------------
Installed Top-level Products (1): 
Oracle Grid Infrastructure 12c                                       12.1.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (3) :
Patch  19849140     : applied on Fri Jan 23 15:52:12 CET 2015
Unique Patch ID:  18183131
Patch description:  "Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Component)"
   Created on 23 Oct 2014, 08:32:20 hrs PST8PDT
   Bugs fixed:
     16505840, 16505255, 16505717, 16505617, 16399322, 16390989, 17486244
     16168869, 16444109, 16505361, 13866165, 16505763, 16208257, 16904822
     17299876, 16246222, 16505540, 16505214, 15936039, 16580269, 16838292
     16505449, 16801843, 16309853, 16505395, 17507349, 17475155, 16493242
     17039197, 16196609, 18045611, 17463260, 17263488, 16505667, 15970176
     16488665, 16670327, 17551223
...
Patch level status of Cluster nodes :
 Patching Level              Nodes
 --------------              -----
 3174741718                  gract2,gract1
  482231859                   gract3
--> Here nodes gract1 and gract2 are already patched whereas gract3 still needs to be patched !

 

Apply the DB patch

[oracle@gract2 ~]$  $ORACLE_HOME/OPatch/opatch apply -oh $ORACLE_HOME 
                     -local /media/sf_kits/patches/12105/19849140/19849140
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/121/racdb
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/121/racdb/oraInst.loc
OPatch version    : 12.1.0.1.5
OUI version       : 12.1.0.1.0
Log file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_16-30-11PM_1.log

Applying interim patch '19849140' to OH '/u01/app/oracle/product/121/racdb'
Verifying environment and performing prerequisite checks...
Patch 19849140: Optional component(s) missing : [ oracle.crs, 12.1.0.1.0 ] 
Interim patch 19849140 is a superset of the patch(es) [  17077442 ] in the Oracle Home
OPatch will roll back the subset patches and apply the given patch.
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Y
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/121/racdb')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Rolling back interim patch '17077442' from OH '/u01/app/oracle/product/121/racdb'
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
RollbackSession removing interim patch '17077442' from inventory
OPatch back to application of the patch '19849140' after auto-rollback.
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
Verifying the update...
Patch 19849140 successfully applied
Log file location: /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_16-30-11PM_1.log
OPatch succeeded.

Run the post script for GRID

As root user execute:
# $GRID_HOME/rdbms/install/rootadd_rdbms.sh
# $GRID_HOME/crs/install/rootcrs.pl -postpatch
Using configuration parameter file: /u01/app/121/grid/crs/install/crsconfig_params
..

Verify the  RAC Node patch level
[oracle@gract3 ~]$   $ORACLE_HOME/OPatch/opatch lsinventory
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/121/racdb
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/121/racdb/oraInst.loc
OPatch version    : 12.1.0.1.5
OUI version       : 12.1.0.1.0
Log file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_17-59-49PM_1.log

Lsinventory Output file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/lsinv/lsinventory2015-01-23_17-59-49PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 
Oracle Database 12c                                                  12.1.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (2) :
Patch  19849140     : applied on Fri Jan 23 17:41:28 CET 2015
Unique Patch ID:  18183131
Patch description:  "Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Component)"
   Created on 23 Oct 2014, 08:32:20 hrs PST8PDT
   Bugs fixed:
     16505840, 16505255, 16505717, 16505617, 16399322, 16390989, 17486244
     16168869, 16444109, 16505361, 13866165, 16505763, 16208257, 16904822
     17299876, 16246222, 16505540, 16505214, 15936039, 16580269, 16838292
     16505449, 16801843, 16309853, 16505395, 17507349, 17475155, 16493242
     17039197, 16196609, 18045611, 17463260, 17263488, 16505667, 15970176
     16488665, 16670327, 17551223
Using configuration parameter file: /u01/app/121/grid/crs/install/crsconfig_params

....
Rac system comprising of multiple nodes
  Local node = gract3
  Remote node = gract1
  Remote node = gract2


Restart the CRS / database and log in to the local instance 
[root@gract2 Desktop]# su - oracle
-> Active ORACLE_SID:   ERP_1
[oracle@gract2 ~]$ 
[oracle@gract2 ~]$ sqlplus / as sysdba
SQL>  select host_name, instance_name from v$instance;
HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract2.example.com           ERP_1

Now repeat all of the above steps for each RAC node !

Run the datapatch tool for each Oracle Database

  
ORACLE_SID=ERP_1    
[oracle@gract2 OPatch]$ cd $ORACLE_HOME/OPatch
[oracle@gract2 OPatch]$ ./datapatch -verbose

ORACLE_SID=dw_1
[oracle@gract2 OPatch]$ cd $ORACLE_HOME/OPatch
[oracle@gract2 OPatch]$ ./datapatch -verbose
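
If several databases run out of the same ORACLE_HOME, a small loop saves some typing (a sketch using 
the two SIDs from this cluster node - adjust the list to your environment):

# run datapatch once for every database instance served by this home
for SID in ERP_1 dw_1 ; do
    export ORACLE_SID=$SID
    ( cd $ORACLE_HOME/OPatch && ./datapatch -verbose )
done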

For potential problems running datapatch you may read the datapatch article above: Using datapatch in a RAC env


Cluvfy Usage

Download  location for 12c cluvfy

http://www.oracle.com/technetwork/database/options/clustering/downloads/index.html

  • Cluster Verification Utility Download for Oracle Grid Infrastructure 12c
  • Always download the newest cluvfy version from the above link
  • The latest CVU version (July 2013) can be used with all currently supported Oracle RAC versions, including Oracle RAC 10g, Oracle RAC 11g  and Oracle RAC 12c.

Impact of latest Cluvfy version

There is nothing more annoying than debugging a RAC problem which finally turns out to be a cluvfy bug. 
The latest Download from January 2015 shows the following version  
  [grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy -version
  12.1.0.1.0 Build 112713x8664
whereas my current 12.1 installation reports the following version  
  [grid@gract1 ~/CLUVFY-JAN-2015]$ cluvfy -version
  12.1.0.1.0 Build 100213x8664

Cluvfy trace Location

If you have installed cluvfy in /home/grid/CLUVFY-JAN-2015 the related cluvfy traces can be found
in the cv/log subdirectory

[root@gract1 CLUVFY-JAN-2015]# ls /home/grid/CLUVFY-JAN-2015/cv/log
cvutrace.log.0  cvutrace.log.0.lck

Note some cluvfy commands like :
# cluvfy comp dhcp -clustername gract -verbose
must be run as root ! In that case the default trace location may not have the correct permissions.
In that case use the script below to set the trace level and trace location.

Setting Cluvfy Trace File Location and Trace Level in a bash script

The following bash script sets the cluvfy trace location and the cluvfy trace level 
#!/bin/bash
# recreate a private trace directory that is writable for the user running cluvfy
rm -rf /tmp/cvutrace
mkdir /tmp/cvutrace
# redirect the cluvfy traces to that directory and raise the trace level
export CV_TRACELOC=/tmp/cvutrace
export SRVM_TRACE=true
export SRVM_TRACE_LEVEL=2
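
Note that the exports only take effect if the script is sourced into the current shell (or if cluvfy is 
started from within the script itself), because otherwise they die with the subshell. A typical root 
session could then look like this (set_cv_trace.sh is just a hypothetical name for the script above):

# source the trace settings, then run the root-only DHCP check and inspect the traces
. ./set_cv_trace.sh
./bin/cluvfy comp dhcp -clustername gract -verbose
ls -l /tmp/cvutrace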

Why does the cluvfy version matter ?

Yesterday I debugged a DHCP problem starting with cluvfy :
[grid@gract1 ~]$  cluvfy -version
12.1.0.1.0 Build 100213x8664
[root@gract1 network-scripts]# cluvfy comp dhcp -clustername gract -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
<null>
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
<null>
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
<null>
..
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was unsuccessful on all the specified nodes. 
 
--> As verification was unsuccessful I started network tracing using tcpdump. 
    But the network trace looked good and I got a bad feeling about cluvfy ! 

What to do next ?
Install the newest cluvfy version and rerun the test !
[grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy -version
12.1.0.1.0 Build 112713x8664

Now rerun test :
[root@gract1 CLUVFY-JAN-2015]#  bin/cluvfy  comp dhcp -clustername gract -verbose

Verifying DHCP Check 
Checking if any DHCP server exists on the network...
DHCP server returned server: 192.168.5.50, loan address: 192.168.5.150/255.255.255.0, lease time: 21600
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
DHCP server returned server: 192.168.5.50, loan address: 192.168.5.150/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
..
released DHCP server lease for client ID "gract-gract1-vip" on port "67"
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was successful. 

Why you should always review your cluvfy logs ?

By default cluvfy logs are under CV_HOME/cv/log

[grid@gract1 ~/CLUVFY-JAN-2015]$  cluvfy  stage -pre crsinst -n gract1 
Performing pre-checks for cluster services setup 
Checking node reachability...
Node reachability check passed from node "gract1"
Checking user equivalence...
User equivalence check passed for user "grid"
ERROR: 
An error occurred in creating a TaskFactory object or in generating a task list
PRCT-1011 : Failed to run "oifcfg". Detailed error: []
PRCT-1011 : Failed to run "oifcfg". Detailed error: []
--> This error is not very helpful at all!

Reviewing cluvfy logfiles for details:
[root@gract1 log]#  cd $GRID_HOME/cv/log

Cluvfy log cvutrace.log.0 : 
[Thread-49] [ 2015-01-22 08:51:25.283 CET ] [StreamReader.run:65]  OUTPUT>PRIF-10: failed to initialize the cluster registry
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:144]  runCommand: process returns 1
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:161]  RunTimeExec: output>
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:164]  PRIF-10: failed to initialize the cluster registry
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:170]  RunTimeExec: error>
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:192]  Returning from RunTimeExec.runCommand
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:884]  retval =  1
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:885]  exitval =  1
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:886]  rtErrLength =  0
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:892]  Failed to execute command. Command = [/u01/app/121/grid/bin/oifcfg, getif, -from, gpnp] env = null error = []
[main] [ 2015-01-22 08:51:25.287 CET ] [ClusterNetworkInfo.getNetworkInfoFromOifcfg:152]  INSTALLEXCEPTION: occured while getting cluster network info. messagePRCT-1011 : Failed to run "oifcfg". Detailed error: []
[main] [ 2015-01-22 08:51:25.287 CET ] [TaskFactory.getNetIfFromOifcfg:4352]  Exception occured while getting network information. msg=PRCT-1011 : Failed to run "oifcfg". Detailed error: []

Here we get a better error message : PRIF-10: failed to initialize the cluster registry
and we extract the failing command : /u01/app/121/grid/bin/oifcfg getif

Now we can retry the failing command at OS level:
[grid@gract1 ~/CLUVFY-JAN-2015]$ /u01/app/121/grid/bin/oifcfg getif
PRIF-10: failed to initialize the cluster registry
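
To spot such lines quickly in a large cluvfy trace you may grep for the usual error prefixes
( just a sketch, adjust the path if you redirected the traces with CV_TRACELOC ):
[grid@gract1 ~]$ grep -E "PRIF-|PRCT-|Failed to execute" $GRID_HOME/cv/log/cvutrace.log.0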

Btw, if you have installed the new cluvfy version you get a much better error output:
[grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy  stage -pre crsinst -n gract1
ERROR: 
PRVG-1060 : Failed to retrieve the network interface classification information from an existing CRS home at path "/u01/app/121/grid" on the local node
PRCT-1011 : Failed to run "oifcfg". Detailed error: PRIF-10: failed to initialize the cluster registry

For fixing PRVG-1060, PRCT-1011 and PRIF-10 when running the above cluvfy commands please read the
following article: Common cluvfy errors and warnings

Run cluvfy before CRS installation by passing the network interfaces for PUBLIC and CLUSTER_INTERCONNECT

$ ./bin/cluvfy stage -pre crsinst -n grac121,grac122  -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect

Run cluvfy before doing an UPGRADE

[grid@grac41 /]$  cluvfy stage -pre crsinst -upgrade -n grac41,grac42,grac43 -rolling -src_crshome $GRID_HOME \
                 -dest_crshome /u01/app/grid_new -dest_version 12.1.0.1.0  -fixup -fixupdir /tmp -verbose

 Run cluvfy 12.1 for preparing a 10gR2 CRS installation

Always install newest cluvfy version even for 10gR2 CRS validations!
[root@ract1 ~]$  ./bin/cluvfy  -version
12.1.0.1.0 Build 112713x8664

Verify OS setup on ract1
[root@ract1 ~]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract1 -verbose -fixup
--> Run required scripts
[root@ract1 ~]# /tmp/CVU_12.1.0.1.0_oracle/runfixup.sh
All Fix-up operations were completed successfully.

Repeat this step on ract2
[root@ract2 ~]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract2 -verbose -fixup
--> Run required scripts
[root@ract2 ~]# /tmp/CVU_12.1.0.1.0_oracle/runfixup.sh
All Fix-up operations were completed successfully.

Now verify System requirements on both nodes
[oracle@ract1 cluvfy12]$  ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract1 -verbose -fixup
Verifying system requirement
..
NOTE:
No fixable verification failures to fix

Finally run cluvfy to test CRS installation readiness 
$ cluvfy12/bin/cluvfy stage -pre crsinst -r 10gR2 \
  -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect \
  -n ract1,ract2 -verbose
..
Pre-check for cluster services setup was successful.

Run cluvfy comp software to check file protections for GRID and RDBMS installations

  • Note : Not all files are checked ( SHELL scripts like ohasd are missing )  –    Bug 18407533 – CLUVFY DOES NOT VERIFY ALL FILES
  • Config File  : $GRID_HOME/cv/cvdata/ora_software_cfg.xml
Run   cluvfy comp software to verify GRID stack 
[grid@grac41 ~]$  cluvfy comp software  -r  11gR2 -n grac41 -verbose  
Verifying software 
Check: Software
  1178 files verified                 
Software check passed
Verification of software was successful. 

Run   cluvfy comp software to verify RDBMS stack 
[oracle@grac43 ~]$  cluvfy comp software  -d $ORACLE_HOME -r 11gR2 -verbose 
Verifying software 
Check: Software
  1780 files verified                 
Software check passed
Verification of software was successful.

Run cluvfy before CRS installation on a single node and create a  script for fixable errors

$ ./bin/cluvfy comp sys -p crs -n grac121 -verbose -fixup
Verifying system requirement 
Check: Total memory 
  Node Name     Available                 Required                  Status    
  ------------  ------------------------  ------------------------  ----------
  grac121       3.7426GB (3924412.0KB)    4GB (4194304.0KB)         failed    
Result: Total memory check failed
... 
*****************************************************************************************
Following is the list of fixable prerequisites selected to fix in this session
******************************************************************************************
--------------                ---------------     ----------------    
Check failed.                 Failed on nodes     Reboot required?    
--------------                ---------------     ----------------    
Hard Limit: maximum open      grac121             no                  
file descriptors                                                      
Execute "/tmp/CVU_12.1.0.1.0_grid/runfixup.sh" as root user on nodes "grac121" to perform the fix up operations manually
--> Now run runfixup.sh" as root   on nodes "grac121" 
Press ENTER key to continue after execution of "/tmp/CVU_12.1.0.1.0_grid/runfixup.sh" has completed on nodes "grac121"
Fix: Hard Limit: maximum open file descriptors 
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac121                               successful              
Result: "Hard Limit: maximum open file descriptors" was successfully fixed on all the applicable nodes
Fix up operations were successfully completed on all the applicable nodes
Verification of system requirement was unsuccessful on all the specified nodes.

Note errors like too low memory/swap need manual intervention:
Check: Total memory 
  Node Name     Available                 Required                  Status    
  ------------  ------------------------  ------------------------  ----------
  grac121       3.7426GB (3924412.0KB)    4GB (4194304.0KB)         failed    
Result: Total memory check failed
Fix that error at OS level and rerun the above cluvfy command
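
To check the current values at OS level before fixing them ( a minimal sketch ):
[root@grac121 ~]# grep MemTotal /proc/meminfo
[root@grac121 ~]# grep SwapTotal /proc/meminfo
[root@grac121 ~]# free -m
--> compare the reported values against the requirements printed by cluvfy ( here 4GB memory )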

Performing post-checks for hardware and operating system setup

  • cluvfy  stage -post hwos  tests multicast communication with multicast group “230.0.1.0”
[grid@grac42 ~]$  cluvfy stage -post hwos -n grac42,grac43 -verbose 
Performing post-checks for hardware and operating system setup 
Checking node reachability...
Check: Node reachability from node "grac42"
  Destination Node                      Reachable?              
  ------------------------------------  ------------------------
  grac42                                yes                     
  grac43                                yes                     
Result: Node reachability check passed from node "grac42"

Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac43                                passed                  
  grac42                                passed                  
Result: User equivalence check passed for user "grid"

Checking node connectivity...
Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac43                                passed                  
  grac42                                passed                  
Verification of the hosts config file successful

Interface information for node "grac43"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:38:10:76 1500  
 eth1   192.168.1.103   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth1   192.168.1.59    192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth1   192.168.1.170   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth1   192.168.1.177   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth2   192.168.2.103   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:1C:30:DD 1500  
 eth2   169.254.125.13  169.254.0.0     0.0.0.0         10.0.2.2        08:00:27:1C:30:DD 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  

Interface information for node "grac42"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:6C:89:27 1500  
 eth1   192.168.1.102   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.165   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.178   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.167   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth2   192.168.2.102   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 eth2   169.254.96.101  169.254.0.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  

Check: Node connectivity for interface "eth1"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac43[192.168.1.103]           grac43[192.168.1.59]            yes             
  grac43[192.168.1.103]           grac43[192.168.1.170]           yes             
  ..     
  grac42[192.168.1.165]           grac42[192.168.1.167]           yes             
  grac42[192.168.1.178]           grac42[192.168.1.167]           yes             
Result: Node connectivity passed for interface "eth1"

Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.1.102            grac43:192.168.1.103            passed          
  grac42:192.168.1.102            grac43:192.168.1.59             passed          
  grac42:192.168.1.102            grac43:192.168.1.170            passed          
  grac42:192.168.1.102            grac43:192.168.1.177            passed          
  grac42:192.168.1.102            grac42:192.168.1.165            passed          
  grac42:192.168.1.102            grac42:192.168.1.178            passed          
  grac42:192.168.1.102            grac42:192.168.1.167            passed          
Result: TCP connectivity check passed for subnet "192.168.1.0"

Check: Node connectivity for interface "eth2"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac43[192.168.2.103]           grac42[192.168.2.102]           yes             
Result: Node connectivity passed for interface "eth2"
Check: TCP connectivity of subnet "192.168.2.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.2.102            grac43:192.168.2.103            passed          
Result: TCP connectivity check passed for subnet "192.168.2.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed for subnet "192.168.2.0".
Subnet mask consistency check passed.
Result: Node connectivity check passed

Checking multicast communication...
Checking subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0" passed.
Checking subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0" passed.
Check of multicast communication passed.

Checking for multiple users with UID value 0
Result: Check for multiple users with UID value 0 passed 
Check: Time zone consistency 
Result: Time zone consistency check passed

Checking shared storage accessibility...
  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdb                              grac43                  
  /dev/sdk                              grac42                  
..        
  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdp                              grac43 grac42           
Shared storage check was successful on nodes "grac43,grac42"

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes...
Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined
More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file
All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf"
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed

Post-check for hardware and operating system setup was successful. 

Debugging Voting disk problems with:  cluvfy comp vdisk

As your CRS stack may not be up, run these commands from a node which is up and running 
[grid@grac42 ~]$ cluvfy comp ocr -n grac41
Verifying OCR integrity 
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR: 
PRVF-4194 : Asm is not running on any of the nodes. Verification cannot proceed.
OCR integrity check failed
Verification of OCR integrity was unsuccessful on all the specified nodes. 

[grid@grac42 ~]$ cluvfy comp vdisk -n grac41
Verifying Voting Disk: 
Checking Oracle Cluster Voting Disk configuration...
ERROR: 
PRVF-4194 : Asm is not running on any of the nodes. Verification cannot proceed.
ERROR: 
PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdf1"
ERROR: 
PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdg1"
ERROR: 
PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdh1"
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
UDev attributes check for Voting Disk locations started...
UDev attributes check passed for Voting Disk locations 
Verification of Voting Disk was unsuccessful on all the specified nodes. 

Debugging steps at OS level 
Verify disk protections and use kfed to read disk header 
[grid@grac41 ~/cluvfy]$ ls -l /dev/asmdisk1_udev_sdf1 /dev/asmdisk1_udev_sdg1 /dev/asmdisk1_udev_sdh1
b---------. 1 grid asmadmin 8,  81 May 14 09:51 /dev/asmdisk1_udev_sdf1
b---------. 1 grid asmadmin 8,  97 May 14 09:51 /dev/asmdisk1_udev_sdg1
b---------. 1 grid asmadmin 8, 113 May 14 09:51 /dev/asmdisk1_udev_sdh1

[grid@grac41 ~/cluvfy]$ kfed read  /dev/asmdisk1_udev_sdf1
KFED-00303: unable to open file '/dev/asmdisk1_udev_sdf1'
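
The KFED-00303 error is a direct consequence of the missing read/write bits shown above.
A possible fix sketch, assuming your udev rules are meant to set the usual 0660 protections
for these ASM devices ( verify against your own udev rules file before changing anything ):
[root@grac41 ~]# chmod 660 /dev/asmdisk1_udev_sdf1 /dev/asmdisk1_udev_sdg1 /dev/asmdisk1_udev_sdh1
--> quick manual fix ; to re-apply the udev rules instead run :
[root@grac41 ~]# udevadm control --reload-rules
[root@grac41 ~]# udevadm trigger --type=devices --action=change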

Debugging file protection problems with:  cluvfy comp software

  • Related BUG: 18350484 : 112042GIPSU:”CLUVFY COMP SOFTWARE” FAILED IN 112042GIPSU IN HPUX
Investigate file protection problems with cluvfy comp software

Cluvfy checks file protections against ora_software_cfg.xml
[grid@grac41 cvdata]$ cd  /u01/app/11204/grid/cv/cvdata
[grid@grac41 cvdata]$ grep gpnp ora_software_cfg.xml
      <File Path="bin/" Name="gpnpd.bin" Permissions="0755"/>
      <File Path="bin/" Name="gpnptool.bin" Permissions="0755"/>

Change protections and verify with cluvfy
[grid@grac41 cvdata]$ chmod 444  /u01/app/11204/grid/bin/gpnpd.bin
[grid@grac41 cvdata]$ cluvfy comp software -verbose | grep gpnpd
    /u01/app/11204/grid/bin/gpnpd.bin..."Permissions" did not match reference
        Permissions of file "/u01/app/11204/grid/bin/gpnpd.bin" did not match the expected value. [Expected = "0755" ; Found = "0444"]

Now correct problem and verify again 
[grid@grac41 cvdata]$ chmod 755  /u01/app/11204/grid/bin/gpnpd.bin
[grid@grac41 cvdata]$ cluvfy comp software -verbose | grep gpnpd
--> No errors were reported anymore

Debugging CTSSD/NTP problems with:  cluvfy comp clocksync

[grid@grac41 ctssd]$ cluvfy comp clocksync -n grac41,grac42,grac43 -verbose
Verifying Clock Synchronization across the cluster nodes 
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac43                                passed                  
  grac42                                passed                  
  grac41                                passed                  
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
  Node Name                             State                   
  ------------------------------------  ------------------------
  grac43                                Observer                
  grac42                                Observer                
  grac41                                Observer                
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
  Node Name                             Running?                
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
Result: Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
Result: 
NTP daemon slewing option check passed
Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
Check: NTP daemon's boot time configuration
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
Result: 
NTP daemon's boot time configuration check for slewing option passed
Checking whether NTP daemon or service is using UDP port 123 on all nodes
Check for NTP daemon or service using UDP port 123
  Node Name                             Port Open?              
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
NTP common Time Server Check started...
NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Checking on nodes "[grac43, grac42, grac41]"... 
Check: Clock time offset from NTP Time Server
Time Server: .LOCL. 
Time Offset Limit: 1000.0 msecs
  Node Name     Time Offset               Status                  
  ------------  ------------------------  ------------------------
  grac43        0.0                       passed                  
  grac42        0.0                       passed                  
  grac41        0.0                       passed                  
Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[grac43, grac42, grac41]". 
Clock time offset check passed
Result: Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful. 

At OS level you can run ntpq -p 
[root@grac41 dev]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ns1.example.com LOCAL(0)        10 u   90  256  377    0.072  -238.49 205.610
 LOCAL(0)        .LOCL.          12 l  15h   64    0    0.000    0.000   0.000
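
At OS level you can also mirror the individual cluvfy NTP checks ( a sketch ):
[root@grac41 ~]# ps -ef | grep [n]tpd
--> the daemon command line should contain the slewing option -x
[root@grac41 ~]# grep -- "-x" /etc/sysconfig/ntpd
--> the boot time configuration should set -x as well
[root@grac41 ~]# netstat -anu | grep ":123 "
--> ntpd should be listening on UDP port 123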

Running cluvfy stage -post crsinst   after a failed Clusterware startup

  • Note you should run cluvfy from a node which is up and running to get best results
 
CRS resource status
[grid@grac41 ~]$ my_crs_stat_init
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     OFFLINE                      Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE                       
ora.crf                        ONLINE     ONLINE          grac41        
ora.crsd                       ONLINE     OFFLINE                       
ora.cssd                       ONLINE     OFFLINE         STARTING      
ora.cssdmonitor                ONLINE     ONLINE          grac41        
ora.ctssd                      ONLINE     OFFLINE                       
ora.diskmon                    OFFLINE    OFFLINE                       
ora.drivers.acfs               ONLINE     OFFLINE                       
ora.evmd                       ONLINE     OFFLINE                       
ora.gipcd                      ONLINE     ONLINE          grac41        
ora.gpnpd                      ONLINE     ONLINE          grac41        
ora.mdnsd                      ONLINE     ONLINE          grac41        

Verify CRS status with cluvfy ( CRS on grac42 is up and running )
[grid@grac42 ~]$ cluvfy stage -post crsinst -n grac41,grac42 -verbose
Performing post-checks for cluster services setup 
Checking node reachability...
Check: Node reachability from node "grac42"
  Destination Node                      Reachable?              
  ------------------------------------  ------------------------
  grac42                                yes                     
  grac41                                yes                     
Result: Node reachability check passed from node "grac42"

Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                passed                  
  grac41                                passed                  
Result: User equivalence check passed for user "grid"

Checking node connectivity...
Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                passed                  
  grac41                                passed                  
Verification of the hosts config file successful

Interface information for node "grac42"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:6C:89:27 1500  
 eth1   192.168.1.102   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.59    192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.178   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.170   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth2   192.168.2.102   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 eth2   169.254.96.101  169.254.0.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  
Interface information for node "grac41"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:82:47:3F 1500  
 eth1   192.168.1.101   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:89:E9:A2 1500  
 eth2   192.168.2.101   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:6B:E2:BD 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  

Check: Node connectivity for interface "eth1"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42[192.168.1.102]           grac42[192.168.1.59]            yes             
  grac42[192.168.1.102]           grac42[192.168.1.178]           yes             
  grac42[192.168.1.102]           grac42[192.168.1.170]           yes             
  grac42[192.168.1.102]           grac41[192.168.1.101]           yes             
  grac42[192.168.1.59]            grac42[192.168.1.178]           yes             
  grac42[192.168.1.59]            grac42[192.168.1.170]           yes             
  grac42[192.168.1.59]            grac41[192.168.1.101]           yes             
  grac42[192.168.1.178]           grac42[192.168.1.170]           yes             
  grac42[192.168.1.178]           grac41[192.168.1.101]           yes             
  grac42[192.168.1.170]           grac41[192.168.1.101]           yes             
Result: Node connectivity passed for interface "eth1"

Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.1.102            grac42:192.168.1.59             passed          
  grac42:192.168.1.102            grac42:192.168.1.178            passed          
  grac42:192.168.1.102            grac42:192.168.1.170            passed          
  grac42:192.168.1.102            grac41:192.168.1.101            passed          
Result: TCP connectivity check passed for subnet "192.168.1.0"

Check: Node connectivity for interface "eth2"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42[192.168.2.102]           grac41[192.168.2.101]           yes             
Result: Node connectivity passed for interface "eth2"

Check: TCP connectivity of subnet "192.168.2.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.2.102            grac41:192.168.2.101            passed          
Result: TCP connectivity check passed for subnet "192.168.2.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed for subnet "192.168.2.0".
Subnet mask consistency check passed.
Result: Node connectivity check passed

Checking multicast communication...
Checking subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0" passed.
Checking subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0" passed.
Check of multicast communication passed.

Check: Time zone consistency 
Result: Time zone consistency check passed

Checking Oracle Cluster Voting Disk configuration...
ERROR: 
PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
--> Expected error as lower CRS stack is not completely up and running
grac41
Oracle Cluster Voting Disk configuration check passed

Checking Cluster manager integrity... 
Checking CSS daemon...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                running                 
  grac41                                not running             
ERROR: 
PRVF-5319 : Oracle Cluster Synchronization Services do not appear to be online.
Cluster manager integrity check failed
--> Expected error as lower CRS stack is not completely up and running

UDev attributes check for OCR locations started...
Result: UDev attributes check passed for OCR locations 
UDev attributes check for Voting Disk locations started...
Result: UDev attributes check passed for Voting Disk locations 

Check default user file creation mask
  Node Name     Available                 Required                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        22                        0022                      passed    
  grac41        22                        0022                      passed    
Result: Default user file creation mask check passed

Checking cluster integrity...
  Node Name                           
  ------------------------------------
  grac41                              
  grac42                              
  grac43                              

Cluster integrity check failed This check did not run on the following node(s): 
    grac41

Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR: 
PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
    grac41
--> Expected error as lower CRS stack is not completely up and running

Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
ERROR: 
PRVF-4195 : Disk group for ocr location "+OCR" not available on the following nodes:
    grac41
--> Expected error as lower CRS stack is not completely up and running
NOTE: 
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check failed

Checking CRS integrity...

Clusterware version consistency passed
The Oracle Clusterware is healthy on node "grac42"
ERROR: 
PRVF-5305 : The Oracle Clusterware is not healthy on node "grac41"
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
CRS integrity check failed
--> Expected error as lower CRS stack is not completely up and running

Checking node application existence...
Checking existence of VIP node application (required)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        yes                       yes                       passed    
  grac41        yes                       no                        exists    
VIP node application is offline on nodes "grac41"

Checking existence of NETWORK node application (required)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        yes                       yes                       passed    
  grac41        yes                       no                        failed    
PRVF-4570 : Failed to check existence of NETWORK node application on nodes "grac41"
--> Expected error as lower CRS stack is not completely up and running

Checking existence of GSD node application (optional)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        no                        no                        exists    
  grac41        no                        no                        exists    
GSD node application is offline on nodes "grac42,grac41"

Checking existence of ONS node application (optional)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        no                        yes                       passed    
  grac41        no                        no                        failed    
PRVF-4576 : Failed to check existence of ONS node application on nodes "grac41"
--> Expected error as lower CRS stack is not completely up and running

Checking Single Client Access Name (SCAN)...
  SCAN Name         Node          Running?      ListenerName  Port          Running?    
  ----------------  ------------  ------------  ------------  ------------  ------------
  grac4-scan.grid4.example.com  grac43        true          LISTENER_SCAN1  1521          true        
  grac4-scan.grid4.example.com  grac42        true          LISTENER_SCAN2  1521          true        

Checking TCP connectivity to SCAN Listeners...
  Node          ListenerName              TCP connectivity?       
  ------------  ------------------------  ------------------------
  grac42        LISTENER_SCAN1            yes                     
  grac42        LISTENER_SCAN2            yes                     
TCP connectivity to SCAN Listeners exists on all cluster nodes

Checking name resolution setup for "grac4-scan.grid4.example.com"...

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes...
Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined
More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file
All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf"
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed

  SCAN Name     IP Address                Status                    Comment   
  ------------  ------------------------  ------------------------  ----------
  grac4-scan.grid4.example.com  192.168.1.165             passed                              
  grac4-scan.grid4.example.com  192.168.1.168             passed                              
  grac4-scan.grid4.example.com  192.168.1.170             passed                              

Verification of SCAN VIP and Listener setup passed

Checking OLR integrity...
Checking OLR config file...
ERROR: 
PRVF-4184 : OLR config file check failed on the following nodes:
    grac41
    grac41:Group of file "/etc/oracle/olr.loc" did not match the expected value. [Expected = "oinstall" ; Found = "root"]
Fix : 
[grid@grac41 ~]$ ls -l /etc/oracle/olr.loc
-rw-r--r--. 1 root root 81 May 11 14:02 /etc/oracle/olr.loc
[root@grac41 Desktop]#  chown root:oinstall  /etc/oracle/olr.loc

Checking OLR file attributes...
OLR file check successful
OLR integrity check failed

Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid4.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
Public network subnets "192.168.1.0" match with the GNS VIP "192.168.1.0"
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.1.59" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid4.example.com" are reachable
PRVF-5216 : The following GNS resolved IP addresses for "grac4-scan.grid4.example.com" are not reachable: "192.168.1.168"
PRKN-1035 : Host "192.168.1.168" is unreachable
--> Expected error as not all SCAN VIPs are online while grac41 is down
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
Checking status of GNS resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  grac42        yes                       yes                     
  grac41        no                        yes                     
GNS resource configuration check passed
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  grac42        yes                       yes                     
  grac41        no                        yes                     

GNS VIP resource configuration check passed.

GNS integrity check passed
OCR detected on ASM. Running ACFS Integrity checks...

Starting check to see if ASM is running on all cluster nodes...
PRVF-5110 : ASM is not running on nodes: "grac41," 
--> Expected error as lower CRS stack is not completely up and running

Starting Disk Groups check to see if at least one Disk Group configured...
Disk Group Check passed. At least one Disk Group configured

Task ACFS Integrity check failed

Checking to make sure user "grid" is not in "root" group
  Node Name     Status                    Comment                 
  ------------  ------------------------  ------------------------
  grac42        passed                    does not exist          
  grac41        passed                    does not exist          
Result: User "grid" is not part of "root" group. Check passed

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                passed                  
  grac41                                failed                  
PRVF-9671 : CTSS on node "grac41" is not in ONLINE state, when checked with command "/u01/app/11204/grid/bin/crsctl stat resource ora.ctssd -init" 
--> Expected error as lower CRS stack is not completely up and running
Result: Check of CTSS resource passed on all nodes

Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed

Check CTSS state started...
Check: CTSS state
  Node Name                             State                   
  ------------------------------------  ------------------------
  grac42                                Observer                
CTSS is in Observer state. Switching over to clock synchronization checks using NTP

Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
  Node Name                             Running?                
  ------------------------------------  ------------------------
  grac42                                yes                     
Result: Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac42                                yes                     
Result: 
NTP daemon slewing option check passed
Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
Check: NTP daemon's boot time configuration
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac42                                yes                     
Result: 
NTP daemon's boot time configuration check for slewing option passed
Checking whether NTP daemon or service is using UDP port 123 on all nodes
Check for NTP daemon or service using UDP port 123
  Node Name                             Port Open?              
  ------------------------------------  ------------------------
  grac42                                yes                     
NTP common Time Server Check started...
NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Checking on nodes "[grac42]"... 
Check: Clock time offset from NTP Time Server
Time Server: .LOCL. 
Time Offset Limit: 1000.0 msecs
  Node Name     Time Offset               Status                  
  ------------  ------------------------  ------------------------
  grac42        0.0                       passed                  
Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[grac42]". 
Clock time offset check passed
Result: Clock synchronization check using Network Time Protocol(NTP) passed
PRVF-9652 : Cluster Time Synchronization Services check failed
--> Expected error as lower CRS stack is not completely up and running

Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Post-check for cluster services setup was unsuccessful. 
Checks did not pass for the following node(s):
    grac41

Verify your DHCP setup ( only if using GNS )

[root@gract1 Desktop]#  cluvfy comp dhcp -clustername gract -verbose
Checking if any DHCP server exists on the network...
PRVG-5723 : Network CRS resource is configured to use DHCP provided IP addresses
Verification of DHCP Check was unsuccessful on all the specified nodes.
--> If the network resource is ONLINE you aren't allowed to run this command  
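
To check whether the network CRS resource is currently ONLINE before running the DHCP check
( a sketch, assuming the default resource name ora.net1.network ):
[grid@gract1 ~]$ crsctl stat res ora.net1.network
--> STATE=ONLINE means the DHCP check will refuse to run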

DESCRIPTION:
Checks if DHCP server exists on the network and is capable of providing required number of IP addresses. 
This check also verifies the response time for the DHCP server. The checks are all done on the local node. 
For port values less than 1024 CVU needs to be run as root user. If -networks is specified and it contains 
a PUBLIC network then DHCP packets are sent on the public network. By default the network on which the host 
IP is specified is used. This check must not be done while default network CRS resource configured to use 
DHCP provided IP address is online.

In my case even stopping nodeapps doesn't help.
Only after a full cluster shutdown does the command seem to query the DHCP server!
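
A possible sequence is sketched below ( this stops the clusterware stack on all nodes, so only
do it in a maintenance window ; depending on your setup a full crsctl stop crs per node may be needed ):
[root@gract1 ~]# crsctl stop cluster -all
[root@gract1 ~]# cluvfy comp dhcp -clustername gract -verbose
[root@gract1 ~]# crsctl start cluster -all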

[root@gract1 Desktop]#  cluvfy comp dhcp -clustername gract -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-scan2-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.169/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan2-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.169/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-scan3-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.168/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan3-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.168/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-gract1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.174/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-gract1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.174/255.255.255.0, lease time: 21600
CRS-10012: released DHCP server lease for client ID gract-scan1-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-scan2-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-scan3-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-gract1-vip on port 67
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was successful. 

The nameserver /var/log/messages shows the following: 
Jan 21 14:42:53 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: Wrote 6 leases to leases file.
Jan 21 14:42:55 ns1 dhcpd: DHCPREQUEST for 192.168.1.170 (192.168.1.50) from 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: DHCPACK on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:56 ns1 dhcpd: DHCPOFFER on 192.168.1.169 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:56 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2


Using tcpdump

Tracing the PUBLIC RAC device for DHCP requests

  • our DHCP server is running on port 67
[root@gract1 cvutrace]# tcpdump -i eth1 -vvv -s 1500 port 67
..
    gract1.example.com.bootpc > 255.255.255.255.bootps: [bad udp cksum 473!] BOOTP/DHCP, Request from 00:00:00:00:00:00 (oui Ethernet), length 368, xid 0xab536e31, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 00:00:00:00:00:00 (oui Ethernet)
      sname "gract-scan1-vip"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 8
        Client-ID Option 61, length 16: "gract-scan1-vip"
        END Option 255, length 0
        PAD Option 0, length 0, occurs 102

11:25:25.480234 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 335)
    ns1.example.com.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 307, xid 0xab536e31, Flags [Broadcast] (0x8000)
      Your-IP 192.168.5.150
      Client-Ethernet-Address 00:00:00:00:00:00 (oui Ethernet)
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Offer
        Server-ID Option 54, length 4: ns1.example.com
        Lease-Time Option 51, length 4: 21600
        Subnet-Mask Option 1, length 4: 255.255.255.0
        Default-Gateway Option 3, length 4: 192.168.5.1
        Domain-Name-Server Option 6, length 4: ns1.example.com
        Time-Zone Option 2, length 4: -19000
        IPF Option 19, length 1: N
        RN Option 58, length 4: 10800
        RB Option 59, length 4: 18900
        NTP Option 42, length 4: ns1.example.com
        BR Option 28, length 4: 192.168.5.255
        END Option 255, length 0
11:25:25.481129 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.153 tell ns1.example.com, length 46
11:25:25.484070 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 396)
    gract1.example.com.bootpc > ns1.example.com.bootps: [bad udp cksum 8780!] BOOTP/DHCP, Request from 00:00:00:00:00:00 (oui Ethernet), length 368, xid 0x7f90997b, Flags [Broadcast] (0x8000)
      Client-IP 192.168.5.150
      Your-IP 192.168.5.150
      Client-Ethernet-Address 00:00:00:00:00:00 (oui Ethernet)
      sname "gract-scan1-vip"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Release
        Server-ID Option 54, length 4: ns1.example.com
        Client-ID Option 61, length 16: "gract-scan1-vip"
        END Option 255, length 0
        PAD Option 0, length 0, occurs 100


Using route command

Assume you want to route the traffic for network 192.168.5.0 
  through interface eth1, which serves the 192.168.1.0 network.

Verify current routing info :

[root@gract1 Desktop]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:29:54:EF  
          inet addr:192.168.1.111  Bcast:192.168.1.255  Mask:255.255.255.0

[root@gract1 ~]#  netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.2.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 eth3
[root@gract1 ~]# ping 192.168.5.50
PING 192.168.5.50 (192.168.5.50) 56(84) bytes of data.
From 192.168.1.111 icmp_seq=2 Destination Host Unreachable
From 192.168.1.111 icmp_seq=3 Destination Host Unreachable
From 192.168.1.111 icmp_seq=4 Destination Host Unreachable

Add routing info :
[root@gract1 ~]# ip route add 192.168.5.0/24 dev eth1
[root@gract1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.2.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 eth3
192.168.5.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1

Verify that ping and nslookup are working 
[root@gract1 ~]# ping  192.168.5.50
PING 192.168.5.50 (192.168.5.50) 56(84) bytes of data.
64 bytes from 192.168.5.50: icmp_seq=1 ttl=64 time=0.929 ms
64 bytes from 192.168.5.50: icmp_seq=2 ttl=64 time=0.264 ms
--- 192.168.5.50 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.264/0.596/0.929/0.333 ms

[root@gract1 ~]# nslookup ns1
Server:        192.168.1.50
Address:    192.168.1.50#53
Name:    ns1.example.com
Address: 192.168.5.50

To delete the route created above run :
[root@gract1 ~]#  ip route del 192.168.5.0/24 dev eth1
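
Note that a route added with ip route add does not survive a reboot. On RHEL/OL style systems
one way to make it persistent is an interface route file ( a sketch, assuming the classic
network-scripts layout ):
[root@gract1 ~]# cat /etc/sysconfig/network-scripts/route-eth1
192.168.5.0/24 dev eth1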

Portable JNDI lookup of REMOTE EJBs

Overview

  • The EJBs described below are deployed as a JEE6 project to the following Application Servers
    • Glassfish 4.1 ( JEE6 / JEE7 )
    • Weblogic 12.1.3 ( JEE6 )
    • Wildfly 8.2 ( JEE6 / JEE7 )
  • Developing 2 EJBs
    • BrokerEJB deployed as a simple EJB in a plain JAR file ( BrokerEJB.jar )
    • OnlineBroker is deployed as an Enterprise Application in an EAR file ( OnlineBroker.ear )
  • Both EJBs are deployed to all of the Application Servers ( note: no code changes are needed here )
  • The Remote client lookup needs different coding for different App Servers and Packaging ( JAR / EAR )

Server Setup BrokerEJB : deployed as a simple EJB in a plain JAR file

  • BrokerEJB.jar
 
   View  BrokerEJB.jar content : 
	   [oracle@wls1 ~]$ jar tvf ./NetBeansProjects/BrokerEJB/dist/BrokerEJB.jar
	     0 Thu Dec 25 10:20:42 CET 2014 META-INF/
	   103 Thu Dec 25 10:20:40 CET 2014 META-INF/MANIFEST.MF
	     0 Thu Dec 25 10:20:40 CET 2014 test/
	    48 Thu Dec 25 10:20:40 CET 2014 META-INF/jboss.xml
	   401 Thu Dec 25 10:20:40 CET 2014 META-INF/weblogic-ejb-jar.xml
	  1624 Thu Dec 25 10:20:40 CET 2014 test/Main.class
	   822 Thu Dec 25 10:20:40 CET 2014 test/StockBean.class
	   243 Thu Dec 25 10:20:40 CET 2014 test/StockBeanRemote.class

  test/StockBean.java ( Business Logic )
	package test;
	import javax.ejb.Stateless;
	import javax.ejb.Remote;
	@Stateless 
	@Remote(StockBeanRemote.class)
	public class StockBean implements StockBeanRemote  {

	 @Override
	public String get_stockprize(String stock_name) {
	   return "Message from BrokerEJB: Current share prize  for  "+" "+stock_name + " : 200 € ";
	   }
	}

   test/StockBeanRemote.class   (  Remote Business Interface )
	package test;
	import javax.ejb.Remote;
	@Remote
	public interface StockBeanRemote {
    	    public String get_stockprize(String name);
        }

Server Setup OnlineBroker : deployed as an Enterprise Application in an EAR file

  • OnlineBroker.ear
 
   View OnlineBroker.ear  content : 
	[oracle@wls1 ~]$  jar tvf ./NetBeansProjects/GIT/OnlineBroker/dist/OnlineBroker.ear 
	     0 Mon Dec 29 08:12:44 CET 2014 META-INF/
	   103 Mon Dec 29 08:12:42 CET 2014 META-INF/MANIFEST.MF
	   529 Mon Dec 29 08:12:42 CET 2014 META-INF/application.xml
	   418 Mon Dec 29 08:12:42 CET 2014 META-INF/weblogic-application.xml
	  2693 Mon Dec 29 08:12:42 CET 2014 OnlineBroker-ejb.jar
	  2574 Mon Dec 29 08:12:42 CET 2014 OnlineBroker-war.war

   View  OnlineBroker-ejb.jar content 
	  [oracle@wls1 ~]$ jar xvf ./NetBeansProjects/GIT/OnlineBroker/dist/OnlineBroker.ear OnlineBroker-ejb.jar ;  jar tvf OnlineBroker-ejb.jar
	   0 Mon Dec 29 08:12:42 CET 2014 META-INF/
	   103 Mon Dec 29 08:12:40 CET 2014 META-INF/MANIFEST.MF
	     0 Mon Dec 29 08:12:42 CET 2014 broker/
	   269 Mon Dec 29 08:12:40 CET 2014 META-INF/beans.xml
	   401 Mon Dec 29 08:12:40 CET 2014 META-INF/weblogic-ejb-jar.xml
	   841 Mon Dec 29 08:12:42 CET 2014 broker/StockBean2.class
	   247 Mon Dec 29 08:12:42 CET 2014 broker/StockBeanRemote2.class

   broker/StockBean2.java ( Business Logic )
	package broker;
	import javax.ejb.Stateless;
	import javax.ejb.Remote;

	@Stateless 
	@Remote(StockBeanRemote2.class)
	public class StockBean2 implements StockBeanRemote2 
	{
	    @Override
	    public String get_stockprize(String stock_name) 
	    {
		return "Message from EAR BrokerEJB : Current share prize  for  "+" "+stock_name + " : 200 € ";
	    }
	} 

   broker/StockBeanRemote2.java (  Remote Business Interface )
	package broker;
	import javax.ejb.Remote;

        @Remote
	public interface StockBeanRemote2 
	{
	    public String get_stockprize(String name);
	}

 

Details running a  Stand-alone REMOTE EJB client using JNDI on Glassfish 4.1

  • The GlassFish 4.1 standalone JNDI client needs more than the single JAR  gf-client.jar  to operate as a full JNDI client
  • The client lib gf-client.jar  is located at :  appclient/glassfish/lib
  • Other needed JAR files can be found at : appclient/glassfish/modules/ [ like glassfish-naming.jar ]
  • This makes GlassFish 4.1 less handy than WebLogic 12.1.3 and WildFly 8.2, where only a single client JAR is needed
  • Portable JNDI Name  ( for JAR file distribution) : java:global/BrokerEJB/StockBean!test.StockBeanRemote
  • Portable JNDI Name  ( for EAR file distribution) : java:global/OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2
Create a standalone client with GlassFish libs using the package-appclient script
You can package the GlassFish Server system files required to launch application clients on remote systems into a 
single JAR file using the package-appclient script
[root@wls1 ~]# /usr/local/glassfish-4.1/glassfish/bin/package-appclient
Creating /usr/local/glassfish-4.1/glassfish/lib/appclient.jar
-> copy appclient.jar to the remote system and unzip that file --> /home/oracle/JEE7/lib/

Verify JNDI settings by exploring the GlassFish 4.1 server 
[oracle@wls1 GLASSFISH_EJB]$ asadmin list-jndi-entries --context java:global
     BrokerEJB: com.sun.enterprise.naming.impl.TransientContext
     OnlineBroker: com.sun.enterprise.naming.impl.TransientContext
  --> BrokerEJB and OnlineBroker are listed and accessible as global JNDI names

Invoke GlassFish management console : http://localhost:4848/common/index.jsf 
Verify  EAR lookup:
 --> Applications -> OnlineBroker
    -> Modules and Components : OnlineBroker-ejb.jar     StockBean2     StatelessSessionBean
This translates to the following global JNDI naming
   "java:global/OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2"; 

Verify JAR lookup:    
 --> Applications -> BrokerEJB
    -> Modules and Components :  BrokerEJB     StockBean     StatelessSessionBean
 This translates to the following global JNDI naming
          "java:global/BrokerEJB/StockBean!test.StockBeanRemote"; 

JNDI client Java code 
./EjbClient.java                - Remote JNDI Client program
./test/StockBeanRemote.java     - Remote Interface for the JAR test 
./broker/StockBeanRemote2.java  - Remote Interface for the EAR test 

Java Code extract :  ./EjbClient.java  
            Properties prop = new Properties();
            prop.put("org.omg.CORBA.ORBInitialHost","wls1.example.com"); 
            prop.put("org.omg.CORBA.ORBInitialPort","3700");
            prop.put("java.naming.factory.initial","com.sun.enterprise.naming.SerialInitContextFactory");
            prop.put("java.naming.factory.url.pkgs","com.sun.enterprise.naming");
            prop.put("java.naming.factory.state","com.sun.corba.ee.impl.presentation.rmi.JNDIStateFactoryImpl");
            ctx = new InitialContext(prop);

              // GlassFish 4.1 EJB Remote lookup for the single EJB class named StockBean
              // EJB name container: BrokerEJB.jar 
              // Bean name         : StockBean
              // Java package      : test  
              // Remote Interface  : StockBeanRemote
              // This translates to the following global JNDI naming: 

            String ejb_class_name = "java:global/BrokerEJB/StockBean!test.StockBeanRemote";  
            System.out.println("\n-> EJB remote lookup ( JAR file distribution) : " + ejb_class_name);
            Object o = ctx.lookup(ejb_class_name);
            StockBeanRemote stockb =(StockBeanRemote)o;
            System.out.println(stockb.get_stockprize("Google"));

              // GlassFish 4.1 EJB Remote lookup for the EJB class StockBean2 packaged in the EAR
              // EAR name          : OnlineBroker.ear
              // EJB name container: OnlineBroker-ejb.jar 
              // Bean name         : StockBean2
              // Java package      : broker
              // Remote Interface  : StockBeanRemote2
              // This translates to the following global JNDI naming: 
            String ejb_ear_name = "java:global/OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2"; 
            System.out.println("-> EJB remote lookup ( EAR distribution) : " + ejb_ear_name);
            Object o2 = ctx.lookup(ejb_ear_name);
            StockBeanRemote2 stockb2 =(StockBeanRemote2)o2;
            System.out.println(stockb2.get_stockprize("Google"));
..
Remote Interface for EAR distribution ( broker/StockBeanRemote2.java )
package broker;
import javax.ejb.Remote;
@Remote
public interface StockBeanRemote2 {
    public String get_stockprize(String name);
}

Remote Interface for plain JAR distribution ( test/StockBeanRemote.java )
package test;
import javax.ejb.Remote;

@Remote
public interface StockBeanRemote {
    public String get_stockprize(String name);
}

Full GlassFish 4.1 client can be found here.
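
In case that link is not available, a minimal self-contained skeleton of the GlassFish 4.1 client is sketched below ( host wls1.example.com and port 3700 are the values from the extract above; error handling is reduced to a bare minimum, so treat this as a sketch rather than the full client ):

	import java.util.Properties;
	import javax.naming.Context;
	import javax.naming.InitialContext;
	import javax.naming.NamingException;
	import test.StockBeanRemote;
	import broker.StockBeanRemote2;

	public class EjbClient {
	    public static void main(String[] args) throws NamingException {
	        Properties prop = new Properties();
	        prop.put("org.omg.CORBA.ORBInitialHost", "wls1.example.com");
	        prop.put("org.omg.CORBA.ORBInitialPort", "3700");
	        prop.put("java.naming.factory.initial", "com.sun.enterprise.naming.SerialInitContextFactory");
	        prop.put("java.naming.factory.url.pkgs", "com.sun.enterprise.naming");
	        prop.put("java.naming.factory.state", "com.sun.corba.ee.impl.presentation.rmi.JNDIStateFactoryImpl");
	        Context ctx = new InitialContext(prop);
	        try {
	            // JAR deployment : BrokerEJB.jar
	            StockBeanRemote stockb = (StockBeanRemote)
	                ctx.lookup("java:global/BrokerEJB/StockBean!test.StockBeanRemote");
	            System.out.println(stockb.get_stockprize("Google"));
	            // EAR deployment : OnlineBroker.ear
	            StockBeanRemote2 stockb2 = (StockBeanRemote2)
	                ctx.lookup("java:global/OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2");
	            System.out.println(stockb2.get_stockprize("Google"));
	        } finally {
	            ctx.close();
	        }
	    }
	}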

Compile and run testcase:
CLASSPATH=.:/home/oracle/JEE7/lib/appclient/glassfish/lib/gf-client.jar
+ javac test/StockBeanRemote.java
+ javac broker/StockBeanRemote2.java
+ javac EjbClient.java
+ /home/oracle/JEE7/lib/appclient/glassfish/bin/appclient EjbClient

Output: 
-> EJB remote lookup ( JAR file distribution) : java:global/BrokerEJB/StockBean!test.StockBeanRemote
Message from BrokerEJB: Current share prize  for   Google : 200 € 
-> EJB remote lookup ( EAR distribution) : java:global/OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2
Message from Broker_EAR_EJB : Current share prize  for   Google : 200 €

Details running a  Stand-alone REMOTE EJB client using JNDI for Oracle WebLogic Server 12.1.3

  • WebLogic Server 12.1.3 is not using IIOP here; instead it uses the WebLogic T3 protocol
  • WebLogic Server 12.1.3 only needs  wlthint3client.jar  to operate as a full JNDI client
  • Portable JNDI Name  ( for JAR file distribution) : java:global/classes/StockBean!test.StockBeanRemote
  • Portable JNDI Name  ( for EAR file distribution) : java:global.OnlineBroker.OnlineBroker-ejb.StockBean2!broker.StockBeanRemote2
Details about WebLogic Thin T3 Client
The WebLogic full client, wlfullclient.jar, is deprecated as of WebLogic Server 12.1.3 and may be removed in a future release.
Oracle recommends using the WebLogic Thin T3 client or other appropriate client depending on your environment. For more
information on WebLogic client types, see WebLogic Server Client Types and Features.

Understanding the WebLogic Thin T3 Client
The WebLogic Thin T3 Client jar (wlthint3client.jar) is a light-weight, high performing alternative to the wlfullclient.jar
and wlclient.jar (IIOP) remote client jars. The Thin T3 client has a minimal footprint while providing access to a rich set of
APIs that are appropriate for client usage. As its name implies, the Thin T3 Client uses the WebLogic T3 protocol, which provides
significant performance improvements over the wlclient.jar, which uses the IIOP protocol.

Verify JNDI settings by exploring the WebLogic 12.1.3 server with  Admin Console : http://wls1.example.com:7001 
  Domain (wl_server ) -> Environments -> Servers -> AdminServer(admin) -> View JNDI Tree ( right below Save button ) 

  EAR lookup: [ Enterprise Application: EJB + WAR ]
    java:global -> OnlineBroker -> OnlineBroker-ejb -> StockBean2!broker -> StockBeanRemote2
    Binding Name: java:global.OnlineBroker.OnlineBroker-ejb.StockBean2!broker.StockBeanRemote2

  JAR lookup: [ EJB ]    
    java:global -> classes -> StockBean!test -> StockBeanRemote
    Binding Name: java:global.classes.StockBean!test.StockBeanRemote
    --> Note: one would expect java:global.BrokerEJB.StockBean!test.StockBeanRemote here, but WebLogic
        binds the module name as classes, so the actual name is java:global.classes.StockBean!test.StockBeanRemote

Java code
./EjbClient.java                - Remote JNDI Client program
./test/StockBeanRemote.java     - Remote Interface for the JAR test
./broker/StockBeanRemote2.java  - Remote Interface for the EAR test

Java Code extract :  ./EjbClient.java  
        Properties prop = new Properties();
        prop.put("java.naming.factory.initial", "weblogic.jndi.WLInitialContextFactory");
        prop.put("java.naming.provider.url","t3://wls1.example.com:7001");
        prop.put("java.naming.security.principal","weblogic");
        prop.put("java.naming.security.credentials","helmut11");
            ctx = new InitialContext(prop);

              // WebLogic 12.1.3 EJB Remote lookup for the single EJB class named StockBean
              // EJB name container: BrokerEJB.jar 
              // Bean name         : StockBean
              // Java package      : test  
              // Remote Interface  : StockBeanRemote
              // This translates to the following global JNDI naming: 
            String ejb_class_name = "java:global/classes/StockBean!test.StockBeanRemote";  
            System.out.println("\n-> EJB remote lookup ( JAR file distribution) : " + ejb_class_name);
            Object o = ctx.lookup(ejb_class_name);
            StockBeanRemote stockb =(StockBeanRemote)o;
            System.out.println(stockb.get_stockprize("Google"));

              // WebLogic 12.1.3 EJB Remote lookup for the EJB class StockBean2 packaged in the EAR
              // EAR name          : OnlineBroker.ear
              // EJB name container: OnlineBroker-ejb.jar 
              // Bean name         : StockBean2
              // Java package      : broker
              // Remote Interface  : StockBeanRemote2
              // This translates to the following global JNDI naming: 
            String ejb_ear_name = "java:global.OnlineBroker.OnlineBroker-ejb.StockBean2!broker.StockBeanRemote2"; 
            System.out.println("-> EJB remote lookup ( EAR distribution) : " + ejb_ear_name);
            Object o2 = ctx.lookup(ejb_ear_name);
            StockBeanRemote2 stockb2 =(StockBeanRemote2)o2;
            System.out.println(stockb2.get_stockprize("Google"));

..
Remote Interface for EAR distribution ( broker/StockBeanRemote2.java )
package broker;
import javax.ejb.Remote;
@Remote
public interface StockBeanRemote2 {
    public String get_stockprize(String name);
}

Remote Interface for plain JAR distribution ( test/StockBeanRemote.java )
package test;
import javax.ejb.Remote;

@Remote
public interface StockBeanRemote {
    public String get_stockprize(String name);
}

Full WebLogic 12.1.3 client can be found here.
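
If that link is not reachable: the WebLogic client only differs from the GlassFish skeleton in the InitialContext properties and the JNDI names; a minimal sketch ( T3 URL and credentials are the values from the extract above ):

	import java.util.Properties;
	import javax.naming.Context;
	import javax.naming.InitialContext;
	import javax.naming.NamingException;
	import test.StockBeanRemote;
	import broker.StockBeanRemote2;

	public class EjbClient {
	    public static void main(String[] args) throws NamingException {
	        Properties prop = new Properties();
	        prop.put("java.naming.factory.initial", "weblogic.jndi.WLInitialContextFactory");
	        prop.put("java.naming.provider.url", "t3://wls1.example.com:7001");
	        prop.put("java.naming.security.principal", "weblogic");
	        prop.put("java.naming.security.credentials", "helmut11");
	        Context ctx = new InitialContext(prop);
	        try {
	            // JAR deployment - note the module name "classes" reported by the WebLogic JNDI tree
	            StockBeanRemote stockb = (StockBeanRemote)
	                ctx.lookup("java:global/classes/StockBean!test.StockBeanRemote");
	            System.out.println(stockb.get_stockprize("Google"));
	            // EAR deployment - WebLogic binds this name with '.' separators
	            StockBeanRemote2 stockb2 = (StockBeanRemote2)
	                ctx.lookup("java:global.OnlineBroker.OnlineBroker-ejb.StockBean2!broker.StockBeanRemote2");
	            System.out.println(stockb2.get_stockprize("Google"));
	        } finally {
	            ctx.close();
	        }
	    }
	}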

Compile and run testcase
CLASSPATH=.:./lib/wlthint3client.jar
+ javac test/StockBeanRemote.java
+ javac broker/StockBeanRemote2.java
+ javac EjbClient.java
+ java EjbClient

Output :
-> EJB remote lookup ( JAR file distribution) : java:global/classes/StockBean!test.StockBeanRemote
Message from BrokerEJB: Current share prize  for   Google : 200 € 
-> EJB remote lookup ( EAR distribution) : java:global.OnlineBroker.OnlineBroker-ejb.StockBean2!broker.StockBeanRemote2
Message from Broker_EAR_EJB : Current share prize  for   Google : 200 €


Details running a  Stand-alone REMOTE EJB client using JNDI for WildFly 8.2

  • WildFly 8.2 only needs jboss-client.jar to run an EJB remote client
  • Starting with WildFly 8, the JNP project is no longer used, neither on the server side nor on the client side.
  • The client side of the JNP project has been replaced by the jboss-remote-naming project
  • Portable JNDI Name  ( for JAR file distribution) : BrokerEJB/StockBean!test.StockBeanRemote
  • Portable JNDI Name  ( for EAR file distribution) : OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2
  • Remotely accessible EJBs are those objects that can be found under the java:jboss/exported/ namespace.

 

Known Bugs/Problems
Closing the context with WildFly 8.2 crashes with the following stack trace dumped to stderr 
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1e4000e7 rejected from java.util.concurrent.ThreadPoolExecutor@7bfb4d34[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)

The crash in the context close ( ctx.close() ) is a JBoss/WildFly problem only.
For details please read the following bug: https://issues.jboss.org/browse/EJBCLIENT-98
java.util.concurrent.RejectedExecutionException if a remote-naming InitialContext should be closed
Problem code: ctx.close() 
private static void close_context(Context ctx)
    {
        if ( ctx != null)
        { 
            try
            {
                ctx.close();
                System.out.println("Context closed()   " );   // <-- this line is still reached
            }
            catch (NamingException e)                          // javax.naming.NamingException
            {
                System.out.println("Error closing context : " + e);
            }
        }
    }

Verify JNDI settings by exploring the WildFly 8.2 server with  Admin Console :  localhost:9990
 EAR lookup:
   Runtime -> JNDI View -> java:global -> OnlineBroker -> OnlineBroker-ejb -> StockBean2!broker.StockBeanRemote2
   This translates to the following global JNDI naming
       "OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2"; 
   Note the java:global/ prefix needs to be removed 

 JAR lookup:    
   Runtime -> JNDI View -> java:global -> BrokerEJB -> StockBean!test.StockBeanRemote
    -> Modules and Components :  BrokerEJB     StockBean     StatelessSessionBean
   This translates to the following global JNDI naming
            "BrokerEJB/StockBean!test.StockBeanRemote";  
   Note the java:global/ prefix needs to be removed 

  Under Deployments you should see BrokerEJB.jar and OnlineBroker.ear !

Java code
./EjbClient.java                - Remote JNDI Client program
./test/StockBeanRemote.java     - Remote Interface for the JAR test
./broker/StockBeanRemote2.java  - Remote Interface for the EAR test

Java Code extract :  ./EjbClient.java  
..
        Properties prop = new Properties();
        prop.put(Context.INITIAL_CONTEXT_FACTORY, "org.jboss.naming.remote.client.InitialContextFactory");
        prop.put(Context.PROVIDER_URL, "http-remoting://127.0.0.1:8180");
        prop.put(Context.SECURITY_PRINCIPAL, "oracle");
        prop.put(Context.SECURITY_CREDENTIALS, "helmut11");
        prop.put("jboss.naming.client.ejb.context", true);
        ctx = new InitialContext(prop);

              // WildFly 8.2 EJB Remote lookup for the single EJB class named StockBean
              // EJB name container: BrokerEJB.jar 
              // Bean name         : StockBean
              // Java package      : test  
              // Remote Interface  : StockBeanRemote
              // This translates to the following global JNDI naming: 

            String ejb_class_name = "BrokerEJB/StockBean!test.StockBeanRemote";  
            System.out.println("\n-> EJB remote lookup ( JAR file distribution) : " + ejb_class_name);
            Object o = ctx.lookup(ejb_class_name);
            StockBeanRemote stockb =(StockBeanRemote)o;
            System.out.println(stockb.get_stockprize("Google"));

              // WildFly 8.2 EJB Remote lookup for the EJB class StockBean2 packaged in the EAR
              // EAR name          : OnlineBroker.ear
              // EJB name container: OnlineBroker-ejb.jar 
              // Bean name         : StockBean2
              // Java package      : broker
              // Remote Interface  : StockBeanRemote2
              // This translates to the following global JNDI naming: 
            String ejb_ear_name = "OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2"; 
            System.out.println("-> EJB remote lookup ( EAR distribution) : " + ejb_ear_name);
            Object o2 = ctx.lookup(ejb_ear_name);
            StockBeanRemote2 stockb2 =(StockBeanRemote2)o2;
            System.out.println(stockb2.get_stockprize("Google"));

Remote Interface for plain JAR distribution ( test/StockBeanRemote.java )
package test;
import javax.ejb.Remote;

@Remote
public interface StockBeanRemote {
    public String get_stockprize(String name);
}

Full WildFly 8.2 client can be found here.
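
If that link is not reachable, a minimal skeleton of the WildFly client is sketched below; ctx.close() is additionally wrapped in a try/catch because of the EJBCLIENT-98 issue described above ( host 127.0.0.1:8180 and the oracle user are the values from the extract ):

	import java.util.Properties;
	import javax.naming.Context;
	import javax.naming.InitialContext;
	import javax.naming.NamingException;
	import test.StockBeanRemote;
	import broker.StockBeanRemote2;

	public class EjbClient {
	    public static void main(String[] args) throws NamingException {
	        Properties prop = new Properties();
	        prop.put(Context.INITIAL_CONTEXT_FACTORY, "org.jboss.naming.remote.client.InitialContextFactory");
	        prop.put(Context.PROVIDER_URL, "http-remoting://127.0.0.1:8180");
	        prop.put(Context.SECURITY_PRINCIPAL, "oracle");
	        prop.put(Context.SECURITY_CREDENTIALS, "helmut11");
	        prop.put("jboss.naming.client.ejb.context", true);
	        Context ctx = new InitialContext(prop);
	        try {
	            // JAR deployment - note : no java:global/ prefix with remote-naming
	            StockBeanRemote stockb = (StockBeanRemote)
	                ctx.lookup("BrokerEJB/StockBean!test.StockBeanRemote");
	            System.out.println(stockb.get_stockprize("Google"));
	            // EAR deployment
	            StockBeanRemote2 stockb2 = (StockBeanRemote2)
	                ctx.lookup("OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2");
	            System.out.println(stockb2.get_stockprize("Google"));
	        } finally {
	            try {
	                ctx.close();
	            } catch (RuntimeException e) {
	                // EJBCLIENT-98 : close() may raise a RejectedExecutionException
	                // ( or just dump it to stderr ) - swallow it so the client exits cleanly
	                System.err.println("Ignoring exception on ctx.close(): " + e);
	            }
	        }
	    }
	}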

Compile and run testcase
CLASSPATH=.:./lib/jboss-client.jar
+ javac test/StockBeanRemote.java
+ javac broker/StockBeanRemote2.java
+ javac EjbClient.java
+ java EjbClient
Output 
-> EJB remote lookup ( JAR file distribution) : BrokerEJB/StockBean!test.StockBeanRemote
Message from BrokerEJB: Current share prize  for   Google : 200 € 
-> EJB remote lookup ( EAR distribution) : OnlineBroker/OnlineBroker-ejb/StockBean2!broker.StockBeanRemote2
Message from EAR BrokerEJB : Current share prize  for   Google : 200 €
