GIPC Defects

  • For details, read: List of gipc defects that prevent GI from starting/joining after private network is restored or node rebooted (Doc ID 1488378.1)

Bug 9593552   – fixed in 11.2.0.2 GI PSU3, 11.2.0.3 and above, crsd fails to join, refer to note 1337730.1 for details

 
  Note      : CRSD Fails to Start due to GIPC Communication Failure with Master (Doc ID 1337730.1)
  BUG       : Bug 9593552 : GIPCCONNECT IS NOT ASYNC 11.2.0.2GIBTWO
  gipcd.log : gipchaLowerProcessNode: no valid interfaces found
  crsd.log  : gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound
                    Invoking member kill 
 Root Cause : BUG 9593552 is fixed in 11.2.0.2 PSU3, 11.2.0.3 and above

Bug 12720728 – fixed in 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU3, 11.2.0.4 and above, cssd fails to join, refer to note 1352887.1 for details

 
  Note        : 11gR2 Grid Infrastructure Node May not Join the Cluster After Evicted With Error sgipcnUdpSend "No buffer space available (74)" (Doc ID 1352887.1)
  BUG         : 12720728 : GIPCHALOWERPROCESSNODE: NO VALID INTERFACES FOUND TO NODE
  ocssd.log   :  [ GIPCNET][1543] gipcmodNetworkProcessSend: slos op  :  sgipcnUdpSend
                 [ GIPCNET][1543] gipcmodNetworkProcessSend: slos dep :  No buffer space available (74)   ==>> key rediscovery error
                 [GIPCHALO][1543] gipchaLowerProcessNode: no valid interfaces found to node for 595773 ms, node 111c093d0 { host 'racnode2', haName 'CSS_fcrprd', srcLuid 9f9bc4e8-26101e05, 
                                   dstLuid 3559ec4f-06cd6c73 numInf 0, contigSeq 124472, lastAck 124393, lastValidAck 124471, sendSeq [125035 : 125035], createTime 1729131589, flags 0x2408 }
                 [GIPCHALO][1543] gipchaLowerProcessNode: bootstrap node considered dead because of idle connection time 600001 ms, node 111c093d0 { host 'racnode2', haName 'CSS_fcrprd', 
                                  srcLuid 9f9bc4e8-26101e05, dstLuid 3559ec4f-06cd6c73 numInf 0, contigSeq 124472, lastAck 124393, lastValidAck 124471, sendSeq [125038 : 125038], createTime 1729131589, flags 0x2408 }
  Bug Descr.  :  CSSD may report the following errors if a sendto() system call fails due 
                 to some underlying UDP issues at the OS level:
                 gipcmodNetworkProcessSend: slos op  :  sgipcnUdpSend
                 gipcmodNetworkProcessSend: slos dep :  No buffer space available (74)
                 This fix enables the CSSD to handle this error and retry the sendto() operation.
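
The retry behaviour described above can be pictured with a small sketch. This is not Oracle's code; it is a minimal Python illustration of retrying a UDP sendto() when the OS reports "No buffer space available". The function name send_with_retry, the retry count and the back-off delay are assumptions chosen for the example. The numeric errno differs between platforms, so the sketch compares against errno.ENOBUFS rather than the literal 74 shown in the log.

```python
import errno
import time


def send_with_retry(sock, payload, addr, retries=5, delay=0.05):
    """Send one datagram, retrying only when the OS reports ENOBUFS.

    `sock` is any object with a socket-style sendto(payload, addr) method.
    `retries` and `delay` are illustrative defaults, not Oracle values.
    """
    for attempt in range(retries + 1):
        try:
            return sock.sendto(payload, addr)
        except OSError as exc:
            # Anything other than "No buffer space available" is re-raised,
            # and so is ENOBUFS once the retry budget is used up.
            if exc.errno != errno.ENOBUFS or attempt == retries:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear back-off
```

With a real UDP socket created by socket.socket(socket.AF_INET, socket.SOCK_DGRAM), the call would look like send_with_retry(sock, b"data", ("192.0.2.10", 42424)), where the address is a placeholder.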

Bug 13334158 – fixed in 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4 and above, cssd fails to join, refer to note 1456977.1 for details

   
  Note         : 11gR2 GI CSS is not Coming up After Private Network Related Problem Recovered due to gipc Issue (Doc ID 1456977.1)
  BUG          : Bug 13334158 : REBOOT OF ONE OF THE SWITCH EVICTS INSTANCES
  ocssd.log    : 2012-03-20 21:04:45.369: [GIPCHGEN][1102465344] gipchaInterfaceFail: marking interface failing 0x1ac10730 { host '', haName 'CSS_crsrtmpdrdbdm', local (nil), ip '192.168.224.1', subnet '192.168.224.0', 
                 mask '255.255.255.0', mac '00-17-a4-77-88-48', ifname 'bond1', numRef 4, numFail 0, idxBoot 0, flags 0x184d }
               or 
                 [GIPCHGEN][8] gipchaInterfaceDisable: disabling interface 102352750 { host 'racnode2', haName 'CSS_crs-webyours', local 101525550, ip '192.168.104.131:21974', 
                         subnet '192.168.104.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0xa6 }
                 [GIPCHALO][8] gipchaLowerProcessNode: no valid interfaces found to node for 25839 ms, node 101463b90 { host 'nxswdd02', haName 'CSS_crs-webyours', 
                         srcLuid af93e29f-a31d34d8, dstLuid 6e41cc16-5dd93ae6 numInf 1, contigSeq 75317, lastAck 75301, lastValidAck 75317, sendSeq [75302 : 75360], 
                         createTime 111734562, sentRegister 1, localMonitor 1, flags 0x408 }
  Bug Descr.  : This problem was introduced in 11.2.0.2 GI PSU3 and 11.2.0.3 by the fix for bug 10231906.
                This fix supersedes that fix; for interim patches, use this fix instead of that one.
                After a network problem, the network information in gipc is not restored, causing communication problems between the clusterware processes.

  Rediscovery Notes :
     1. Processes like CRSD should show that the network endpoint is closed or the interface is invalidated:
      [GIPCHDEM][1112189248] gipchaDaemonProcessHAInvalidate: completed ha name invalidate for node 0x2aaaac01fd80 
      { host 'node1', haName '9ef5-c63e-d216-3b7f', srcLuid 259e2eb0-52aca06a, dstLuid cc2582a4-bce1999e numInf 1, 
        contigSeq 290425, lastAck 281019, lastValidAck 290424, sendSeq [281019 : 281019], createTime 4294577560, 
        sentRegister 1, localMonitor 0, flags 0x28 }

     2. The gipcd log shows that a problem was found and the interface was disabled:
      [ GIPCNET][1109211456] gipcmodNetworkProcessSend: [network]  failed send attempt endp 0x17d49a20
      [00000000000002e0] { gipcEndpoint : localAddr 'udp://172.16.30.101:13707', remoteAddr '', numPend 5, numReady 1, 
        numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, req 0x17f66090 [00000000053ad8fb] 
       { gipcSendRequest : addr 'udp://IP address:16195', data 0x17f68648, len 80, olen 0, parentEndp 0x17d49a20, ret gipcretFail (1), 
         objFlags 0x0, reqFlags 0x2 }
     ...
      [GIPCHGEN][1109211456] gipchaInterfaceDisable: disabling interface 0x2aaaac231420 { host 'esemdmdb1', haName 'gipcd_ha_name', 
        local (nil), ip '172.16.30.100:16195',subnet 'IP address', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
      [GIPCHALO][1109211456] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaaac231420 
       { host 'esemdmdb1', haName 'gipcd_ha_name', local (nil), ip 'IP address:16195', subnet 'IP address', mask 'IP address0', mac '',
        ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x226 }
      [GIPCDCLT][1075992896] gipcdDeleteAllInterfaces: interface (ip: IP address:56120, mask: 255.255.255.0, subnet: IP address, mac: , ifname: ) deleted
      [GIPCDCLT][1075992896] gipcdDeleteAllInterfaces: interface (ip: IP address:47081, mask: 255.255.255.0, subnet: IP address, mac: , ifname: ) deleted

      but the network interface is not restored.

    Workaround : Reboot the machine

Bug 13811209 – fixed in 11.2.0.3 GI PSU3, 11.2.0.4 and above, cssd fails to join, refer to note 1456977.1 for details

 
  Note        : 11gR2 Grid Infrastructure CSS fails to start after recovering from cluster_interconnect (network adapter, cable, switch etc.) related problems
  ocssd.log   : from surviving node
               [GIPCHGEN][1102465344] gipchaInterfaceFail: marking interface failing 0x1ac10730 { host '', haName 'CSS_crsrtmpdrdbdm', local (nil), ip '192.168.224.1', 
                      subnet '192.168.224.0', mask '255.255.255.0', mac '00-17-a4-77-88-48', ifname 'bond1', numRef 4, numFail 0, idxBoot 0, flags 0x184d }

        OR
               [GIPCHGEN][8] gipchaInterfaceDisable: disabling interface 102352750 { host 'racnode2', haName 'CSS_crs-webyours', local 101525550, ip '192.168.104.131:21974',
                   subnet '192.168.104.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0xa6 }

               [GIPCHALO][8] gipchaLowerProcessNode: no valid interfaces found to node for 25839 ms, node 101463b90 { host 'nxswdd02', haName 'CSS_crs-webyours', 
                    srcLuid af93e29f-a31d34d8, dstLuid 6e41cc16-5dd93ae6 numInf 1, contigSeq 75317, lastAck 75301, lastValidAck 75317, sendSeq [75302 : 75360], createTime 111734562, sentRegister 1, localMonitor 1, flags 0x408 }
   Related   : Bug 13334158 is fixed in 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4 and above (see the bug description above).
               Bug 13811209 is fixed in 11.2.0.3 GI PSU3, 11.2.0.4 and above. It is the continuation of Bug 13334158, whose fix was incomplete.
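
The "no valid interfaces found to node" signature quoted in the log excerpts can be searched for mechanically. The following is a minimal sketch, assuming each message sits on a single line in the real log (the excerpts here are wrapped for display). The names NO_IFACE and find_no_interface_events are hypothetical helpers, not part of any Oracle tool.

```python
import re

# Matches the gipchaLowerProcessNode message and captures the outage
# duration, the peer host and the HA name.
NO_IFACE = re.compile(
    r"gipchaLowerProcessNode: no valid interfaces found to node for "
    r"(?P<ms>\d+) ms, node \S+ \{ host '(?P<host>[^']*)', "
    r"haName '(?P<ha>[^']*)'"
)


def find_no_interface_events(lines):
    """Return (host, haName, milliseconds) for every matching log line."""
    events = []
    for line in lines:
        match = NO_IFACE.search(line)
        if match:
            events.append(
                (match.group("host"), match.group("ha"), int(match.group("ms")))
            )
    return events
```

Passing an open log file to find_no_interface_events yields one tuple per occurrence, which makes it easy to see which peer node lost its interfaces and for how long.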

Bug 13653178 – fixed in 11.2.0.3 GI PSU5, 11.2.0.4 and above, cssd fails to join, refer to note 1479380.1 for details.

  
   The fix caused a regression and has been superseded by bug 16547309; refer to note 1564555.1 for details.
   Bug       : 16547309 : GIPC SHOWS RANK 0 OR -1 AFTER APPLIED PSU 11.2.0.3.5
               Multicast is not working for private network for 11.2.0.2.x (expected behavior) or 11.2.0.3 PSU5/PSU6/PSU7 or 12.1.0.1 (due to Bug 16547309)     

   Note      :  11gR2 Grid Infrastructure, cssd fails to join the cluster (GI fails to start as a result) after recovered from private network failure caused by pulling cluster 
                interconnect cables etc
                11.2.0.3 PSU5/PSU6/PSU7 or 12.1.0.1 CSSD Fails to Start if Multicast Fails on Private Network (Doc ID 1564555.1)
   Bug desc   :  After applying 11.2.0.3.5, cssd cannot establish a connection with the peer cssd using broadcast, because the broadcast address is not created with the correct
                 multicast port number. This issue does not happen where multicast is enabled.
   OS /var/log/messages - node1
                Jul 16 16:12:54 <0.6> racnode1 kernel: e1000e: pci1p2 NIC Link is Down
                Jul 16 16:12:55 <0.6> racnode1 kernel: bonding: bond0: link status definitely down for interface pci1p2, disabling it
                Jul 16 16:12:55 <0.6> racnode1 kernel: bonding: bond0: now running without any active interface !                                    ##>> private network failed
                 --..
                Jul 16 16:15:00 <0.6> racnode1 kernel: bnx2 0000:02:00.1: em2: NIC Copper Link is Up, 1000 Mbps full duplex         ##>> private network recovered

   gipcd.log : [GIPCDCLT][1086585152] gipcdRawInterfaceUpdates: ([update(ip: 192.168.44.5, mask: 255.255.255.0, subnet: 192.168.44.0, mac: 00-26-55-52-75-32,
                                     ifname: bond0), state(gipcdadapterstateUp)])
               [GIPCDMON][1106753856] gipcdMonitorSaveInfMetrics: inf[ 0]  bond0                - rank    0, avgms 30000000000.000000 [ 4 / 0 / 0 ]    
                         ##>> rank stayed 0 after network is restored

               [GIPCDMON][1106753856] gipcdMonitorSaveInfMetrics: inf[ 0]  bond0                - rank    0, avgms 30000000000.000000 [ 19 / 0 / 0 ]
               [ CLSINET][1106753856] Returning NETDATA: 1 interfaces
               [ CLSINET][1106753856] # 0 Interface 'bond0',ip='192.168.44.5',mac='00-26-55-52-75-32',mask='255.255.255.0',net='192.168.44.0',use='cluster_interconnect'
                  ..
               [GIPCDMON][1106753856] gipcdMonitorSaveInfMetrics: inf[ 0]  bond0                - rank   -1, avgms 30000000000.000000 [ 0 / 0 / 0 ]       
                  ##>> rank changed to "-1" 10 minutes after the failure although it was restored a few minutes earlier
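
The rank values in the gipcdMonitorSaveInfMetrics lines above can be extracted with a short script. This is a sketch only: the regular expression is derived from the excerpt shown here, the helper name suspicious_ranks is hypothetical, and treating any rank of 0 or below as suspicious is an assumption based on the two failing values (0 and -1) in this note.

```python
import re

# Captures the interface name and its rank from a gipcd.log metrics line.
RANK_LINE = re.compile(
    r"gipcdMonitorSaveInfMetrics: inf\[\s*\d+\]\s+(?P<ifname>\S+)\s+-\s+"
    r"rank\s+(?P<rank>-?\d+)"
)


def suspicious_ranks(lines):
    """Return (ifname, rank) for every metrics line whose rank is <= 0."""
    hits = []
    for line in lines:
        match = RANK_LINE.search(line)
        if match and int(match.group("rank")) <= 0:
            hits.append((match.group("ifname"), int(match.group("rank"))))
    return hits
```

Running this over gipcd.log after the private network has come back shows whether an interface is still stuck at rank 0 or has dropped to -1.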

  REDISCOVERY INFORMATION:
              Enable GIPC_TRACE_LEVEL=3 and look at the ocssd log. Check whether a message from gipcInternalAddress shows an address like the one below:
              { gipcAddress : name 'udp://192.10.100.255:58375:42424', objFlags 0x0, addrFlags 0x0 }
              The correct address should be 'udp://192.10.100.255:42424'.
  WORKAROUND: enable MULTICAST
  Multicast is not working for private network for 11.2.0.2.x (expected behavior) or 11.2.0.3 PSU5/PSU6/PSU7 or 12.1.0.1 (due to Bug 16547309)
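
The rediscovery check for the malformed broadcast address can be scripted. A minimal sketch follows, assuming IPv4 addresses (counting colons would not work for IPv6) and one gipcAddress entry per log line. The helper names has_extra_port and malformed_addresses are hypothetical.

```python
import re

# Pulls the udp:// URI out of a "gipcAddress : name '...'" trace entry.
GIPC_ADDR = re.compile(r"gipcAddress : name '(?P<uri>udp://[^']+)'")


def has_extra_port(uri):
    """True when an IPv4 udp:// address carries more than one ':port' part."""
    hostport = uri[len("udp://"):]
    return hostport.count(":") > 1


def malformed_addresses(lines):
    """Return every gipcAddress URI that has an extra port component."""
    bad = []
    for line in lines:
        match = GIPC_ADDR.search(line)
        if match and has_extra_port(match.group("uri")):
            bad.append(match.group("uri"))
    return bad
```

Applied to the example in this note, 'udp://192.10.100.255:58375:42424' is reported as malformed, while 'udp://192.10.100.255:42424' is not.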

Bug 16867451 : SOLX64-11.2.0.4-CSS: CSSD DID NOT COME BACK AFTER RESUME ONE OF PRIVATE NETWORKS

 
Fixed in 11.2.0.4, 12.1.0.2 onward, GI does not start after recovery of private network
Duplicate bug :  17831538
  gipcd.log :
          [GIPCDMON][7] gipcdMonitorUpdate: interface  DOWN -  [ ip 192.168.1.101, subnet 192.168.1.0, mask 255.255.255.0, 
                      mac 00-21-28-25-a2-09-00-00-00-00-00-00-00-00-2f-00-00-00-00-00, ifname nge1 ]
          [GIPCDMON][7] gipcdMonitorUpdate: interface  DOWN - [ ip 192.168.2.101, subnet 192.168.2.0, mask 255.255.255.0, mac 
                      00-21-28-25-a2-0a-00-00-00-00-00-00-00-00-2f-00-00-00-00-00, ifname e1000g2 ]
          [GIPCDMON][7] gipcdMonitorUpdate: interface  UP - [ ip 192.168.2.101, subnet 192.168.2.0, mask 255.255.255.0, mac 
                     00-21-28-25-a2-0a-00-00-00-00-00-00-00-00-2f-00-00-00-00-00, ifname e1000g2 ]
  ocssd.log :
           [GIPCHALO][8] gipchaLowerProcessNode: no valid interfaces found to node for 3924690957 ms, node 18cab50 { host 'popen1', haName 'CSS_popen-c13', srcLuid 
              60abf35b-e660fd80, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 :  0], createTime 3924690956,  sentRegister 0, localMonitor 1, flags 0x4 }
           [    CSSD][18]clssnmvDHBValidateNcopy: node 1,  popen1, has a disk HB, but no network HB, DHB has rcfg 265080551, wrtcnt, 9452, LATS 3924691956, lastSeqNo 
                 9451, uniqueness 1369643316,

Bug 14693336 : GI does not start after recovery of private network ( Duplicate bug 19125577 bug 18667717 )

 
Bug 14693336 : THE CONNECTION IN GM LAYER FAILS IN GIPC AFTER NIC RESUME
Test Env     : Two nodes (rwsbc03/04) were involved in the test, with 2 private NICs (eth4/eth5) configured. Bring down eth4 and eth5 on rwsbc04 from the console; after CSSD
                has aborted on rwsbc04, enable eth4 on rwsbc04.
              $ oifcfg getif
               eth0  10.209.0.0  global  public
               eth2  10.196.108.0  global  asm
               eth4  192.168.4.0  global  cluster_interconnect
               eth5  192.168.5.0  global  cluster_interconnect
             Timestamps to disable/enable private network on rwsbc04
               Dec 23 16:40:17 rwsbc04 eth4: NIC Copper Link is Down
               Dec 23 16:43:18 rwsbc04 eth5: NIC Copper Link is Down
               Dec 23 16:49:35 rwsbc04 eth4: NIC Copper Link is Up
Fixed in     : 11.2.0.4 GI PSU2, 12.1.0.2
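
The fix levels listed in this note can be collected into a small lookup to check which of these defects a given GI release and PSU level is still exposed to. This is a sketch under stated assumptions: the table only restates the fix levels quoted above, a PSU level of 0 stands for the base release, PSUs are treated as cumulative, and a release for which this note gives no explicit fix level is reported as "unknown" rather than guessed. The names FIXES, fix_status and open_bugs are hypothetical. Note that the fix for 13653178 caused a regression (bug 16547309), so "fixed" for that entry does not mean trouble-free.

```python
# Minimum GI PSU level per release that contains each fix, as quoted in
# this note. 0 means the base release already contains the fix.
FIXES = {
    "9593552": {"11.2.0.2": 3, "11.2.0.3": 0, "11.2.0.4": 0},
    "12720728": {"11.2.0.2": 5, "11.2.0.3": 3, "11.2.0.4": 0},
    "13334158": {"11.2.0.2": 5, "11.2.0.3": 1, "11.2.0.4": 0},
    "13811209": {"11.2.0.3": 3, "11.2.0.4": 0},
    "13653178": {"11.2.0.3": 5, "11.2.0.4": 0},
    "16867451": {"11.2.0.4": 0, "12.1.0.2": 0},
    "14693336": {"11.2.0.4": 2, "12.1.0.2": 0},
}


def fix_status(bug, release, psu):
    """Return 'fixed', 'not fixed' or 'unknown' for one bug."""
    minimum = FIXES[bug].get(release)
    if minimum is None:
        return "unknown"  # this note gives no fix level for that release
    return "fixed" if psu >= minimum else "not fixed"


def open_bugs(release, psu):
    """List the bugs from this note that are not fixed at the given level."""
    return sorted(
        bug for bug in FIXES if fix_status(bug, release, psu) == "not fixed"
    )
```

For example, open_bugs("11.2.0.3", 2) lists the defects whose fix arrives in a later 11.2.0.3 GI PSU.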
