Testing UCP Connection Pool with FCF against a RAC 12.1.0.2 database

Overview

  • Set CLASSPATH : $ export CLASSPATH=$ORACLE_HOME/jdbc/lib/ojdbc7.jar:.:$ORACLE_HOME/ucp/lib/ucp.jar:$ORACLE_HOME/opmn/lib/ons.jar
  • Always use the newest versions of ojdbc7.jar, ucp.jar and ons.jar [ 12.1.0.2 ]
  • Download location for the Java test program:  UcpRacTest.java

Run the Java test program and monitor the UCP pool before and after an instance shutdown

Check current instance status 
[oracle@hract21 ~]$ srvctl status database  -db banka
Instance bankA_1 is running on node hract21
Instance bankA_2 is running on node hract22

Java code to create UCP connection pool supporting FCF 
 public UcpRacTest() throws SQLException
    {
    // Create a pool-enabled data source instance
    pds = PoolDataSourceFactory.getPoolDataSource();
    // Set the connection and pool properties on the data source
    String ONS_CONFIG = "nodes=hract21:6200,hract22:6200,hract23:6200";
    pds.setONSConfiguration(ONS_CONFIG);
    pds.setFastConnectionFailoverEnabled(true);
    pds.setUser("scott");
    pds.setPassword("tiger");
    pds.setURL("jdbc:oracle:thin:@ract2-scan.grid12c.example.com:1521/banka");
    pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
    pds.setInitialPoolSize(5);
    pds.setMinPoolSize(5);
    pds.setMaxPoolSize(20);
    System.out.println("--> UCP Pool with FCF successfully initialized !");
    }

Java code to check closed connections 
  public String getInstanceInfo (Connection c) throws SQLException
    {
        String query1 = "select name from v$database";
        String query2 = "select host_name,instance_name from v$instance";

        String inst_info = "";
        // try-with-resources closes the Statement (and its ResultSets) automatically
        try ( Statement stmt = c.createStatement() )
        {
            inst_info = "   RAC DB: ";
            ResultSet rset = stmt.executeQuery(query1);
            rset.next();
            inst_info = inst_info + rset.getString(1);

            rset = stmt.executeQuery(query2);
            while( rset.next() )
                inst_info = inst_info + " - Instance Name:" + rset.getString(2) + " - Host: " + rset.getString(1);
        }
    return inst_info;
    }

   public void displayPoolDetails (Connection c) throws SQLException
     {
     System.out.print("NumberOfAvailableConnections: " + pds.getAvailableConnectionsCount());
     System.out.println(" - BorrowedConnectionsCount: " + pds.getBorrowedConnectionsCount());
     if ( c != null )
         System.out.println(getInstanceInfo(c));
     }
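
The main driver loop is not part of the listings above; a minimal sketch of such a borrow/close loop (the method name runTest is a hypothetical helper, reusing the pds field and displayPoolDetails() from this class) could look like this:

   public void runTest(int numConn) throws SQLException
     {
     System.out.println("--> Opening " + numConn + " connections to a RAC DB !");
     Connection[] conns = new Connection[numConn];
     for (int i = 0; i < numConn; i++)
         {
         // Borrow a connection from the UCP pool and print the pool statistics
         conns[i] = pds.getConnection();
         System.out.print("getConnection(): ");
         displayPoolDetails(conns[i]);
         }
     System.out.println("--> Closing all opened RAC connections !");
     for (int i = 0; i < numConn; i++)
         conns[i].close();           // returns the connection to the UCP pool
     System.out.print("--> Pool stats after closing all opened connections: ");
     displayPoolDetails(null);
     }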
               
Run test program: 
[oracle@hract21 UCP]$  java UcpRacTest
Started at Wed Mar 11 12:21:30 CET 2015
--> UCP Pool with FCF successfully initialized !
--> Opening 5 connections to a RAC DB !
getConnection(): NumberOfAvailableConnections: 4 - BorrowedConnectionsCount: 1
   RAC DB: BANKA - Instance Name:bankA_1 - Host: hract21.example.com
getConnection(): NumberOfAvailableConnections: 3 - BorrowedConnectionsCount: 2
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
getConnection(): NumberOfAvailableConnections: 2 - BorrowedConnectionsCount: 3
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
getConnection(): NumberOfAvailableConnections: 1 - BorrowedConnectionsCount: 4
   RAC DB: BANKA - Instance Name:bankA_1 - Host: hract21.example.com
getConnection(): NumberOfAvailableConnections: 0 - BorrowedConnectionsCount: 5
   RAC DB: BANKA - Instance Name:bankA_1 - Host: hract21.example.com
--> Closing all opened RAC connections !
--> Pool stats after closing all opened connections: NumberOfAvailableConnections: 5 - BorrowedConnectionsCount: 0

--> Shutdown an Instance and press <CR>:

Now shutdown instance bankA_1 
[oracle@hract21 ~]$  srvctl stop instance -db banka -i bankA_1 -o abort
[oracle@hract21 ~]$  srvctl status database  -db banka
Instance bankA_2 is running on node hract22
Instance bankA_1 is not running on node hract21
and continue the program by pressing <CR> :
--> Opening 5 connections to a RAC DB !
getConnection(): NumberOfAvailableConnections: 4 - BorrowedConnectionsCount: 1
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
getConnection(): NumberOfAvailableConnections: 3 - BorrowedConnectionsCount: 2
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
getConnection(): NumberOfAvailableConnections: 2 - BorrowedConnectionsCount: 3
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
getConnection(): NumberOfAvailableConnections: 1 - BorrowedConnectionsCount: 4
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
getConnection(): NumberOfAvailableConnections: 0 - BorrowedConnectionsCount: 5
   RAC DB: BANKA - Instance Name:bankA_2 - Host: hract22.example.com
--> Closing all opened RAC connections !
--> Pool stats after closing all opened connections: NumberOfAvailableConnections: 5 - BorrowedConnectionsCount: 0
Ended at Wed Mar 11 12:23:43 CET 2015

 

Test Summary

  • All connections are immediately reconnected to instance bankA_2 after the shutdown of instance bankA_1
  • The RAC system generates a DOWN FAN event for the failed instance and sends it to our UCP pool
  • Note that we see no errors when reusing the stale connections from the UCP pool (see the sketch below)
  • The RAC instance has notified the UCP pool via ONS to close and reconnect the stale connections pointing to hract21
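
The point of FCF is that the pool, not the application, cleans up the dead connections. If you want to double-check this from application code, a minimal sketch (checkConnection is a hypothetical helper reusing the pds field and getInstanceInfo() from the test class above) could borrow a connection and verify it with the standard JDBC isValid() call before use:

   // With FCF enabled the pool should never hand out a stale connection,
   // so this check is expected to always succeed after an instance shutdown.
   public void checkConnection() throws SQLException
     {
     try ( Connection c = pds.getConnection() )
         {
         if ( c.isValid(5) )          // standard JDBC validity check, 5 second timeout
             System.out.println("Connection OK: " + getInstanceInfo(c));
         else
             System.out.println("Stale connection returned - FCF should have cleaned it up !");
         }
     }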

Minimize Memory Footprint of running RAC instances to reduce OS Paging/Swapping

  • Tested versions : RAC 12.1.0.2 / RAC 12.2.0.1
  • In test environments you may very often have memory resource problems and need to reduce paging
  • Paging/Swapping is the worst-case scenario – so we need to reduce the PGA/SGA size first

Using ASMM ( SGA_TARGET / PGA_AGGREGATE_TARGET )

Reduce memory parameters on system 1
SQL> ALTER SYSTEM SET SGA_TARGET = 600m scope=spfile;
SQL> ALTER SYSTEM SET  pga_aggregate_target=350m scope=spfile;
SQL> startup force

After db reboot verify memory consumption  with top  
[root@ractw21 ~]# top
top - 12:06:43 up  1:50,  5 users,  load average: 2.19, 4.18, 3.50
Tasks: 471 total,   2 running, 469 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.5 us,  1.5 sy,  0.0 ni, 93.8 id,  1.0 wa,  0.1 hi,  0.3 si,  0.0 st
KiB Mem :  3781616 total,    82784 free,  2190716 used,  1508116 buff/cache
KiB Swap:  8257532 total,  6943376 free,  1314156 used.   677496 avail Mem
--> After reboot, system 1 uses about 2.2 GByte of memory
    paging/swapping is low :  1.0 wa           

Now change these parameters on the complete cluster
SQL> ALTER SYSTEM SET SGA_TARGET = 600m scope=spfile sid='*';
SQL> ALTER SYSTEM SET  pga_aggregate_target=350m scope=spfile  sid='*';

Restart database
[oracle@ractw21 ~]$ srvctl stop database -db ractw
[oracle@ractw21 ~]$ srvctl start database -db ractw

[root@ractw22 ~]# top
top - 12:17:17 up  2:01,  2 users,  load average: 17.49, 14.26, 9.98
Tasks: 499 total,   1 running, 497 sleeping,   0 stopped,   1 zombie
%Cpu(s):  2.8 us,  2.1 sy,  0.0 ni, 94.5 id,  0.3 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  3781616 total,    83492 free,  1877612 used,  1820512 buff/cache
KiB Swap:  8257532 total,  5851312 free,  2406220 used.   902384 avail Mem
--> After reboot, system 2 uses about 1.87 GByte of memory
    paging/swapping is low :  0.3 wa

Using memory_target

  •  Note: avoid using memory_target on production systems
Monitor OS resources :
top - 10:06:57 up  1:54, 10 users,  load average: 2.89, 5.43, 5.05
Tasks: 490 total,   3 running, 487 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.5%us,  7.9%sy,  0.0%ni, 17.7%id,  63.1%wa,  0.0%hi,  0.9%si,  0.0%st
Mem:   3785860k total,  2866768k used,   919092k free,    30704k buffers
Swap:  4063228k total,   965472k used,  3097756k free,  1315228k cached

--> If the wa values are very high over many top samples, you may reduce your SGA/PGA footprint 
    to reduce OS paging/swapping.  

Verify that at least 2 instances are up and running 
Note: changing the values on a single instance first allows us to easily change them back in 
case the instance doesn't start up after the reboot !

[oracle@hract21 ~]$ srvctl  status database -d bankA
Instance bankA_1 is running on node hract21
Instance bankA_3 is running on node hract22

Test your new memory settings on your local instance  
SQL> ALTER SYSTEM SET MEMORY_TARGET = 400M  scope=spfile ;
SQL> ALTER SYSTEM SET SGA_TARGET = 0  scope=spfile;
SQL> ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 0  scope=spfile;
SQL> startup force
ORACLE instance started.
Total System Global Area  419430400 bytes
Fixed Size            2925120 bytes
Variable Size          335547840 bytes
Database Buffers       75497472 bytes

After the single-instance reboot works, change the parameters globally and restart the database 
SQL> ALTER SYSTEM SET MEMORY_TARGET = 400M  scope=spfile sid='*' ;
SQL> ALTER SYSTEM SET SGA_TARGET = 0  scope=spfile sid='*';
SQL> ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 0  scope=spfile sid='*';

Restart database :
$ srvctl stop database -d bankA
$ srvctl start database -d bankA
$ srvctl  status database -d bankA
Instance bankA_1 is running on node hract22
Instance bankA_3 is running on node hract21


JAVA: Automatic Resource Management (ARM)

Overview

  • The try-with-resources statement is a try statement that declares one or more resources.
  • A resource is an object that must be closed after the program is finished with it.
  • The try-with-resources statement ensures that each resource is closed at the end of the statement.
  • Any object that implements java.lang.AutoCloseable, which includes all objects that implement java.io.Closeable, can be used as a resource.
  • A try-with-resources statement can still have catch and finally blocks; they work as usual, with no change.
  • ARM reports the suppressed exception thrown by the close() call without any additional coding

Sample Code

 
DirtyResource.java :
public class DirtyResource implements AutoCloseable
{
 /*
 *    Need to call this method if you want to access this resource
 *    @throws RuntimeException no matter how you call this method
 */
    public void accessResource()
    {
        throw new RuntimeException("I wanted to access this resource. Bad luck. Its dirty resource !!!");
    }
 
 /*
 *    The overridden closure method from AutoCloseable interface
 *    @throws Exception which is thrown during closure of this dirty resource
 */
    @Override
    public void close() throws Exception
    {
        throw new NullPointerException("Remember me. I am your worst nightmare !! I am Null pointer exception !!");
    }
}

SuppressedExceptionDemoWithTryWithResource.java
public class SuppressedExceptionDemoWithTryWithResource
{
 /*
 *     Demonstrating suppressed exceptions using try-with-resources
 */
   public static void main(String[] arguments) throws Exception
   {
      try (DirtyResource resource= new DirtyResource())
      {
          resource.accessResource();
      } catch ( Exception e1 )
      {
          throw e1;
      }
   }
}

Exception printout

[oracle@wls1 ARM]$ java SuppressedExceptionDemoWithTryWithResource
Exception in thread "main" java.lang.RuntimeException: I wanted to access this resource. Bad luck. Its dirty resource !!!
    at DirtyResource.accessResource(DirtyResource.java:9)
    at SuppressedExceptionDemoWithTryWithResource.main(SuppressedExceptionDemoWithTryWithResource.java:10)
    Suppressed: java.lang.NullPointerException: Remember me. I am your worst nightmare !! I am Null pointer exception !!
        at DirtyResource.close(DirtyResource.java:19)
        at SuppressedExceptionDemoWithTryWithResource.main(SuppressedExceptionDemoWithTryWithResource.java:11)

  • Here we can easily identify that we are failing in the ARM close() call in DirtyResource.close; a programmatic way to read the suppressed exception is sketched below
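
If you want to inspect the suppressed close() exception programmatically instead of relying on the default stack trace, Throwable.getSuppressed() (available since Java 7) returns it. A minimal sketch reusing the DirtyResource class above (the class name SuppressedExceptionInspector is a hypothetical example):

public class SuppressedExceptionInspector
{
   public static void main(String[] args)
   {
      try (DirtyResource resource = new DirtyResource())
      {
          resource.accessResource();
      }
      catch (Exception e)
      {
          // Primary exception thrown inside the try block
          System.out.println("Primary   : " + e);
          // Exceptions thrown by close() are attached to it as suppressed exceptions
          for (Throwable t : e.getSuppressed())
              System.out.println("Suppressed: " + t);
      }
   }
}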


Setup DNS, NTP and DHCP for a mixed RAC/Internet usage

Note : 

You need to install your RAC Nameserver on a separate Virtualbox image/system as a NON-functional Nameserver can lead to a RAC hang scenario !!

Install BIND / DHCP RPMs and learn the needed configuration commands

Install and verify the BIND installation [ bind-libs and bind-utils should already be installed ] 
[root@hract21 Desktop]#  yum install bind bind-utils bind-libs
[root@hract21 Desktop]# rpm -qa |grep '^bind'
bind-utils-9.8.2-0.30.rc1.el6_6.1.x86_64
bind-libs-9.8.2-0.30.rc1.el6_6.1.x86_64
bind-9.8.2-0.30.rc1.el6_6.1.x86_64

Install and verify DHCP setup 
Download and install the dhcping utility.
Download location:  http://pkgs.repoforge.org/dhcping - install the following package :
    dhcping-1.2-2.2.el6.rf.x86_64.rpm  11-Nov-2010 07:31   16K  RHEL6 and CentOS-6 x86 64bit
[root@ns1 ~]# rpm -i Downloads/dhcping-1.2-2.2.el6.rf.x86_64.rpm
 
[root@hract21 Desktop]# yum install dhcp.x86_64 
Total download size: 1.2 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): dhclient-4.1.1-43.P1.0.1.el6_6.1.x86_64.rpm                                                           | 318 kB     00:00     
(2/3): dhcp-4.1.1-43.P1.0.1.el6_6.1.x86_64.rpm                                                               | 819 kB     00:00     
(3/3): dhcp-common-4.1.1-43.P1.0.1.el6_6.1.x86_64.rpm                                                        | 142 kB     00:00  

[root@hract21 Desktop]#  rpm -qa | grep -i DHCP
dhcp-common-4.1.1-43.P1.0.1.el6_6.1.x86_64
dhcp-4.1.1-43.P1.0.1.el6_6.1.x86_64

Setup Files needed: 
: /etc/named.conf
: /var/named/example.com.db
: /var/named/192.168.2.db
: /var/named/192.168.5.db
: /etc/dhcp/dhcpd.conf
: /etc/sysconfig/dhcpd  
--> For details on how to configure DNS/DHCP please read the chapters below. 

Setup, test and configure the BIND service 
# service named restart 
# nslookup google.de
Server:        192.168.5.50
Address:    192.168.5.50#53

Non-authoritative answer:
Name:    google.de
Address: 173.194.112.152
Name:    google.de
Address: 173.194.112.159
Name:    google.de
Address: 173.194.112.143
Name:    google.de
Address: 173.194.112.151
#  chkconfig named on
#  chkconfig named --list
named              0:off    1:off    2:on    3:on    4:on    5:on    6:off

Setup, test and configure the DHCP service 
# service dhcpd start
Starting dhcpd:                                            [  OK  ]
# chkconfig  dhcpd on
# chkconfig  dhcpd --list
dhcpd              0:off    1:off    2:on    3:on    4:on    5:on    6:off
Verify DHCP setup with  dhcping
[root@hract21 Desktop]#  dhcping -s 192.168.5.50 -c 192.168.5.197 
Got answer from: 192.168.5.50

DNS Server Setup

Our DNS server should have the following VirtualBox network devices configured 
eth0  -> Bridged Network  : inet addr:192.168.1.XXX  Bcast:192.168.1.255  [ Internet Access ]
eth1  -> Internal Network : inet addr:192.168.5.50   Bcast:192.168.5.255  [ Public RAC Interface ]

eth0      Link encap:Ethernet  HWaddr 08:00:27:E6:71:54  
          inet addr:192.168.1.X  Bcast:192.168.1.255  Mask:255.255.255.0

eth1      Link encap:Ethernet  HWaddr 08:00:27:8D:8A:93  
          inet addr:192.168.5.50  Bcast:192.168.5.255  Mask:255.255.255.0   

Setup files used by  DNS : 
  /etc/named.conf  
  /var/named/example.com.db 
  /var/named/192.168.2.db
  /var/named/192.168.5.db


/etc/named.conf :
options {
    listen-on port 53 {  192.168.5.50; 127.0.0.1; };
    directory     "/var/named";
    dump-file     "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
    allow-query     {  any; };
    allow-recursion     {  any; };
    recursion yes;
    dnssec-enable no;
    dnssec-validation no;
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "." IN {
    type hint;
    file "named.ca";
};
zone    "5.168.192.in-addr.arpa" IN { // Reverse zone
    type master;
    file "192.168.5.db";
        allow-transfer { any; };
    allow-update { none; };
};
zone    "2.168.192.in-addr.arpa" IN { // Reverse zone
    type master;
    file "192.168.2.db";
        allow-transfer { any; };
    allow-update { none; };
};
zone  "example.com" IN {
      type master;
       notify no;
       file "example.com.db";
};

/var/named/example.com.db: 
$TTL 1H         ; Time to live
$ORIGIN example.com.
@       IN      SOA     ns1.example.com.  hostmaster.example.com.  (
                        2009011202      ; serial (todays date + todays serial #)
                        3H              ; refresh 3 hours
                        1H              ; retry 1 hour
                        1W              ; expire 1 week
                        1D )            ; minimum 24 hour
;
             IN     NS        ns1  ; name server for example.com
ns1          IN     A        192.168.5.50
grac41       IN     A        192.168.5.101  
grac42       IN     A        192.168.5.102  
grac43       IN     A        192.168.5.103  
grac41int    IN     A        192.168.2.101  
grac42int    IN     A        192.168.2.102  
grac43int    IN     A        192.168.2.103 
;
$ORIGIN grid4.example.com.
@       IN          NS        gns4.grid4.example.com. ; NS  grid4.example.com
        IN          NS        ns1.example.com.      ; NS example.com
gns4    IN          A         192.168.5.54 ; glue record



/var/named/192.168.5.db :
$TTL 1H
@       IN      SOA     ns1.example.com.  root.domin.com.  (
                        2009011201      ; serial (todays date + todays serial #)
                        3H              ; refresh 3 hours
                        1H              ; retry 1 hour
                        1W              ; expire 1 week
                        1D )            ; minimum 24 hour
      IN    NS    ns1
ns1     IN       A      192.168.5.50
;
50            PTR       ns1.example.com.
54            PTR       gns4.grid4.example.com. ; reverse mapping for GNS
101           PTR       grac41.example.com. 
102           PTR       grac42.example.com. 
103           PTR       grac43.example.com. 
201           PTR       wls1.example.com. 

/var/named/192.168.2.db :
$TTL 1H
@       IN      SOA     ns1.example.com. hostmaster.example.com.  (
                        2009011201      ; serial (todays date + todays serial #)
                        3H              ; refresh 3 hours
                        1H              ; retry 1 hour
                        1W              ; expire 1 week
                        1D )            ; minimum 24 hour
        IN      NS      ns1
ns1     IN       A         192.168.5.50
; 
101          PTR       grac41int.example.com. 
102          PTR       grac42int.example.com. 
103          PTR       grac43int.example.com.


Verify zone files and restart the named daemon
[root@ns1 named]#  named-checkconf /etc/named.conf
[root@ns1 named]#  named-checkzone example.com example.com.db
zone example.com/IN: grid.example.com/NS 'gns.grid.example.com' (out of zone) has no addresses records (A or AAAA)
zone example.com/IN: grid12c.example.com/NS 'gns12c.grid12c.example.com' (out of zone) has no addresses records (A or AAAA)
zone example.com/IN: grid2.example.com/NS 'gns2.grid2.example.com' (out of zone) has no addresses records (A or AAAA)
zone example.com/IN: grid3.example.com/NS 'gns3.grid3.example.com' (out of zone) has no addresses records (A or AAAA)
zone example.com/IN: grid4.example.com/NS 'gns4.grid4.example.com' (out of zone) has no addresses records (A or AAAA)
zone example.com/IN: loaded serial 2009011202
OK
[root@ns1 named]# named-checkzone example.com  192.168.5.db
zone example.com/IN: loaded serial 2009011201
OK
[root@ns1 named]# named-checkzone example.com  192.168.2.db
zone example.com/IN: loaded serial 2009011201
OK

Verify DNS Setup

[root@ns1 ~]# nslookup google.de
Server:        192.168.5.50
Address:    192.168.5.50#53

Non-authoritative answer:
Name:    google.de
Address: 173.194.67.94

[root@ns1 ~]# nslookup grac41 
Server:        192.168.5.50
Address:    192.168.5.50#53

Name:    grac41.example.com
Address: 192.168.5.101

[root@ns1 ~]# ping -c 2  google.de
PING google.de (173.194.67.94) 56(84) bytes of data.
64 bytes from wi-in-f94.1e100.net (173.194.67.94): icmp_seq=1 ttl=38 time=66.3 ms
64 bytes from wi-in-f94.1e100.net (173.194.67.94): icmp_seq=2 ttl=38 time=62.3 ms
--- google.de ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1064ms
rtt min/avg/max/mdev = 62.373/64.344/66.316/1.987 ms

[root@ns1 ~]# ping -c 2  grac41 
PING grac41.example.com (192.168.5.101) 56(84) bytes of data.
64 bytes from grac41.example.com (192.168.5.101): icmp_seq=1 ttl=64 time=0.200 ms
 64 bytes from grac41.example.com (192.168.5.101): icmp_seq=2 ttl=64 time=0.293 ms
--- grac41.example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.200/0.246/0.293/0.049 ms

[root@ns1 ~]#  cat /etc/resolv.conf
# Generated by NetworkManager
search example.com grid4.example.com
nameserver 192.168.5.50
[root@ns1 ~]#  netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
192.168.5.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1

If the GNS server is running the following commands should work too !
[root@ns1 ~]# nslookup grac4-scan
Server:        192.168.5.50
Address:    192.168.5.50#53

Non-authoritative answer:
Name:    grac4-scan.grid4.example.com
Address: 192.168.5.167
Name:    grac4-scan.grid4.example.com
Address: 192.168.5.156
Name:    grac4-scan.grid4.example.com
Address: 192.168.5.153

[root@ns1 ~]# ping -c 2  grac4-scan
PING grac4-scan.grid4.example.com (192.168.5.167) 56(84) bytes of data.
64 bytes from 192.168.5.167: icmp_seq=1 ttl=64 time=0.176 ms
64 bytes from 192.168.5.167: icmp_seq=2 ttl=64 time=0.203 ms
--- grac4-scan.grid4.example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.176/0.189/0.203/0.019 ms
[root@ns1 ~]# dig @192.168.5.50 grac4-scan.grid4.example.com
; <<>> DiG 9.9.3-RedHat-9.9.3-P1.el6 <<>> @192.168.5.50 grac4-scan.grid4.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18529
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;grac4-scan.grid4.example.com.    IN    A

;; ANSWER SECTION:
grac4-scan.grid4.example.com. 94 IN    A    192.168.5.167
grac4-scan.grid4.example.com. 94 IN    A    192.168.5.156
grac4-scan.grid4.example.com. 94 IN    A    192.168.5.153

;; AUTHORITY SECTION:
grid4.example.com.    3600    IN    NS    gns4.grid4.example.com.
grid4.example.com.    3600    IN    NS    ns1.example.com.

;; ADDITIONAL SECTION:
ns1.example.com.    3600    IN    A    192.168.5.50

;; Query time: 1 msec
;; SERVER: 192.168.5.50#53(192.168.5.50)
;; WHEN: Sun Jan 11 17:17:51 CET 2015
;; MSG SIZE  rcvd: 158

[root@ns1 ~]#  dig @192.168.5.54 grac4-scan.grid4.example.com
; <<>> DiG 9.9.3-RedHat-9.9.3-P1.el6 <<>> @192.168.5.54 grac4-scan.grid4.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5071
;; flags: qr aa; QUERY: 1, ANSWER: 3, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;grac4-scan.grid4.example.com.    IN    A

;; ANSWER SECTION:
grac4-scan.grid4.example.com. 120 IN    A    192.168.5.153
grac4-scan.grid4.example.com. 120 IN    A    192.168.5.156
grac4-scan.grid4.example.com. 120 IN    A    192.168.5.167

;; AUTHORITY SECTION:
grid4.example.com.    10800    IN    SOA    grac4-gns-vip.grid4.example.com. grac4-gns-vip.grid4.example.com. 264601876 10800 10800 30 120

;; ADDITIONAL SECTION:
grac4-gns-vip.grid4.example.com. 10800 IN A    192.168.5.54

;; Query time: 2 msec
;; SERVER: 192.168.5.54#53(192.168.5.54)
;; WHEN: Sun Jan 11 17:17:59 CET 2015
;; MSG SIZE  rcvd: 160

If GNS is not configured or not running you will get the error:  can't find grac4-scan: NXDOMAIN
[grid@grac41 ~]$  srvctl stop gns
[root@ns1 ~]# ping 192.168.5.54
PING 192.168.5.54 (192.168.5.54) 56(84) bytes of data.
From 192.168.5.50 icmp_seq=2 Destination Host Unreachable
From 192.168.5.50 icmp_seq=3 Destination Host Unreachable
From 192.168.5.50 icmp_seq=4 Destination Host Unreachable
^C
--- 192.168.5.54 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3944ms
pipe 3
[root@ns1 ~]#  nslookup grac4-scan
Server:        192.168.5.50
Address:    192.168.5.50#53

** server can't find grac4-scan: NXDOMAIN

Verify subdomain delegation with cluvfy

Starting with Oracle Database 11g release 2 (11.2.0.2), use the cluvfy comp dns component verification 
command to verify that the Grid Naming Service (GNS) subdomain delegation has been properly set up in 
the Domain Name Service (DNS) server.

Run cluvfy comp dns -server on one node of the cluster. On each node of the cluster run 
cluvfy comp dns -client to verify the DNS server setup for the cluster.

On grac41: 
[root@grac41 ~]# cluvfy comp dns -server -domain  grid4.example.com -vipaddress 192.168.5.54/255.255.255.0/eth1 -verbose
Verifying DNS Check 
Starting the test DNS server on IP "192.168.5.54/255.255.255.0/eth1" listening on port 53
Started the IP address "192.168.5.54/255.255.255.0/eth1" on node "grac41"

On grac42: 
[root@grac42 ~]#  cluvfy comp dns -client -domain  grid4.example.com -vip 192.168.5.54
Verifying DNS Check 
Checking if the IP address "192.168.5.54" is reachable
The IP address "192.168.5.54" is reachable from local node
Successfully connected to test DNS server
Checking if the test DNS server started on address "192.168.5.54", listening on port 53 can be queried
Check output of command "cluvfy comp dns -server" to see if it received IP address lookup for name "grac42.grid4.example.com"
Successfully connected to the test DNS server started on address "192.168.5.54", listening on port 53
Checking DNS delegation for the GNS subdomain "grid4.example.com"...
Check output of command "cluvfy comp dns -server" to see if it received IP address lookup for name "grac42.grid4.example.com"
Successfully verified DNS delegation of the GNS subdomain "grid4.example.com"

Verification of DNS Check was successful. 

--> Server should report 
Received IP address lookup query for name "grac42.grid4.example.com"
Received IP address lookup query for name "grac42.grid4.example.com"

On grac43:
[root@grac43 ~]# cluvfy comp dns -client -domain  grid4.example.com -vip 192.168.5.54
..
Verification of DNS Check was successful. 
--> Server should report 
Received IP address lookup query for name "grac43.grid4.example.com"
Received IP address lookup query for name "grac43.grid4.example.com"

On grac41 
[root@grac41 Desktop]#  cluvfy comp dns -client -domain  grid4.example.com -vip 192.168.5.54 
..
Verification of DNS Check was successful. 
--> Server should report 
Received IP address lookup query for name "grac41.grid4.example.com"
Received IP address lookup query for name "grac41.grid4.example.com"

 

Setup DHCP server

DHCP configuration file 
/etc/dhcp/dhcpd.conf :
ddns-update-style interim;
ignore client-updates;

subnet 192.168.5.0 netmask 255.255.255.0 {
        option routers                  192.168.5.1;                    # Default gateway to be used by DHCP clients
        option subnet-mask              255.255.255.0;                  # Default subnet mask to be used by DHCP clients.
        option ip-forwarding            off;                            # Do not forward DHCP requests.
        option broadcast-address        192.168.5.255;                  # Default broadcast address to be used by DHCP client.
        option domain-name-servers      192.168.5.50;                   # IP address of the DNS server. 
        option time-offset              -19000;                           # Central Standard Time
        option ntp-servers              192.168.5.50;                   # Default NTP server to be used by DHCP clients
        range                           192.168.5.150 192.168.5.254;    # Range of IP addresses that can be issued to DHCP client
        default-lease-time              21600;                            # Amount of time in seconds that a client may keep the IP address
        max-lease-time                  43200;
} 

/etc/sysconfig/dhcpd
# Command line options here
DHCPDARGS="eth1"

Restart the DHCP server :
[root@ns1 network-scripts]# service dhcpd restart

 

Verify  DHCP setup with cluvfy

[root@grac41 ~]#  $GRID_HOME/bin/cluvfy comp dhcp -clustername grac4 
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
PRVG-5723 : Network CRS resource is configured to use DHCP provided IP addresses

Verification of DHCP Check was unsuccessful on all the specified nodes. 

From Oracle docu :
- You must run this command as root.
- Do not run this check while the default network Oracle Clusterware resource, configured to use a 
   DHCP-provided IP address, is online (because the VIPs get released and, since the cluster is online, 
   DHCP has provided IP, so there is no need to double the load on the DHCP server).
- Before running this command, ensure that the network resource is offline. Use the srvctl stop nodeapps command 
   to bring the network resource offline, if necessary.

As we are on a test cluster go ahead and stop the Nodeapps 
[root@grac41 Desktop]#  srvctl stop nodeapps -f

[root@grac41 ~]# $GRID_HOME/bin/cluvfy comp dhcp -clustername grac4 -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
CRS-10009: DHCP server returned server: 192.168.5.50, loan address : 192.168.5.165/255.255.255.0, lease time: 21600

At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "grac4-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.5.50, loan address : 192.168.5.165/255.255.255.0, lease time: 21600
...
CRS-10012: released DHCP server lease for client ID grac4-scan3-vip on port 67
CRS-10012: released DHCP server lease for client ID grac4-grac41-vip on port 67

DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was successful. 

Note: you can track the lease operation with the following OS command 
[root@ns1 ~]# tail -f  /var/lib/dhcpd/dhcpd.leases
}
lease 192.168.5.164 {
  starts 0 2015/01/11 17:29:10;
  ends 0 2015/01/11 17:29:10;
  tstp 0 2015/01/11 17:29:10;
  cltt 0 2015/01/11 17:29:10;
  binding state free;
  hardware ethernet 00:00:00:00:00:00;
  uid "\000grac4-grac41-vip";
}

 

Configure NTP


Configuration file :
/etc/ntp.conf
restrict default nomodify notrap noquery
restrict 127.0.0.1 
# -- CLIENT NETWORK -------
restrict 192.168.5.0 mask 255.255.255.0 nomodify notrap
# --- OUR TIMESERVERS -----  can't reach NTP servers - build my own server 
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 127.127.1.0
# --- NTP MULTICASTCLIENT ---
# --- GENERAL CONFIGURATION ---
# Undisciplined Local Clock.
fudge   127.127.1.0 stratum 9
# Drift file.
driftfile /var/lib/ntp/drift
broadcastdelay  0.008
# Keys file.
keys /etc/ntp/keys

Restart NTP daemon
[root@ns1 network-scripts]# service ntpd restart
Shutting down ntpd:                                        [  OK  ]
Starting ntpd:                                             [  OK  ]

Verify setup
[root@ns1 network-scripts]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 foxtrot.zq1.de  122.227.206.195  3 u    2   64    1   68.504  4608.38   1.115
 der.beste.tiger 159.173.11.127   3 u    1   64    1   38.195  4603.43  11.063
 LOCAL(0)        .LOCL.           9 l    2   64    1    0.000    0.000   0.000

 

Verify NTP setup with cluvfy

[grid@grac41 ~]$   cluvfy comp clocksync
Verifying Clock Synchronization across the cluster nodes 
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
Checking daemon liveness...
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
NTP daemon slewing option check passed
NTP daemon's boot time configuration check for slewing option passed
NTP common Time Server Check started...
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Clock time offset check passed
Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.


Pitfalls changing Public IP address in a RAC cluster env with detailed debugging steps

Overview

  • Changing the PUBLIC interface in a RAC env is not that simple, and you need to take into account:
    • Nameserver changes
    • DHCP server changes including VIPs
    • /etc/hosts changes
    • GNS VIP changes
    • PUBLIC interface changes
      #  oifcfg getif  ->  eth1  192.168.5.0  global  public
  • In any case you should read : How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)

If you still get problems, here are some debugging details:

  • Note this tutorial uses the 12.1.0.2 CW logfile structure, which simplifies using the grep command
    a lot as all traces can be found at:  $GRID_HOME/diag/crs/hract21/crs/trace
  • Download the script crsi and run it with the watch utility while booting your CRS stack.
    This gives you a good idea which component is failing or gets restarted and finally switches
    to status OFFLINE
  • As said again and again, cluvfy is your friend to quickly identify the root problem
  • If the network adapter info in profile.xml doesn't match the ifconfig data, GIPCD will not start ( this is true for the PUBLIC and CLUSTERINTERCONNECT info )

In this tutorial we will debug the following scenarios by reading logfiles, running OS commands and running cluvfy:

  • Case I   : Nameserver not responding –  GIPCD not starting
  • Case II  : Different  IP address in /etc/hosts and NameServer Lookup  – GIPCD not starting
  • Case III : Wrong Cluster Interconnect Address – GIPCD not starting
  • Case IV  : DHCP server sends wrong IP address – VIPs not starting
  • Case V   : Wrong GNS VIP address – GNS not starting

Potential Errors and Error types

In general we have 2 types of network-related errors:

  • OS related errors ( either the bind() or the getaddrinfo() system call is failing )

    • If you want to find GIPCD related errors between 2015-02-03 12:00:00 and 2015-02-03 12:09:59 you may run :     $ grep "2015-02-03 12:0" *  | grep " slos "
    • In this tutorial we handle bind() OS system call failures, but you may check your traces for
      send(), recv(), listen() and connect() system call failures too !
    • Note – only GIPCD prints OS errors with an slos printout like :  slos loc :  getaddrinfo
    • For other components like the MDNSD daemon you may grep your CW traces
      for the error strings: "Address already in use", "Error Connection timed out", "Cannot assign requested address"
  • Logical Errors
    • These are not easy to debug, as we need to read and understand the CW logs in more detail.

Error Details

Error I :  Name Server related Errors – getaddrinfo () was failing

 OS system call:  getaddrinfo() is failing with errno 110:   Error Connection timed out (110)
 --> see Case I
 Search all CW traces with TS 2015-02-03 09:20:00 --> 2015-02-03 09:29:59" for failed OS Call: getaddrinfo
 [grid@hract21 trace]$  grep "2015-02-03 09:2" *  | grep " getaddrinfo"
 gipcd_2.trc:2015-02-03 09:20:09.946273 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo(
 gipcd_2.trc:2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo

Error II : bind() fails as the local IP address is not available on your system  ( verify with ifconfig )

OS system call:  bind () is failing with errno 99 : Error: Cannot assign requested address (99)
 --> see Case II,III
 Search all CW traces with TS 2015-02-03 15:30:00 --> 2015-02-03 15:39:59" for failed OS Call: bind
 [grid@hract21 trace]$  grep "2015-02-03 15:3" *  | grep " bind"
 gipcd_2.trc:2015-02-03 15:34:47.898380 :GIPCXCPT:2106038016:  gipcmodNetworkProcessBind: slos loc :  bind
 gipcd_2.trc:2015-02-03 16:39:43.587972 :GIPCXCPT:1288218368:  gipcmodNetworkProcessBind: slos loc :  bind

--> If OS system call:  bind () is failing with errno 98 Error : Address already in use (98)
please read :  
Troubleshooting Clusterware and Clusterware component error : Address already in use

Error III: Logical Errors ( not related to OS errors )

  • Wrong DHCP Server response : see Case IV
  • Wrong GNS Server address     : see Case V

Case I:  Nameserver not responding –  GIPCD not starting

[root@hract21 Desktop]#  watch crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> ora.gipcd is OFFLINE, ora.evmd is in state INTERMEDIATE

As GIPCD doesn't come up, review the tracefile :  gipcd.trc
2015-02-03 09:20:14.952363 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos op  :  sgipcnPopulateAddrInfo
2015-02-03 09:20:14.952373 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos dep :  Connection timed out (110)
2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos loc :  getaddrinfo(
2015-02-03 09:20:14.952391 :GIPCXCPT:2157598464:  gipcmodNetworkResolve: slos info:  server not available,try again
2015-02-03 09:20:14.952455 :GIPCXCPT:2157598464:  gipcResolveF [gipcInternalBind : gipcInternal.c : 537]: EXCEPTION[ ret gipcretFail (1) ]  failed to resolve address 0x7f035c033c10 [0000000000000311] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x4000
2015-02-03 09:20:14.952486 :GIPCXCPT:2157598464:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretFail (1) ]  failed to bind endp 0x7f035c033070 [000000000000030f] { gipcEndpoint : localAddr 'tcp://hract21.example.com', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x40008000, flags-2 0x0, usrFlags 0x240a0 }, addr 0x7f035c034890 [0000000000000316] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x200a0
2015-02-03 09:20:14.952552 :GIPCXCPT:2157598464:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretFail (1)
--> The getaddrinfo() system call is failing -> Nameserver lookup issue

Verify Error with OS commands
[grid@hract21 trace]$  nslookup hract21
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

Verify Error with cluvfy 
[grid@hract21 CLUVFY]$  cluvfy comp nodeapp -n hract21
PRVF-0002 : could not retrieve local node name

Fix -> Verify the Nameserver is up and running 
1) Is your nameserver running ?
[root@ns1 ~]# service named status
version: 9.9.3-RedHat-9.9.3-P1.el6
CPUs found: 4
worker threads: 4
UDP listeners per interface: 4
number of zones: 101
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid  9193) is running...

2) Can you ping your nameserver ?
[oracle@hract21 JAVA]$ ping ns1.example.com
PING ns1.example.com (192.168.5.50) 56(84) bytes of data.
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=2 ttl=64 time=0.293 ms

3) Verify that the nameserver is listening on the required IP address and port 
[root@ns1 ~]# netstat -auen  | grep ":53 "
udp        0      0 192.168.5.50:53             0.0.0.0:*                               25         56734      
udp        0      0 127.0.0.1:53                0.0.0.0:*                               25         56732  

Case II  : Different  IP address in /etc/hosts and NameServer Lookup – GIPCD not starting

****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssdmonitor                1   ONLINE     ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE     OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> CSSD and GIPCD remain OFFLINE - STATE_DETAILS switches from STABLE to STARTING but they don't come up

gipcd.trc:
2015-02-03 15:35:02.928327 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-02-03 15:35:02.928333 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos dep :  Cannot assign requested address (99)
2015-02-03 15:35:02.928337 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos loc :  bind
2015-02-03 15:35:02.928342 :GIPCXCPT:937420544:  gipcmodNetworkProcessBind: slos info:  addr '192.168.6.121:0'
2015-02-03 15:35:02.928391 :GIPCXCPT:937420544:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  failed to bind endp 0x7f4624027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.6.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7f4624033be0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7f4624033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 15:35:02.928405 :GIPCXCPT:937420544:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928419 :GIPCXCPT:937420544:  gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928429 :GIPCHDEM:937420544:  gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  terminating daemon thread due to exception
2015-02-03 15:35:02.928455 :GIPCXCPT:1281627904:  gipchaInternalRegister: daemon thread state invalid gipchaThreadStateFailed (5), ret gipcretFail (1)
2015-02-03 15:35:02.928477 :GIPCHGEN:1281627904:  gipchaRegisterF [gipchaInternalResolve : gipchaInternal.c : 1204]: EXCEPTION[ ret gipcretFail (1) ]  failed to register ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, name '(null)', flags 0x4000
2015-02-03 15:35:02.928544 :GIPCHGEN:1281627904:  gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 863]: EXCEPTION[ ret gipcretFail (1) ]  failed to resolve ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, host 'hract21', port 'gipcdha_hract21_', flags 0x0
2015-02-03 15:35:02.928569 :GIPCXCPT:1281627904:  gipcInternalResolve: failed to resolve addr 0x7f4638099680 [000000000000016a] { gipcAddress : name 'gipcha://hract21:gipcdha_hract21_', objFlags 0x0, addrFlags 0x4 }, ret gipcretFail (1)
 
Verify Error with OS commands
[grid@hract21 trace]$ nslookup hract21
Server:        192.168.5.50
Address:    192.168.5.50#53
Name:    hract21.example.com
Address: 192.168.5.121

[grid@hract21 trace]$ ping hract21
PING hract21 (192.168.6.121) 56(84) bytes of data.
--> Oops - why two different results for nslookup and ping ?
Verify IP address from  /etc/hosts
[grid@hract21 trace]$ grep hract21 /etc/hosts
192.168.6.121 hract21 hract21.example.com

Verify Error with cluvfy  
[grid@hract21 CLUVFY]$ cluvfy comp nodereach -n  hract21
Verifying node reachability 
Checking node reachability...
PRVF-6006 : unable to reach the IP addresses "hract21" from the local node
PRKC-1071 : Nodes "hract21" did not respond to ping in "3" seconds, 
PRKN-1035 : Host "hract21" is unreachable
Verification of node reachability was unsuccessful on all the specified nodes. 

-> Fix : Keep your /etc/hosts and your BIND server in sync 
         When changing the BIND server, always verify the change in /etc/hosts too

 

Case III : Wrong Cluster Interconnect Address – GIPCD not starting

[root@hract21 Desktop]#  watch crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      hract21         STARTING
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE    INTERMEDIATE hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE
--> GPNPD remains in status INTERMEDIATE GIPCD is in state OFFLINE

gipcd.trc:
2015-02-03 16:39:18.324221 :GIPCHDEM:20907776:  gipchaDaemonThread: starting daemon thread hctx 0x22d39b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'df31173e-00000000', name2 02ff-37da-c08f-50b4, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xcd60 }
2015-02-03 16:39:23.327691 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc032310 [0000000000000308] { gipcAddress : name 'tcp://192.168.5.121', objFlags 0x0, addrFlags 0x5 }
2015-02-03 16:39:23.327721 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-02-03 16:39:23.327727 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos dep :  Cannot assign requested address (99)
2015-02-03 16:39:23.327732 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos loc :  bind
2015-02-03 16:39:23.327736 :GIPCXCPT:20907776:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.121:0'
2015-02-03 16:39:23.327806 :GIPCXCPT:20907776:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 16:39:23.327823 :GIPCXCPT:20907776:  gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327838 :GIPCXCPT:20907776:  gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327851 :GIPCHDEM:20907776:  gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ]  terminating daemon thread due to exception
2015-02-03 16:39:23.327943 : GIPCNET:20907776:  gipcmodNetworkUnprepare: failed to unprepare waits for endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x8, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x26008000, flags-2 0x0, usrFlags 0x20020 }
--> Here the bind system call fails with errno 99, which means the IP address 192.168.5.121 is not available yet ! 
[root@hract21 Desktop]# cat /usr/include/asm-generic/errno.h | grep 99
#define    EADDRNOTAVAIL    99    /* Cannot assign requested address */

Verify Error with OS commands:
[root@hract21 Desktop]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.6.121  Bcast:192.168.6.255  Mask:255.255.255.0
[root@hract21 Desktop]#  ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 08:00:27:4E:C9:BF  
          inet addr:192.168.2.121  Bcast:192.168.2.255  Mask:255.255.255.0
[root@hract21 Desktop]#   $GRID_HOME/bin/gpnptool get 2>/dev/null  |  xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
--> GPnPD expects PUBLIC interface eth1 to be bound to IP address 192.168.5.121 and not 192.168.6.121

Verify Error with cluvfy:
[grid@hract21 CLUVFY]$  cluvfy comp gpnp -n hract21
Verifying GPNP integrity 
--> cluvfy comp gpnp hangs 

Fix: Change interface eth1 back to  192.168.5.121 and reboot cluster stack

 

Case IV   :  DHCP server returns wrong IP address – VIPs not starting

  • Multiple DHCP servers
  • DHCP server not available
Lower CRS stack starts 
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    ONLINE       hract21         STABLE
ora.cluster_interconnect.haip  1   ONLINE    ONLINE       hract21         STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssdmonitor                1   ONLINE    ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    ONLINE       hract21         OBSERVER,STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    ONLINE       hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    ONLINE       hract21         STABLE
--> Lower CRS stack is up and running 

VIPs are in state STARTING 
ora.hract21.vip                1   ONLINE       OFFLINE      hract21         STARTING  
ora.hract22.vip                1   ONLINE       ONLINE       hract22         STABLE  
ora.hract23.vip                1   ONLINE       ONLINE       hract23         STABLE  
ora.mgmtdb                     1   ONLINE       ONLINE       hract23         Open,STABLE  
ora.oc4j                       1   ONLINE       ONLINE       hract22         STABLE  
ora.scan1.vip                  1   ONLINE       OFFLINE      hract21         STARTING 

crsd_orarootagent_root.trc
2015-02-03 12:06:42.065910 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP client id = hract21-vip
2015-02-03 12:06:42.065929 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP Server Port = 67
2015-02-03 12:06:42.065940 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet from = 192.168.5.121
2015-02-03 12:06:42.065949 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet to = 255.255.255.255
2015-02-03 12:06:47.068966 :GIPCXCPT:2822174464:  gipcWaitF [clsdhcp_sendmessage : clsdhcp.c : 616]: 
       EXCEPTION[ ret (uknown) (910) ]  failed to wait on obj 0x7fcb8c04d770 [0000000000000ddf]
      { gipcEndpoint : localAddr 'udp://0.0.0.0:68', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, 
     objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fcb8c037e70, sendp 0x7fcb8c037cb0 status 13flags 0x20000002, flags-2 0x0, usrFlags 0x8000 }, reqList 0x7fcba8364658, nreq 1, creq 0x7fcba8364b20 timeout 5000 ms, flags 0x4000
--> After sending a DHCP request we fail in gipcWaitF, which means we have trouble contacting our DHCP server
    or getting the required DHCP address 

Verify Error with OS commands
Download and Install dhcping:
Download location:  http://pkgs.repoforge.org/dhcping  following package : dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# rpm -i  /media/sf_kits/Linux/dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# dhcping -i eth1
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0 
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
no answer
--> Here we see that the answer comes from a wrong DHCP server address
[root@ns1 dhcp]# dhcping -h   08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
no answer
--> This confirms that our DHCP server is running on the wrong IP address ( 192.168.3.50 ) and 
    cannot serve a DHCP request for a 192.168.5.xx address

Working dhcping output - just for reference :
[root@hract21 Desktop]#  dhcping -h   08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
Got answer from: 192.168.5.50

Verify Error with cluvfy  commands
[root@hract21 CLUVFY]#  cluvfy comp dhcp -clustername ract2 -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
PRVG-5726 : Failed to discover DHCP servers on public network listening on port "67" using command "/u01/app/121/grid/bin/crsctl discover dhcp -clientid ract2-scan1-vip "
CRS-10010: unable to discover DHCP server in the network listening on port 67 for client ID ract2-scan1-vip
CRS-4000: Command discover failed, or completed with errors.
PRVF-5704 : No DHCP server were discovered on the public network listening on port 67
Verification of DHCP Check was unsuccessful on all the specified nodes. 

Additional info about the DHCP setup  
- I always look at /etc/dhcpd.conf, which is wrong - use the /etc/dhcp/dhcpd.conf file instead !
- Note: if you change /etc/dhcp/dhcpd.conf you may also need to change /etc/sysconfig/dhcpd 
DHCP config files: 
/etc/dhcp/dhcpd.conf 
/etc/sysconfig/dhcpd

 

Case V   : Wrong GNS VIP address – GNS not starting

[root@hract21 network-scripts]#  watch 'crs | grep gns'
ora.gns                        1   ONLINE       OFFLINE      -               STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE
-> GNS VIP is ONLINE but GNS doesn't start 

gnsd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
    CLSB:489064000: Argument count (argc) for this daemon is 7
    CLSB:489064000: Argument 0 is: /u01/app/121/grid/bin/gnsd.bin
    CLSB:489064000: Argument 1 is: -trace-level
    CLSB:489064000: Argument 2 is: 1
    CLSB:489064000: Argument 3 is: -ip-address
    CLSB:489064000: Argument 4 is: 192.168.6.58
    CLSB:489064000: Argument 5 is: -startup-endpoint
    CLSB:489064000: Argument 6 is: ipc://GNS_hract21_4625_9fe54b1833d5fbd2
2015-02-03 17:29:15.339039 :   CLSNS:489064000: main::clsns_SetTraceLevel:trace level set to 1.
2015-02-03 17:29:16.226261 :     GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226283 :     GNS:489064000: main::clsgndmain: GNS starting on hract21. Process ID: 29196
2015-02-03 17:29:16.226299 :     GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226338 :     GNS:489064000: main::clsgnSetTraceLevel: trace level set to 1.
..
2015-02-03 17:29:17.490335 :     GNS:489064000: main::clsgndGetInstanceInfo: version: 12.1.0.2.0 (0xc100200) 
                                 endpoints: tcp://192.168.6.58:63806 process ID: "29196" state: "Initializing".
2015-02-03 17:29:17.491219 :     GNS:489064000: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
2015-02-03 17:29:17.496441 :     GNS:349841152: Resolve::clsgndnsCreateContainerCallback: listening on port 53 address "192.168.6.58"
2015-02-03 17:29:17.499552 :  CLSDMT:351942400: PID for the Process [29196], connkey 12
2015-02-03 17:29:17.505626 :     GNS:343537408: Command #0::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.512072 :     GNS:4160747264: Command #1::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.516675 :     GNS:4156544768: Command #2::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.518326 :     GNS:4154443520: Command #3::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.747693 :     GNS:4152342272: Self-check::clsgndscRun: Name: "GNSTESTHOST.grid12c.example.com" Address: 1.2.3.4.
2015-02-03 17:29:53.882538 :     GNS:351942400: main::clsgndCLSDMExit: CLSDM request to quit received - requester: agent.
2015-02-03 17:29:53.882610 :     GNS:351942400: main::clsgndCLSDMExit: terminating GNSD on behalf of CLSDM - requester: agent.
--> Here we have a problem: GNS was terminated

crsd_orarootagent_root.trc:
2015-02-03 17:29:24.470729 :   CLSNS:292816640: main::clsnsgFind:(:CLSNS00230:):query to find 
     GNS using service name "_Oracle-GNS._tcp" failed.: 1: clskec:has:CLSNS:5 3 args[has:CLSNS:5][mod=clsns_DNSSD_FindServers][loc=(:CLSNS00152:)]
2015-02-03 17:29:24.470771 :     
     GNS:292816640: main::clsgnctrGetGNSAddressUsingCLSNS: (:CLSGN01053:) GNS address retrieval failed with 
     error CLSNS-00025 (GNS_SERV_FIND_FAIL) - throwing CLSGN-00070. 1: clskec:has:CLSNS:25 3 args[has:CLSNS:25][mod=clsnsgFind][loc=(:CLSNS00216:)]

Verify Error with OS commands:
Check GNS and PUBLIC network interface 
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.6.58
Domain served by GNS: grid12c.example.com
Check the PUBLIC network interface 
[root@hract21 network-scripts]# ifconfig
eth1:1    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.156  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.157  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:3    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.153  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:4    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.151  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:5    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.152  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:6    Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.6.58  Bcast:192.168.6.255  Mask:255.255.255.0
-->  VIPs are using 192.168.5.X as base address whereas our GNS VIP is using 192.168.6.58.
     This is not correct - the VIPs and the GNS VIP should be on the same network !

Let's investigate whether somebody changed the GNS base address:
[grid@hract21 trace]$ grep clsgndadvAdvertise gnsd.trc
2015-02-02 12:32:09.447471 : GNS:3141969472: main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.5.58:46453.
2015-02-03 17:22:00.410829 : GNS:4114409024: main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.5.58:25702.
2015-02-03 17:24:51.165609 : GNS:2221307456: main::clsgndadvAdvertise: 
                              Listening for commands on endpoint(s):tcp://192.168.6.58:27105.
2015-02-03 17:29:17.491219 : GNS:489064000:  main::clsgndadvAdvertise: 
                             Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
--> GNS base address was changed from  192.168.5.58 to 192.168.6.58 ! 

Verify Error with cluvfy
[grid@hract21 CLUVFY]$  cluvfy comp gns -postcrsinst  -verbose
Verifying GNS integrity 
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid12c.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
PRVF-5213 : GNS resource configuration check failed
PRCI-1156 : The GNS VIP 192.168.6.58 does not match any of the available subnets 192.168.5.0, 192.168.2.0.
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.6.58" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid12c.example.com" are reachable
WARNING: 
PRVF-5218 : "hract21-vip.grid12c.example.com" did not resolve into any IP address
PRVF-5827 : The response time for name lookup for name "hract21-vip.grid12c.example.com" exceeded 15 seconds
Checking status of GNS resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  hract21       no                        yes                     
  hract22       no                        yes                     
  hract23       no                        yes                     
PRVF-5211 : GNS resource is not running on any node of the cluster
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  hract21       yes                       yes                     
  hract22       no                        yes                     
  hract23       no                        yes                     
GNS integrity check failed
Verification of GNS integrity was unsuccessful. 
Checks did not pass for the following node(s):
    hract21
--> Cluvfy is very helpful here, as cluvfy compares the network addresses with the GNS address.
    If the GNS and network addresses don't match, cluvfy throws the PRVF-5213 and PRCI-1156 errors.

Fix -> Change GNS VIP back to the original address  and restart GNS
[root@hract21 network-scripts]# srvctl modify gns -vip 192.168.5.58
[root@hract21 network-scripts]# srvctl config gns 
  GNS is enabled.
  GNS VIP addresses: 192.168.5.58
  Domain served by GNS: grid12c.example.com
[root@hract21 network-scripts]# srvctl start gns
[root@hract21 network-scripts]# srvctl config gns -a -l
  GNS is enabled.
  GNS is listening for DNS server requests on port 53
  GNS is using port 5353 to connect to mDNS
  GNS status: OK
  Domain served by GNS: grid12c.example.com
  GNS version: 12.1.0.2.0
  Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
  Name of the cluster where GNS is running: ract2
  Cluster type: server.
  GNS log level: 1.
  GNS listening addresses: tcp://192.168.5.58:30218.
  GNS is individually enabled on nodes: 
  GNS is individually disabled on nodes: 


Recreate GNS 12102

Backup profile.xml and OCR and gather data of current GNS setup

As of 12.1/11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the 
gpnp profile -  please take a backup of profile.xml on all cluster nodes before proceeding, as grid user:

[root@hract21 ~]# cd $GRID_HOME/gpnp/hract21/profiles/peer/
[root@hract21 peer]#  cp profile.xml profile.xml_bk-2-FEB-2015
[root@hract21 peer]#  ocrconfig -local -manualbackup
hract21     2015/02/02 09:04:23     /u01/app/121/grid/cdata/hract21/backup_20150202_090423.olr     0     
hract21     2015/01/30 12:40:51     /u01/app/121/grid/cdata/hract21/backup_20150130_124051.olr     0     
[root@hract21 peer]#  ocrconfig -local -showbackup
hract21     2015/02/02 09:04:23     /u01/app/121/grid/cdata/hract21/backup_20150202_090423.olr     0     
hract21     2015/01/30 12:40:51     /u01/app/121/grid/cdata/hract21/backup_20150130_124051.olr     0  

[root@hract21 peer]# oifcfg getif
eth1  192.168.5.0  global  public
eth2  192.168.2.0  global  cluster_interconnect,asm

[root@hract21 peer]# crsctl status resource ora.gns.vip -f | grep USR_ORA_VIP
USR_ORA_VIP=192.168.5.58

[root@hract21 peer]#  ifconfig eth1 | egrep 'eth|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49  
          inet addr:192.168.5.121  Bcast:192.168.5.255  Mask:255.255.255.0
[root@hract21 peer]# ifconfig eth2  | egrep 'eth|inet addr'
eth2      Link encap:Ethernet  HWaddr 08:00:27:4E:C9:BF  
          inet addr:192.168.2.121  Bcast:192.168.2.255  Mask:255.255.255.0
[root@hract21 peer]#  ifconfig eth3   | egrep 'eth|inet addr'
eth3      Link encap:Ethernet  HWaddr 08:00:27:3B:89:BF  
          inet addr:192.168.3.121  Bcast:192.168.3.255  Mask:255.255.255.0

[root@hract21 peer]#  srvctl config gns -a -l
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: grid12c.example.com
GNS version: 12.1.0.2.0
Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
Name of the cluster where GNS is running: ract2
Cluster type: server.
GNS log level: 1.
GNS listening addresses: tcp://192.168.5.58:39839.
GNS is individually enabled on nodes: 
GNS is individually disabled on nodes: 

Stop resources and recreate  gns, nodeapps

[root@hract21 peer]#  srvctl stop scan_listener 
[root@hract21 peer]#  srvctl stop scan
[root@hract21 peer]#  srvctl stop nodeapps -f
PRCC-1016 : ons was already stopped
PRCR-1005 : Resource ora.ons is already stopped

[root@hract21 peer]#  srvctl stop gns
[root@hract21 Desktop]#  srvctl remove gns 
Remove GNS? (y/[n]) y


[root@hract21 Desktop]# srvctl remove nodeapps
Please confirm that you intend to remove node-level applications on all nodes of the cluster (y/[n]) y
[root@hract21 Desktop]# srvctl  add gns -i 192.168.5.58 -d  grid12c.example.com
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.5.58
Domain served by GNS: grid12c.example.com
[root@hract21 Desktop]# srvctl config gns -list
CLSNS-00005: operation timed out
  CLSNS-00025: unable to locate GNS
    CLSGN-00070: Service location failed.
[root@hract21 Desktop]# srvctl start gns
[root@hract21 Desktop]# srvctl config gns -list
Oracle-GNS A 192.168.5.58 Unique Flags: 0x115
ract2.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 46453 Weight: 0 Priority: 0 Flags: 0x115
ract2.Oracle-GNS TXT CLUSTER_NAME="ract2", CLUSTER_GUID="3d7c30fc9a0eeff3ff12b79970a14c12", NODE_NAME="hract21", SERVER_STATE="RUNNING", VERSION="12.1.0.2.0", DOMAIN="grid12c.example.com" Flags: 0x115
--> No VIPs there  

Recreate Nodeapps
[root@hract21 Desktop]#  srvctl add nodeapps -S 192.168.5.0/255.255.255.0/eth1 
 [root@hract21 Desktop]#  srvctl config gns -list
Oracle-GNS A 192.168.5.58 Unique Flags: 0x115
hract21-vip A 192.168.5.246 Unique Flags: 0x1
hract22-vip A 192.168.5.239 Unique Flags: 0x1
hract23-vip A 192.168.5.244 Unique Flags: 0x1
ract2.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 46453 Weight: 0 Priority: 0 Flags: 0x115
ract2.Oracle-GNS TXT CLUSTER_NAME="ract2", CLUSTER_GUID="3d7c30fc9a0eeff3ff12b79970a14c12", NODE_NAME="hract21", SERVER_STATE="RUNNING", VERSION="12.1.0.2.0", DOMAIN="grid12c.example.com" Flags: 0x115
--> Now VIPs should be ONLINE 
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.hract21.vip                1   ONLINE       ONLINE       hract21         STABLE  
ora.hract22.vip                1   ONLINE       ONLINE       hract22         STABLE  
ora.hract23.vip                1   ONLINE       ONLINE       hract23         STABLE 

Restart SCAN and SCAN Listeners
[root@hract21 Desktop]#  srvctl start scan
--> Now SCANs should be ONLINE
ora.scan1.vip                  1   ONLINE       ONLINE       hract22         STABLE  
ora.scan2.vip                  1   ONLINE       ONLINE       hract23         STABLE  
ora.scan3.vip                  1   ONLINE       ONLINE       hract21         STABLE  

[root@hract21 Desktop]# srvctl start scan_listener
--> Now SCAN_LISTENER should be ONLINE
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.LISTENER_SCAN1.lsnr        1   ONLINE       ONLINE       hract22         STABLE  
ora.LISTENER_SCAN2.lsnr        1   ONLINE       ONLINE       hract23         STABLE  
ora.LISTENER_SCAN3.lsnr        1   ONLINE       ONLINE       hract21         STABLE  

Verify GNS
[root@hract21 Desktop]#   srvctl config gns -list
Oracle-GNS A 192.168.5.58 Unique Flags: 0x115
hract21-vip A 192.168.5.246 Unique Flags: 0x1
hract22-vip A 192.168.5.239 Unique Flags: 0x1
hract23-vip A 192.168.5.244 Unique Flags: 0x1
ract2-scan A 192.168.5.238 Unique Flags: 0x1
ract2-scan A 192.168.5.243 Unique Flags: 0x1
ract2-scan A 192.168.5.245 Unique Flags: 0x1
ract2-scan1-vip A 192.168.5.243 Unique Flags: 0x1
ract2-scan2-vip A 192.168.5.245 Unique Flags: 0x1
ract2-scan3-vip A 192.168.5.238 Unique Flags: 0x1
ract2.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 46453 Weight: 0 Priority: 0 Flags: 0x115
ract2.Oracle-GNS TXT CLUSTER_NAME="ract2", CLUSTER_GUID="3d7c30fc9a0eeff3ff12b79970a14c12", NODE_NAME="hract21", 
   SERVER_STATE="RUNNING", VERSION="12.1.0.2.0", DOMAIN="grid12c.example.com" Flags: 0x115
--> VIPs, SCANs and SCAN VIPs should be ONLINE 
    Congratulations, you have successfully reconfigured GNS on 12.1.0.2 !

Potential problem : PRCN-2065, PRCN-2067 when recreating the nodeapps

Note stopping nodeapps should stop the ONS !
[grid@hract21 trace]$  srvctl stop nodeapps -n hract21 -f
*****  Local Resources: *****
Resource NAME                  TARGET     STATE           SERVER       STATE_DETAILS
-------------------------      ---------- ----------      ------------ ------------------                  
ora.ons                        OFFLINE    OFFLINE         hract21      STABLE   
ora.ons                        ONLINE     ONLINE          hract22      STABLE   
ora.ons                        ONLINE     ONLINE          hract23      STABLE   
[root@hract21 Desktop]# netstat -tapen | egrep '6100|6200'
-> ONS is stopped - ports 6100 and 6200 are not active !
Sometimes during my testing the remote ONS port was still active after :
  srvctl stop nodeapps -f
Later, if we try to recreate the nodeapps, we get the following error:
[root@hract21 Desktop]#  srvctl add nodeapps -S 192.168.5.0/255.255.255.0/eth1
PRCN-2065 : Ports 6200 are not available on the nodes given
PRCN-2067 : Port 6200 is not available on nodes: hract21,hract22,hract23

Verify TCP port status :
[root@hract22 ~]# netstat -taupen | grep 6200
tcp        0      0 :::6200                    ..  LISTEN      501        441704     21856/ons           
tcp        0      0 ::ffff:192.168.5.122:6200  ..  ESTABLISHED 501        67450915   21856/ons           
tcp        0      0 ::ffff:192.168.5.122:6200  ..  ESTABLISHED 501        72457163   21856/ons 
ONS was still running and occupied port 6200. This causes the above error ! 

Workaround: use the -skip parameter ( for details please read Bug 18317414 ) 
What does this parameter really do ?
[root@hract21 Desktop]# srvctl add nodeapps -skip -help
    -skip        Skip reachability check of VIP address and port validation for ONS

Now recreate the nodeapps with the -skip parameter
[root@hract21 Desktop]#   srvctl add nodeapps  -skip  -S 192.168.5.0/255.255.255.0/eth1
--> Worked !!

Reference

  • Bug 18317414 : LNX64-12.1-INSTALL-SCC:RERUN ROOT.SH FAILED AT ADD NODEAPPS

Troubleshooting Clusterware and Clusterware component errors : Address already in use

Generic RAC Portnumber Information

                                                  Default Port   Port Range  Protocol  Used for 
                                                  Number                               CI only? 
Cluster Synchronization Service daemon (CSSD)     42424          Dynamic     UDP       Yes
Oracle Grid Interprocess Communication (GIPCD)    42424          Dynamic     UDP       Yes
Oracle HA Services daemon (OHASD)                 42424          Dynamic     UDP       Yes
Multicast Domain Name Service (MDNSD)              5353          Dynamic     UDP/TCP    No 
Oracle Grid Naming Service (GNSD)                    53          53 (public) UDP/TCP    No
Oracle Notification Services (ONS)                 6100 (local)  Configured  TCP        No
                                                   6200 (remote)   manually
    
Port 42424 :
CSSD  : The Cluster Synchronization Service (CSS) daemon uses a fixed port for node restart 
        advisory messages. This port is used on all interfaces that have broadcast capability. 
        Broadcast occurs only when a node eviction restart is imminent.
OHASD : The Oracle High Availability Services (OHAS) daemon starts the Oracle Clusterware 
         stack.
GIPCD : A support daemon that enables Redundant Interconnect Usage.

Port 5353 :
MDNSD : The mDNS process is a background process on Linux and UNIX, and a service on Windows, 
        and is necessary for Grid Plug and Play and GNS.

Port 53: 
GNSD  : The Oracle Grid Naming Service daemon provides a gateway between the cluster mDNS and 
        external DNS servers. 
        The gnsd process performs name resolution within the cluster.

Port 6100/6200 :
ONS   : Ports for the ONS publish and subscribe service, used for communicating information about 
        Fast Application Notification (FAN) events. The FAN notification process uses system 
        events that Oracle Database publishes  when cluster servers become unreachable or if 
        network interfaces fail.
        Use srvctl to modify ONS port

Verify port usage at OS level
As GNS runs only on a single node of the cluster we need to relocate GNS first :
[root@hract21 ~]# srvctl relocate gns -n hract21 

[root@hract21 Desktop]#  netstat -taupen |grep ":42424 "
udp        0      0 192.168.2.255:42424         0.0.0.0:*  0          10361774   11545/ohasd.bin     
udp        0      0 230.0.1.0:42424             0.0.0.0:*  0          10361773   11545/ohasd.bin     
udp        0      0 224.0.0.251:42424           0.0.0.0:*  0          10361772   11545/ohasd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*  501        10361732   11764/gipcd.bin     
udp        0      0 230.0.1.0:42424             0.0.0.0:*  501        10361731   11764/gipcd.bin     
udp        0      0 224.0.0.251:42424           0.0.0.0:*  501        10361730   11764/gipcd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*  501        10361722   11825/ocssd.bin     
udp        0      0 230.0.1.0:42424             0.0.0.0:*  501        10361721   11825/ocssd.bin     
udp        0      0 224.0.0.251:42424           0.0.0.0:*  501        10361720   11825/ocssd.bin 

[root@hract21 Desktop]# netstat -taupen |grep ":53 "
udp        0      0 192.168.5.58:53             0.0.0.0:*   0          46593880   5261/gnsd.bin  

[root@hract21 Desktop]#  netstat -taupen |grep ":5353 "
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378331    11724/mdnsd.bin     
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378210    11724/mdnsd.bin     
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378209    11724/mdnsd.bin     
udp        0      0 0.0.0.0:5353            0.0.0.0:*  501        1378208    11724/mdnsd.bin 

[root@hract21 Desktop]#  netstat -taupen |grep ":6100 "
tcp        0      0 127.0.0.1:6100     0.0.0.0:*     LISTEN  501  10419706   31762/ons    
..

 

Prepare Test program JavaUDPServer.java

Source can be found here : Simple Java example of UDP Client/Server communication
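
For reference, a minimal sketch of such a UDP listener could look like the code below (the linked source may differ in detail; the class name and output text were chosen to match the test runs that follow, everything else is illustrative):

import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class JavaUDPServer
{
    public static void main(String[] args) throws Exception
    {
        int port = Integer.parseInt(args[0]);
        // Binding fails with java.net.BindException ("Address already in use")
        // if another process is already listening on this UDP port.
        DatagramSocket socket = new DatagramSocket(port);
        System.out.println("Listening on UDP Port: " + port);
        byte[] buf = new byte[1024];
        while (true)
        {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);   // blocks until a datagram arrives
            System.out.println("Received " + packet.getLength() + " bytes from " + packet.getAddress());
        }
    }
}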

[root@hract21 JAVA]#  javac JavaUDPServer.java

Testing when a port is free and our program can successfully listen on that port: 
[root@hract21 JAVA]# java  JavaUDPServer 59
Listening on UDP Port: 59
--> press <Ctrl>-C to terminate the program

Testing the program when the port is already in use
[root@hract21 JAVA]# java  JavaUDPServer  53
Listening on UDP Port: 53
Jan 31, 2015 4:57:29 PM JavaUDPServer main
SEVERE: null
java.net.BindException: Address already in use
    at java.net.PlainDatagramSocketImpl.bind0(Native Method)
    at java.net.PlainDatagramSocketImpl.bind(PlainDatagramSocketImpl.java:125)
    at java.net.DatagramSocket.bind(DatagramSocket.java:372)

 

Case I: Clusterware startup fails as Portnumber 42424 is in use

Start our test program to block UDP port 42424
[root@hract21 JAVA]#  java  JavaUDPServer  42424
Listening on UDP Port: 42424

Start CRS and monitor local CRS stack
[root@hract21 Desktop]# crsctl start crs
*****  Local Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    OFFLINE      -               STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      hract21         STARTING
ora.cssdmonitor                1   ONLINE     ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    ONLINE       hract21         STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE

--> The evmd process remains in status INTERMEDIATE. The local cluster stack doesn't come up !
Investigate Trace files:
alert.log: 
2015-01-31 17:14:54.492 [CSSDAGENT(22642)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:9:3} in /u01/app/grid/diag/crs/hract21/crs/trace/ohasd_cssdagent_root.trc.
Sat Jan 31 17:14:59 2015
Errors in file /u01/app/grid/diag/crs/hract21/crs/trace/ocssd.trc  (incident=1):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []

gipcd.trc:
2015-01-31 17:20:27.606277 :GIPCHTHR:812046080:  gipchaWorkerCreateInterface: created local interface for node 'hract21', haName 'gipcd_ha_name', inf 'udp://192.168.2.121:28764' inf 0x7fef0c190b30
2015-01-31 17:20:27.606350 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: failed to bind endp 0x7fef182d8230 [000000000001e71a] { gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fef182da320 status 13flags 0x20000000, flags-2 0x0, usrFlags 0xc000 }, addr 0x7fef182d8cf0 [000000000001e71c] { gipcAddress : name 'mcast://224.0.0.251:42424/192.168.2.121', objFlags 0x0, addrFlags 0x5 }
2015-01-31 17:20:27.606358 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos op  :  sgipcnMctBind
2015-01-31 17:20:27.606360 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 17:20:27.606361 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos loc :  bind
2015-01-31 17:20:27.606363 :GIPCXCPT:812046080:  gipcmodNetworkProcessBind: slos info:  Invalid argument
2015-01-31 17:20:27.606399 :GIPCXCPT:812046080:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed to bind endp 0x7fef182d8230 [000000000001e71a] { gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fef182da320 status 13flags 0x20000000, flags-2 0x0, usrFlags 0xc000 }, addr 0x7fef182d9a20 [000000000001e721] { gipcAddress : name 'mcast://224.0.0.251:42424/192.168.2.121', objFlags 0x0, addrFlags 0x4 }, flags 0x8000
2015-01-31 17:20:27.606408 :GIPCXCPT:812046080:  gipcInternalEndpoint: failed to bind address to endpoint name 'mcast://224.0.0.251:42424/192.168.2.121', ret gipcretAddressInUse (20)
2015-01-31 17:20:27.606426 :GIPCHTHR:812046080:  gipchaWorkerUpdateInterface: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed to create local interface 'udp://192.168.2.121', 0x7fef0c190b30 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.2.121', subnet '192.168.2.0', mask '255.255.255.0', mac '08-00-27-4e-c9-bf', ifname 'eth2', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }, hctx 0x10639b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid '8c45d6e7-00000000', name2 3aca-bf27-17d5-691e, numNode 0, numInf 1, maxPriority 0, clientMode 1, nodeIncarnation d64c9b7c-06451148 usrFlags 0x0, flags 0x2d65 }
2015-01-31 17:20:27.606432 :GIPCHGEN:812046080:  gipchaInterfaceDisable: disabling interface 0x7fef0c190b30 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.2.121', subnet '192.168.2.0', mask '255.255.255.0', mac '08-00-27-4e-c9-bf', ifname 'eth2', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }
2015-01-31 17:20:27.606438 :GIPCHDEM:812046080:  gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7fef0c190b30 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.2.121', subnet '192.168.2.0', mask '255.255.255.0', mac '08-00-27-4e-c9-bf', ifname 'eth2', numRef 0, numFail 0, idxBoot 0, flags 0x1861 }
2015-01-31 17:20:27.60

Investigate the error in more detail : 
gipcmodNetworkProcessBind: slos dep :  Address already in use (98) 
[root@hract21 Desktop]# cat  /usr/include/asm-generic/errno.h | grep 98
#define    EADDRINUSE    98    /* Address already in use */

Locate the port number :
gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121 -->   42424 is the port 
--> CW   can't listen on port 42424 ! 

Locate the  blocking process at OS level
[root@hract21 Desktop]#    netstat -taupen |grep ":42424 "
udp        0      0 :::42424         ....    22338/java          
[root@hract21 Desktop]# ps -elf | grep 22338
0 S root     22338 26783  0  80   0 - 438331 futex_ 17:04 pts/12  00:00:01 java JavaUDPServer 42424

--> Yep, our java program blocks CW from coming up ! Kill the java program and restart CW 
[root@hract21 Desktop]# kill -9 22338
[root@hract21 Desktop]# crsctl stop crs -f
[root@hract21 Desktop]# crsctl start crs

 

Case II: Clusterware startup fails as Portnumber  5353  is in use

Start our test program to block MDNSD port 5353
[root@hract21 JAVA]#  java  JavaUDPServer 5353
Listening on UDP Port: 5353

*****  Local Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE       ONLINE       hract21         STABLE
ora.cluster_interconnect.haip  1   ONLINE    ONLINE       hract21         STABLE
ora.crf                        1   ONLINE    ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssd                       1   ONLINE    ONLINE       hract21         STABLE
ora.cssdmonitor                1   ONLINE     ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE    ONLINE       hract21         OBSERVER,STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE    ONLINE       hract21         STABLE
ora.gipcd                      1   ONLINE    ONLINE       hract21         STABLE
ora.gpnpd                      1   ONLINE    ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE    INTERMEDIATE hract21         STABLE
ora.storage                    1   ONLINE    ONLINE       hract21         STABLE
--> The MDNSD daemon doesn't start 

mdnsd.trc :
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
    CLSB:2559100480: Argument count (argc) for this daemon is 1
    CLSB:2559100480: Argument 0 is: /u01/app/121/grid/bin/mdnsd.bin
2015-01-31 17:40:17.131516 :  CLSDMT:2554820352: PID for the Process [9863], connkey 9
2015-01-31 17:40:18.042329 :    MDNS:2559100480:  mdnsd interface eth0 (0x2 AF=2 f=0x1043 mcast=-1) 192.168.1.9 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.043191 :    MDNS:2559100480:  mdnsd interface eth1 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.046952 :    MDNS:2559100480:  mdnsd interface eth1:1 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.241 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.047574 :    MDNS:2559100480:  mdnsd interface eth1:2 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.242 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.047597 :    MDNS:2559100480:  mdnsd interface eth2 (0x4 AF=2 f=0x1043 mcast=-1) 192.168.2.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.047612 :    MDNS:2559100480:  mdnsd interface eth2:1 (0x4 AF=2 f=0x1043 mcast=-1) 169.254.213.86 mask 255.255.0.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.049171 :    MDNS:2559100480:  mdnsd interface eth3 (0x5 AF=2 f=0x1043 mcast=-1) 192.168.3.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.049222 :    MDNS:2559100480:  mdnsd interface lo (0x1 AF=2 f=0x49 mcast=-1) 127.0.0.1 mask 255.0.0.0 FAILED. Error 98 (Address already in use)
2015-01-31 17:40:18.049236 :    MDNS:2559100480:  Error! No valid netowrk interfaces found to setup mDNS.
2015-01-31 17:40:18.049240 :    MDNS:2559100480:  Oracle mDNSResponder ver. mDNSResponder-1076 (Jun 30 2014 19:39:45) , init_rv=-65537
2015-01-31 17:40:18.049335 :    MDNS:2559100480:  stopping

--> Here we only get the error "Address already in use" but no info about the port number. 
    We need to reference the port list above and remember that MDNSD is running on port 5353.

Now we can locate the blocking process, kill that process and restart the clusterware
[root@hract21 Desktop]#  netstat -taupen |grep ":5353 "
udp        0      0 :::5353         ...            50111629   7252/java   
Again our java program prevents CW from starting. Kill that process and restart CW.
[root@hract21 Desktop]# kill -9 7252

 

Case III: Investigate GNS startup problem due to Error:  Address already in use

Relocate GNS to a different host
[root@hract21 Desktop]# srvctl relocate gns -n hract23
ora.gns                        1   ONLINE       ONLINE       hract23         STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract23         STABLE

Now occupy port 53 by running our JAVA program:
[root@hract21 JAVA]# java  JavaUDPServer 53
Listening on UDP Port: 53

Now try to bring back the GNS 
[root@hract21 Desktop]# srvctl relocate gns -n hract21
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.gns                        1   ONLINE       OFFLINE      hract21         STARTING
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE
--> GNS is in status STARTING but doesn't come up

gnsd.trc :
2015-01-31 18:09:13.518516 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 18:09:13.518520 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos loc :  bind
2015-01-31 18:09:13.518521 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.58:53'
2015-01-31 18:09:13.518577 :GIPCXCPT:255158016:  gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed to bind endp 0x7ff7000034c0 [0000000000001fc6] { gipcEndpoint : localAddr 'udp://192.168.5.58:53', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7ff7000050f0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x24000 }, addr 0x7ff7000047f0 [0000000000001fcd] { gipcAddress : name 'udp://192.168.5.58:53', objFlags 0x0, addrFlags 0x4 }, flags 0x20000
2015-01-31 18:09:13.518589 :GIPCXCPT:255158016:  gipcInternalEndpoint: failed to bind address to endpoint name 'udp://192.168.5.58:53', ret gipcretAddressInUse (20)
2015-01-31 18:09:13.518608 :GIPCXCPT:255158016:  gipcEndpointF [clsgngipcCreateEndpointInternal : clsgngipc.c : 2008]: EXCEPTION[ ret gipcretAddressInUse (20) ]  failed endp create ctx 0x7ff7196f3c80 [0000000000001e99] { gipcContext : traceLevel 2, fieldLevel 0x0, numDead 0, numPending 0, numZombie 0, numObj 4, numWait 0, numReady 0, wobj 0x7ff7196f1c10, hgid 0000000000001e9a, flags 0x1a, objFlags 0x0 }, name 'udp://192.168.5.58:53', flags 0x24000
2015-01-31 18:09:13.518728 :     GNS:255158016: Resolve::clsgndnsCreateContainerCallback: (:CLSGN01163:) Error - Address in use: port 53 address "192.168.5.58". 1: clskec:has:CLSGN:208 2 args[has:CLSGN:208][udp://192.168.5.58:53]
2: clskec:has:gipc:20 1 args[has:gipc:20]
3: clskec:has:CLSU:910 4 args[has][mod=gipcInternalEndpoint][loc=473][msg=failed to bind address to endpoint name 'udp://192.168.5.58:53']
2015-01-31 18:09:13.518769 :     GNS:255158016: Resolve::clsgndnsCreateContainer: (:CLSGN00927:) failed to listen on all addresses - throwing error.
default:255158016: listen failed with 1 errors
1: clskec:has:CLSGN:208 3 args[has:CLSGN:208][192.168.5.58][53]

The following error messages tell us the Linux errno code and the related port number :
2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 18:09:13.518521 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.58:53'

Again locate the port number and kill the process
[root@hract21 Desktop]#  netstat -taupen |grep ":53 "
udp    16128      0 :::53          ...          51417680   23723/java
Again kill the process which holds the port number and restart CW 
[root@hract21 Desktop]# kill -9  23723

Now test whether relocating GNS works again
[root@hract21 ~]#   srvctl relocate gns -n hract21
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.gns                        1   ONLINE       ONLINE       hract21         STABLE
ora.gns.vip                    1   ONLINE       ONLINE       hract21         STABLE

Complete Portnumber Usage of a working RAC system

[root@hract21 ~]#   netstat -taupen |grep 192.168
tcp        0      0 192.168.5.242:1521          0.0.0.0:*                   LISTEN      501        50803141   17310/tnslsnr       
tcp        0      0 192.168.5.241:1521          0.0.0.0:*                   LISTEN      501        50793916   17258/tnslsnr       
tcp        0      0 192.168.5.121:1521          0.0.0.0:*                   LISTEN      501        50793894   17258/tnslsnr       
tcp        0      0 192.168.2.121:1522          0.0.0.0:*                   LISTEN      501        50790436   17212/tnslsnr       
tcp        0      0 192.168.2.121:61020         0.0.0.0:*                   LISTEN      0          50773311   16994/osysmond.bin  
tcp        0      0 192.168.5.121:42942         0.0.0.0:*                   LISTEN      501        50724207   16454/gipcd.bin     
tcp        0      0 192.168.5.58:39839          0.0.0.0:*                   LISTEN      0          51856456   27381/gnsd.bin      
tcp        0      0 192.168.5.232:36063         0.0.0.0:*                   LISTEN      502        8145376    596/exectask        
tcp        0      0 192.168.5.121:15043         0.0.0.0:*                   LISTEN      0          50730332   16281/ohasd.bin     
tcp        0      0 192.168.5.121:42942         192.168.5.123:28657         ESTABLISHED 501        50841598   16454/gipcd.bin     
tcp        0      0 192.168.5.241:1521          192.168.5.121:55119         ESTABLISHED 501        50829166   17258/tnslsnr       
tcp        0      0 192.168.2.121:46847         192.168.2.122:1522          ESTABLISHED 0          50774509   17012/crsd.bin      
tcp        0      0 192.168.2.121:1522          192.168.2.123:60331         ESTABLISHED 501        50795614   17212/tnslsnr       
tcp        0      0 192.168.2.121:1522          192.168.2.121:16025         ESTABLISHED 501        50829535   17212/tnslsnr       
tcp        0      0 192.168.2.121:1522          192.168.2.122:54611         ESTABLISHED 501        50796842   17212/tnslsnr       
tcp        0      0 192.168.2.121:46865         192.168.2.122:1522          ESTABLISHED 501        50829527   17468/asm_lreg_+ASM 
tcp        0      0 192.168.5.242:1521          192.168.5.121:61101         ESTABLISHED 501        50838159   17310/tnslsnr       
tcp        0      0 192.168.5.121:42942         192.168.5.122:32304         ESTABLISHED 501        50841582   16454/gipcd.bin     
tcp        1      0 192.168.1.9:39471           80.150.192.73:80            CLOSE_WAIT  0          50900520   4786/clock-applet   
tcp        0      0 192.168.2.121:1522          192.168.2.121:16024         ESTABLISHED 501        50829534   17212/tnslsnr       
tcp        0      0 192.168.2.121:16024         192.168.2.121:1522          ESTABLISHED 501        50829529   17468/asm_lreg_+ASM 
tcp        0      0 192.168.2.121:16025         192.168.2.121:1522          ESTABLISHED 501        50829531   17468/asm_lreg_+ASM 
tcp        0      0 192.168.2.121:28139         192.168.2.123:1522          ESTABLISHED 501        50829525   17468/asm_lreg_+ASM 
tcp        0      0 192.168.5.121:64227         192.168.5.122:35547         ESTABLISHED 501        50790718   16454/gipcd.bin     
tcp        0      0 192.168.5.121:21046         192.168.5.123:6200          ESTABLISHED 501        50787900   17215/ons           
tcp        0      0 192.168.5.121:59844         192.168.5.50:22             ESTABLISHED 0          44509382   13726/ssh           
tcp        0      0 192.168.5.121:61101         192.168.5.242:1521          ESTABLISHED 502        50838158   17721/ora_lreg_bank 
tcp        0      0 192.168.5.58:39839          192.168.5.121:34266         TIME_WAIT   0          0          -                   
tcp        0      0 192.168.5.121:16432         192.168.5.122:6200          ESTABLISHED 501        50787901   17215/ons           
tcp        0      0 192.168.2.121:39861         192.168.2.123:61021         ESTABLISHED 0          50769440   16994/osysmond.bin  
tcp        0      0 192.168.5.121:55125         192.168.5.241:1521          ESTABLISHED 502        50837652   17721/ora_lreg_bank 
tcp        0      0 192.168.5.121:55119         192.168.5.241:1521          ESTABLISHED 501        50829165   17468/asm_lreg_+ASM 
tcp        0      0 192.168.5.241:1521          192.168.5.121:55125         ESTABLISHED 501        50837653   17258/tnslsnr       
tcp        0      0 192.168.5.121:10242         192.168.5.123:17701         ESTABLISHED 501        50790723   16454/gipcd.bin     
tcp        0      0 192.168.5.121:55728         192.168.5.123:22            ESTABLISHED 0          27679552   25184/ssh           
udp        0      0 192.168.2.121:35570         0.0.0.0:*                               0          50731287   16281/ohasd.bin     
udp        0      0 192.168.2.121:51962         0.0.0.0:*                               0          50751183   16922/octssd.bin    
udp        0      0 192.168.2.255:42424         0.0.0.0:*                               501        50734962   16537/ocssd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*                               0          50731290   16281/ohasd.bin     
udp        0      0 192.168.2.255:42424         0.0.0.0:*                               501        50725223   16454/gipcd.bin     
udp        0      0 192.168.2.121:15891         0.0.0.0:*                               501        50734959   16537/ocssd.bin     
udp        0      0 192.168.2.121:12075         0.0.0.0:*                               501        50782505   16408/evmd.bin      
udp        0      0 192.168.5.58:53             0.0.0.0:*                               0          51856599   27381/gnsd.bin      
udp        0      0 192.168.5.58:123            0.0.0.0:*                               38         51843931   1291/ntpd           
udp        0      0 192.168.5.242:123           0.0.0.0:*                               38         50803109   1291/ntpd           
udp        0      0 192.168.5.241:123           0.0.0.0:*                               38         50793859   1291/ntpd           
udp        0      0 192.168.3.121:123           0.0.0.0:*                               0          43573989   1291/ntpd           
udp        0      0 192.168.2.121:123           0.0.0.0:*                               0          43573987   1291/ntpd           
udp        0      0 192.168.5.121:123           0.0.0.0:*                               0          43573984   1291/ntpd           
udp        0      0 192.168.1.9:123             0.0.0.0:*                               0          43573983   1291/ntpd           
udp        0      0 192.168.2.121:53498         0.0.0.0:*                               0          50776026   17012/crsd.bin      
udp        0      0 192.168.2.121:45379         0.0.0.0:*                               501        50725220   16454/gipcd.bin

Troubleshooting hint: CW start problems due to "Address already in use" errors

Before CW startup verify that the following ports are not in use at all :
[root@hract21 Desktop]#    netstat -taupen |grep ":42424 "
[root@hract21 Desktop]#    netstat -taupen |grep ":5353 "
[root@hract21 Desktop]#    netstat -taupen |grep ":53 "
[root@hract21 Desktop]#    netstat -taupen |egrep ":6100 |:6200"
If you find any processes not belonging to the Oracle Clusterware stack you need to kill/stop 
these processes
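
For a quick programmatic pre-check you can also try to bind the ports yourself. The following helper is only an illustrative sketch (the class name and port list are my own, it is not part of the Oracle stack): it attempts to bind the UDP and TCP ports from the table above and reports any that are already occupied. It is a rough check only - a wildcard bind can still succeed in corner cases, so netstat remains the authoritative tool.

import java.net.DatagramSocket;
import java.net.ServerSocket;

public class CheckCwPorts
{
    public static void main(String[] args)
    {
        // Ports taken from the port table above; binding port 53 requires root privileges
        int[] udpPorts = { 42424, 5353, 53 };   // CSSD/OHASD/GIPCD, MDNSD, GNSD
        int[] tcpPorts = { 6100, 6200 };        // ONS local / remote
        for (int p : udpPorts)
        {
            try { new DatagramSocket(p).close(); System.out.println("UDP port " + p + " is free"); }
            catch (Exception e) { System.out.println("UDP port " + p + " is IN USE: " + e.getMessage()); }
        }
        for (int p : tcpPorts)
        {
            try { new ServerSocket(p).close(); System.out.println("TCP port " + p + " is free"); }
            catch (Exception e) { System.out.println("TCP port " + p + " is IN USE: " + e.getMessage()); }
        }
    }
}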


If you have problems with Clusterware startup or with CW component startup ( GSD, VIPs ) you may 
check your clusterware trace files for the "Address already in use" error.

Note the trace file location has changed for RAC 12.1.0.2 :
[grid@hract21 trace]$  grep -l "Address already in use" *
gipcd.trc
gnsd.trc
mdnsd.trc
ocssd.trc
ohasd.trc

Now find details:
# grep "Address already in use" ohasd.trc  mdnsd.trc  ocssd.trc gnsd.trc  gipcd.trc gnsd.trc | grep "2015-01-31 17"
ohasd.trc:2015-01-31 17:30:16.613432 :GIPCXCPT:2420897536:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
mdnsd.trc:2015-01-31 17:40:18.049222 :    MDNS:2559100480:  mdnsd interface lo (0x1 AF=2 f=0x49 mcast=-1) 127.0.0.1 mask 255.0.0.0 FAILED. Error 98 (Address already in use)
ocssd.trc:2015-01-31 17:04:56.013085 :GIPCXCPT:3986515712:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
gipcd.trc:2015-01-31 17:06:25.204775 :GIPCXCPT:812046080:   gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
gnsd.trc:2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)

For mdnsd.trc we already know the port number          :   5353   
For ohasd.trc, ocssd.trc, gipcd.trc the port number is :  42424
For GNS the trace file provides details about the problematic port number and IP address
[grid@hract21 trace]$  grep gipcmodNetworkProcessBind  gnsd.trc  
2015-01-31 18:09:13.518483 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: failed to bind endp 0x7ff7000034c0 [0000000000001fc6] { gipcEndpoint : localAddr 'udp://192.168.5.58:53', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7ff7000050f0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x24000 }, addr 0x7ff7000038b0 [0000000000001fc8] { gipcAddress : name 'udp://192.168.5.58:53', objFlags 0x0, addrFlags 0x5 }
2015-01-31 18:09:13.518516 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos op  :  sgipcnTcpBind
2015-01-31 18:09:13.518518 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos dep :  Address already in use (98)
2015-01-31 18:09:13.518520 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos loc :  bind
2015-01-31 18:09:13.518521 :GIPCXCPT:255158016:  gipcmodNetworkProcessBind: slos info:  addr '192.168.5.58:53'

 


Debug Cluvfy error ERROR: PRVF-9802

ERROR: 
PRVF-9802 : Attempt to get udev information from node "hract21" failed
No UDEV rule found for device(s) specified


Checking: cv/log/cvutrace.log.0

          ERRORMSG(hract21): PRVF-9802 : Attempt to get udev information from node "hract21" failed
No UDEV rule found for device(s) specified

[Thread-757] [ 2015-01-29 15:56:44.157 CET ] [StreamReader.run:65]  OUTPUT><CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP>
</SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT>
<SLOS_OTHERINFO>No UDEV rule found for device(s) specified</SLOS_OTHERINFO>
</CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG>
<CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
</CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
  | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' 
  | sed -e 's/://' -e 's/\.\*/\*/g'</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
..
[Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:144]  runCommand: process returns 0
[Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:161]  RunTimeExec: output>

Run the exectask from OS prompt :
[root@hract21 ~]# /tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
<CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP></SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT><SLOS_OTHERINFO>No UDEV rule found for device(s)
 specified</SLOS_OTHERINFO></CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG>
<CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
</CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
 | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
 | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' 
 | sed -e 's/://' -e 's/\.\*/\*/g'
 </CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules 
 | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'
 | awk '{if ("asmdisk2_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
 | sed -e 's/://' -e 's/\.\*/\*/g'
</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>

Test the exectask in detail:
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE  
 | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
 | awk '  {if ("asmdisk1_10G" ~ $1) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
--> Here awk returns nothing !

[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
  |awk '  {  print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'   
 
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", 
   RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid"

--> The above sed script adds sd?1 as parameter $1 and @ as parameter $2. 
    Later awk searches for "asmdisk1_10G" in parameter $1:   if ("asmdisk1_10G" ~ $1) ... 
    But the string "asmdisk1_10G" is found in parameter $3, not in parameter $1 !!
    
Potential Fix : if we modify the search string we get a record back !
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'  
  |awk  '  /asmdisk1_10G/ {  print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent",
 RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid", ..

--> It seems the way Oracle extracts UDEV data does not work for OEL 6, where UDEV records can look like: 
NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent",   
    RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c",  OWNER="grid", GROUP="asmadmin", MODE="0660"

As the ASM disks have the proper permissions I decided to ignore the warnings  
[root@hract21 rules.d]# ls -l  /dev/asm*
brw-rw---- 1 grid asmadmin 8, 17 Jan 29 09:33 /dev/asmdisk1_10G
brw-rw---- 1 grid asmadmin 8, 33 Jan 29 09:33 /dev/asmdisk2_10G
brw-rw---- 1 grid asmadmin 8, 49 Jan 29 09:33 /dev/asmdisk3_10G
brw-rw---- 1 grid asmadmin 8, 65 Jan 29 09:33 /dev/asmdisk4_10G

Using datapatch in a RAC env

Overview

  • Datapatch is the new tool that enables automation of post-patch SQL actions for RDBMS patches.
  • If we have a 3-node RAC cluster, datapatch runs 3 jobs named LOAD_OPATCH_INVENTORY_1, LOAD_OPATCH_INVENTORY_2 and LOAD_OPATCH_INVENTORY_3
  • This inventory update requires that all RAC nodes are available ( even for a policy-managed database )
  • Install the helper package from Note 1585814.1 : [ demo1.sql + demo2.sql ]
  • With 12c we have a SQL interface for querying patches ( by reading the lsinventory via PL/SQL )
  • For patches that do not have post-patch SQL actions to be performed, calling datapatch is a no-op.
  • For patches that do have post-patch SQL instructions to be invoked on the database instance, datapatch will automatically   detect ALL pending actions (from one installed patch or multiple installed patches) and complete the actions as appropriate.

What should I do when the datapatch command throws an error or warning ?

Rollable VS. Non-Rollable Patches: ( From Oracle Docs )
 - Patches are designed to be applied in either rolling mode or non-rolling mode.
 - If a patch is rollable, the patch has no dependency on the SQL script. 
   The database can be brought up without issue.

 OPatchauto succeeds with a warning on datapatch/sqlpatch.
  ->  For rollable patches:
        Ignore datapatch errors on node 1 through node (n-1).
        On the last node (node n), run datapatch again. You can cut and paste this command from the log file.
        If you still encounter datapatch errors on the last node, call Oracle Support or open a Service Request.

   -> For non-rollable patches:
        Bring down all databases and stacks manually for all nodes.
        Run opatchauto apply on every node.
        Bring up the stack and databases.
        Note that the databases must be up in order for datapatch to connect and apply the SQL.
        Manually run datapatch on the last node. 
        Note that if you do not run datapatch, the SQL for the patch will not be applied and you will not 
           benefit from the bug fix. In addition, you may encounter incorrect system behavior 
           depending on the changes the SQL is intended to implement.
        If datapatch continues to fail, you must roll back the patch. 
        Call Oracle Support for assistance or open a Service Request.

 

How to check the current patch level and reinstall a SQL patch ?

[oracle@gract1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 08:55:31 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550
Currently installed C Patches: 19121550
Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  Nothing to apply
Patch installation complete.  Total patches installed: 0
SQL Patching tool complete on Sun Jan 25 08:57:14 2015

--> Patch 19121550 is installed ( both parts C layer and SQL layer are installed )
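A quick way to double-check both layers for a given patch (a sketch; PATCH_ID is just the example patch used here):

#!/bin/bash
# Sketch: confirm a patch in the binary (C) layer and in the SQL layer.
PATCH_ID=19121550
$ORACLE_HOME/OPatch/opatch lsinventory | grep "$PATCH_ID"     # C layer: Oracle Home inventory
sqlplus -s / as sysdba <<EOF
-- SQL layer: actions recorded by datapatch
select patch_id, action, status, action_time
  from dba_registry_sqlpatch where patch_id = $PATCH_ID;
EOF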

Rollback the patch
[oracle@gract1 OPatch]$  ./datapatch -rollback 19121550
SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 09:03:03 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  The following patches will be rolled back: 19121550
  Nothing to apply
Installing patches...
Patch installation complete.  Total patches installed: 1
Validating logfiles...done
SQL Patching tool complete on Sun Jan 25 09:04:51 2015

Reapply the patch
[oracle@gract1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 09:06:55 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches:               <-- Here we can see that SQL patch is not yet installed !
Currently installed C Patches: 19121550
Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied: 19121550
Installing patches...
Patch installation complete.  Total patches installed: 1
Validating logfiles...
Patch 19121550 apply: SUCCESS
  logfile: /u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2015Jan25_09_08_51.log (no errors)
  catbundle generate logfile: /u01/app/oracle/cfgtoollogs/catbundle/catbundle_PSU_DW_dw_GENERATE_2015Jan25_09_08_51.log (no errors)
  catbundle apply logfile: /u01/app/oracle/cfgtoollogs/catbundle/catbundle_PSU_DW_dw_APPLY_2015Jan25_09_08_53.log (no errors)
SQL Patching tool complete on Sun Jan 25 09:10:31 2015

Verify the current patch status 
SQL> select * from dba_registry_sqlpatch;
  PATCH_ID ACTION       STATUS       ACTION_TIME              DESCRIPTION
---------- --------------- --------------- ------------------------------ --------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------
  19121550 APPLY       SUCCESS       26-OCT-14 12.13.19.575484 PM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2014Oct26_12_01_54.log

  19121550 ROLLBACK       SUCCESS       25-JAN-15 09.04.51.585648 AM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_rollback_DW_2015Jan25_09_04_43.log

  19121550 APPLY       SUCCESS       25-JAN-15 09.10.31.872019 AM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2015Jan25_09_08_51.log

--> Here we can see that the SQL part of patch 19121550 was re-applied at 25-JAN-15 09.10.31.

 

Using  Queryable Patch Inventory [ DEMOQP helper package ]

Overview DEMOQP helper package 
Install the helper package from Note 1585814.1: [ demo1.sql + demo2.sql ] 
Have a short look at the package details:
SQL> desc DEMOQP
PROCEDURE CHECK_PATCH_INSTALLED
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 BUGS                QOPATCH_LIST        IN
PROCEDURE COMPARE_CURRENT_DB
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 BUGS                QOPATCH_LIST        IN
PROCEDURE COMPARE_RAC_NODE
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 NODE                VARCHAR2        IN
 INST                VARCHAR2        IN
FUNCTION GET_BUG_DETAILS RETURNS XMLTYPE
 Argument Name            Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 PATCH                VARCHAR2        IN
FUNCTION GET_DEMO_XSLT RETURNS XMLTYPE

Script to test Queryable Patch Inventory : check_patch.sql  
/*     
        For details see : 
        Queryable Patch Inventory -- SQL Interface to view, compare, validate database patches (Doc ID 1585814.1)
*/
set echo on
set pagesize 20000
set long 200000

/* Is patch 19849140 installed  ?  */
set serveroutput on
exec DEMOQP.check_patch_installed (qopatch_list('19849140'));

/* Return details about patch 19849140 */
select xmltransform(DEMOQP.get_bug_details('19849140'), dbms_qopatch.get_opatch_xslt()) from dual;

/* As we are running on a policy-managed db let's have a look at host names and instance names */
col HOST_NAME format A30
select host_name, instance_name from gv$instance;
select host_name, instance_name from v$instance;

/* check Instance ERP_1 on gract2.example.com */
exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1');
select xmltransform (dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual;

/* Compare RAC nodes - this is not working in my env ! --> Getting   ORA-06502: PL/SQL: numeric or value error */
set serveroutput on
exec demoqp.compare_rac_node('gract2.example.com','ERP_1');



1) Check whether a certain patch is installed

SQL> /* Is patch 19849140 installed    ?    */
SQL> set serveroutput on
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED


2) Check patch details for patch  19849140

SQL> /* Return details about pacht 19849140 */
SQL> select xmltransform(DEMOQP.get_bug_details('19849140'), dbms_qopatch.get_opatch_xslt()) from dual;
XMLTRANSFORM(DEMOQP.GET_BUG_DETAILS('19849140'),DBMS_QOPATCH.GET_OPATCH_XSLT())
--------------------------------------------------------------------------------

Patch     19849140:   applied on 2015-01-23T16:31:09+01:00
Unique Patch ID: 18183131
  Patch Description: Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Comp
onent)
  Created on     : 23 Oct 2014, 08:32:20 hrs PST8PDT
  Bugs fixed:
     16505840  16505255  16505717  16505617  16399322  16390989  17486244  1
6168869  16444109  16505361  13866165  16505763  16208257  16904822  17299876  1
6246222  16505540  16505214  15936039  16580269  16838292  16505449  16801843  1
6309853  16505395  17507349  17475155  16493242  17039197  16196609  18045611  1
7463260  17263488  16505667  15970176  16488665  16670327  17551223
  Files Touched:

    cluvfyrac.sh
    crsdiag.pl
    lsnodes
..


3) Read the inventory from instance ERP_1 running on gract2.example.com
SQL> /* As we are running on a PM managed db let's have look on host_names and instance names */
SQL> col HOST_NAME format A30
SQL> select host_name, instance_name from gv$instance;

HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           ERP_2
gract2.example.com           ERP_1
gract3.example.com           ERP_3

SQL> select host_name, instance_name from v$instance;

HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           ERP_2

SQL> 
SQL> /* check Instance ERP_1 on gract2.example.com */
SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1');

SQL> select xmltransform (dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual;
XMLTRANSFORM(DBMS_QOPATCH.GET_OPATCH_LSINVENTORY(),DBMS_QOPATCH.GET_OPATCH_XSLT(
--------------------------------------------------------------------------------

Oracle Querayable Patch Interface 1.0
--------------------------------------------------------------------------------

Oracle Home      : /u01/app/oracle/product/121/racdb
Inventory      : /u01/app/oraInventory
--------------------------------------------------------------------------------
Installed Top-level Products (1):
Oracle Database 12c                       12.1.0.1.0
Installed Products ( 131)
..

4) Compare RAC nodes 
This very exciting feature doesn't work in my environment - sorry, no time for debugging!

SQL> /* Compare RAC nodes - this is not working in my env ! --> Getting   ORA-06502: PL/SQL: numeric or value error */
SQL> set serveroutput on
SQL> exec demoqp.compare_rac_node('gract2.example.com','ERP_1');
BEGIN demoqp.compare_rac_node('gract2.example.com','ERP_1'); END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
ORA-06512: at "SYS.DEMOQP", line 40
ORA-06512: at line 1

gract2.example.com           ERP_1

 

Why does rolling back and reapplying a SQL patch result in a NO-OP operation?

[oracle@gract1 OPatch]$ ./datapatch -rollback 19849140 -force
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 19:39:29 2015
Copyright (c) 2014, Oracle.  All rights reserved.
Connecting to database...OK
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  The following patches will be rolled back: 19849140
  Nothing to apply
Error: prereq checks failed!
  patch 19849140: rollback script /u01/app/oracle/product/121/racdb/sqlpatch/19849140/19849140_rollback.sql does not exist
Prereq check failed!  Exiting without installing any patches
See support note 1609718.1 for information on how to resolve the above errors
SQL Patching tool complete on Sat Jan 24 19:39:29 2015

What is this?
Let's check dba_registry_sqlpatch to see whether patch 19849140 comes with any SQL changes:

SQL> col action_time format A30
SQL> col DESCRIPTION format A20
SQL> select * from dba_registry_sqlpatch ;
  PATCH_ID ACTION       STATUS       ACTION_TIME              DESCRIPTION
---------- --------------- --------------- ------------------------------ --------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------
  19121550 APPLY       SUCCESS       26-OCT-14 12.13.19.575484 PM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2014Oct26_12_01_54.log

--> The patch doesn't provide any SQL changes - so the above error is nothing more than an informational message.
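A small sketch to verify this up front before attempting a SQL rollback (the patch id is the example from above; no sqlpatch directory and no rows in dba_registry_sqlpatch simply mean the patch has no SQL part):

#!/bin/bash
# Sketch: does this patch carry any SQL actions at all?
PATCH_ID=19849140
ls -l $ORACLE_HOME/sqlpatch/$PATCH_ID 2>/dev/null \
  || echo "No sqlpatch scripts shipped for patch $PATCH_ID"
sqlplus -s / as sysdba <<EOF
select patch_id, action, status from dba_registry_sqlpatch
 where patch_id = $PATCH_ID;
EOF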

What is the root cause of ORA-20006 in a RAC env?

Stop an instance 
[oracle@gract2 ~]$  srvctl stop instance -d dw -i dw_3
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.dw.db                      1   ONLINE       ONLINE       gract1          Open,STABLE  
ora.dw.db                      2   ONLINE       ONLINE       gract3          Open,STABLE  
ora.dw.db                      3   OFFLINE      OFFLINE      -               Instance Shutdown,STABLE

[oracle@gract1 OPatch]$ ./datapatch  -verbose
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 20:03:22 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550
DBD::Oracle::st execute failed: ORA-20006: Number of RAC active instances and opatch jobs configured are not same
ORA-06512: at "SYS.DBMS_QOPATCH", line 1007
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute) [for Statement "DECLARE
       x XMLType;
     BEGIN
       x := dbms_qopatch.get_pending_activity;
       ? := x.getStringVal();
     END;" with ParamValues: :p1=undef] at /u01/app/oracle/product/121/racdb/sqlpatch/sqlpatch.pm line 1293.

Note that even for a policy-managed database all instances must be up and running on all servers to apply the patch!
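A quick pre-check before running datapatch could look like this (a sketch; the database name dw is the one from this example):

#!/bin/bash
# Sketch: make sure every instance is up before starting datapatch.
srvctl status database -d dw
sqlplus -s / as sysdba <<'EOF'
col host_name format a30
select inst_id, host_name, instance_name, status from gv$instance;
EOF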

Start the instance and rerun the ./datapatch command
[oracle@gract1 OPatch]$ srvctl start instance -d dw -i dw_3
[oracle@gract1 OPatch]$ vi check_it.sql
[oracle@gract1 OPatch]$  ./datapatch  -verbose
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 20:17:33 2015
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550

...................

ORA-20008 during datapatch installation on a RAC env

You may get ORA-20008 while running the datapatch tool or while querying the patch status:
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
BEGIN DEMOQP.check_patch_installed (qopatch_list('19849140')); END;

*
ERROR at line 1:
ORA-20008: Timed out, Job Load_opatch_inventory_3execution time is more than 120Secs
ORA-06512: at "SYS.DBMS_QOPATCH", line 1428
ORA-06512: at "SYS.DBMS_QOPATCH", line 182
ORA-06512: at "SYS.DEMOQP", line 157
ORA-06512: at line 1

SQL> set linesize 120
SQL> col NODE_NAME format A20
SQL> col JOB_NAME format A30
SQL> col START_DATE format A35
SQL> col INST_JOB   format A30
SQL> select NODE_NAME, INST_ID, INST_JOB from opatch_inst_job;

NODE_NAME        INST_ID INST_JOB
-------------------- ---------- ------------------------------
gract1.example.com          1 Load_opatch_inventory_1
gract3.example.com          2 Load_opatch_inventory_2
gract2.example.com          3 Load_opatch_inventory_3

SQL> 
SQL> select job_name,state, start_date from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';

JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_2        SUCCEEDED       24-JAN-15 11.35.41.629308 AM +01:00
LOAD_OPATCH_INVENTORY_3        SCHEDULED       24-JAN-15 11.35.41.683097 AM +01:00
LOAD_OPATCH_INVENTORY_1        SUCCEEDED       24-JAN-15 11.35.41.156565 AM +01:00
 
The job was scheduled but never succeeded!
--> After fixing the connection problem to gract2.example.com the job runs to completion

SQL> select job_name,state, start_date from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';
JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_2        SUCCEEDED       24-JAN-15 11.59.29.078730 AM +01:00
LOAD_OPATCH_INVENTORY_3        SUCCEEDED       24-JAN-15 11.59.29.148714 AM +01:00
LOAD_OPATCH_INVENTORY_1        SUCCEEDED       24-JAN-15 11.59.29.025652 AM +01:00

Verify the patch install on all cluster nodes
SQL> set echo on
SQL> set pagesize 20000
SQL> set long 200000
SQL> 
SQL> /* As we are running on a PM managed db let's have look on host_names and instance names */
SQL> col HOST_NAME format A30
SQL> select host_name, instance_name from gv$instance;
HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           dw_1
gract2.example.com           dw_3
gract3.example.com           dw_2
SQL> select host_name, instance_name from v$instance;
HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract1.example.com           dw_1

SQL> /* exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1'); */
SQL> set serveroutput on
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED

SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','dw_3');
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED

SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract3.example.com','dw_2');
SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140'));
----------Patch Report----------
19849140 : INSTALLED
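The same check can be looped over all running instances; a sketch, assuming the DEMOQP helper package from Doc ID 1585814.1 is installed (as above) and that DBMS_QOPATCH.SET_CURRENT_OPINST accepts the local node as well:

#!/bin/bash
# Sketch: verify a patch on every node reported by gv$instance.
sqlplus -s / as sysdba <<'EOF'
set serveroutput on
begin
  for r in (select host_name, instance_name from gv$instance) loop
    dbms_output.put_line('--- Node ' || r.host_name || ' / Instance ' || r.instance_name);
    dbms_qopatch.set_current_opinst(r.host_name, r.instance_name);
    demoqp.check_patch_installed(qopatch_list('19849140'));
  end loop;
end;
/
EOF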

 

Monitor Script to track  dba_scheduler_jobs and  opatch_inst_job tables

[oracle@gract1 ~/DATAPATCH]$ cat check_it.sql
 connect / as sysdba
 alter session set NLS_TIMESTAMP_TZ_FORMAT = 'dd-MON-yyyy HH24:mi:ss';
 set linesize 120
 col NODE_NAME format A20
 col JOB_NAME format A30
 col START_DATE format A25
 col LAST_START_DATE format A25
 col INST_JOB   format A30
 select NODE_NAME, INST_ID, INST_JOB from opatch_inst_job;
 select job_name,state, start_date, LAST_START_DATE from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';
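One way to use it: run it in a loop from a second session while datapatch is working (the piped "exit" keeps each sqlplus call non-interactive, since check_it.sql itself does not exit):

#!/bin/bash
# Sketch: poll the inventory jobs every 10 seconds while datapatch is running.
while true; do
    echo exit | sqlplus -s /nolog @check_it.sql
    sleep 10
done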

 

How to clean up after ORA-27477 errors?

[oracle@gract1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Fri Jan 23 20:44:48 2015
Copyright (c) 2014, Oracle.  All rights reserved.
Connecting to database...OK
Determining current state...
Currently installed SQL Patches: 19121550
DBD::Oracle::st execute failed: ORA-27477: "SYS"."LOAD_OPATCH_INVENTORY_3" already exists
ORA-06512: at "SYS.DBMS_QOPATCH", line 1011
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute) [for Statement "DECLARE
x XMLType;
BEGIN
x := dbms_qopatch.get_pending_activity;
? := x.getStringVal();
END;" with ParamValues: :p1=undef] at /u01/app/oracle/product/121/racdb/sqlpatch/sqlpatch.pm line 1293.

sqlplus /nolog @check_it
NODE_NAME        INST_ID INST_JOB
-------------------- ---------- ------------------------------
gract2.example.com          1 Load_opatch_inventory_1
gract1.example.com          2 Load_opatch_inventory_2

JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_1        DISABLED        23-JAN-15 08.38.11.746811 PM +01:00
LOAD_OPATCH_INVENTORY_3        DISABLED        23-JAN-15 08.36.18.506279 PM +01:00
LOAD_OPATCH_INVENTORY_2        DISABLED        23-JAN-15 08.38.11.891360 PM +01:00

Drop the jobs and clean up the opatch_inst_job table
SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_1');
SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_2');
SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_3');
SQL>  delete from opatch_inst_job;
2 rows deleted.
SQL> commit;

Now rerun the ./datapatch -verbose command and monitor progress
SQL> @check_it
Connected.
NODE_NAME        INST_ID INST_JOB
-------------------- ---------- ------------------------------
gract2.example.com          1 Load_opatch_inventory_1
gract1.example.com          2 Load_opatch_inventory_2
gract3.example.com          3 Load_opatch_inventory_3
--> All cluster nodes are ONLINE and the required jobs have been created and are running to completion!
JOB_NAME               STATE           START_DATE
------------------------------ --------------- -----------------------------------
LOAD_OPATCH_INVENTORY_1        SUCCEEDED       23-JAN-15 08.46.08.885038 PM +01:00
LOAD_OPATCH_INVENTORY_2        SUCCEEDED       23-JAN-15 08.46.08.933665 PM +01:00
LOAD_OPATCH_INVENTORY_3        RUNNING           23-JAN-15 08.46.09.014492 PM +01:00
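The whole cleanup can be bundled into one small script (a sketch; jobs that do not exist will just raise an error that SQL*Plus reports and moves past):

#!/bin/bash
# Sketch: drop leftover LOAD_OPATCH_INVENTORY jobs, clear opatch_inst_job and rerun datapatch.
sqlplus -s / as sysdba <<'EOF'
exec dbms_scheduler.drop_job('LOAD_OPATCH_INVENTORY_1');
exec dbms_scheduler.drop_job('LOAD_OPATCH_INVENTORY_2');
exec dbms_scheduler.drop_job('LOAD_OPATCH_INVENTORY_3');
delete from opatch_inst_job;
commit;
EOF
cd $ORACLE_HOME/OPatch && ./datapatch -verbose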

Reference

  • 12.1.0.1 datapatch issue: ORA-27477: "SYS"."LOAD_OPATCH_INVENTORY_1" already exists (Doc ID 1934882.1)
  • Oracle Database 12.1: FAQ on Queryable Patch Inventory (Doc ID 1530108.1)
  • Datapatch errors at "SYS.DBMS_QOPATCH" (Doc ID 1599479.1)
  • Queryable Patch Inventory -- SQL Interface to view, compare, validate database patches (Doc ID 1585814.1)

Manually applying CW Patch ( 12.1.0.1.5 )

Overview

  • In this tutorial we will manually apply a CW patch [ 19849140 ] without using opatchauto.
  • For that we closely follow the patch README – chapter 5 [  patches/12105/19849140/README.html ]   ->  Manual Steps for Apply/Rollback Patch

Check for conflicts

[root@gract1 CLUVFY-JAN-2015]#  $GRID_HOME/OPatch/opatchauto apply /media/sf_kits/patches/12105/19849140 -analyze 
 OPatch Automation Tool
Copyright (c) 2015, Oracle Corporation.  All rights reserved.
OPatchauto version : 12.1.0.1.5
OUI version        : 12.1.0.1.0
Running from       : /u01/app/121/grid
opatchauto log file: /u01/app/121/grid/cfgtoollogs/opatchauto/19849140/opatch_gi_2015-01-22_18-25-48_analyze.log
NOTE: opatchauto is running in ANALYZE mode. There will be no change to your system.
Parameter Validation: Successful
Grid Infrastructure home:
/u01/app/121/grid
RAC home(s):
/u01/app/oracle/product/121/racdb
Configuration Validation: Successful
Patch Location: /media/sf_kits/patches/12105/19849140
Grid Infrastructure Patch(es): 19849140 
RAC Patch(es): 19849140 
Patch Validation: Successful
Analyzing patch(es) on "/u01/app/oracle/product/121/racdb" ...
[WARNING] The local database instance 'dw_2' from '/u01/app/oracle/product/121/racdb' is not running. 
SQL changes, if any,  will not be analyzed. Please refer to the log file for more details.
[WARNING] SQL changes, if any, could not be analyzed on the following database(s): ERP ... Please refer to the log 
file for more details. 
Apply Summary:
opatchauto ran into some warnings during analyze (Please see log file for details):
GI Home: /u01/app/121/grid: 19849140
RAC Home: /u01/app/oracle/product/121/racdb: 19849140
opatchauto completed with warnings.
You have new mail in /var/spool/mail/root

If this is a GI Home, as the root user execute:
Oracle Clusterware active version on the cluster is [12.1.0.1.0]. The cluster upgrade state is [NORMAL]. 
The cluster active 
patch level is [482231859].
..
--> As this is a clusterware patch only, ignore the WARNINGs 
    
  • Note that during the analyze step we already get a first hint that all instances must be running on all servers to apply the patch!

Run pre root script and apply the GRID patch

1) Stop all databases running out of this ORACLE_HOME and unmount the ACFS filesystems

2) Run the pre root script
[root@gract1 ~]# $GRID_HOME/crs/install/rootcrs.pl -prepatch

3) Apply the CRS patch 
[grid@gract1 gract1]$   $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME 
                          -local /media/sf_kits/patches/12105/19849140/19849140
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/121/grid
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/121/grid/oraInst.loc
OPatch version    : 12.1.0.1.5
OUI version       : 12.1.0.1.0
Log file location : /u01/app/121/grid/cfgtoollogs/opatch/opatch2015-01-23_12-25-48PM_1.log

Applying interim patch '19849140' to OH '/u01/app/121/grid'
Verifying environment and performing prerequisite checks...
Interim patch 19849140 is a superset of the patch(es) [  17077442 ] in the Oracle Home
OPatch will roll back the subset patches and apply the given patch.
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Y
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/121/grid')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Rolling back interim patch '17077442' from OH '/u01/app/121/grid'
Patching component oracle.crs, 12.1.0.1.0...
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
RollbackSession removing interim patch '17077442' from inventory
OPatch back to application of the patch '19849140' after auto-rollback.
Patching component oracle.crs, 12.1.0.1.0...
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
Verifying the update...
Patch 19849140 successfully applied
Log file location: /u01/app/121/grid/cfgtoollogs/opatch/opatch2015-01-23_12-25-48PM_1.log
OPatch succeeded.

Verify OUI inventory
[grid@gract2 ~]$ $GRID_HOME//OPatch/opatch lsinventory
--------------------------------------------------------------------------------
Installed Top-level Products (1): 
Oracle Grid Infrastructure 12c                                       12.1.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (3) :
Patch  19849140     : applied on Fri Jan 23 15:52:12 CET 2015
Unique Patch ID:  18183131
Patch description:  "Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Component)"
   Created on 23 Oct 2014, 08:32:20 hrs PST8PDT
   Bugs fixed:
     16505840, 16505255, 16505717, 16505617, 16399322, 16390989, 17486244
     16168869, 16444109, 16505361, 13866165, 16505763, 16208257, 16904822
     17299876, 16246222, 16505540, 16505214, 15936039, 16580269, 16838292
     16505449, 16801843, 16309853, 16505395, 17507349, 17475155, 16493242
     17039197, 16196609, 18045611, 17463260, 17263488, 16505667, 15970176
     16488665, 16670327, 17551223
...
Patch level status of Cluster nodes :
 Patching Level              Nodes
 --------------              -----
 3174741718                  gract2,gract1
  482231859                   gract3
--> Here nodes gract1 and gract2 are already patched while gract3 still needs to be patched!
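A possible way to query the patch level per node from the command line (a sketch; crsctl query crs softwarepatch / activeversion -f are assumed to be available in this 12.1 release - the activeversion output matches the one shown after the analyze step above):

#!/bin/bash
# Sketch: compare the software patch level of the individual cluster nodes.
for node in gract1 gract2 gract3; do
    $GRID_HOME/bin/crsctl query crs softwarepatch $node
done
$GRID_HOME/bin/crsctl query crs activeversion -f    # cluster-wide active patch level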

 

Apply the DB patch

[oracle@gract2 ~]$  $ORACLE_HOME/OPatch/opatch apply -oh $ORACLE_HOME 
                     -local /media/sf_kits/patches/12105/19849140/19849140
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/121/racdb
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/121/racdb/oraInst.loc
OPatch version    : 12.1.0.1.5
OUI version       : 12.1.0.1.0
Log file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_16-30-11PM_1.log

Applying interim patch '19849140' to OH '/u01/app/oracle/product/121/racdb'
Verifying environment and performing prerequisite checks...
Patch 19849140: Optional component(s) missing : [ oracle.crs, 12.1.0.1.0 ] 
Interim patch 19849140 is a superset of the patch(es) [  17077442 ] in the Oracle Home
OPatch will roll back the subset patches and apply the given patch.
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Y
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/121/racdb')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Rolling back interim patch '17077442' from OH '/u01/app/oracle/product/121/racdb'
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
RollbackSession removing interim patch '17077442' from inventory
OPatch back to application of the patch '19849140' after auto-rollback.
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
Verifying the update...
Patch 19849140 successfully applied
Log file location: /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_16-30-11PM_1.log
OPatch succeeded.

Run the post script for GRID

As root user execute:
# $GRID_HOME/rdbms/install/rootadd_rdbms.sh
# $GRID_HOME/crs/install/rootcrs.pl -postpatch
Using configuration parameter file: /u01/app/121/grid/crs/install/crsconfig_params
..

Verify the  RAC Node patch level
[oracle@gract3 ~]$   $ORACLE_HOME/OPatch/opatch lsinventory
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/121/racdb
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/121/racdb/oraInst.loc
OPatch version    : 12.1.0.1.5
OUI version       : 12.1.0.1.0
Log file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_17-59-49PM_1.log

Lsinventory Output file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/lsinv/lsinventory2015-01-23_17-59-49PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 
Oracle Database 12c                                                  12.1.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (2) :
Patch  19849140     : applied on Fri Jan 23 17:41:28 CET 2015
Unique Patch ID:  18183131
Patch description:  "Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Component)"
   Created on 23 Oct 2014, 08:32:20 hrs PST8PDT
   Bugs fixed:
     16505840, 16505255, 16505717, 16505617, 16399322, 16390989, 17486244
     16168869, 16444109, 16505361, 13866165, 16505763, 16208257, 16904822
     17299876, 16246222, 16505540, 16505214, 15936039, 16580269, 16838292
     16505449, 16801843, 16309853, 16505395, 17507349, 17475155, 16493242
     17039197, 16196609, 18045611, 17463260, 17263488, 16505667, 15970176
     16488665, 16670327, 17551223
Using configuration parameter file: /u01/app/121/grid/crs/install/crsconfig_params

....
Rac system comprising of multiple nodes
  Local node = gract3
  Remote node = gract1
  Remote node = gract2


Restart the CRS / database and log in to the local instance 
[root@gract2 Desktop]# su - oracle
-> Active ORACLE_SID:   ERP_1
[oracle@gract2 ~]$ 
[oracle@gract2 ~]$ sqlplus / as sysdba
SQL>  select host_name, instance_name from v$instance;
HOST_NAME               INSTANCE_NAME
------------------------------ ----------------
gract2.example.com           ERP_1

Now repeat all of the above steps for each RAC node!
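As a reminder, here is the per-node sequence from this chapter condensed into a checklist-style sketch (user switches are marked as comments; paths are the ones used in this tutorial and the patch README takes precedence):

# as root   : prepare the GI home
#   $GRID_HOME/crs/install/rootcrs.pl -prepatch
# as grid   : patch the GI home
#   $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME -local /media/sf_kits/patches/12105/19849140/19849140
# as oracle : patch the RDBMS home
#   $ORACLE_HOME/OPatch/opatch apply -oh $ORACLE_HOME -local /media/sf_kits/patches/12105/19849140/19849140
# as root   : run the post scripts and restart the stack
#   $GRID_HOME/rdbms/install/rootadd_rdbms.sh
#   $GRID_HOME/crs/install/rootcrs.pl -postpatch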

Run the datapatch tool for each Oracle Database

  
ORACLE_SID=ERP_1    
[oracle@gract2 OPatch]$ cd $ORACLE_HOME/OPatch
[oracle@gract2 OPatch]$ ./datapatch -verbose

ORACLE_SID=dw_1
[oracle@gract2 OPatch]$ cd $ORACLE_HOME/OPatch
[oracle@gract2 OPatch]$ ./datapatch -verbose
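If more than one database runs out of this home, the same two steps can be wrapped in a small loop (a sketch; the SID list is just the example from above):

#!/bin/bash
# Sketch: run datapatch once per database instance on this node.
for sid in ERP_1 dw_1; do
    export ORACLE_SID=$sid
    cd $ORACLE_HOME/OPatch
    ./datapatch -verbose
done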

For potential problems running datapatch you may read the following articles:

Reference