Install Oracle RAC 12.2 ( 12cR2 ) PM – Policy Managed

Overview

This tutorial is based on :

Feature Overview

  • FLEX ASM
  • Convert a Adminstrator Managed Database to Policy Managed  ( PM)
  • Configure GNS – requirement for Policy Managed Database
  • UDEV to manage ASM disks
  • Using Chrony as our Timeserver [ replacing NTP ]

Sofware Versions

  • VirtualBox 5.1.18
  • OEL 7.3
  • Oracle RAC 12.2.0.1 using UDEV, Policy Managed Database , Flex ASM feature

Virtualbox Images used

  • ns1 – Name Server / DHCP server / Chrony server running on 192.168.5.50 IP addresss
  • ractw21 – Cluster node 1
  • ractw22 – Cluster node 2

For installing the nameserver please read the article mentioned below :

In this tutorial we have replaced NTP by Chrony. For Chrony setup please read

Networking Details

Domain            : example.com            Name Server: ns1.example.com                192.168.5.50
RAC Sub-Domain    : grid122.example.com    Name Server: gns122.grid122.example.com     192.168.5.55
DHCP Server       : ns1.example.com
Chrony NTP  Server: ns1.example.com
DHCP adresses     : 192.168.1.100 ... 192.168.1.254

Configure DNS:
Identity     Home Node     Host Node                         Given Name                        Type        Address        Address Assigned By     Resolved By
GNS VIP        None        Selected by Oracle Clusterware    gns122.example.com                Virtual     192.168.5.59   Net administrator       DNS + GNS
Node 1 Public  Node 1      ractw21                           ractw21.example.com               Public      192.168.5.141  Fixed                   DNS
Node 1 VIP     Node 1      Selected by Oracle Clusterware    ractw21-vip.grid122.example.com   Private     Dynamic        DHCP                    GNS
Node 1 Private Node 1      ractw21int                        ractw21-int.example.com           Private     192.168.2.141  Fixed                   DNS
Node 2 Public  Node 2      ractw22                           ractw22.example.com               Public      192.168.5.142  Fixed                   DNS
Node 2 VIP     Node 2      Selected by Oracle Clusterware    ractw22-vip.grid122.example.com   Private     Dynamic        DHCP                    GNS
Node 2 Private Node 2      ractw22int                        ractw22-int.example.com           Private     192.168.2.142  Fixed                   DNS
SCAN VIP 1     none        Selected by Oracle Clusterware    RACTW2-scan.grid122.example.com   Virtual     Dynamic        DHCP                    GNS
SCAN VIP 2     none        Selected by Oracle Clusterware    RACTW2-scan.grid122.example.com   Virtual     Dynamic        DHCP                    GNS
SCAN VIP 3     none        Selected by Oracle Clusterware    RACTW2-scan.grid122.example.com   Virtual     Dynamic        DHCP                    GNS

Create VirtualBox Image OEL73 as our RAC Provisioning Server

After basic OS Installation install some helpful YUM packages

[root@ractw21 ~]# yum install system-config-users
[root@ractw21 ~]# yum install wireshark
[root@ractw21 ~]# yum install wireshark-gnome

Install X11 applications like xclock
[root@ractw21 ~]# yum install xorg-x11-apps
[root@ractw21 ~]# yum install telnet 

Use the “oracle-database-server-12cR2-preinstall” package to perform all your prerequisite setup

[root@ractw21 ~]# yum install oracle-database-server-12cR2-preinstall -y

Perpare Network

Change the setting of SELinux to permissive by editing the "/etc/selinux/config" file, i
making sure the SELINUX flag is set as follows.

SELINUX=permissive

If you have the Linux firewall enabled, you will need to disable or configure it, as shown 
here or here. The following is an example of disabling the firewall.

Disable AVAHI daemon  [ only if running ]
# /etc/init.d/avahi-daemon stop
To disable it: 
# /sbin/chkconfig  avahi-daemon off
Stop Firewall 
[root@ractw21 ~]# systemctl stop firewalld
[root@ractw21 ~]# systemctl disable firewalld
[root@ractw21 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
Potential Nameserver issues

Make sure the "/etc/resolv.conf" file includes a nameserver entry that points to the correct nameserver. 
Also, if the "domain" and "search" entries are both present, comment out one of them. For the OS installation
 
my "/etc/resolv.conf" looked like this.

#domain localdomain
search localdomain
nameserver 192.168.1.1

The changes to the "resolv.conf" will be overwritten by the network manager, due to the presence of the 
NAT interface. For this reason, this interface should now be disabled on startup. You can enable it manually 
if you need to access the internet from the VMs. Edit the "/etc/sysconfig/network-scripts/ifcfg-enp0s3" (eth0) 
file, making the following change. This will take effect after the next restart.

ONBOOT=no

Modify resolv.conf and reboot the system:
# Generated by Helmut 
search example.com
nameserver 192.168.5.50

Afer a reboot we have following IP setting and Nameserver loopup should working fine
[root@ractw21 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:c6:ab:10  txqueuelen 1000  (Ethernet)
       
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.5.141  netmask 255.255.255.0  broadcast 192.168.5.255
        inet6 fe80::a00:27ff:fe83:429  prefixlen 64  scopeid 0x20<link>
        
enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.2.141  netmask 255.255.255.0  broadcast 192.168.2.255

Note device emp0s3 has no IP V4 address assigned ! 
        
[root@ractw21 ~]# nslookup ractw21
Server:        192.168.5.50
Address:    192.168.5.50#53

Name:    ractw21.example.com
Address: 192.168.5.141

Perpare and verify Swap Space and /u01 pratition

Follow article :   Prepare Swap and /u01 Oracle partition

Verify Space and Memory
[root@ractw21 ~]# df / /u01
Filesystem                  1K-blocks    Used Available Use% Mounted on
/dev/mapper/ol_ractw21-root  15718400 9381048   6337352  60% /
/dev/mapper/ol_ractw21-u01   15718400   39088  15679312   1% /u01
[root@ractw21 ~]# free
              total        used        free      shared  buff/cache   available
Mem:        5700592      595844     4515328       13108      589420     5001968
Swap:       8257532           0     8257532

Note as we can only provide 6 MByte memory the cluvfy Physical Memory Test will fail !

Switching between RAC and Internet ACCESS Networking

RAC Networking
 - enp0s3 down - No IP V4 address assigned 
 - /etc/resolv.conf points to our RAC Nameserver 192.168.5.50
 - RAC Nameserver is working 
 - BUT Internet access is disabled  

[root@ractw21 ~]# ifdown enp0s3
Device 'enp0s3' successfully disconnected.
[root@ractw21 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:c6:ab:10  txqueuelen 1000  (Ethernet)

[root@ractw21 ~]# ping google.de
connect: Network is unreachable

[root@ractw21 ~]# nslookup ractw21
Server:        192.168.5.50
Address:    192.168.5.50#53

Name:    ractw21.example.com
Address: 192.168.5.141

Internet Access Networking
Note we may need to have Internet access for running yum, ...
 - activate enp0s3  
 - /etc/resolv.conf points to our Router Gateway  192.168.1.1
 - RAC Nameserver is NOT working 
 - Internet access is enabled  

[root@ractw21 ~]# ifup enp0s3
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
[root@ractw21 ~]# ifconfig enp0s3
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
Verify 
[root@ractw21 ~]#   ping google.de
PING google.de (172.217.18.3) 56(84) bytes of data.
64 bytes from fra15s28-in-f3.1e100.net (172.217.18.3): icmp_seq=1 ttl=46 time=45.0 ms
64 bytes from fra15s28-in-f3.1e100.net (172.217.18.3): icmp_seq=2 ttl=46 time=60.8 ms

nslookup for our RAC IP fails - this is expected
[root@ractw21 ~]# nslookup ractw21
Server:        192.168.1.1
Address:    192.168.1.1#53
** server can't find ractw21: NXDOMAIN

Note if ping google.de does not work you may need to add following entrry to 
/etc/sysconfig/network
    GATEWAYDEV=enp0s3  

After reboot ping and route should work fine
[root@ractw21 ~]# ip route
default via 10.0.2.2 dev enp0s3  proto static  metric 100 
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15  metric 100 
192.168.2.0/24 dev enp0s9  proto kernel  scope link  src 192.168.2.141  metric 100 
192.168.5.0/24 dev enp0s8  proto kernel  scope link  src 192.168.5.141  metric 100 
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 

[root@ractw21 ~]# traceroute google.de
traceroute to google.de (172.217.22.35), 30 hops max, 60 byte packets
 1  gateway (10.0.2.2)  0.164 ms  0.195 ms  0.074 ms
 2  gateway (10.0.2.2)  1.156 ms  1.917 ms  1.832 ms

Create and Verify User accounts

[root@ractw21 ~]# /usr/sbin/groupadd -g 1001 oinstall
[root@ractw21 ~]# /usr/sbin/groupadd -g 1002 dba
[root@ractw21 ~]# /usr/sbin/groupadd -g 1004 asmadmin
[root@ractw21 ~]# /usr/sbin/groupadd -g 1006 asmdba
[root@ractw21 ~]# /usr/sbin/groupadd -g 1007 asmoper

Create the users that will own the Oracle software using the commands:
[root@ractw21 ~]# /usr/sbin/useradd -u 1001 -g oinstall -G asmadmin,asmdba,asmoper grid
[root@ractw21 ~]#  /usr/sbin/useradd -u 1002 -g oinstall -G dba,asmdba oracle

[root@ractw21 ~]# usermod -G vboxsf -a grid
[root@ractw21 ~]# usermod -G vboxsf -a oracle
[root@ractw21 ~]# passwd oracle
[root@ractw21 ~]# passwd grid

[root@ractw21 ~]# su - grid
[grid@ractw21 ~]$ id
uid=1001(grid) gid=1001(oinstall) groups=1001(oinstall),983(vboxsf),1004(asmadmin),1006(asmdba),1007(asmoper) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[root@ractw21 ~]# su - oracle
[oracle@ractw21 ~]$ id
uid=1002(oracle) gid=1001(oinstall) groups=1001(oinstall),983(vboxsf),1002(dba),1006(asmdba) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Creating Installation directories

Oracle GridHome restrictions

    It must not be placed under one of the Oracle base directories, including the Oracle base 
    directory of the Oracle Grid Infrastructure installation owner.

    It must not be placed in the home directory of an installation owner. These requirements 
    are specific to Oracle Grid Infrastructure for a cluster installations.

Create Directories:
 - Have a separate ORACLE_BASE for both GRID and RDBMS install !

Create the Oracle Inventory Directory
To create the Oracle Inventory directory, enter the following commands as the root user:
  [root@ractw21 ~]#  mkdir -p /u01/app/oraInventory
  [root@ractw21 ~]# chown -R grid:oinstall /u01/app/oraInventory

Creating the Oracle Grid Infrastructure Home Directory
To create the Grid Infrastructure home directory, enter the following commands as the root user:

Creating the Oracle Base Directory for Grid User
  [root@ractw21 ~]# mkdir -p /u01/app/grid
  [root@ractw21 ~]# chown -R grid:oinstall /u01/app/grid
  [root@ractw21 ~]# chmod -R 775 /u01/app/grid

Creating  GridHome 
  [root@ractw21 ~]#  mkdir -p /u01/app/122/grid
  [root@ractw21 ~]# chown -R grid:oinstall /u01/app/122/grid
  [root@ractw21 ~]# chmod -R 775 /u01/app/122/grid

Creating the Oracle Base Directory for Database Software 
To create the Oracle Base directory, enter the following commands as the root user:
  [root@ractw21 ~]# mkdir -p /u01/app/oracle
  [root@ractw21 ~]# chown -R oracle:oinstall /u01/app/oracle
  [root@ractw21 ~]# chmod -R 775 /u01/app/oracle

Create file /etc/oraInst.loc wiht following content

<inventory_loc=/u01/app/oraInventory
inst_group=oinstall

Modify .bashrc scripts for users oracle and grid

Add following lines to grid user .bashrc script
export ORACLE_BASE=/u01/app/grid
export ORACLE_SID=+ASM1
export GRID_HOME=/u01/app/122/grid
export ORACLE_HOME=$GRID_HOME
export PATH=$ORACLE_HOME/bin:.:$PATH
export HOST=`/bin/hostname`
alias h=history
alias log='cd $ORACLE_BASE/diag/crs/hract21/crs/trace'
alias trc='cd $ORACLE_BASE/diag/crs/hract21/crs/trace'
#unalias ls 
alias sys='sqlplus / as sysdba'
alias sql='sqlplus scott/tiger'

Add following lines to grid user .bashrc script

export ORACLE_BASE=/u01/app/oracle
export ORACLE_SID=ractw21
export ORACLE_HOME=/u01/app/oracle/product/122/ractw2
export PATH=$ORACLE_HOME/bin:.:$PATH
export  LD_LIBRARY_PATH=$ORACLE_HOME/lib:.
export HOST=`/bin/hostname`
alias h=history
#unalias ls 
alias sys='sqlplus / as sysdba'
alias sql='sqlplus scott/tiger'

Verify our  newly created VM with cluvfy

 Prepare 12.2 Clufy Usage and Cluvfy log File Location


Create a shared Folder and mount the Installation media
[root@ractw21 ~]# mkdir /kits
[root@ractw21 ~]# mount -t vboxsf kits /kits
[root@ractw21 ~]# su - grid

Extract the GRID Zip file on your VBox Host Computer ot get access to Cluvfy Tool  ./runcluvfy.sh

The runcluvfy.sh script contains temporary variable definitions which enable it to 
run before you install Oracle Grid Infrastructure or Oracle Database. 
After you install Oracle Grid Infrastructure, use the cluvfy command to check 
prerequisites and perform other system readiness checks.

Add the proper protections for grid and oracle user  to get access to the GRID media
[root@ractw21 ~]# usermod -G vboxsf -a grid
[root@ractw21 ~]# usermod -G vboxsf -a oracle


Run Cluvfy from 12.2 Installation Media and print the cluvfy version number 

Clufy logs [ very usefull ] are found under $ORACLE_BASE of our grid user 
[grid@ractw21 ~]$ ls -ld $ORACLE_BASE
drwxrwxr-x. 2 grid oinstall 6 Apr 18 14:47 /u01/app/grid

[grid@ractw21 linuxx64_12201_grid_home]$ cd  /media/sf_kits/Oracle/122/linuxx64_12201_grid_home
[grid@ractw21 linuxx64_12201_grid_home]$  ./runcluvfy.sh -version
12.2.0.1.0 Build 010417x8664

Run cluvfy a first time  with -fixup options

Run cluvfy a first time  with -fixup options before we are cloning the system 
[grid@ractw21 linuxx64_12201_grid_home]$  cd /media/sf_kits/Oracle/122/linuxx64_12201_grid_home

[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh stage -pre crsinst -fixup -n  ractw21
..
Got a lot of Errors on the first run !

Execute fixup script "/tmp/CVU_12.2.0.1.0_grid/runfixup.sh" as root user on nodes "ractw21" 
to perform the fix up operations manually

Rerunning clufy ends up with ONLY a following Error / Warning 
[grid@ractw21 linuxx64_12201_grid_home]$  ./runcluvfy.sh stage -pre crsinst -fixup -n  ractw21

..
Verifying Physical Memory ...FAILED
ractw21: PRVF-7530 : Sufficient physical memory is not available on node
         "ractw21" [Required physical memory = 8GB (8388608.0KB)]

-
CVU operation performed:      stage -pre crsinst
Date:                         Apr 8, 2017 7:40:07 AM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

PRVF-7530  : I could not fix this as I have only 16 GByte memory - but I've spend at least 6 GByte for the RAC VMs


Cluvfy Logs can be found below:  ORACLE_BASE=/u01/app/grid
      /u01/app/grid/crsdata/ractw21/cvu

Testing the DHCP connectivty with cluvfy

Testing  DHCP connectivty with cluvfy: 
[grid@ractw21 linuxx64_12201_grid_home]$ runcluvfy.sh comp dhcp -clustername  ractw2 -method root
Enter "ROOT" password:

Verifying Task DHCP configuration check ...
  Verifying IP address availability ...PASSED
  Verifying DHCP response time ...PASSED
Verifying Task DHCP configuration check ...PASSED

Verification of DHCP Check was successful. 

CVU operation performed:      DHCP Check
Date:                         Apr 11, 2017 1:25:39 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

Our NameServer should log the following in /var/log/messages
Apr 11 13:26:07 ns1 dhcpd: DHCPOFFER on 192.168.5.198 to 00:00:00:00:00:00 via eth1
Apr 11 13:26:07 ns1 dhcpd: DHCPRELEASE of 192.168.5.198 from 00:00:00:00:00:00 via eth1 (found)
Apr 11 13:26:12 ns1 dhcpd: DHCPRELEASE of 192.168.5.198 from 00:00:00:00:00:00 via eth1 (found)
Apr 11 13:26:13 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth1
Apr 11 13:26:13 ns1 dhcpd: DHCPOFFER on 192.168.5.252 to 00:00:00:00:00:00 via eth1

Verify DHCP Connectivity at OS Level

Verify DHCP Connectivity at OS Level
[root@ractw21 OEL73]# dhclient enp0s8

Check messages file on our RAC system
[root@ractw21 ~]#   tail -f /var/log/messages
Apr 11 13:11:21 ractw21 dhclient[10998]: DHCPREQUEST on enp0s8 to 255.255.255.255 port 67 (xid=0x43037720)
Apr 11 13:11:21 ractw21 dhclient[10998]: DHCPACK from 192.168.5.50 (xid=0x43037720)
Apr 11 13:11:21 ractw21 NET[11077]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Apr 11 13:11:21 ractw21 dhclient[10998]: bound to 192.168.5.218 -- renewal in 8225 second

--> Our NameServer sucessfully answered with an DHCP address !

Cleanup OS test by Killing the dhclient process 
[root@ractw21 OEL73]#  ps -efl | grep dhclient
1 S root     11106     1  0  80   0 - 28223 poll_s 13:11 ?        00:00:00 dhclient enp0s8
[root@ractw21 OEL73]# kill -9 11106

Verify GNS Setup

Name Server Entry for GNS
$ORIGIN grid122.example.com.
@       IN          NS        gns122.grid122.example.com. ; NS  grid.example.com
        IN          NS        ns1.example.com.      ; NS example.com
gns122  IN          A         192.168.5.59 ; glue record

Verify  whether UDP port 53 is in use on our RAC VM
[root@ractw21 ~]# lsof -P  -i:53 
dnsmasq 2507 nobody    5u  IPv4  22448      0t0  UDP ractw21.example.com:53 
dnsmasq 2507 nobody    6u  IPv4  22449      0t0  TCP ractw21.example.com:53 (LISTEN)

Run cluvfy - should fail  !
[grid@ractw21 linuxx64_12201_grid_home]$  runcluvfy.sh  comp gns -precrsinst -domain grid122.example.com  -vip 192.168.5.59 -verbose 
Verifying GNS Integrity ...
  Verifying subdomain is a valid name ...PASSED
  Verifying GNS VIP is a valid address ...PASSED
  Verifying Port 53 available for component 'GNS' ...FAILED (PRVG-0330)
Verifying GNS Integrity ...FAILED (PRVG-0330)
Verification of GNS integrity was unsuccessful on all the specified nodes. 
Failures were encountered during execution of CVU verification request "GNS integrity".
Verifying GNS Integrity ...FAILED
  Verifying Port 53 available for component 'GNS' ...FAILED
  ractw21: PRVG-0330 : "UDP" port number "53" required for component "GNS" is
           already in use on nodes "ractw21"

As expected cluvfy fails wiht  PRVG-0330 !
 
You may read article to identify the service which occupies port #53:
  What process/service occupies a certain port like port 53 ? 

Having done this you may now disable the related service
[root@ractw21 ~]# systemctl stop libvirtd.service
[root@ractw21 ~]# systemctl disable libvirtd.service
You may need to kill the remaining /sbin/dnsmasq  processess
[root@ractw21 ~]#  ps -elf | grep dnsmasq
5 S nobody   17124     1  0  80   0 -  3888 poll_s 07:59 ?        00:00:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
1 S root     17126 17124  0  80   0 -  3881 pipe_w 07:59 ?        00:00:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
0 R root     18099 11262  0  80   0 - 28162 -      08:20 pts/4    00:00:00 grep --color=auto dnsmasq
[root@ractw21 ~]# kill -9 17124 17126

Note soemtimes dnsmasq service blocks this port. In that case run
[root@ractw21 ~]# systemctl stop dnsmasq.service
[root@ractw21 ~]# systemctl disable dnsmasq.service

Now cluvfy GNS verification should run fine 
[grid@ractw21 linuxx64_12201_grid_home]$ runcluvfy.sh  comp gns -precrsinst -domain grid122.example.com  -vip 192.168.5.59 -verbose
Verifying GNS Integrity ...
  Verifying subdomain is a valid name ...PASSED
  Verifying GNS VIP is a valid address ...PASSED
  Verifying Port 53 available for component 'GNS' ...PASSED
Verifying GNS Integrity ...PASSED

Verification of GNS integrity was successful. 

CVU operation performed:      GNS integrity
Date:                         Apr 11, 2017 12:38:29 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

Testing subdomain delegation with cluvfy

Start the DNS server 
[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh comp dns -server -domain grid122.example.com -vipaddress 192.168.5.59/255.255.255.0/enp0s8 -verbose -method root
Enter "ROOT" password:

Verifying Task DNS configuration check ...
Waiting for DNS client requests...

--> Server blocks here

Note : If the /runcluvfy.sh comp dns -server doesn't block [ means cluvfy returns to command prompt ] you may read the following article :     
Common cluvfy errors and warnings including first debugging steps  
  Section:: Running cluvfy comp dns -server fails silent – Cluvfy logs show PRCZ-2090 error

Now Start the  DNS client
[grid@ractw21 linuxx64_12201_grid_home]$ runcluvfy.sh comp  dns -client -domain   grid122.example.com -vip  192.168.5.59  -method root -verbose
Enter "ROOT" password:
Verifying Task DNS configuration check ...PASSED
Verification of DNS Check was successful. 
CVU operation performed:      DNS Check
Date:                         Apr 14, 2017 6:18:49 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

Run DNS client again and terminate DNS server by using -last swich
[grid@ractw21 linuxx64_12201_grid_home]$  runcluvfy.sh comp  dns -client -domain   grid122.example.com -vip  192.168.5.59  -method root -verbose -last
...
Now server responds but terminates after servicing the request 
Verifying Task DNS configuration check ...PASSED
Verification of DNS Check was successful. 
CVU operation performed:      DNS Check
Date:                         Apr 14, 2017 6:17:53 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

Current Status

  • At this time we have created a base system which we will use for cloning
  • Following cluvfy commands runs sucessfully
  • Only the first command fails with PRVF-7530 : Sufficient physical memory is not available on node
[grid@ractw21 linuxx64_12201_grid_home]$ cd  /media/sf_kits/Oracle/122/linuxx64_12201_grid_home
[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh stage -pre crsinst -fixup -n  ractw21
[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh  comp gns -precrsinst -domain grid122.example.com  -vip 192.168.5.59 -verbose
[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh comp dhcp -clustername  ractw2 -method root

[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh comp dns -server -domain grid122.example.com -vipaddress 192.168.5.59/255.255.255.0/enp0s8 -verbose -method root
[grid@ractw21 linuxx64_12201_grid_home]$ ./runcluvfy.sh comp  dns -client -domain   grid122.example.com -vip  192.168.5.59  -method root -verbose -last
  • Its now time to shutdown the VM and  do backup  VM and Vbox file

Clone ractw21 system

You man change in File-> Preference the default machine path first  
M:\VM\RACTW2

Cloning ractw21 :
Now cleanly shutdown your Reference/Clone system 
Virtualbox -> Clone [ Name clone ractw22 ]  ->  add new Network Addresses -> Full Clone 


Boot ractw22 VM a first time - retrieve the new MAC addresses 
[root@ractw22 ~]#  dmesg |grep eth
[    8.153534] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 08:00:27:dd:af:2c
[    8.153539] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[    8.565884] e1000 0000:00:08.0 eth1: (PCI:33MHz:32-bit) 08:00:27:05:1d:93
[    8.565888] e1000 0000:00:08.0 eth1: Intel(R) PRO/1000 Network Connection
[    8.956019] e1000 0000:00:09.0 eth2: (PCI:33MHz:32-bit) 08:00:27:7b:22:80
[    8.956023] e1000 0000:00:09.0 eth2: Intel(R) PRO/1000 Network Connection
[    8.957249] e1000 0000:00:03.0 enp0s3: renamed from eth0
[    8.966330] e1000 0000:00:09.0 enp0s9: renamed from eth2
[    8.983008] e1000 0000:00:08.0 enp0s8: renamed from eth1

Change and verify the NEW hostname 
[root@ractw21 ~]# hostnamectl set-hostname ractw22.example.com  --static
[root@ractw21 ~]# hostnamectl status
   Static hostname: ractw22.example.com
Transient hostname: ractw21.example.com
         Icon name: computer-vm
           Chassis: vm
        Machine ID: c173da783e4c45eab1402e69f783ce10
           Boot ID: 3dad3ed3b1f345d3b7fee60778257cd6
    Virtualization: kvm
  Operating System: Oracle Linux Server 7.3
       CPE OS Name: cpe:/o:oracle:linux:7:3:server
            Kernel: Linux 4.1.12-61.1.33.el7uek.x86_64
      Architecture: x86-64


Go to /etc/sysconfig/network-scripts and change IP addresses 
file "ifcfg-enp0s8" IPADDR=192.168.5.142
     "ifcfg-enp0s9" IPADDR=192.168.2.142

Restart the Network 
[root@ractw22 network-scripts]# service network restart

Verify that our Network devices has the proper setttings
[root@ractw22 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:dd:af:2c  txqueuelen 1000  (Ethernet)
       
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.5.142  netmask 255.255.255.0  broadcast 192.168.5.255

enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.2.142  netmask 255.255.255.0  broadcast 192.168.2.255

Note as mentioned before enp0s3 ( our NAT device ) is disabled and has no IP V4 address

Verify RAC nameserver connectivity
[root@ractw22 ~]# nslookup ractw22
Server:        192.168.5.50
Address:    192.168.5.50#53

Name:    ractw22.example.com
Address: 192.168.5.142
 
Now boot rawtw21 VM and check IP connectivty
[root@ractw22 ~]#  ping ractw21
PING ractw21.example.com (192.168.5.141) 56(84) bytes of data.
64 bytes from ractw21.example.com (192.168.5.141): icmp_seq=1 ttl=64 time=0.663 ms
 64 bytes from ractw21.example.com (192.168.5.141): icmp_seq=2 ttl=64 time=0.309 ms
..
[root@ractw22 ~]# ping ractw21int
PING ractw21int.example.com (192.168.2.141) 56(84) bytes of data.
64 bytes from ractw21int.example.com (192.168.2.141): icmp_seq=1 ttl=64 time=0.210 ms
64 bytes from ractw21int.example.com (192.168.2.141): icmp_seq=2 ttl=64 time=0.264 ms
...
Check disk space and memory 
[root@ractw22 ~]# free
              total        used        free      shared  buff/cache   available
Mem:        5700592      528364     4684328       11256      487900     5083212
Swap:       8257532           0     8257532
[root@ractw22 ~]# df / /u01
Filesystem                  1K-blocks    Used Available Use% Mounted on
/dev/mapper/ol_ractw21-root  15718400 9381788   6336612  60% /
/dev/mapper/ol_ractw21-u01   15718400   46552  15671848   1% /u01


Login into ractw21
[root@ractw21 ~]# ping ractw22
PING ractw22.example.com (192.168.5.142) 56(84) bytes of data.
64 bytes from ractw22.example.com (192.168.5.142): icmp_seq=1 ttl=64 time=0.216 ms
64 bytes from ractw22.example.com (192.168.5.142): icmp_seq=2 ttl=64 time=0.303 ms
...

[root@ractw21 ~]#  ping ractw22int
PING ractw22int.example.com (192.168.2.142) 56(84) bytes of data.
64 bytes from ractw22int.example.com (192.168.2.142): icmp_seq=1 ttl=64 time=0.266 ms
64 bytes from ractw22int.example.com (192.168.2.142): icmp_seq=2 ttl=64 time=0.445 ms

Create and attach ASM Disks

Create ASM disks 

M:\VM>cd RACTW2
M:\VM\RACTW2>VBoxManage createhd --filename M:\VM\RACTW2\asm1_122_20G.vdi --size 20480 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 886d76b0-eeb0-43e4-9ab2-bd7d9e8ad879

M:\VM\RACTW2>VBoxManage createhd --filename M:\VM\RACTW2\asm2_122_20G.vdi --size 20480 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: be823513-0d1c-4b25-af29-fe504e28e910

M:\VM\RACTW2>VBoxManage createhd --filename M:\VM\RACTW2\asm3_122_20G.vdi --size 20480 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 3238860f-a90f-459c-965b-f93773f2ad38

M:\VM\RACTW2>VBoxManage createhd --filename M:\VM\RACTW2\asm4_122_20G.vdi --size 20480 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 185218e4-a7f6-4d29-ae7c-dc7499b05238

M:\VM\RACTW2>VBoxManage modifyhd  asm1_122_20G.vdi  --type shareable
M:\VM\RACTW2>VBoxManage modifyhd  asm2_122_20G.vdi  --type shareable
M:\VM\RACTW2>VBoxManage modifyhd  asm3_122_20G.vdi  --type shareable
M:\VM\RACTW2>VBoxManage modifyhd  asm4_122_20G.vdi  --type shareable

Attach Disks to ractw21
M:\VM\RACTW2>VBoxManage storageattach ractw21 --storagectl "SATA" --port 1 --device 0 --type hdd --medium asm1_122_20G.vdi  --mtype shareable
M:\VM\RACTW2>VBoxManage storageattach ractw21 --storagectl "SATA" --port 2 --device 0 --type hdd --medium asm2_122_20G.vdi  --mtype shareable
M:\VM\RACTW2>VBoxManage storageattach ractw21 --storagectl "SATA" --port 3 --device 0 --type hdd --medium asm3_122_20G.vdi  --mtype shareable
M:\VM\RACTW2>VBoxManage storageattach ractw21 --storagectl "SATA" --port 4 --device 0 --type hdd --medium asm4_122_20G.vdi  --mtype shareable


Attach the ASM disks to ractw22
M:\VM\RACTW2>VBoxManage storageattach ractw22 --storagectl "SATA" --port 1 --device 0 --type hdd --medium asm1_122_20G.vdi  --mtype shareable
M:\VM\RACTW2>VBoxManage storageattach ractw22 --storagectl "SATA" --port 2 --device 0 --type hdd --medium asm2_122_20G.vdi  --mtype shareable
M:\VM\RACTW2>VBoxManage storageattach ractw22 --storagectl "SATA" --port 3 --device 0 --type hdd --medium asm3_122_20G.vdi  --mtype shareable
M:\VM\RACTW2>VBoxManage storageattach ractw22 --storagectl "SATA" --port 4 --device 0 --type hdd --medium asm4_122_20G.vdi  --mtype shareable

Use parted to create a single disk partition
root@ractw21 ~]# parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Error: /dev/sdb: unrecognised disk label
Model: ATA VBOX HARDDISK (scsi)                                           
Disk /dev/sdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

(parted) mklabel msdos
(parted) print                                                            
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start  End  Size  Type  File system  Flags

(parted) mkpart                                                           
Partition type?  primary/extended? primary                                
File system type?  [ext2]?                                                
Start?                                                                  
End? -1                                                                   
(parted) print                                                            
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  21.5GB  21.5GB  primary

Display in disk sectors
(parted) unit s print
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sdb: 41943040s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start  End        Size       Type     File system  Flags
 1      2048s  41940991s  41938944s  primary

Repeat this for /dev/sdc /dev/sdd and /dev/sde

Create following bash script
[root@ractw21 ~]# cat ./check_wwid.sh 
#!/bin/bash
#
#Usage:  As root user run : ./check_wwid.sh 
#
for FILE in `find /dev -name "sd*" | sort`
   do
     WWID=`/lib/udev/scsi_id --whitelisted --replace-whitespace --device=$FILE `
     echo $FILE " WWID:  "  $WWID
   done

Display the disk UUIDs
[root@ractw21 ~]# ./check_wwid.sh 
/dev/sda  WWID:   1ATA_VBOX_HARDDISK_VB9a03ce8e-7e44db17
/dev/sda1  WWID:   1ATA_VBOX_HARDDISK_VB9a03ce8e-7e44db17
/dev/sda2  WWID:   1ATA_VBOX_HARDDISK_VB9a03ce8e-7e44db17
/dev/sdb  WWID:   1ATA_VBOX_HARDDISK_VB4f9101ae-136f6f41
/dev/sdb1  WWID:   1ATA_VBOX_HARDDISK_VB4f9101ae-136f6f41
/dev/sdc  WWID:   1ATA_VBOX_HARDDISK_VBe92a2992-3bbc2c78
/dev/sdc1  WWID:   1ATA_VBOX_HARDDISK_VBe92a2992-3bbc2c78
/dev/sdd  WWID:   1ATA_VBOX_HARDDISK_VBed1a6a99-97750413
/dev/sdd1  WWID:   1ATA_VBOX_HARDDISK_VBed1a6a99-97750413
/dev/sde  WWID:   1ATA_VBOX_HARDDISK_VBc47b21a6-a7b01781
/dev/sde1  WWID:   1ATA_VBOX_HARDDISK_VBc47b21a6-a7b01781



Add the following to the "/etc/scsi_id.config" file to configure SCSI devices as trusted. Create the file if it doesn't already exist.
options=-g


[root@ractw21 ~]# /sbin/partprobe /dev/sdb1
[root@ractw21 ~]# /sbin/partprobe /dev/sdc1
[root@ractw21 ~]# /sbin/partprobe /dev/sdd1
[root@ractw21 ~]# /sbin/partprobe /dev/sde1

[root@ractw21 ~]# /sbin/udevadm control --reload-rules

[root@ractw21 ~]#  ls -l /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
brw-rw----. 1 grid   asmadmin 8, 17 Apr 19 15:53 /dev/sdb1
brw-rw----. 1 grid   asmadmin 8, 33 Apr 19 15:53 /dev/sdc1
brw-rw----. 1 grid   asmadmin 8, 49 Apr 19 15:53 /dev/sdd1
brw-rw----. 1 grid   asmadmin 8, 65 Apr 19 15:53 /dev/sde1

[root@ractw21 ~]#  ls -l /dev/oracleasm
total 0
lrwxrwxrwx. 1 root root 7 Apr 19 15:53 asmdisk1_sdb1 -> ../sdb1
lrwxrwxrwx. 1 root root 7 Apr 19 15:53 asmdisk2_sdc -> ../sdc1
lrwxrwxrwx. 1 root root 7 Apr 19 15:53 asmdisk3_sdd -> ../sdd1
lrwxrwxrwx. 1 root root 7 Apr 19 15:53 asmdisk_sde -> ../sde1


Copy scsi_id.config 99-oracle-asmdevices.rules to ractw22 
[root@ractw21 ~]# scp  /etc/scsi_id.config ractw22:/etc
scsi_id.config                                                                                    100%   11     0.0KB/s   00:00    
[root@ractw21 ~]# scp  /etc/udev/rules.d/99-oracle-asmdevices.rules  ractw22:/etc/udev/rules.d/
99-oracle-asmdevices.rules                                                                        100%  887     0.9KB/s   00:00 

Reboot ractw22 and verify ASM disks 
[root@ractw22 ~]#  ls -l /dev/oracleasm
total 0
lrwxrwxrwx. 1 root root 7 Apr 19 16:06 asmdisk1_sdb1 -> ../sdb1
lrwxrwxrwx. 1 root root 7 Apr 19 16:06 asmdisk2_sdc1 -> ../sdc1
lrwxrwxrwx. 1 root root 7 Apr 19 16:06 asmdisk3_sdd1 -> ../sdd1
lrwxrwxrwx. 1 root root 7 Apr 19 16:06 asmdisk_sde1 -> ../sde1

[root@ractw22 ~]# ls -l /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
brw-rw----. 1 grid asmadmin 8, 17 Apr 19 16:06 /dev/sdb1
brw-rw----. 1 grid asmadmin 8, 33 Apr 19 16:06 /dev/sdc1
brw-rw----. 1 grid asmadmin 8, 49 Apr 19 16:06 /dev/sdd1
brw-rw----. 1 grid asmadmin 8, 65 Apr 19 16:06 /dev/sde1

Setup ssh connectivity for grid user with cluvfy

Cleanup .shh directories which may have inconsistent data due to cloning
On Note ractw21
[grid@ractw21 linuxx64_12201_grid_home]$  rm -rf /home/grid/.ssh
[grid@ractw21 linuxx64_12201_grid_home]$  ssh grid@ractw22

On Note ractw22
[grid@ractw22 linuxx64_12201_grid_home]$  rm -rf /home/grid/.ssh
[grid@ractw22 linuxx64_12201_grid_home]$  ssh grid@ractw21

Now run cluvfy with -fixup option
[grid@ractw21 linuxx64_12201_grid_home]$ runcluvfy.sh comp admprv -n "ractw21,ractw22" -o user_equiv -verbose -fixup
Verifying User Equivalence ...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  ractw22                               failed                  
Verifying User Equivalence ...FAILED (PRVG-2019, PRKN-1038)

Verification of administrative privileges was unsuccessful. 
Checks did not pass for the following nodes:
    ractw22
Failures were encountered during execution of CVU verification request "administrative privileges".

Verifying User Equivalence ...FAILED
ractw22: PRVG-2019 : Check for equivalence of user "grid" from node "ractw21"
         to node "ractw22" failed
         PRKN-1038 : The command "/usr/bin/ssh -o FallBackToRsh=no  -o
         PasswordAuthentication=no  -o StrictHostKeyChecking=yes  -o
         NumberOfPasswordPrompts=0  ractw22 -n /bin/true" run on node "ractw21"
         gave an unexpected output: "Permission denied
         (publickey,gssapi-keyex,gssapi-with-mic,password)."

CVU operation performed:      administrative privileges
Date:                         Apr 19, 2017 5:05:53 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid
Setting up SSH user equivalence for user "grid" between nodes "ractw21,ractw22"
Enter "grid" password:
SSH user equivalence was successfully set up for user "grid" on nodes "ractw21,ractw22"

Now rerun the command :
[grid@ractw21 linuxx64_12201_grid_home]$  runcluvfy.sh comp admprv -n "ractw21,ractw22" -o user_equiv -verbose 
Verifying User Equivalence ...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  ractw22                               passed                  
  Verifying Checking user equivalence for user "grid" on all cluster nodes ...PASSED
  From node     To node                   Status                  
  ------------  ------------------------  ------------------------
  ractw22       ractw21                   SUCCESSFUL              
Verifying User Equivalence ...PASSED

Verification of administrative privileges was successful. 
CVU operation performed:      administrative privileges
Date:                         Apr 19, 2017 5:18:21 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

Verify at OS level:
[grid@ractw21 linuxx64_12201_grid_home]$  ssh grid@ractw22
Last login: Wed Apr 19 17:14:22 2017 from ractw21.example.com

[grid@ractw22 ~]$ ssh grid@ractw21
Last login: Wed Apr 19 17:12:16 2017 from ractw22.example.com




Run cluvfy  a first time against both nodes

  • with Network  Connectivity checks
  • with ASM Disk checks
 
[grid@ractw21 ~]$  cd  /media/sf_kits/Oracle/122/linuxx64_12201_grid_home
[grid@ractw21 linuxx64_12201_grid_home]$ runcluvfy.sh stage -pre crsinst -asm -presence local -asmgrp asmadmin  \
>     -asmdev /dev/oracleasm/asmdisk1_sdb1,/dev/oracleasm/asmdisk2_sdc1,/dev/oracleasm/asmdisk3_sdd1,/dev/oracleasm/asmdisk4_sde1     \
>     -networks enp0s8:192.168.5.0:PUBLIC/enp0s9:192.168.2.0:cluster_interconnect  \
>     -n "ractw21,ractw22" 

Verifying Physical Memory ...FAILED (PRVF-7530)
Verifying Available Physical Memory ...PASSED
Verifying Swap Size ...PASSED
Verifying Free Space: ractw22:/usr,ractw22:/var,ractw22:/etc,ractw22:/sbin,ractw22:/tmp ...PASSED
Verifying Free Space: ractw21:/usr,ractw21:/var,ractw21:/etc,ractw21:/sbin,ractw21:/tmp ...PASSED
..
Verifying Node Connectivity ...
  Verifying Hosts File ...PASSED
  Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED
  Verifying subnet mask consistency for subnet "192.168.2.0" ...PASSED
  Verifying subnet mask consistency for subnet "192.168.5.0" ...PASSED
Verifying Node Connectivity ...PASSED
Verifying Multicast check ...PASSED
Verifying Device Checks for ASM ...
  Verifying ASM device sharedness check ...
    Verifying Package: cvuqdisk-1.0.10-1 ...PASSED
    Verifying Shared Storage Accessibility:/dev/oracleasm/asmdisk1_sdb1,/dev/oracleasm/asmdisk3_sdd1,/dev/oracleasm/asmdisk2_sdc1,/dev/oracleasm/asmdisk4_sde1 ...PASSED
  Verifying ASM device sharedness check ...PASSED
  Verifying Access Control List check ...PASSED
Verifying Device Checks for ASM ...PASSED
...

Verifying ASM Filter Driver configuration ...PASSED
Pre-check for cluster services setup was unsuccessful on all the nodes. 
Failures were encountered during execution of CVU verification request "stage -pre crsinst".
Verifying Physical Memory ...FAILED
ractw22: PRVF-7530 : Sufficient physical memory is not available on node
         "ractw22" [Required physical memory = 8GB (8388608.0KB)]

ractw21: PRVF-7530 : Sufficient physical memory is not available on node
         "ractw21" [Required physical memory = 8GB (8388608.0KB)]

CVU operation performed:      stage -pre crsinst
Date:                         Apr 19, 2017 6:12:00 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

Again the above failures were expected due to memory constraints 

Install GRID software

Unzip and install GRID software

$ cd $GRID_HOME
$ unzip -q $SOFTWARE_LOCATION/linuxx64_12201_grid_home.zip

$ ./gridSetup.sh

-> Configure a standard cluster
-> Advanced Installation
   Cluster name : ractw2
   Scan name    : ractw2-scan
   Scan port    : 1521
   -> Create New GNS
      GNS VIP address: 192.168.5.59
      GNS Sub domain : grid122.example.com
  Public Hostname           Virtual Hostname 
  ractw21.example.com        AUTO
  ractw22.example.com        AUTO

-> Test and Setup SSH connectivity
-> Setup network Interfaces
   enp0s3: don't use
   enp0s8: PUBLIC                              192.168.5.X
   enp0s9: Private Cluster_Interconnect,ASM    192.168.2.X
 
-> Configure GRID Infrastruce: YES
-> Use standard ASM for storage
-> ASM setup
   Diskgroup         : DATA
   Disk discover PATH: /dev/asm*
--> Don't use IPMI

Run root scritps as requested by the installation script !
[root@ractw22 etc]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.

[root@ractw21 etc]# /u01/app/122/grid/root.sh

[root@ractw22 etc]# /u01/app/122/grid/root.sh

Verify GRID Installation status

Verify Grid Installation Status using crsctl stat res 

[grid@ractw21 ~]$  crsi

*****  Local Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE       ONLINE       ractw21         STABLE  
ora.cluster_interconnect.haip  1   ONLINE       ONLINE       ractw21         STABLE  
ora.crf                        1   ONLINE       ONLINE       ractw21         STABLE  
ora.crsd                       1   ONLINE       ONLINE       ractw21         STABLE  
ora.cssd                       1   ONLINE       ONLINE       ractw21         STABLE  
ora.cssdmonitor                1   ONLINE       ONLINE       ractw21         STABLE  
ora.ctssd                      1   ONLINE       ONLINE       ractw21         OBSERVER,STABLE  
ora.diskmon                    1   OFFLINE      OFFLINE      -               STABLE  
ora.driver.afd                 1   ONLINE       ONLINE       ractw21         STABLE  
ora.drivers.acfs               1   ONLINE       ONLINE       ractw21         STABLE  
ora.evmd                       1   ONLINE       ONLINE       ractw21         STABLE  
ora.gipcd                      1   ONLINE       ONLINE       ractw21         STABLE  
ora.gpnpd                      1   ONLINE       ONLINE       ractw21         STABLE  
ora.mdnsd                      1   ONLINE       ONLINE       ractw21         STABLE  
ora.storage                    1   ONLINE       ONLINE       ractw21         STABLE  

[grid@ractw21 ~]$ crs
*****  Local Resources: *****
Rescource NAME                 TARGET     STATE           SERVER       STATE_DETAILS                       
-------------------------      ---------- ----------      ------------ ------------------                  
ora.ASMNET1LSNR_ASM.lsnr       ONLINE     ONLINE          ractw21      STABLE   
ora.ASMNET1LSNR_ASM.lsnr       ONLINE     ONLINE          ractw22      STABLE   
ora.DATA.dg                    ONLINE     ONLINE          ractw21      STABLE   
ora.DATA.dg                    ONLINE     ONLINE          ractw22      STABLE   
ora.LISTENER.lsnr              ONLINE     ONLINE          ractw21      STABLE   
ora.LISTENER.lsnr              ONLINE     ONLINE          ractw22      STABLE   
ora.net1.network               ONLINE     ONLINE          ractw21      STABLE   
ora.net1.network               ONLINE     ONLINE          ractw22      STABLE   
ora.ons                        ONLINE     ONLINE          ractw21      STABLE   
ora.ons                        ONLINE     ONLINE          ractw22      STABLE   
ora.proxy_advm                 OFFLINE    OFFLINE         ractw21      STABLE   
ora.proxy_advm                 OFFLINE    OFFLINE         ractw22      STABLE   
*****  Cluster Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.LISTENER_SCAN1.lsnr        1   ONLINE       ONLINE       ractw22         STABLE  
ora.LISTENER_SCAN2.lsnr        1   ONLINE       ONLINE       ractw21         STABLE  
ora.LISTENER_SCAN3.lsnr        1   ONLINE       ONLINE       ractw21         STABLE  
ora.MGMTLSNR                   1   ONLINE       ONLINE       ractw21         169.254.51.105 192.1 68.2.141,STABLE
ora.asm                        1   ONLINE       ONLINE       ractw21         Started,STABLE  
ora.asm                        2   ONLINE       ONLINE       ractw22         Started,STABLE  
ora.asm                        3   OFFLINE      OFFLINE      -               STABLE  
ora.cvu                        1   ONLINE       ONLINE       ractw21         STABLE  
ora.gns                        1   ONLINE       ONLINE       ractw21         STABLE  
ora.gns.vip                    1   ONLINE       ONLINE       ractw21         STABLE  
ora.mgmtdb                     1   ONLINE       ONLINE       ractw21         Open,STABLE  
ora.qosmserver                 1   ONLINE       ONLINE       ractw21         STABLE  
ora.ractw21.vip                1   ONLINE       ONLINE       ractw21         STABLE  
ora.ractw22.vip                1   ONLINE       ONLINE       ractw22         STABLE  
ora.scan1.vip                  1   ONLINE       ONLINE       ractw22         STABLE  
ora.scan2.vip                  1   ONLINE       ONLINE       ractw21         STABLE  
ora.scan3.vip                  1   ONLINE       ONLINE       ractw21         STABLE

Verify Grid Installation Status by using cluvfy

[grid@ractw21 ~]$ cluvfy stage -post crsinst -n ractw21,ractw22

Verifying Node Connectivity ...
  Verifying Hosts File ...PASSED
  Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED
  Verifying subnet mask consistency for subnet "192.168.2.0" ...PASSED
  Verifying subnet mask consistency for subnet "192.168.5.0" ...PASSED
Verifying Node Connectivity ...PASSED
Verifying Multicast check ...PASSED
Verifying ASM filter driver configuration consistency ...PASSED
Verifying Time zone consistency ...PASSED
Verifying Cluster Manager Integrity ...PASSED
Verifying User Mask ...PASSED
Verifying Cluster Integrity ...PASSED
Verifying OCR Integrity ...PASSED
Verifying CRS Integrity ...
  Verifying Clusterware Version Consistency ...PASSED
Verifying CRS Integrity ...PASSED
Verifying Node Application Existence ...PASSED
Verifying Single Client Access Name (SCAN) ...
  Verifying DNS/NIS name service 'ractw2-scan.ractw2.grid122.example.com' ...
    Verifying Name Service Switch Configuration File Integrity ...PASSED
  Verifying DNS/NIS name service 'ractw2-scan.ractw2.grid122.example.com' ...PASSED
Verifying Single Client Access Name (SCAN) ...PASSED
Verifying OLR Integrity ...PASSED
Verifying GNS Integrity ...
  Verifying subdomain is a valid name ...PASSED
  Verifying GNS VIP belongs to the public network ...PASSED
  Verifying GNS VIP is a valid address ...PASSED
  Verifying name resolution for GNS sub domain qualified names ...PASSED
  Verifying GNS resource ...PASSED
  Verifying GNS VIP resource ...PASSED
Verifying GNS Integrity ...PASSED
Verifying Voting Disk ...PASSED
Verifying ASM Integrity ...
  Verifying Node Connectivity ...
    Verifying Hosts File ...PASSED
    Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED
    Verifying subnet mask consistency for subnet "192.168.2.0" ...PASSED
    Verifying subnet mask consistency for subnet "192.168.5.0" ...PASSED
  Verifying Node Connectivity ...PASSED
Verifying ASM Integrity ...PASSED
Verifying Device Checks for ASM ...PASSED
Verifying ASM disk group free space ...PASSED
Verifying I/O scheduler ...
  Verifying Package: cvuqdisk-1.0.10-1 ...PASSED
Verifying I/O scheduler ...PASSED
Verifying User Not In Group "root": grid ...PASSED
Verifying Clock Synchronization ...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP

  Verifying Network Time Protocol (NTP) ...
    Verifying '/etc/chrony.conf' ...PASSED
    Verifying '/var/run/chronyd.pid' ...PASSED
    Verifying Daemon 'chronyd' ...PASSED
    Verifying NTP daemon or service using UDP port 123 ...PASSED
    Verifying chrony daemon is synchronized with at least one external time source ...PASSED
  Verifying Network Time Protocol (NTP) ...PASSED
Verifying Clock Synchronization ...PASSED
Verifying Network configuration consistency checks ...PASSED
Verifying File system mount options for path GI_HOME ...PASSED

Post-check for cluster services setup was successful. 

CVU operation performed:      stage -post crsinst
Date:                         Apr 20, 2017 10:00:30 AM
CVU home:                     /u01/app/122/grid/
User:                         grid

Verify GNS installation with OS tools

[root@ractw22 ~]#  dig @192.168.5.50 ractw2-scan.grid122.example.com
...
;ractw2-scan.grid122.example.com. IN    A

;; ANSWER SECTION:
ractw2-scan.grid122.example.com. 0 IN    A    192.168.5.223
ractw2-scan.grid122.example.com. 0 IN    A    192.168.5.221
ractw2-scan.grid122.example.com. 0 IN    A    192.168.5.222

;; AUTHORITY SECTION:
grid122.example.com.    3600    IN    NS    ns1.example.com.
grid122.example.com.    3600    IN    NS    gns122.grid122.example.com.

;; ADDITIONAL SECTION:
ns1.example.com.    3600    IN    A    192.168.5.50

;; Query time: 11 msec
;; SERVER: 192.168.5.50#53(192.168.5.50)
;; WHEN: Thu Apr 20 09:49:09 CEST 2017
;; MSG SIZE  rcvd: 163

[grid@ractw21 ~]$ nslookup ractw2-scan.grid122.example.com
Server:        192.168.5.50
Address:    192.168.5.50#53

Non-authoritative answer:
Name:    ractw2-scan.grid122.example.com
Address: 192.168.5.223
Name:    ractw2-scan.grid122.example.com
Address: 192.168.5.221
Name:    ractw2-scan.grid122.example.com
Address: 192.168.5.222

[grid@ractw21 ~]$  srvctl config gns -a -list
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: grid122.example.com
GNS version: 12.2.0.1.0
..
ractw2-scan.ractw2 A 192.168.5.221 Unique Flags: 0x81
ractw2-scan.ractw2 A 192.168.5.222 Unique Flags: 0x81
ractw2-scan.ractw2 A 192.168.5.223 Unique Flags: 0x81
ractw2-scan1-vip.ractw2 A 192.168.5.221 Unique Flags: 0x81
ractw2-scan2-vip.ractw2 A 192.168.5.223 Unique Flags: 0x81
..

Install RDBMS software

  • Extract database zip File on a VirtualBox Host
Setup ssh connectivity for oracle user by using sshUserSetup.sh

[oracle@ractw21 sshsetup]$ cd /media/sf_kits/ORACLE/122/linuxx64_12201_database/database/sshsetup
[oracle@ractw21 sshsetup]$ ./sshUserSetup.sh -user oracle -hosts "ractw21 ractw22" -advanced -exverify -noPromptPassphrase

Verify setup
[oracle@ractw22 ~]$ /usr/bin/ssh -x -l oracle ractw21
Last login: Thu Apr 20 13:32:51 2017

[oracle@ractw21 ~]$  /usr/bin/ssh -x -l oracle ractw22
Last login: Thu Apr 20 13:43:35 2017 from ractw21.example.com

In case trouble you may read following article :

Verify oracle account for rdbms installation with cluvfy

 
[oracle@ractw21 ~]$ /u01/app/122/grid/bin/cluvfy stage -pre  dbinst  -n  ractw21,ractw22 -d /u01/app/oracle/product/122/ractw2 -fixup

Verifying Physical Memory ...PASSED
Verifying Available Physical Memory ...PASSED
Verifying Swap Size ...PASSED
Verifying Free Space: ractw22:/u01/app/oracle/product/122/ractw2 ...PASSED
Verifying Free Space: ractw22:/tmp ...PASSED
Verifying Free Space: ractw21:/u01/app/oracle/product/122/ractw2 ...PASSED
Verifying Free Space: ractw21:/tmp ...PASSED
Verifying User Existence: oracle ...
  Verifying Users With Same UID: 1002 ...PASSED
...
Verifying Maximum locked memory check ...PASSED

Pre-check for database installation was successful. 

CVU operation performed:      stage -pre dbinst
Date:                         Apr 20, 2017 1:49:00 PM
CVU home:                     /u01/app/122/grid/
User:                         oracle

Install RDBMS software

Allow and check X-Windows Usage 
[root@ractw21 ~]# xhost +
access control disabled, clients can connect from any host
[oracle@ractw21 ~]$ export DISPLAY=:0.0
[oracle@ractw21 ~]$ xclock

[oracle@ractw21 ~]$ cd /media/sf_kits/ORACLE/122/linuxx64_12201_database/database/
[oracle@ractw21 database]$ ./runInstaller
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 500 MB.   Actual 5972 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 7977 MB    Passed
Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2017-04-20_02-01-56PM. Please wait ...

-> Select "Install database software only" option
-> Select "Oracle Real Application Clusters database installation" option 
-> Select Oracle Nodes: ractw21, ractw22
-> Test SSH Connectivity
-> Select Enterpris Edition
-> Seelct Group: dba 
-> Select ORACLE_BASE:  /u01/app/oracle
          ORACLE_HOME:  /u01/app/oracle/product/122/ractw2

-> Press Install Button 

-> Execute root scripts
   ractw21 :
   ractw22 :  

Create a RAC database with dbca

Create a RAC database with dbca – Adminstator managed

[oracle@ractw21 ~]$ id
uid=1002(oracle) gid=1001(oinstall) groups=1001(oinstall),983(vboxsf),1002(dba),1006(asmdba) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[oracle@ractw21 database]$ dbca
--> Create and Configure a Database
  --> Oracle Real Application Cluster database 
   --> Policy Managed : false 
    --> Generel Purpose or Transaction Procession
     --> Server Pool:  Top_Priority Cardinality :2
      --> Global database Name : ractw2.example.com
       --> Use local UNDO for PDPs
       --> Create as Container db 
        --> Let ASM unchanged 
         --> Enable Archiving
         --> Enable FRA
         --> recoveryAreaDestination=+DATA
          --> Limit SGA, PGA
              sga_target=1367MB
              pga_aggregate_target=456MB
            --> password : sys/sys
 
       --> PDP Name : ractw2pdb
     --> Select all 2 RAC members
      --> Test/Create SSH connectivity
       --> Advanced Install 
        --> Select Generell Purpose / Transaction database type
         --> recoveryAreaDestination=+DATA
          --> Select ASM and for OSDDBA use group:  dba ( default )


Convert Adminstrator Managed Database to Policy Managed

Verify setup by connecting via Scan Name

Find the Scan Name 
[grid@ractw21 ~]$  srvctl config scan
SCAN name: ractw2-scan.ractw2.grid122.example.com, Network: 1
Subnet IPv4: 192.168.5.0/255.255.255.0/enp0s8, dhcp
Subnet IPv6: 
SCAN 1 IPv4 VIP: -/scan1-vip/192.168.5.234
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 2 IPv4 VIP: -/scan2-vip/192.168.5.232
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 3 IPv4 VIP: -/scan3-vip/192.168.5.231
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
--> Our Scan Name is  ractw2-scan.ractw2.grid122.example.com

Test the Namesever connectivity: Note we expect to see our 3 VIP addresses here for a working setup
[grid@ractw21 ~]$ nslookup ractw2-scan.ractw2.grid122.example.com
Server:        192.168.5.50
Address:    192.168.5.50#53

Non-authoritative answer:
Name:    ractw2-scan.ractw2.grid122.example.com
Address: 192.168.5.231
Name:    ractw2-scan.ractw2.grid122.example.com
Address: 192.168.5.234
Name:    ractw2-scan.ractw2.grid122.example.com
Address: 192.168.5.232

Find one of our SCAN listener running on this node 
[oracle@ractw21 ~]$ ps -elf | grep LISTENER_SCAN
0 S grid     16391     1  0  80   0 - 70631 -      10:25 ?        00:00:00 /u01/app/122/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit

Query the SCAN Listener for registered services 
[grid@ractw21 ~]$  lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 23-APR-2017 13:25:23
...
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "4d93c06c65234fffe0538d05a8c0a06a" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "4dac7be3b0930f43e0538e05a8c01163.example.com" has 2 instance(s).
  Instance "ractw_1", status READY, has 1 handler(s) for this service...
  Instance "ractw_2", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "gimr_dscrep_10" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ractw.example.com" has 2 instance(s).
  Instance "ractw_1", status READY, has 1 handler(s) for this service...
  Instance "ractw_2", status READY, has 1 handler(s) for this service...
... 
Both instances ractw_1 and ractw_2 are registered for service ractw.example.com at portnumber 1521 !

Now build the connect string using service and scan name
[oracle@ractw21 ~]$  sqlplus system/sys@ractw2-scan.ractw2.grid122.example.com:1521/ractw.example.com @v
INSTANCE_NUMBER INSTANCE_NAME     STATUS       HOST_NAME
--------------- ---------------- ------------ ----------------------------------------------------------------
          2 ractw_2      OPEN          ractw22.example.com


[oracle@ractw21 ~]$  sqlplus system/sys@ractw2-scan.ractw2.grid122.example.com:1521/ractw.example.com @v
INSTANCE_NUMBER INSTANCE_NAME     STATUS       HOST_NAME
--------------- ---------------- ------------ ----------------------------------------------------------------
          1 ractw_1      OPEN          ractw21.example.com

- Using the SCAN connect string connect us to different RAC instances on different hosts.
- This means our scan setup works fine and connection load balacing takes place

Reference

Convert a administrator managed RAC database to policy managed [ 12.2 ]

Check Database and Pool Status

Check RAC database status
[grid@ractw21 ~]$   srvctl config database -d ractw
Database unique name: ractw
Database name: ractw
Oracle home: /u01/app/oracle/product/122/ractw2
Oracle user: oracle
Spfile: +DATA/RACTW/PARAMETERFILE/spfile.325.941894301
Password file: +DATA/RACTW/PASSWORD/pwdractw.284.941893247
Domain: example.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: DATA
Mount point paths: 
Services: 
Type: RAC
Start concurrency: 
Stop concurrency: 
OSDBA group: dba
OSOPER group: 
Database instances: ractw1,ractw2
Configured nodes: ractw21,ractw22
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

Check pool status 
[oracle@ractw21 ~]$  srvctl status srvpool -a
Server pool name: Free
Active servers count: 0
Active server names: 
Server pool name: Generic
Active servers count: 2
Active server names: ractw21,ractw22
NAME=ractw21 STATE=ONLINE
NAME=ractw22 STATE=ONLINE

Convert Database to Policy Managed

Stop database,add RAC database to pool and finally check pool status [grid@ractw21 grid]$ srvctl stop database -d ractw
[grid@ractw21 grid]$ srvctl add srvpool -g TopPriority  -l 1 -u 2 -i 5
[grid@ractw21 grid]$ srvctl modify database -d ractw -g TopPriority
                        
PRCD-1130 : Failed to convert administrator-managed database ractw into a policy-managed database to use server pool TopPriority
PRCR-1071 : Failed to register or update resource ora.ractw.db
CRS-0245:  User doesn't have enough privilege to perform the operation

Run as user Oracle :
[oracle@ractw21 ~]$ srvctl modify database -d ractw -g TopPriority

[grid@ractw21 grid]$ srvctl config  srvpool -g TopPriority
Server pool name: TopPriority
Importance: 5, Min: 1, Max: 2
Category: hub
Candidate server names: 

[grid@ractw21 grid]$  srvctl config  database -d ractw
Database unique name: ractw
Database name: ractw
Oracle home: /u01/app/oracle/product/122/ractw2
Oracle user: oracle
Spfile: +DATA/RACTW/PARAMETERFILE/spfile.325.941894301
Password file: +DATA/RACTW/PASSWORD/pwdractw.284.941893247
Domain: example.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: TopPriority
Disk Groups: DATA
Mount point paths: 
Services: 
Type: RAC
Start concurrency: 
Stop concurrency: 
OSDBA group: dba
OSOPER group: 
Database instances: 
Configured nodes: 
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is policy managed

[oracle@ractw21 ~]$ srvctl start database -d ractw
PRCR-1079 : Failed to start resource ora.ractw.db
CRS-2643: The server pool(s) where resource 'ora.ractw.db' could run have no servers

Remove category and add servers 

Trying to add servers 
[grid@ractw21 ~]$ srvctl modify srvpool -serverpool TopPriority  -servers  "ractw21,ractw22" -verbose
[grid@ractw21 ~]$ srvctl config srvpool -serverpool TopPriority 
Server pool name: TopPriority
Importance: 5, Min: 1, Max: 2
Category: 
Candidate server names: ractw21,ractw22

[grid@ractw21 ~]$  srvctl status srvpool -a
Server pool name: Free
Active servers count: 0
Active server names: 
Server pool name: Generic
Active servers count: 2
Active server names: ractw21,ractw22
NAME=ractw21 STATE=ONLINE
NAME=ractw22 STATE=ONLINE
Server pool name: TopPriority
Active servers count: 0
Active server names: 

Both severs  ractw21,ractw22 still assigned to GENERIC pool !

Removing our admin managed test database ! 
[oracle@ractw21 ~]$ srvctl remove database -db test
Remove the database test? (y/[n]) y

Verify Database Status and Pool Status after changes

Now verify pool 
[grid@ractw21 ~]$  srvctl status srvpool -a
Server pool name: Free
Active servers count: 0
Active server names: 
Server pool name: Generic
Active servers count: 0
Active server names: 
Server pool name: TopPriority
Active servers count: 2
Active server names: ractw21,ractw22
NAME=ractw21 STATE=ONLINE
NAME=ractw22 STATE=ONLINE
--> Now both servers are assigned to our TopPriority pool 

Starting ractw database and confirm cluster resource status
[oracle@ractw21 dbs]$ srvctl start  database -db ractw

*****  Cluster Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.ractw.db                   1   ONLINE    ONLINE       ractw21         Open,HOME=/u01/app/oracle/product/122/ractw2,STABLE
ora.ractw.db                   2   ONLINE    ONLINE       ractw22         Open,HOME=/u01/app/oracle/product/122/ractw2,STABLE

Setup Chrony to avoid PRVG-13606 in a VirtualBox/RAC env

Configure Local Chrony Server on our RAC Nameserver ns1.example.com

1) Install chrony and disable NTP
[root@ns1 etc]# yum install -y chrony
[root@ns1 etc]# service ntpd stop
[root@ns1 etc]# chkconfig ntpd off
[root@ns1 etc]#  chkconfig --list ntpd
ntpd               0:off    1:off    2:off    3:off    4:off    5:off    6:off

2) Edit the Server Config file '/etc/chrony.conf' and make the following changes.

#Comment the server entries
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

# Allow NTP client access from local network.
allow 192.168/16

#Not synchronized to any external source. (local mode)
local stratum 10

3) Start the server and enable at boot time
[root@ns1 etc]# systemctl start chronyd
[root@ns1 etc]# systemctl enable chronyd

4) Verify Chrony tracking
[root@ns1 etc]# chronyc tracking
Reference ID    : 127.127.1.1 ()
Stratum         : 10
Ref time (UTC)  : Sat Apr  8 08:58:22 2017
System time     : 0.000000000 seconds fast of NTP time
Last offset     : +0.000000000 seconds
RMS offset      : 0.000000000 seconds
Frequency       : 0.000 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 0.000000 seconds
Root dispersion : 0.000001 seconds
Update interval : 0.0 seconds
Leap status     : Not synchronised

The Reference ID field shows the IP Address or name of the NTP Server to which we are synchronized.
The value 127.127.1.1 indicates "local" mode (Not synchronized with any external source).

Configure Chrony Client for our RAC VMs

1) Install chrony
[root@ractw21 etc]# yum install -y chrony

2) Edit the Client Config file '/etc/chrony.conf' and make the following changes.

#Comment the server entries
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

#Add local Chrony server entry pointing to our RAC Nameserver
server 192.168.5.50  iburst

3) Start the server and enable at boot time
[root@ractw21 etc]# systemctl start chronyd
[root@ractw21 etc]# systemctl enable chronyd

4) Verify Chrony tracking

[root@ractw21 etc]# chronyc tracking
Reference ID    : 192.168.5.50 (ns1.example.com)
Stratum         : 11
Ref time (UTC)  : Sat Apr  8 09:02:32 2017
System time     : 0.000008410 seconds slow of NTP time
Last offset     : +0.000009758 seconds
RMS offset      : 0.012060679 seconds
Frequency       : 4.164 ppm slow
Residual freq   : +0.001 ppm
Skew            : 0.482 ppm
Root delay      : 0.000209 seconds
Root dispersion : 0.000002 seconds
Update interval : 65.1 seconds
Leap status     : Normal

The Reference ID field shows the IP Address or name of the NTP Server to which we are synchronized.

5) Verify Chrony Sources and resync time with our Nameservers time

[root@ractw21 ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ns1.example.com              10   6    77    30    -23us[ +206us] +/-   91us

NOTE: To quickly synchronize with the server run ntpdate
[root@ractw21 ~]# ntpdate 192.168.5.50
8 Apr 11:10:56 ntpdate[25464]: adjust time server 192.168.5.50 offset 0.000664 sec


Chrony not working after reboot

Chrony status after reboot
[root@ractw21 ~]#  chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? ns1.example.com               0   6     0   10y     +0ns[   +0ns] +/-    0ns

Local chrony is not synchronized with our Master Server 


[root@ractw21 ~]# systemctl status chronyd.service
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-04-14 11:15:17 CEST; 2min 42s ago
  Process: 858 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 805 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 811 (chronyd)
   CGroup: /system.slice/chronyd.service
           └─811 /usr/sbin/chronyd

Apr 14 11:15:17 ractw21.example.com systemd[1]: Starting NTP client/server...
Apr 14 11:15:17 ractw21.example.com chronyd[811]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DE...CHASH)
Apr 14 11:15:17 ractw21.example.com chronyd[811]: Could not resolve address of initstepslew server ns1.example.com
Apr 14 11:15:17 ractw21.example.com chronyd[811]: Frequency -3.525 +/- 2.346 ppm read from /var/lib/chrony/drift
Apr 14 11:15:17 ractw21.example.com systemd[1]: Started NTP client/server.
Hint: Some lines were ellipsized, use -l to show in full.

Status
- chronyc Client reports a problem reaching our Chrony Master server during the Boot process
- Not sure whether this problem is related to VirtualBox or due to problems with the parallel
  OEL 7.3 Linux startup scripts

Anyway a quick and dirty fix is to use rc.local the set Chrony ONLINE when fully network connectivity is established  
- First set  server attribute to OFFLINE  chrony.conf 
    server ns1.example.com offline
- Set chrony ONLINE in rc.local. Add line
   ( sleep 5; chronyc -a  online ) &>/tmp/CHRONY.log 
   Don't forget to give proper permissions 
   [root@ractw21 ~]# chmod +x /etc/rc.d/rc.local'


After a reboot chronyc looks like 
[root@ractw21 ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ns1.example.com              10   6   377    18   -384us[ -686us] +/-  168us

Working Chrony Client Setup

  • Don’t forget the set chrony ONLIE in rc.local
Configuration:

192.168.5.50    : IP Address of our Nameserver running Chrony Master
ns1.example.com : Hostname of our Nameserver running Chrony Master

Client File: /etc/chrony.conf
server ns1.example.com offline
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
log measurements statistics tracking
keyfile /etc/chrony.keys
local stratum 10
initstepslew 20 ns1.example.com
allow 192.168.5.50

Chrony Commands

Check Chrony version 
[root@ractw21 ~]# rpm -qa |grep chrony
chrony-2.1.1-4.el7_3.x86_64


If chronyc command fails with:  501 Not authorised     
[root@ractw21 ~]# chronyc online
501 Not authorised

Login to chrony in interactive Mode 
chronyc> authhachrony.confsh SHA1
chronyc> password  HEX:F88F845A9773AEEC9340EFA825AA60435D78D773
chronyc> online 
200 OK

Using chrony from command line when login in required 
[root@ractw21 ~]# chronyc -a  offline
200 OK
200 OK
[root@ractw21 ~]# chronyc -a  online
200 OK
200 OK

Display current chrony Status 
[root@ractw21 ~]# systemctl  status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-04-14 10:05:43 CEST; 6min ago
  Process: 4461 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 4457 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 4459 (chronyd)
   CGroup: /system.slice/chronyd.service
           └─4459 /usr/sbin/chronyd

Apr 14 10:05:37 ractw21.example.com systemd[1]: Starting NTP client/server...
Apr 14 10:05:37 ractw21.example.com chronyd[4459]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +D...CHASH)
Apr 14 10:05:37 ractw21.example.com chronyd[4459]: Frequency -2.683 +/- 0.064 ppm read from /var/lib/chrony/drift
Apr 14 10:05:41 ractw21.example.com chronyd[4459]: System's initial offset : 0.082425 seconds fast of true (slew)
Apr 14 10:05:43 ractw21.example.com systemd[1]: Started NTP client/server.
Apr 14 10:05:47 ractw21.example.com chronyd[4459]: Selected source 192.168.5.50
Hint: Some lines were ellipsized, use -l to show in full.



Check sources stats
[root@ractw21 ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ns1.example.com              10   6   377    18   -384us[ -686us] +/-  168us

The columns are as follows:
M
    This indicates the mode of the source. 
     ^ means a server 
     = means a peer  
     # indicates a locally connected reference clock. 
S
    This column indicates the state of the sources. 
    * indicates the source to which chronyd is current synchronised. 
    + indicates other acceptable sources. 
    ? indicates sources to which connectivity has been lost.


Chrony Status Change to Synchronized - this may take a while  
[root@ractw21 ~]#  chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? ns1.example.com              10   6     3    62  -1487ms[-1487ms] +/-  221us

[root@ractw21 ~]#  chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? ns1.example.com              10   6     3    64  -1487ms[-1487ms] +/-  221us
[root@ractw21 ~]#  chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ns1.example.com              10   6     7     0  -5616ns[-1487ms] +/-  976us

Status is set to synchronized with source when Reach Count > 3 - so be patient !


Verify Chrony tracking status
[root@ractw21 etc]# chronyc tracking
Reference ID    : 192.168.5.50 (ns1.example.com)
Stratum         : 11
Ref time (UTC)  : Sat Apr  8 09:02:32 2017
System time     : 0.000008410 seconds slow of NTP time
Last offset     : +0.000009758 seconds
RMS offset      : 0.012060679 seconds
Frequency       : 4.164 ppm slow
Residual freq   : +0.001 ppm
Skew            : 0.482 ppm
Root delay      : 0.000209 seconds
Root dispersion : 0.000002 seconds
Update interval : 65.1 seconds
Leap status     : Normal

The Reference ID field shows the IP Address or name of the NTP Server used for time synchronizing.


Reference:

A deeper dive into JPA, 2-Phase-Commit [ 2PC ] and RAC

Overview JPA and 2-Phase-Commit

Mike Keith Architect at Oracle and Author

  Pro JPA 2: Mastering the Java Persistence API (Second edition)

summarizes the usage of JPA in a distributed evironment the following :

  • A JPA application will get the 2PC benefits the same as any other application
  • The peristence unit data source is using JTA and  is configured to use an XA data source
  • The XA resources and transaction manager 2PC interactions happen on their own without the JPA EMF knowing or having to be involved.
  • If a 2PC XA tx fails then an exception will be thrown just the same as if the tx was optimized to not have 2PC.

This was enough motivation for me working on Oracle RAC and JDBC projects to have a closer look on JPA and 2PC.

Versions used  / Configuration File persistence.xml

Wildfly:  8.2
Hibernate Version: 4.3.7.Final
--> Collecting Data for RAC database1
    Driver Name             : Oracle JDBC driver
    Driver Version          : 12.1.0.2.0
    Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    DB Name:  BANKA
    1. Instance Name: bankA_2 - Host: hract21.example.com - Pooled XA Connections: 61

--> Collecting Data for RAC database2
    Driver Name             : Oracle JDBC driver
    Driver Version          : 12.1.0.2.0
    Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    DB Name:  BANKB
    1. Instance Name: bankb_3 - Host: hract21.example.com - Pooled XA Connections: 62

persistence.xml

<?xml version="1.0"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">

    <persistence-unit name="RacBankAHibPU" transaction-type="JTA">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <jta-data-source>java:/jboss/datasources/xa_rac12g_banka</jta-data-source>
        <class>com.hhu.wfjpa2pc.Accounts</class>
        <properties>
            <property name="hibernate.transaction.jta.platform"
                 value="org.hibernate.service.jta.platform.internal.JBossAppServerJtaPlatform" />
            <property name="hibernate.show_sql" value="true" />
            <property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect"/>
        </properties>
    </persistence-unit>
    <persistence-unit name="RacBankBHibPU" transaction-type="JTA">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <jta-data-source>java:/jboss/datasources/xa_rac12g_bankb</jta-data-source>
        <class>com.hhu.wfjpa2pc.Accounts</class>
        <properties>
            <property name="hibernate.transaction.jta.platform"
                 value="org.hibernate.service.jta.platform.internal.JBossAppServerJtaPlatform" />
            <property name="hibernate.show_sql" value="true" />
            <property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect"/>
        </properties>
    </persistence-unit>
</persistence>

Running a successful 2PC operation with JPA

Call Flow 

- Get EntityManager for RAC Database1 [ em1=getEntityManager1(); ]
- Get EntityManager for RAC Database2 [ em2=getEntityManager2(); ]
- Start as Usertransacation             [ ut.begin(); ]
- Join transaction from EntityManager 1  [ em1.joinTransaction(); ]
- Join transaction from EntityManager 2  [ em2.joinTransaction(); ]
- Chance Balance on both databases
bankA_acct.setBalance( bankA_acct.getBalance().add(b) );
em1.merge(bankA_acct);
if (isEnableFlush() )
em1.flush();

bankB_acct.setBalance( bankB_acct.getBalance().subtract(b) );
em2.merge(bankB_acct);
if (isEnableFlush() )
em2.flush();
- Finally commit the Transaction [ ut.commit(); ]

Application log :
14:51:58.071 transferMoneyImpl():: Found both Entity Managers for PUs : RacBankAHibPU and RacBankBHibPU
14:51:58.074 transferMoneyImpl():: Account at bank A: User99_at_BANKA - Balance: 10000
14:51:58.075 transferMoneyImpl():: Account at bank B: User98_at_BANKB - Balance: 10000
14:51:58.076 transferMoneyImpl():: Both EMs joined our XA Transaction...
14:51:58.092 transferMoneyImpl():: Before Commit ...
14:51:58.160 transferMoneyImpl():: Tx Commit worked !
14:51:58.165 Database Name:BANKA -- Account: User99_at_BANKA -- Balance: 11000.0
14:51:58.168 Database Name:BANKB -- Account: User98_at_BANKB -- Balance: 9000.0
14:51:58.169 transferMoneyImpl():: Leaving with TX Status:: [UT status:  6 - STATUS_NO_TRANSACTION]

-> We successfully managed to transfer some money from bankA to bankB !

Testing Rollback operation with EM flush enabled [ transaction status : STATUS_MARKED_ROLLBACK ]

Account Balance
 transferMoneyImpl():: Account at bank A: User99_at_BANKA - Balance: 20000
 transferMoneyImpl():: Account at bank B: User98_at_BANKB - Balance: 0
Note the next money transfer/transaction should trigger a constraint violation ! 

Call Flow
- Get EntityManager for RAC Database1 [ em1=getEntityManager1(); ]
- Get EntityManager for RAC Database2 [ em2=getEntityManager2(); ]
- Start a User transaction             [ ut.begin(); ] 
- Join transaction from EntityManager 1  [ em1.joinTransaction(); ]
- Join transaction from EntityManager 2  [ em2.joinTransaction(); ]
- Chance Balance on both databases
     bankA_acct.setBalance( bankA_acct.getBalance().add(b) );
        em1.merge(bankA_acct);
        if (isEnableFlush() )
          em1.flush();
                
        bankB_acct.setBalance( bankB_acct.getBalance().subtract(b) );
        em2.merge(bankB_acct);           
        if (isEnableFlush() )
          em2.flush();              
- em2.flush is failing due to a constraint violation and set the TX status to  : STATUS_MARKED_ROLLBACK 
   Error : org.hibernate.exception.ConstraintViolationException: could not execute statement
- Exception handler checks transaction status : STATUS_MARKED_ROLLBACK and is rolling back the TX
       if ( status != javax.transaction.Status.STATUS_NO_TRANSACTION   ) 
         {
         ut.rollback();
         ...
- After rollback() transaction status changed to   STATUS_NO_TRANSACTION                      
  
Application log :
15:11:03.920 transferMoneyImpl():: Found both Entity Managers for PUs : RacBankAHibPU and RacBankBHibPU
15:11:03.929 transferMoneyImpl():: Account at bank A: User99_at_BANKA - Balance: 20000
15:11:03.931 transferMoneyImpl():: Account at bank B: User98_at_BANKB - Balance: 0
15:11:03.931 transferMoneyImpl():: Both EMs joined our XA Transaction... 
15:11:03.960 transferMoneyImpl():: FATAL ERROR - Tx Status : [UT status:  1 - STATUS_MARKED_ROLLBACK]
15:11:03.962 transferMoneyImpl():: Before TX rollback ... 
15:11:03.974 transferMoneyImpl():: TX rollback worked !
15:11:03.974 transferMoneyImpl():: Leaving with TX Status:: [UT status:  6 - STATUS_NO_TRANSACTION]


Exception stack :
15:11:03.960 Error in top level function: transferMoneyImpl():: 
15:11:03.960 org.hibernate.exception.ConstraintViolationException: could not execute statement
15:11:03.961 javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: could not execute statement
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1763)
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1677)
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1683)
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.flush(AbstractEntityManagerImpl.java:1338)
    at com.hhu.wfjpa2pc.Jpa2pcTest.transferMoneyImpl(Jpa2pcTest.java:235)
    at com.hhu.wfjpa2pc.Jpa2pcTest.transferMoney(Jpa2pcTest.java:166)
        ..
Caused by: org.hibernate.exception.ConstraintViolationException: could not execute statement
    at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:72)
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java
    ... 
Caused by: java.sql.SQLIntegrityConstraintViolationException: ORA-02290: check constraint (SCOTT.S_LOWER_CHK) violated

Testing Rollback operation without EM flush enabled [ transaction status : STATUS_NO_TRANSACTION  ]

Account Balance
 transferMoneyImpl():: Account at bank A: User99_at_BANKA - Balance: 20000
 transferMoneyImpl():: Account at bank B: User98_at_BANKB - Balance: 0
Note the next money transfer/transaction should trigger a constraint violation ! 

Call Flow
- Get EntityManager for RAC Database1 [ em1=getEntityManager1(); ]
- Get EntityManager for RAC Database2 [ em2=getEntityManager2(); ]
- Start a User transaction            [ ut.begin(); ] 
- Join transaction from EntityManager 1  [ em1.joinTransaction(); ]
- Join transaction from EntityManager 2  [ em2.joinTransaction(); ]
- Chance Balance on both databases
     bankA_acct.setBalance( bankA_acct.getBalance().add(b) );
        em1.merge(bankA_acct);
        if (isEnableFlush() )
          em1.flush();
                
        bankB_acct.setBalance( bankB_acct.getBalance().subtract(b) );
        em2.merge(bankB_acct);           
        if (isEnableFlush() )
          em2.flush();        
- Commit the Transaction [ ut.commit(); ] fails with :  ARJUNA016053: Could not commit transaction.
- As the Commit itself fails Wildfly rollback the transaction 
- Tx Status after COMMIT error :  STATUS_NO_TRANSACTION 
- Exception handler checks transaction status : STATUS_MARKED_ROLLBACK and is not rolling back the TX
       if ( status != javax.transaction.Status.STATUS_NO_TRANSACTION   ) 
         {
         ut.rollback();
         ...
- Here we don't run any rollback operation -> the TX status remains at   STATUS_NO_TRANSACTION                      
  
Application log :
  15:27:53.818 transferMoneyImpl():: Found both Entity Managers for PUs : RacBankAHibPU and RacBankBHibPU
  15:27:53.827 transferMoneyImpl():: Account at bank A: User99_at_BANKA - Balance: 20000
  15:27:53.829 transferMoneyImpl():: Account at bank B: User98_at_BANKB - Balance: 0
  15:27:53.829 transferMoneyImpl():: Both EMs joined our XA Transaction... 
  15:27:53.829 transferMoneyImpl():: Before Commit ... 
  15:27:53.857 transferMoneyImpl():: FATAL ERROR - Tx Status : [UT status:  6 - STATUS_NO_TRANSACTION]
  15:27:53.859 transferMoneyImpl():: TX not active / TX already rolled back
  15:27:53.859 transferMoneyImpl():: Leaving with TX Status:: [UT status:  6 - STATUS_NO_TRANSACTION]

Testing transaction Recovery with JPA

What we are expecting  and what we are testing
  - Transaction Timeout is set to 600 seconds
  - We set a breakpoint at   OracleXAResource.commit
    ==> This means Wildfly has written a COMMIT record to  the  Wildlfly LOG-STORE
  - After stop at the first OracleXAResource.commit breakpoint  we kill the Wildfly server 
  - Both RMs [ Oracle RAC databases ] are now counting down the Transaction Timeout 
  - If Timeout is reached the failed transaction becomes visible in dba_2pc_pending table
  - Trying to get a lock on these records should lead to a ORA-1591 error 
  - After Wildfly restart the Periodic Recovery should run OracleXAResource.commit and release all locks

Preparing and running the test scenario

Start Wildfly in Debug Mode :
Set breakpoint on OracleXAResource.commit and run the application

Stack Trace 
"default task-3"
oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:553)
org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:338)
org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.commit(XAResourceWrapperImpl.java:107)
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:461)
com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2810)
com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2726)
com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1820)
com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1504)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:96)
com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1166)
com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)
com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
org.jboss.tm.usertx.client.ServerVMClientUserTransaction.commit(ServerVMClientUserTransaction.java:173)
com.hhu.wfjpa2pc.Jpa2pcTest.transferMoneyImpl(Jpa2pcTest.java:242)
com.hhu.wfjpa2pc.Jpa2pcTest.transferMoney(Jpa2pcTest.java:166)

Wildfly Check for prepared transaction 
$ $WILDFLY_HOME/bin/jboss-cli.sh --connect --file=list_prepared_xa_tx.cli
{"outcome" => "success"}
0:ffffc0a805c9:f5a10ef:56039e68:d

Locate and kill JBOSS server process 
0 S oracle    5875  5821  7  80   0 - 413473 futex_ 08:55 ?       00:00:30 
     /usr/java/latest/bin/java .... -Djboss.server.base.dir=/usr/local/wildfly-8.2.0.Final/standalone -c standalone.xml
0 S oracle    6174  5680  0  80   0 - 25827 pipe_w 09:02 pts/1    00:00:00 grep java
[oracle@wls1 WILDFLY]$ kill -9 5875

Now wait [ at lest 600 seconds ] until the Transaction becomes visible in  dba_2pc_pending

SQL> SELECT * FROM GLOBAL_NAME;
GLOBAL_NAME
----------------
BANKA

SQL> select * from dba_2pc_pending;
LOCAL_TRAN_ID           GLOBAL_TRAN_ID                            STATE         MIX A TRAN_COMMENT
---------------------- ---------------------------------------------------------------- ---------------- --- - ----------------
FAIL_TIM FORCE_TI RETRY_TI OS_USER    OS_TERMINAL  HOST          DB_USER       COMMIT#
-------- -------- -------- ------------ ------------ ---------------- ------------ ----------------
9.21.7139           131077.00000000000000000000FFFFC0A805C90F5A10EF56039E680000000D3 prepared     no
               1
09:07:22      09:15:34 oracle    unknown      wls1.example.com           43619336



SQL> SELECT * FROM GLOBAL_NAME;
GLOBAL_NAME
----------------
BANKB

SQL> select * from dba_2pc_pending;
LOCAL_TRAN_ID           GLOBAL_TRAN_ID                            STATE         MIX A TRAN_COMMENT
---------------------- ---------------------------------------------------------------- ---------------- --- - ----------------
FAIL_TIM FORCE_TI RETRY_TI OS_USER    OS_TERMINAL  HOST          DB_USER       COMMIT#
-------- -------- -------- ------------ ------------ ---------------- ------------ ----------------
4.15.3293           131077.00000000000000000000FFFFC0A805C90F5A10EF56039E680000000D3 prepared     no
               1
09:07:22      09:15:34 oracle    unknown      wls1.example.com           20931538

Check for locks 
-> Connected to  scott/tiger@ract2-scan.grid12c.example.com:1521/banka
select * from accounts for update
*
ERROR at line 1:
ORA-01591: lock held by in-doubt distributed transaction 9.21.7139


-> Connected to  scott/tiger@ract2-scan.grid12c.example.com:1521/bankb
select * from accounts for update
*
ERROR at line 1:
ORA-01591: lock held by in-doubt distributed transaction 4.15.3293


Restart Wildfly in Debug Mode and let the Periodic Recovery Thread commit the transaction 

"Periodic Recovery"
oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:553)
org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:338)
org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.commit(XAResourceWrapperImpl.java:107)
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:461)
com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2810)
com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2726)
com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1820)
com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71)
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152)
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:253)
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109)
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789)
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371)

-> WildFly Thread Periodic Recovery stops at OracleXAResource.commit
-> Press Debugger Command : Continue 
-> WildFly Thread Periodic Recovery has committed Transaction Branch 1
-> WildFly Thread Periodic Recovery stops again  at .OracleXAResource.commit
-> Press Debugger Command : Continue 
-> WildFly Thread Periodic Recovery has committed Transaction Branch 2
-> Complete Transaction is now committed 

Verify Access to the Database records and Wildfly Prepared Transaction Cleanup 
-> Connected to  scott/tiger@ract2-scan.grid12c.example.com:1521/banka
ACCOUNT                 BALANCE
-------------------------------- ----------
User99_at_BANKA               14000


-> Connected to  scott/tiger@ract2-scan.grid12c.example.com:1521/bankb
ACCOUNT                 BALANCE
-------------------------------- ----------
User98_at_BANKB                6000


List prepared Transaction
$  $WILDFLY_HOME/bin/jboss-cli.sh --connect --file=list_prepared_xa_tx.cli
{"outcome" => "success"}

-> After a successful transaction recovery the locks are gone 

 

Java Code

public void transferMoneyImpl()
      {
        String methodName = "transferMoneyImpl():: ";
        EntityManager em1;
        EntityManager em2;
        UserTransaction ut =null;
      try
        {
        setRunTimeInfo(methodName  + "Entering ... ");
            
        HttpSession session = (HttpSession) FacesContext.getCurrentInstance().getExternalContext().getSession(true);
        if ( session == null)
              {
                throw new IllegalArgumentException(methodName+ ": Could not get HTTP session : ");    
              }                        
        final Object lock = session.getId().intern();       
        synchronized(lock) 
              {
                em1=getEntityManager1();
                em2=getEntityManager2();
                    //
                    // Note even we get an EntityManager Object we still not sure that the 
                    // EntityManager Could open connection the underlying JDBC connection !
                    //
                if ( em1 == null )
                    setRunTimeInfo(methodName  + "Faild to get EM for PU: " + EMF.getPU1() );
                else if ( em2 == null )
                    setRunTimeInfo(methodName  + "Faild to get EM for PU: " + EMF.getPU2() );
                else
                    setRunTimeInfo(methodName  + "Found both Entity Managers for PUs : " + 
                       EMF.getPU1()  + " and " +  EMF.getPU2()  ); 
                   
                 
                String bankA_acct_name = "User99_at_BANKA";
                Accounts bankA_acct = em1.find(Accounts.class, bankA_acct_name);
                if ( bankA_acct == null)
                    { 
                    setRunTimeInfo(methodName + "Could not locate Account at bankA : " + bankA_acct_name );
                    return;
                    }
                setRunTimeInfo(methodName  +"Account at bank A: " + bankA_acct.getAccount()  + " - Balance: " +  bankA_acct.getBalance() );
                
                String bankB_acct_name = "User98_at_BANKB";
                Accounts bankB_acct = em2.find(Accounts.class, bankB_acct_name);
                if ( bankB_acct == null)
                    { 
                    setRunTimeInfo(methodName + "Could not locate Account at bankB : " + bankB_acct_name );
                    return;
                    }
                setRunTimeInfo(methodName  +"Account at bank B: " + bankB_acct.getAccount()  + " - Balance: " +  bankB_acct.getBalance() );
              
                ut  = (javax.transaction.UserTransaction)new InitialContext().lookup("java:comp/UserTransaction"); 
                    // Set tranaction time to 120 seconds to avoid any timeouts during testing -
                    // especially when testing transaction recovery by restarting Wildfly server 
                    // Note as we kill the JAVA process both RMs will wait 120 s before Tx becomes visible in dba_2pc_pending 
                int tx_timeout = 120;
                ut.setTransactionTimeout(tx_timeout);
                ut.begin();
                em1.joinTransaction();
                em2.joinTransaction();
                setRunTimeInfo(methodName  + "Both EMs joined our XA Transaction... - TX Timeout: " + tx_timeout );
                BigDecimal b = new BigDecimal(1000);
                bankA_acct.setBalance( bankA_acct.getBalance().add(b) );
                em1.merge(bankA_acct);
                if (isEnableFlush() )
                    em1.flush();
                
                bankB_acct.setBalance( bankB_acct.getBalance().subtract(b) );
                em2.merge(bankB_acct);           
                if (isEnableFlush() )
                    em2.flush();
                
                setRunTimeInfo(methodName  + "Before Commit ... ");                
                ut.commit();
                setRunTimeInfo(methodName  + "Tx Commit worked !");
                checkBalanceImpl();
              }
        } catch ( Throwable t1)
          { 
            try
              {    
              String tx_status = returnTXStatus(ut);
              setRunTimeInfo( methodName  + "FATAL ERROR - Tx Status : " + tx_status  );
                // Use Throwable as we don't want to loose any important imformation
                // Note: Throwable is super class of Exception class          
               genericException("Error in top level function: " + methodName , (Exception)t1);                          
               if ( ut != null )
                  {
                    int status = ut.getStatus();    
                        // rollback transaction if still active - if not do nothing 
                    if ( status != javax.transaction.Status.STATUS_NO_TRANSACTION   ) {
                        setRunTimeInfo(methodName  + "Before TX rollback ... ");
                        ut.rollback();
                        setRunTimeInfo(methodName  + "TX rollback worked !");
                    } else
                        setRunTimeInfo(methodName  + "TX not active / TX already rolled back");
                  }
              }  catch ( Throwable t2)
                 { 
                   genericException(methodName + "FATAL ERROR during ut.rollback() ", (Exception)t2); 
                 } 
          }
        closeEntityManagers();       
        String tx_status_exit = "";
        try
          {    
            tx_status_exit = returnTXStatus(ut);
          }   catch ( Throwable t3)
            { 
              genericException(methodName + " Error during returning TX status ", (Exception)t3); 
            }    
        setRunTimeInfo(methodName  + "Leaving with TX Status:: " + tx_status_exit );
      }

Reference

Hibernate / JPA : Access to DialectResolutionInfo cannot be null

Pom.xml 
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-core</artifactId>
            <version>4.3.6.Final</version>
        </dependency>

        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-entitymanager</artifactId>
            <version>4.3.6.Final</version>
        </dependency>


Error and Cause Determination 

Deploying a new WEBApplication using Hibernate fails with : 
Deploying /usr/local/wildfly-8.2.0.Final/standalone/deployments/WFJPA2PC-1.0.war
"{\"JBAS014671: Failed services\" => {\"jboss.persistenceunit.\\\"WFJPA2PC-1.0.war#RacBankBHibPU\\\"\" 
  => \"org.jboss.msc.service.StartException in service jboss.persistenceunit.\\\"WFJPA2PC-1.0.war#RacBankBHibPU\\\": 
     org.hibernate.HibernateException: Access to DialectResolutionInfo cannot be null when 'hibernate.dialect' not set
     Caused by: org.hibernate.HibernateException: Access to DialectResolutionInfo cannot be null when 'hibernate.dialect' not set\"}}"

Cause : Oracle RAC database was down ! 

Fix : Add hibernate.dialect to your persistence.xml 
   <persistence-unit name="RacBankBHibPU" transaction-type="JTA">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <jta-data-source>java:/jboss/datasources/xa_rac12g_bankb</jta-data-source>
        <class>com.hhu.wfjpa2pc.Accounts</class>
        <properties>
            <property name="hibernate.transaction.jta.platform"
                 value="org.hibernate.service.jta.platform.internal.JBossAppServerJtaPlatform" />
            <property name="hibernate.show_sql" value="true" />
            <property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect"/>
        </properties>
    </persistence-unit>

-> Now application can be redeployed 
-> Application now starts but throws the following stack 
08:55:04.761 javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:jboss/datasources/xa_rac12g_bankb
08:55:04.762 java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:jboss/datasources/xa_rac12g_bankb
    at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:154)
    at com.hhu.wfjpa2pc.Jpa2pcTest.getRacInfoDS(Jpa2pcTest.java:398)
    at com.hhu.wfjpa2pc.Jpa2pcTest.getRacInfo(Jpa2pcTest.java:376)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.

-> With above Stack we easily can figure out that we need to start our RAC DB

Copy a Maven based JPA project to new project folder

New Project Details

ArtifactId                   : WFJPA2PC
New Project Home             : /home/oracle/NetBeansProjects/GIT/WFJPA2PC 
New persistence.xml location :  /home/oracle/NetBeansProjects/GIT/WFJPA2PC/src/main/resources/META-INF/persistence.xml Your old and working project should have a working pom.xml and persistence.xml 

Modify and copy pom,xml

Modify artifactId and name in your pom.xml 
    <artifactId>WFJPA2PC</artifactId>
    <name>WFJPA2PC</name>

Copy pom.xml to your new Project Folder 
$ cp pom.xml   /home/oracle/NetBeansProjects/GIT/WFJPA2PC

Add JSF support to our project 
Project -> Properties -> Framework -> Add JavaServerFaces 
                                   -> Components -> PrimeFaces 

Create new Entity Classes

Create new Entity Classes as needed [ in our sample we use Entity Class : Accounts 

Source package -> New -> EnititY Class form Database 
  -> Select Datasource 
  -> Select Table Name : Accounts  
  -> Add 
  -> Finish 

Create a new JPA persistence.xml and copy over the working persistence.xml

Netbeans : File -> NEW ->  Persistence Unit 

Locate your Persistence Location of persistence.xml:
  /home/oracle/NetBeansProjects/GIT/WFJPA2PC/src/main/resources/META-INF/persistence.xml

-> Copy over your new RAC XA persistence.xml to   src/main/resources/META-INF/persistence.xml
   and add your newly created Entitly Classes    

Sample for a Hibernate persistence.xml with Entity Class : Accounts
<?xml version="1.0"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">

    <persistence-unit name="RacBankAHibPU" transaction-type="JTA">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <jta-data-source>java:/jboss/datasources/xa_rac12g_banka</jta-data-source>
        <class>com.hhu.wfjpa2pc.Accounts</class>
        <properties>
            <property name="hibernate.transaction.jta.platform"
                 value="org.hibernate.service.jta.platform.internal.JBossAppServerJtaPlatform" />
            <property name="hibernate.show_sql" value="true" />
        </properties>
</persistence>

Details on HttpSession Object

Overview

  • HTTP is a stateless protocol
  • To keep JAVA object available longer than the duration of a HTTP request  each Object is stored within the HTTP session object. This is called Session tracking feature.
  • Typical technical solutions  for Session Tracking are
      • Cookies
      • URL rewriting

    Hidden form fields.

 

Why and how should you protect your HTTP session object

  • HttpSession object is not thread safe
  • Browser Tabbing, Fast Page Reloading  and async. AJAX request may give you concurrent access to the same HttpSession object
  • There is no guarantee that multiple calls to HttpServletRequest.getSession() will return the same HttpSession object .
  • Don’t synchronize on HttpSession object as this object may be recreated by certain containers
  • Synchronize on immutable object returned by :  session.getId().intern()

Reference

Creating a HTTP session using cookies

JSESSIONID cookie is created/sent when session is created. Session is created when your code calls request.getSession() 
or request.getSession(true) for the first time. If you just want get session, but not create it if it doesn't exists, 
use request.getSession(false) -- this will return you a session or null. In this case, new session is not created, 
and JSESSIONID cookie is not sent. (This also means that session isn't necessarily created on first request... you 
and your code is in control when the session is created)

Understanding Browser Session and HTTP Session Object

Using Browser tabbing with firefox  

Start a first Browser session and run their initial  HTTP POST to create the HTTP session by running timeoutTest() :
Session 1:  $ firefox http://localhost:8180/WFJPA2EL-1.0/ 
12:56:42.530 timeoutTest() - NEW Session  - ID zjiDVDLbFkQYhJ4bARWTmP6P - Access Count: 0- i
             JSESSIONID cookie: zjiDVDLbFkQYhJ4bARWTmP6P.wls1 - MaxIncativeInterval:  10 - 
             Last AccessedTime: 12:56:34.636 - Cookie MaxAge: -1

Now start as new Browser Tab using the same URL  and run the HTTP Post request again 
Session 2: $ firefox http://localhost:8180/WFJPA2EL-1.0/
12:56:47.856 timeoutTest() - WELCOME Back Session   - ID zjiDVDLbFkQYhJ4bARWTmP6P - Access Count: 1- 
             JSESSIONID cookie: zjiDVDLbFkQYhJ4bARWTmP6P.wls1 - MaxIncativeInterval:  10 - 
             Last AccessedTime: 12:56:42.547 - Cookie MaxAge: -1

- The second request doesn't create a new HTTP session object
- Instead browser session are uing the HTTP Session object 
- Thats is why you should serialize access to the HTTP session object as multiple threads may use this JAVA object !
 

Using a different Firefox profile or different Host name will create different HTTP Objects:

Session 1: $ firefox http://localhost:8180/WFJPA2EL-1.0/
10:06:59.096 timeoutTest() - NEW Session  - ID O3tl4IqIDCAGWB16O52HO6h6 - Access Count: 0- 
             JSESSIONID cookie: O3tl4IqIDCAGWB16O52HO6h6.wls1 - MaxIncativeInterval:  10 - 
             Last AccessedTime: 10:06:53.374 - Cookie MaxAge: -1

Session 2: $  firefox http://wls1:8180/WFJPA2EL-1.0/
10:07:42.817 timeoutTest() - NEW Session  - ID 6kqRcL-DDpKCywOqrq3wo8yB - Access Count: 0- 
             JSESSIONID cookie: 6kqRcL-DDpKCywOqrq3wo8yB.wls1 - MaxIncativeInterval:  10 - 
             Last AccessedTime: 10:07:37.164 - Cookie MaxAge: -1

- Using a different hostname [ or Firefox Profiles ] will create a new HTTP session object !

HTTP session object details

  • A HTTP Session objects stores data about Cookies and Attributes
  • The JSESSIONID is the cookie for implementing the Session Feature
  • The accessCount attribute tracks how often this session is reused by an HTTP request
10:19:27.673  timeoutTest() - WELCOME Back Session   - ID fwdaf1VInpfsA-20853SiT_F - Access Count: 1- 
              JSESSIONID cookie: fwdaf1VInpfsA-20853SiT_F.wls1 - MaxIncativeInterval:  10 - 
              Last AccessedTime: 10:19:23.205 - Cookie MaxAge: -1
10:19:27.674  attr  = accessCount      value = 1
10:19:27.675  Cookie   name  = JSESSIONID   value = fwdaf1VInpfsA-20853SiT_F.wls1    Cookie MaxAge = -1
Details:
 - Cookie MaxAge = -1       -> No Cookie Timeout
 - MaxIncativeInterval:  10 -> HTTP session timeout [ 10 seconds ] 
 - accsessCount             -> HTTP object stored within HTTP session object 

Java Code to display HTTP session details and set and read attributes

public void trackSession(HttpSession session, String methodName)
      {
        final Object lock = session.getId().intern();  
        synchronized(lock) 
          {
            String heading = null;
            accessCount =    (Integer)session.getAttribute("accessCount");
            if (accessCount == null) 
              {
                accessCount = new Integer(0);
                heading =  methodName + " - NEW Session";
              } 
            else 
              {
                heading =  methodName + " - WELCOME Back Session ";
                accessCount = new Integer(accessCount.intValue() + 1);
              }
            String jSessionId = "JSESSIONID not found";
            int jSessionMaxAge = 0;
            HttpServletRequest req = (HttpServletRequest) FacesContext.getCurrentInstance().getExternalContext().getRequest(); 

                // There no API to return JESSIONID cookie - we need to loop throught the Cookie Arrary 
            Cookie[] cookies = req.getCookies();
            for(Cookie cookie : cookies)
              {
                if("JSESSIONID".equals(cookie.getName()))
                  {    
                    jSessionId = cookie.getValue();
                    jSessionMaxAge = cookie.getMaxAge();
                               }
               }
                       
            session.setAttribute("accessCount", accessCount);
            setSessionInfo(heading +  "  - ID "  + session.getId() + " - Access Count: " + accessCount.intValue() 
                 + "- JSESSIONID cookie: " + jSessionId + " - MaxIncativeInterval:  " + session.getMaxInactiveInterval()
                 + " - Last AccessedTime: " + Tools.getTime2(session.getLastAccessedTime()) + " - Cookie MaxAge: " + jSessionMaxAge);
            
                // display Session details : Cookies and 
            if (displaySessionDetails )
              {   
                Enumeration es = session.getAttributeNames();
                while (es.hasMoreElements())
                  {
                    String attr = (String)es.nextElement();
                    Object value = session.getValue(attr);
                    setSessionInfo("      attr  = "+ attr +"      value = "+ value);
                  } 

                for (Cookie cookie : cookies) 
                  {
                    String cookieName = cookie.getName();
                    String cookieValue = cookie.getValue();
                    setSessionInfo("    Cookie   name  = "+  cookieName+"   value = "+ cookieValue + "    Cookie MaxAge = " +  cookie.getMaxAge());
                  }    
              }
          }                                  
      }

How to deal with HTTP Session Timeout – a JSF sample

  • For security and memory management, sessions need to be invalidated at a certain time
There are two related methods in HttpSession.
- HttpSession.invalidate() 
  By invoking invalidate(), the session will be invalidated immediately. 
  This is useful for the case such as logout.
- HttpSession.setMaxInactiveInterval(int interval)
 The method setMaxInactiveInterval(int interval) allows us to configure the time (in seconds) 
 between client requests before the servlet container will invalidate the session.
 That is, an idle session will be invalidated after the specified time.

See https://weblogs.java.net/blog/swchan2/archive/2013/08/29/when-httpsession-invalidated

HTTP session timemout can be configured via web.xml 

web.xml sample 
 <session-config>
        <session-timeout>
            30
        </session-timeout>
    </session-config>

setMaxInactiveInterval() sample
    HttpSession session = (HttpSession) FacesContext.getCurrentInstance().getExternalContext().getSession(true);
    if ( session == null)
      {
        throw new IllegalArgumentException(methodName+ ": Could not get HTTP session : ");    
      }
    session.setMaxInactiveInterval(5);

Note setMaxInactiveInterval configures the time (in seconds) between client requests 
and before the servlet container will invalidate the session.

After your initial HTTP GET request wait 5 seconds and send a HTTP POST request. 
This HTTP POST request will fail with:   
  javax.faces.application.ViewExpiredException: viewId:/index.xhtml - 
    View /index.xhtml could not be restored.
    at com.sun.faces.lifecycle.RestoreViewPhase.execute(RestoreViewPhase.java:210)
    at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:101)
    at com.sun.faces.lifecycle.RestoreViewPhase.doPhase(RestoreViewPhase.java:121)
    at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:198)
    at javax.faces.webapp.FacesServlet.service(FacesServlet.java:646)
    at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85)

Fix : You may Add an error page to your web.xml file
    <error-page>
        <exception-type>javax.faces.application.ViewExpiredException</exception-type>
        <location>/faces/index.xhtml</location>
    </error-page> 

Further details on Thread Safety

Never assign any request or session scoped data as an instance variable of a servlet or filter. 
It will be shared among all other requests in other sessions. 
That's threadunsafe! The below example illustrates that:

public class ExampleServlet extends HttpServlet 
  {
    private Object thisIsNOTThreadSafe;
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
      {
        Object thisIsThreadSafe;

        thisIsNOTThreadSafe = request.getParameter("foo"); // BAD!! Shared among all requests!
        thisIsThreadSafe = request.getParameter("foo"); // OK, this is thread safe.
     } 
 }

Reference

 

 

A deeper dive into Transaction Timeouts with JEE7,Wildfly and Oracle RAC

Potential Transaction Timeouts in a JEE7 / RAC 12.1.0.2 env

  1.  JTA Transaction Timeout triggering an OracleXAResource.rollback operation on our TM side [ Wildfly 8.2 ]
  2. JTA Transaction Timeout occuring on the RM side [ ORA-24756 ]
  3. DISTRIBUTED LOCK TIMEOUT occuring on the RM side [ ORA-2049 ]
  4. HTTPSession Object Timeout

Rule of Thumb

  •  JTA transaction timeout  <  DISTRIBUTED_LOCK_TIMEOUT  <  HTTP Session Timeout
  • In a working JTA env we only want to see JTA timeouts triggered by our TM  [ see 1. ]

Overview Bean Managed Transaction with JTA

  • Each JTA transaction is associated with an Execution Thread
  • Only a single Transaction can be active within an  Execution Thread
  • If one Transaction is active the user can’t start an new one until the TX was committed suspended or rolled back
  • The setTransactionTimeout() method causing  the transaction to roll back after reaching the timeout limit
  • Do not invoke the getRollbackOnly and setRollbackOnly methods of the EJBContext interface in bean-managed transactions [ BMT ]
  • For bean-managed transactions, invoke the getStatus and rollback methods of the UserTransaction interface
  • If a session times out within Oracle, then the Oracle database Process Monitor (PMON) will delete the transaction branch if not prepared or cause the transaction to become in-doubt if prepared.
  • We can track the Transaction Timeout at RM side using Event 10246
         alter system set event=”10246 trace name context forever, level 1″ scope=spfile sid=’*’;
  • If the PMON terminates the transaction branch you should get error: ORA-24756: transaction does not exist
  • If the Transaction can’t get a database lock you should get error: ORA-02049: timeout: distributed transaction waiting for lock
  • During this JPA test we use OnePhaseCommit [ means no 2PC / no prepare call ]
  • We always want that our TM is controlling the Transaction
  • If runnig with correct timeout settings we should not see any ORA-2049 or ORA-24756 errors

How to track the XA transaction at database level

  • FORMATID, GLOBALID, BRANCHID build our  XA Xid
  • TIGHTLY COUPLED means we serialize using a DX lock [ can lead to a performance problem ]  and can use RAC ClusterWide Transaction Feature
  • PREPARECOUNT=0 this Tx is not yet prepared
  • STATE ACTIVE means this transaction is active
SQL> select FORMATID, GLOBALID, BRANCHID, BRANCHES, REFCOUNT, PREPARECOUNT, STATE, 
     FLAGS, COUPLING  from v$global_transaction;

  FORMATID GLOBALID
---------- ----------------------------------------------------------------
BRANCHID
--------------------------------------------------------------------------------------------------------------------------------
  BRANCHES   REFCOUNT PREPARECOUNT STATE                       FLAGS COUPLING
---------- ---------- ------------ -------------------------------------- ---------- ---------------
    131077 00000000000000000000FFFFC0A805C935ABDAA855FBBB600000046931
00000000000000000000FFFFC0A805C935ABDAA855FBBB600000046C0000000200000000
     1        1         0 ACTIVE                       0 TIGHTLY COUPLED

TEST 1:  Testing global JTA Timeout

  • The service routine will run 60 seconds before committing the data
  • The JPA timeout will rollback the transaction after the JTA timeout [ 30 seconds ]
  • The Transaction Reaper Wildfly Thread will rollback the transaction after 30 second
  • Later on the commit fails as we don’t run inside a transaction anymore
Setup
HTTP Session Timeout       : 1800 seconds
JTA Timeout                :   30 seconds
ut.setTransactionTimeout() :  not active
Service routine sleep      :   60 seconds 

Setup and verify  JTA Transaction Timeout : 30 seconds:

$ $WILDFLY_HOME/bin/jboss-cli.sh  --connect --controller=localhost:9990 '/subsystem=transactions/:write-attribute(name=default-timeout,value=30)';
{
    "outcome" => "success",
    "result" => 30,
    "response-headers" => {"process-state" => "reload-required"}
}

$  $WILDFLY_HOME/bin/jboss-cli.sh  --connect --controller=localhost:9990 '/subsystem=transactions/:read-attribute(name=default-timeout)'
{
    "outcome" => "success",
    "result" => 30
}

-> Restart Wildfly
-> Run test test case
 
Wildfly Logs:
10:18:52,211 INFO  [stdout] (default task-15) +++ HttpSessionListenerImpl:: sessionCreated()....
10:18:55,489 INFO  [org.eclipse.persistence.connection] (default task-36) 
        Connected: jdbc:oracle:thin:@ract2-scan.grid12c.example.com:1521/banka
    User: SCOTT
    Database: Oracle  Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    Driver: Oracle JDBC driver  Version: 12.1.0.2.0
...
2015-09-17 10:18:55,500 INFO  [org.eclipse.persistence.connection] (default task-17) vfs:/usr/local/wildfly-8.2.0.Final/standalone/deployments/WFJPA2EL-1.0.war/WEB-INF/classes/_jpaELPU login successful
10:19:27,365 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffffc0a805c9:6a216e79:55fa71c5:72 in state  RUN
10:19:27,367 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffffc0a805c9:6a216e79:55fa71c5:72 invoked while multiple threads active within it.
10:19:27,368 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffffc0a805c9:6a216e79:55fa71c5:72 aborting with 1 threads active!
10:19:27,383 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffffc0a805c9:6a216e79:55fa71c5:72
10:19:57,381 WARN  [com.arjuna.ats.arjuna] (default task-17) ARJUNA012077: Abort called on already aborted atomic action 0:ffffc0a805c9:6a216e79:55fa71c5:72

-> JPA connection returned at 10:18:55
   About 32 seconds later the global JTA timeout kicks in and terminates the transaction a

Stack :
10:19:57.382 Error in top level function: timeoutTest()
10:19:57.384 ARJUNA016102: The transaction is not active! Uid is 0:ffffc0a805c9:6a216e79:55fa71c5:72
10:19:57.385 javax.transaction.RollbackException: ARJUNA016102: The transaction is not active! Uid is 0:ffffc0a805c9:6a216e79:55fa71c5:72
    at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1156)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)
    at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
    at org.jboss.tm.usertx.client.ServerVMClientUserTransaction.commit(ServerVMClientUserTransaction.java:173)
    at com.hhu.wfjpa2el.JPATestBean.timeoutTest(JPATestBean.java:456)

TEST 2: Overriding global JTA Timeout with  ut.setTransactionTimeout()

  • This test is similar to TEST 1
  • it only overrides the JTA transaction timeout with setTransactionTimeout()
Setup 
HTTP Session Timeout       : 1800 seconds
JTA Timeout                :   30 seconds
ut.setTransactionTimeout() :   15 seconds
Service routine sleep      :   60 seconds 

Java Code 
     ...
     ut  = (UserTransaction)new InitialContext().lookup("java:comp/UserTransaction");          
     ut.setTransactionTimeout(getTxTimeout());                     
     ut.begin();
     ...

Wildfly Log :
2015-09-17 10:58:58,426 INFO  [org.eclipse.persistence.connection] (default task-36) 
        Connected: jdbc:oracle:thin:@ract2-scan.grid12c.example.com:1521/banka
    User: SCOTT
    Database: Oracle  Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    Driver: Oracle JDBC driver  Version: 12.1.0.2.0
2015-09-17 10:58:58,431 INFO  [org.eclipse.persistence.connection] (default task-36) vfs:/usr/local/wildfly-8.2.0.Final/standalone/deployments/WFJPA2EL-1.0.war/WEB-INF/classes/_jpaELPU login successful
2015-09-17 10:59:15,094 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffffc0a805c9:6a216e79:55fa71c5:11a in state  RUN
2015-09-17 10:59:15,096 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffffc0a805c9:6a216e79:55fa71c5:11a invoked while multiple threads active within it.
2015-09-17 10:59:15,097 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffffc0a805c9:6a216e79:55fa71c5:11a aborting with 1 threads active!
2015-09-17 10:59:15,115 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffffc0a805c9:6a216e79:55fa71c5:11a

-> JPA connection returned at  10:58:58,
-> About 17 seconds later the Timeout value set by ut.setTransactionTimeout()  kicks in and terminates the transaction a
-> As in TEST 1 the commit fail with a similar stack  

TEST 3 : Stop all TM Threads and  let the RM [ Resource Manager ] terminate the transaction

  • In this test set a breakpoint at OracleXAResource.commit to stop all WildFly Threads
  • This will avoid that any WildFly Thread is running a ROLLBACK
  • You may read following article to setup a XA Breakpoints with Netbeans
  • As no client ROLLBACK operation can terminate the transaction PMON takes over control and terminates the TX
Setup 
HTTP Session Timeout       : 1800 seconds
JTA Timeout                :   30 seconds
ut.setTransactionTimeout() :   20 seconds
Service routine sleep      :  not active

PMON Tracing 
Enable PMON tracing   for our RAC 12.1.0.2 database 
SQL> alter system set event="10246 trace name context forever, level 1" scope=spfile sid='*';
SQL> startup force
SQL> show parameter event
SQL>
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
even t                                string      10246 trace name context forever, level 1

PMON trace 
*** 2015-09-18 11:47:04.143
 bno=1 timeout=20 dtime=1442569601 ctime=1442569623 ser=1 evt=1
ksuprog() called at ktur.c:3906
[claim lock 0x74cb0100 for dead session/txn in procp 0x76b29430.18437][hist x424224a1]
deleted branch 0x72558fc0 in first attempt
PMON last posted from location=FILE:/ade/b/2502491802/oracle/rdbms/src/hdir/ksa2.h LINE:294 ID:ksasnr, process=47, post_num=41

-> timeout=20  translates to our JTA timing set with  ut.setTransactionTimeout(20); 
-> The RM ( Oracle Database ) has terminated this Transaction 
-> The transaction timeout was to 20 [ timeout=20 ] 

Errors  :
  XA Failure     : XAException.XAER_NOTA: oracle.jdbc.xa.OracleXAException
  ORacle Failure : java.sql.SQLException: ORA-24756: transaction does not exist
 
Java Stack :
Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options jndiName=java:jboss/datasources/xa_rac12g_banka]) failed with exception XAException.XAER_NOTA: oracle.jdbc.xa.OracleXAException
    at oracle.jdbc.driver.T4CXAResource.kputxrec(T4CXAResource.java:965)
    at oracle.jdbc.driver.T4CXAResource.doCommit(T4CXAResource.java:454)
    at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:583)
    at org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:338)
    at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.commit(XAResourceWrapperImpl.java:107)
    at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelOnePhaseCommit(XAResourceRecord.java:679) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.arjuna.coordinator.BasicAction.onePhaseCommit(BasicAction.java:2317) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1475) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:96) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1166) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126) [jbossjta-4.16.6.Final.jar:]
    at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
    at org.jboss.tm.usertx.client.ServerVMClientUserTransaction.commit(ServerVMClientUserTransaction.java:173)
    at com.hhu.wfjpa2el.JPATestBean.timeoutTest(JPATestBean.java:466) [classes:]

Caused by: java.sql.SQLException: ORA-24756: transaction does not exist
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:392)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:385)
    at oracle.jdbc.driver.T4CTTIfun.processError(T4CTTIfun.java:1018)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:522)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
    at oracle.jdbc.driver.T4CTTIOtxen.doOTXEN(T4CTTIOtxen.java:166)
    at oracle.jdbc.driver.T4CXAResource.doTransaction(T4CXAResource.java:757)
    at oracle.jdbc.driver.T4CXAResource.doCommit(T4CXAResource.java:428)

TEST 4 : Understand and Testing DISTRIBUTED_LOCK_TIMEOUT

  • DISTRIBUTED_LOCK_TIMEOUT default value is 60 seconds
  • If a DML can get the lock within DISTRIBUTED_LOCK_TIMEOUT ORA-2049 is thrown
If the locking transaction runs for longer than the database DISTRIBUTED_LOCK_TIMEOUT  (e.g. a single phase trans
action with no timeout) a deadlock  type situation occurs and the XA / Oracle distributed transactions can fail 
with “ORA-02049: timeout: distributed transaction waiting for lock”. If the ordering of timeouts is correct, then 
this error should never be received within a XA distributed transaction since the TM timeout should be exceeded 
before the DISTRIBUTED_LOCK_TIMEOUT is reached

Setup 
HTTP Session Timeout       : 1800 seconds
JTA Timeout                :   30 seconds
ut.setTransactionTimeout() :  120 seconds
DISTRIBUTED_LOCK_TIMEOUT   :   60 seconds

As ut.setTransactionTimeout() < DISTRIBUTED_LOCK_TIMEOUT  the database will throw ORA-2049 errors


Prepare test  
SQL>  show parameter dist
NAME                     TYPE     VALUE
------------------------------------ ----------- ------------------------------
distributed_lock_timeout         integer     60
Note PMON runs only every 60 seconds so we can see a delay between 60 and 120 seconds untill we hot ORA-2049

Lock the records
SQL> select * from emp2 for update;
     EMPNO ENAME      JOB           SAL
---------- ---------- --------- ----------
      9997 Helmut     Progr.         63206

Our JPA code will start a BMT and update the same record resulting in a ORA-2049 error 
  UPDATE EMP2 SET SAL = ? WHERE (EMPNO = ?)

Call flow 
  ut.begin()
  JPA update gets blocked untill we hit ORA-2049 error

Verify the Transation status before getting ORA-2049
SQL> select FORMATID, GLOBALID, BRANCHID, BRANCHES, REFCOUNT, PREPARECOUNT, STATE, FLAGS, COUPLING
           from v$global_transaction;

  FORMATID GLOBALID
---------- ----------------------------------------------------------------
BRANCHID
--------------------------------------------------------------------------------------------------------------------------------
  BRANCHES   REFCOUNT PREPARECOUNT STATE                       FLAGS COUPLING
---------- ---------- ------------ -------------------------------------- ---------- ---------------
    131077 00000000000000000000FFFFC0A805C935ABDAA855FBBB60000003C931
00000000000000000000FFFFC0A805C935ABDAA855FBBB60000003CC0000000200000000
     1        1         0 ACTIVE                       0 TIGHTLY COUPLED


Error Stacks 
12:06:04.535 Error in top level function: timeoutTest()

12:06:04.536 Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLSyntaxErrorException: ORA-02049: timeout: distributed transaction waiting for lock

Error Code: 2049
Call: UPDATE EMP2 SET SAL = ? WHERE (EMPNO = ?)
    bind => [2 parameters bound]
Query: UpdateObjectQuery(com.hhu.wfjpa2el.Emp2[ empno=9997 ])

12:06:04.539 javax.persistence.PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLSyntaxErrorException: ORA-02049: timeout: distributed transaction waiting for lock

Error Code: 2049
Call: UPDATE EMP2 SET SAL = ? WHERE (EMPNO = ?)
    bind => [2 parameters bound]
Query: UpdateObjectQuery(com.hhu.wfjpa2el.Emp2[ empno=9997 ])
    at org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:868)
    at com.hhu.wfjpa2el.JPATestBean.timeoutTest(JPATestBean.java:472)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)


2015-09-18 12:27:05,295 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use 
for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=29, bqual_length=36, 
  tx_uid=0:ffffc0a805c9:35abdaa8:55fbbb60:49, node_name=1, branch_uid=0:ffffc0a805c9:35abdaa8:55fbbb60:4c, subordinatenodename=null, 
  eis_name=java:jboss/datasources/xa_rac12g_banka >, heuristic:   TwoPhaseOutcome.FINISH_OK, product: 
  Oracle/Oracle Database 12c12:27:05,295 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: 
    Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord 
       < resource:null, txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a805c9:35abdaa8:55fbbb60:49, 
       node_name=1, branch_uid=0:ffffc0a805c9:35abdaa8:55fbbb60:4c, subordinatenodename=null, eis_name=java:jboss/datasources/xa_rac12g_banka >, 
    heuristic: TwoPhaseOutcome.FINISH_OK, product: Oracle/Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production


As we don't rollback the Transaction in our sample this is done by the Transaction Reaper Wildfly thread 
"Transaction Reaper Worker 1"
oracle.jdbc.xa.client.OracleXAResource.rollback(OracleXAResource.java:945)
org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.rollback(XAManagedConnection.java:346)
org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.rollback(XAResourceWrapperImpl.java:196)
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelAbort(XAResourceRecord.java:355)
com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2939)
com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2918)
com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1632)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:116)
com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215)
com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:377)
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78)

TEST 5 : Testing HTTP session Timeout

  • In this test the service routine runs 5 seconds longer [ see sleep 20 s ] compared to our HTTP session limit
  • When returned from our service routine the HTTP session object was already deleted
  • Again The XA transaction will be rolled back by the Wildfly Transaction Reaper Worker Thread
Setup :
HTTP Session Timeout                :   15 seconds
JTA Timeout                         :   30 seconds
ut.setTransactionTimeout()          :   not active
Service routine duration/sleep      :   20 seconds 

Java Code : 
  HttpSession session;
  ...
  session.setMaxInactiveInterval(getSessTimeout());

Wildfly Log reports :
15:55:35,839 SEVERE [javax.enterprise.resource.webcontainer.jsf.application] (default task-5) Error Rendering View[/index.xhtml]: java.lang.IllegalStateException: UT000010: Session not found Iu250lTlBLzIu_FlPHWNxq5_
    at io.undertow.server.session.InMemorySessionManager$SessionImpl.getAttribute(InMemorySessionManager.java:353) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
    at io.undertow.servlet.spec.HttpSessionImpl.getAttribute(HttpSessionImpl.java:121) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
    at com.sun.faces.context.SessionMap.get(SessionMap.java:118) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at com.sun.faces.application.view.FaceletViewHandlingStrategy.getResponseEncoding(FaceletViewHandlingStrategy.java:1271) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at com.sun.faces.application.view.FaceletViewHandlingStrategy.createResponseWriter(FaceletViewHandlingStrategy.java:1182) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at com.sun.faces.application.view.FaceletViewHandlingStrategy.renderView(FaceletViewHandlingStrategy.java:403) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at com.sun.faces.application.view.MultiViewHandler.renderView(MultiViewHandler.java:133) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at javax.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:337) [jboss-jsf-api_2.2_spec-2.2.8.jar:2.2.8]
    at javax.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:337) [jboss-jsf-api_2.2_spec-2.2.8.jar:2.2.8]
    at com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:120) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:101) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:219) [jsf-impl-2.2.8-jbossorg-1.jar:]
    at javax.faces.webapp.FacesServlet.service(FacesServlet.java:647) [jboss-jsf-api_2.2_spec-2.2.8.jar:2.2.8]
    at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]

Which Thread is rolling back the transaction ?

Setting a breakpoint on OracleXAResource.rollback shows that the Transaction Reaper is rolling back the Transaction 
"Transaction Reaper Worker 2"
oracle.jdbc.xa.client.OracleXAResource.rollback(OracleXAResource.java:945)
org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.rollback(XAManagedConnection.java:346)
org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.rollback(XAResourceWrapperImpl.java:196)
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelAbort(XAResourceRecord.java:355)
com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2939)
com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2918)
com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1632)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:116)
com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215)
com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:377)
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78)

JAVA Test CODE

 public String timeoutTest()  
      {
        EntityManager em;
        String methodName = "timeoutTest()";
        UserTransaction ut =null;
        cleanall();
          try
          {    
            setRunTimeInfo("Calling " + methodName  +  " in progress .. ");
            setRunTimeInfo("Requesting Entity Manager.. ");
            
            HttpSession session = (HttpSession) FacesContext.getCurrentInstance().getExternalContext().getSession(true);
            if ( session == null)
              {
                throw new IllegalArgumentException(methodName+ ": Could not get HTTP session : ");    
              }                        
            /*
                HTTP session object is not thread safe. We need to syncronoise on session id.
                For datails read :  http://yet-another-dev.blogspot.de/2009/08/synchronizing-httpsession.html
            */
            final Object lock = session.getId().intern();  
            Emp2 e = null;
            long eno = -1;
            synchronized(lock) 
              {
                
                if ( getTestMode() == 4)
                  {
                    setSessTimeout(15);
                    setTxTimeout(100); 
                    session.setMaxInactiveInterval(getSessTimeout());
                    setRunTimeInfo("HTTP SessionTimeout() set to: " 
                            + getSessTimeout()  + " seconds");
                  }
                
                if ( getTestMode() == 3)
                  {
                    setTxTimeout(100);  
                    //setSessTimeout(15);
                    setRunTimeInfo("ut.setTransactionTimeout() set to: " 
                            + getTxTimeout()  + " seconds");
                    
                  } 
                
                if ( getTestMode() == 2)
                  {
                    setTxTimeout(20);  
                    //setSessTimeout(15);
                    setRunTimeInfo("Overriding global JTA Timeout : 30 seconds with  ut.setTransactionTimeout() set to: " 
                            + getTxTimeout()  + " seconds");
                    
                  }  
                if ( getTestMode() == 1)
                  {
                    setTxTimeout(15);    
                    setRunTimeInfo("Overriding global JTA Timeout : 30 seconds with  ut.setTransactionTimeout() set to: " 
                            + getTxTimeout()  + " seconds");
                    
                  }    
                if ( getTestMode() == 0)
                  {
                    setRunTimeInfo("Testing global JTA Timeout : 30 seconds");
                  }   
                
                // startAndSuspendSimple(session);
                trackSession(session, methodName );
                em=getEntityManager();
                eno= getEmpno();              
                setRunTimeInfo("Inside " + methodName + " - Empno: " + eno + " - Thread: " +  Thread.currentThread().getName());                             
                e = em.find(Emp2.class, eno );
                checkEntity(e,em);
                e.setSal(e.getSal().add(increaseSalStep1) );
                ut  = (UserTransaction)new InitialContext().lookup("java:comp/UserTransaction");
                String tx_status_before = returnTXStatus(ut);                  
            
                if ( getTestMode() == 1 ||  getTestMode() == 2 || getTestMode() == 3 )
                  {
                    ut.setTransactionTimeout(getTxTimeout());
                    setRunTimeInfo("JTA Transation started with  ut.setTransactionTimeout() set to  : " + getTxTimeout() + " seconds");
                  }    
                ut.begin();
                em.joinTransaction();
                e.setSal(e.getSal().add(increaseSalStep1) );
                em.merge(e);
                    // Update database record 
                em.flush();
                int sleep_time = 20;
                sleep(sleep_time*1000);
               
                ut.commit();
                setRunTimeInfo("Commit Called ");
               
                String tx_status_after = returnTXStatus(ut);     
                setRunTimeInfo(methodName + ": em.flush() // but NO ut.commit   - Before " + tx_status_before
                    + " - After " + tx_status_after);
                checkEntity(e,em);
                setRunTimeInfo("Closing  Entity Manager.. !");
                closeEntityManager();
              }
          } catch ( Throwable t1)
          { 
            // ut.rollback() operation is missing here !           
            setRunTimeInfo("FATAL ERROR in :  "  + methodName  );
                // Use Throwable as we don't want to loose any important imformation
                // Note: Throwable is super class of Exception class          
            genericException("Error in top level function: " + methodName , (Exception)t1);            
          }
        closeEntityManager();            
        setRunTimeInfo("Leaving " + methodName + " - Entity manager  closed !\n");
        return "index";
      }

 

Reference

Debugging XA Calls in a JPA session running BMT with Netbeans 8.0.2

Overview

BMT calls          JDBC Driver calls 
ut.begin()         oracle.jdbc.xa.client.OracleXAResource   
... 
ut.commit()        oracle.jdbc.xa.client.OracleXAResource.end
                   oracle.jdbc.xa.client.OracleXAResource.commit  <--- Here we want to set a Breakpoint 

Prepare the Breakpoint

Class Name           : oracle.jdbc.xa.client.OracleXAResource
Method Name          : commit
Stop on              : Method Entry 
Actions - Suspend    : All Threads
  •  Select suspend all Threads if you want to avoid that any  Wildly Thread is running a rollback
  •  If you want Transaction Recovery takes place by other Wildfly Threads you may select Suspend Breakpoint Thread
  •  Don’t use  oracle.jdbc.xa.OracleXAResource for debugging as this package only defines abstract methods

Start the Netbeans project in DEBUG MODE

Netbeans Debug -> Debug Main Project 
--> Above action will redeploy our project in DEBUG mode

NetBeans: Deploying on WildFly Application Server
    profile mode: false
    debug mode: true
    force redeploy: true
  Undeploying ...
  Initial deploying WFJPA2EL to /usr/local/wildfly-8.2.0.Final/standalone/deployments/WFJPA2EL-1.0.war
  Completed initial distribution of WFJPA2EL
  Deploying /usr/local/wildfly-8.2.0.Final/standalone/deployments/WFJPA2EL-1.0.war
  Application Deployed

Debugger console reports successfully setting the breakpoint in [oracle.jdbc.xa.client.OracleXAResource].commit
  Attaching to localhost:8787
  MethodBreakpoint [oracle.jdbc.xa.client.OracleXAResource].commit successfully submitted.
  User program running

After the program hits a breakpoint runs Debugger Console should report 
Method breakpoint hit in oracle.jdbc.xa.client.OracleXAResource.commit at line 553 by thread default task-3.
Thread default task-3 stopped at OracleXAResource.java:553.

How to Copy the Stack Trace after our Worker Thread stops at a Breakpoint

Select Debugging TAB
  default task-7  [ this Thread was stopped by our Debugger ]
     Hidden Soruce Calls
       OracleXAResource.java:553   <-- Right Click here and select  COPY STACK
Stack Sample 
"default task-7"
oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:553)
org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:338)
org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.commit(XAResourceWrapperImpl.java:107)
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelOnePhaseCommit(XAResourceRecord.java:679)
com.arjuna.ats.arjuna.coordinator.BasicAction.onePhaseCommit(BasicAction.java:2317)
com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1475)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:96)
com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1166)
com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)
com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
org.jboss.tm.usertx.client.ServerVMClientUserTransaction.commit(ServerVMClientUserTransaction.java:173)
com.hhu.wfjpa2el.JPATestBean.timeoutTest(JPATestBean.java:466)

JPA: Best practice to get and close EntityManagerFactory to avoid Unknown entity bean class error !

Problem Description

After redeploying a WebApplication using JPA you see the following error:

13:02:18.090 Unknown entity bean class: class com.hhu.wfjpa2el.Emp2, 
 please verify that this class has been marked with the @Entity annotation.

13:02:18.091 java.lang.IllegalArgumentException: Unknown entity bean class: class com.hhu.wfjpa2el.Emp2, 
  please verify that this class has been marked with the @Entity annotation.

Solution

  • Restart your WEBServer   or
  • Install a ServletContextListener and close the EMF during contextDestroyed() Event

JAVA Code for implementing a WebListener

package com.hhu.wfjpa2el;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.slf4j.LoggerFactory;

@WebListener
public class EMF implements ServletContextListener {

    private static EntityManagerFactory emf;
    final private static String pu = "jpaELPU";
    final static org.slf4j.Logger logger = LoggerFactory.getLogger(ServletContextListenerImpl.class);
    
    @Override
    public void contextInitialized(ServletContextEvent event) {
        logger.info("+++ ServletContextListener : contextInitialized - Inititalizing EMF for PU: " + pu);
        emf = Persistence.createEntityManagerFactory(pu);
        logger.info("+++ ServletContextListener : contextInitialized - Init EMF done for PU: " + pu);
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        logger.info("+++ ServletContextListener : contextDestroyed - Closing EMF for PU: " + pu);
        emf.close();
        logger.info("+++ ServletContextListener : contextDestroyed - Closed EMF done for PU " + pu);
    }

    public static EntityManager createEntityManager() {
        if (emf == null) {
            throw new IllegalStateException("Context is not initialized yet.");
        }
        return emf.createEntityManager();
    }
}

Initialize your Entity Manager by running :  em = EMF.createEntityManager();
Java Code: 
public static EntityManager getEntityManager() 
      {
        EntityManager em = threadLocal.get();
        if (em == null) 
          {
          // setRunTimeInfo(getRunTimeInfo() + Tools.add_hmtl_pre_tag("Creating Entity Manager ! "  ));
            if ( enableLogger )            
                logger.info("Creating Entity Manager Factory ..." );
            em = EMF.createEntityManager();
            threadLocal.set(em);
          }
        return em;
      }

Testing the Code wiht maven deploy/undeploy command

Deploy the WebApplication 
[oracle@wls1 WFJPA2EL]$  mvn wildfly:deploy

Wilfdly Log report:  
13:11:23,170 INFO  [stdout] (MSC service thread 1-4) 13:11:23 [MSC service thread 1-4] INFO  c.h.w.ServletContextListenerImpl - i
                   +++ ServletContextListener : contextInitialized - Init EMF done for PU: jpaELPU
13:11:23,173 INFO  [javax.enterprise.resource.webcontainer.jsf.config] (MSC service thread 1-4) Initializing Mojarra 2.2.8-jbossorg-1 20140822-1131 for context '/WFJPA2EL-1.0'
13:11:23,518 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) JBAS017534: Registered web context: /WFJPA2EL-1.0
13:11:24,010 INFO  [org.jboss.as.server] (management-handler-thread - 1) JBAS018559: Deployed "WFJPA2EL-1.0.war" (runtime-name : "WFJPA2EL-1.0.war")

Undeploy the WebApplication 
[oracle@wls1 WFJPA2EL]$   mvn wildfly:undeploy 

Wilfdly Log report:  
13:11:23,170 INFO  [stdout] (MSC service thread 1-4) 13:11:23 [MSC service thread 1-4] INFO  c.h.w.ServletContextListenerImpl - i
                    +++ ServletContextListener : contextInitialized - Init EMF done for PU: jpaELPU
13:11:23,173 INFO  [javax.enterprise.resource.webcontainer.jsf.config] (MSC service thread 1-4) Initializing Mojarra 2.2.8-jbossorg-1 20140822-1131 for context '/WFJPA2EL-1.0'
13:11:23,518 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) JBAS017534: Registered web context: /WFJPA2EL-1.0
13:11:24,010 INFO  [org.jboss.as.server] (management-handler-thread - 1) JBAS018559: Deployed "WFJPA2EL-1.0.war" (runtime-name : "WFJPA2EL-1.0.war")
13:13:02,751 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) JBAS017535: Unregistered web context: /WFJPA2EL-1.0

 

Reference