TPM for testing Oracle clusterwide XA transactions

Overview

Goal :
 Write a low-level TPM to test Oracle Clusterwide XA Transactions 
 Parallelize work where possible by using queues and threads
 Currently the code runs with 3 threads ( 1 TXProducer and 2 Worker threads ) 
    using 3 queues ( global receiver queue and 2 worker queues ) 

Wishlist - Not yet implemented 
  Using a UCP connection pool to run more threads for a heavier cluster load test  
  Using xa_recover to test XA transaction recovery 

Status
   WARNING : This code isn't well tested - use it at your own risk and run it only on your test system - 
             and of course there is no SUPPORT !

What is coming next  ?

  • Simulating the WebLogic feature: XA Transaction without Transaction Logs - and Optimization
    For details see: WebLogic Server 12.1.3 New JTA feature description - XA Transaction without Transaction Logs - and Optimization

XA transaction layout sample ( run this in parallel )
Instance 1: xa_start DML xa_end  BR 001 
Instance 2: xa_start DML xa_end  BR 002
Instance 3: xa_start DML xa_end  BR 103  ( note: we flag this branch with prefix 10 )
--> This means Instance 3 is our Determiner Resource Manager, flagged with branch prefix 10
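
The branch flag is just a convention on the XA branch qualifier. Below is a minimal sketch of such a flagged XID, assuming a simple Xid implementation and a string-encoded qualifier - this is an illustration, not the Xatest_sw source.

  import javax.transaction.xa.Xid;

  // Sketch: branch qualifier carries the determiner flag, e.g. "001", "002" for
  // normal branches and "103" for the determiner branch (prefix "10").
  class SimpleXid implements Xid {
      private final int formatId;
      private final byte[] gtrid;
      private final byte[] bqual;

      SimpleXid(int formatId, byte[] gtrid, String branch) {
          this.formatId = formatId;
          this.gtrid = gtrid.clone();
          this.bqual = branch.getBytes();   // e.g. "103" = determiner prefix "10" + branch number 3
      }

      public int getFormatId()               { return formatId; }
      public byte[] getGlobalTransactionId() { return gtrid.clone(); }
      public byte[] getBranchQualifier()     { return bqual.clone(); }

      // A branch belongs to the Determiner RM if its qualifier starts with the flag "10"
      boolean isDeterminerBranch() {
          return new String(bqual).startsWith("10");
      }
  }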

Now run xa_prepare on the NON-Determiner Resource Managers Instance 1 and Instance 2 ( run this in parallel )
Instance 1: xa_prepare  BR 001 
Instance 2: xa_prepare  BR 002

Now run prepare on the Determiner RM
Instance 3: xa_prepare BR 103

--> Once this goes through we know that we need to commit the TX, even if the last prepare crashes before returning.
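
A hedged sketch of this prepare ordering with plain JTA XAResource calls; the parameter names and wiring are assumptions, not the Xatest_sw source.

  import javax.transaction.xa.XAException;
  import javax.transaction.xa.XAResource;
  import javax.transaction.xa.Xid;

  // Sketch: prepare all NON-determiner branches first, then the flagged determiner branch last.
  class PrepareOrderSketch {
      static void prepareAll(XAResource[] nonDeterminerRms, Xid[] nonDeterminerXids,
                             XAResource determinerRm, Xid determinerXid) throws XAException {
          // Step 1: prepare BR 001 / BR 002 on Instance 1 and Instance 2 ( could run in parallel )
          for (int i = 0; i < nonDeterminerRms.length; i++) {
              nonDeterminerRms[i].prepare(nonDeterminerXids[i]);
          }
          // Step 2: prepare the determiner branch ( BR 103 ) last - once this call returns,
          // the outcome is "commit", even if the caller crashes before seeing the return value.
          determinerRm.prepare(determinerXid);
      }
  }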

After the crash we need to call xa_recover on all RMs. 
When we get back a flagged pending transaction we need to commit the data by running xa_commit. 

Question:  Do we need to call xa_commit on all instances, as on RAC we only need to commit the last branch? 
           We need to test whether this is true for a recovered TX.

Note: If we don't get back a flagged TX from the Determiner RM we need to roll back.
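
A hedged sketch of this recovery rule ( recovery is still on the wishlist above, not in the TPM ), assuming the determiner flag is encoded as a "10" prefix in the branch qualifier:

  import java.util.ArrayList;
  import java.util.IdentityHashMap;
  import java.util.List;
  import java.util.Map;
  import javax.transaction.xa.XAException;
  import javax.transaction.xa.XAResource;
  import javax.transaction.xa.Xid;

  // Sketch: xa_recover on every RM, commit the pending branches only if a
  // determiner-flagged branch shows up, otherwise roll everything back.
  class RecoverSketch {
      static void recoverPending(List<XAResource> allRms) throws XAException {
          Map<XAResource, List<Xid>> pending = new IdentityHashMap<>();
          boolean determinerPrepared = false;

          for (XAResource rm : allRms) {
              Xid[] xids = rm.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
              List<Xid> found = new ArrayList<>();
              if (xids != null) {
                  for (Xid xid : xids) {
                      found.add(xid);
                      if (new String(xid.getBranchQualifier()).startsWith("10")) {
                          determinerPrepared = true;   // determiner reached prepare -> commit decision stands
                      }
                  }
              }
              pending.put(rm, found);
          }

          for (Map.Entry<XAResource, List<Xid>> e : pending.entrySet()) {
              for (Xid xid : e.getValue()) {
                  if (determinerPrepared) {
                      e.getKey().commit(xid, false);   // open question above: on RAC maybe only one branch needs this
                  } else {
                      e.getKey().rollback(xid);
                  }
              }
          }
      }
  }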

Short code explanation of the TXProducer and XAWorker classes

TXProducer has a generic queue and spawns up to 2 XAWorker threads for handling 2 XA branches in parallel. 
The XAWorkers inherit the global receiver queue RQ from TXProducer and use that queue to send status messages back to the 
TXProducer after running a specific XA operation. The local XAWorker queues are named LQ1 and LQ2. 
The XAWorkers read from their local queues, waiting for worker requests assigned by the TXProducer thread.
Typical message protocol:
Loop :
TXProducer --->  write message  |     local Queue LQ1 created by XAWorker 1     | ---> XAWorker 1 reads from LQ1 
           --->  write message  |     local Queue LQ2 created by XAWorker 2     | ---> XAWorker 2 reads from LQ2 
                              ++++   --> After the XAWorkers have processed their jobs <--- ++++
TXProducer <--- reads  messages | global Receive Queue RQ created by TXProducer | <--- XAWorker 1 writes processing status to RQ 
                                | global Receive Queue RQ created by TXProducer | <--- XAWorker 2 writes processing status to RQ
                              +++ TXProducer reads the XA status messages and assigns the next worker actions +++  
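
A minimal, self-contained sketch of this queue wiring using java.util.concurrent blocking queues; the string payloads and the single worker shown are simplifying assumptions, not the Xatest_sw source.

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  class QueueWiringSketch {
      public static void main(String[] args) {
          BlockingQueue<String> rq  = new LinkedBlockingQueue<>(); // global receiver queue RQ ( TXProducer reads )
          BlockingQueue<String> lq1 = new LinkedBlockingQueue<>(); // local queue LQ1 ( XAWorker 1 reads )
          BlockingQueue<String> lq2 = new LinkedBlockingQueue<>(); // local queue LQ2 ( second worker wired the same way )

          // The worker blocks on its local queue, runs the requested XA step,
          // then reports its status back on the shared receiver queue RQ.
          Runnable worker1 = () -> {
              try {
                  String request = lq1.take();                 // wait for a work request from TXProducer
                  rq.put("worker1:" + request + ":XA_OK");     // report status back to TXProducer
              } catch (InterruptedException ie) {
                  Thread.currentThread().interrupt();
              }
          };
          new Thread(worker1, "XAWorker-1").start();

          // TXProducer side: push a request to LQ1, then wait for the answer on RQ.
          try {
              lq1.put("xa_prepare");
              System.out.println("TXProducer got: " + rq.take());
          } catch (InterruptedException ie) {
              Thread.currentThread().interrupt();
          }
      }
  }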

To test clusterwide XA transactions, XAWorker xaw[0] should connect to Instance 1 and XAWorker xaw[1] should connect to Instance 2.
But for performance testing you can also point both connections to a single instance.
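
A hedged sketch of how each XAWorker could open its XA connection via the Oracle JDBC XA data source; the real wiring lives in the XAWorker class, and the credential handling here is a placeholder assumption.

  import javax.sql.XAConnection;
  import javax.transaction.xa.XAResource;
  import oracle.jdbc.xa.client.OracleXADataSource;

  class XaConnectSketch {
      static XAResource openXaResource(String url, String user, String password) throws Exception {
          OracleXADataSource xaDs = new OracleXADataSource();
          xaDs.setURL(url);            // e.g. jdbc:oracle:thin:@gract1:1521/ERP for xaw[0]
          xaDs.setUser(user);          //      jdbc:oracle:thin:@gract2:1521/ERP for xaw[1]
          xaDs.setPassword(password);

          XAConnection xaConn = xaDs.getXAConnection();
          return xaConn.getXAResource();   // drives xa_start / xa_end / xa_prepare / xa_commit
      }
  }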

TXProducer holds a message array: in_mesg[0] and in_mesg[1]. 
XAWorker xaw[0] uses message mesg[0] and xaw[1] uses message mesg[1] - don't change the thread_id in these specific messages.
Each XAWorker has its own queue to communicate with the TXProducer instance. The TXProducer uses write_message(Message out) to send a message 
to the XAWorker. 
Depending on the message's thread_id the TXProducer picks up the right XAWorker queue by calling 
  s = xaw[ out.get_thread_id()-1].get_queue();
This way we can guarantee that the write_message call will send the message to the correct queue. 
  write_message(in_mesg[0]) ->  in_mesg[0].get_thread_id() returns 1 --> Queue 1 --> Thread 1
  write_message(in_mesg[1]) ->  in_mesg[1].get_thread_id() returns 2 --> Queue 2 --> Thread 2
For generic messages m you need to use m.set_thread_id() ( 1 or 2 ) to address the right queue. 
Note the write_message() API is not clustered and needs to be invoked for every worker thread you want to send an action to.  
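
A self-contained sketch of this routing rule; the Message and XAWorker stubs below are assumptions that only model the thread_id/queue lookup, not the Xatest_sw source.

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  class RoutingSketch {
      static class Message {
          private final int threadId;
          Message(int threadId) { this.threadId = threadId; }
          int get_thread_id()   { return threadId; }
      }

      static class XAWorker {
          private final BlockingQueue<Message> localQueue = new LinkedBlockingQueue<>();
          BlockingQueue<Message> get_queue() { return localQueue; }
      }

      static final XAWorker[] xaw = { new XAWorker(), new XAWorker() };

      // thread_id 1 addresses xaw[0]/LQ1, thread_id 2 addresses xaw[1]/LQ2
      static void write_message(Message out) throws InterruptedException {
          BlockingQueue<Message> s = xaw[out.get_thread_id() - 1].get_queue();
          s.put(out);   // one call per worker - the API is not "clustered"
      }

      public static void main(String[] args) throws InterruptedException {
          write_message(new Message(1));   // lands in LQ1 -> XAWorker 1
          write_message(new Message(2));   // lands in LQ2 -> XAWorker 2
          System.out.println("LQ1 size=" + xaw[0].get_queue().size()
                           + " LQ2 size=" + xaw[1].get_queue().size());
      }
  }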

The read_message() API is different. As we need all branches of an XA transaction synchronized at certain points ( after xa_end, after xa_prepare ) 
before we can schedule the next XA processing step, the read_message API needs to make sure that all outstanding branches/instances have 
returned a correct XA status code to the TXProducer thread. 
A typical scenario:
   TXProducer  sends an xa_prepare message to XAWorker xaw[0] by using queue xaw[0].get_queue() and message in_mesg[0]
   TXProducer  sends an xa_prepare message to XAWorker xaw[1] by using queue xaw[1].get_queue() and message in_mesg[1]
After the XAWorkers have finished their work they send back their responses by using the global receiver queue RQ.

This way of working holds for the xa processing step ( the sequence xa_start() DML xa_end() ) and for the xa_prepare step.
For the xa_commit step we only send a commit message to the node that returned XA_OK ( = 0 ) from xa_prepare ( RAC specific ).
We don't send an xa_commit to an instance returning XA_RDONLY ( = 3 ).
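
A hedged sketch of this synchronization point and the RAC-specific commit decision; the Status type and its fields are assumptions, only the XA_OK / XA_RDONLY constants come from the standard javax.transaction.xa API.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.BlockingQueue;
  import javax.transaction.xa.XAResource;

  class SyncSketch {
      static class Status {
          final int threadId;   // which XAWorker answered ( 1 or 2 )
          final int xaCode;     // XAResource.XA_OK (0) or XAResource.XA_RDONLY (3) from xa_prepare
          Status(int threadId, int xaCode) { this.threadId = threadId; this.xaCode = xaCode; }
      }

      // Waits for one status message per outstanding branch before returning.
      static List<Status> read_message(BlockingQueue<Status> rq, int branches) throws InterruptedException {
          List<Status> replies = new ArrayList<>();
          while (replies.size() < branches) {
              replies.add(rq.take());   // blocks until the next XAWorker reports back via RQ
          }
          return replies;               // only now may the next XA step be scheduled
      }

      // RAC-specific commit decision: only branches that voted XA_OK get an xa_commit message,
      // XA_RDONLY branches are skipped.
      static boolean needsCommit(Status s) {
          return s.xaCode == XAResource.XA_OK;
      }
  }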

Testing XID affinity across instances

One major goal of this program is to test the impact when an XA transaction branch is processed by one instance and prepared by a different instance. 
XA transaction layout using 2 branches Br1, Br2 and switching XIDs with 2PC:
        Instance1                               Instance 2
        xa_start Br1                            xa_start Br2                           
        DML      Br1                            DML      Br2
        xa_end   Br1                            xa_end   Br2                       
                      <--- call switch_xid() ----> 
        xa_prepare Br2 -> XA_OK                 xa_prepare Br1 --> XA_RDONLY
        xa_commit  Br2                          -- do nothing here -- 
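
A hedged sketch of what switch_xids() does to the two producer messages; the Message stub and its accessors are assumptions, not the Xatest_sw source.

  import javax.transaction.xa.Xid;

  // Sketch: after xa_end the two producer messages swap their XIDs, so xa_prepare
  // for each branch runs against the "other" instance's connection.
  class SwitchSketch {
      static class Message {
          private Xid xid;
          Message(Xid xid)      { this.xid = xid; }
          Xid  get_xid()        { return xid; }
          void set_xid(Xid xid) { this.xid = xid; }
      }

      static void switch_xids(Message a, Message b) {
          Xid tmp = a.get_xid();
          a.set_xid(b.get_xid());   // branch Br2's XID is now prepared via instance 1's connection
          b.set_xid(tmp);           // branch Br1's XID is now prepared via instance 2's connection
      }
  }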

To see how the code works set debug level 1 and run with only a single transaction: 
[oracle@gract1 PERF]$ java Xatest_sw xa on 2 1 1 jdbc:oracle:thin:@gract1:1521/ERP jdbc:oracle:thin:@gract2:1521/ERP
..
 + log_XA()  0x1025-00000001.01  Thr_ID:1 Inst:ERP_3 Xa_err:0 Xa_prep:0  xa_end
 + log_XA()  0x1025-00000001.02  Thr_ID:2 Inst:ERP_1 Xa_err:0 Xa_prep:0  xa_end
  ------------- Switching XIDs before XA_prepare  
 + log_XA()  0x1025-00000001.02  Thr_ID:1 Inst:ERP_3 Xa_err:0 Xa_prep:0  xa_prepare
 + log_XA()  0x1025-00000001.01  Thr_ID:2 Inst:ERP_1 Xa_err:0 Xa_prep:3  xa_prepare
 + log_XA()  0x1025-00000001.02  Thr_ID:1 Inst:ERP_3 Xa_err:0 Xa_prep:0  xa_commit
--> The processing step for XID 0x1025-00000001.01 occurs on Instance ERP_3  whereas the prepare step for the same XID happens on Instance ERP_1  
    The processing step for XID 0x1025-00000001.02 occurs on Instance ERP_1  whereas the prepare step for the same XID happens on Instance ERP_3  

Here we can see that the XA transaction switches each XID to the other XA branch's instance for the prepare step.
The following lines are active:
         // Here comes only 2PC-XA processing 
         {
         stats.dump_info(" ------------- Switching XIDs before XA_prepare  " , 1);
         switch_xids(in_mesg[0],  in_mesg[1] ); 

If you comment out these lines ( around line 325 of the source ) the following output should be seen:
       // Here comes only 2PC-XA processing 
         {
         // stats.dump_info(" ------------- Switching XIDs before XA_prepare  " , 1);
         // switch_xids(in_mesg[0],  in_mesg[1] );

 + log_XA()  0x1025-00000001.01  Thr_ID:1 Inst:ERP_3 Xa_err:0 Xa_prep:0  xa_end
 + log_XA()  0x1025-00000001.02  Thr_ID:2 Inst:ERP_1 Xa_err:0 Xa_prep:0  xa_end
 + log_XA()  0x1025-00000001.01  Thr_ID:1 Inst:ERP_3 Xa_err:0 Xa_prep:0  xa_prepare
 + log_XA()  0x1025-00000001.02  Thr_ID:2 Inst:ERP_1 Xa_err:0 Xa_prep:3  xa_prepare
 + log_XA()  0x1025-00000001.01  Thr_ID:1 Inst:ERP_3 Xa_err:0 Xa_prep:0  xa_commit
--> Here XID 0x1025-00000001.01 is processed by the same instance ( ERP_3 ) for the xa processing, xa_prepare and xa_commit steps,
    and XID 0x1025-00000001.02 is processed by the same instance ( ERP_1 ) for the xa processing and xa_prepare steps.

Parameter usage and GRANTs

$ java Xatest_sw xa on 2 5 1 jdbc:oracle:thin:@gract1:1521/ERP jdbc:oracle:thin:@gract2:1521/ERP
                 |  |  | | | |                                 +--> URL for Branch 1
                 |  |  | | | +--  URL for Branch 2
                 |  |  | | +----  Debug level : 0,1,2,3,4
                 |  |  | +------  Number of Transactions  
                 |  |  +--------- Number of Transaction Branches: Valid : 1 or 2
                 |  +------------ 10046 tracing ON/OFF
                 +--------------- Mode : xa/sql  - run XA transaction / run pure SQL tests  
Needed Grants:
 grant select on v_$instance to scott;
 grant select on v_$statname to scott;
 grant select on v_$mystat to scott;
 grant alter session to scott;

Error handling

  - Errors should be printed out as soon as they occur, using the Java exception accessors e.getErrorCode() and e.getMessage() 
  - Set the message status to error :  m.set_status("error") 
  - The TXProducer's read_message() method will check for errors and terminate the program if needed 
  - *** Not yet implemented: here we should rollback/recover failed XA transactions ( ORA-1591 errors )
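
A hedged sketch of this error path; the Message stub is an assumption, not the Xatest_sw source.

  import java.sql.SQLException;

  class ErrorSketch {
      static class Message {
          private String status = "ok";
          void   set_status(String s) { status = s; }
          String get_status()         { return status; }
      }

      // XAWorker side: print the Oracle error as soon as it occurs and flag the status message.
      static void reportError(Message m, SQLException e) {
          System.err.println(" + XA error " + e.getErrorCode() + " : " + e.getMessage());
          m.set_status("error");
      }

      // TXProducer side: read_message() checks the status and stops the test run.
      static void checkStatus(Message m) {
          if ("error".equals(m.get_status())) {
              System.err.println(" + TXProducer: worker reported an error - terminating");
              System.exit(1);    // TODO: rollback/recover failed XA transactions ( ORA-1591 ) here
          }
      }
  }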

 
