Transaction recovery in a RAC env

Problem: Transaction recovery: lock conflict caught and ignored in alert.log

Problem description:
Alert.log :
2PC commit crashes due to TEXT
Error 604 trapped in 2PC on transaction 39.4.8558636. Cleaning up.
Error stack returned to user:
ORA-00604: error occurred at recursive SQL level 1
ORA-06550: line 1, column 7:
PLS-00306: wrong number or types of arguments in call to 'SYNCRN'
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored
---
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored

dba_2pc_pending :
"local_tran_id"   "global_tran_id"                                 "state"         "mixed" "fail_time"                 "force_time"                    "retry_time"              "os_user" "os_terminal" "host" "db_user" "commit#"
79.5.15008388  1145324612.E8CC44F0C9769846A35406B43734A05500000000 forced rollback no 10.21.2014 23:07:59 10.21.2014 23:10:55 10.29.2014 05:43:28 epcx-qa NLWSL1124 DDNS\NLWSL1124 13756556056355
106.24.1303431 1145324612.04FB8C3C6874764FA5BEEA3449ADEB1E00000000 forced rollback no 10.22.2014 03:13:17 10.22.2014 03:25:55 10.29.2014 05:43:28 epcx-qa NLWSL1124 DDNS\NLWSL1124 13756628791127
31.10.19004224 1145324612.E4A2D008CCE7884AA82662FB01BA6C5700000000 forced rollback no 10.28.2014 10:35:39 10.28.2014 10:40:55 10.29.2014 04:25:16 irecruit-qa NLWSL172 DDNS\NLWSL172 13761020351676
--> Admin has forced rollback that session

Lets concentrate on the last TX local_tran_id  31.10.19004224
Tx failed at 10.28.2014 10:35:39
Tx forced at 10.28.2014 10:40:55    <- TX forced rollback
Tx retry   at 10.29.2014 04:25:16   <- Retry fails due to FORCED ROLLBACK

Checking transaction size and status for DEAD transactions  see  (Doc ID 1494886.1)
===========
ktuxeusn –  Undo Segment Number
ktuxeslt   –  Slot number
ktuxesqn –  Sequence
ktuxesta  –  State
ktuxesiz  –  Undo Blocks Remaining
ktuxecfl   –  Flag
================
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo from x$ktuxe
   where ktuxesta <> 'INACTIVE'
   and ktuxecfl like '%DEAD%'
   order by ktuxesiz asc;

      USN       SLOT        SEQ STATE                  UNDO
---------- ---------- ---------- ---------------- ----------
       43          4    8073724 ACTIVE                    1
       80         22   12620576 ACTIVE                    1
      192         21     102488 ACTIVE                    1
      124         16      58091 ACTIVE                    1
       85         30    3817135 ACTIVE                    1 

--> Seems we need only to recover one block !! but tx recovery fails due to some deadlock

Root cause :  Problem is that during a 2PC we are crashing due to TEXT bug + someone have rollbacked that TX with
              rollback force 'LOCAL_TRAN_ID'

C. installed patch: 
Bug 17501296   ORA-604 / PLS-306 attempting to delete rows from table with Text index after upgrade to 11.2.0.4

This should avoid new crashed in 2PC but the old TX are still not recovered.
In alert.log we can see  
Serial Transaction recovery caught exception 30319
Serial Transaction recovery caught exception 30319

Bug 17614504 : SMON: SERIAL TRANSACTION RECOVERY CAUGHT EXCEPTION 30319, HIGH CPU LOAD
 "Serial Transaction recovery caught exception 30319" is just an informative message and indicates that SMON 
 spent more that 2 and 1/2 minutes in txn recovery and the recovery has been interrupted as SMON has other pending
  tasks. This is common when instance is being recovered and there are very long to be recovered transactions.

Diagnostics :   
1) --> Set event : events(10015 and 30305) traces
see Bug 18526660 : SMON TRANSACTION RECOVERY IS VERY SLOW

0) select * from  x$ktuxe 
1) alter session set events 'immediate trace name ktpr_debug level 3';
 -> This should produce an ora* trace file, find it and upload it.
2) oradebug setospid <ospid of SMON process>
   oradebug unlimit
   oradebug dump errorstack 3
   oradebug event 10015 trace name context forever, level 10
   oradebug event 30305 trace name context forever, level 5
3) let it run for 15 mins with the trace events on
4) turn off the debug:
    oradebug setospid <ospid of SMON process>
    oradebug dump errorstack 3
    oradebug event 10015 trace name context off
    oradebug event 30305 trace name context off
5) select x$ktuxe 
  --> Set event : events(10015 and 30305) traces

Apply WA from Bug 18853638
Bug 18853638 - GV$FAST_START_TRANSACTIONS SHOWS STANDSTILL RECOVERING TRANSACTIONS (RCA)
The issue was not resolved until we suggest the customer to set '_smu_debug_mode' to 1024 on 5 nodes out
of the 6 to force the transaction to be recovered only in one instance.

If you have a 4-note cluster run on 3 nodes:
ALTER SYSTEM SET "_smu_debug_mode"=1024;
This will prevent SMON from performing dead-transaction recovery . They can enable txn recovery later at
a more convenient time. The good part is that any data modified by this dead transaction but required by
other transactions will be recovered on "as needed" basis.

To resume transaction recovery do the following:
ALTER SYSTEM SET "_smu_debug_mode"=0;

Reference

 

  • PROCEDURE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY Specification (Doc ID 67590.1)
  • ORA-30019 When Executing Dbms_transaction.Purge_lost_db_entry (Doc ID 290405.1)
  • How To Resolve Stranded DBA_2PC_PENDING Entries (Doc ID 401302.1)
  • Bug 1561125  SMON may stop recovering dead transactions causing sessions to hang
  • Master Note: Troubleshooting Database Transaction Recovery (Doc ID 1494886.1)
  • Bug 18853638 – GV$FAST_START_TRANSACTIONS SHOWS STANDSTILL RECOVERING TRANSACTIONS (RCA)
  • Bug 17614504 : SMON: SERIAL TRANSACTION RECOVERY CAUGHT EXCEPTION 30319, HIGH CPU LOAD
  • Bug 18526660 : SMON TRANSACTION RECOVERY IS VERY SLOW

Leave a Reply

Your email address will not be published. Required fields are marked *