How ASM fixes block corruption with NORMAL redundancy

Overview

ASM always reads data from the primary AU. If the primary AU is corrupted, ASM reads the secondary AU instead.
If the secondary AU is intact, ASM tries to overwrite the corrupted primary AU with the data from the secondary AU.
If that succeeds, the repaired AU remains the primary AU. If the corrupted primary AU cannot be overwritten, ASM
tries to write the data to another location on the disk; if that write succeeds, the new location becomes the
primary AU.
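The decision flow above can be sketched as a small bash simulation. This is a hypothetical illustration, not ASM code: two ordinary files stand in for the primary and secondary copies, the function names are made up for this sketch, and "corrupted" is simplified to "the 8k block contains only zero bytes", which is the condition the alert log reports later in this test. The fallback of writing the data to another location is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of a mirrored read with repair; NOT ASM internals.
# Assumptions: 8k blocks, two ordinary files as the mirror copies, and
# "corrupt" simplified to "the block contains only zero bytes".
BS=8192

# read_block FILE N: write block N of FILE to stdout
read_block() { dd if="$1" bs="$BS" count=1 skip="$2" 2>/dev/null; }

# is_zero_block FILE N: succeed if block N of FILE is all zero bytes
# (GNU cmp -n limits the comparison to one block of the endless /dev/zero)
is_zero_block() { read_block "$1" "$2" | cmp -s -n "$BS" - /dev/zero; }

# mirrored_read PRIMARY SECONDARY N: print block N on stdout, reading the
# primary copy first and repairing it from the secondary copy if it is bad
mirrored_read() {
  local primary=$1 secondary=$2 n=$3
  if ! is_zero_block "$primary" "$n"; then
    read_block "$primary" "$n"          # normal case: primary copy is good
    return 0
  fi
  if is_zero_block "$secondary" "$n"; then
    echo "block $n: both copies corrupt" >&2
    return 1
  fi
  # Overwrite the bad primary block in place with the good secondary copy
  read_block "$secondary" "$n" |
    dd of="$primary" bs="$BS" count=1 seek="$n" conv=notrunc iflag=fullblock 2>/dev/null
  echo "block $n: primary repaired from secondary" >&2
  read_block "$primary" "$n"            # return the repaired block
}
```

For example, after zeroing block 2 of a copy `primary.img` with `dd if=/dev/zero ... seek=2 conv=notrunc`, running `mirrored_read primary.img secondary.img 2` rewrites that block from `secondary.img`, and `cmp primary.img secondary.img` then reports no difference.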

Prepare test case

Checking the OS location of the ASM AUs
SQL> select PXN_KFFXP,  -- physical extent number
            XNUM_KFFXP, -- virtual extent number
            DISK_KFFXP, -- disk number
            AU_KFFXP,   -- allocation unit number
            decode(LXN_KFFXP,0,'Primary',1,'Secondary','header metadata') "AU type"
       from X$KFFXP
      where NUMBER_KFFXP=256 -- ASM file 256
        AND GROUP_KFFXP=4    -- group number 4
      order by 1;

 PXN_KFFXP XNUM_KFFXP DISK_KFFXP   AU_KFFXP AU type
---------- ---------- ---------- ---------- ---------------
         0          0          0        144 Primary
         1          0          1        144 Secondary
         2          1          1        145 Primary
         3          1          0        145 Secondary

Summary
--> Data block offset: 133
--> Database block size: 8k
--> ASM file number: 256
--> ASM disk group number: 4
--> AU size: 1 MB
--> ASM disks: Disk# 0: /dev/asm_test_1G_disk1, Disk# 1: /dev/asm_test_1G_disk2
--> ASM disk /dev/asm_test_1G_disk2 (Disk# 1) holds the primary copy of virtual extent 1 (AU 145)
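The `skip=18565` used with `dd` below follows from these values. With 1 MB AUs and 8k blocks, each AU holds 1048576 / 8192 = 128 blocks, so block 133 falls in virtual extent 133 / 128 = 1, at offset 133 % 128 = 5 inside it. X$KFFXP maps virtual extent 1 to AU 145, so skip = 145 × 128 + 5 = 18565. The helpers below are a hypothetical sketch of that arithmetic; they assume every extent is exactly one AU (as in this small test file) and that AU n starts at byte offset n × AU size on the disk, which is how the `dd` commands in this test address it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for locating a datafile block on an ASM disk with dd.
# Assumes each extent is exactly one AU and that AU n starts at byte offset
# n * AU_SIZE on the disk.

# virtual_extent BLOCK DB_BLOCK_SIZE AU_SIZE: print the virtual extent
# (XNUM_KFFXP) that contains BLOCK
virtual_extent() {
  local block=$1 bsize=$2 au_size=$3
  echo $(( block / (au_size / bsize) ))
}

# dd_skip BLOCK AU_NUMBER DB_BLOCK_SIZE AU_SIZE: print the skip/seek value
# for "dd bs=DB_BLOCK_SIZE" that addresses BLOCK inside AU_NUMBER
dd_skip() {
  local block=$1 au=$2 bsize=$3 au_size=$4
  local per_au=$(( au_size / bsize ))   # blocks per AU (128 for 1 MB / 8k)
  echo $(( au * per_au + block % per_au ))
}
```

Usage: `virtual_extent 133 8192 1048576` prints 1; looking up XNUM_KFFXP = 1 in the query output gives AU 145, and `dd_skip 133 145 8192 1048576` prints 18565.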

Reading the Primary AU
[root@grac41 Desktop]#  dd if=/dev/asm_test_1G_disk2  bs=8k  count=1 skip=18565   | od -a
0017760 stx   A stx  bs   A   S   M   -   T   E   S   T soh ack   i   n

Reading the secondary AU
[root@grac41 Desktop]#  dd if=/dev/asm_test_1G_disk1 bs=8k  count=1 skip=18565   | od -a
0017760 stx   A stx  bs   A   S   M   -   T   E   S   T soh ack   i   n
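Before zeroing the block in the next step, it may be worth saving an untouched copy so the test disk can be put back by hand if the automatic repair does not happen. A minimal sketch, assuming GNU dd and the same 8k block addressing as the commands in this test; the function names and the backup file name are hypothetical, and it should only be pointed at a scratch test disk like the ones used here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to save and restore one 8k block of a device or file.
# BS must match the bs= used by the dd commands of the test.
BS=8192

# save_block DEVICE SKIP BACKUP_FILE: copy block SKIP of DEVICE to BACKUP_FILE
save_block() {
  dd if="$1" of="$3" bs="$BS" count=1 skip="$2" 2>/dev/null
}

# restore_block DEVICE SEEK BACKUP_FILE: write BACKUP_FILE back at block SEEK.
# conv=notrunc only matters for regular files: it keeps dd from truncating
# everything after the written block.
restore_block() {
  dd if="$3" of="$1" bs="$BS" count=1 seek="$2" conv=notrunc 2>/dev/null
}
```

Usage: run `save_block /dev/asm_test_1G_disk2 18565 /tmp/blk18565.bak` before the `dd if=/dev/zero` step, and `restore_block /dev/asm_test_1G_disk2 18565 /tmp/blk18565.bak` only if a manual restore turns out to be needed.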

Erasing an 8k block in the primary AU and verifying the deletion
#  dd if=/dev/zero of=/dev/asm_test_1G_disk2  bs=8k  count=1 seek=18565
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.0345691 s, 237 kB/s
#  dd if=/dev/asm_test_1G_disk2  bs=8k  count=1 skip=18565   | od -a
0000000 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
1+0 records in
1+0 records out

SQL> select * from test_tab;
     N NAME
---------- ----------------
     1 ASM-TEST 
--> The data is still valid. Has ASM already fixed the primary block?

[root@grac41 Desktop]#  dd if=/dev/zero of=/dev/asm_test_1G_disk2  bs=8k  count=1 seek=18565
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.0279776 s, 293 kB/s
--> Block is still corrupted 

Flush the buffer cache so that the next query has to read the block from disk, and monitor the database alert.log
SQL> alter system flush buffer_cache;
System altered.

SQL> select * from test_tab;
         N NAME
---------- ----------------
         1 ASM-TEST

Checking alert.log 
Mon Jul 14 18:09:25 2014
ALTER SYSTEM: Flushing buffer cache
Mon Jul 14 18:09:42 2014
Hex dump of (file 7, block 133) in trace file /u01/app/oracle/diag/rdbms/grac4/grac41/trace/grac41_ora_29037.trc
Corrupt block relative dba: 0x01c00085 (file 7, block 133)
Completely zero block found during multiblock buffer read
Reading datafile '+TEST/grac4/datafile/test_ts.256.852905863' for corruption at rdba: 0x01c00085 (file 7, block 133)
Read datafile mirror 'TEST_0001' (file 7, block 133) found same corrupt data (no logical check)
Read datafile mirror 'TEST_0000' (file 7, block 133) found valid data
Hex dump of (file 7, block 133) in trace file /u01/app/oracle/diag/rdbms/grac4/grac41/trace/grac41_ora_29037.trc
Repaired corruption at (file 7, block 133)
--> The block was repaired with the valid mirror copy read from disk 'TEST_0000' (the secondary copy); the primary copy on 'TEST_0001' was found corrupt (file 7, block 133)
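Instead of scrolling through the whole alert log, the detection and repair messages can be pulled out with `grep` and `sed`. A minimal sketch; the message patterns are copied from the alert.log excerpt above, the function names are hypothetical, and the default path assumes the usual ADR layout, where the alert log `alert_<SID>.log` (here assumed to be `alert_grac41.log`) sits in the same trace directory as the trace file shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to pull block-repair messages out of a database alert
# log. Patterns are copied from the alert.log excerpt in this post; the
# default path is an assumption based on the ADR layout (trace/alert_<SID>.log).
ALERT_LOG=${ALERT_LOG:-/u01/app/oracle/diag/rdbms/grac4/grac41/trace/alert_grac41.log}

# corruption_events [LOG]: print detection, mirror-read and repair messages
corruption_events() {
  local log=${1:-$ALERT_LOG}
  grep -E 'Corrupt block relative dba|Read datafile mirror|Repaired corruption' "$log"
}

# repaired_blocks [LOG]: print "file F, block B" for every repaired block
repaired_blocks() {
  local log=${1:-$ALERT_LOG}
  sed -n 's/^Repaired corruption at (\(file [0-9]*, block [0-9]*\)).*$/\1/p' "$log"
}
```

Against the excerpt above, `corruption_events` returns the "Corrupt block", both "Read datafile mirror" lines and the "Repaired corruption" line, and `repaired_blocks` prints `file 7, block 133`.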

Verify the fix using dd
[root@grac41 Desktop]#  dd if=/dev/asm_test_1G_disk2  bs=8k  count=1 skip=18565   | od -a
0000200 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0017740 nul nul nul nul nul nul nul nul nul nul nul nul nul   , soh stx
0017760 stx   A stx  bs   A   S   M   -   T   E   S   T soh ack   i   n
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied
0020000
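The `od` output only shows that the block is no longer all zeros. To confirm that the repaired primary copy matches the mirror byte for byte, the block can be read from both disks and compared with `cmp`. A minimal sketch; the function name is hypothetical, and it takes a separate skip value per disk because in general the two copies of an extent can sit in different AUs (here both are in AU 145, so both skips are 18565).

```bash
#!/usr/bin/env bash
# Hypothetical helper: byte-for-byte comparison of one 8k block taken from two
# disks (or image files). Uses bash process substitution, so run it with bash.
BS=8192

# same_block DISK_A SKIP_A DISK_B SKIP_B: succeed if both blocks are identical
same_block() {
  cmp -s \
    <(dd if="$1" bs="$BS" count=1 skip="$2" 2>/dev/null) \
    <(dd if="$3" bs="$BS" count=1 skip="$4" 2>/dev/null)
}
```

Usage: `same_block /dev/asm_test_1G_disk2 18565 /dev/asm_test_1G_disk1 18565 && echo "primary and secondary copies match"`.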

Summary

  • ASM automatically repairs a corrupted block in the primary AU.
  • The secondary AU is not used for normal reads; it is read only when the primary copy is corrupted or when a disk fails.
  • The valid secondary copy is used to repair the primary AU.
  • If ASM cannot overwrite the primary AU, it writes the new primary AU to another location on the disk.
  • The detection and repair are recorded in the database alert log.
