Recover RAC database after losing FRA diskgroup

FRA DG content

 

  • storage for multiplexed control file
  • storage for multiplexed ONLINE REDO logs
  • storage for BCT file (Block Change Tracking file)

Prepare test case

[grid@grac41 Desktop]$  asmcmd lsdg FRA
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     40952    31513                0           31513              0             N  FRA/
[grid@grac41 Desktop]$ asmcmd lsdsk -k
Total_MB  Free_MB  OS_MB  Name      Failgroup  Failgroup_Type  Library  Label  UDID  Product  Redund   Path
   20473    15758  20473  FRA_0000  FRA_0000  REGULAR         System                         UNKNOWN  /dev/asmdisk_fra1
   20479    15755  20479  FRA_0001  FRA_0001  REGULAR         System                         UNKNOWN  /dev/asmdisk_fra2
--> FRA DG is using EXTERNAL redundancy with 2 disks ( /dev/asmdisk_fra1, /dev/asmdisk_fra2 )

Find the related device WWID
[root@grac41 ~]# ./check_uuid.sh
/dev/sda  WWID:   1ATA_VBOX_HARDDISK_VBcd7c99fa-dc59f9dd
..
/dev/sdj  WWID:   1ATA_VBOX_HARDDISK_VB1726c4c7-b3bcaccd
/dev/sdk  WWID:   1ATA_VBOX_HARDDISK_VB17a025ba-62aae810
/dev/sdl  WWID:   1ATA_VBOX_HARDDISK_VB0cba64ab-3d0e1451
[root@grac41 ~]#  scsi_id --whitelisted --replace-whitespace --device=/dev/asmdisk_fra2
1ATA_VBOX_HARDDISK_VB1726c4c7-b3bcaccd
--> /dev/sdj is the disk device for FRA partition /dev/asmdisk_fra2

Verify this by comparing the major/minor device numbers: /dev/asmdisk_fra2 (8,145) is the first partition of /dev/sdj (8,144)
[root@grac41 Desktop]# ls -l /dev/sdj
brw-rw----. 1 root disk 8, 144 Jul 12 09:42 /dev/sdj
[root@grac41 Desktop]#  ls -l /dev/asmdisk_fra2
brw-rw----. 1 grid asmadmin 8, 145 Jul 12 09:50 /dev/asmdisk_fra2
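The major/minor numbers above can be decoded by hand. On Linux, SCSI disks on major 8 get 16 minors each, so minor / 16 picks the disk letter and minor % 16 the partition: 144 = 9*16 gives sdj, 145 gives sdj1. A minimal bash sketch of that arithmetic, assuming major 8 only (sda to sdp); `sd_name` is a hypothetical helper, not an Oracle or Linux tool:

```bash
#!/usr/bin/env bash
# Decode a SCSI disk (major, minor) pair into its kernel device name.
# Assumption: major 8 only, i.e. sda..sdp with 16 minors per disk.
sd_name() {
  local major=$1 minor=$2
  local letters=abcdefghijklmnop
  if [ "$major" -ne 8 ] || [ "$minor" -gt 255 ]; then
    echo "only major 8 with minor 0-255 is handled" >&2
    return 1
  fi
  local idx=$(( minor / 16 ))    # disk index: 0 = sda, 9 = sdj
  local part=$(( minor % 16 ))   # 0 = whole disk, 1..15 = partition
  local name="sd${letters:idx:1}"
  if [ "$part" -eq 0 ]; then
    echo "$name"
  else
    echo "${name}${part}"
  fi
}
# Usage: sd_name 8 144   -> sdj   (the whole disk)
#        sd_name 8 145   -> sdj1  (the partition behind /dev/asmdisk_fra2)
```

On a live system the sysfs link /sys/dev/block/8:145 resolves to the same sdj/sdj1 path, which is a quick cross-check.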

Disable I/O to the 2nd FRA disk (do this on all cluster nodes):
echo offline > /sys/block/sdj/device/state
--> Instance crashes - alert log:
WARNING: Write Failed. group:2 disk:1 AU:4497 offset:49152 size:16384
Errors in file /u01/app/oracle/diag/rdbms/grac4/grac41/trace/grac41_ckpt_6406.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 16384
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 321 in group 2 on disk 1 allocation unit 4497 
Errors in file /u01/app/oracle/diag/rdbms/grac4/grac41/trace/grac41_ckpt_6406.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+FRA/grac4/controlfile/current.321.852654927'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /u01/app/oracle/diag/rdbms/grac4/grac41/trace/grac41_ckpt_6406.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+FRA/grac4/controlfile/current.321.852654927'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Sat Jul 12 09:52:20 2014
System state dump requested by (instance=1, osid=6406 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/grac4/grac41/trace/grac41_diag_6378_20140712095220.trc
CKPT (ospid: 6406): terminating the instance due to error 221
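The disk was taken offline with a plain sysfs write. A small bash sketch that wraps this step and also allows undoing it by writing `running` back once the test is over; `set_disk_state` and the `SYSFS_ROOT` override (default /sys, useful for a dry run against a scratch directory) are hypothetical names introduced here:

```bash
#!/usr/bin/env bash
# Set the SCSI state of a disk through sysfs (offline = I/O is rejected,
# running = back to normal). Must run as root on every node.
# SYSFS_ROOT is a hypothetical override (default /sys) for dry runs.
set_disk_state() {
  local disk=$1 state=$2
  local f="${SYSFS_ROOT:-/sys}/block/${disk}/device/state"
  case "$state" in
    offline|running) ;;
    *) echo "state must be offline or running" >&2; return 1 ;;
  esac
  if [ ! -w "$f" ]; then
    echo "cannot write $f" >&2
    return 1
  fi
  echo "$state" > "$f"
  cat "$f"    # show the state now reported (by the kernel or the test file)
}
# Usage: set_disk_state sdj offline   # break the FRA disk
#        set_disk_state sdj running   # make it usable again after the test
```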

Note: the automatic restart by the clusterware (db agent) failed with:
WARNING: Read Failed. group:2 disk:1 AU:4497 offset:16384 size:32768
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 321 in group [2.3426318794] from disk FRA_0001  allocation unit 4497 reason error; if possible, will try another mirror side
NOTE: dependency between database grac4 and diskgroup resource ora.DATA.dg is established
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+FRA/grac4/controlfile/current.321.852654927'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 9209888
Additional information: -1
ORA-205 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:13:25} */...
NOTE: dependency between database grac4 and diskgroup resource ora.FRA.dg is established
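Alert log excerpts like the ones above are easier to read as a list of distinct ORA- errors. A minimal bash sketch, assuming GNU grep; `ora_errors` is a hypothetical helper, and the alert log path in the usage comment assumes the default 11g ADR layout next to the trace files shown above:

```bash
#!/usr/bin/env bash
# List distinct ORA- error codes with their counts, most frequent first.
# Reads the file given as $1, or stdin when no argument is passed.
ora_errors() {
  grep -o 'ORA-[0-9]\{5\}' "${1:--}" | sort | uniq -c | sort -k1,1nr -k2,2
}
# Usage: ora_errors /u01/app/oracle/diag/rdbms/grac4/grac41/trace/alert_grac41.log
```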

Check clusterware status
[root@grac41 Desktop]# crs
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------       
ora.FRA.dg                    ONLINE     ONLINE          grac41        
ora.FRA.dg                    ONLINE     ONLINE          grac42        
ora.FRA.dg                    ONLINE     ONLINE          grac43        
..      
ora.grac4.db                   ONLINE     OFFLINE         grac41       Instance Shutdown
--> Note: it may take some time until ASM dismounts the FRA DG

[grid@grac41 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576     40944    18711            10236            4237              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576     40952    31431                0           31431              0             N  FRA/
MOUNTED  NORMAL  N         512   4096  4194304      6132     4960             2044            1458              0             Y  OCR/

Connect to ASM instance and check for ASM disks after ASM has dismounted the FRA DG
SQL> select dg.name dg_name,  dg.state dg_state,  dg.type, d.name, d.DISK_NUMBER dsk_no, d.MOUNT_STATUS, d.HEADER_STATUS, d.MODE_STATUS,
        d.STATE, d.PATH, d.FAILGROUP  FROM V$ASM_DISK d,  v$asm_diskgroup dg
     where dg.group_number(+)=d.group_number order by dg_name, dsk_no;

DG_NAME    DG_STATE   TYPE   NAME     DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH                 FAILGROUP
---------- ---------- ------ ---------- ------- ------- ------------ ------- -------- ------------------------------ ---------------
FRA       DISMOUNTED                  0 CLOSED    MEMBER       ONLINE  NORMAL   /dev/asmdisk_fra1
FRA       DISMOUNTED                  2 CLOSED    CANDIDATE    ONLINE  NORMAL   /dev/asmdisk_fra2

Verify controlfile access:
[grid@grac41 ~]$ asmcmd ls +FRA/grac4/controlfile/current.321.852654927
ASMCMD-8002: entry 'current.321.852654927' does not exist in directory '+FRA/grac4/controlfile/'
[grid@grac41 ~]$ asmcmd ls
DATA/
OCR/
--> The FRA top-level directory is missing, as ASM has dismounted the FRA DG

Recover database without FRA DG

Try to start the database manually

[oracle@grac41 ~]$ sqlplus / as sysdba
Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1336176640 bytes
Fixed Size            2253024 bytes
Variable Size          469765920 bytes
Database Buffers      855638016 bytes
Redo Buffers            8519680 bytes
ORA-00205: error in identifying control file, check alert log for more info

SQL> show parameter control
NAME                           TYPE     VALUE
------------------------------ ----------- ------------------------------
control_files                  string     +DATA/grac4/controlfile/current.260.826111693, 
                                          +FRA/grac4/controlfile/current.321.8526549

As the FRA DG isn't available anymore, remove the controlfile reference
SQL>  startup force nomount
SQL>  alter system set control_files='+DATA/grac4/controlfile/current.260.826111693' scope = spfile;
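With several control files the new value is easy to mistype. A bash sketch that builds the ALTER SYSTEM statement from the current control_files value while leaving out every file in the lost DG; `cf_without_dg` is a hypothetical helper, and it only prints SQL, it does not run sqlplus:

```bash
#!/usr/bin/env bash
# Print an ALTER SYSTEM statement that keeps only the control files which
# are NOT stored in the lost diskgroup.
# $1 = current control_files value (comma separated), $2 = lost DG, e.g. FRA
cf_without_dg() {
  local value=$1 dg=$2 keep="" f
  local -a files
  IFS=',' read -r -a files <<< "$value"
  for f in "${files[@]}"; do
    f=${f//[[:space:]]/}              # drop blanks around each entry
    [ -n "$f" ] || continue
    case "$f" in
      "+${dg}/"*) continue ;;         # file lives in the lost DG
    esac
    keep="${keep:+$keep,}'$f'"
  done
  if [ -z "$keep" ]; then
    echo "no control file left outside +$dg" >&2
    return 1
  fi
  echo "alter system set control_files=$keep scope=spfile sid='*';"
}
# Usage: cf_without_dg "+DATA/grac4/controlfile/current.260.826111693, +FRA/grac4/controlfile/current.321.852654927" FRA
```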

SQL> startup mount
ORA-01081: cannot start already-running ORACLE - shut it down first
SQL> startup force mount
ORA-01105: mount is incompatible with mounts by other instances
ORA-01104: number of control files (1) does not equal 2
--> As the remaining instances still have access to the FRA DG (and still use 2 control files), shut down these instances
[oracle@grac41 trace]$  srvctl stop instance -d grac4 -i grac42
[oracle@grac41 trace]$  srvctl stop instance -d grac4 -i grac43

SQL> startup force mount 
SQL> show parameter control
NAME                     TYPE     VALUE
-------------------- ----------- ------------------------------
control_files         string     +DATA/grac4/controlfile/current.260.826111693

--> Mount status ok - let's try to open the database  
SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-19751: could not create the change tracking file
ORA-19750: change tracking file: '+FRA/bct.dbf'
ORA-17502: ksfdcre:1 Failed to create file +FRA/bct.dbf
ORA-17501: logical block size 4294967295 is invalid
ORA-17503: ksfdopn:2 Failed to open file +FRA/bct.dbf
ORA-15001: diskgroup "FRA" does not exist or is not mounted
ORA-15001: diskgroup "FRA" does not exist or is not mounted

Disable BLOCK CHANGE TRACKING and try to open the database
SQL>  ALTER DATABASE DISABLE BLOCK CHANGE TRACKING; 
Database altered.
SQL>  alter database open;
 alter database open
*
ERROR at line 1:
ORA-16038: log 6 sequence# 21 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 6 thread 3: '+DATA/grac4/onlinelog/group_6.269.852652289'
ORA-00312: online log 6 thread 3: '+FRA/grac4/onlinelog/group_6.306.852652301'

Verify REDO logs:
   THREAD#     GROUP#  SEQUENCE# STATUS       MEMBER                         TYPE    IS_RDF
---------- ---------- ---------- ---------------- -------------------------------------------------- ------- ------
     1        1          41 INACTIVE      +FRA/grac4/onlinelog/group_1.285.852650241         ONLINE  YES
     1        1          41 INACTIVE      +DATA/grac4/onlinelog/group_1.274.852650227         ONLINE  NO
     1        2          42 CURRENT      +DATA/grac4/onlinelog/group_2.273.852651533         ONLINE  NO
     1        2          42 CURRENT      +FRA/grac4/onlinelog/group_2.298.852651537         ONLINE  YES
     2        3          20 INACTIVE      +DATA/grac4/onlinelog/group_3.272.852652849         ONLINE  NO
     2        3          20 INACTIVE      +FRA/grac4/onlinelog/group_3.318.852652859         ONLINE  YES
     2        4          19 INACTIVE      +DATA/grac4/onlinelog/group_4.266.852652635         ONLINE  NO
     2        4          19 INACTIVE      +FRA/grac4/onlinelog/group_4.294.852652647         ONLINE  YES
     3        5          22 INACTIVE      +FRA/grac4/onlinelog/group_5.305.852652263         ONLINE  YES
     3        5          22 INACTIVE      +DATA/grac4/onlinelog/group_5.270.852652251         ONLINE  NO
     3        6          21 INACTIVE      +FRA/grac4/onlinelog/group_6.306.852652301         ONLINE  YES
     3        6          21 INACTIVE      +DATA/grac4/onlinelog/group_6.269.852652289         ONLINE  NO
--> ONLINE REDO logs still reference the +FRA DG, and log 6 of thread 3 can't be archived as its archive destination (the FRA) is not available

Disable FRA and open database 
SQL>  ALTER SYSTEM SET DB_RECOVERY_FILE_DEST=''  SCOPE=BOTH  SID='*';
System altered.

SQL> alter database open;
Database altered.
Verify REDO logs
   THREAD#     GROUP#  SEQUENCE# STATUS       MEMBER                         TYPE    IS_RDF
---------- ---------- ---------- ---------------- -------------------------------------------------- ------- ------
     1        1          41 INACTIVE      +FRA/grac4/onlinelog/group_1.285.852650241         ONLINE  YES
     1        1          41 INACTIVE      +DATA/grac4/onlinelog/group_1.274.852650227         ONLINE  NO
     1        2          42 CURRENT      +DATA/grac4/onlinelog/group_2.273.852651533         ONLINE  NO
     1        2          42 CURRENT      +FRA/grac4/onlinelog/group_2.298.852651537         ONLINE  YES
     2        3          20 INACTIVE      +DATA/grac4/onlinelog/group_3.272.852652849         ONLINE  NO
     2        3          20 INACTIVE      +FRA/grac4/onlinelog/group_3.318.852652859         ONLINE  YES
     2        4          19 INACTIVE      +DATA/grac4/onlinelog/group_4.266.852652635         ONLINE  NO
     2        4          19 INACTIVE      +FRA/grac4/onlinelog/group_4.294.852652647         ONLINE  YES
     3        5          22 INACTIVE      +FRA/grac4/onlinelog/group_5.305.852652263         ONLINE  YES
     3        5          22 INACTIVE      +DATA/grac4/onlinelog/group_5.270.852652251         ONLINE  NO
     3        6          21 INACTIVE      +FRA/grac4/onlinelog/group_6.306.852652301         ONLINE  YES
     3        6          21 INACTIVE      +DATA/grac4/onlinelog/group_6.269.852652289         ONLINE  NO

Database is open --> let's recreate the FRA DG

Restore FRA DG

Try to mount
SQL> alter diskgroup FRA mount force;
alter diskgroup FRA mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"
--> This is the expected error


Recreate FRA DG (old DG: +FRA, new DG: +FRA2)
--> Completely erase our ASM disk headers so we can reuse the disks!
[root@grac41 Desktop]# dd if=/dev/zero of=/dev/asmdisk_fra1 bs=1024 count=1024
[root@grac41 Desktop]# dd if=/dev/zero of=/dev/asmdisk_fra2 bs=1024 count=1024
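The two dd calls can be wrapped in a guarded helper so a mistyped device name is less likely to be wiped by accident. A bash sketch, assuming the same 1 MB wipe as above is enough for this test; `wipe_asm_header` is a hypothetical name and requires the device path twice as a confirmation:

```bash
#!/usr/bin/env bash
# Zero the first MiB of an ASM disk (same effect as the dd commands above).
# DESTRUCTIVE: the device path must be passed twice as a confirmation.
wipe_asm_header() {
  local dev=$1 confirm=$2
  if [ -z "$dev" ] || [ "$dev" != "$confirm" ]; then
    echo "usage: wipe_asm_header DEVICE DEVICE" >&2
    return 1
  fi
  # conv=notrunc keeps the rest of a regular (test) file intact;
  # it makes no difference for block devices.
  dd if=/dev/zero of="$dev" bs=1024 count=1024 conv=notrunc 2>/dev/null
}
# Usage (as root): wipe_asm_header /dev/asmdisk_fra1 /dev/asmdisk_fra1
```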

Note: dropping the DG with asmca only works after all related ASM disk headers are cleaned up.
As long as the disks are still members of the FRA DG, you can't drop the DG.

Create and enable new FRA DG +FRA2 by using asmca and re-enable DB_RECOVERY_FILE_DEST

SQL> alter system set db_recovery_file_dest_size=40G  scope=both SID='*';
SQL> alter system set db_recovery_file_dest='+FRA2' scope=both SID='*';
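Block change tracking was disabled while +FRA was gone. Once +FRA2 is in place it can be switched back on. A sketch that only prints the SQL; `bct_enable_sql` is a hypothetical helper and the file name `bct.dbf` simply mirrors the old `+FRA/bct.dbf`:

```bash
#!/usr/bin/env bash
# Print the SQL that re-enables block change tracking in a diskgroup.
# $1 = diskgroup, e.g. +FRA2. Prints only; nothing is executed.
bct_enable_sql() {
  local dg=$1
  case "$dg" in
    +?*) ;;
    *) echo "usage: bct_enable_sql +DISKGROUP" >&2; return 1 ;;
  esac
  echo "ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '${dg}/bct.dbf';"
  echo 'SELECT status, filename FROM v$block_change_tracking;'
}
# Usage: bct_enable_sql +FRA2 | sqlplus -s / as sysdba
```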

Verify database setup
SQL>  select log_mode from v$database;
LOG_MODE
------------
ARCHIVELOG


SQL>  archive log list
Database log mode             Archive Mode
Automatic archival            Enabled
Archive destination           USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     7
Next log sequence to archive   8
Current log sequence           8

SQL>  show parameter    db_recovery_file_dest
NAME                     TYPE     VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest             string     +FRA2
db_recovery_file_dest_size         big integer 40G

SQL> alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST';
System altered.
SQL> alter system switch logfile; 
System altered.

Verify that archive logs are in FRA 
[grid@grac41 ~]$ asmcmd ls -l +FRA2/GRAC4/ARCHIVELOG/2014_07_10
Type        Redund  Striped  Time             Sys  Name
ARCHIVELOG  UNPROT  COARSE   JUL 10 17:00:00  Y    thread_1_seq_8.258.852572669
ARCHIVELOG  UNPROT  COARSE   JUL 10 17:00:00  Y    thread_1_seq_9.259.852573253

Drop the logfile members pointing to the old +FRA DG 
   THREAD#     GROUP# STATUS           MEMBER
---------- ---------- ---------------- --------------------------------------------------
     1       11 INACTIVE           +FRA/grac4/onlinelog/group_11.1102.852485687
     1       11 INACTIVE           +DATA/grac4/onlinelog/group_11.271.852485683
     1       12 CURRENT            +FRA/grac4/onlinelog/group_12.1103.852485693
     1       12 CURRENT            +DATA/grac4/onlinelog/group_12.272.852485689
..
SQL>  ALTER DATABASE DROP LOGFILE MEMBER '+FRA/grac4/onlinelog/group_11.1102.852485687';
Database altered.

SQL> ALTER DATABASE DROP LOGFILE MEMBER '+FRA/grac4/onlinelog/group_12.1103.852485693';
ALTER DATABASE DROP LOGFILE MEMBER '+FRA/grac4/onlinelog/group_12.1103.852485693'
*
ERROR at line 1:
ORA-01609: log 12 is the current log for thread 1 - cannot drop members
ORA-00312: online log 12 thread 1: '+DATA/grac4/onlinelog/group_12.272.852485689'
ORA-00312: online log 12 thread 1: '+FRA/grac4/onlinelog/group_12.1103.852485693'

SQL> alter system switch logfile;
System altered.
SQL>  ALTER DATABASE DROP LOGFILE MEMBER '+FRA/grac4/onlinelog/group_12.1103.852485693';
Database altered.
....
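Instead of typing each DROP LOGFILE MEMBER by hand, the statements can be generated from the listing above. A bash/awk sketch, assuming the 4-column THREAD#, GROUP#, STATUS, MEMBER format shown; `drop_member_sql` is a hypothetical helper. Members of the CURRENT group are only reported, since dropping them fails with ORA-01609 until a log switch:

```bash
#!/usr/bin/env bash
# Read the "THREAD# GROUP# STATUS MEMBER" listing from stdin and print
# DROP LOGFILE MEMBER statements for members in the lost DG ($1, e.g. FRA).
drop_member_sql() {
  local dg=$1
  awk -v dg="+${dg}/" '
    NF >= 4 && index($NF, dg) == 1 {
      if ($3 == "CURRENT")
        printf "-- group %s is CURRENT, switch logfile first: %s\n", $2, $NF
      else
        printf "ALTER DATABASE DROP LOGFILE MEMBER '\''%s'\'';\n", $NF
    }'
}
# Usage: save the query output to a file, then: drop_member_sql FRA < members.txt
```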

Backup database and Validate backup

RMAN> run
{
set until time "to_date('2014-13-07:10:25:00','yyyy-dd-mm:hh24:mi:ss')";
restore database preview;
}
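The heading asks for a backup plus a validation, while the run block above only previews a restore. A bash sketch that writes an RMAN command file doing all three; `rman_check_script` is a hypothetical helper, and it uses a yyyy-mm-dd mask rather than the yyyy-dd-mm mask above to avoid mixing up day and month:

```bash
#!/usr/bin/env bash
# Print an RMAN command file: backup, validate, point-in-time restore preview.
# $1 = target time as "YYYY-MM-DD HH24:MI:SS".
rman_check_script() {
  local t=$1
  local re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'
  if ! [[ $t =~ $re ]]; then
    echo "time must look like 2014-07-13 10:25:00" >&2
    return 1
  fi
  cat <<EOF
BACKUP DATABASE PLUS ARCHIVELOG;
RESTORE DATABASE VALIDATE;
RUN {
  SET UNTIL TIME "to_date('$t','yyyy-mm-dd hh24:mi:ss')";
  RESTORE DATABASE PREVIEW;
}
EOF
}
# Usage: rman_check_script '2014-07-13 10:25:00' > /tmp/check.rman
#        rman target / cmdfile=/tmp/check.rman
```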

Multiplex REDO and controlfile for usage of newly created FRA DG
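A bash sketch that prints the statements adding one member in +FRA2 to each redo log group, assuming OMF naming so that giving only the diskgroup name is enough; `add_member_sql` is a hypothetical helper, and the group numbers 1 to 6 in the usage comment come from the redo log listings in this document:

```bash
#!/usr/bin/env bash
# Print ALTER DATABASE ADD LOGFILE MEMBER statements, one per redo group.
# $1 = diskgroup (e.g. +FRA2), remaining args = group numbers.
add_member_sql() {
  local dg=$1 g
  shift
  for g in "$@"; do
    case "$g" in
      ''|*[!0-9]*) echo "invalid group number: $g" >&2; return 1 ;;
    esac
    echo "ALTER DATABASE ADD LOGFILE MEMBER '$dg' TO GROUP $g;"
  done
}
# Usage: add_member_sql +FRA2 1 2 3 4 5 6 | sqlplus -s / as sysdba
```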

Verify FRA status

SQL> @cf

STATUS    NAME                           IS_RDF
------- -------------------------------------------------- ------
    +DATA/grac4/controlfile/current.260.826111693       NO
    +FRA2/grac4/controlfile/current.321.852654927       YES

SQL> select l.thread#, group#, sequence#, l.status,    member,type,  IS_RECOVERY_DEST_FILE is_rdf from v$logfile inner join v$log l
          using (group#)   order by  l.thread#, group#;
   THREAD#     GROUP#  SEQUENCE# STATUS       MEMBER                         TYPE    IS_RDF
---------- ---------- ---------- ---------------- -------------------------------------------------- ------- ------
     1        1          37 CURRENT      +FRA2/grac4/onlinelog/group_1.285.852650241         ONLINE  YES
     1        1          37 CURRENT      +DATA/grac4/onlinelog/group_1.274.852650227         ONLINE  NO
     1        2          36 INACTIVE      +DATA/grac4/onlinelog/group_2.273.852651533         ONLINE  NO
     1        2          36 INACTIVE      +FRA2/grac4/onlinelog/group_2.298.852651537         ONLINE  YES
     2        3          18 CURRENT      +DATA/grac4/onlinelog/group_3.272.852652849         ONLINE  NO
     2        3          18 CURRENT      +FRA2/grac4/onlinelog/group_3.318.852652859         ONLINE  YES
     2        4          17 INACTIVE      +DATA/grac4/onlinelog/group_4.266.852652635         ONLINE  NO
     2        4          17 INACTIVE      +FRA2/grac4/onlinelog/group_4.294.852652647         ONLINE  YES
     3        5          18 CURRENT      +FRA2/grac4/onlinelog/group_5.305.852652263         ONLINE  YES
     3        5          18 CURRENT      +DATA/grac4/onlinelog/group_5.270.852652251         ONLINE  NO
     3        6          17 INACTIVE      +FRA2/grac4/onlinelog/group_6.306.852652301         ONLINE  YES
     3        6          17 INACTIVE      +DATA/grac4/onlinelog/group_6.269.852652289         ONLINE  NO

SQL>  select * from V$RECOVERY_AREA_USAGE;

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROL FILE                .05             0                       1
REDO LOG                    .75             0                       6
ARCHIVED LOG                .54             0                      41
BACKUP PIECE              20.99             0                      18
IMAGE COPY                21.22             0                       6
FLASHBACK LOG                  0            0                       0
FOREIGN ARCHIVED LOG           0            0                       0
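The percentages are relative to db_recovery_file_dest_size (40G here). Summing PERCENT_SPACE_USED gives .05 + .75 + .54 + 20.99 + 21.22 = 43.55 %, i.e. roughly 17.4 GB of the 40 GB. A bash/awk sketch of that arithmetic, reading the listing from stdin; `fra_used` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sum PERCENT_SPACE_USED of a V$RECOVERY_AREA_USAGE listing (stdin) and
# convert it to GB for a db_recovery_file_dest_size given in GB ($1).
fra_used() {
  local size_gb=$1
  awk -v size="$size_gb" '
    # data rows end in three numbers: PCT_USED, PCT_RECLAIMABLE, NUMBER_OF_FILES
    NF >= 4 && $(NF-2) ~ /^[0-9.]+$/ && $(NF-1) ~ /^[0-9.]+$/ && $NF ~ /^[0-9]+$/ {
      pct += $(NF - 2)
    }
    END { printf "%.2f%% used = %.2f GB of %s GB\n", pct, pct * size / 100, size }'
}
# Usage: fra_used 40 < recovery_area_usage.txt
```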

Recovering OCR DG when all disks are lost

OCR diskgroup content

  • Storage for ASM SPFile
  • Storage for Voting Disk
  • Storage for OCR repository

Destroy all ASM disk headers from our OCR diskgroup

[root@grac41 Desktop]# dd if=/dev/zero  of=/dev/asm_ocr_11204_2G_disk3 bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00483026 s, 217 MB/s
[root@grac41 Desktop]# dd if=/dev/zero  of=/dev/asm_ocr_11204_2G_disk1 bs=1024 count=1024
[root@grac41 Desktop]# dd if=/dev/zero  of=/dev/asm_ocr_11204_2G_disk2 bs=1024 count=1024

CW status after restart
[grid@grac41 ~]$ crsi
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     OFFLINE                        
ora.cluster_interconnect.haip  ONLINE     OFFLINE                        
ora.crf                        ONLINE     ONLINE          grac41         
ora.crsd                       ONLINE     OFFLINE                        
ora.cssd                       ONLINE     OFFLINE         STARTING       
ora.cssdmonitor                ONLINE     ONLINE          grac41         
ora.ctssd                      ONLINE     OFFLINE                        
ora.diskmon                    OFFLINE    OFFLINE                        
ora.drivers.acfs               ONLINE     OFFLINE                        
ora.evmd                       ONLINE     OFFLINE                        
ora.gipcd                      ONLINE     ONLINE          grac41         
ora.gpnpd                      ONLINE     ONLINE          grac41         
ora.mdnsd                      ONLINE     ONLINE          grac41       
--> CSSD doesn't start, as none of our voting disks are accessible anymore!

CW alert log reports :
[cssd(22327)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; 
Details at (:CSSNM00070:) in /u01/app/11204/grid/log/grac41/cssd/ocssd.log

Stop CRS and start CRS in exclusive mode:
# crsctl stop crs [-f] 

Verify that all CW processes were stopped
[root@grac41 Desktop]# ps -elf | grep d.bin
0 S root      6934 28649  0  80   0 - 25826 pipe_w 11:37 pts/5    00:00:00 grep d.bin
--> Note you may need to kill remaining CW processes
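`ps -elf | grep d.bin` also lists the grep itself, as seen above. A bash/awk sketch that reads `ps -ef` output and prints only processes whose executable ends in .bin (a broader match than d.bin that also catches agents such as oraagent.bin); `cw_leftovers` is a hypothetical helper, and the 8th column is assumed to be the command, as in standard `ps -ef` output:

```bash
#!/usr/bin/env bash
# Print PID and executable of clusterware *.bin processes still running.
# Input: "ps -ef" output on stdin (UID PID PPID C STIME TTY TIME CMD).
cw_leftovers() {
  # $8 is the executable; a "grep d.bin" line has $8 == "grep" and is skipped
  awk '$1 != "UID" && $8 ~ /\.bin$/ { print $2, $8 }'
}
# Usage: ps -ef | cw_leftovers
```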

Start CW stack in exclusive mode without CRS 
A new option '-nocrs' has been introduced with 11.2.0.2, which prevents the start of the ora.crsd resource.
[root@grac41 Desktop]#  $GRID_HOME/bin/crsctl start crs -excl -nocrs

Connect to ASM instance and check ASM disk status
[grid@grac41 ~]$  sqlplus / as sysasm
SQL> select dg.name dg_name,  dg.state dg_state,  dg.type, d.DISK_NUMBER dsk_no, d.MOUNT_STATUS, d.HEADER_STATUS, d.MODE_STATUS,
        d.STATE, d.PATH, d.FAILGROUP  FROM V$ASM_DISK d,  v$asm_diskgroup dg
     where dg.group_number(+)=d.group_number order by dg_name, dsk_no;
DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH               FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ --------------
                   1 CLOSED  CANDIDATE      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2
                   2 CLOSED  CANDIDATE      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1
                   3 CLOSED  CANDIDATE      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3
--> As we have erased the ASM disk headers with dd, the header status is shown as CANDIDATE

 

Recreate diskgroup, restore OCR backup, configure ASM SPFILE and voting disks

SQL> CREATE DISKGROUP OCR NORMAL REDUNDANCY
       FAILGROUP OCR_000 DISK '/dev/asm_ocr_11204_2G_disk1' NAME disk1
       FAILGROUP OCR_001 DISK '/dev/asm_ocr_11204_2G_disk2' NAME disk2
       FAILGROUP OCR_002 DISK '/dev/asm_ocr_11204_2G_disk3' NAME disk3
       ATTRIBUTE 'au_size'='4M',
          'compatible.asm' = '11.2',
          'compatible.rdbms' = '11.2';
Diskgroup created

Verify OCR backups (check on all nodes to get the most current backup!)
[grid@grac41 ~]$ ocrconfig -showbackup auto
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy
grac41     2014/07/01 04:48:16     /u01/app/11204/grid/cdata/grac4/backup00.ocr
grac41     2014/07/01 00:48:12     /u01/app/11204/grid/cdata/grac4/backup01.ocr 
..
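When backups are listed on several nodes, picking the newest one by eye is error prone. A bash/awk sketch that reads the collected `ocrconfig -showbackup auto` output and prints the node and path of the most recent backup; `newest_ocr_backup` is a hypothetical helper relying on the fixed-width YYYY/MM/DD HH:MI:SS timestamps sorting correctly as text:

```bash
#!/usr/bin/env bash
# Print "NODE PATH" of the newest backup from "ocrconfig -showbackup auto"
# output on stdin. Only lines of the form NODE DATE TIME PATH are used.
newest_ocr_backup() {
  awk 'NF == 4 && $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ {
         print $2, $3, $1, $4      # timestamp first so it drives the sort
       }' |
    LC_ALL=C sort -r | head -n 1 | awk '{ print $3, $4 }'
}
# Usage: collect the output from every node in one file, then
#        newest_ocr_backup < all_nodes_showbackup.txt
```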

Restore the latest OCR backup
As the OCR disk group is created and mounted, the OCR can be restored. This must be done as the root user:
[root@grac41 Desktop]#  $GRID_HOME/bin/ocrconfig -restore /u01/app/11204/grid/cdata/grac4/backup00.ocr

DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH               FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ ---------------
OCR       MOUNTED    NORMAL       0 CACHED  MEMBER      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1      OCR_000
OCR       MOUNTED    NORMAL       1 CACHED  MEMBER      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2      OCR_001
OCR       MOUNTED    NORMAL       2 CACHED  MEMBER      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3      OCR_002
--> Now Disk header status has changed to MEMBER 

Add voting disks
[root@grac41 ~]#  $GRID_HOME/bin/crsctl replace votedisk +OCR
CRS-4602: Failed 27 to add voting file 112b412681a04fa1bfd99c0ed4dc991c.
CRS-4602: Failed 27 to add voting file 0fd25dd59d954f80bfbd4e9f0431fb11.
CRS-4602: Failed 27 to add voting file 36bf8226cc364faebf9b616999771bbc.
Failed to replace voting disk group with +OCR.

--> asm_diskstring needs to be set
SQL> alter system set asm_diskstring="/dev/asm*";
System altered.
SQL>  show parameter asm;
NAME                     TYPE     VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                 string     OCR
asm_diskstring                 string     /dev/asm*

[root@grac41 ~]#  $GRID_HOME/bin/crsctl replace votedisk +OCR
Successful addition of voting disk 7f78f5e770534f74bf4a00e542579bd1.
Successful addition of voting disk 7f75b3f24c064f43bfae9ca2d87329f1.
Successful addition of voting disk 8b2d9f3224c64fe1bf395647615ba722.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced

Recreate ASM SPFILE
SQL> create spfile='+OCR/grac4/asmparameterfile/spfileASM.ora'  from pfile='/home/grid/ASM_SPFILE/initasm.ora'   ;
File created.

Verify SPFILE location, voting disks and OCR DG status after restore

--> Restart CRS 

Verify ASM SPFile location
[grid@grac41 grac41]$ asmcmd spget
+OCR/grac4/asmparameterfile/spfileASM.ora


[grid@grac41 grac41]$  crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   7f78f5e770534f74bf4a00e542579bd1 (/dev/asm_ocr_11204_2G_disk1) [OCR]
 2. ONLINE   7f75b3f24c064f43bfae9ca2d87329f1 (/dev/asm_ocr_11204_2G_disk2) [OCR]
 3. ONLINE   8b2d9f3224c64fe1bf395647615ba722 (/dev/asm_ocr_11204_2G_disk3) [OCR]
Located 3 voting disk(s).

Verify OCR DG 
[grid@grac41 grac41]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304      6132     5236             2044            1596              0             Y  OCR/

Recovering OCR DG when missing a single disk

OCR disk group content

  • tested with GRID 11.2.0.4
  • Storage for ASM SPFile
  • Storage for Voting Disk
  • Storage for OCR repository

Setup test case

Destroy ASM disk header from our OCR diskgroup
[root@grac41 Desktop]# dd if=/dev/zero  of=/dev/asm_ocr_11204_2G_disk3 bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00483026 s, 217 MB/s

Verify CW status after startup 
[root@grac41 Desktop]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Monitor CRS startup 
[grid@grac41 ~]$ watch crsi
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     INTERMEDIATE    grac41       OCR not started
ora.cluster_interconnect.haip  ONLINE     ONLINE          grac41         
ora.crf                        ONLINE     ONLINE          grac41         
ora.crsd                       ONLINE     OFFLINE                        
ora.cssd                       ONLINE     ONLINE          grac41         
ora.cssdmonitor                ONLINE     ONLINE          grac41         
ora.ctssd                      ONLINE     ONLINE          grac41       OBSERVER  
ora.diskmon                    OFFLINE    OFFLINE                        
ora.drivers.acfs               ONLINE     OFFLINE                        
ora.evmd                       ONLINE     INTERMEDIATE    grac41         
ora.gipcd                      ONLINE     ONLINE          grac41         
ora.gpnpd                      ONLINE     ONLINE          grac41         
ora.mdnsd                      ONLINE     ONLINE          grac41   

RAC Alertlog 
[/u01/app/11204/grid/bin/oraagent.bin(15374)]CRS-5019:All OCR locations are on ASM disk groups [OCR], and 
                                             none of these disksgroups are mounted. 
Details are at "(:CLSN00100:)" in "/u01/app/11204/grid/log/grac41/agent/ohasd/oraagent_grid/oraagent_grid.log".
Agent Log
2014-07-05 09:23:32.627: [ora.asm][1002428160]{0:0:2} [start] ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "1"

Check trace for errors:
[grid@grac41 log]$ fn.sh ORA- | egrep 'TraceFile|2014-07-05 09:'
TraceFileName: ./grac41/agent/ohasd/oraagent_grid/oraagent_grid.l01
2014-07-05 09:23:32.627: [ora.asm][1002428160]{0:0:2} [start] ORA-15032: not all alterations performed
2014-07-05 09:23:32.640: [ora.asm][1002428160]{0:0:2} [start] ORA-15100: invalid or missing diskgroup name

Check voting disks    
[grid@grac41 log]$  crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a4452602e3be4f42bf5467f41d96e46a (/dev/asm_ocr_11204_2G_disk1) [OCR]
 2. ONLINE   293e6baa9cbc4f90bf01f52b1b9019fa (/dev/asm_ocr_11204_2G_disk2) [OCR]
 3. OFFLINE  69e2423e2cff4f64bf16c1346a900803 () []
Located 3 voting disk(s).
--> Voting Disk 3 is OFFLINE
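CSS keeps running as long as it can access a strict majority of the voting files (2 of 3 here), which is why ora.cssd is ONLINE above even though disk3 is lost. A bash/awk sketch of that check on `crsctl query css votedisk` output; `vote_majority` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Count ONLINE voting files in "crsctl query css votedisk" output (stdin)
# and report whether a strict majority (more than half) is available.
vote_majority() {
  awk '$1 ~ /^[0-9]+\.$/ { total++; if ($2 == "ONLINE") online++ }
       END {
         need = int(total / 2) + 1
         printf "%d of %d voting files ONLINE, %d needed: %s\n",
                online, total, need, (online >= need ? "OK" : "NO MAJORITY")
       }'
}
# Usage: crsctl query css votedisk | vote_majority
```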

[root@grac41 ~]# ocrcheck -local
Status of Oracle Local Registry is as follows :
     Version                  :          3
     Total space (kbytes)     :     262120
     Used space (kbytes)      :       2676
     Available space (kbytes) :     259444
     ID                       : 1855884304
     Device/File Name         : /u01/app/11204/grid/cdata/grac41.olr
                                    Device/File integrity check succeeded
     Local registry integrity check succeeded
     Logical corruption check succeeded
--> OLR ok

[root@grac41 ~]# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
[root@grac41 ~]# ocrcheck -config
Oracle Cluster Registry configuration is :
     Device/File Name         :       +OCR
--> Local OLR ok - cluster OCR not accessible, as the OCR DG can't be mounted with a missing disk

Find the disk name of our missing 3rd ASM disk
SQL> select dg.name dg_name,  dg.state dg_state,  dg.type, d.DISK_NUMBER dsk_no, d.MOUNT_STATUS, d.HEADER_STATUS, d.MODE_STATUS,
        d.STATE, d.PATH, d.FAILGROUP  FROM V$ASM_DISK d,  v$asm_diskgroup dg
     where dg.group_number(+)=d.group_number order by dg_name, dsk_no;

DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH               FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ ---------------
OCR       DISMOUNTED           1 CLOSED  MEMBER      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2
OCR       DISMOUNTED           2 CLOSED  MEMBER      ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1
OCR       DISMOUNTED           3 CLOSED  CANDIDATE   ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3

Check the ASM disk header with kfed
[root@grac41 Desktop]# kfed read /dev/asm_ocr_11204_2G_disk3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F4C5C91B400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
--> ASM disk header erased 

Verify the disk headers of the remaining disks
[root@grac41 Desktop]#  kfed read /dev/asm_ocr_11204_2G_disk1   | egrep 'type|name'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:                OCR_0003 ; 0x028: length=8
kfdhdb.grpname:                     OCR ; 0x048: length=3
kfdhdb.fgname:                 OCR_0003 ; 0x068: length=8
kfdhdb.capname:                         ; 0x088: length=0
[root@grac41 Desktop]#  kfed read /dev/asm_ocr_11204_2G_disk2 | egrep 'type|name'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:                OCR_0000 ; 0x028: length=8
kfdhdb.grpname:                     OCR ; 0x048: length=3
kfdhdb.fgname:                 OCR_0000 ; 0x068: length=8
kfdhdb.capname:                         ; 0x088: length=0
--> The headers of disks asm_ocr_11204_2G_disk1 and asm_ocr_11204_2G_disk2 are OK (type KFBTYP_DISKHEAD, group OCR)
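Instead of grepping by eye, a single header field can be extracted from a kfed dump. This sketch assumes the dump is piped in and that the value is the second whitespace-separated token, as in the output above; `kfed_field` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one kfed field, e.g.
# kfdhdb.dskname or kfdhdb.grpname, from a kfed dump on stdin.
# Empty fields (such as kfdhdb.capname above) print an empty line.
# Exit status is 1 if the field is not present at all.
kfed_field() {
  awk -v f="$1:" '
    $1 == f {
      # for an empty field the second token is the ";" separator
      print (($2 == ";") ? "" : $2)
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }'
}

# Example: print disk and group name of a header
# kfed read /dev/asm_ocr_11204_2G_disk1 | kfed_field kfdhdb.dskname
# kfed read /dev/asm_ocr_11204_2G_disk1 | kfed_field kfdhdb.grpname
```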

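Add the failed disk back to the OCR DG

Mount the OCR DG with the FORCE option, then re-add the repaired disk. A normal mount fails while a disk of the group is missing; MOUNT FORCE mounts the group anyway and marks the missing disk OFFLINE.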

SQL>  alter diskgroup OCR mount force;
Diskgroup altered.

SQL>  @dg
DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH                           FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ ---------------
OCR        MOUNTED    NORMAL       0 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2    OCR_0000
OCR        MOUNTED    NORMAL       1 MISSING UNKNOWN      OFFLINE NORMAL                                  OCR_0001
OCR        MOUNTED    NORMAL       3 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1    OCR_0003
                                   3 CLOSED  CANDIDATE    ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3

SQL>  alter diskgroup OCR add disk '/dev/asm_ocr_11204_2G_disk3';
Diskgroup altered.

SQL> @dg

DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH                           FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ ---------------
OCR        MOUNTED    NORMAL       0 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2    OCR_0000
OCR        MOUNTED    NORMAL       1 MISSING UNKNOWN      OFFLINE NORMAL                                  OCR_0001
OCR        MOUNTED    NORMAL       2 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3    OCR_0002
OCR        MOUNTED    NORMAL       3 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1    OCR_0003
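The repaired disk comes back as a new member (OCR_0002), while the old entry OCR_0001 stays MISSING. A minimal sketch that lists such leftover entries from spooled @dg output, assuming a MISSING row has an empty PATH so its failgroup is the last field; `missing_disks` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the FAILGROUP of every disk that the @dg
# query reports with MOUNT_STATUS = MISSING. A missing disk has no PATH,
# so its failgroup is the last field on the line.
missing_disks() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "MISSING") { print $NF; break }
  }'
}

# Example (dg_output.txt is an assumed spool file of the @dg query):
# missing_disks < dg_output.txt
```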

Test a dismount/mount cycle before restarting Clusterware (CW)
SQL>  alter diskgroup OCR dismount force;
Diskgroup altered.
SQL>  alter diskgroup OCR mount;
Diskgroup altered.
SQL> @dg
DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH                           FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ ---------------
OCR        MOUNTED    NORMAL       0 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2    OCR_0000
OCR        MOUNTED    NORMAL       1 MISSING UNKNOWN      OFFLINE FORCING                                 OCR_0001
OCR        MOUNTED    NORMAL       2 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3    OCR_0002
OCR        MOUNTED    NORMAL       3 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1    OCR_0003

Restart CW and verify the voting disk status
- The disk in FORCING state (OCR_0001) is removed from the disk group
- The missing voting disk is re-created automatically

DG_NAME    DG_STATE   TYPE    DSK_NO MOUNT_S HEADER_STATU MODE_ST STATE    PATH                           FAILGROUP
---------- ---------- ------ ------- ------- ------------ ------- -------- ------------------------------ ---------------
OCR        MOUNTED    NORMAL       0 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk2    OCR_0000
OCR        MOUNTED    NORMAL       2 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk3    OCR_0002
OCR        MOUNTED    NORMAL       3 CACHED  MEMBER       ONLINE  NORMAL   /dev/asm_ocr_11204_2G_disk1    OCR_0003
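After the restart the group should contain no MISSING, OFFLINE or FORCING entries. A minimal sketch of that check on spooled @dg output, assuming whole-word matches on those state values are sufficient; `dg_clean` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if no disk in the @dg output on stdin
# is MISSING, OFFLINE or FORCING, i.e. the disk group is clean again.
# Offending lines are echoed so they can be inspected.
dg_clean() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "MISSING" || $i == "OFFLINE" || $i == "FORCING") {
        print; bad = 1; break
      }
  }
  END { exit (bad ? 1 : 0) }'
}

# Example (dg_output.txt is an assumed spool file of the @dg query):
# dg_clean < dg_output.txt && echo "OCR DG clean"
```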

[root@grac41 Desktop]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a4452602e3be4f42bf5467f41d96e46a (/dev/asm_ocr_11204_2G_disk1) [OCR]
 2. ONLINE   293e6baa9cbc4f90bf01f52b1b9019fa (/dev/asm_ocr_11204_2G_disk2) [OCR]
 3. ONLINE   a0967baeb88b4f45bf3ae5da678fecc4 (/dev/asm_ocr_11204_2G_disk3) [OCR]
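A NORMAL-redundancy disk group such as OCR holds three voting files, so the recovery is complete once crsctl again reports three of them ONLINE. A minimal sketch of that check, assuming the crsctl output is piped in; the function name `votedisks_online_ok` and its expected-count argument are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: count the voting files that
# "crsctl query css votedisk" reports as ONLINE (stdin) and succeed only
# if the count equals the expected value given as $1.
votedisks_online_ok() {
  vd_expected=$1
  # data lines look like " 1. ONLINE   <FUID> (<path>) [<DG>]"
  vd_n=$(awk '$2 == "ONLINE" { c++ } END { print c + 0 }')
  echo "ONLINE voting files: $vd_n (expected $vd_expected)"
  [ "$vd_n" -eq "$vd_expected" ]
}

# Example (run as root or grid on any cluster node):
# crsctl query css votedisk | votedisks_online_ok 3
```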