Reconfigure OCR after an OCR corruption

Overview

  • Collect all the data by running the script check_OCR.sh
  • If a PINNED node is found, unpin the node first (see pin-and-unpin-a-node/)
  • All your old CRS information will be lost; we assume that your CRS was corrupted anyway

Reconfigure OCR

Current status 
[grid@grac43 ~]$ olsnodes -n -i -s -t
grac42    1    192.168.1.223    Active    Unpinned
grac43    2    192.168.1.225    Active    Unpinned
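
The PINNED check from the overview can be scripted. The following is only a sketch: the function name pinned_nodes is made up for this example, and it assumes the five-column layout printed by olsnodes -n -i -s -t as shown in this article. The sample rows are copied from this page so the snippet runs without a cluster.

```shell
#!/bin/sh
# Print the names of nodes whose last column is exactly "Pinned".
# Expects `olsnodes -n -i -s -t` style rows on stdin:
#   name  number  vip  status  pin-state
# (pinned_nodes is a hypothetical helper, not an Oracle tool.)
pinned_nodes() {
    awk '$NF == "Pinned" { print $1 }'
}

# Sample rows taken from this article. On a live cluster use:
#   olsnodes -n -i -s -t | pinned_nodes
sample='grac42    1    192.168.1.223    Active    Unpinned
grac43    2    192.168.1.225    Active    Unpinned
grac41    1    192.168.1.250    Inactive    Pinned'

printf '%s\n' "$sample" | pinned_nodes
```

Any node name printed here should be unpinned before you continue.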

[grid@grac42 cluvfy]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
     Version                  :          3
     Total space (kbytes)     :     262120
     Used space (kbytes)      :       4040
     Available space (kbytes) :     258080
     ID                       :  630679368
     Device/File Name         :       +OCR
                                    Device/File

Before going forward, double-check that your OCR diskgroup stores only the OCRFILE and the ASM SPFILE. If not, stop here!
[grid@grac42 cluvfy]$ asmcmd ls +OCR/grac4/OCRFILE/
REGISTRY.255.828888017
[grid@grac42 cluvfy]$  asmcmd ls +OCR/grac4/asmparameterfile/
REGISTRY.253.850819057
spfileASM.ora
--> These files will be deleted later !
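
The "stop here" rule can be turned into a small guard. This is a sketch under assumptions: only_ocr_and_spfile is a hypothetical helper, and it assumes that asmcmd ls +OCR/grac4/ prints one directory name per line (optionally with a trailing slash). The sample listings below are made up for illustration.

```shell
#!/bin/sh
# Succeeds only if every entry on stdin is OCRFILE or ASMPARAMETERFILE
# (case-insensitive, optional trailing slash). Prints anything else.
# (only_ocr_and_spfile is a hypothetical helper for this article.)
only_ocr_and_spfile() {
    awk '
        NF == 0 { next }                  # skip blank lines
        {
            name = toupper($1)
            sub(/\/$/, "", name)          # drop a trailing slash
            if (name != "OCRFILE" && name != "ASMPARAMETERFILE") {
                print "unexpected: " $1
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Illustrative listings. On a live system use something like:
#   asmcmd ls +OCR/grac4/ | only_ocr_and_spfile
printf 'ASMPARAMETERFILE/\nOCRFILE/\n' | only_ocr_and_spfile \
    && echo "only OCRFILE and ASM SPFILE found"
printf 'ASMPARAMETERFILE/\nDATAFILE/\nOCRFILE/\n' | only_ocr_and_spfile \
    || echo "stop here"
```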

Deconfigure GRID stack on grac43
[root@grac43 app]# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11204/grid/crs/install/crsconfig_params
Network exists: 1/192.168.1.0/255.255.255.0/eth1, type dhcp
VIP exists: /192.168.1.249/192.168.1.249/192.168.1.0/255.255.255.0/eth1, hosting node grac42
VIP exists: /192.168.1.213/192.168.1.213/192.168.1.0/255.255.255.0/eth1, hosting node grac43
..
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node

Deconfigure GRID stack on grac42
[root@grac42 ~]#  $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11204/grid/crs/install/crsconfig_params
Network exists: 1/192.168.1.0/255.255.255.0/eth1, type dhcp
VIP exists: /192.168.1.249/192.168.1.249/192.168.1.0/255.255.255.0/eth1, hosting node grac42
GSD exists
ONS exists: Local port 6100, remote port 6200, EM port 2016
..
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node

Once the above command has finished on all remote nodes, run the following as root on the local node:
[root@grac42 ~]#  $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode
Using configuration parameter file: /u01/app/11204/grid/crs/install/crsconfig_params
...

Either /etc/oracle/olr.loc does not exist or is not readable
Make sure the file exists and it has read and execute access
Either /etc/oracle/olr.loc does not exist or is not readable
Make sure the file exists and it has read and execute access
Failure in execution (rc=-1, 256, No such file or directory) for command /etc/init.d/ohasd deinstall
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
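
The ordering matters here: remote nodes first, the local node with -lastnode at the end. A dry-run sketch of that order follows. It only prints the commands and runs nothing. plan_deconfig is a hypothetical helper, and the sketch is simplified: in the run above, grac42 was also deconfigured once without -lastnode before the final call.

```shell
#!/bin/sh
# Print the deconfigure commands in the order described above:
# every remote node first, then the local node with -lastnode.
# Usage: plan_deconfig LOCAL_NODE REMOTE_NODE...
# (plan_deconfig is a hypothetical helper; it prints, it does not execute.)
plan_deconfig() {
    local_node=$1
    shift
    # Single quotes keep $GRID_HOME literal in the printed command.
    cmd='$GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose'
    for node in "$@"; do
        printf '[root@%s] %s\n' "$node" "$cmd"
    done
    printf '[root@%s] %s -lastnode\n' "$local_node" "$cmd"
}

plan_deconfig grac42 grac43
```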

As a first step, try to restore a backup of your OCR here.

If the OCR backup is corrupted too, we need to erase the OCR disks with dd.
It is your responsibility to have a backup of all the data before going forward!
In my case I restored a one-week-old backup, and it still showed the corruption: an already
deleted node with status Inactive - Pinned.
   For details see: Bug 16788764 : 12.1SH0506: UNABLE TO RUN ADDNODE OF AN INACTIVE/PINNED NODE

After restoring the OCR from backup, the corruption was still there.
[grid@grac42 bin]$ olsnodes -n -i -s -t
grac41    1    192.168.1.250    Inactive    Pinned

--> RECREATE OCR by erasing OCR diskgroup !
Check that CRS is down on all nodes and clean up the OCR disk group
[root@grac42 ~]#  ls /dev/asm_ocr*
/dev/asm_ocr_2G_disk1  /dev/asm_ocr_2G_disk2  /dev/asm_ocr_2G_disk3

Verify current disk header 
[root@grac42 ~]#   kfed read  /dev/asm_ocr_2G_disk1 | egrep 'type|name'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:                OCR_0000 ; 0x028: length=8
kfdhdb.grpname:                     OCR ; 0x048: length=3
kfdhdb.fgname:                 OCR_0000 ; 0x068: length=8
kfdhdb.capname:                         ; 0x088: length=0
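
Before wiping anything it is worth confirming that a disk really belongs to the OCR disk group. A minimal sketch, assuming the kfed read output format shown above; kfed_grpname is a hypothetical helper, and the sample header is copied from this article.

```shell
#!/bin/sh
# Print the ASM disk group name found in `kfed read` output on stdin.
# Prints nothing if the kfdhdb.grpname field is missing or empty.
# (kfed_grpname is a hypothetical helper for this article.)
kfed_grpname() {
    awk '$1 == "kfdhdb.grpname:" && $2 != ";" { print $2 }'
}

# Header lines copied from this article. On a live system use:
#   kfed read /dev/asm_ocr_2G_disk1 | kfed_grpname
header='kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:                OCR_0000 ; 0x028: length=8
kfdhdb.grpname:                     OCR ; 0x048: length=3
kfdhdb.fgname:                 OCR_0000 ; 0x068: length=8'

grp=$(printf '%s\n' "$header" | kfed_grpname)
if [ "$grp" = "OCR" ]; then
    echo "disk belongs to disk group OCR"
else
    echo "unexpected disk group: '$grp' - do not wipe this disk"
fi
```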

Clean up the OCR disks (use asmcmd ls to check that this DG stores only the OCRFILE and the ASM SPFILE)
[root@grac42 ~]#  dd if=/dev/zero    of=/dev/asm_ocr_2G_disk1   bs=8192  count=1000
[root@grac42 ~]#  dd if=/dev/zero    of=/dev/asm_ocr_2G_disk2  bs=8192  count=1000
[root@grac42 ~]#  dd if=/dev/zero    of=/dev/asm_ocr_2G_disk3  bs=8192  count=1000
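
For reference, the dd calls above overwrite only the start of each disk, which is where the ASM disk header lives, not the whole device. The arithmetic, as a small sketch using the bs and count values from the commands above:

```shell
#!/bin/sh
# Bytes overwritten per disk by: dd bs=8192 count=1000
bs=8192
count=1000
bytes=$((bs * count))
echo "bytes wiped per disk: $bytes"
echo "KiB wiped per disk:   $((bytes / 1024))"
```

That is 8,192,000 bytes (8000 KiB) per disk.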

Verify ("Invalid OSM block type" means we have successfully erased the disk)
[root@grac42 ~]#  kfed read  /dev/asm_ocr_2G_disk1 | egrep 'type|name'
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
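
The same verification can be scripted, which helps when there are several disks to check. A sketch, assuming the kfed output looks like the two samples in this article; header_erased is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if `kfed read` output on stdin shows a wiped header:
# block type KFBTYP_INVALID and no surviving KFBTYP_DISKHEAD.
# (header_erased is a hypothetical helper for this article.)
header_erased() {
    awk '
        /KFBTYP_DISKHEAD/ { head = 1 }
        /KFBTYP_INVALID/  { invalid = 1 }
        END { exit ((invalid && !head) ? 0 : 1) }
    '
}

# Sample lines copied from this article. On a live system use:
#   kfed read /dev/asm_ocr_2G_disk1 2>&1 | header_erased
before='kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD'
after='kfbh.type:                            0 ; 0x002: KFBTYP_INVALID'

if printf '%s\n' "$before" | header_erased; then
    echo "before dd: erased"
else
    echo "before dd: header still present"
fi

if printf '%s\n' "$after" | header_erased; then
    echo "after dd: erased"
else
    echo "after dd: header still present"
fi
```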

Run config.sh and reconfigure OCR with the +OCR diskgroup
[grid@grac42 ~]$  $GRID_HOME/crs/config/config.sh
[root@grac42 app]# /u01/app/11204/grid/root.sh
..
ASM created and started successfully.
Disk Group OCR created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 6ae38f2006a74f32bf59078c45dfadf6.
Successful addition of voting disk 8f368f3c57344f8fbf83224de1d22882.
Successful addition of voting disk 1888d9ab520a4fa6bf788c2bf04788be.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6ae38f2006a74f32bf59078c45dfadf6 (/dev/asm_ocr_2G_disk1) [OCR]
 2. ONLINE   8f368f3c57344f8fbf83224de1d22882 (/dev/asm_ocr_2G_disk2) [OCR]
 3. ONLINE   1888d9ab520a4fa6bf788c2bf04788be (/dev/asm_ocr_2G_disk3) [OCR]
Located 3 voting disk(s).

...
CRS-2672: Attempting to start 'ora.asm' on 'grac42'
CRS-2676: Start of 'ora.asm' on 'grac42' succeeded
CRS-2672: Attempting to start 'ora.OCR.dg' on 'grac42'
CRS-2676: Start of 'ora.OCR.dg' on 'grac42' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@grac43 Desktop]# /u01/app/11204/grid/root.sh
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node grac42, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
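
As a quick post-check you can count the ONLINE voting disks. This sketch parses the table format printed by root.sh above; as far as I know crsctl query css votedisk prints the same table on a running cluster, but treat that as an assumption about your environment. online_votedisks is a hypothetical helper.

```shell
#!/bin/sh
# Count rows whose STATE column is ONLINE in a voting disk table on stdin.
# (online_votedisks is a hypothetical helper for this article.)
online_votedisks() {
    awk '$2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Table copied from the root.sh output in this article. On a live cluster:
#   crsctl query css votedisk | online_votedisks
table='##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6ae38f2006a74f32bf59078c45dfadf6 (/dev/asm_ocr_2G_disk1) [OCR]
 2. ONLINE   8f368f3c57344f8fbf83224de1d22882 (/dev/asm_ocr_2G_disk2) [OCR]
 3. ONLINE   1888d9ab520a4fa6bf788c2bf04788be (/dev/asm_ocr_2G_disk3) [OCR]
Located 3 voting disk(s).'

echo "online voting disks: $(printf '%s\n' "$table" | online_votedisks)"
```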

Reference

  • root.sh Fails With “Unable to get VIP info for new node at” (Doc ID 1454413.1)
  • How to Deconfigure/Reconfigure(Rebuild OCR) or Deinstall Grid Infrastructure (Doc ID 1377349.1)
