Debugging a Node Eviction using OSWatcher

Overview

  • Note that OSWatcher does not run with real-time (RT) privileges (unlike CHM), so we may miss a lot of interesting records
  • The OSWatcher utilities (vmstat, iostat, ..) may not get scheduled at all if we have CPU, paging or swapping problems
  • Always check the OSWatcher vmstat file for missing records (see the sketch after this list)
  • If records are missing for the eviction time, we can only look at the data collected before the eviction
  • Always check the CHM data as well, as it provides much more detail about the system status during the eviction time
  • Use the OSWatcher graphical tool to check for a high count of blocked processes
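A quick plausibility check for missing records is to count the snapshot markers per vmstat file. A minimal sketch, assuming the usual archive layout with an oswvmstat subdirectory and the default 30-second snapshot interval (about 120 markers per hourly file); the path is only an example:

% ARCHIVE=./grac41/u01/app/11204/grid/oswbb/archive        # adjust to your unzipped archive location
% for f in $ARCHIVE/oswvmstat/*vmstat*.dat
  do
     echo "$(grep -c '^zzz' $f) snapshots in $f"            # far fewer than ~120 hints at missing records
  done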

Create an OSWatcher Analyzer report

Locate OSWatcher files
% find . -name archive
./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive
./tds-2014-03-22/14387023_grac42.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac42/u01/app/11204/grid/oswbb/archive
./tds-2014-03-22/14386169_grac43.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac43/u01/app/11204/grid/oswbb/archive

Unzip OSWatcher archives
% gunzip -r ./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive
% gunzip -r ./tds-2014-03-22/14387023_grac42.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac42/u01/app/11204/grid/oswbb/archive
% gunzip -r ./tds-2014-03-22/14386169_grac43.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac43/u01/app/11204/grid/oswbb/archive
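The per-node gunzip calls can also be collapsed into a loop; a minimal sketch assuming the archive paths found above:

% for a in ./tds-2014-03-22/*.zip/grac4*/u01/app/11204/grid/oswbb/archive
  do
     gunzip -r "$a"
  done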

Create an analyzer file  
 java -jar /home/hhutzler/Tools/SupportBundle_v1_3_1/oswbb/oswbba.jar \
     -i ./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive \
     -S grac41.txt -B Mar 22 9:00:00 2014 -E Mar 22 11:00:00 2014
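To get one analyzer report per node, the oswbba call can be wrapped in a loop as well; a minimal sketch reusing the oswbba.jar location and the time window from above (the node name is derived from the archive path):

% for a in ./tds-2014-03-22/*.zip/grac4*/u01/app/11204/grid/oswbb/archive
  do
     node=$(echo $a | awk -F/ '{print $4}')                 # grac41, grac42, grac43
     java -jar /home/hhutzler/Tools/SupportBundle_v1_3_1/oswbb/oswbba.jar \
          -i $a -S ${node}.txt -B Mar 22 9:00:00 2014 -E Mar 22 11:00:00 2014
  done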

 

Does OSWatcher provide enough data to analyze the problem?

% grep zzz grac41.example.com_vmstat_14.03.22.0900.dat
....
zzz ***Sat Mar 22 09:57:04 CET 2014
zzz ***Sat Mar 22 09:57:34 CET 2014
zzz ***Sat Mar 22 09:58:09 CET 2014

% grep zzz  grac41.example.com_vmstat_14.03.22.1000.dat
zzz ***Sat Mar 22 10:09:35 CET 2014
zzz ***Sat Mar 22 10:10:05 CET 2014
zzz ***Sat Mar 22 10:10:35 CET 2014
zzz ***Sat Mar 22 10:11:05 CET 2014
  • We don't have enough data during the eviction itself, so we may not be able to find the root cause of the problem
  • OSWatcher records are missing from 09:58:09 to 10:09:35 - the sketch below computes such gaps automatically
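Instead of eyeballing the zzz timestamps, the gaps can be computed; a minimal sketch assuming GNU date (which can parse the timestamp format of the zzz marker lines), flagging anything larger than twice the 30-second snapshot interval:

% grep -h '^zzz' grac41.example.com_vmstat_14.03.22.*.dat | sed 's/^zzz \*\*\*//' |
  while read ts
  do
     s=$(date -d "$ts" +%s)
     [ -n "$prev" ] && [ $((s - prev)) -gt 60 ] && echo "gap of $((s - prev)) seconds before $ts"
     prev=$s
  done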

 

Read and interpret OSWatcher Analyzer Data

Analyzing System Status

############################################################################
# Section 1: System Status
#
# This section lists the status of each major subsystem. Status values are:
# Critical: The subsystem requires immediate attention
# Warning:  The subsystem detailed findings should be reviewed
# OK:       The subsystem was found to be okay
# Unknown:  The status of the subsystem could not be determined
#
#
Subsystem       Status
------------------------
CPU             CRITICAL
MEMORY          WARNING
I/O             WARNING
NET             WARNING
--> Need to review all subsystems


############################################################################
# Section 2.0: System Slowdown Summary Ordered By Impact
#
# This section lists the times when the OS started to slowdown. oswbba is
# able to measure this by looking at the timestamps in the individual files
# it collects. It compares the time between the snapshots and looks to see
# how this time differs from the expected timestamp which will be the oswbb
# $ Snapshot Freq value listed at the top of this file. Any slowdowns listed
# in this section will be ordered by the slowdown Secs column.The subsystem
# most likely responsible for the slowdown will be identified here.
#
SnapTime        Variance   Secs      Flags    SubSystem
-----------------------------------------------------------------
Sat Mar 22 09:56:33  1.5     45  0020-00-01   Memory
Sat Mar 22 09:55:48  1.3     39  2200-00-00   CPU
Sat Mar 22 09:55:09  1.1     35  2200-00-00   Memory
Sat Mar 22 09:58:09  1.1     35  2200-00-01   Memory
-->  Both CPU and memory problems are reported as root causes of the system slowdown

Report Summary
SnapTime        Variance   Secs      Flags   Cause(Most Likely)
-----------------------------------------------------------------
Sat Mar 22 09:58:09  1.1     35  2200-30-01   1: System paging memory
                                              2: Large Run Queue
>>>Looking for cause of problem 1: System paging memory
   Advise: The OS is paging memory.
   Reasons: 1. The system is under stress with respect to memory
>>>Looking for cause of problem 2: Large Run Queue
   Advise: Check why run queue is so large
   Reasons: 1. Possible login storm
            2. Possible mutex issue in database (Examine AWR)
--> The above report confirms that the CPU run queue is large and the system is paging

############################################################################
 # Section 3: System General Findings
 #
 # This section lists all general findings that require attention. Each
 # finding has a status along with a subsystem. Further advice may also
 # available regarding the finding.
 #

 CRITICAL: CPU Run Queue observed very high spikes.
  Advise: Check why run queue is so large.
  Check:  The number of processes for possible login storm
  Check:  AWR for possible mutex issue in database (Examine AWR)

 CRITICAL: CPU Running in System Mode observed to be high.
  Advise: Check why large amount of cpu is running in kernel mode.
  Check:  Output of top command top to see what processes are running and using kernel cpu
  Check:  If the system is undersized with respect to CPU capacity

 WARNING: Memory high paging rate observed.
  Advise: The OS is low on free memory.
  Check:  The system is under stress with respect to memory

 WARNING : Disk heavy utilization observed.
  Advise: Check disks to see why utilization is so high.
  Check:  Hot disk: I/O distribution should be evaluated
  Check:  The system is undersized with respect to I/O capacity
  Check:  AWR for SQL regression causing more I/O

 WARNING : Disk high service time observed.
  Advise: Check disks to see why service time is so high.
  Check:  Hot disk: I/O distribution should be evaluated
  Check:  Disk may be defective

 WARNING : Network UDP errors observed.
  Advise: UDP protocol only relevant for RAC. Ignore for Non-RAC
  Advise: Avoid any dropped packets in UDP protocol
  Check:  UDP socket receive buffer on the local machine too small
  Check:  The application not reading the data fast enough
  Check:  Section 7.3 below for more details

Analyzing CPU data

############################################################################
# Section 4.1: CPU RUN QUEUE:
# Run queue should not exceed (Value/#CPU > 3) for any long period of time.
# Below lists the number of times (NUMBER) and percent of the number of times
# (PERCENT) that run queue was High (>3) or Very High (>6). Pay attention to
# high spanning multiple snaps as this represents the number of times run
# queue remained high in back to back snapshots
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>3)                                  12     5.61
Very High (>6)                              7     3.27
High spanning multiple snaps                3      1.4
The following snaps recorded very high run queue values:
SnapTime                      Value   Value/#CPU
------------------------------------------------
Sat Mar 22 09:55:09 UTC 2014     29           14
Sat Mar 22 09:55:48 UTC 2014     20           10
Sat Mar 22 09:57:04 UTC 2014    117           58
Sat Mar 22 09:58:09 UTC 2014     45           22
--> At 09:57:04 there are 58 runnable processes per CPU waiting - this is way too much
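These spikes can also be pulled straight from the raw vmstat data; a minimal sketch assuming the standard Linux vmstat layout where the first column of each sample line is the run queue (r), using the oswbba "Very High" threshold of 6:

% awk '/^zzz/ {ts=$0; next}
       /^ *[0-9]/ && $1+0 > 6 {print ts, " run queue =", $1}' \
      grac41.example.com_vmstat_14.03.22.0900.dat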

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher
# the cpu utilization the longer it will take processes to run.  Below lists
# the number of times (NUMBER) and percent of the number of times (PERCENT)
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>95%)                                 5     2.34
Very High (100%)                            4     1.87
High spanning multiple snaps                2     0.93
CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime
------------------------------
Sat Mar 22 09:55:09 UTC 2014
Sat Mar 22 09:55:48 UTC 2014
Sat Mar 22 09:58:09 UTC 2014
-->  CPU utilization is too high before the Node Eviction occurs at 10:03
     We can't say anything about CPU usage at eviction time, but it can be expected that CPU usage remained
     high during the period of the missing OSWatcher monitoring records


############################################################################
 # Section 4.3:CPU UTILIZATION: PERCENT SYS
 # CPU utilization running in SYSTEM mode should not be greater than 30% over
 # long periods of time. The higher system cpu utilization the longer it will
 # take processes to run. Pay attention to high spanning multiple snaps as it
 # is important that cpu utilization not stay persistently high (>30%)
 #
                                        NUMBER  PERCENT
 Snaps captured in archive                 28   100.00
 High (>30%)                                5    17.86
 Very High (>50%)                           2     7.14
 High spanning multiple snaps               1     3.57
 High values for SYSTEM Mode ( > 30% ) could be related to 
   - High Paging/Swapping activities
   - High Disk or Network I/O
   - Runaway processes issuing a lot of system calls

 CPU UTILIZATION: The following snaps recorded very high percent 
 SnapTime                      Percent
 -----------------------------------
 Sat Mar 22 09:54:34 PDT 2014     53
 Sat Mar 22 09:56:33 PDT 2014     59

CPU UTILIZATION: The following snaps recorded ROOT processes using high percent cpu:
SnapTime                          Pid   CPU   Command
-----------------------------------------------------
Sat Mar 22 09:47:33 UTC 2014     2867  94.8   mp_stress
Sat Mar 22 09:48:03 UTC 2014     3554  91.4   mp_stress
Sat Mar 22 09:48:37 UTC 2014     3554  42.8   mp_stress
Sat Mar 22 09:49:32 UTC 2014     4738  37.1   tfactl.pl
Sat Mar 22 09:49:32 UTC 2014     4946  35.1   tfactl.pl
Sat Mar 22 09:55:11 UTC 2014    14181 328.9   mp_stress
Sat Mar 22 09:55:59 UTC 2014    14181 104.6   mp_stress
Sat Mar 22 09:57:04 UTC 2014    16174 219.0   mp_stress
Sat Mar 22 09:57:34 UTC 2014    16805  52.4   tfactl.pl
Sat Mar 22 10:12:36 UTC 2014    28518  66.5   tfactl.pl
-> Process mp_stress is eating up our CPU

Analyzing Memory Usage

############################################################################
# Section 5.3: MEMORY PAGE IN
# Page in values should be 0 or low. High values (> 25) indicate memory is
# under pressure and may be precursor to swapping. Pay attention to high
# spanning multiple snaps as this value should not stay persistently high
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>25)                                 31    14.49
High spanning multiple snaps               19     8.88

The following snaps recorded very high page in rates:
SnapTime                      Value
-----------------------------------
Sat Mar 22 09:51:33 UTC 2014     32
Sat Mar 22 09:54:34 UTC 2014    312
Sat Mar 22 09:55:09 UTC 2014     32
Sat Mar 22 09:56:33 UTC 2014    624
Sat Mar 22 09:57:04 UTC 2014    352
Sat Mar 22 09:57:34 UTC 2014    664
Sat Mar 22 09:58:09 UTC 2014    128
Sat Mar 22 10:09:35 UTC 2014    292
-> Paging is too high for about 15% of our snapshots before the Node Eviction occurs at 10:03
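The memory pressure can be cross-checked against the swap-in/swap-out rates in the raw vmstat data (a related, though not identical, metric to the oswbba page-in value); a minimal sketch assuming the usual vmstat column order where si and so are columns 7 and 8, reusing the oswbba threshold of 25:

% awk '/^zzz/ {ts=$0; next}
       /^ *[0-9]/ && ($7+0 > 25 || $8+0 > 25) {print ts, " si =", $7, " so =", $8}' \
      grac41.example.com_vmstat_14.03.22.0900.dat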

####################################################################################################################################
Section 5.5: Top 5 Memory Consuming Processes Beginning
# This section list the top 5 memory consuming processes at the start of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem only that it is a top consumer of memory.
SnapTime                             PID        USER    %CPU    %MEM         VSZ         RSS       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:00:52 UTC 2014        2566        root    0.40    6.20     1798816      273796   ../ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 09:00:52 UTC 2014       27215      oracle    0.00    4.30     1663316      187352  ora_dbw0_grac41
Sat Mar 22 09:00:52 UTC 2014       27131      oracle    0.50    3.90     1569328      171356  ora_lms0_grac41
Sat Mar 22 09:00:52 UTC 2014        5661        root    2.90    3.80      981288      168316  /u01/app/11204/grid/bin/ologgerd -M -d . /grac41
Sat Mar 22 09:00:52 UTC 2014       27221      oracle    0.00    3.20     1564988      143556  ora_smon_grac41

####################################################################################################################################
Section 5.6: Top 5 Memory Consuming Processes Ending
# This section list the top 5 memory consuming processes at the end of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem only that it is a top consumer of memory.
SnapTime                             PID        USER    %CPU    %MEM         VSZ         RSS       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 10:59:49 UTC 2014        2566        root    0.40    4.70     1798816      207060  .. /ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 10:59:49 UTC 2014        5661        root    3.00    3.90     1047852      170232  /u01/app/11204/grid/bin/ologgerd -M -d ../grac41
Sat Mar 22 10:59:49 UTC 2014       22565      oracle    0.00    3.10     1554224      135496  ora_mman_grac41
Sat Mar 22 10:59:49 UTC 2014        5283        grid    6.20    2.90     1128680      127744  /u01/app/11204/grid/bin/ocssd.bin
Sat Mar 22 10:59:49 UTC 2014       22578      oracle    0.00    2.60     1560896      114060  ora_smon_grac4
--> Be careful here: our top memory consumer mp_stress is not shown, as it was started after the analysis
      begin time and had already stopped before the oswbba end period
    Always check section 8 for process-related results!

Analyzing Disk IO

############################################################################
# Section 6: Disk Detailed Findings
# This section list only those device which have high percent busy, high service
# times or high wait times
#
############################################################################
# Section 6.1: Disk Percent Busy Findings
# (Only Devices With Percent Busy > 50% Reported)
#
DEVICE: sda PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                21     9.81
Very High (>95%)                           17     7.94
High spanning multiple snaps               14     6.54
The following snaps recorded high percent busy for device: sda
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      77.09
Sat Mar 22 09:49:32 UTC 2014       98.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014      100.0
Sat Mar 22 09:51:03 UTC 2014       99.5

DEVICE: dm-0 PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                17     7.94
Very High (>95%)                            9     4.21
High spanning multiple snaps                9     4.21

The following snaps recorded high percent busy for device: dm-0
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      67.09
Sat Mar 22 09:49:32 UTC 2014       98.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014       82.2
Sat Mar 22 09:51:03 UTC 2014       77.2

DEVICE: dm-1 PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                17     7.94
Very High (>95%)                           16     7.48
High spanning multiple snaps               14     6.54

The following snaps recorded high percent busy for device: dm-1
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      77.01
Sat Mar 22 09:49:32 UTC 2014      88.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014      99.9
Sat Mar 22 09:51:03 UTC 2014      93.9

Here we need to know something about our partition layout to map the dm-X devices back to physical disks.

Check the device-mapper tree
# dmsetup ls --tree -o device
vg_oel64-lv_swap (252:1)
 +- (8:3)                         <-- Major, Minor number from /dev/sdX 
 +- (8:2)
vg_oel64-lv_root (252:0)
 +- (8:2)
Check /dev/mapper 
# ls -l  /dev/mapper/vg*
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_swap -> ../dm-1
Match major/minor number returned from above dmsetup output
# ls -l /dev/sda2 /dev/sda3
brw-rw----. 1 root disk 8, 2 Mar 24 09:07 /dev/sda2 
brw-rw----. 1 root disk 8, 3 Mar 24 09:07 /dev/sda3
--> The root partition and the swap partition point to the same physical disk /dev/sda -> I/O contention
    For our swap partition we see high busy rates > 90% around 09:50 -> increased paging/swapping
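The per-device utilization can also be verified against the raw iostat data in the oswiostat subdirectory of the archive; a minimal sketch assuming iostat -x output where %util is the last column (the exact column layout depends on the sysstat version) and an example file name matching the vmstat naming scheme:

% awk '/^zzz/ {ts=$0}
       /^(sda|dm-0|dm-1) / && $NF+0 > 90 {print ts, " ", $1, " %util =", $NF}' \
      grac41.example.com_iostat_14.03.22.0900.dat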

Analyzing Processes

 ############################################################################
 # Section 8.2: PS for Processes With Status = D, T or W Ordered By Time
 # In this section list all processes captured in the oswbb logs which have a
 # status of D, T or W
 # 
 SnapTime                             PID        USER         CPU    STATUS                 WCHAN  COMMAND
 -----------------------------------------------------------------------------------------------------------------------------------
 Sat Mar 22 09:49:32 PDT 2014        7573        grid         0.0         D                sleep_  asm_rbal_+ASM1
 Sat Mar 22 09:49:32 PDT 2014       31115      oracle         0.0         D                sleep_  ora_cjq0_grac41
 Sat Mar 22 09:49:32 PDT 2014       27487      oracle         0.0         D                sleep_  ora_lck0_grac41
 Sat Mar 22 09:49:32 PDT 2014        4915        root         0.0         D                sleep_  /u01/app/11204/grid/bin/./crsctl.bin stat res procwatcher
 Sat Mar 22 09:49:32 PDT 2014       27213      oracle         0.0         D                sleep_  ora_mman_grac41
 Sat Mar 22 09:49:32 PDT 2014       23293      oracle         0.0         D                sleep_  ora_pz99_grac41a 
...
--> A lot of processes are in disk wait status (D)
    For 2.6 kernels this could be either an I/O problem or, more likely here, a paging/swapping problem
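The number of D-state processes per snapshot can be counted from the raw ps data; a minimal sketch assuming the ps format shown further below in this post, where the process state is the 10th column:

% awk '/^zzz/ {if (ts != "") print n, "D-state processes at", ts; ts=substr($0,8); n=0; next}
       $10 == "D" {n++}
       END {if (ts != "") print n, "D-state processes at", ts}' \
      grac41.example.com_ps_14.03.22.0900.dat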

#######################################################################################
# Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
# In this section list all processes captured in the oswbb logs with process cpu consumption
#  > 0 and system idle cpu < 30%
#
SnapTime                        IDLE_CPU         PID        USER         CPU    STATUS  COMMAND
----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:11 UTC 2014         0.0       14181        root      328.90         S  mp_stress                               
Sat Mar 22 09:55:59 UTC 2014         0.0       14181        root      104.60         S  mp_stress                               
Sat Mar 22 09:57:04 UTC 2014         9.0       16174        root      219.00         S  mp_stress 
-> Process mp_stress is consuming a lot of CPU - there is no idle CPU left from 09:55:11 on

#######################################################################################
# Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
# In this section list all changes in virtual memory allocations per process
#
SnapTime                             PID        USER    %CPU    %MEM         VSZ      CHANGE   %CHANGE       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014       14181        root  205.00   18.50     1090096     +630036    +136.94  ./mp_stress -t 4 -m 5 -p 50 -c 50       
Sat Mar 22 09:56:33 UTC 2014       14181        root  165.00   22.40     1263176     +173080    +15.87   ./mp_stress -t 4 -m 5 -p 50 -c 50
--> Virtual memory for process ./mp_stress is increasing a lot, and its CPU usage is also very high
    Increased CPU and memory usage could be the root cause of a Node Eviction!

#######################################################################################
Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
# In this section list all changes in resident memory allocations per process
#
SnapTime                             PID        USER    %CPU    %MEM         RSS      CHANGE   %CHANGE       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014       14181        root  205.00   18.50      805984     +630016    +358.02   ./mp_stress -t 4 -m 5 -p 50 -c 50       
Sat Mar 22 09:56:33 UTC 2014       14181        root  165.00   22.40      977540     +171556    +21.28    ./mp_stress -t 4 -m 5 -p 50 -c 50
-->  Resident memory for process ./mp_stress increases a lot
     The problem could be either a memory leak or something like a connection storm

 

Using grep to retrieve process priority  from OSWatcher raw data

%  egrep 'zzz|mp_stress|PRI' grac41.example.com_ps_14.03.22.0900.dat 

    USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 09:56:33 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
root     14181  4270  90  165 22.4 1263176 977540 n_tty_ S 09:55:02 00:02:32 ./mp_stress -t 4 -m 5 -p 50 -c 50
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 09:57:04 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 09:57:34 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 10:00:17 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
root     16867  4270  90  108 68.3 4542216 2974540 futex_ S 09:57:38 00:03:44 ./mp_stress -t 4 -m 20 -p 50 -c 200
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
==> The priority of 90 is quite high ( expected, as mp_stress runs as an RT process )
    CPU usage is high too
    Memory usage explodes from 22 % to 68 %
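On a live system the scheduling class of such a process can be verified directly; a minimal sketch (independent of the OSWatcher data), assuming Linux ps with the cls and rtprio format specifiers:

% ps -eo pid,cls,rtprio,pri,ni,comm | awk 'NR == 1 || $2 == "RR" || $2 == "FF"'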

 

Summary

  • Process mp_stress leaks memory, eats up all our CPU and is very likely the root cause of the problem
  • The system is paging and a lot of processes are waiting on disk I/O
  • The CPU run queue is high – after a while most processes migrate to the blocked queue
  • CPU usage is high the whole time
  • As all I/O is redirected to a single physical disk, we see high disk service times and disk waits
  • From the provided OSWatcher data alone we can't pinpoint the root cause of the Node Eviction
  • The root cause could be CPU starvation, paging/swapping or slow disk I/O

 
