Debugging a Node Eviction using OSWatcher

Overview

  • Note that OSWatcher does not run with real-time (RT) privileges (unlike CHM), so we may miss a lot of interesting records
  • The OSWatcher utilities (vmstat, iostat, ..) may not get scheduled at all if we have CPU, paging or swapping problems
  • Always check the OSWatcher vmstat file for missing records (see the sketch after this list)
  • If records are missing for the eviction time, we can only look at the data collected before the eviction
  • Always check the CHM data as well, as it provides much more detail about the system status during the eviction time
  • Use the OSWatcher graphical tool to check for a high count of blocked processes
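A quick plausibility check for missing records is to count the snapshot markers per vmstat file. A minimal sketch, assuming the usual archive layout with an oswvmstat subdirectory and the default 30-second snapshot interval (about 120 markers per hourly file); the path is only an example:

% ARCHIVE=./grac41/u01/app/11204/grid/oswbb/archive        # adjust to your unzipped archive location
% for f in $ARCHIVE/oswvmstat/*vmstat*.dat
  do
     echo "$(grep -c '^zzz' $f) snapshots in $f"            # far fewer than ~120 hints at missing records
  done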

Create an OSWatcher Analyzer report

Locate OSWatcher files
% find . -name archive
./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive
./tds-2014-03-22/14387023_grac42.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac42/u01/app/11204/grid/oswbb/archive
./tds-2014-03-22/14386169_grac43.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac43/u01/app/11204/grid/oswbb/archive

Unzip OSWatcher archives
% gunzip -r ./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive
% gunzip -r ./tds-2014-03-22/14387023_grac42.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac42/u01/app/11204/grid/oswbb/archive
% gunzip -r ./tds-2014-03-22/14386169_grac43.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac43/u01/app/11204/grid/oswbb/archive
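The per-node gunzip calls can also be collapsed into a loop; a minimal sketch assuming the archive paths found above:

% for a in ./tds-2014-03-22/*.zip/grac4*/u01/app/11204/grid/oswbb/archive
  do
     gunzip -r "$a"
  done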

Create an analyzer file  
 java -jar /home/hhutzler/Tools/SupportBundle_v1_3_1/oswbb/oswbba.jar \
     -i ./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive \
     -S grac41.txt -B Mar 22 9:00:00 2014 -E Mar 22 11:00:00 2014
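To get one analyzer report per node, the oswbba call can be wrapped in a loop as well; a minimal sketch reusing the oswbba.jar location and the time window from above (the node name is derived from the archive path):

% for a in ./tds-2014-03-22/*.zip/grac4*/u01/app/11204/grid/oswbb/archive
  do
     node=$(echo $a | awk -F/ '{print $4}')                 # grac41, grac42, grac43
     java -jar /home/hhutzler/Tools/SupportBundle_v1_3_1/oswbb/oswbba.jar \
          -i $a -S ${node}.txt -B Mar 22 9:00:00 2014 -E Mar 22 11:00:00 2014
  done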

 

Does OSWatcher provide enough data to analyze the problem?

% grep zzz grac41.example.com_vmstat_14.03.22.0900.dat
....
zzz ***Sat Mar 22 09:57:04 CET 2014
zzz ***Sat Mar 22 09:57:34 CET 2014
zzz ***Sat Mar 22 09:58:09 CET 2014

% grep zzz  grac41.example.com_vmstat_14.03.22.1000.dat
zzz ***Sat Mar 22 10:09:35 CET 2014
zzz ***Sat Mar 22 10:10:05 CET 2014
zzz ***Sat Mar 22 10:10:35 CET 2014
zzz ***Sat Mar 22 10:11:05 CET 2014
  • We don't have enough data during the eviction itself, so we may not be able to find the root cause of the problem
  • OSWatcher records are missing from 09:58:09 to 10:09:35 - the sketch below computes such gaps automatically
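Instead of eyeballing the zzz timestamps, the gaps can be computed; a minimal sketch assuming GNU date (which can parse the timestamp format of the zzz marker lines), flagging anything larger than twice the 30-second snapshot interval:

% grep -h '^zzz' grac41.example.com_vmstat_14.03.22.*.dat | sed 's/^zzz \*\*\*//' |
  while read ts
  do
     s=$(date -d "$ts" +%s)
     [ -n "$prev" ] && [ $((s - prev)) -gt 60 ] && echo "gap of $((s - prev)) seconds before $ts"
     prev=$s
  done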

 

Read and interpret OSWatcher Analyzer Data

Analyzing System Status

############################################################################
# Section 1: System Status
#
# This section lists the status of each major subsystem. Status values are:
# Critical: The subsystem requires immediate attention
# Warning:  The subsystem detailed findings should be reviewed
# OK:       The subsystem was found to be okay
# Unknown:  The status of the subsystem could not be determined
#
#
Subsystem       Status
------------------------
CPU             CRITICAL
MEMORY          WARNING
I/O             WARNING
NET             WARNING
--> Need to review all subsystems


############################################################################
# Section 2.0: System Slowdown Summary Ordered By Impact
#
# This section lists the times when the OS started to slowdown. oswbba is
# able to measure this by looking at the timestamps in the individual files
# it collects. It compares the time between the snapshots and looks to see
# how this time differs from the expected timestamp which will be the oswbb
# $ Snapshot Freq value listed at the top of this file. Any slowdowns listed
# in this section will be ordered by the slowdown Secs column.The subsystem
# most likely responsible for the slowdown will be identified here.
#
SnapTime        Variance   Secs      Flags    SubSystem
-----------------------------------------------------------------
Sat Mar 22 09:56:33  1.5     45  0020-00-01   Memory
Sat Mar 22 09:55:48  1.3     39  2200-00-00   CPU
Sat Mar 22 09:55:09  1.1     35  2200-00-00   Memory
Sat Mar 22 09:58:09  1.1     35  2200-00-01   Memory
-->  Both CPU and memory problems are reported as root causes of the system slowdown

Report Summary
SnapTime        Variance   Secs      Flags   Cause(Most Likely)
-----------------------------------------------------------------
Sat Mar 22 09:58:09  1.1     35  2200-30-01   1: System paging memory
                                              2: Large Run Queue
>>>Looking for cause of problem 1: System paging memory
   Advise: The OS is paging memory.
   Reasons: 1. The system is under stress with respect to memory
>>>Looking for cause of problem 2: Large Run Queue
   Advise: Check why run queue is so large
   Reasons: 1. Possible login storm
            2. Possible mutex issue in database (Examine AWR)
--> The above report confirms that the CPU run queue is large and the system is paging

############################################################################
 # Section 3: System General Findings
 #
 # This section lists all general findings that require attention. Each
 # finding has a status along with a subsystem. Further advice may also
 # available regarding the finding.
 #

 CRITICAL: CPU Run Queue observed very high spikes.
  Advise: Check why run queue is so large.
  Check:  The number of processes for possible login storm
  Check:  AWR for possible mutex issue in database (Examine AWR)

 CRITICAL: CPU Running in System Mode observed to be high.
  Advise: Check why large amount of cpu is running in kernel mode.
  Check:  Output of top command top to see what processes are running and using kernel cpu
  Check:  If the system is undersized with respect to CPU capacity

 WARNING: Memory high paging rate observed.
  Advise: The OS is low on free memory.
  Check:  The system is under stress with respect to memory

 WARNING : Disk heavy utilization observed.
  Advise: Check disks to see why utilization is so high.
  Check:  Hot disk: I/O distribution should be evaluated
  Check:  The system is undersized with respect to I/O capacity
  Check:  AWR for SQL regression causing more I/O

 WARNING : Disk high service time observed.
  Advise: Check disks to see why service time is so high.
  Check:  Hot disk: I/O distribution should be evaluated
  Check:  Disk may be defective

 WARNING : Network UDP errors observed.
  Advise: UDP protocol only relevant for RAC. Ignore for Non-RAC
  Advise: Avoid any dropped packets in UDP protocol
  Check:  UDP socket receive buffer on the local machine too small
  Check:  The application not reading the data fast enough
  Check:  Section 7.3 below for more details

Analyzing CPU data

############################################################################
# Section 4.1: CPU RUN QUEUE:
# Run queue should not exceed (Value/#CPU > 3) for any long period of time.
# Below lists the number of times (NUMBER) and percent of the number of times
# (PERCENT) that run queue was High (>3) or Very High (>6). Pay attention to
# high spanning multiple snaps as this represents the number of times run
# queue remained high in back to back snapshots
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>3)                                  12     5.61
Very High (>6)                              7     3.27
High spanning multiple snaps                3      1.4
The following snaps recorded very high run queue values:
SnapTime                      Value   Value/#CPU
------------------------------------------------
Sat Mar 22 09:55:09 UTC 2014     29           14
Sat Mar 22 09:55:48 UTC 2014     20           10
Sat Mar 22 09:57:04 UTC 2014    117           58
Sat Mar 22 09:58:09 UTC 2014     45           22
--> At 09:57:04 there are 58 runnable processes per CPU waiting - this is way too much
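These spikes can also be pulled straight from the raw vmstat data; a minimal sketch assuming the standard Linux vmstat layout where the first column of each sample line is the run queue (r), using the oswbba "Very High" threshold of 6:

% awk '/^zzz/ {ts=$0; next}
       /^ *[0-9]/ && $1+0 > 6 {print ts, " run queue =", $1}' \
      grac41.example.com_vmstat_14.03.22.0900.dat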

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher
# the cpu utilization the longer it will take processes to run.  Below lists
# the number of times (NUMBER) and percent of the number of times (PERCENT)
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>95%)                                 5     2.34
Very High (100%)                            4     1.87
High spanning multiple snaps                2     0.93
CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime
------------------------------
Sat Mar 22 09:55:09 UTC 2014
Sat Mar 22 09:55:48 UTC 2014
Sat Mar 22 09:58:09 UTC 2014
-->  CPU utilization is too high before the Node Eviction occurs at 10:03
     We can't say anything about CPU usage at eviction time, but it can be expected that CPU usage remained
     high during the period of the missing OSWatcher monitoring records


############################################################################
 # Section 4.3:CPU UTILIZATION: PERCENT SYS
 # CPU utilization running in SYSTEM mode should not be greater than 30% over
 # long periods of time. The higher system cpu utilization the longer it will
 # take processes to run. Pay attention to high spanning multiple snaps as it
 # is important that cpu utilization not stay persistently high (>30%)
 #
                                        NUMBER  PERCENT
 Snaps captured in archive                 28   100.00
 High (>30%)                                5    17.86
 Very High (>50%)                           2     7.14
 High spanning multiple snaps               1     3.57
 High values for SYSTEM Mode ( > 30% ) could be related to 
   - High Paging/Swapping activities
   - High Disk or Network I/O
   - Runaway processes issuing a lot of system calls

 CPU UTILIZATION: The following snaps recorded very high percent 
 SnapTime                      Percent
 -----------------------------------
 Sat Mar 22 09:54:34 PDT 2014     53
 Sat Mar 22 09:56:33 PDT 2014     59

CPU UTILIZATION: The following snaps recorded ROOT processes using high percent cpu:
SnapTime                          Pid   CPU   Command
-----------------------------------------------------
Sat Mar 22 09:47:33 UTC 2014     2867  94.8   mp_stress
Sat Mar 22 09:48:03 UTC 2014     3554  91.4   mp_stress
Sat Mar 22 09:48:37 UTC 2014     3554  42.8   mp_stress
Sat Mar 22 09:49:32 UTC 2014     4738  37.1   tfactl.pl
Sat Mar 22 09:49:32 UTC 2014     4946  35.1   tfactl.pl
Sat Mar 22 09:55:11 UTC 2014    14181 328.9   mp_stress
Sat Mar 22 09:55:59 UTC 2014    14181 104.6   mp_stress
Sat Mar 22 09:57:04 UTC 2014    16174 219.0   mp_stress
Sat Mar 22 09:57:34 UTC 2014    16805  52.4   tfactl.pl
Sat Mar 22 10:12:36 UTC 2014    28518  66.5   tfactl.pl
-> Process mp_stress is eating up our CPU

Analyzing Memory Usage

############################################################################
# Section 5.3: MEMORY PAGE IN
# Page in values should be 0 or low. High values (> 25) indicate memory is
# under pressure and may be precursor to swapping. Pay attention to high
# spanning multiple snaps as this value should not stay persistently high
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>25)                                 31    14.49
High spanning multiple snaps               19     8.88

The following snaps recorded very high page in rates:
SnapTime                      Value
-----------------------------------
Sat Mar 22 09:51:33 UTC 2014     32
Sat Mar 22 09:54:34 UTC 2014    312
Sat Mar 22 09:55:09 UTC 2014     32
Sat Mar 22 09:56:33 UTC 2014    624
Sat Mar 22 09:57:04 UTC 2014    352
Sat Mar 22 09:57:34 UTC 2014    664
Sat Mar 22 09:58:09 UTC 2014    128
Sat Mar 22 10:09:35 UTC 2014    292
-> Paging is too high for about 15% of our snapshots before the Node Eviction occurs at 10:03
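The memory pressure can be cross-checked against the swap-in/swap-out rates in the raw vmstat data (a related, though not identical, metric to the oswbba page-in value); a minimal sketch assuming the usual vmstat column order where si and so are columns 7 and 8, reusing the oswbba threshold of 25:

% awk '/^zzz/ {ts=$0; next}
       /^ *[0-9]/ && ($7+0 > 25 || $8+0 > 25) {print ts, " si =", $7, " so =", $8}' \
      grac41.example.com_vmstat_14.03.22.0900.dat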

####################################################################################################################################
Section 5.5: Top 5 Memory Consuming Processes Beginning
# This section list the top 5 memory consuming processes at the start of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem only that it is a top consumer of memory.
SnapTime                             PID        USER    %CPU    %MEM         VSZ         RSS       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:00:52 UTC 2014        2566        root    0.40    6.20     1798816      273796   ../ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 09:00:52 UTC 2014       27215      oracle    0.00    4.30     1663316      187352  ora_dbw0_grac41
Sat Mar 22 09:00:52 UTC 2014       27131      oracle    0.50    3.90     1569328      171356  ora_lms0_grac41
Sat Mar 22 09:00:52 UTC 2014        5661        root    2.90    3.80      981288      168316  /u01/app/11204/grid/bin/ologgerd -M -d . /grac41
Sat Mar 22 09:00:52 UTC 2014       27221      oracle    0.00    3.20     1564988      143556  ora_smon_grac41

####################################################################################################################################
Section 5.6: Top 5 Memory Consuming Processes Ending
# This section list the top 5 memory consuming processes at the end of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem only that it is a top consumer of memory.
SnapTime                             PID        USER    %CPU    %MEM         VSZ         RSS       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 10:59:49 UTC 2014        2566        root    0.40    4.70     1798816      207060  .. /ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 10:59:49 UTC 2014        5661        root    3.00    3.90     1047852      170232  /u01/app/11204/grid/bin/ologgerd -M -d ../grac41
Sat Mar 22 10:59:49 UTC 2014       22565      oracle    0.00    3.10     1554224      135496  ora_mman_grac41
Sat Mar 22 10:59:49 UTC 2014        5283        grid    6.20    2.90     1128680      127744  /u01/app/11204/grid/bin/ocssd.bin
Sat Mar 22 10:59:49 UTC 2014       22578      oracle    0.00    2.60     1560896      114060  ora_smon_grac4
--> Be careful here: our top memory consumer mp_stress is not shown, as it was started after the analysis
      begin time and had already stopped before the oswbba end period
    Always check section 8 for process-related results!

Analyzing Disk IO

############################################################################
# Section 6: Disk Detailed Findings
# This section list only those device which have high percent busy, high service
# times or high wait times
#
############################################################################
# Section 6.1: Disk Percent Busy Findings
# (Only Devices With Percent Busy > 50% Reported)
#
DEVICE: sda PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                21     9.81
Very High (>95%)                           17     7.94
High spanning multiple snaps               14     6.54
The following snaps recorded high percent busy for device: sda
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      77.09
Sat Mar 22 09:49:32 UTC 2014       98.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014      100.0
Sat Mar 22 09:51:03 UTC 2014       99.5

DEVICE: dm-0 PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                17     7.94
Very High (>95%)                            9     4.21
High spanning multiple snaps                9     4.21

The following snaps recorded high percent busy for device: dm-0
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      67.09
Sat Mar 22 09:49:32 UTC 2014       98.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014       82.2
Sat Mar 22 09:51:03 UTC 2014       77.2

DEVICE: dm-1 PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                17     7.94
Very High (>95%)                           16     7.48
High spanning multiple snaps               14     6.54

The following snaps recorded high percent busy for device: dm-1
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      77.01
Sat Mar 22 09:49:32 UTC 2014      88.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014      99.9
Sat Mar 22 09:51:03 UTC 2014      93.9

Here we need to know something about our partition layout to map the dm-X devices back to physical disks.

Check the device-mapper tree
# dmsetup ls --tree -o device
vg_oel64-lv_swap (252:1)
 +- (8:3)                         <-- Major, Minor number from /dev/sdX 
 +- (8:2)
vg_oel64-lv_root (252:0)
 +- (8:2)
Check /dev/mapper 
# ls -l  /dev/mapper/vg*
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_swap -> ../dm-1
Match major/minor number returned from above dmsetup output
# ls -l /dev/sda2 /dev/sda3
brw-rw----. 1 root disk 8, 2 Mar 24 09:07 /dev/sda2 
brw-rw----. 1 root disk 8, 3 Mar 24 09:07 /dev/sda3
--> The root partition and the swap partition point to the same physical disk /dev/sda -> I/O contention
    For our swap partition we see high busy rates > 90% around 09:50 -> increased paging/swapping
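The per-device utilization can also be verified against the raw iostat data in the oswiostat subdirectory of the archive; a minimal sketch assuming iostat -x output where %util is the last column (the exact column layout depends on the sysstat version) and an example file name matching the vmstat naming scheme:

% awk '/^zzz/ {ts=$0}
       /^(sda|dm-0|dm-1) / && $NF+0 > 90 {print ts, " ", $1, " %util =", $NF}' \
      grac41.example.com_iostat_14.03.22.0900.dat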

Analyzing Processes

 ############################################################################
 # Section 8.2: PS for Processes With Status = D, T or W Ordered By Time
 # In this section list all processes captured in the oswbb logs which have a
 # status of D, T or W
 # 
 SnapTime                             PID        USER         CPU    STATUS                 WCHAN  COMMAND
 -----------------------------------------------------------------------------------------------------------------------------------
 Sat Mar 22 09:49:32 PDT 2014        7573        grid         0.0         D                sleep_  asm_rbal_+ASM1
 Sat Mar 22 09:49:32 PDT 2014       31115      oracle         0.0         D                sleep_  ora_cjq0_grac41
 Sat Mar 22 09:49:32 PDT 2014       27487      oracle         0.0         D                sleep_  ora_lck0_grac41
 Sat Mar 22 09:49:32 PDT 2014        4915        root         0.0         D                sleep_  /u01/app/11204/grid/bin/./crsctl.bin stat res procwatcher
 Sat Mar 22 09:49:32 PDT 2014       27213      oracle         0.0         D                sleep_  ora_mman_grac41
 Sat Mar 22 09:49:32 PDT 2014       23293      oracle         0.0         D                sleep_  ora_pz99_grac41a 
...
--> A lot of processes are in disk wait status (D)
    For 2.6 kernels this could be either an I/O problem or, more likely here, a paging/swapping problem
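The number of D-state processes per snapshot can be counted from the raw ps data; a minimal sketch assuming the ps format shown further below in this post, where the process state is the 10th column:

% awk '/^zzz/ {if (ts != "") print n, "D-state processes at", ts; ts=substr($0,8); n=0; next}
       $10 == "D" {n++}
       END {if (ts != "") print n, "D-state processes at", ts}' \
      grac41.example.com_ps_14.03.22.0900.dat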

#######################################################################################
# Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
# In this section list all processes captured in the oswbb logs with process cpu consumption
#  > 0 and system idle cpu < 30%
#
SnapTime                        IDLE_CPU         PID        USER         CPU    STATUS  COMMAND
----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:11 UTC 2014         0.0       14181        root      328.90         S  mp_stress                               
Sat Mar 22 09:55:59 UTC 2014         0.0       14181        root      104.60         S  mp_stress                               
Sat Mar 22 09:57:04 UTC 2014         9.0       16174        root      219.00         S  mp_stress 
-> Process mp_stress is consuming a lot of CPU - there is no idle CPU left from 09:55:11 on

#######################################################################################
# Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
# In this section list all changes in virtual memory allocations per process
#
SnapTime                             PID        USER    %CPU    %MEM         VSZ      CHANGE   %CHANGE       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014       14181        root  205.00   18.50     1090096     +630036    +136.94  ./mp_stress -t 4 -m 5 -p 50 -c 50       
Sat Mar 22 09:56:33 UTC 2014       14181        root  165.00   22.40     1263176     +173080    +15.87   ./mp_stress -t 4 -m 5 -p 50 -c 50
--> Virtual memory for process ./mp_stress is increasing a lot, and its CPU usage is also very high
    Increased CPU and memory usage could be the root cause of a Node Eviction!

#######################################################################################
Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
# In this section list all changes in resident memory allocations per process
#
SnapTime                             PID        USER    %CPU    %MEM         RSS      CHANGE   %CHANGE       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014       14181        root  205.00   18.50      805984     +630016    +358.02   ./mp_stress -t 4 -m 5 -p 50 -c 50       
Sat Mar 22 09:56:33 UTC 2014       14181        root  165.00   22.40      977540     +171556    +21.28    ./mp_stress -t 4 -m 5 -p 50 -c 50
-->  Resident memory for process ./mp_stress increases a lot
     The problem could be either a memory leak or something like a connection storm

 

Using grep to retrieve process priority  from OSWatcher raw data

%  egrep 'zzz|mp_stress|PRI' grac41.example.com_ps_14.03.22.0900.dat 

    USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 09:56:33 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
root     14181  4270  90  165 22.4 1263176 977540 n_tty_ S 09:55:02 00:02:32 ./mp_stress -t 4 -m 5 -p 50 -c 50
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 09:57:04 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 09:57:34 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
zzz ***Sat Mar 22 10:00:17 CET 2014
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
root     16867  4270  90  108 68.3 4542216 2974540 futex_ S 09:57:38 00:03:44 ./mp_stress -t 4 -m 20 -p 50 -c 200
USER       PID  PPID PRI %CPU %MEM    VSZ   RSS WCHAN  S  STARTED     TIME COMMAND
==> The priority of 90 is quite high ( expected, as mp_stress runs as an RT process )
    CPU usage is high too
    Memory usage explodes from 22 % to 68 %
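On a live system the scheduling class of such a process can be verified directly; a minimal sketch (independent of the OSWatcher data), assuming Linux ps with the cls and rtprio format specifiers:

% ps -eo pid,cls,rtprio,pri,ni,comm | awk 'NR == 1 || $2 == "RR" || $2 == "FF"'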

 

Summary

  • Process mp_stress leaks memory, eats up all our CPU and is very likely the root cause of the problem
  • The system is paging and a lot of processes are waiting on disk I/O
  • The CPU run queue is high – after a while most processes migrate to the blocked queue
  • CPU usage is high the whole time
  • As all I/O is redirected to a single physical disk, we see high disk service times and disk waits
  • From the provided OSWatcher data alone we can't pinpoint the root cause of the Node Eviction
  • The root cause could be CPU starvation, paging/swapping or slow disk I/O

 
