ASH LFS BACKGROUND PROCESS WAITS DURING WORST MINUTE ( from lfsdiag.sql Output )
APPROACH: What is LGWR doing when 'log file sync' waits are happening?
LMS info may be relevent for broadcaston commit and LNS data may be relevant for dataguard.
If more details are needed see the ASH DETAILS FOR WORST MINUTES section at the bottom of the report.
MINUTE INST PROGRAM EVENT TOTAL_WAIT_TIME WAITS AVG_TIME_WAITED
------------ ---- --------------------------------- ----------------------- -------------- ------ ------------ -
Dec02_1728 1 JDBC Thin Client log file sync 2945.003 82 35.915
Dec02_1728 1 oracle@grac41.example.com (LGWR) log file parallel write .040 1 .040
Dec02_1728 1 oracle@grac41.example.com (LGWR) .000 43 .000
Dec02_1728 1 oracle@grac41.example.com (LMS0) .000 2 .000
Dec02_1729 1 JDBC Thin Client log file sync 3819.957 105 36.381
Dec02_1729 1 oracle@grac41.example.com (LGWR) log file parallel write .108 2 .054
Dec02_1729 1 oracle@grac41.example.com (LGWR) .000 54 .000
Dec02_1729 1 oracle@grac41.example.com (LMS0) .000 2 .000
Dec02_1729 1 oracle@grac41.example.com (LMS1) .000 3 .000
Dec02_1731 1 JDBC Thin Client log file sync 995.457 19 52.392
Dec02_1731 1 oracle@grac41.example.com (LGWR) .000 10 .000
Dec02_1731 1 oracle@grac41.example.com (LMS0) .000 2 .000
Dec02_1732 1 JDBC Thin Client log file sync 3429.242 97 35.353
Dec02_1732 1 oracle@grac41.example.com (LGWR) log file parallel write 28.543 28.543
--> In minute 17:29 multiple JDBC clients wait 105 times with an average wait time of 36 ms for log file sync.
LGWR waits in minute 17:32 about 28 ms for doing an I/O (log file parallel write )
HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS ( from lfsdiag.sql Output )
APPROACH: Look at the wait distribution for log file sync waits by looking at "wait_time_milli". Look at the high wait times then
see if you can correlate those with other related wait events.
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
1 log file sync 1 427
1 log file sync 2 50841
1 log file sync 4 69587
1 log file sync 8 13165
1 log file sync 16 4267
1 log file sync 32 2811
1 log file sync 64 1278
1 log file sync 128 602
1 log file sync 256 187
1 log file sync 512 39
1 log file sync 1024 6
1 log file sync 2048 1
1 log file parallel write 1 112200
1 log file parallel write 2 130
1 log file parallel write 4 87
1 log file parallel write 8 75
1 log file parallel write 16 48
1 log file parallel write 32 15
1 log file parallel write 64 16
1 log file parallel write 128 6
1 log file parallel write 256 3
1 gcs log flush sync 2 1204
1 gcs log flush sync 4 182
1 gcs log flush sync 8 56
1 gcs log flush sync 16 30
1 gcs log flush sync 32 25
1 gcs log flush sync 64 15
1 gcs log flush sync 128 8
1 gcs log flush sync 256 3
--> gcs log flush sync and log file sync heavily trigger LGWR for writing data to disk
Generic write I/O stats don't look healthy - lot of writes over 8ms
- Gcs Log Flush Sync times are not acceptable – most of the Waits are > 8 ms
- Note: Only one session on a instance can write to the REDO logs – all the other sessions are waiting on LOG File SYNC event
- Excessive COMMITs and dirty GCS buffers can stress your REDO LOG performance
AWR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs: ( from lfsdiag.sql Output )
APPROACH: These are the AWR snaps where the average 'log file sync' times were the highest. Look at related waits at those times.
SNAP_ID INST NAME TOTAL_WAITS TOTAL_WAIT_TIME AVG_TIME_WAITED
---------- ---- ---------------------------------------- ----------- ----------------- -----------------
303 3 log file sync 7 2003.532 286.219
303 1 log file sync 18 3072.108 170.673
303 2 log file sync 9 271.443 30.160
303 1 gcs log flush sync 1956 18537.067 9.477
303 3 gcs log flush sync 2460 18970.100 7.711
303 2 gcs log flush sync 93 498.418 5.359
303 1 LGWR wait for redo copy 5 7.147 1.429
303 3 log file parallel write 278 193.133 .695
303 2 gc current block 2-way 3026 1789.197 .591
303 1 gc cr grant 2-way 2092 1192.904 .570
303 1 gc current block 2-way 1730 936.028 .541
303 3 gc current block 2-way 1949 998.319 .512
303 3 gc cr grant 2-way 695 305.589 .440
303 2 gc cr grant 2-way 46 19.798 .430
303 1 log file parallel write 365 34.946 .096
303 3 LGWR wait for redo copy 1 .037 .037
303 2 log file parallel write 146 4.058 .028
--> Gcs Log Flush sync is top Wait Event
This event triggers LGWR to write redo log ( log file parallel write )
Session details ( from lfsdiag.sql Output )
APPROACH: If you cannot determine the problem from the data above, you may need to look at the details of what each session
is doing during each 'bad' snap. Most likely you will want to note the times of the high log file sync waits, look at what
LGWR is doing at those times, and go from there...
SAMPLE_TIME INST SESSION_ID PROGRAM EVENT TIME_WAITED
------------------------- ---- ---------- --------------------------------------------- ------------------------------ -----------
P1 P2 P3
02-DEC-13 05.13.09.680 PM 1 5 JDBC Thin Client log file sync 9.284
buffer#: 1456 sync scn: 20387728 : 0
02-DEC-13 05.13.09.680 PM 1 99 JDBC Thin Client log file sync 6.774
buffer#: 1452 sync scn: 20387725 : 0
02-DEC-13 05.13.09.680 PM 1 193 JDBC Thin Client log file sync 7.175
buffer#: 1452 sync scn: 20387726 : 0
02-DEC-13 05.13.10.680 PM 1 5 JDBC Thin Client log file sync 4.070
buffer#: 1588 sync scn: 20388120 : 0
02-DEC-13 05.13.10.680 PM 1 96 oracle@grac41.example.com (LGWR) log file parallel write .905
files: 1 blocks: 1 requests: 1
02-DEC-13 05.13.10.680 PM 1 193 JDBC Thin Client log file sync 4.149
buffer#: 1589 sync scn: 20388122
..
02-DEC-13 05.13.12.780 PM 1 5 JDBC Thin Client log file sync 282.466
buffer#: 2523 sync scn: 20389585 : 0
02-DEC-13 05.13.12.780 PM 1 99 JDBC Thin Client log file sync 281.079
buffer#: 2523 sync scn: 20389586 : 0
02-DEC-13 05.13.12.780 PM 1 193 JDBC Thin Client log file sync 288.317
buffer#: 2520 sync scn: 20389583 : 0
: 0
--> Even LGWR is currently writing some redo records server one of our JDBC clients there are at
least 4 other clients waiting on LGWR.
This wait time is accounted to log file sync wait event
..
02-DEC-13 05.13.12.780 PM 1 5 JDBC Thin Client log file sync 282.466
buffer#: 2523 sync scn: 20389585 : 0
02-DEC-13 05.13.12.780 PM 1 99 JDBC Thin Client log file sync 281.079
buffer#: 2523 sync scn: 20389586 : 0
02-DEC-13 05.13.12.780 PM 1 193 JDBC Thin Client log file sync 288.317
buffer#: 2520 sync scn: 20389583 : 0
--> Here we can see a very high wait times log file sync. Again either LGWR doesn't get CPU
or the the I/O bandwidth is to low
Check your iostat output during heavy load timeframes
avg-cpu: %user %nice %system %iowait %steal %idle
7.68 0.00 29.55 0.72 0.00 62.04
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sdc 1.20 0.00 14.34 0 72
sda 18.92 4.78 270.92 24 1360
sde 1.39 19.12 12.75 96 64
sdh 3.39 1.99 2.59 10 13
sdb 3.19 63.75 14.34 320 72
sdg 3.39 1.99 2.59 10 13
sdd 0.60 3.19 1.59 16 8
sdf 3.39 1.99 2.59 10 13
sdi 481.67 0.00 579.08 0 2907
sdj 513.55 0.00 620.12 0 3113
sdk 311.16 0.00 381.67 0 1916
dm-0 22.11 4.78 270.92 24 1360
-> Even sdi, sdkj and sdk shows an acceptable Redo log write performance all of these disks
are emulated via VirtualBox and use a single physical SSD disk allowing about 2000 tps.
ASM is spreading the IO across all available disks but number of physicial disk is the limiting factor.
Summary
- Check that I/O path for redo log is not used for other components (only LGWR should write and read from that disks )
- Moving Redo logs to SSD disks and provide a dedicated Redo log IO path for each instance
- Check lgwr trace files for warnings like :Warning: log write elapsed time 628ms, size 1KB
Reference
- Click to add to Favorites Intermittent Long ‘log file sync’ Waits, LGWR Posting Long Write Times, I/O Portion of Wait Minimal (Doc ID 1278149.1)
- Bug:10318123 – lgwr regularly stalls for 3 seconds at a time
- Event 10928