监控工具DPA发现海外一台Oracle数据库服务器DB Commit Time指标告警,超过红色告警线(40毫秒左右,黄色告警是10毫秒,红色告警线是20毫秒),如下截图所示,生成了对应的时段的AWR报告,发现Top 5 Timed Events里面,log file sync等待事件的平均等待时间为37毫秒,log file parallel write等待事件的平均等待时间为40毫秒
如果对Tanel Poder::Understanding LGWR, Log File Sync Waits and Commit Performance这篇文章所讲述的内容很熟悉的话(经典图如下),那么通过等待事件log file sync与log file parallel write的Avg Wait(ms)指标: 37ms & 40ms,基本上可以判断就是redo log所在的磁盘I/O出现了性能问题
然后在用lfsdiag.sql脚本分析一下详细的统计数据,如下所示:
HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
APPROACH: Look at the wait distribution for log file sync waits
by looking at "wait_time_milli". Look at the high wait times then
see if you can correlate those with other related wait events.
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
1 log file sync 1 4363
1 log file sync 2 835
1 log file sync 4 1650
1 log file sync 8 4937
1 log file sync 16 146252
1 log file sync 32 606674
1 log file sync 64 263377
1 log file sync 128 253254
1 log file sync 256 2
1 log file switch completion 1 124
1 log file switch completion 2 9
1 log file switch completion 4 19
1 log file switch completion 8 21
1 log file switch completion 16 35
1 log file switch completion 32 97
1 log file switch completion 64 133
1 log file switch completion 128 326
1 log file switch completion 256 1736
1 log file switch completion 512 3042
1 log file switch completion 1024 2020
1 log file parallel write 1 0
1 log file parallel write 2 0
1 log file parallel write 4 80
1 log file parallel write 8 2142
1 log file parallel write 16 170987
1 log file parallel write 32 779205
1 log file parallel write 64 311463
1 log file parallel write 128 79688
1 log file parallel write 256 42763
1 log file parallel write 512 13052
1 log file parallel write 1024 20468
1 log file parallel write 2048 14020
1 log file parallel write 4096 921
1 log file parallel write 8192 96
1 log file parallel write 16384 18
1 log file parallel write 32768 18
1 log file parallel write 65536 8
1 log file parallel write 131072 2
1 LGWR wait for redo copy 1 8516
1 LGWR wait for redo copy 2 20
1 LGWR wait for redo copy 4 19
1 LGWR wait for redo copy 8 20
1 LGWR wait for redo copy 16 11
ORDERED BY WAIT_TIME_MILLI
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
1 log file sync 1 4363
1 log file switch completion 1 124
1 log file parallel write 1 0
1 LGWR wait for redo copy 1 8516
1 log file sync 2 835
1 log file switch completion 2 9
1 log file parallel write 2 0
1 LGWR wait for redo copy 2 20
1 log file sync 4 1650
1 log file switch completion 4 19
1 log file parallel write 4 80
1 LGWR wait for redo copy 4 19
1 log file sync 8 4937
1 log file switch completion 8 21
1 log file parallel write 8 2142
1 LGWR wait for redo copy 8 20
1 log file sync 16 146252
1 log file switch completion 16 35
1 log file parallel write 16 170987
1 LGWR wait for redo copy 16 11
1 log file sync 32 606674
1 log file switch completion 32 97
1 log file parallel write 32 779205
1 log file sync 64 263377
1 log file switch completion 64 133
1 log file parallel write 64 311463
1 log file sync 128 253254
1 log file switch completion 128 326
1 log file parallel write 128 79688
1 log file sync 256 2
1 log file switch completion 256 1736
1 log file parallel write 256 42763
1 log file switch completion 512 3042
1 log file parallel write 512 13052
1 log file switch completion 1024 2020
1 log file parallel write 1024 20468
1 log file parallel write 2048 14020
1 log file parallel write 4096 921
1 log file parallel write 8192 96
1 log file parallel write 16384 18
1 log file parallel write 32768 18
1 log file parallel write 65536 8
1 log file parallel write 131072 2
REDO WRITE STATS
"redo write time" in centiseconds (100 per second)
11.1: "redo write broadcast ack time" in centiseconds (100 per second)
11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
VERSION INST_ID NAME VALUE MILLISECONDS
----------------- ---------- ---------------------------------------- --------------------- -------------------
10.2.0.5.0 1 redo write time 9551524 95515240.000
10.2.0.5.0 1 redo writer latching time 51
10.2.0.5.0 1 redo writes 1434931
AWR WORST AVG LOG FILE SYNC SNAPS: