log buffer及日志管理深入分析及性能调整(三)-技术开发专区

log buffer及日志管理深入分析及性能调整(三)

作者：IT168 hanson 编辑：晓熊 2007-03-19 11:29

3.2 log buffer的等待事件
3.2.1 log file sync等待事件
当用户发出提交或回滚语句时会触发LGWR将重做记录写入联机日志文件，这种触发LGWR的方式叫做同步写（sync writes）触发，而其他剩下的触发LGWR的方式叫做后台写（background writes）。log file sync等待事件只与sync writes有关，而log file parallel write等待事件只与background writes有关。

举例来说，一个用户进程可能进行一个非常大的事务，该事务会产生非常多的重做记录，从而引起很多的background writes。不过，用户session所运行的事务永远不会等待这些background writes完成以后才继续进行。一旦用户进程提交或回滚了事务，那么用户进程将触发LGWR，并且等待log file sync等待事件，一直等到LGWR将当前的重做记录，包括提交或回滚标记全都写入联机日志文件里为止。在这个日志同步（log synchronization）的过程中（将日志缓冲区中的重做记录写入联机日志文件的过程通常也叫做日志同步过程），LGWR进程等待sync write结束，这时它的等待事件为：log file parallel write，而用户session也在等待sync write结束，而这时它等待的等待事件为：log file sync事件。

一旦进程开始等待log file sync事件，用两种方式退出该等待：
1) 当日志缓冲区里的重做记录都写入联机日志文件以后，由LGWR触发，告诉前台进程。

2) 当等待事件超时(通常是1秒)时，前台进程检查当前的log SCN，来判断当前进程的提交是否已经将重做记录写到了联机日志文件里了。如果已经写好了，前台进程继续进行，否则前台进程继续等待log file sync事件。

通常，log file sync等待事件是可以不用过多考虑的。但是如果很多session都在等待该事件的话，就会影响系统的整体响应时间了。我们来模拟一下这两个等待事件。

SQL> create table t1(id number,name varchar2(20)) as select rownum as rid,'abcdef'
2 from dba_objects where rownum<=40000;
SQL> connect / as sysdba
Connected.
SQL> startup force

在运行测试脚本前，先看看统计信息和等待事件各是什么。

SQL> select name,value from v$sysstat where name in('redo entries','redo size',
2 'redo synch writes','redo wastage','redo write time','redo writes') order by name;
NAME VALUE
---------------------------------------------------------------- ---------- 
redo entries 375
redo size 185160
redo synch writes 14
redo wastage 9552
redo write time 131
redo writes 41
SQL> select SID,EVENT,TOTAL_WAITS WAIT_CLASS from v$session_event
2 where sid in(159,166) and event in ('log file sync','log file parallel write');
no rows selected
SQL> begin
2 for x in(select rowid rid,t1.* from t1)
3 loop
4 update t1 set name=lower(name) where rowid=x.rid;
5 commit;
6 end loop;
7 end;
8 /
在运行测试脚本以后，再来看看统计信息和等待事件发生了哪些变化。
SQL> select name,value from v$sysstat where name in('redo entries','redo size',
2 'redo synch writes','redo wastage','redo write time','redo writes') order by name;
NAME VALUE
---------------------------------------------------------------- ---------- 
redo entries 49586
redo size 20031044
redo synch writes 18
redo wastage 2382452
redo write time 996
redo writes 5986
SQL> select distinct sid from v$mystat;
SID
---------- 
159
SQL> select c.sid,a.name from v$bgprocess a ,v$process b , v$session c
2 where a.paddr=b.addr and b.addr = c.paddr and a.name='LGWR';
SID NAME
---------- ----- 
166 LGWR
SQL> select SID,EVENT,TOTAL_WAITS WAIT_CLASS from v$session_event
2 where sid in(159,166) and event in ('log file sync','log file parallel write');
SID EVENT TOTAL_WAITS WAIT_CLASS
------ --------------------- ----------- ---------- 
159 log file sync 3 Commit
166 log file parallel write 6007 System I/O

    从上面的测试，我们可以明显看到，通过在循环中不断进行提交，出现了3次log file sync等待事件。而实际上，过于频繁的提交也正是log file sync等待事件出现的主要原因。而log file parallel write等待事件则总是会出现的，只要LGWR开始刷新日志缓冲区，该进程就会等待log file parallel write等待事件，是不可避免的，不用过多关注该等待事件。同时，我们可以看到，由于过多的提交，导致生成的重做记录的数量非常巨大，redo size为19845884（20031044-185160）字节，也就相当于将近19M。
注意：10g以后在v$system_event和v$session_event中会由字段wait_class显示等待事件所属的类别。也就是该等待事件是由于什么原因引起的。上面我们可以看到log file sync是由于提交引起的，log file parallel write是由于进行物理I/O引起的。

    我们来修改一下上面的脚本，将commit移出循环以后，再按照上面的测试步骤执行一遍。可以发现log file sync等待事件只会出现1次，也就是说减小了一半的log file sync的等待。同时，通过将commit移出循环，所产生的重做记录的数量也下降到大约10M，也就是说减少了将近一半的日志量。

    由此，我们也就可以知道，要减小log file sync的主要方法就是减少提交或回滚的次数。同时，过于频繁的提交不光会引起log file sync等待事件，而且很有可能引起ora-01555：snapshot too old错误。我们可以使用下面的SQL语句来查找log file sync等待所占的总共等待时间的百分比，以及每个session平均每个小时提交多少次，进而可以找到该session执行了哪些SQL语句引起了过于频繁的提交。

select a.sid,
a.event,
a.time_waited,
a.time_waited / c.sum_time_waited * 100 pct_wait_time,
d.value user_commits,
round((sysdate - b.logon_time) * 24) hours_connected
from v$session_event a, v$session b, v$sesstat d,
(select sid, sum(time_waited) sum_time_waited
from v$session_event
where event not in ('Null event',
'client message',
'KXFX: Execution Message Dequeue - Slave',
'PX Deq: Execution Msg',
'KXFQ: kxfqdeq - normal deqeue',
'PX Deq: Table Q Normal',
'Wait for credit - send blocked',
'PX Deq Credit: send blkd',
'Wait for credit - need buffer to send',
'PX Deq Credit: need buffer',
'Wait for credit - free buffer',
'PX Deq Credit: free buffer',
'parallel query dequeue wait',
'PX Deque wait',
'Parallel Query Idle Wait - Slaves',
'PX Idle Wait',
'slave wait',
'dispatcher timer',
'virtual circuit status',
'pipe get',
'rdbms ipc message',
'rdbms ipc reply',
'pmon timer',
'smon timer',
'PL/SQL lock timer',
'SQL*Net message from client',
'WMON goes to sleep')
having sum(time_waited) > 0 group by sid) c
where a.sid = b.sid
and a.sid = c.sid
and a.sid = d.sid
and d.statistic# = (select statistic#
from v$statname
where name = 'user commits')
and a.time_waited > 10000
and a.event = 'log file sync'
order by pct_wait_time, hours_connected;

如果应用程序已经确保了使用正确的频率进行提交，则还是发现很严重的log file sync等待事件，则比较常见的原因主要包括：
1) 联机日志文件放在一个很慢的磁盘上。可以用前面我们提到的log file parallel write的平均等待时间应该小于等于10个毫秒来判断，磁盘速度是否过慢。

2) 联机日志文件与其他随机写入的文件放在了同一个磁盘上。从前面已经知道，联机日志文件都是顺序写入的。不能够将其与其他随机写入的文件放在一起，需要存放在单个的磁盘上。

3) 联机日志文件放在了RAID 5的磁盘组里。RAID 5适合频繁的读取，比如数据仓库类的应用等，但不适合频繁写入。

最好能够将每个联机日志文件都放在单独的小的、转速快的磁盘上，与数据文件所在的磁盘阵列分开。
如果可以的话，最好不要使用文件系统来管理磁盘，直接使用RAW设备，直接交给oracle来管理磁盘。而且，oracle中最适合放在RAW设备上的文件就是联机日志文件。

第1页：log buffer的优化第2页：log buffer的统计信息第3页：log buffer的等待事件第4页：其他相关的等待事件第5页：log buffer的设置第6页：减少生成的重做记录的方法第7页：归档

关注我们