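In the early hours of the morning I got a call from a colleague reporting that the application was getting errors when accessing the Oracle database. On site, we confirmed the following symptoms: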
1. The application could not reach the database. A test connection from SQL Developer also failed, with ORA-12570: TNS:packet reader failure.
2. Checking the listener with lsnrctl status returned nothing at all; the command simply hung, which is very rare.
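3. The database status was OPEN. We then checked system resources with nmon. As the screenshot below shows, CPU utilization was low but CPU Wait% was very high. That pointed to abnormal I/O, most likely I/O waits and contention.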
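(CPU Wait%: the proportion of the sampling interval during which the CPUs were idle while waiting for I/O to complete. Wait% is a kind of idle state: when a CPU is idle and some process is in state D (uninterruptible sleep), the system counts that time toward Wait%. Wait% is a ratio rather than an absolute time, so for the same amount of I/O wait time, the more CPUs a server has, the lower its Wait%. It reflects the balance between I/O and computation, and I/O-intensive applications generally show a higher Wait%.)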
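4. When I opened my mailbox, I found a flood of emails sent by the alert-log monitoring job. The alert log itself contained a large number of ORA errors; an excerpt follows: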
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'
ORA-00239: timeout waiting for control file enqueue: held by 'inst 1, osid 5166' for more than 900 seconds
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'
ORA-00239: timeout waiting for control file enqueue: held by 'inst 1, osid 5166' for more than 900 seconds
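The Wait% arithmetic from item 3 can be made concrete with a small calculation. The following is a minimal Python sketch that assumes Linux /proc/stat semantics (cumulative per-state CPU counters in clock ticks, assumed here to be 100 per second). It does not reproduce nmon's own code, and the names `CpuTimes`, `iowait_percent` and `simulate` are hypothetical. It shows that the same 1,000 ticks of I/O wait over a 10-second interval read as 25% on a 4-CPU server but only 6.25% on a 16-CPU server.

```python
from dataclasses import dataclass


@dataclass
class CpuTimes:
    """Cumulative CPU time counters (in clock ticks) summed over all CPUs,
    modelled on the aggregate "cpu" line of Linux /proc/stat (assumption)."""
    user: int
    nice: int
    system: int
    idle: int
    iowait: int
    irq: int = 0
    softirq: int = 0
    steal: int = 0

    def total(self) -> int:
        return (self.user + self.nice + self.system + self.idle
                + self.iowait + self.irq + self.softirq + self.steal)


def read_proc_stat(path: str = "/proc/stat") -> CpuTimes:
    """Read the aggregate CPU counters (Linux only)."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is "cpu"; the next columns are user, nice, system, idle,
    # iowait, irq, softirq, steal (very old kernels report fewer).
    return CpuTimes(*(int(v) for v in fields[1:9]))


def iowait_percent(before: CpuTimes, after: CpuTimes) -> float:
    """Wait%: share of all CPU time in the interval spent idle with I/O pending."""
    elapsed = after.total() - before.total()
    if elapsed <= 0:
        return 0.0
    return 100.0 * (after.iowait - before.iowait) / elapsed


def simulate(num_cpus: int, seconds: int, iowait_ticks: int, hz: int = 100) -> float:
    """Hypothetical interval: iowait_ticks of I/O wait, every other tick idle."""
    total = num_cpus * seconds * hz
    before = CpuTimes(user=0, nice=0, system=0, idle=0, iowait=0)
    after = CpuTimes(user=0, nice=0, system=0, idle=total - iowait_ticks,
                     iowait=iowait_ticks)
    return iowait_percent(before, after)


if __name__ == "__main__":
    # Same 1,000 ticks of I/O wait (one CPU blocked for the whole 10 s):
    print(simulate(num_cpus=4, seconds=10, iowait_ticks=1000))   # 25.0
    print(simulate(num_cpus=16, seconds=10, iowait_ticks=1000))  # 6.25
```

On a live Linux host you would take two `read_proc_stat()` samples a few seconds apart and pass them to `iowait_percent`. Dividing by the total ticks across all CPUs is what makes the figure shrink as the CPU count grows.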
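When an alert log is full of repeated messages like these, a quick tally of which process holds the CF (control file) enqueue helps confirm whether a single holder is behind all of them. The Python sketch below is hypothetical and is not the monitoring job mentioned above. It only recognizes the two message formats quoted in the excerpt, and `summarize_cf_enqueue_errors` and `distinct_holders` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the two CF-enqueue messages quoted in the excerpt and captures
# the error code, the instance number and the OS process id of the holder.
HOLDER_RE = re.compile(r"(ORA-00494|ORA-00239):.*?'inst (\d+), osid (\d+)'")


def summarize_cf_enqueue_errors(lines: Iterable[str]) -> Counter:
    """Count CF-enqueue timeout messages per (error code, inst, osid)."""
    counts: Counter = Counter()
    for line in lines:
        match = HOLDER_RE.search(line)
        if match:
            code, inst, osid = match.groups()
            counts[(code, int(inst), int(osid))] += 1
    return counts


def distinct_holders(counts: Counter) -> set:
    """Return the set of (inst, osid) pairs reported as holding the CF enqueue."""
    return {(inst, osid) for (_code, inst, osid) in counts}


# The eight lines from the excerpt, in their original order.
ORA_494 = "ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5166'"
ORA_239 = ("ORA-00239: timeout waiting for control file enqueue: "
           "held by 'inst 1, osid 5166' for more than 900 seconds")
SAMPLE = [ORA_494] * 5 + [ORA_239, ORA_494, ORA_239]

if __name__ == "__main__":
    counts = summarize_cf_enqueue_errors(SAMPLE)
    for (code, inst, osid), n in sorted(counts.items()):
        print(f"{code} inst {inst} osid {osid}: {n}")
    print("holders:", distinct_holders(counts))
```

Against a real alert log you would pass an open file object instead of `SAMPLE`; the file location depends on the instance and is not given in this post. For this excerpt every message names the same holder, inst 1 / osid 5166.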