ORA-00333: redo log read error block 8194 .
的有关信息介绍如下:ORA-00333: redo log read error block 8194 ..
报错情景:应用无法连接,查看数据库发现oracle无法启动,执行启动命令报错ORA-00333,日志文件损坏。非常规修复之后,把数据库起来,过段时间数据库又宕机。每次宕机之后都需要手动重启
报错根本原因:服务器两块硬盘坏道
恢复步骤:
1:数据库启动报错ORA-00333
2:设置参数“_allow_resetlogs_corruption” 并重启数据库导mount
3:执行recover database until cancel;
4:创建pfile文件并启动数据库
5:关闭数据库并编辑pfile文件
6:启动数据库并创建undo表空间,然后关闭数据库
7:编辑pfile文件并重新启动数据库,然后创建spfile文件
问题定位:
1:查看系统日志
1:数据库启动报错ORA-00333
SQL> select status from v$instance;
STATUS
------------
MOUNTED
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00333: redo log read error block 8194 count 8192
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 117247 NO CURRENT
3 117246 NO ACTIVE
2 117245 NO INACTIVE
SQL> select group#,member from v$logfile;
GROUP# MEMBER
------ --------------------------------------------------
3 /home/oracle/app/oracle/datafile/orcl/redo03.log
2 /home/oracle/app/oracle/datafile/orcl/redo02.log
1 /home/oracle/app/oracle/datafile/orcl/redo01.log
查看alert日志,group 1日志文件文件损坏:
ORA-00312: online log 1 thread 1: '/home/oracle/app/oracle/datafile/orcl/redo01.log'
ORA-27072: File I/O error
Additional information: 4
Additional information: 8194
Additional information: 1084416
可以看到损坏的是当前的联机日志。
但是这个数据库没有备份,没有开归档,只能使用非常规恢复!
2:设置参数“_allow_resetlogs_corruption” 并重启数据库导mount
查看隐含参数
select a.ksppinm name, b.ksppstvl value, a.ksppdesc description
from x$ksppi a, x$ksppcv b
where a.indx = b.indx and a.ksppinm like '_allow%';
_allow_resetlogs_corruption FALSE allow resetlogs even if it will cause corruption
可以看到这个参数是 数据丢失情况下允许resetlogs 。修改参数默认值:
SQL> Alter system set "_allow_resetlogs_corruption"=true scope=spfile;
System altered.
关闭数据库:
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
启动数据库导mount状态:
SQL> startup mount;
ORACLE instance started.
Total System Global Area 4175568896 bytes
Fixed Size 2234960 bytes
Variable Size 1644168624 bytes
Database Buffers 2516582400 bytes
Redo Buffers 12582912 bytes
Database mounted.
3:执行recover database until cancel;
同时查看数据文件:
SQL> col checkpoint_change# for 9999999999999999
SQL> select file#,checkpoint_change# from v$datafile;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 14462303120211
2 14462303120211
3 14462303120211
4 14462303120211
5 14462303120211
6 14462303120211
7 14462303120211
可以看到数据库的SCN是一致的。
执行recover database until cancel命令:
SQL> recover database until cancel;
ORA-00279: change 14462303120211 generated at 09/17/2015 22:02:15 needed for thread 1
ORA-00289: suggestion : /home/oracle/app/oracle/archive/orcl/1_117246_814995340.dbf
ORA-00280: change 14462303120211 for thread 1 is in sequence #117246
Specify log: {
CANCEL ------->选择cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below 这里警告:recover成功但是OPEN RESETLOGS会报错
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/app/oracle/datafile/orcl/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs; 执行open resetlogs果然报错
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 2 with name "_SYSSMU2_111974964$" too small
Process ID: 28092
Session ID: 1522 Serial number: 3
4:创建pfile文件并启动数据库
创建pfile文件,可以看到_allow_resetlogs_corruption参数为true
SQL> create pfile='/home/oracle/pfile.ora' from spfile;
File created.
关闭数据库:
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
参数文件内容:
orcl:/home/oracle@jkdb>cat pfile.ora
orcl.__db_cache_size=2516582400
orcl.__java_pool_size=16777216
orcl.__large_pool_size=16777216
orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment
orcl.__pga_aggregate_target=1073741824
orcl.__sga_target=4194304000
orcl.__shared_io_pool_size=0
orcl.__shared_pool_size=1593835520
orcl.__streams_pool_size=16777216
*._allow_resetlogs_corruption=true
*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'
*.audit_trail='NONE'
*.compatible='11.2.0.0.0'
*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='orcl'
*.diagnostic_dest='/home/oracle/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'
*.open_cursors=300
*.pga_aggregate_target=1073741824
*.processes=1500
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=1655
*.sga_target=4194304000
*.undo_tablespace='UNDOTBS1'
orcl:/home/oracle@jkdb>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 21 10:22:47 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/home/oracle/pfile.ora';
ORACLE instance started.
Total System Global Area 4175568896 bytes
Fixed Size 2234960 bytes
Variable Size 1644168624 bytes
Database Buffers 2516582400 bytes
Redo Buffers 12582912 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 30240
Session ID: 1522 Serial number: 3
SQL> select status from v$instance;
STATUS
------------
OPEN
SQL> conn sa/ednns
ERROR:
ORA-00600: internal error code, arguments: , [], [], [], [], [], [], [],
[], [], [], []
虽然数据处于open状态,但是用户连接报错,4193错误通常是因为恢复时redo与undo不一致所导致
5:关闭数据库并编辑pfile文件
orcl:/home/oracle@jkdb>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 21 10:45:03 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected.
SQL> shutdown abort
ORACLE instance shut down.
因为shutdown immediate 不能关闭,只能通过shutdown abort。
修改参数文件:
orcl:/home/oracle@jkdb>cat pfile.ora
orcl.__db_cache_size=2516582400
orcl.__java_pool_size=16777216
orcl.__large_pool_size=16777216
orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment
orcl.__pga_aggregate_target=1073741824
orcl.__sga_target=4194304000
orcl.__shared_io_pool_size=0
orcl.__shared_pool_size=1593835520
orcl.__streams_pool_size=16777216
*._allow_resetlogs_corruption=true
*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'
*.audit_trail='NONE'
*.compatible='11.2.0.0.0'
*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='orcl'
*.diagnostic_dest='/home/oracle/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'
*.open_cursors=300
*.pga_aggregate_target=1073741824
*.processes=1500
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=1655
*.sga_target=4194304000
#*.undo_tablespace='UNDOTBS1'
*.undo_management='MANUAL'
*.rollback_segments='SYSTEM'
6:启动数据库并创建undo表空间,然后关闭数据库
SQL> startup mount pfile='/home/oracle/pfile.ora';这时候必须先启动到mount状态
ORACLE instance started.
Total System Global Area 4175568896 bytes
Fixed Size 2234960 bytes
Variable Size 1644168624 bytes
Database Buffers 2516582400 bytes
Redo Buffers 12582912 bytes
Database mounted.
SQL> alter database open;
Database altered.
SQL> create undo tablespace undotbs2 datafile '/home/oracle/app/oracle/datafile/orcl/users02.dbf' size 100m autoextend on next 50m maxsize unlimited; 创建undo表空间
Tablespace created.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
7:编辑pfile文件并重新启动数据库,然后创建spfile文件
orcl:/home/oracle@jkdb>cat pfile.ora
orcl.__db_cache_size=2516582400
orcl.__java_pool_size=16777216
orcl.__large_pool_size=16777216
orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment
orcl.__pga_aggregate_target=1073741824
orcl.__sga_target=4194304000
orcl.__shared_io_pool_size=0
orcl.__shared_pool_size=1593835520
orcl.__streams_pool_size=16777216
*._allow_resetlogs_corruption=false
*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'
*.audit_trail='NONE'
*.compatible='11.2.0.0.0'
*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='orcl'
*.diagnostic_dest='/home/oracle/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'
*.open_cursors=300
*.pga_aggregate_target=1073741824
*.processes=1500
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=1655
*.sga_target=4194304000
*.undo_tablespace='UNDOTBS2'
*.undo_management='AUTO'
#*.rollback_segments='SYSTEM'
这时候将_allow_resetlogs_corruption改为false,指定模式undo表空间为UNDOTBS2,并设置为自动管理
SQL> startup pfile='/home/oracle/pfile.ora';
ORACLE instance started.
Total System Global Area 4175568896 bytes
Fixed Size 2234960 bytes
Variable Size 1644168624 bytes
Database Buffers 2516582400 bytes
Redo Buffers 12582912 bytes
Database mounted.
Database opened.
SQL> conn zxx/zxx 连接用户正常
Connected.
SQL> conn / as sysdba
Connected.
SQL> create spfile from pfile='/home/oracle/pfile.ora';
注意:虽然创建spfile文件,但是由于我碰到服务器又宕机,数据库不能通过spfile启动,只能每次通过pfile文件启动。
像这种因为磁盘问题导致的数据库问题,最好将现有数据库导出来,导入到其他服务器上。
1:查看系统日志
查看系统/var/log/messages,发现都是samba服务报错,怀疑是samba服务导致服务器不断重启。
将samba服务关闭:
[root@jkdb ~]# chkconfig --list | grep smb
smb 0:关闭 1:关闭 2:关闭 3:关闭 4:关闭 5:启用 6:关闭
[root@jkdb ~]# chkconfig --level 345 smb off
[root@jkdb ~]# chkconfig --list | grep smb
smb 0:关闭 1:关闭 2:关闭 3:关闭 4:关闭 5:关闭 6:关闭
service smb stop
关闭之后,数据库正常,第二天发现数据库又宕机了。直接用pfile文件启动数据库,并查看系统日志发现以下报错:
Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code
Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d4 58 00 01 00 00
Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code
Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d5 00 00 00 08 00
Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code
Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d5 00 00 00 08 00
怀疑是磁盘坏道导致,到机房查看服务器,果然有两块硬盘报黄灯!,所以需要把数据库中的数据导出来,导入到其他服务器的数据库中。