ORA-00333: redo log read error block 8194 .-去问十二

ORA-00333: redo log read error block 8194 .

的有关信息介绍如下：

ORA-00333: redo log read error block 8194 .

ORA-00333: redo log read error block 8194 ..

报错情景：应用无法连接，查看数据库发现oracle无法启动，执行启动命令报错ORA-00333，日志文件损坏。非常规修复之后，把数据库起来，过段时间数据库又宕机。每次宕机之后都需要手动重启

报错根本原因：服务器两块硬盘坏道

恢复步骤：

1：数据库启动报错ORA-00333

2：设置参数“_allow_resetlogs_corruption” 并重启数据库导mount

3：执行recover database until cancel;

4：创建pfile文件并启动数据库

5：关闭数据库并编辑pfile文件

6：启动数据库并创建undo表空间，然后关闭数据库

7：编辑pfile文件并重新启动数据库，然后创建spfile文件

问题定位：

1：查看系统日志

1：数据库启动报错ORA-00333

SQL> select status from v$instance;

STATUS

------------

MOUNTED

SQL> alter database open;

alter database open

ERROR at line 1:

ORA-00333: redo log read error block 8194 count 8192

SQL> select group#,sequence#,archived,status from v$log;

GROUP# SEQUENCE# ARC STATUS

---------- ---------- --- ----------------

1 117247 NO CURRENT

3 117246 NO ACTIVE

2 117245 NO INACTIVE

SQL> select group#,member from v$logfile;

GROUP# MEMBER

------ --------------------------------------------------

3 /home/oracle/app/oracle/datafile/orcl/redo03.log

2 /home/oracle/app/oracle/datafile/orcl/redo02.log

1 /home/oracle/app/oracle/datafile/orcl/redo01.log

查看alert日志，group 1日志文件文件损坏：

ORA-00312: online log 1 thread 1: '/home/oracle/app/oracle/datafile/orcl/redo01.log'

ORA-27072: File I/O error

Additional information: 4

Additional information: 8194

Additional information: 1084416

可以看到损坏的是当前的联机日志。

但是这个数据库没有备份，没有开归档，只能使用非常规恢复！

2：设置参数“_allow_resetlogs_corruption” 并重启数据库导mount

查看隐含参数

select a.ksppinm name, b.ksppstvl value, a.ksppdesc description

from x$ksppi a, x$ksppcv b

where a.indx = b.indx and a.ksppinm like '_allow%';

_allow_resetlogs_corruption FALSE allow resetlogs even if it will cause corruption

可以看到这个参数是数据丢失情况下允许resetlogs 。修改参数默认值：

SQL> Alter system set "_allow_resetlogs_corruption"=true scope=spfile;

System altered.

关闭数据库：

SQL> shutdown immediate;

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

启动数据库导mount状态：

SQL> startup mount;

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

3：执行recover database until cancel;

同时查看数据文件：

SQL> col checkpoint_change# for 9999999999999999

SQL> select file#,checkpoint_change# from v$datafile;

FILE# CHECKPOINT_CHANGE#

---------- ------------------

1 14462303120211

2 14462303120211

3 14462303120211

4 14462303120211

5 14462303120211

6 14462303120211

7 14462303120211

可以看到数据库的SCN是一致的。

执行recover database until cancel命令：

SQL> recover database until cancel;

ORA-00279: change 14462303120211 generated at 09/17/2015 22:02:15 needed for thread 1

ORA-00289: suggestion : /home/oracle/app/oracle/archive/orcl/1_117246_814995340.dbf

ORA-00280: change 14462303120211 for thread 1 is in sequence #117246

Specify log: {=suggested | filename | AUTO | CANCEL}

CANCEL ------->选择cancel

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below 这里警告：recover成功但是OPEN RESETLOGS会报错

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: '/home/oracle/app/oracle/datafile/orcl/system01.dbf'

ORA-01112: media recovery not started

SQL> alter database open resetlogs; 执行open resetlogs果然报错

alter database open resetlogs

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 2 with name "_SYSSMU2_111974964$" too small

Process ID: 28092

Session ID: 1522 Serial number: 3

4：创建pfile文件并启动数据库

创建pfile文件，可以看到_allow_resetlogs_corruption参数为true

SQL> create pfile='/home/oracle/pfile.ora' from spfile;

File created.

关闭数据库：

SQL> shutdown immediate;

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

参数文件内容：

orcl:/home/oracle@jkdb>cat pfile.ora

orcl.__db_cache_size=2516582400

orcl.__java_pool_size=16777216

orcl.__large_pool_size=16777216

orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment

orcl.__pga_aggregate_target=1073741824

orcl.__sga_target=4194304000

orcl.__shared_io_pool_size=0

orcl.__shared_pool_size=1593835520

orcl.__streams_pool_size=16777216

*._allow_resetlogs_corruption=true

*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'

*.audit_trail='NONE'

*.compatible='11.2.0.0.0'

*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'

*.db_block_size=8192

*.db_domain=''

*.db_name='orcl'

*.diagnostic_dest='/home/oracle/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'

*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=1500

*.remote_login_passwordfile='EXCLUSIVE'

*.sessions=1655

*.sga_target=4194304000

*.undo_tablespace='UNDOTBS1'

orcl:/home/oracle@jkdb>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 21 10:22:47 2015

Connected to an idle instance.

SQL> startup pfile='/home/oracle/pfile.ora';

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

ORA-03113: end-of-file on communication channel

Process ID: 30240

Session ID: 1522 Serial number: 3

SQL> select status from v$instance;

STATUS

------------

OPEN

SQL> conn sa/ednns

ERROR:

ORA-00600: internal error code, arguments: , [], [], [], [], [], [], [],

[], [], [], []

虽然数据处于open状态，但是用户连接报错，4193错误通常是因为恢复时redo与undo不一致所导致

5：关闭数据库并编辑pfile文件

orcl:/home/oracle@jkdb>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 21 10:45:03 2015

Connected.

SQL> shutdown abort

ORACLE instance shut down.

因为shutdown immediate 不能关闭，只能通过shutdown abort。

修改参数文件：

orcl:/home/oracle@jkdb>cat pfile.ora

orcl.__db_cache_size=2516582400

orcl.__java_pool_size=16777216

orcl.__large_pool_size=16777216

orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment

orcl.__pga_aggregate_target=1073741824

orcl.__sga_target=4194304000

orcl.__shared_io_pool_size=0

orcl.__shared_pool_size=1593835520

orcl.__streams_pool_size=16777216

*._allow_resetlogs_corruption=true

*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'

*.audit_trail='NONE'

*.compatible='11.2.0.0.0'

*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'

*.db_block_size=8192

*.db_domain=''

*.db_name='orcl'

*.diagnostic_dest='/home/oracle/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'

*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=1500

*.remote_login_passwordfile='EXCLUSIVE'

*.sessions=1655

*.sga_target=4194304000

#*.undo_tablespace='UNDOTBS1'

*.undo_management='MANUAL'

*.rollback_segments='SYSTEM'

6：启动数据库并创建undo表空间，然后关闭数据库

SQL> startup mount pfile='/home/oracle/pfile.ora';这时候必须先启动到mount状态

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

SQL> alter database open;

Database altered.

SQL> create undo tablespace undotbs2 datafile '/home/oracle/app/oracle/datafile/orcl/users02.dbf' size 100m autoextend on next 50m maxsize unlimited; 创建undo表空间

Tablespace created.

SQL> shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> exit

7：编辑pfile文件并重新启动数据库，然后创建spfile文件

orcl:/home/oracle@jkdb>cat pfile.ora

orcl.__db_cache_size=2516582400

orcl.__java_pool_size=16777216

orcl.__large_pool_size=16777216

orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment

orcl.__pga_aggregate_target=1073741824

orcl.__sga_target=4194304000

orcl.__shared_io_pool_size=0

orcl.__shared_pool_size=1593835520

orcl.__streams_pool_size=16777216

*._allow_resetlogs_corruption=false

*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'

*.audit_trail='NONE'

*.compatible='11.2.0.0.0'

*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'

*.db_block_size=8192

*.db_domain=''

*.db_name='orcl'

*.diagnostic_dest='/home/oracle/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'

*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=1500

*.remote_login_passwordfile='EXCLUSIVE'

*.sessions=1655

*.sga_target=4194304000

*.undo_tablespace='UNDOTBS2'

*.undo_management='AUTO'

#*.rollback_segments='SYSTEM'

这时候将_allow_resetlogs_corruption改为false，指定模式undo表空间为UNDOTBS2，并设置为自动管理

SQL> startup pfile='/home/oracle/pfile.ora';

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

Database opened.

SQL> conn zxx/zxx 连接用户正常

Connected.

SQL> conn / as sysdba

Connected.

SQL> create spfile from pfile='/home/oracle/pfile.ora';

注意：虽然创建spfile文件，但是由于我碰到服务器又宕机，数据库不能通过spfile启动，只能每次通过pfile文件启动。

像这种因为磁盘问题导致的数据库问题，最好将现有数据库导出来，导入到其他服务器上。

1：查看系统日志

查看系统/var/log/messages，发现都是samba服务报错，怀疑是samba服务导致服务器不断重启。

将samba服务关闭：

[root@jkdb ~]# chkconfig --list | grep smb

smb 0:关闭 1:关闭 2:关闭 3:关闭 4:关闭 5:启用 6:关闭

[root@jkdb ~]# chkconfig --level 345 smb off

[root@jkdb ~]# chkconfig --list | grep smb

smb 0:关闭 1:关闭 2:关闭 3:关闭 4:关闭 5:关闭 6:关闭

service smb stop

关闭之后，数据库正常，第二天发现数据库又宕机了。直接用pfile文件启动数据库，并查看系统日志发现以下报错：

Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code

Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d4 58 00 01 00 00

Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code

Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d5 00 00 00 08 00

Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code

Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d5 00 00 00 08 00

怀疑是磁盘坏道导致，到机房查看服务器，果然有两块硬盘报黄灯！，所以需要把数据库中的数据导出来，导入到其他服务器的数据库中。