您的位置首页生活百科

ORA-00333: redo log read error block 8194 .

ORA-00333: redo log read error block 8194 .

的有关信息介绍如下:

ORA-00333: redo log read error block 8194 .

ORA-00333: redo log read error block 8194 ..

报错情景:应用无法连接,查看数据库发现oracle无法启动,执行启动命令报错ORA-00333,日志文件损坏。非常规修复之后,把数据库起来,过段时间数据库又宕机。每次宕机之后都需要手动重启

报错根本原因:服务器两块硬盘坏道

恢复步骤:

1:数据库启动报错ORA-00333

2:设置参数“_allow_resetlogs_corruption” 并重启数据库导mount

3:执行recover database until cancel;

4:创建pfile文件并启动数据库

5:关闭数据库并编辑pfile文件

6:启动数据库并创建undo表空间,然后关闭数据库

7:编辑pfile文件并重新启动数据库,然后创建spfile文件

问题定位:

1:查看系统日志

1:数据库启动报错ORA-00333

SQL> select status from v$instance;

STATUS

------------

MOUNTED

SQL> alter database open;

alter database open

*

ERROR at line 1:

ORA-00333: redo log read error block 8194 count 8192

SQL> select group#,sequence#,archived,status from v$log;

GROUP# SEQUENCE# ARC STATUS

---------- ---------- --- ----------------

1 117247 NO CURRENT

3 117246 NO ACTIVE

2 117245 NO INACTIVE

SQL> select group#,member from v$logfile;

GROUP# MEMBER

------ --------------------------------------------------

3 /home/oracle/app/oracle/datafile/orcl/redo03.log

2 /home/oracle/app/oracle/datafile/orcl/redo02.log

1 /home/oracle/app/oracle/datafile/orcl/redo01.log

查看alert日志,group 1日志文件文件损坏:

ORA-00312: online log 1 thread 1: '/home/oracle/app/oracle/datafile/orcl/redo01.log'

ORA-27072: File I/O error

Additional information: 4

Additional information: 8194

Additional information: 1084416

可以看到损坏的是当前的联机日志。

但是这个数据库没有备份,没有开归档,只能使用非常规恢复!

2:设置参数“_allow_resetlogs_corruption” 并重启数据库导mount

查看隐含参数

select a.ksppinm name, b.ksppstvl value, a.ksppdesc description

from x$ksppi a, x$ksppcv b

where a.indx = b.indx and a.ksppinm like '_allow%';

_allow_resetlogs_corruption FALSE allow resetlogs even if it will cause corruption

可以看到这个参数是 数据丢失情况下允许resetlogs 。修改参数默认值:

SQL> Alter system set "_allow_resetlogs_corruption"=true scope=spfile;

System altered.

关闭数据库:

SQL> shutdown immediate;

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

启动数据库导mount状态:

SQL> startup mount;

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

3:执行recover database until cancel;

同时查看数据文件:

SQL> col checkpoint_change# for 9999999999999999

SQL> select file#,checkpoint_change# from v$datafile;

FILE# CHECKPOINT_CHANGE#

---------- ------------------

1 14462303120211

2 14462303120211

3 14462303120211

4 14462303120211

5 14462303120211

6 14462303120211

7 14462303120211

可以看到数据库的SCN是一致的。

执行recover database until cancel命令:

SQL> recover database until cancel;

ORA-00279: change 14462303120211 generated at 09/17/2015 22:02:15 needed for thread 1

ORA-00289: suggestion : /home/oracle/app/oracle/archive/orcl/1_117246_814995340.dbf

ORA-00280: change 14462303120211 for thread 1 is in sequence #117246

Specify log: {=suggested | filename | AUTO | CANCEL}

CANCEL ------->选择cancel

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below 这里警告:recover成功但是OPEN RESETLOGS会报错

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: '/home/oracle/app/oracle/datafile/orcl/system01.dbf'

ORA-01112: media recovery not started

SQL> alter database open resetlogs; 执行open resetlogs果然报错

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 2 with name "_SYSSMU2_111974964$" too small

Process ID: 28092

Session ID: 1522 Serial number: 3

4:创建pfile文件并启动数据库

创建pfile文件,可以看到_allow_resetlogs_corruption参数为true

SQL> create pfile='/home/oracle/pfile.ora' from spfile;

File created.

关闭数据库:

SQL> shutdown immediate;

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

参数文件内容:

orcl:/home/oracle@jkdb>cat pfile.ora

orcl.__db_cache_size=2516582400

orcl.__java_pool_size=16777216

orcl.__large_pool_size=16777216

orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment

orcl.__pga_aggregate_target=1073741824

orcl.__sga_target=4194304000

orcl.__shared_io_pool_size=0

orcl.__shared_pool_size=1593835520

orcl.__streams_pool_size=16777216

*._allow_resetlogs_corruption=true

*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'

*.audit_trail='NONE'

*.compatible='11.2.0.0.0'

*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'

*.db_block_size=8192

*.db_domain=''

*.db_name='orcl'

*.diagnostic_dest='/home/oracle/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'

*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=1500

*.remote_login_passwordfile='EXCLUSIVE'

*.sessions=1655

*.sga_target=4194304000

*.undo_tablespace='UNDOTBS1'

orcl:/home/oracle@jkdb>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 21 10:22:47 2015

Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup pfile='/home/oracle/pfile.ora';

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

ORA-03113: end-of-file on communication channel

Process ID: 30240

Session ID: 1522 Serial number: 3

SQL> select status from v$instance;

STATUS

------------

OPEN

SQL> conn sa/ednns

ERROR:

ORA-00600: internal error code, arguments: , [], [], [], [], [], [], [],

[], [], [], []

虽然数据处于open状态,但是用户连接报错,4193错误通常是因为恢复时redo与undo不一致所导致

5:关闭数据库并编辑pfile文件

orcl:/home/oracle@jkdb>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 21 10:45:03 2015

Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected.

SQL> shutdown abort

ORACLE instance shut down.

因为shutdown immediate 不能关闭,只能通过shutdown abort。

修改参数文件:

orcl:/home/oracle@jkdb>cat pfile.ora

orcl.__db_cache_size=2516582400

orcl.__java_pool_size=16777216

orcl.__large_pool_size=16777216

orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment

orcl.__pga_aggregate_target=1073741824

orcl.__sga_target=4194304000

orcl.__shared_io_pool_size=0

orcl.__shared_pool_size=1593835520

orcl.__streams_pool_size=16777216

*._allow_resetlogs_corruption=true

*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'

*.audit_trail='NONE'

*.compatible='11.2.0.0.0'

*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'

*.db_block_size=8192

*.db_domain=''

*.db_name='orcl'

*.diagnostic_dest='/home/oracle/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'

*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=1500

*.remote_login_passwordfile='EXCLUSIVE'

*.sessions=1655

*.sga_target=4194304000

#*.undo_tablespace='UNDOTBS1'

*.undo_management='MANUAL'

*.rollback_segments='SYSTEM'

6:启动数据库并创建undo表空间,然后关闭数据库

SQL> startup mount pfile='/home/oracle/pfile.ora';这时候必须先启动到mount状态

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

SQL> alter database open;

Database altered.

SQL> create undo tablespace undotbs2 datafile '/home/oracle/app/oracle/datafile/orcl/users02.dbf' size 100m autoextend on next 50m maxsize unlimited; 创建undo表空间

Tablespace created.

SQL> shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> exit

7:编辑pfile文件并重新启动数据库,然后创建spfile文件

orcl:/home/oracle@jkdb>cat pfile.ora

orcl.__db_cache_size=2516582400

orcl.__java_pool_size=16777216

orcl.__large_pool_size=16777216

orcl.__oracle_base='/home/oracle/app/oracle'#ORACLE_BASE set from environment

orcl.__pga_aggregate_target=1073741824

orcl.__sga_target=4194304000

orcl.__shared_io_pool_size=0

orcl.__shared_pool_size=1593835520

orcl.__streams_pool_size=16777216

*._allow_resetlogs_corruption=false

*.audit_file_dest='/home/oracle/app/oracle/admin/orcl/adump'

*.audit_trail='NONE'

*.compatible='11.2.0.0.0'

*.control_files='/home/oracle/app/oracle/datafile/orcl/control01.ctl','/home/oracle/app/oracle/datafile/orcl/control02.ctl'

*.db_block_size=8192

*.db_domain=''

*.db_name='orcl'

*.diagnostic_dest='/home/oracle/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'

*.log_archive_dest_1='location=/home/oracle/app/oracle/archive/orcl'

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=1500

*.remote_login_passwordfile='EXCLUSIVE'

*.sessions=1655

*.sga_target=4194304000

*.undo_tablespace='UNDOTBS2'

*.undo_management='AUTO'

#*.rollback_segments='SYSTEM'

这时候将_allow_resetlogs_corruption改为false,指定模式undo表空间为UNDOTBS2,并设置为自动管理

SQL> startup pfile='/home/oracle/pfile.ora';

ORACLE instance started.

Total System Global Area 4175568896 bytes

Fixed Size 2234960 bytes

Variable Size 1644168624 bytes

Database Buffers 2516582400 bytes

Redo Buffers 12582912 bytes

Database mounted.

Database opened.

SQL> conn zxx/zxx 连接用户正常

Connected.

SQL> conn / as sysdba

Connected.

SQL> create spfile from pfile='/home/oracle/pfile.ora';

注意:虽然创建spfile文件,但是由于我碰到服务器又宕机,数据库不能通过spfile启动,只能每次通过pfile文件启动。

像这种因为磁盘问题导致的数据库问题,最好将现有数据库导出来,导入到其他服务器上。

1:查看系统日志

查看系统/var/log/messages,发现都是samba服务报错,怀疑是samba服务导致服务器不断重启。

将samba服务关闭:

[root@jkdb ~]# chkconfig --list | grep smb

smb 0:关闭 1:关闭 2:关闭 3:关闭 4:关闭 5:启用 6:关闭

[root@jkdb ~]# chkconfig --level 345 smb off

[root@jkdb ~]# chkconfig --list | grep smb

smb 0:关闭 1:关闭 2:关闭 3:关闭 4:关闭 5:关闭 6:关闭

service smb stop

关闭之后,数据库正常,第二天发现数据库又宕机了。直接用pfile文件启动数据库,并查看系统日志发现以下报错:

Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code

Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

Sep 22 09:30:13 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d4 58 00 01 00 00

Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code

Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

Sep 22 09:30:24 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d5 00 00 00 08 00

Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] Unhandled error code

Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

Sep 22 09:30:34 jkdb kernel: sd 0:2:0:0: [sda] CDB: Read(10): 28 00 25 8c d5 00 00 00 08 00

怀疑是磁盘坏道导致,到机房查看服务器,果然有两块硬盘报黄灯!,所以需要把数据库中的数据导出来,导入到其他服务器的数据库中。