本文主要测试OracleDataGuard三种保护模式是最大的保护、最大的可用性和最大的性能配置和事务同步场景。
这种保护模式在不影响主库可用性的情况下提供了最高级别的数据保护。主库只需要交易redo至少至少一个同步备库standby log事务只有在(取决于配置)之后才会提交。
如果主数据库不能redo如果写入至少一个同步备用数据库,主库模式将转换为最高性能模式,以保持主数据库的可用性,直到它能再次使用redo写入备库。
这种保护模式尽可能保证零数据丢失,除非有一些双重故障,护模式可以尽可能保证零数据丢失。
这是默认保护模式。在主备模式下提供最高性能。无需等待备库确认,即可立即提交主库事务。redo也会写入一个或多个备库,但与主库提交异步相比,主数据库的性能不受影响redo日志写入备库延迟的影响。
该保护模式提供的数据保护略低于最大可用性模式,对主数据库性能的影响最小。
最大模式可以确保在主库出现故障时不会丢失数据。为了提供这一级别的保护,事务提交redo日志必须在事务提交之前写入至少一个同步备库。为了确保数据不会丢失,如果主库事务提交的redo日志不能写入至少一个同步库,数据库会shutdown,不继续处理事务。
SYNC:主库同步同步redo log到备库。
ASYN:主库采用异步同步的方式redo log到备库。
AFFIRM:提交主库事务后,需要确认主库redo log传输到仓库并写入standby log 在磁盘上返回提交。
NOAFFIRM:提交主库事务后,只需确认主库事务redo log将其传输到备库,无需确认写入磁盘即可提交。
查看当前的保护模式,默认最大性能
1* select database_role,protection_mode,protection_level from v$database DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL -------------------------------- ---------------------------------------- ---------------------------------------- PRIMARY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
注意通过命令切换到最大可用模式log_archive_dest_n要设置SYNC AFFIRM或者SYNC NOAFFIRM
--执行切换最大可用模式 ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY; --查看当前的保护模式,可以找到当前的保护模式protection_level为RESYNCHRONIZATION,不是MAXIMUM AVAILABILITY --log_archive_dest_n 没有设置sync,需要log_archive_dest_n设置为sync SQL> select database_role,protection_mode,protection_level from v$database; DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL -------------------------------- ---------------------------------------- ---------------------------------------- PRIMARY MAXIMUM AVAILABILITY RESYNCHRONIZATION --调整log_archive_dest_n alter system set log_archive_dest_2='service=orcldg LGWR SYNC delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 net_timeout=10 valid_for=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=orcldg' scope=both sid='*'; --为了再次查看当前的保护模式,MAXIMUM AVAILABILITY 1 select database_role,protection_mode,protection_level from v$database DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL -------------------------------- ---------------------------------------- ---------------------------------------- PRIMARY MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY --同步链路的状态为SYNCHRONIZED LGWR (PID:9029): LAD:2 is SYNCHRONIZED LGWR (PID:9029): SRL selected to archive T-1.S-28 LGWR (PID:9029): SRL selected for T-1.S-28 for LAD:2
同步测试事务(模拟主要网络故障,传输异常)
--删除数据,正常提交事务 SQL> delete from test where rownum<2; 1 row deleted. Elapsed: 00:00:00.00 SQL> commit; Commit complete. Elapsed: 00:00:00.00 SQL> --现在模拟网络故障,备库网卡down ifdown ens38 --删除数据,提交事务hang住,直到超时,才完成提交 SQL> delete from test where rownum<2; 1 row deleted. Elapsed: 00:00:00.00 SQL> commit; Commit complete. Elapsed: 00:00:08.14 --后台lgwr进程,网络超时10秒(net_timeout参数控制) *** 2022-07-10T22:10:43.446647 08:00 *** 2022-07-10 22:10:43.446615 [krsw.c:3617] krsw_ksr_receive: Current time <07/10/2022 22:10:33> exceeds begin time <07/10/2022 22:10:33> by <10> seconds, return timed out error *** 2022-07-10 22:10:43.447048 [krsh.c:6348] ORA-16198: Received timed out error from KSR *** 2022-07-10 22:10:43.447162 [krsh.c:6348] Attempting LAD:2 network reconnect (16198) *** 2022-07-10 22:10:43.447385 [krsh.c:6348] LAD:2 network reconnect abandoned krsw_ksr_status: Received error 16198 on receiving channel messages <error barrier> at 0x7ffc2e3db788 placed krsl.c@7092 ORA-16198: Timeout incurred on internal channel during remote archival --查看当前的保护模式,现在可以找到protection_level降级为RESYNCHRONIZATION SQL> select database_role,protection_mode,protection_level from v$database; DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL -------------------------------- ---------------------------------------- ---------------------------------------- PRIMARY MAXIMUM AVAILABILITY RESYNCHRONIZATION
同步测试事务(模拟备库应用异常)
--直接停止备库mrp进程 SQL> alter database recover managed standby database cancel; Database altered. SQL> select open_mode from v$database; OPEN_MODE -------------------
READ ONLY
SQL>
--主库提交事务,没有影响,因为事务redo log还是可以正常传输到备库的standby log
SQL> select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
SQL>
SQL>
SQL> delete from test where rownum<2;
1 row deleted.
SQL>
commit;
Commit complete.
SQL>
保护模式重新恢复到MAXIMUM AVAILABILITY
--在主库重新enable链路,等待主备重新实时同步之后,模式就自动切换为MAXIMUM AVAILABILITY
SQL> alter system set log_archive_dest_state_2='enable';
System altered.
Elapsed: 00:00:00.01
1 select database_role,protection_mode,protection_level from v$database
2*
DATABASE_ROLE PROTECTION_MODE
-------------------------------- ----------------------------------------
PROTECTION_LEVEL
----------------------------------------
PRIMARY MAXIMUM AVAILABILITY
MAXIMUM AVAILABILITY
--重新升级为sync
*** 2022-07-10T22:14:14.479490+08:00
*** 2022-07-10 22:14:14.479485 [krsw.c:4550]
krsw_nss_init: NSS2 (PID:28725) for mode SYNC has been re-initialized
krsw_nss_init: Performing a channel reset to ignore previous responses
krsw_nss_init: Successfully reused NSS2 (PID:28725) for service orcldg mode SYNC ocis = 0x7ffb4f2bd140
*** 2022-07-10 22:14:14.509428 [krsu.c:7534]
krsu_nss_upi_ahm: upiahm connect done status is 0
通过命令方式切换为最大性能
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
Database altered.
SQL> select database_role,protection_mode,protection_level from v$database
2 ;
DATABASE_ROLE PROTECTION_MODE
-------------------------------- ----------------------------------------
PROTECTION_LEVEL
----------------------------------------
PRIMARY MAXIMUM PERFORMANCE
MAXIMUM PERFORMANCE
配置最大模式,log_archive_dest_N注意设置为ASYNC+NOAFFIRM,如果还是设置为SYNC模式,即使是最大性能模式,还是会进行sync复制,如果出现备库网络问题,依然会导致主库提交出现等待,直到网络超时
--测试在最大性能下,由于设置了SYNC,备库网络问题,依然会导致主库提交出现等待,直到网络超时
select database_role,protection_mode,protection_level from v$database
2*
DATABASE_ROLE PROTECTION_MODE
-------------------------------- ----------------------------------------
PROTECTION_LEVEL
----------------------------------------
PRIMARY MAXIMUM PERFORMANCE
MAXIMUM PERFORMANCE
SQL> show parameter log_archive_dest_2
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
log_archive_dest_2 string service=orcldg LGWR SYNC NOAFF
IRM delay=0 optional compress
ion=disable max_failure=0 max_
connections=1 reopen=300 net_
timeout=10 valid_for=(ONLINE_L
OGFILES,PRIMARY_ROLE) db_uniq
ue_name=orcldg
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务,事务hang住,直到超时,才完成提交
SQL> delete from test where rownum<2;
1 row deleted.
Elapsed: 00:00:00.00
SQL> commit;
Commit complete.
Elapsed: 00:00:07.43
设置为ASYNC+NOAFFIRM,即使备库网络异常,也不会影响主库
--log_archive_dest_n设置为ASYNC+NOAFFIRM
alter system set log_archive_dest_2='service=orcldg LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 net_timeout=10 valid_for=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=orcldg' scope=both sid='*';
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务正常,不需要备库返回
SQL> delete from test where rownum<2;
1 row deleted.
Elapsed: 00:00:00.00
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL>
通过命令方式切换为最大可用模式,注意log_archive_dest_n要设置SYNC+AFFIRM
--设置最大保护,数据库需要启动到mount,不能open
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;
--切换为最大保护模式
1* select database_role,protection_mode,protection_level from v$database
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM PROTECTION MAXIMUM PROTECTION
--在最大模式下,不能在线更改不满足最大保护的参数,例如:修改为ASYNC
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-16159: Cannot change protected standby destination attributes
--在最大模式下,至少要存在一个备库,或者不能正常open
ORA-16072: a minimum of one standby database destination is required
对事务同步进行测试
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务,事务hang住
SQL> delete from test where rownum<2;
1 row deleted.
SQL> delete from test where rownum<2;
1 row deleted.
SQL> commit;
---------->>>hang 住
--并且会一直尝试和备库重连,期间整个数据库处于hang住状态,直到于备库恢复同步
GWR (PID:33181): ORA-16198: Received timed out error from KSR
LGWR (PID:33181): Error 16198 attaching to RFS for reconnect
2022-07-10T22:55:35.002269+08:00
***********************************************************************
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.242)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=ol7db1)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 19.0.0.0.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Version 19.10.0.0.0
Time: 10-JUL-2022 22:55:35
Tracing not turned on.
Tns error struct:
ns main err code: 12543
TNS-12543: TNS:destination host unreachable
ns secondary err code: 12560
nt main err code: 513
TNS-00513: Destination host unreachable
nt secondary err code: 113
nt OS err code: 0
2022-07-10T22:55:38.010715+08:00
***********************************************************************
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.242)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=ol7db1)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 19.0.0.0.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Version 19.10.0.0.0
Time: 10-JUL-2022 22:55:38
Tracing not turned on.
Tns error struct:
ns main err code: 12543
TNS-12543: TNS:destination host unreachable
ns secondary err code: 12560
nt main err code: 513
TNS-00513: Destination host unreachable
nt secondary err code: 113
nt OS err code: 0
2022-07-10T22:55:38.011765+08:00
NSS2 (PID:33919): Error 12543 received logging on to the standby
2022-07-10T22:55:39.159327+08:00
Starting background process NSS2
2022-07-10T22:55:39.169416+08:00
NSS2 started with pid=44, OS id=33941
--恢复备库网络之后
ifup ens33
--数据库重新恢复正常
*** 2022-07-10T23:00:10.711866+08:00
*** 2022-07-10 23:00:10.711851 [krsw.c:4550]
krsw_nss_init: NSS2 (PID:34615) for mode SYNC has been initialized
krsw_nss_init: Performing a channel reset to ignore previous responses
krsw_nss_init: Successfully started NSS2 (PID:34615) for service orcldg mode SYNC ocis = 0x7f2a5631b140
*** 2022-07-10 23:00:10.760257 [krsu.c:7534]
krsu_nss_upi_ahm: upiahm connect done status is 0
对事务同步进行测试(模拟备库应用异常)
--直接停掉备库mrp进程
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ ONLY
SQL>
--主库提交事务,没有影响,因为事务redo log还是可以正常传输到备库的standby log
1* select database_role,protection_mode,protection_level from v$database
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM PROTECTION MAXIMUM PROTECTION
SQL> delete from test where rownum<2;
1 row deleted.
SQL> commit;
Commit complete.
SQL>
从测试来看,最大性能模式下数据库的性能最好,不受备库的同步影响,最大保护模式下,如果主备网络同步出现问题,主库将出现hang住的情况,对业务影响较大,而最大可用模式下,数据库的性能以及数据保护都得到兼备,在备库同步出现问题的情况下,可以进行降级,避免对生产业务造成影响。