
Why Grid Infrastructure Rebootless Node Fencing Fails (Doc ID 1502282.1)

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

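Purpose:

Rebootless fencing is a feature of Grid Infrastructure (GI) 11.2.0.2. When an eviction occurs, it attempts to stop GI gracefully on the evicted node instead of rebooting the node. If rebootless fencing fails, the evicted node is rebooted. This document lists common causes of rebootless fencing failure.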

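Details:

1. Resources cannot be stopped.

If one or more resources cannot be stopped, rebootless fencing fails and the node is rebooted.

In this example, rebootless fencing fails on node 2 after a split brain, and node 2 is rebooted:

On the evicted node, <GI_HOME>/log/<node>/alert<node>.log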

..
2012-09-11 12:04:34.363 [cssd(18834)]CRS-1610:Network communication with node racnode1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.020 seconds
2012-09-11 12:04:36.379 [cssd(18834)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /ocw/grid/log/racnode2/cssd/ocssd.log.
2012-09-11 12:04:36.379 [cssd(18834)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /ocw/grid/log/racnode2/cssd/ocssd.log
2012-09-11 12:04:36.399 [cssd(18834)]CRS-1652:Starting clean up of CRSD resources.
2012-09-11 12:04:36.586 [crsd(26115)]CRS-5833:Cleaning resource 'zDRMON.sh.racnode2 1 1' failed as part of reboot-less node fencing
2012-09-11 12:04:36.588 [cssd(18834)].
2012-09-11 12:04:37.042 [ohasd(16821)]CRS-2765:Resource 'ora.evmd' has failed on server 'racnode2'.
2012-09-11 12:04:37.052 [/ocw/grid/bin/scriptagent.bin(27696)]CRS-5822:Agent '/ocw/grid/bin/scriptagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:4:10} in /ocw/grid/log/racnode2/agent/crsd/scriptagent_oracle/scriptagent_oracle.log.
2012-09-11.062 [ohasd(16821)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'.
2012-09-11.356 [ohasd(16677)]CRS-2112:The OLR service started on node racnode2.
2012-09-11 12:10:47.521 [ohasd(16677)]CRS-1301:Oracle High Availability Service started on node racnode2.
2012-09-11 12:10:47.539 [ohasd(16677)]CRS-8011:reboot advisory message from host: racnode2, component:, with time stamp: L-2012-09-11-12:04:37.140
[ohasd(16677)]
2012-09-11 12:10:47.594 [ohasd(16677)]CRS-8011:reboot advisory message from host: racnode2, component: cssmonit, with time stamp: L-2012-09-11-12:04:37.139
[ohasd(16677)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-09-11 12:10:47.605 [ohasd(16677)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 2 were announced and 0 errors occurred
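The resource that blocked rebootless fencing is named in the CRS-5833 entry of the alert log. As a rough illustration, the following Python sketch pulls those resource names out of alert log text. The function name `failed_cleanup_resources` and the regular expression are assumptions made for this example and are not part of any Oracle tooling; the sample text is copied from the excerpt above.

```python
import re

# Matches the CRS-5833 message as it appears in the alert log excerpt above.
# \s* tolerates the missing spaces seen in some copied versions of the log.
CRS_5833 = re.compile(
    r"CRS-5833:\s*Cleaning resource '([^']+)' failed as\s*part\s*of reboot-less node fencing"
)


def failed_cleanup_resources(alert_log_text):
    """Return the resource names whose cleanup failed during rebootless fencing."""
    return CRS_5833.findall(alert_log_text)


# Two entries taken from the alert log shown above.
sample = (
    "2012-09-11 12:04:36.399 [cssd(18834)]CRS-1652:Starting clean up of CRSD resources.\n"
    "2012-09-11 12:04:36.586 [crsd(26115)]CRS-5833:Cleaning resource "
    "'zDRMON.sh.racnode2 1 1' failed as part of reboot-less node fencing\n"
)

print(failed_cleanup_resources(sample))
```

For a real log, read the file content first and pass it in as a string; an empty result suggests the fencing failure had a cause other than a resource that could not be cleaned.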

When a resource cannot be stopped, either cssdagent or cssdmonitor will attempt to reboot the node. Sample logs from both follow.

<GI_HOME>/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log

2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got posted
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: shutdown initiated by CSS, requested to sync
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got HB signal
2012-09-11 12:04:36.400: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.035: [ CSSCLNT][1095805248]clsssRecvMsg: got a disconnect from the server while waiting for message type 22
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssRecvMsg: got a disconnect from the server while waiting for message type 27
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:37.035: [GIPCXCPT][109859168]gipcInternalSend: connection not valid for send operation endp 0x8e3e60 [00000000000001b7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=3165a05b-7e7139a5-18801))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=7e7139a5-3165a05b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2012-09-11 12:04:37.035: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:37.035: [ CSSCLNT][1077418304]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8
2012-09-11 12:04:37.036: [ USRTHRD][1077418304] clsnomon_status: Communications failure with CSS detected. Waiting for sync to complete...
2012-09-11 12:04:37.036: [GIPCXCPT][1098959168]gipcSendSyncF [clsssServerRPC : clsss.c : 6272]: EXCEPTION[ ret gipcretConnectionLost (12) ]  failed to send on endp 0x8e3e60 [00000000000001b7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=3165a05b-7e7139a5-18801))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=7e7139a5-3165a05b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, addr 0000000000000000, buf 0x4180bd80, len 80, flags 0x8000000
2012-09-11 12:04:37.036: [ CSSCLNT][1098959168]clsssServerRPC: send failed with err 12, msg type 7
2012-09-11 12:04:37.036: [ CSSCLNT][1098959168]clsssCommonClientExit: RPC failure, rc 3
2012-09-11 12:04:37.139: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.139: [ USRTHRD][1097382208] clsnSyncComplete: posting omon

<GI_HOME>/agent/ohasd/oracssdagent_root/oracssdagent_root.log

2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got posted
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: shutdown initiated by CSS, requested to sync
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got HB signal
2012-09-11 12:04:36.400: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssRecvMsg: got a disconnect from the server while waiting for message type 27
2012-09-11 12:04:37.035: [ CSSCLNT][1095805248]clsssRecvMsg: got a disconnect from the server while waiting for message type 22
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcInternalSend: connection not valid for send operation endp 0x2aaab4014900 [00000000000001c0] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=561e3f6b-a0a3602e-18817))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=a0a3602e-561e3f6b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcSendSyncF [clsssServerRPC : clsss.c : 6272]: EXCEPTION[ ret gipcretConnectionLost (12) ]  failed to send on endp 0x2aaab4014900 [00000000000001c0] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=561e3f6b-a0a3602e-18817))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=a0a3602e-561e3f6b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, addr 0000000000000000, buf 0x4180bd80, len 80, flags 0x8000000
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssServerRPC: send failed with err 12, msg type 7
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssCommonClientExit: RPC failure, rc 3
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8
2012-09-11 12:04:37.036: [ USRTHRD][1077418304] clsnomon_status: Communications failure with CSS detected. Waiting for sync to complete...
2012-09-11 12:04:37.036: [ USRTHRD][1097382208] clsnwork_process_work: calling sync

Because a CRSD resource (a user resource) could not be stopped, crsd.log is a good starting point for further debugging.
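One way to narrow down crsd.log is to keep only the entries that mention the failing resource around the time of the eviction. The sketch below assumes that crsd.log entries start with the same `YYYY-MM-DD HH:MM:SS.mmm` timestamp as the agent logs above; that assumption, the helper name `lines_near`, and the 60-second default window are illustrative choices. The sample lines are placeholders, not real crsd.log output.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp prefix, modeled on the agent log entries shown above.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def lines_near(lines, resource, when, window_seconds=60):
    """Return lines that mention `resource` within `window_seconds` of `when`."""
    center = datetime.strptime(when, TS_FORMAT)
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = TS.match(line)
        if not match or resource not in line:
            continue  # skip lines without a timestamp or without the resource name
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        if abs(stamp - center) <= window:
            hits.append(line)
    return hits


# Placeholder entries for illustration only.
placeholder_lines = [
    "2012-09-11 12:04:36.500: placeholder entry mentioning zDRMON.sh",
    "2012-09-11 11:00:00.000: placeholder entry mentioning zDRMON.sh",
    "2012-09-11 12:04:36.600: placeholder entry about another resource",
]

# 12:04:36.586 is the time of the CRS-5833 entry in the alert log above.
for hit in lines_near(placeholder_lines, "zDRMON.sh", "2012-09-11 12:04:36.586"):
    print(hit)
```

The resource name and timestamp to search for come from the CRS-5833 entry in the alert log of the evicted node.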
