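Table of Contents
- Preface
- Environment
- Monitor Configuration
- Single Monitor
- Two Monitors
- A Digression
- Three Monitors
- Overall State
- Shutting Down Node 1
- Recovering Node 1
- Some Inferences
- Summary
- Open Questions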
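Preface
Split-brain is an old and classic scenario in database cluster design. Early cluster architectures usually relied on an arbiter to decide, from the previous state, which node should hold the primary role and how to switch over. A single arbiter, however, leads to the dual-primary problem once the cluster is split into two isolated networks. To deal with this, early cluster tools such as MHA for MySQL used multi-node, multi-path detection to establish as clearly as possible whether a node was really lost, and gave up the switchover when they could not. This avoids dual primaries, but it also gives up automatic switchover in certain scenarios. Later architectures adopted Paxos as the basic consensus protocol, along with its variants and improvements, including Raft, which is now the most mainstream consistency protocol across distributed scenarios. Raft's main improvements are ordered log commitment and a strengthened LEADER role, which keeps term fluctuations from affecting the service layer. DM8 has also introduced the Raft protocol in its confirm monitor layer, and this article tries to understand how it works through a series of tests.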
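Environment
The tests use a 2-node DW (Data Watch) cluster together with 3 monitor nodes, on 3 machines in total, to examine how the monitor Raft protocol works. The layout is as follows:
Internal IP | External IP | Port | Purpose |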
---|---|---|---|
10.30.5.17 | 192.168.56.7 | 52141 | DMWATCHER_1 |
- | - | 8341 | MONITOR_3 |
10.30.5.18 | 192.168.56.8 | 52142 | DMWATCHER_2 |
- | - | 8340 | MONITOR_2 |
10.30.5.24 | 192.168.56.24 | 8339 | MONITOR_1 |
Monitor Configuration
MONITOR_1
MON_DW_CONFIRM = 1
MON_LOG_PATH = /opt/dw/log
MON_LOG_INTERVAL = 60
MON_LOG_FILE_SIZE = 32
MON_LOG_SPACE_LIMIT = 0
MON_INST_NUM = 3
MON_HB_INTERVAL = 60
MON_BRO_INTERVAL = 100
MON_VOTE_INTERVAL = 100
MON_ID = 1
MON_MID = 45614
[GRP1]
MON_INST_OGUID = 453331
MON_DW_IP = 192.168.56.7:52141
MON_DW_IP = 192.168.56.8:52142
[MON1]
MON_HOST = 192.168.56.24
MON_PORT = 8339
MON_INST_ID = 1
[MON2]
MON_HOST = 192.168.56.8
MON_PORT = 8340
MON_INST_ID = 2
[MON3]
MON_HOST = 192.168.56.7
MON_PORT = 8341
MON_INST_ID = 3
MONITOR_2
MON_DW_CONFIRM = 1
MON_LOG_PATH = /opt/rt_02/DAMENG
MON_LOG_INTERVAL = 60
MON_LOG_FILE_SIZE = 32
MON_LOG_SPACE_LIMIT = 0
MON_INST_NUM = 3
MON_HB_INTERVAL = 60
MON_BRO_INTERVAL = 100
MON_VOTE_INTERVAL = 100
MON_ID = 2
MON_MID = 45614
[GRP1]
MON_INST_OGUID = 453331
MON_DW_IP = 192.168.56.7:52141
MON_DW_IP = 192.168.56.8:52142
[MON1]
MON_HOST = 192.168.56.24
MON_PORT = 8339
MON_INST_ID = 1
[MON2]
MON_HOST = 192.168.56.8
MON_PORT = 8340
MON_INST_ID = 2
[MON3]
MON_HOST = 192.168.56.7
MON_PORT = 8341
MON_INST_ID = 3
MONITOR_3
MON_DW_CONFIRM = 1
MON_LOG_PATH = /opt/rt_01/DAMENG
MON_LOG_INTERVAL = 60
MON_LOG_FILE_SIZE = 32
MON_LOG_SPACE_LIMIT = 0
MON_INST_NUM = 3
MON_HB_INTERVAL = 60
MON_BRO_INTERVAL = 100
MON_VOTE_INTERVAL = 100
MON_ID = 3
MON_MID = 45614
[GRP1]
MON_INST_OGUID = 453331
MON_DW_IP = 192.168.56.7:52141
MON_DW_IP = 192.168.56.8:52142
[MON1]
MON_HOST = 192.168.56.24
MON_PORT = 8339
MON_INST_ID = 1
[MON2]
MON_HOST = 192.168.56.8
MON_PORT = 8340
MON_INST_ID = 2
[MON3]
MON_HOST = 192.168.56.7
MON_PORT = 8341
MON_INST_ID = 3
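The three files differ only in MON_ID and MON_LOG_PATH, and each lists the same three [MONn] sections. A minimal Python sketch of a consistency check over such a file is given here. It assumes that MON_INST_NUM is meant to equal the number of [MONn] sections and that lines starting with # are comments; neither is confirmed by the tests in this article. The function names are hypothetical, and a hand-written parser is used because the file has keys before any section header and repeats MON_DW_IP.

```python
def parse_monitor_ini(text):
    """Parse dmmonitor.ini-style text into (global_opts, sections).

    sections is a list of (section_name, [(key, value), ...]) so that
    repeated keys such as MON_DW_IP are preserved.
    """
    global_opts = {}
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], [])
            sections.append(current)
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if current is None:
            global_opts[key] = value
        else:
            current[1].append((key, value))
    return global_opts, sections


def check_monitor_count(text):
    """True when the number of [MONn] sections equals MON_INST_NUM (assumed meaning)."""
    global_opts, sections = parse_monitor_ini(text)
    mon_sections = [name for name, _ in sections if name.startswith("MON")]
    return len(mon_sections) == int(global_opts["MON_INST_NUM"])


# Shortened copy of the configuration shown above.
SAMPLE = """\
MON_INST_NUM = 3
MON_ID = 1
[GRP1]
MON_INST_OGUID = 453331
MON_DW_IP = 192.168.56.7:52141
MON_DW_IP = 192.168.56.8:52142
[MON1]
MON_HOST = 192.168.56.24
MON_PORT = 8339
[MON2]
MON_HOST = 192.168.56.8
MON_PORT = 8340
[MON3]
MON_HOST = 192.168.56.7
MON_PORT = 8341
"""

if __name__ == "__main__":
    print(check_monitor_count(SAMPLE))
```

Running the same check against all three files before starting the monitors is a cheap way to catch a copy-and-paste slip in one of them.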
Single Monitor
First, start a single monitor (node 1, shown as MON1 in the output below) and observe its behavior
[dmdba@dmdw0 config]$ /opt/dw/dmdbms/bin/dmmonitor path=dmmonitor.ini
[monitor] 2022-04-09 07:28:14: DMMONITOR[4.0] V8
[monitor] 2022-04-09 07:28:14: DMMONITOR[4.0] IS READY.
show monitor
[monitor] 2022-04-09 07:29:35: The monitor is not LEADER
show state
2022-04-09 07:29:38
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 3018 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON1 Active 1 CANDIDATE 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
MON3 Active 3 NOT LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
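With only one node started, the Raft requirement of winning more than half of the votes cannot be met, so the node can only stay in the CANDIDATE state. In addition, once Raft is introduced, a non-LEADER node cannot run any command other than show state (show monitor above is rejected with "The monitor is not LEADER"), so it naturally has no switchover capability. After startup, node 1 writes a Raft log like the following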
[dmdba@dmdw0 log]$ vi dm_raft\[mon1_45614\]_202204.log
2022-04-09 07:28:14.616 [INFO] raft P0000004554 T0000000000000004554 ECS EP XSIT POOL : guid [48003] HB interval [60s]
2022-04-09 07:28:14.618 [INFO] raft P0000004554 T0000000000000004554 ECS AP XSITE POOL : guid [520993] HB interval [60s]
2022-04-09 07:28:17.155 [INFO] raft P0000004554 T0000000000000004561 raft[1] election starting: 2450 2547, term: 305, currentIdx: 20253
2022-04-09 07:28:17.155 [INFO] raft P0000004554 T0000000000000004561 raft[1] becoming candidate
2022-04-09 07:28:17.155 [INFO] raft P0000004554 T0000000000000004560 raft[1] sending requestVote to 3, currentTerm: 306, last_index: 20253, last_term: 305
2022-04-09 07:28:17.156 [INFO] raft P0000004554 T0000000000000004559 raft[1] sending requestVote to 2, currentTerm: 306, last_index: 20253, last_term: 305
2022-04-09 07:28:17.156 [ERROR] raft P0000004554 T0000000000000004560 Can't connect to DM server on '192.168.56.7' port(8341) errno(111)
2022-04-09 07:28:17.156 [ERROR] raft P0000004554 T0000000000000004559 Can't connect to DM server on '192.168.56.8' port(8340) errno(111)
2022-04-09 07:28:17.156 [INFO] raft P0000004554 T0000000000000004560
......
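Some typical Raft election information can be observed in this log:
1. No LEADER is found in the cluster, so the node keeps starting new election rounds. The term advances, and vote requests are sent to the other nodes in the cluster.
2. Since there is no LEADER, no log entries are written, so the index does not advance.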
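To make the two observations easier to check across a long log, the requestVote lines can be pulled out with a regular expression. The following Python sketch is only an illustration: the pattern is derived from the log lines quoted in this article, the function names are hypothetical, and other DM versions may format these lines differently.

```python
import re

# Pattern modelled on the "sending requestVote" lines quoted in this article.
REQUEST_VOTE = re.compile(
    r"raft\[(?P<node>\d+)\] sending requestVote to (?P<peer>\d+), "
    r"currentTerm: (?P<term>\d+), last_index: (?P<last_index>\d+), "
    r"last_term: (?P<last_term>\d+)"
)


def parse_request_votes(log_text):
    """Return one dict of integers per requestVote line found in log_text."""
    votes = []
    for line in log_text.splitlines():
        match = REQUEST_VOTE.search(line)
        if match:
            votes.append({k: int(v) for k, v in match.groupdict().items()})
    return votes


def summarize(votes):
    """Return (sorted distinct terms, sorted distinct last_index values)."""
    terms = sorted({v["term"] for v in votes})
    indexes = sorted({v["last_index"] for v in votes})
    return terms, indexes


# Two lines copied from node 1's log at 07:28 and 07:42.
SAMPLE = (
    "2022-04-09 07:28:17.155 [INFO] raft P0000004554 T0000000000000004560 "
    "raft[1] sending requestVote to 3, currentTerm: 306, last_index: 20253, last_term: 305\n"
    "2022-04-09 07:42:51.573 [INFO] raft P0000005252 T0000000000000005258 "
    "raft[1] sending requestVote to 3, currentTerm: 471, last_index: 20253, last_term: 305\n"
)

if __name__ == "__main__":
    print(summarize(parse_request_votes(SAMPLE)))
```

On these two lines the term moves from 306 to 471 while last_index stays at 20253, which matches the two points above.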
Check the local ports
[root@dmdw0 ~]# netstat -an|grep 52142
[root@dmdw0 ~]# netstat -an|grep 52141
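Neither command returns anything, so a non-LEADER node does not actually establish any connection to DW
Try capturing packets on the monitor port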
[root@dmdw0 ~]# tcpdump -s0 -e -nn -vvv -i enp0s8 port 8339 -X -xx
tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
At this point there is no traffic on this port at all; this is explained in the later tests
Two Monitors
Start MONITOR_2 and observe what happens
show state
2022-04-09 07:43:15
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 1821 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON2 Active 2 FOLLOWER 192.168.56.8 8340
MON1 Active 1 LEADER 192.168.56.24 8339
MON3 Active 3 NOT LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
This node has become a FOLLOWER, and MONITOR_1 has become the LEADER
Node 2 Raft log
[root@dmdsc1 log]# vi dm_raft\[mon2_45614\]_202204.log
2022-04-09 07:42:50.043 [INFO] raft P0000003132 T0000000000000003132 ECS EP XSITE POOL : guid [926484] HB interval [60s]
2022-04-09 07:42:50.045 [INFO] raft P0000003132 T0000000000000003132 ECS AP XSITE POOL : guid [632227] HB interval [60s]
2022-04-09 07:42:51.553 [INFO] raft P0000003132 T0000000000000003136 xmal_cache_esite site(0x7fed64002098) site_type(1) esite_guid(688974)
2022-04-09 07:42:51.553 [INFO] raft P0000003132 T0000000000000003136 xmal_ep2ap_conn_process success, inout_type(1) esite_guid(688974) asite(22557168238924) asite_type(0)
2022-04-09 07:42:51.554 [INFO] raft P0000003132 T0000000000000003136 xmal_ep2ap_conn_process success, inout_type(0) esite_guid(688974) asite(22557168238924) asite_type(0)
2022-04-09 07:42:51.555 [INFO] raft P0000003132 T0000000000000003147 raft[2] raft_process_request_vote from node[1]
2022-04-09 07:42:51.555 [INFO] raft P0000003132 T0000000000000003147 raft[2] becoming follower
2022-04-09 07:42:51.555 [INFO] raft P0000003132 T0000000000000003147 raft[2] node request vote: 1 replying: granted
2022-04-09 07:42:51.656 [ERROR] raft P0000003132 T0000000000000003147 raft[2] AppendEntries no log at prev_idx 20253
2022-04-09 07:42:51.755 [INFO] raft P0000003132 T0000000000000003147 raft[2] raft_process_snapshot from node[1]
2022-04-09 07:42:51.756 [INFO] raft P0000003132 T0000000000000003147 raft[2] becoming follower
2022-04-09 07:42:51.756 [INFO] raft P0000003132 T0000000000000003147 raft[2] node: 1 snapshot replying: 471
2022-04-09 07:42:54.659 [INFO] raft P0000003132 T0000000000000003147 Extend rflog from 10 to 20
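Here, after handling the vote request from node 1, node 2 becomes a FOLLOWER and tries to fill in its missing log entries. Since there had been no LEADER before, the index had not advanced, so this step completes immediately and the node settles as a FOLLOWER
Node 1 Raft log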
2022-04-09 07:42:51.573 [INFO] raft P0000005252 T0000000000000005259 raft[1] election starting: 2447 2548, term: 470, currentIdx: 20253
2022-04-09 07:42:51.573 [INFO] raft P0000005252 T0000000000000005259 raft[1] becoming candidate
2022-04-09 07:42:51.573 [INFO] raft P0000005252 T0000000000000005258 raft[1] sending requestVote to 3, currentTerm: 471, last_index: 20253, last_term: 305
2022-04-09 07:42:51.573 [INFO] raft P0000005252 T0000000000000005257 raft[1] sending requestVote to 2, currentTerm: 471, last_index: 20253, last_term: 305
2022-04-09 07:42:51.573 [ERROR] raft P0000005252 T0000000000000005258 Can't connect to DM server on '192.168.56.7' port(8341) errno(111)
2022-04-09 07:42:51.574 [INFO] raft P0000005252 T0000000000000005258 xlnk_ep2ap_conn_create fail,code(-650), site(0x7fe244002098) site_type(0) conn_type(1) address(192.168.56.7:8341) guid(688974) fail_lnk(nth:0, type:OUT)
2022-04-09 07:42:51.575 [INFO] raft P0000005252 T0000000000000005257 xlnk_ep2ap_conn_create success, site(22557168238924) site_type(0) conn_type(0) address(192.168.56.8:8340) guid(688974) n_lnk(1)
2022-04-09 07:42:51.575 [INFO] raft P0000005252 T0000000000000005257 xmal_cache_asite site(0x7fe240002098) site_type(0) address(192.168.56.8:8340) guid(688974)
2022-04-09 07:42:51.576 [INFO] raft P0000005252 T0000000000000005257 raft[1] node[2] responded to requestvote status: granted
2022-04-09 07:42:51.576 [INFO] raft P0000005252 T0000000000000005257 raft[1] becoming leader term: 471, bro_timeout: 99
2022-04-09 07:42:51.678 [ERROR] raft P0000005252 T0000000000000005258 Can't connect to DM server on '192.168.56.7' port(8341) errno(111)
2022-04-09 07:42:51.678 [INFO] raft P0000005252 T0000000000000005258 xlnk_ep2ap_conn_create fail,code(-650), site(0x7fe244002098) site_type(0) conn_type(1) address(192.168.56.7:8341) guid(688974) fail_lnk(nth:0, type:OUT)
2022-04-09 07:42:51.678 [WARNING] raft P0000005252 T0000000000000005257 raft[1] send AppendEntries(res_index: 20241) failed. Retry
2022-04-09 07:42:51.779 [ERROR] raft P0000005252 T0000000000000005258 Can't connect to DM server on '192.168.56.7' port(8341) errno(111)
2022-04-09 07:42:51.779 [INFO] raft P0000005252 T0000000000000005258 xlnk_ep2ap_conn_create fail,code(-650), site(0x7fe244002098) site_type(0) conn_type(1) address(192.168.56.7:8341) guid(688974) fail_lnk(nth:0, type:OUT)
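Until it could connect to node 2 or node 3, node 1 kept starting elections and requesting votes. Once it connected to node 2, it had enough votes to be elected LEADER, and it then sent log entries to the FOLLOWER to bring it up to date.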
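The vote arithmetic behind this can be sketched in a few lines of Python. The sketch assumes the textbook Raft rule that a candidate votes for itself and needs a strict majority of the configured cluster; the function names are hypothetical, and DM's internal rule is not visible in the logs.

```python
def votes_needed(cluster_size):
    """Smallest number of votes that is more than half of cluster_size."""
    return cluster_size // 2 + 1


def wins_election(granted_by_peers, cluster_size):
    """True if the candidate's own vote plus the peers' grants reach a majority."""
    return 1 + granted_by_peers >= votes_needed(cluster_size)


if __name__ == "__main__":
    # Node 1 alone: only its own vote out of 3 configured monitors.
    print(wins_election(0, 3))
    # Node 1 after node 2 replied "granted".
    print(wins_election(1, 3))
```

With three configured monitors the threshold is two votes, which is why node 1 stays a CANDIDATE on its own and becomes LEADER as soon as node 2 grants its vote, even though node 3 is still unreachable.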
Because node 1 has become the LEADER, it connects to DW to obtain the state of the database cluster nodes
[root@dmdw0 log]# netstat -an|grep 52142
tcp 0 0 192.168.56.24:55558 192.168.56.8:52142 ESTABLISHED
[root@dmdw0 log]# netstat -an|grep 52141
tcp 0 0 192.168.56.24:37838 192.168.56.7:52141 ESTABLISHED
Node 2, as a FOLLOWER, does not connect to DW
[root@dmdsc1 ~]# netstat -an|grep 52142
tcp6 0 0 :::52142 :::* LISTEN
tcp6 0 0 192.168.56.8:52142 192.168.56.24:55558 ESTABLISHED //watcher accepting the LEADER's connection
tcp6 0 0 10.30.5.18:52142 10.30.5.17:39666 ESTABLISHED //connection with the peer over MAL
[root@dmdsc1 ~]# netstat -an|grep 52141
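From this we can infer that only the LEADER node does the actual confirm monitor work, just as a single monitor did before Raft was introduced. The role of the non-LEADER nodes is examined next by looking at the traffic on the monitor ports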
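When reading the netstat output above, the direction of each connection matters: on the LEADER host the DW port appears as the remote port, while on the FOLLOWER host it appears as the local port of the watcher that happens to run on the same machine. A small Python sketch that keeps only outbound connections follows. It assumes the six-column TCP layout of netstat -an shown in this article, and the function name is hypothetical. Without the -p option netstat does not say which process owns a socket, so this is a filter, not proof of ownership.

```python
def outbound_connections(netstat_text, remote_port):
    """Return (local, remote) pairs for ESTABLISHED TCP sockets whose
    remote address ends with :remote_port."""
    result = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        if remote.rsplit(":", 1)[-1] == str(remote_port):
            result.append((local, remote))
    return result


# Lines copied from the LEADER host and the FOLLOWER host above.
LEADER_HOST = "tcp 0 0 192.168.56.24:55558 192.168.56.8:52142 ESTABLISHED"
FOLLOWER_HOST = (
    "tcp6 0 0 :::52142 :::* LISTEN\n"
    "tcp6 0 0 192.168.56.8:52142 192.168.56.24:55558 ESTABLISHED"
)

if __name__ == "__main__":
    print(outbound_connections(LEADER_HOST, 52142))
    print(outbound_connections(FOLLOWER_HOST, 52142))
```

On the sample lines only the LEADER host shows an outbound connection to port 52142; the FOLLOWER host's ESTABLISHED line is the watcher's accepting side.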
The LEADER connects to the destination port given for the FOLLOWER in the MONITOR configuration. So what is being transmitted?
[root@dmdsc1 log]# netstat -an|grep 8340
tcp6 0 0 :::8340 :::* LISTEN
tcp6 0 0 192.168.56.8:8340 192.168.56.24:37338 ESTABLISHED
tcp6 0 0 192.168.56.8:8340 192.168.56.24:37336 ESTABLISHED
The LEADER's own port still carries no traffic
[root@dmdw0 ~]# tcpdump -s0 -e -nn -vvv -i enp0s3 port 8339 -X -xx -w 1
tcpdump: listening on enp0s3, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
The FOLLOWER's port, on the other hand, does carry traffic
[root@dmdsc1 log]# tcpdump -s0 -e -nn -vvv -i enp0s3 port 8340 -X -xx -w 1
tcpdump: listening on enp0s3, link-type EN10MB (Ethernet), capture size 262144 bytes
128 packets captured
148 packets received by filter
0 packets dropped by kernel
Since the exact wire format is unknown, only a packet containing readable text is shown here
08:49:33.797459 08:00:27:77:7d:83 > 08:00:27:31:de:84, ethertype IPv4 (0x0800), length 2032: (tos 0x0, ttl 64, id 372, offset 0, flags [DF], proto TCP (6), length 2018)
192.168.56.24.37336 > 192.168.56.8.8340: Flags [P.], cksum 0xf945 (incorrect -> 0x674c), seq 418:2384, ack 1, win 58, options [nop,nop,TS val 30975863 ecr 35426473], length 1966
0x0000: 4500 07e2 0174 4000 4006 4031 c0a8 3818 E....t@.@.@1..8.
0x0010: c0a8 3808 91d8 2094 ce5f b1bd 0cc0 f514 ..8......_......
0x0020: 8018 003a f945 0000 0101 080a 01d8 a777 ...:.E.........w
0x0030: 021c 90a9 ae07 0000 6f00 0100 0000 0000 ........o.......
0x0040: cb00 0000 0000 8f82 1174 2eb2 0000 d701 .........t......
0x0050: 0000 0000 0000 0100 0000 006e 0000 0000 ...........n....
0x0060: 0000 d701 0000 0000 0000 0100 0000 006e ...............n
0x0070: 0000 0000 0000 016e 0000 0000 0000 d701 .......n........
0x0080: 0000 0000 0000 0000 0000 5407 0000 4752 ..........T...GR
0x0090: 5031 0000 0000 2d73 7d56 e27f 0000 0000 P1....-s}V......
0x00a0: 0000 0000 0000 0000 0000 3807 0000 fe2d ..........8....-
0x00b0: 00d3 ea06 0047 5250 3100 0000 0000 0000 .....GRP1.......
0x00c0: 0000 0000 0079 3dd4 0132 0700 0000 0000 .....y=..2......
0x00d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00e0: 0000 0000 0000 0000 0000 0a00 4752 5031 ............GRP1
0x00f0: 5f52 545f 3031 3230 3232 2d30 342d 3039 _RT_012022-04-09
0x0100: 2030 383a 3439 3a33 3320 0000 0000 0100 .08:49:33.......
0x0110: 0000 0000 0000 0000 0000 0000 6400 0000 ............d...
0x0120: 0100 0102 0a00 3c00 0000 0a00 0100 1800 ......<.........
0x0130: 2f6f 7074 2f72 745f 3031 2f44 414d 454e /opt/rt_01/DAMEN
0x0140: 472f 646d 2e69 6e69 0000 1c00 2f6f 7074 G/dm.ini..../opt
0x0150: 2f64 7363 2f64 6d64 626d 732f 6269 6e2f /dsc/dmdbms/bin/
0x0160: 646d 7365 7276 6572 0000 0000 0000 0000 dmserver........
0x0170: 0100 0000 0000 0000 0000 0100 0000 32c4 ..............2.
0x0180: 5062 0000 0000 0000 0000 0300 0000 ffff Pb..............
0x0190: ffff 7d00 5072 696d 6172 7920 696e 7374 ..}.Primary.inst
0x01a0: 616e 6365 2847 5250 315f 5254 5f30 3129 ance(GRP1_RT_01)
0x01b0: 2061 7263 6820 7374 6174 7573 2074 6f20 .arch.status.to.
0x01c0: 696e 7374 616e 6365 2847 5250 315f 5254 instance(GRP1_RT
0x01d0: 5f30 3229 2069 7320 5641 4c49 442c 2072 _02).is.VALID,.r
0x01e0: 6563 6f76 6572 7920 6f66 2069 6e73 7461 ecovery.of.insta
0x01f0: 6e63 6528 4752 5031 5f52 545f 3032 2920 nce(GRP1_RT_02).
0x0200: 6973 206e 6f74 206e 6563 6573 7361 7279 is.not.necessary
0x0210: 2101 000a 0047 5250 315f 5254 5f30 3101 !....GRP1_RT_01.
0x0220: 0400 8d7d 0000 0000 002c 2300 0000 0000 ...}.....,#.....
0x0230: 00a4 cb00 0000 0000 002d 2300 0000 0000 .........-#.....
0x0240: 00a5 cb00 0000 0000 0000 0000 00ff ffff ................
0x0250: ffff ffff ff0c 0031 3932 2e31 3638 2e35 .......192.168.5
0x0260: 362e 3702 0001 0000 0147 5250 315f 5254 6.7......GRP1_RT
0x0270: 5f30 3200 0000 0000 0047 5250 315f 5254 _02......GRP1_RT
0x0280: 5f30 3200 0000 0000 0001 0000 0000 0000 _02.............
0x0290: 0000 0000 0000 0000 4500 7365 6e64 2061 ........E.send.a
0x02a0: 7263 6820 746f 2073 6974 6528 4752 5031 rch.to.site(GRP1
0x02b0: 5f52 545f 3032 2920 7375 6363 6573 732c _RT_02).success,
0x02c0: 2062 6567 696e 206c 736e 3a35 3231 3333 .begin.lsn:52133
0x02d0: 2c20 656e 6420 6c73 6e3a 3532 3133 3340 ,.end.lsn:52133@
0x02e0: 0000 009f 0600 0000 0000 007c 3e10 0000 ...........|>...
0x02f0: 0000 00f9 0b00 0000 0000 0064 611c 0000 ...........da...
0x0300: 0000 00e6 1600 0000 0000 0007 d450 6200 .............Pb.
0x0310: 0000 004e 0500 0004 8802 00a5 cb00 0000 ...N............
0x0320: 0000 0000 8100 0000 0000 0040 0000 0000 ...........@....
0x0330: 0000 007d 2701 0000 0000 0004 0200 0001 ...}'...........
0x0340: 0000 0095 0200 0000 0000 00a5 cb00 0000 ................
0x0350: 0000 00a5 cb00 0000 0000 001c d850 6200 .............Pb.
0x0360: 0000 001c d850 6200 0000 0000 0040 002a .....Pb......@.*
0x0370: b795 3dfd 6851 00bb 62c6 2b92 7b4e ec20 ..=.hQ..b.+.{N..
0x0380: dc67 094d 0469 fdb7 02e6 896c 23b5 c146 .g.M.i.....l#..F
0x0390: 45ab 35fc 16e9 8926 f4ac c137 6323 63bb E.5....&...7c#c.
0x03a0: ccc9 3055 9818 9460 7027 a8b8 4f49 0000 ..0U...`p'..OI..
0x03b0: 0001 0100 0001 0001 08ef 9626 fdf8 403f ...........&..@?
0x03c0: 0100 fdf8 403f 0000 2c23 0000 0000 0000 ....@?..,#......
0x03d0: a4cb 0000 0000 0000 0000 4a03 0000 0000 ..........J.....
0x03e0: 0009 000c 0047 5250 315f 5254 5f30 315f .....GRP1_RT_01_
0x03f0: 3101 0000 0000 0000 00e6 0704 0600 0000 1...............
0x0400: 0000 00e8 0300 4752 5031 5f52 545f 3031 ......GRP1_RT_01
0x0410: 0000 0000 0000 4752 5031 5f52 545f 3031 ......GRP1_RT_01
0x0420: 0000 0000 0000 fdf8 403f fdf8 403f 0100 ........@?..@?..
0x0430: ab0e 0000 0000 0000 9d53 0000 0000 0000 .........S......
0x0440: 0c00 4752 5031 5f52 545f 3031 5f32 0200 ..GRP1_RT_01_2..
0x0450: 0000 0000 0000 e607 0406 0932 2200 0000 ...........2"...
0x0460: e803 0047 5250 315f 5254 5f30 3100 0000 ...GRP1_RT_01...
0x0470: 0000 0047 5250 315f 5254 5f30 3100 0000 ...GRP1_RT_01...
0x0480: 0000 00fd f840 3ffd f840 3f01 0098 1100 .....@?..@?.....
0x0490: 0000 0000 00b7 8e00 0000 0000 000c 0047 ...............G
0x04a0: 5250 315f 5254 5f30 315f 3303 0000 0000 RP1_RT_01_3.....
0x04b0: 0000 00e6 0704 0611 0315 0000 00e8 0301 ................
0x04c0: 4752 5031 5f52 545f 3031 0000 0000 0000 GRP1_RT_01......
0x04d0: 4752 5031 5f52 545f 3031 0000 0000 0000 GRP1_RT_01......
0x04e0: fdf8 403f fdf8 403f 0100 c311 0000 0000 ..@?..@?........
0x04f0: 0000 af94 0000 0000 0000 0c00 4752 5031 ............GRP1
0x0500: 5f52 545f 3032 5f34 0400 0000 0000 0000 _RT_02_4........
0x0510: e607 0406 1108 3200 0000 e803 0147 5250 ......2......GRP
0x0520: 315f 5254 5f30 3100 0000 0000 0047 5250 1_RT_01......GRP
0x0530: 315f 5254 5f30 3200 0000 0000 00fd f840 1_RT_02........@
0x0540: 3fb4 9d53 7d01 0033 1200 0000 0000 0005 ?..S}..3........
0x0550: 9a00 0000 0000 000c 0047 5250 315f 5254 .........GRP1_RT
0x0560: 5f30 315f 3505 0000 0000 0000 00e6 0704 _01_5...........
0x0570: 0611 0b1b 0000 00e8 0301 4752 5031 5f52 ..........GRP1_R
0x0580: 545f 3032 0000 0000 0000 4752 5031 5f52 T_02......GRP1_R
0x0590: 545f 3031 0000 0000 0000 b49d 537d fdf8 T_01........S}..
0x05a0: 403f 0100 6412 0000 0000 0000 839f 0000 @?..d...........
0x05b0: 0000 0000 0c00 4752 5031 5f52 545f 3031 ......GRP1_RT_01
0x05c0: 5f36 0600 0000 0000 0000 e607 0406 1120 _6..............
0x05d0: 0d00 0000 e803 0147 5250 315f 5254 5f30 .......GRP1_RT_0
0x05e0: 3100 0000 0000 0047 5250 315f 5254 5f30 1......GRP1_RT_0
0x05f0: 3100 0000 0000 00fd f840 3ffd f840 3f01 1........@?..@?.
0x0600: 0002 1400 0000 0000 004e a700 0000 0000 .........N......
0x0610: 000c 0047 5250 315f 5254 5f30 325f 3707 ...GRP1_RT_02_7.
0x0620: 0000 0000 0000 00e6 0704 0615 3310 0000 ............3...
0x0630: 00e8 0301 4752 5031 5f52 545f 3031 0000 ....GRP1_RT_01..
0x0640: 0000 0000 4752 5031 5f52 545f 3032 0000 ....GRP1_RT_02..
0x0650: 0000 0000 fdf8 403f b49d 537d 0100 d516 ......@?..S}....
0x0660: 0000 0000 0000 03af 0000 0000 0000 0c00 ................
0x0670: 4752 5031 5f52 545f 3031 5f38 0800 0000 GRP1_RT_01_8....
0x0680: 0000 0000 e607 0407 081a 2100 0000 e803 ..........!.....
0x0690: 0147 5250 315f 5254 5f30 3200 0000 0000 .GRP1_RT_02.....
0x06a0: 0047 5250 315f 5254 5f30 3100 0000 0000 .GRP1_RT_01.....
0x06b0: 00b4 9d53 7dfd f840 3f01 0060 1a00 0000 ...S}..@?..`....
0x06c0: 0000 00b5 b800 0000 0000 000c 0047 5250 .............GRP
0x06d0: 315f 5254 5f30 315f 3909 0000 0000 0000 1_RT_01_9.......
0x06e0: 00e6 0704 0907 181d 0000 00e8 0301 4752 ..............GR
0x06f0: 5031 5f52 545f 3031 0000 0000 0000 4752 P1_RT_01......GR
0x0700: 5031 5f52 545f 3031 0000 0000 0000 fdf8 P1_RT_01........
0x0710: 403f fdf8 403f 0100 961c 0000 0000 0000 @?..@?..........
0x0720: c9bf 0000 0000 0000 0000 0200 0101 7700 ..............w.
0x0730: 0100 012e b200 0000 0000 0032 3032 322d ...........2022-
0x0740: 3034 2d30 3920 3037 3a34 323a 3531 2000 04-09.07:42:51..
0x0750: 0000 003a 3a66 6666 663a 3139 322e 3136 ...::ffff:192.16
0x0760: 382e 3536 2e32 3400 0000 0000 0000 0000 8.56.24.........
0x0770: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0780: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0790: 0000 0012 0044 4d4d 4f4e 4954 4f52 5b34 .....DMMONITOR[4
0x07a0: 2e30 5d20 5638 0a00 0000 002f 0049 6e73 .0].V8...../.Ins
0x07b0: 7461 6e63 6528 4752 5031 5f52 545f 3031 tance(GRP1_RT_01
0x07c0: 2920 6973 2061 6c72 6561 6479 2069 6e20 ).is.already.in.
0x07d0: 4f70 656e 2073 7461 7475 7321 0000 0000 Open.status!....
0x07e0: 0000
Broadly, the LEADER is sending the FOLLOWER the latest state of the database cluster nodes that it obtained from DW
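Since the protocol format is unknown, the readable parts were picked out by eye. The same can be done mechanically, in the spirit of the Unix strings tool. A minimal Python sketch, with a hypothetical function name and an arbitrary minimum run length of 4:

```python
def printable_runs(payload, min_len=4):
    """Return the runs of printable ASCII bytes in payload that are at
    least min_len bytes long."""
    runs = []
    current = bytearray()
    for byte in payload:
        if 0x20 <= byte <= 0x7E:
            current.append(byte)
            continue
        if len(current) >= min_len:
            runs.append(current.decode("ascii"))
        current = bytearray()
    if len(current) >= min_len:
        runs.append(current.decode("ascii"))
    return runs


# The 32 bytes at offsets 0x0790-0x07af of the packet shown above.
SAMPLE = bytes.fromhex(
    "0000 0012 0044 4d4d 4f4e 4954 4f52 5b34"
    "2e30 5d20 5638 0a00 0000 002f 0049 6e73"
)

if __name__ == "__main__":
    print(printable_runs(SAMPLE))
```

For these 32 bytes the only run of four or more printable characters is the monitor banner DMMONITOR[4.0] V8; shorter fragments such as the leading bytes of the next message are dropped by the length filter.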
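A Digression
Right after the standard packet header, the start of the payload in this readable packet contains some interesting information, shown below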
5407 0000 4752 ..........T...GR
0x0090: 5031 0000 0000 2d73 7d56 e27f 0000 0000 P1....-s}V......
0x00a0: 0000 0000 0000 0000 0000 3807 0000 fe2d ..........8....-
0x00b0: 00d3 ea06 0047 5250 3100 0000 0000 0000 .....GRP1.......
0x00c0: 0000 0000 0079 3dd4 0132 0700 0000 0000 .....y=..2......
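From this excerpt we can see that the MAL protocol payload begins with the following information:
47525031 is the GROUP SECTION NAME in MAL: GRP1
d3ea06 is the MON_INST_OGUID in MAL: 453331 (little endian)
So the MAL protocol identifies a node by the GROUP NAME together with the INST_OGUID, both carried at the head of the protocol data.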
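The two values can be verified with a few lines of Python. The field widths used here (a NUL-padded name and a 4-byte little-endian integer) are assumptions read off the dump, not a documented MAL layout, and the function names are hypothetical.

```python
def decode_group_name(raw):
    """Decode a NUL-padded ASCII name field."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def decode_oguid(raw):
    """Decode an unsigned little-endian integer field."""
    return int.from_bytes(raw, byteorder="little")


if __name__ == "__main__":
    # Bytes copied from the dump: "4752 5031 0000 0000" and "d3 ea06 00".
    print(decode_group_name(bytes.fromhex("4752 5031 0000 0000")))  # GRP1
    print(decode_oguid(bytes.fromhex("d3ea 0600")))                 # 453331
```

Read least significant byte first, d3 ea 06 is 0x06ead3, which is 453331 in decimal and equals the MON_INST_OGUID in the configuration.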
Three Monitors
Start monitor 3 and observe
show state
2022-04-09 08:39:28
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2559 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON3 Active 3 FOLLOWER 192.168.56.7 8341
MON1 Active 1 LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
#--------------------------------------------------------------------------------#
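Interestingly, once monitor 3 is started, monitor 2 is shown as NOT LEADER, while monitor 3 has become a FOLLOWER
Node 2's Raft log contains no new entries at all, yet the node is shown as NOT LEADER. How this role is defined and decided still needs further study and is not clear at the moment, but it does not affect the description of the Raft election flow
Node 3 Raft log
According to this node's own Raft log, it is indeed a follower as well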
2022-04-09 08:33:50.891 [INFO] raft P0000008170 T0000000000000008170 ECS EP XSITE POOL : guid [940642] HB interval [60s]
2022-04-09 08:33:50.893 [INFO] raft P0000008170 T0000000000000008170 ECS AP XSITE POOL : guid [456120] HB interval [60s]
2022-04-09 08:33:50.974 [INFO] raft P0000008170 T0000000000000008174 xmal_cache_esite site(0x7fe994002098) site_type(1) esite_guid(688974)
2022-04-09 08:33:50.974 [INFO] raft P0000008170 T0000000000000008174 xmal_ep2ap_conn_process success, inout_type(1) esite_guid(688974) asite(22557168268799) asite_type(0)
2022-04-09 08:33:50.975 [INFO] raft P0000008170 T0000000000000008174 xmal_ep2ap_conn_process success, inout_type(0) esite_guid(688974) asite(22557168268799) asite_type(0)
2022-04-09 08:33:50.976 [INFO] raft P0000008170 T0000000000000008186 raft[3] becoming follower
2022-04-09 08:33:50.976 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.073 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.172 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.272 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.371 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.471 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.570 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.669 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.768 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.867 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:51.967 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.067 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.166 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.266 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.365 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.465 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.566 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.666 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 26290
2022-04-09 08:33:52.765 [INFO] raft P0000008170 T0000000000000008186 raft[3] raft_process_snapshot from node[1]
2022-04-09 08:33:52.765 [INFO] raft P0000008170 T0000000000000008186 raft[3] becoming follower
2022-04-09 08:33:52.766 [INFO] raft P0000008170 T0000000000000008186 raft[3] node: 1 snapshot replying: 471
2022-04-09 08:33:57.667 [INFO] raft P0000008170 T0000000000000008186 Extend rflog from 10 to 20
2022-04-09 09:33:24.967 [INFO] raft P0000008170 T0000000000000008177 raft[3] election starting: 2559 954977, term: 471, currentIdx: 31490
2022-04-09 09:33:24.967 [INFO] raft P0000008170 T0000000000000008177 raft[3] becoming candidate
2022-04-09 09:33:24.967 [INFO] raft P0000008170 T0000000000000008175 raft[3] sending requestVote to 1, currentTerm: 472, last_index: 31490, last_term: 471
2022-04-09 09:33:24.968 [INFO] raft P0000008170 T0000000000000008176 raft[3] sending requestVote to 2, currentTerm: 472, last_index: 31490, last_term: 471
2022-04-09 09:33:24.990 [INFO] raft P0000008170 T0000000000000008175 xlnk_ep2ap_conn_create success, site(35089882808321) site_type(0) conn_type(0) address(192.168.56.24:8339) guid(940642) n_lnk(1)
2022-04-09 09:33:24.990 [INFO] raft P0000008170 T0000000000000008175 xmal_cache_asite site(0x7fe99a7057e8) site_type(0) address(192.168.56.24:8339) guid(940642)
2022-04-09 09:33:24.992 [INFO] raft P0000008170 T0000000000000008175 raft[3] node[1] responded to requestvote status: not granted
2022-04-09 09:33:24.992 [INFO] raft P0000008170 T0000000000000008176 xlnk_ep2ap_conn_create success, site(35089882808322) site_type(0) conn_type(0) address(192.168.56.8:8340) guid(940642) n_lnk(1)
2022-04-09 09:33:24.992 [INFO] raft P0000008170 T0000000000000008176 xmal_cache_asite site(0x7fe99a6047e8) site_type(0) address(192.168.56.8:8340) guid(940642)
2022-04-09 09:33:25.000 [INFO] raft P0000008170 T0000000000000008176 raft[3] node[2] responded to requestvote status: not granted
2022-04-09 09:33:25.012 [INFO] raft P0000008170 T0000000000000008186 raft[3] becoming follower
2022-04-09 09:33:25.012 [ERROR] raft P0000008170 T0000000000000008186 raft[3] AppendEntries no log at prev_idx 31491
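The flow looks almost the same as on node 2. The only difference is that all 3 nodes can now reach each other, so node 3 receives vote replies from both node 1 and node 2. Node 1 returns not granted, node 2 returns not granted, and node 3 ends up as a FOLLOWER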
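The logs do not say why both replies were "not granted". For reference, the textbook Raft rule for answering a vote request is sketched below in Python. This is the rule from the Raft paper, not DM's confirmed implementation, and the voter-side values in the example are hypothetical because the logs only show the candidate's side.

```python
def grant_vote(voter_term, voter_voted_for, voter_last_term, voter_last_index,
               cand_id, cand_term, cand_last_term, cand_last_index):
    """Textbook Raft RequestVote decision (an illustration, not DM's code)."""
    # A candidate with a stale term is always rejected.
    if cand_term < voter_term:
        return False
    # Within the same term a node votes for at most one candidate.
    if cand_term == voter_term and voter_voted_for not in (None, cand_id):
        return False
    # The candidate's log must be at least as up to date as the voter's:
    # compare the last term first, then the last index.
    return (cand_last_term, cand_last_index) >= (voter_last_term, voter_last_index)


if __name__ == "__main__":
    # Candidate values from node 3's log: term 472, last_term 471, last_index 31490.
    # Hypothetical voter whose log is one entry ahead (index 31491).
    print(grant_vote(471, 1, 471, 31491, 3, 472, 471, 31490))
    # Hypothetical voter with an empty log, like node 2 when it first joined.
    print(grant_vote(0, None, 0, 0, 1, 471, 305, 20253))
```

Under this rule, one plausible reading is that node 3's log (last_index 31490) was behind the LEADER's, since the AppendEntries that follows refers to prev_idx 31491. Whether DM applies exactly this check, or rejects the request for another reason such as still hearing from a live LEADER, cannot be determined from these logs.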
Node 3 port state
[root@dmdsc0 log]# netstat -an|grep 52142
tcp 0 0 10.30.5.17:39666 10.30.5.18:52142 ESTABLISHED
[root@dmdsc0 log]# netstat -an|grep 52141
tcp6 0 0 :::52141 :::* LISTEN
tcp6 0 0 192.168.56.7:52141 192.168.56.24:37838 ESTABLISHED
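This shows once again that a non-LEADER node does not connect to DW
Likewise, capture packets on the port configured for node 3 in the MONITOR configuration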
[root@dmdsc0 log]# tcpdump -s0 -e -nn -vvv -i enp0s3 port 8341 -X -xx -w 1
tcpdump: listening on enp0s3, link-type EN10MB (Ethernet), capture size 262144 bytes
176 packets captured
204 packets received by filter
0 packets dropped by kernel
08:46:00.959030 08:00:27:77:7d:83 > 08:00:27:0d:d8:6b, ethertype IPv4 (0x0800), length 2032: (tos 0x0, ttl 64, id 42008, offset 0, flags [DF], proto TCP (6), length 2018)
192.168.56.24.58430 > 192.168.56.7.8341: Flags [P.], cksum 0xf944 (incorrect -> 0x9a4f), seq 9524:11490, ack 1, win 58, options [nop,nop,TS val 30763267 ecr 35247702], length 1966
0x0000: 4500 07e2 a418 4000 4006 9d8d c0a8 3818 E.....@.@.....8.
0x0010: c0a8 3807 e43e 2095 81dc 3260 2e5b ea6f ..8..>....2`.[.o
0x0020: 8018 003a f944 0000 0101 080a 01d5 6903 ...:.D........i.
0x0030: 0219 d656 ae07 0000 6f00 0100 0000 0000 ...V....o.......
0x0040: b200 0000 0000 8a76 7781 2eb2 0000 d701 .......vw.......
0x0050: 0000 0000 0000 0100 0000 5a6c 0000 0000 ..........Zl....
0x0060: 0000 d701 0000 0000 0000 0100 0000 5a6c ..............Zl
0x0070: 0000 0000 0000 5b6c 0000 0000 0000 d701 ......[l........
0x0080: 0000 0000 0000 0000 0000 5407 0000 4752 ..........T...GR
0x0090: 5031 0000 0000 2d73 7d56 e27f 0000 0000 P1....-s}V......
0x00a0: 0000 0000 0000 0000 0000 3807 0000 902d ..........8....-
0x00b0: 00d3 ea06 0047 5250 3100 0000 0000 0000 .....GRP1.......
0x00c0: 0000 0000 0000 3bd4 0132 0700 0000 0000 ......;..2......
0x00d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00e0: 0000 0000 0000 0000 0000 0a00 4752 5031 ............GRP1
0x00f0: 5f52 545f 3031 3230 3232 2d30 342d 3039 _RT_012022-04-09
0x0100: 2030 383a 3436 3a30 3020 0000 0000 0100 .08:46:00.......
0x0110: 0000 0000 0000 0000 0000 0000 6400 0000 ............d...
0x0120: 0100 0102 0a00 3c00 0000 0a00 0100 1800 ......<.........
0x0130: 2f6f 7074 2f72 745f 3031 2f44 414d 454e /opt/rt_01/DAMEN
0x0140: 472f 646d 2e69 6e69 0000 1c00 2f6f 7074 G/dm.ini..../opt
0x0150: 2f64 7363 2f64 6d64 626d 732f 6269 6e2f /dsc/dmdbms/bin/
0x0160: 646d 7365 7276 6572 0000 0000 0000 0000 dmserver........
0x0170: 0100 0000 0000 0000 0000 0100 0000 32c4 ..............2.
0x0180: 5062 0000 0000 0000 0000 0300 0000 ffff Pb..............
0x0190: ffff 7d00 5072 696d 6172 7920 696e 7374 ..}.Primary.inst
0x01a0: 616e 6365 2847 5250 315f 5254 5f30 3129 ance(GRP1_RT_01)
0x01b0: 2061 7263 6820 7374 6174 7573 2074 6f20 .arch.status.to.
0x01c0: 696e 7374 616e 6365 2847 5250 315f 5254 instance(GRP1_RT
0x01d0: 5f30 3229 2069 7320 5641 4c49 442c 2072 _02).is.VALID,.r
0x01e0: 6563 6f76 6572 7920 6f66 2069 6e73 7461 ecovery.of.insta
0x01f0: 6e63 6528 4752 5031 5f52 545f 3032 2920 nce(GRP1_RT_02).
0x0200: 6973 206e 6f74 206e 6563 6573 7361 7279 is.not.necessary
0x0210: 2101 000a 0047 5250 315f 5254 5f30 3101 !....GRP1_RT_01.
0x0220: 0400 8d7d 0000 0000 00e6 2200 0000 0000 ...}......".....
0x0230: 005e cb00 0000 0000 00e6 2200 0000 0000 .^........".....
0x0240: 005f cb00 0000 0000 0000 0000 00ff ffff ._..............
0x0250: ffff ffff ff0c 0031 3932 2e31 3638 2e35 .......192.168.5
0x0260: 362e 3702 0001 0000 0147 5250 315f 5254 6.7......GRP1_RT
0x0270: 5f30 3200 0000 0000 0047 5250 315f 5254 _02......GRP1_RT
0x0280: 5f30 3200 0000 0000 0001 0000 0000 0000 _02.............
0x0290: 0000 0000 0000 0000 4500 7365 6e64 2061 ........E.send.a
0x02a0: 7263 6820 746f 2073 6974 6528 4752 5031 rch.to.site(GRP1
0x02b0: 5f52 545f 3032 2920 7375 6363 6573 732c _RT_02).success,
0x02c0: 2062 6567 696e 206c 736e 3a35 3230 3632 .begin.lsn:52062
0x02d0: 2c20 656e 6420 6c73 6e3a 3532 3036 3240 ,.end.lsn:52062@
0x02e0: 0000 0058 0600 0000 0000 0060 af0f 0000 ...X.......`....
0x02f0: 0000 00b2 0b00 0000 0000 0092 281b 0000 ............(...
0x0300: 0000 00e6 1600 0000 0000 0007 d450 6200 .............Pb.
0x0310: 0000 004e 0500 0004 8802 005e cb00 0000 ...N.......^....
0x0320: 0000 0000 8100 0000 0000 0040 0000 0000 ...........@....
0x0330: 0000 002f c300 0000 0000 0004 0200 0001 .../............
0x0340: 0000 00ce 0300 0000 0000 005e cb00 0000 ...........^....
0x0350: 0000 005e cb00 0000 0000 0045 d750 6200 ...^.......E.Pb.
0x0360: 0000 0045 d750 6200 0000 0000 0040 002a ...E.Pb......@.*
0x0370: b795 3dfd 6851 00bb 62c6 2b92 7b4e ec20 ..=.hQ..b.+.{N..
0x0380: dc67 094d 0469 fdb7 02e6 896c 23b5 c146 .g.M.i.....l#..F
0x0390: 45ab 35fc 16e9 8926 f4ac c137 6323 63bb E.5....&...7c#c.
0x03a0: ccc9 3055 9818 9460 7027 a8b8 4f49 0000 ..0U...`p'..OI..
0x03b0: 0001 0100 0001 0001 08ef 9626 fdf8 403f ...........&..@?
0x03c0: 0100 fdf8 403f 0000 e622 0000 0000 0000 ....@?..."......
0x03d0: 5ecb 0000 0000 0000 0000 4a03 0000 0000 ^.........J.....
0x03e0: 0009 000c 0047 5250 315f 5254 5f30 315f .....GRP1_RT_01_
0x03f0: 3101 0000 0000 0000 00e6 0704 0600 0000 1...............
0x0400: 0000 00e8 0300 4752 5031 5f52 545f 3031 ......GRP1_RT_01
0x0410: 0000 0000 0000 4752 5031 5f52 545f 3031 ......GRP1_RT_01
0x0420: 0000 0000 0000 fdf8 403f fdf8 403f 0100 ........@?..@?..
0x0430: ab0e 0000 0000 0000 9d53 0000 0000 0000 .........S......
0x0440: 0c00 4752 5031 5f52 545f 3031 5f32 0200 ..GRP1_RT_01_2..
0x0450: 0000 0000 0000 e607 0406 0932 2200 0000 ...........2"...
0x0460: e803 0047 5250 315f 5254 5f30 3100 0000 ...GRP1_RT_01...
0x0470: 0000 0047 5250 315f 5254 5f30 3100 0000 ...GRP1_RT_01...
0x0480: 0000 00fd f840 3ffd f840 3f01 0098 1100 .....@?..@?.....
0x0490: 0000 0000 00b7 8e00 0000 0000 000c 0047 ...............G
0x04a0: 5250 315f 5254 5f30 315f 3303 0000 0000 RP1_RT_01_3.....
0x04b0: 0000 00e6 0704 0611 0315 0000 00e8 0301 ................
0x04c0: 4752 5031 5f52 545f 3031 0000 0000 0000 GRP1_RT_01......
0x04d0: 4752 5031 5f52 545f 3031 0000 0000 0000 GRP1_RT_01......
0x04e0: fdf8 403f fdf8 403f 0100 c311 0000 0000 ..@?..@?........
0x04f0: 0000 af94 0000 0000 0000 0c00 4752 5031 ............GRP1
0x0500: 5f52 545f 3032 5f34 0400 0000 0000 0000 _RT_02_4........
0x0510: e607 0406 1108 3200 0000 e803 0147 5250 ......2......GRP
0x0520: 315f 5254 5f30 3100 0000 0000 0047 5250 1_RT_01......GRP
0x0530: 315f 5254 5f30 3200 0000 0000 00fd f840 1_RT_02........@
0x0540: 3fb4 9d53 7d01 0033 1200 0000 0000 0005 ?..S}..3........
0x0550: 9a00 0000 0000 000c 0047 5250 315f 5254 .........GRP1_RT
0x0560: 5f30 315f 3505 0000 0000 0000 00e6 0704 _01_5...........
0x0570: 0611 0b1b 0000 00e8 0301 4752 5031 5f52 ..........GRP1_R
0x0580: 545f 3032 0000 0000 0000 4752 5031 5f52 T_02......GRP1_R
0x0590: 545f 3031 0000 0000 0000 b49d 537d fdf8 T_01........S}..
0x05a0: 403f 0100 6412 0000 0000 0000 839f 0000 @?..d...........
0x05b0: 0000 0000 0c00 4752 5031 5f52 545f 3031 ......GRP1_RT_01
0x05c0: 5f36 0600 0000 0000 0000 e607 0406 1120 _6..............
0x05d0: 0d00 0000 e803 0147 5250 315f 5254 5f30 .......GRP1_RT_0
0x05e0: 3100 0000 0000 0047 5250 315f 5254 5f30 1......GRP1_RT_0
0x05f0: 3100 0000 0000 00fd f840 3ffd f840 3f01 1........@?..@?.
0x0600: 0002 1400 0000 0000 004e a700 0000 0000 .........N......
0x0610: 000c 0047 5250 315f 5254 5f30 325f 3707 ...GRP1_RT_02_7.
0x0620: 0000 0000 0000 00e6 0704 0615 3310 0000 ............3...
0x0630: 00e8 0301 4752 5031 5f52 545f 3031 0000 ....GRP1_RT_01..
0x0640: 0000 0000 4752 5031 5f52 545f 3032 0000 ....GRP1_RT_02..
0x0650: 0000 0000 fdf8 403f b49d 537d 0100 d516 ......@?..S}....
0x0660: 0000 0000 0000 03af 0000 0000 0000 0c00 ................
0x0670: 4752 5031 5f52 545f 3031 5f38 0800 0000 GRP1_RT_01_8....
0x0680: 0000 0000 e607 0407 081a 2100 0000 e803 ..........!.....
0x0690: 0147 5250 315f 5254 5f30 3200 0000 0000 .GRP1_RT_02.....
0x06a0: 0047 5250 315f 5254 5f30 3100 0000 0000 .GRP1_RT_01.....
0x06b0: 00b4 9d53 7dfd f840 3f01 0060 1a00 0000 ...S}..@?..`....
0x06c0: 0000 00b5 b800 0000 0000 000c 0047 5250 .............GRP
0x06d0: 315f 5254 5f30 315f 3909 0000 0000 0000 1_RT_01_9.......
0x06e0: 00e6 0704 0907 181d 0000 00e8 0301 4752 ..............GR
0x06f0: 5031 5f52 545f 3031 0000 0000 0000 4752 P1_RT_01......GR
0x0700: 5031 5f52 545f 3031 0000 0000 0000 fdf8 P1_RT_01........
0x0710: 403f fdf8 403f 0100 961c 0000 0000 0000 @?..@?..........
0x0720: c9bf 0000 0000 0000 0000 0200 0101 7700 ..............w.
0x0730: 0100 012e b200 0000 0000 0032 3032 322d ...........2022-
0x0740: 3034 2d30 3920 3037 3a34 323a 3531 2000 04-09.07:42:51..
0x0750: 0000 003a 3a66 6666 663a 3139 322e 3136 ...::ffff:192.16
0x0760: 382e 3536 2e32 3400 0000 0000 0000 0000 8.56.24.........
0x0770: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0780: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0790: 0000 0012 0044 4d4d 4f4e 4954 4f52 5b34 .....DMMONITOR[4
0x07a0: 2e30 5d20 5638 0a00 0000 002f 0049 6e73 .0].V8...../.Ins
0x07b0: 7461 6e63 6528 4752 5031 5f52 545f 3031 tance(GRP1_RT_01
0x07c0: 2920 6973 2061 6c72 6561 6479 2069 6e20 ).is.already.in.
0x07d0: 4f70 656e 2073 7461 7475 7321 0000 0000 Open.status!....
0x07e0: 0000 ..
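This node likewise receives the state information sent by the LEADER. It is the same as before, so it is not repeated here.
Overall state
At this point, each of the three nodes shows different state information.
Node 1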
show state
2022-04-10 06:33:16
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2623 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON1 Active 1 LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
MON3 Active 3 NOT LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
Node 2
show state
2022-04-10 06:33:27
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2419 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON2 Active 2 FOLLOWER 192.168.56.8 8340
MON1 Active 1 LEADER 192.168.56.24 8339
MON3 Active 3 NOT LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
Node 3
show state
2022-04-10 06:33:32
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 1674 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON3 Active 3 FOLLOWER 192.168.56.7 8341
MON1 Active 1 LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
#--------------------------------------------------------------------------------#
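At this point the cluster does not show the two FOLLOWERs one would expect from conventional raft; a NOT LEADER role is always present instead. The three nodes agree on which node is the LEADER, but they disagree on which nodes are FOLLOWER and which are NOT LEADER: apart from the LEADER, every FOLLOWER has voted for itself. Packet captures on the MONITOR ports show that both FOLLOWER and NOT LEADER nodes actually receive messages from the LEADER, so in terms of standard raft roles both appear to be treated as FOLLOWER.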
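The per-node differences described above can be made explicit by parsing the three `show state` outputs and grouping the reported roles by viewer. The following is a minimal sketch: the helper names `parse_show_state`, `collect_views`, and `disagreements` are hypothetical and not part of any DM8 tool, and the row layout is assumed to match the outputs shown in this article.

```python
import re

# One data row of `show state`: name, state, id, role, ip, port.
# "NOT LEADER" must be tried before "LEADER" in the alternation.
ROW = re.compile(
    r"(MON\d+)\s+(\S+)\s+(\d+)\s+(NOT LEADER|LEADER|FOLLOWER)\s+(\S+)\s+(\d+)"
)

def parse_show_state(text):
    """Return the data rows of one `show state` output, in output order."""
    rows = []
    for line in text.splitlines():
        m = ROW.fullmatch(line.strip())
        if m:
            name, state, mon_id, role, ip, port = m.groups()
            rows.append({"name": name, "state": state, "id": int(mon_id),
                         "role": role, "ip": ip, "port": int(port)})
    return rows

def collect_views(outputs):
    """Map each viewer (first row = self info) to the roles it reports."""
    views = {}
    for text in outputs:
        rows = parse_show_state(text)
        views[rows[0]["name"]] = {r["name"]: r["role"] for r in rows}
    return views

def disagreements(views):
    """Return monitors whose reported role differs between viewers."""
    result = {}
    for name in sorted({n for v in views.values() for n in v}):
        seen = {viewer: v[name] for viewer, v in views.items() if name in v}
        if len(set(seen.values())) > 1:
            result[name] = seen
    return result

# Data rows copied from the three outputs above.
NODE1 = """MON1 Active 1 LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
MON3 Active 3 NOT LEADER 192.168.56.7 8341"""
NODE2 = """MON2 Active 2 FOLLOWER 192.168.56.8 8340
MON1 Active 1 LEADER 192.168.56.24 8339
MON3 Active 3 NOT LEADER 192.168.56.7 8341"""
NODE3 = """MON3 Active 3 FOLLOWER 192.168.56.7 8341
MON1 Active 1 LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340"""

views = collect_views([NODE1, NODE2, NODE3])
diff = disagreements(views)
for name, seen in diff.items():
    print(name, seen)
```

For the outputs above, MON1 does not appear in the result because all three nodes report it as LEADER, while MON2 and MON3 are each reported as FOLLOWER by themselves and as NOT LEADER by the other two nodes.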
Shutting down node 1
Node 2 goes from NOT LEADER (as the other nodes listed it) to FOLLOWER (as it reports itself)
show state
2022-04-10 06:43:40
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2356 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON2 Active 2 FOLLOWER 192.168.56.8 8340
MON1 Active 1 NOT LEADER 192.168.56.24 8339
MON3 Active 3 LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
Node 3 changes from FOLLOWER to LEADER
show state
2022-04-10 06:43:56
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2652 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON3 Active 3 LEADER 192.168.56.7 8341
MON1 Active 1 NOT LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
#--------------------------------------------------------------------------------#
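That the two surviving monitors can still agree on a new LEADER is consistent with the majority rule of standard raft. The sketch below only illustrates that rule for the 3-monitor setup used here (`MON_INST_NUM = 3`); whether DM8 applies exactly this rule is an assumption, and the function names are hypothetical.

```python
def majority(cluster_size):
    """Smallest number of votes that is more than half of the cluster."""
    return cluster_size // 2 + 1

def can_elect(cluster_size, reachable):
    """True if the reachable monitors can still form a majority."""
    return reachable >= majority(cluster_size)

MON_INST_NUM = 3  # value taken from the monitor configuration in this article

for alive in (3, 2, 1):
    print(f"{alive} of {MON_INST_NUM} monitors alive -> "
          f"election possible: {can_elect(MON_INST_NUM, alive)}")
```

Under this rule, losing one of three monitors still leaves a majority of two, while a single isolated monitor cannot elect itself.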
Restoring node 1
Node 1 becomes the new FOLLOWER
show state
2022-04-10 06:54:04
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2395 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON1 Active 1 FOLLOWER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
MON3 Active 3 LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
Node 2 becomes NOT LEADER (as listed by nodes 1 and 3; the self-info line of its own output still reads FOLLOWER)
show state
2022-04-10 06:54:11
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2356 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON2 Active 2 FOLLOWER 192.168.56.8 8340
MON1 Active 1 NOT LEADER 192.168.56.24 8339
MON3 Active 3 LEADER 192.168.56.7 8341
#--------------------------------------------------------------------------------#
Node 3 remains LEADER
show state
2022-04-10 06:54:15
#--------------------------------------------------------------------------------#
GET MONITOR STATE FROM MONITOR SYSTEM, THE FIRST LINE IS SELF INFO.
MON_BRO_INTERVAL: 99 ms, MON_VOTE_INTERVAL: 2652 ms
MON_NAME MON_STATE ID MON_ROLE MON_IP MON_PORT
MON3 Active 3 LEADER 192.168.56.7 8341
MON1 Active 1 NOT LEADER 192.168.56.24 8339
MON2 Active 2 NOT LEADER 192.168.56.8 8340
#--------------------------------------------------------------------------------#
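Some inferences
From the behavior observed above, a few characteristics of the DM8 monitor raft protocol can be inferred:
- It follows the implementation and basic algorithm of the standard raft protocol.
- Besides the three roles of standard raft, it adds a role named NOT LEADER. This role behaves no differently from FOLLOWER: it passively receives the log messages sent by the LEADER and takes part in voting on equal terms.
- When a NOT LEADER is promoted, it seems it can only become FOLLOWER and never LEADER (as the name suggests?).
- The most recently joined valid node becomes FOLLOWER, and the previous FOLLOWER is demoted to NOT LEADER (reason unknown).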
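Summary
Based on the tests in this article, the workflow of the DM8 monitor raft protocol can be roughly summarized as follows. Without changing the original working mode in which a single MONITOR connects to the DW, a raft election process is added to the MONITOR layer. State flows in one direction, from the LEADER to the FOLLOWER/NOT LEADER nodes, through the standard raft LOG APPEND process. On the one hand, this lets the FOLLOWERs monitor whether the LEADER is alive, so that a new round of voting starts when the term times out with no LEADER. On the other hand, when a MONITOR takes over after a failure, the last LOG serves as the previous state and is compared with the information currently obtained from the DW to reach a valid switchover decision.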
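The liveness-monitoring half of this summary can be sketched as a follower that resets a timer on every message from the LEADER and starts an election once the timer expires. This is a minimal model of standard raft behavior, not DM8's implementation: the `Follower` class, the `simulate` helper, the 100 ms heartbeat (borrowed from `MON_BRO_INTERVAL = 100`), and the 2000 ms timeout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Follower:
    election_timeout_ms: int
    last_heartbeat_ms: int = 0
    term: int = 1
    role: str = "FOLLOWER"

    def on_heartbeat(self, now_ms, leader_term):
        """Accept a LEADER message from the current or a newer term."""
        if leader_term >= self.term:
            self.term = leader_term
            self.last_heartbeat_ms = now_ms
            self.role = "FOLLOWER"

    def tick(self, now_ms):
        """Become CANDIDATE in a new term if the LEADER has been silent too long."""
        silent_for = now_ms - self.last_heartbeat_ms
        if self.role == "FOLLOWER" and silent_for >= self.election_timeout_ms:
            self.term += 1
            self.role = "CANDIDATE"
        return self.role

def simulate(leader_dies_at_ms, heartbeat_ms=100, timeout_ms=2000, end_ms=5000):
    """Return (time the follower starts an election, its term), or (None, term)."""
    follower = Follower(election_timeout_ms=timeout_ms)
    for now in range(0, end_ms + 1, heartbeat_ms):
        if now < leader_dies_at_ms:  # the LEADER only sends while it is alive
            follower.on_heartbeat(now, leader_term=1)
        if follower.tick(now) == "CANDIDATE":
            return now, follower.term
    return None, follower.term

print(simulate(leader_dies_at_ms=500))    # (2400, 2): last heartbeat at 400 ms + 2000 ms
print(simulate(leader_dies_at_ms=10**9))  # (None, 1): the LEADER never goes silent
```

In the first run the last heartbeat arrives at 400 ms, so the follower waits out its 2000 ms timeout and starts an election for term 2 at 2400 ms. In the second run the heartbeats never stop and the follower stays in term 1.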
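Open questions
The cause and definition of the NOT LEADER role are still unclear. It is probably related to some RANK mechanism that the raft protocol defines among the nodes. If I get the chance to work this out later, I will add a follow-up.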
Dameng Cloud Adaptation Technical Community: https://eco.dameng.com/