Background: our company's cache-management cloud platform has an instance clone feature (after a machine goes down, another server starts the downed instance and then re-establishes the master-replica relationship with the original cluster). After a clone, the clusters became chaotic. Preliminary diagnosis points at offline instances: the original cluster never forgets the downed node, so its ip:port is still listed in the cluster's node table, just marked as fail. When that same ip:port is started again and joined to a different cluster, the original cluster automatically pulls it back, and the clusters get tangled together.
A few questions need to be confirmed:
- Will the clusters end up with dirty data (i.e. does data from the conflicting slots of different clusters get mixed together)?
- What exactly causes the cluster chaos?
- After the chaos, what is the logic by which slots and data get overwritten?
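For context on the suspected root cause before reproducing it: when an instance dies, the surviving nodes keep its ip:port in their node tables flagged as fail, and the entry only disappears once CLUSTER FORGET is issued on every remaining node. A minimal check/cleanup sketch (host, port and <node-id> are placeholders for whichever cluster is being inspected):
# See whether a cluster still remembers a dead node (flagged fail in the flags column)
redis-cli -h 10.4.7.221 -p 8001 cluster nodes | grep fail
# To drop it for good, CLUSTER FORGET must be run on every remaining node of that cluster,
# passing the 40-character node id taken from the output above
redis-cli -h 10.4.7.221 -p 8001 cluster forget <node-id>
redis-cli -h 10.4.7.222 -p 8002 cluster forget <node-id>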
1. Environment preparation
Prepare four CentOS virtual machines with the following IPs.
10.4.7.221 10.4.7.222 10.4.7.223 10.4.7.224
The cluster plan is as follows: three clusters are deployed across 221, 222 and 223, and 224 is reserved as the clone machine. Each cluster follows a fixed convention, for example 8001 as a master paired with 8011 as its replica; this makes it easier to script startup and cluster assembly, and easier to read the node lists once the clusters get mixed up. The pairs below are listed as master → replica.
Cluster A (3 masters, 3 replicas):
10.4.7.221:8001 → 10.4.7.222:8011, 10.4.7.222:8002 → 10.4.7.223:8012, 10.4.7.223:8003 → 10.4.7.221:8013
Cluster B (4 masters, 4 replicas):
10.4.7.221:9001 → 10.4.7.222:9011, 10.4.7.222:9002 → 10.4.7.223:9012, 10.4.7.223:9003 → 10.4.7.221:9013, 10.4.7.221:9004 → 10.4.7.222:9014
Cluster C (5 masters, 5 replicas):
10.4.7.221:7001 → 10.4.7.222:7011, 10.4.7.222:7002 → 10.4.7.223:7012, 10.4.7.223:7003 → 10.4.7.221:7013, 10.4.7.221:7004 → 10.4.7.222:7014, 10.4.7.222:7005 → 10.4.7.223:7015
Directory layout
Log files: /app/cachecloud/logs
Config files: /app/cachecloud/conf
Data files: /app/cachecloud/data
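The directories have to exist on all four servers before any instance is started, for example:
# Create the log, config and data directories in one shot
mkdir -p /app/cachecloud/{logs,conf,data}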
Config file template. Create a config file for each planned instance under the /app/cachecloud/conf directory (the Redis installation and compilation steps are omitted).
port 8001
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file "nodes-8001.conf"
bind 0.0.0.0
daemonize yes
logfile "/app/cachecloud/logs/redis-a-8001.log"
dir /app/cachecloud/data
dbfilename dump-8001.rdb
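Writing the twenty-plus config files by hand is tedious, so a small generation loop can stamp them out from the template. This is only a sketch: the PORTS list is an assumption and must be adjusted to the instances planned for each server, and the log-file name here drops the per-cluster letter used in the template above.
# Run on each server with its own port list
PORTS="8001 8013 9001 9004 9013 7001 7004 7013"   # ports planned for 10.4.7.221 (adjust per host)
for port in $PORTS; do
cat > /app/cachecloud/conf/redis-${port}.conf <<EOF
port ${port}
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file "nodes-${port}.conf"
bind 0.0.0.0
daemonize yes
logfile "/app/cachecloud/logs/redis-${port}.log"
dir /app/cachecloud/data
dbfilename dump-${port}.rdb
EOF
done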
Adjust the following system settings on each server.
# Disable Linux transparent huge pages
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Allow the kernel to allocate all physical memory regardless of the current memory state
echo 1 > /proc/sys/vm/overcommit_memory
sysctl vm.overcommit_memory=1
# Stop the firewall (important, otherwise cluster communication fails)
systemctl stop firewalld
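These commands do not survive a reboot; if the VMs might be restarted during the test, a persistence sketch using standard CentOS 7 mechanisms is:
# Persist overcommit_memory across reboots
echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
sysctl -p
# Keep firewalld from starting again on boot
systemctl disable firewalld
# Re-apply the transparent-hugepage setting at boot (rc.local must be executable)
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local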
To rebuild the clusters from scratch, clear the old state first (kill the redis-server processes, then delete the cluster nodes config files, log files and data files).
cd /app/cachecloud/conf
ps -ef | grep redis-server | grep -v grep | awk '{print $2}' | xargs kill
rm -f dump.rdb nodes-*.conf
rm -f /app/cachecloud/logs/*.log
rm -f /app/cachecloud/data/*.rdb
Run the following on 221, 222 and 223 to start the instances.
cd /app/cachecloud/conf;for i in `ls -l | grep redis | awk '{print $9}'`;do echo `redis-server $i`; done;
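A quick sanity check (a sketch; the expected count is the number of instances planned for that host: 8 on 221, 9 on 222 and 7 on 223):
# One redis-server process should be running per planned instance
ps -ef | grep redis-server | grep -v grep | wc -l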
Run the cluster meet commands from the 221 server.
# Cluster A
redis-cli -h 10.4.7.221 -p 8001 cluster meet 10.4.7.221 8001
redis-cli -h 10.4.7.221 -p 8001 cluster meet 10.4.7.222 8011
redis-cli -h 10.4.7.221 -p 8001 cluster meet 10.4.7.222 8002
redis-cli -h 10.4.7.221 -p 8001 cluster meet 10.4.7.223 8012
redis-cli -h 10.4.7.221 -p 8001 cluster meet 10.4.7.223 8003
redis-cli -h 10.4.7.221 -p 8001 cluster meet 10.4.7.221 8013
# Cluster B
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.221 9001
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.222 9011
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.222 9002
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.223 9012
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.223 9003
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.221 9013
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.221 9004
redis-cli -h 10.4.7.221 -p 9001 cluster meet 10.4.7.222 9014
# Cluster C
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.221 7001
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.222 7011
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.222 7002
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.223 7012
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.223 7003
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.221 7013
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.221 7004
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.222 7014
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.222 7005
redis-cli -h 10.4.7.221 -p 7001 cluster meet 10.4.7.223 7015
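After the meets, gossip takes a moment to converge; each cluster should then report the expected node count. A quick check:
# Expect cluster_known_nodes:6 for A, 8 for B and 10 for C
redis-cli -h 10.4.7.221 -p 8001 cluster info | grep cluster_known_nodes
redis-cli -h 10.4.7.221 -p 9001 cluster info | grep cluster_known_nodes
redis-cli -h 10.4.7.221 -p 7001 cluster info | grep cluster_known_nodes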
Assign slots to the three clusters; the per-instance slot boundaries are deliberately cut differently in each cluster so the three layouts are easy to tell apart later.
redis-cli -h 10.4.7.221 -p 8001 cluster addslots {0..5555}
redis-cli -h 10.4.7.222 -p 8002 cluster addslots {5556..11112}
redis-cli -h 10.4.7.223 -p 8003 cluster addslots {11113..16383}
redis-cli -h 10.4.7.221 -p 9001 cluster addslots {0..4096}
redis-cli -h 10.4.7.222 -p 9002 cluster addslots {4097..8192}
redis-cli -h 10.4.7.223 -p 9003 cluster addslots {8193..12288}
redis-cli -h 10.4.7.221 -p 9004 cluster addslots {12289..16383}
redis-cli -h 10.4.7.221 -p 7001 cluster addslots {0..3278}
redis-cli -h 10.4.7.222 -p 7002 cluster addslots {3279..6556}
redis-cli -h 10.4.7.223 -p 7003 cluster addslots {6557..9834}
redis-cli -h 10.4.7.221 -p 7004 cluster addslots {9835..13112}
redis-cli -h 10.4.7.222 -p 7005 cluster addslots {13113..16383}
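Once every slot is assigned, each cluster should report state ok; a short verification run from 221:
# Every cluster should show cluster_state:ok and cluster_slots_assigned:16384
for port in 8001 9001 7001; do
redis-cli -h 10.4.7.221 -p $port cluster info | egrep 'cluster_state|cluster_slots_assigned'
done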
Set up the master-replica pairs (run from 221).
masternodeid=`redis-cli -h 10.4.7.221 -p 8001 cluster nodes | grep :8001 | awk '{print $1}'`;redis-cli -h 10.4.7.222 -p 8011 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 8001 cluster nodes | grep :8002 | awk '{print $1}'`;redis-cli -h 10.4.7.223 -p 8012 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 8001 cluster nodes | grep :8003 | awk '{print $1}'`;redis-cli -h 10.4.7.221 -p 8013 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 9001 cluster nodes | grep :9001 | awk '{print $1}'`;redis-cli -h 10.4.7.222 -p 9011 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 9001 cluster nodes | grep :9002 | awk '{print $1}'`;redis-cli -h 10.4.7.223 -p 9012 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 9001 cluster nodes | grep :9003 | awk '{print $1}'`;redis-cli -h 10.4.7.221 -p 9013 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 9001 cluster nodes | grep :9004 | awk '{print $1}'`;redis-cli -h 10.4.7.222 -p 9014 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 7001 cluster nodes | grep :7001 | awk '{print $1}'`;redis-cli -h 10.4.7.222 -p 7011 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 7001 cluster nodes | grep :7002 | awk '{print $1}'`;redis-cli -h 10.4.7.223 -p 7012 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 7001 cluster nodes | grep :7003 | awk '{print $1}'`;redis-cli -h 10.4.7.221 -p 7013 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 7001 cluster nodes | grep :7004 | awk '{print $1}'`;redis-cli -h 10.4.7.222 -p 7014 cluster replicate $masternodeid;
masternodeid=`redis-cli -h 10.4.7.221 -p 7001 cluster nodes | grep :7005 | awk '{print $1}'`;redis-cli -h 10.4.7.223 -p 7015 cluster replicate $masternodeid;
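Before moving on, a quick numeric check (a sketch) that every replica attached as intended:
# Expect 3, 4 and 5 replica lines respectively
redis-cli -h 10.4.7.221 -p 8001 cluster nodes | grep -c slave
redis-cli -h 10.4.7.221 -p 9001 cluster nodes | grep -c slave
redis-cli -h 10.4.7.221 -p 7001 cluster nodes | grep -c slave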
Once the clusters are ready, check their state, and record the output after every subsequent operation so instance changes can be compared.
redis-cli -p 8001 cluster nodes
redis-cli -p 9001 cluster nodes
redis-cli -p 7001 cluster nodes
[root@redis-7-221 conf]# redis-cli -p 8001 cluster nodes
b03eb3e887c67a6305c7a205d5bfc4d6e222dd74 10.4.7.222:8002 master - 0 1648189621790 3 connected 5556-11112
ee340787e8a39a59f9169386bef30d70d2f1fb62 10.4.7.222:8011 slave b850d083307af116b1f61401cb7d01e2d162ad38 0 1648189623797 1 connected
b850d083307af116b1f61401cb7d01e2d162ad38 10.4.7.221:8001 myself,master - 0 0 1 connected 0-5555
3e42b6080162017b33ae87caac67daad9eb77046 10.4.7.221:8013 slave e656d5a698046fcc4ccd341f0b9bdaf172f2d5ed 0 1648189624800 5 connected
0afd6dd77395042026c1e358af7dd535fd4a5f0d 10.4.7.223:8012 slave b03eb3e887c67a6305c7a205d5bfc4d6e222dd74 0 1648189623295 4 connected
e656d5a698046fcc4ccd341f0b9bdaf172f2d5ed 10.4.7.223:8003 master - 0 1648189622794 5 connected 11113-16383
[root@redis-7-221 conf]# redis-cli -p 9001 cluster nodes
cc4191deafddf42db21ee3cdab4f906931237ca9 10.4.7.223:9012 slave 5f201e00106a512ab3a3d73455ee1269b367b204 0 1648189622893 5 connected
37dc860d2cc031c7d0a052bb34c07ad1625cbe8f 10.4.7.222:9011 slave 335a0cb8d9d82a764a19bf71da6379b73c703e95 0 1648189619884 7 connected
5f201e00106a512ab3a3d73455ee1269b367b204 10.4.7.222:9002 master - 0 1648189620888 4 connected 4097-8192
577e5ea9c7f56c3767cfcfa23905742d97448b02 10.4.7.221:9013 slave ca7e61ba84d005dc32570377a7e57cdc2b1ea395 0 1648189617878 3 connected
097271ce6ad0d7c5b4e5b80a645058bd2bb0099f 10.4.7.221:9004 master - 0 1648189623898 6 connected 12289-16383
dcf4c5f00f922e67094ebd0bf0cfe61701c0c3fe 10.4.7.222:9014 slave 097271ce6ad0d7c5b4e5b80a645058bd2bb0099f 0 1648189621891 6 connected
ca7e61ba84d005dc32570377a7e57cdc2b1ea395 10.4.7.223:9003 master - 0 1648189622392 2 connected 8193-12288
335a0cb8d9d82a764a19bf71da6379b73c703e95 10.4.7.221:9001 myself,master - 0 0 1 connected 0-4096
[root@redis-7-221 conf]# redis-cli -p 7001 cluster nodes
87a0bf02cbc2e7eebee500840a346dc7d4fe2603 10.4.7.222:7002 master - 0 1648189621892 4 connected 3279-6556
e4aa6c4158b31d8d5f1aa048571f725522d521d7 10.4.7.221:7001 myself,master - 0 0 0 connected 0-3278
387564ccc40c9d38a335f9af6276fa4545456c3c 10.4.7.222:7014 slave 1f984fd46a8e343d9c421fbab04f1f5696e059d4 0 1648189623900 9 connected
94985c7576c2a0ab05e3cb3bd3b63c75b137ddf3 10.4.7.223:7003 master - 0 1648189624904 6 connected 6557-9834
955014596d1929366f655ad6a020195432da917e 10.4.7.221:7013 slave 94985c7576c2a0ab05e3cb3bd3b63c75b137ddf3 0 1648189620887 6 connected
033229faffaede7d873acdc6d7c20a580ee9aa66 10.4.7.223:7015 slave 3df4258d28c93acb80444e56a330c5331681d413 0 1648189620387 7 connected
3db9f8bb3bbadd6ddaacf61d9d2b0fd6cd883e54 10.4.7.223:7012 slave 87a0bf02cbc2e7eebee500840a346dc7d4fe2603 0 1648189622393 5 connected
1f984fd46a8e343d9c421fbab04f1f5696e059d4 10.4.7.221:7004 master - 0 1648189622894 9 connected 9835-13112
3df4258d28c93acb80444e56a330c5331681d413 10.4.7.222:7005 master - 0 1648189622394 3 connected 13113-16383
966e18cfe9f02771752bac772918ec872d00bfb9 10.4.7.222:7011 slave e4aa6c4158b31d8d5f1aa048571f725522d521d7 0 1648189619886 2 connected
Run a Python script against each of the three clusters to insert 1,000,000 keys. Each cluster's keys follow their own naming rule, which makes the data distribution easy to inspect after the clusters get mixed up.
import random
from rediscluster import StrictRedisCluster

# Generate a random string of the given length
def generate_random_str(randomlength=16):
    random_str = ''
    base_str = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
    length = len(base_str) - 1
    for i in range(randomlength):
        random_str += base_str[random.randint(0, length)]
    return random_str

def generate_random_num(randomlength=100):
    return random.randint(0, randomlength)

# Cluster A
startup_nodes = [
    {"host": "10.4.7.221", "port": 8001},
    {"host": "10.4.7.222", "port": 8011},
    {"host": "10.4.7.222", "port": 8002},
    {"host": "10.4.7.223", "port": 8012},
    {"host": "10.4.7.223", "port": 8003},
    {"host": "10.4.7.221", "port": 8013}
]
redis_conn = StrictRedisCluster(
    startup_nodes=startup_nodes, decode_responses=True, password='')
p = redis_conn.pipeline()
for i in range(0, 1000000):
    # String keys following the cluster:A:hello_<sequence> pattern
    p.set('cluster:A:hello_' + str(i).zfill(8), generate_random_str(10))
    if i % 1000 == 0:
        p.execute()
        print("========>executed:{}".format(i))
# Flush the commands buffered after the last batch
p.execute()
import random
from rediscluster import StrictRedisCluster

# Generate a random string of the given length
def generate_random_str(randomlength=16):
    random_str = ''
    base_str = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
    length = len(base_str) - 1
    for i in range(randomlength):
        random_str += base_str[random.randint(0, length)]
    return random_str

def generate_random_num(randomlength=100):
    return random.randint(0, randomlength)

# Cluster B
startup_nodes = [
    {"host": "10.4.7.221", "port": 9001},
    {"host": "10.4.7.222", "port": 9011},
    {"host": "10.4.7.222", "port": 9002},
    {"host": "10.4.7.223", "port": 9012},
    {"host": "10.4.7.223", "port": 9003},
    {"host": "10.4.7.221", "port": 9013},
    {"host": "10.4.7.221", "port": 9004},
    {"host": "10.4.7.222", "port": 9014}
]
redis_conn = StrictRedisCluster(
    startup_nodes=startup_nodes, decode_responses=True, password='')
p = redis_conn.pipeline()
for i in range(0, 1000000):
    # String keys following the cluster:B:hello_<sequence> pattern
    p.set('cluster:B:hello_' + str(i).zfill(8), generate_random_str(10))
    if i % 1000 == 0:
        p.execute()
        print("========>executed:{}".format(i))
# Flush the commands buffered after the last batch
p.execute()
import random
from rediscluster import StrictRedisCluster

# Generate a random string of the given length
def generate_random_str(randomlength=16):
    random_str = ''
    base_str = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
    length = len(base_str) - 1
    for i in range(randomlength):
        random_str += base_str[random.randint(0, length)]
    return random_str

def generate_random_num(randomlength=100):
    return random.randint(0, randomlength)

# Cluster C
startup_nodes = [
    {"host": "10.4.7.221", "port": 7001},
    {"host": "10.4.7.222", "port": 7011},
    {"host": "10.4.7.222", "port": 7002},
    {"host": "10.4.7.223", "port": 7012},
    {"host": "10.4.7.223", "port": 7003},
    {"host": "10.4.7.221", "port": 7013},
    {"host": "10.4.7.221", "port": 7004},
    {"host": "10.4.7.222", "port": 7014},
    {"host": "10.4.7.222", "port": 7005},
    {"host": "10.4.7.223", "port": 7015}
]
redis_conn = StrictRedisCluster(
    startup_nodes=startup_nodes, decode_responses=True, password='')
p = redis_conn.pipeline()
for i in range(0, 1000000):
    # String keys following the cluster:C:hello_<sequence> pattern
    p.set('cluster:C:hello_' + str(i).zfill(8), generate_random_str(10))
    if i % 1000 == 0:
        p.execute()
        print("========>executed:{}".format(i))
# Flush the commands buffered after the last batch
p.execute()
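A note on running the three scripts: StrictRedisCluster comes from the legacy redis-py-cluster package (newer 2.x releases renamed the class to RedisCluster), so a 1.x version is assumed here; the script file name below is just a placeholder.
# Assumed setup for the insert scripts
pip install "redis-py-cluster<2.0"
python insert_cluster_a.py   # placeholder name; one run per cluster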
After the inserts, record each node's key count and memory usage so the data distribution can be compared after the clusters get mixed up.
redis-cli -h 10.4.7.221 -p 8001 dbsize
redis-cli -h 10.4.7.222 -p 8002 dbsize
redis-cli -h 10.4.7.223 -p 8003 dbsize
redis-cli -h 10.4.7.221 -p 9001 dbsize
redis-cli -h 10.4.7.222 -p 9002 dbsize
redis-cli -h 10.4.7.223 -p 9003 dbsize
redis-cli -h 10.4.7.221 -p 9004 dbsize
redis-cli -h 10.4.7.221 -p 7001 dbsize
redis-cli -h 10.4.7.222 -p 7002 dbsize
redis-cli -h 10.4.7.223 -p 7003 dbsize
redis-cli -h 10.4.7.221 -p 7004 dbsize
redis-cli -h 10.4.7.222 -p 7005 dbsize
redis-cli -h 10.4.7.222 -p 8011 dbsize
redis-cli -h 10.4.7.223 -p 8012 dbsize
redis-cli -h 10.4.7.221 -p 8013 dbsize
redis-cli -h 10.4.7.222 -p 9011 dbsize
redis-cli -h 10.4.7.223 -p 9012 dbsize
redis-cli -h 10.4.7.221 -p 9013 dbsize
redis-cli -h 10.4.7.222 -p 9014 dbsize
redis-cli -h 10.4.7.222 -p 7011 dbsize
redis-cli -h 10.4.7.223 -p 7012 dbsize
redis-cli -h 10.4.7.221 -p 7013 dbsize
redis-cli -h 10.4.7.222 -p 7014 dbsize
redis-cli -h 10.4.7.223 -p 7015 dbsize
redis-cli -h 10.4.7.221 -p 8001 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 8002 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 8003 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 9001 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 9002 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 9003 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 9004 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 7001 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 7002 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 7003 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 7004 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 7005 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 8011 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 8012 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 8013 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 9011 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 9012 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 9013 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 9014 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 7011 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 7012 info memory | grep used_memory_human
redis-cli -h 10.4.7.221 -p 7013 info memory | grep used_memory_human
redis-cli -h 10.4.7.222 -p 7014 info memory | grep used_memory_human
redis-cli -h 10.4.7.223 -p 7015 info memory | grep used_memory_human