Jusene's Blog

Etcd Cluster Management

2017/11/12

Etcd Cluster

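An etcd cluster follows a typical leader-follower model: the Raft protocol guarantees that one member acts as leader for any given period. Because electing a leader requires a majority of members, an odd number of members is recommended, as with other distributed clusters, with a minimum of 3.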
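
The arithmetic behind the "odd number" advice can be sketched in a few lines of shell. The function names quorum and fault_tolerance are my own, chosen only for illustration; the sketch assumes nothing beyond the majority rule described above.

```shell
#!/bin/sh
# quorum N: smallest majority of an N-member cluster.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: how many members can fail while a majority remains.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, so the extra even member adds cost without adding resilience.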

Building the Cluster

  • node1 10.211.55.6
  • node2 10.211.55.43
  • node3 10.211.55.4

Static Cluster Configuration

As when building any other cluster, first update /etc/hosts and synchronize the clocks on every node.

node1

~]# ./etcd --name n1 \
			--initial-cluster-token cluster1 \
			--initial-cluster-state new \
			--listen-client-urls http://127.0.0.1:2379,http://10.211.55.6:2379 \
			--listen-peer-urls http://10.211.55.6:2380 \
			--advertise-client-urls http://10.211.55.6:2379 \
			--initial-advertise-peer-urls http://10.211.55.6:2380 \
			--initial-cluster n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380

node2

~]# ./etcd --name n2 \
			--initial-cluster-token cluster1 \
			--initial-cluster-state new \
			--listen-client-urls http://127.0.0.1:2379,http://10.211.55.43:2379 \
			--listen-peer-urls http://10.211.55.43:2380 \
			--advertise-client-urls http://10.211.55.43:2379 \
			--initial-advertise-peer-urls http://10.211.55.43:2380 \
			--initial-cluster n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380

node3

~]# ./etcd --name n3 \
			--initial-cluster-token cluster1 \
			--initial-cluster-state new \
			--listen-client-urls http://127.0.0.1:2379,http://10.211.55.4:2379 \
			--listen-peer-urls http://10.211.55.4:2380 \
			--advertise-client-urls http://10.211.55.4:2379 \
			--initial-advertise-peer-urls http://10.211.55.4:2380 \
			--initial-cluster n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380

Once all three nodes are running, check the cluster state from any node:

~]# ./etcdctl cluster-health
member 126d5057628cf9e1 is healthy: got healthy result from http://10.211.55.43:2379
member 6da33978247d0f4c is healthy: got healthy result from http://10.211.55.6:2379
member d143f97740999fa6 is healthy: got healthy result from http://10.211.55.4:2379
cluster is healthy
~]# ./etcdctl member list
126d5057628cf9e1: name=n2 peerURLs=http://10.211.55.43:2380 clientURLs=http://10.211.55.43:2379 isLeader=true
6da33978247d0f4c: name=n1 peerURLs=http://10.211.55.6:2380 clientURLs=http://10.211.55.6:2379 isLeader=false
d143f97740999fa6: name=n3 peerURLs=http://10.211.55.4:2380 clientURLs=http://10.211.55.4:2379 isLeader=false
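
Since every node must receive exactly the same --initial-cluster value, it can help to generate that string once rather than type it three times. The following is only a sketch: build_initial_cluster is a hypothetical helper of mine, and it assumes plain HTTP and peer port 2380 as used in this post.

```shell
#!/bin/sh
# build_initial_cluster NAME=IP ...: join the members into the value expected
# by --initial-cluster (assumes http:// and peer port 2380).
build_initial_cluster() {
    result=""
    for member in "$@"; do
        name=${member%%=*}   # text before the first "="
        ip=${member#*=}      # text after the first "="
        result="${result:+$result,}${name}=http://${ip}:2380"
    done
    echo "$result"
}

build_initial_cluster n1=10.211.55.6 n2=10.211.55.43 n3=10.211.55.4
# prints n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380
```

The output can then be passed unchanged to --initial-cluster on each node.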

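Dynamic Discovery

As shown above, static configuration requires listing every cluster member by hand in --initial-cluster, which becomes inconvenient when there are many nodes. For this reason CoreOS also provides an etcd discovery service.

First, request a unique discovery URL for the cluster (size is the expected number of members):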

~]# curl 'https://discovery.etcd.io/new?size=3'
https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685

node1

~]# ./etcd --name n1 \
			--initial-cluster-token cluster1 \
			--initial-cluster-state new \
			--listen-client-urls http://127.0.0.1:2379,http://10.211.55.6:2379 \
			--listen-peer-urls http://10.211.55.6:2380 \
			--advertise-client-urls http://10.211.55.6:2379 \
			--initial-advertise-peer-urls http://10.211.55.6:2380 \
			--discovery https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685

node2

~]# ./etcd --name n2 \
			--initial-cluster-token cluster1 \
			--initial-cluster-state new \
			--listen-client-urls http://127.0.0.1:2379,http://10.211.55.43:2379 \
			--listen-peer-urls http://10.211.55.43:2380 \
			--advertise-client-urls http://10.211.55.43:2379 \
			--initial-advertise-peer-urls http://10.211.55.43:2380 \
			--discovery https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685

node3

~]# ./etcd --name n3 \
			--initial-cluster-token cluster1 \
			--initial-cluster-state new \
			--listen-client-urls http://127.0.0.1:2379,http://10.211.55.4:2379 \
			--listen-peer-urls http://10.211.55.4:2380 \
			--advertise-client-urls http://10.211.55.4:2379 \
			--initial-advertise-peer-urls http://10.211.55.4:2380 \
			--discovery https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685

After all three nodes have registered with the discovery service, verify the cluster:

~]# ./etcdctl member list
126d5057628cf9e1: name=n2 peerURLs=http://10.211.55.43:2380 clientURLs=http://10.211.55.43:2379 isLeader=true
6da33978247d0f4c: name=n1 peerURLs=http://10.211.55.6:2380 clientURLs=http://10.211.55.6:2379 isLeader=false
d143f97740999fa6: name=n3 peerURLs=http://10.211.55.4:2380 clientURLs=http://10.211.55.4:2379 isLeader=false
~]# ./etcdctl cluster-health
member 126d5057628cf9e1 is healthy: got healthy result from http://10.211.55.43:2379
member 6da33978247d0f4c is healthy: got healthy result from http://10.211.55.6:2379
member d143f97740999fa6 is healthy: got healthy result from http://10.211.55.4:2379
cluster is healthy
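
With discovery, the three commands differ only in the node name and IP address, so the flags lend themselves to a small template. This is a rough sketch rather than a finished tool: discovery_flags is a hypothetical function name, and the token cluster1 and the ports are simply copied from the commands above.

```shell
#!/bin/sh
# discovery_flags NAME IP URL: print the etcd flags for one node, one per line.
discovery_flags() {
    name=$1 ip=$2 url=$3
    printf '%s\n' \
        "--name $name" \
        "--initial-cluster-token cluster1" \
        "--initial-cluster-state new" \
        "--listen-client-urls http://127.0.0.1:2379,http://$ip:2379" \
        "--listen-peer-urls http://$ip:2380" \
        "--advertise-client-urls http://$ip:2379" \
        "--initial-advertise-peer-urls http://$ip:2380" \
        "--discovery $url"
}

discovery_flags n1 10.211.55.6 https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685
```

Printing the flags instead of launching etcd directly makes it easy to review them before use, for example by pasting them into a unit file or start script.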

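Cluster Tuning Parameters

A cluster gives us the ability to scale out, but it also introduces additional factors that affect the service as a whole, such as network jitter between members, clock synchronization, and the storage and network load caused by data replication. Cluster management therefore needs to be more fine-grained.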

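Time Synchronization

Time synchronization is something every distributed cluster has to take seriously. For an etcd cluster, a clock difference of more than 1s between members triggers warnings and can lead to abnormal Raft behavior, so the clocks must be kept in sync.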

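Heartbeat Interval and Election Timeout

The election timeout must always be longer than the heartbeat interval; at least 5 times longer is generally recommended. Both values are given in milliseconds and are set with the --heartbeat-interval and --election-timeout flags.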
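
The 5x rule is easy to check before starting a node. Below is a minimal sketch; check_timeouts is a hypothetical helper, and the values 100 and 1000 ms are, as far as I know, etcd's defaults, used here only as sample input.

```shell
#!/bin/sh
# check_timeouts HEARTBEAT_MS ELECTION_MS: succeed only if the election
# timeout is at least 5 times the heartbeat interval.
check_timeouts() {
    [ "$2" -ge $(( $1 * 5 )) ]
}

heartbeat=100   # milliseconds
election=1000   # milliseconds
if check_timeouts "$heartbeat" "$election"; then
    echo "ok: --heartbeat-interval $heartbeat --election-timeout $election"
else
    echo "election timeout $election is too short for heartbeat $heartbeat" >&2
fi
```

All members of a cluster should be started with the same pair of values.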

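Snapshot Frequency

etcd periodically persists its data as a snapshot. By default a snapshot is taken only after 10000 changes (the threshold can be adjusted with --snapshot-count). Taking a snapshot writes a large amount of data, which affects cluster performance.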

Updating a Member

$ etcdctl member list
6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:23802 clientURLs=http://127.0.0.1:23792
924e2e83e93f2560: name=node3 peerURLs=http://localhost:23803 clientURLs=http://127.0.0.1:23793
a8266ecf031671f3: name=node1 peerURLs=http://localhost:23801 clientURLs=http://127.0.0.1:23791

In this example, we update the peerURLs of the member with ID a8266ecf031671f3 to http://10.0.1.10:2380:

$ etcdctl member update a8266ecf031671f3 http://10.0.1.10:2380
Updated member with ID a8266ecf031671f3 in cluster

Removing a Member

$ etcdctl member remove a8266ecf031671f3
Removed member a8266ecf031671f3 from cluster
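
Both member update and member remove take the member ID rather than its name, so the ID has to be looked up first. A small sketch of that lookup follows; member_id_by_name is a hypothetical helper, and it assumes the "ID: name=... peerURLs=..." line format shown in the member list output above.

```shell
#!/bin/sh
# member_id_by_name NAME: read `etcdctl member list` output on stdin and
# print the ID of the member whose name= field equals NAME.
member_id_by_name() {
    awk -v want="name=$1" '$2 == want { sub(/:$/, "", $1); print $1 }'
}

# Sample input copied from the member list output shown earlier.
sample='6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:23802 clientURLs=http://127.0.0.1:23792
a8266ecf031671f3: name=node1 peerURLs=http://localhost:23801 clientURLs=http://127.0.0.1:23791'

printf '%s\n' "$sample" | member_id_by_name node1
# prints a8266ecf031671f3
```

Against a live cluster the same function would be fed from etcdctl member list instead of the sample text.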

Adding a Member

$ etcdctl member add infra3 http://10.0.1.13:2380
added member 9bf1b35fc7761a23 to cluster

ETCD_NAME="infra3"
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE=existing

On the new etcd node, run:

$ export ETCD_NAME="infra3"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd --listen-client-urls http://10.0.1.13:2379 \
       --advertise-client-urls http://10.0.1.13:2379 \
       --listen-peer-urls http://10.0.1.13:2380 \
       --initial-advertise-peer-urls http://10.0.1.13:2380 \
       --data-dir %data_dir%
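
The ETCD_* lines printed by etcdctl member add do not have to be retyped on the new node. One possible approach, sketched here with a hypothetical helper named etcd_env_exports, is to filter them out of the command output and turn them into export statements.

```shell
#!/bin/sh
# etcd_env_exports: read `etcdctl member add` output on stdin and turn each
# ETCD_* assignment into an export statement; other lines are dropped.
etcd_env_exports() {
    sed -n 's/^ETCD_/export ETCD_/p'
}

# Shortened sample of the member add output shown above.
sample='added member 9bf1b35fc7761a23 to cluster

ETCD_NAME="infra3"
ETCD_INITIAL_CLUSTER_STATE=existing'

printf '%s\n' "$sample" | etcd_env_exports
```

The resulting lines can be saved to a file, copied to the new node, and sourced before starting etcd.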

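Failure Recovery and Migration

First, back up the data from a healthy node:

$ ./etcdctl backup --data-dir /var/lib/etcd --backup-dir /tmp/etcd_backup
# Archive the contents of the backup directory with relative paths,
# so that it can later be unpacked directly into a data directory.
$ tar -zcf backup.etcd.tar.gz -C /tmp/etcd_backup .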

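Then restore the etcd data onto any one node of the new cluster and start etcd with the --force-new-cluster flag. This flag resets the cluster ID and all cluster membership information; the node's advertised address is reset to the localhost default, leaving a cluster that contains only this single node.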

$ tar -zxvf backup.etcd.tar.gz -C /var/lib/etcd
$ etcd --data-dir=/var/lib/etcd --force-new-cluster ...

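Once the single-node etcd is up, first verify the integrity of the data. When everything checks out, change the member's peer URL through the etcd API so that it uses the node's external IP address, in preparation for adding other nodes. For example:

Use etcdctl to find the ID of the current node.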

$ etcdctl member list

98f0c6bf64240842: name=cd-2 peerURLs=http://127.0.0.1:2580 clientURLs=http://127.0.0.1:2579

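Older etcdctl releases cannot modify a member's parameters, so the operation below uses the members API directly (where etcdctl member update is available, as in the Updating a Member section, it does the same job).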

$ curl http://127.0.0.1:2579/v2/members/98f0c6bf64240842 -XPUT \
 -H "Content-Type:application/json" -d '{"peerURLs":["http://127.0.0.1:2580"]}'
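
The request body is a small JSON object with a single peerURLs array. When the URL comes from a variable, it can be assembled with printf instead of being written inline. This is only a sketch: peer_urls_json is a hypothetical helper, and it assumes the URL contains no quotes or backslashes that would need JSON escaping.

```shell
#!/bin/sh
# peer_urls_json URL: print the JSON body for the v2 members API PUT request.
peer_urls_json() {
    printf '{"peerURLs":["%s"]}\n' "$1"
}

peer_urls_json http://127.0.0.1:2580
# prints {"peerURLs":["http://127.0.0.1:2580"]}
```

The output can be passed to curl with -d "$(peer_urls_json "$url")" in place of the literal string.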

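Note that the etcd documentation recommends first restoring the data into a temporary directory, starting etcd from that directory, verifying that the new data is correct and complete, then stopping etcd and restoring the data into the regular directory.

Finally, once the first member is up, the remaining members can be added with the etcdctl member add command, using the member-addition procedure described earlier.

Reference: https://www.cnblogs.com/breg/p/5728237.html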
