Etcd Cluster
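An etcd cluster uses a typical leader/follower model. The Raft protocol guarantees that there is at most one leader during any given term, and both leader election and writes need agreement from a majority of members. For that reason, as with other distributed clusters, an odd number of members is recommended, with a minimum of 3.
Building the cluster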
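The recommendation for an odd member count follows from majority arithmetic. The sketch below is only an illustration of that arithmetic; the function names `quorum` and `fault_tolerance` are made up for this example and are not part of etcd.

```shell
#!/bin/sh
# Majority (quorum) needed in a cluster of N members: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Members that may fail while a majority is still available: N - quorum(N).
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for small clusters.
for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum from 2 to 3 without tolerating any additional failure, which is why even sizes bring no benefit.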
- node1 10.211.55.6
- node2 10.211.55.43
- node3 10.211.55.4
Static configuration
As with any other cluster, first edit /etc/hosts and synchronize the clocks on all nodes.
node1
    ~]# ./etcd --name n1 \
      --initial-cluster-token cluster1 \
      --initial-cluster-state new \
      --listen-client-urls http://127.0.0.1:2379,http://10.211.55.6:2379 \
      --listen-peer-urls http://10.211.55.6:2380 \
      --advertise-client-urls http://10.211.55.6:2379 \
      --initial-advertise-peer-urls http://10.211.55.6:2380 \
      --initial-cluster n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380
node2
    ~]# ./etcd --name n2 \
      --initial-cluster-token cluster1 \
      --initial-cluster-state new \
      --listen-client-urls http://127.0.0.1:2379,http://10.211.55.43:2379 \
      --listen-peer-urls http://10.211.55.43:2380 \
      --advertise-client-urls http://10.211.55.43:2379 \
      --initial-advertise-peer-urls http://10.211.55.43:2380 \
      --initial-cluster n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380
node3
    ~]# ./etcd --name n3 \
      --initial-cluster-token cluster1 \
      --initial-cluster-state new \
      --listen-client-urls http://127.0.0.1:2379,http://10.211.55.4:2379 \
      --listen-peer-urls http://10.211.55.4:2380 \
      --advertise-client-urls http://10.211.55.4:2379 \
      --initial-advertise-peer-urls http://10.211.55.4:2380 \
      --initial-cluster n1=http://10.211.55.6:2380,n2=http://10.211.55.43:2380,n3=http://10.211.55.4:2380
Once all three nodes are up, check the cluster from any node:
    ~]# ./etcdctl cluster-health
    member 126d5057628cf9e1 is healthy: got healthy result from http://10.211.55.43:2379
    member 6da33978247d0f4c is healthy: got healthy result from http://10.211.55.6:2379
    member d143f97740999fa6 is healthy: got healthy result from http://10.211.55.4:2379
    cluster is healthy
    ~]# ./etcdctl member list
    126d5057628cf9e1: name=n2 peerURLs=http://10.211.55.43:2380 clientURLs=http://10.211.55.43:2379 isLeader=true
    6da33978247d0f4c: name=n1 peerURLs=http://10.211.55.6:2380 clientURLs=http://10.211.55.6:2379 isLeader=false
    d143f97740999fa6: name=n3 peerURLs=http://10.211.55.4:2380 clientURLs=http://10.211.55.4:2379 isLeader=false
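The --initial-cluster value is identical on all three nodes, so it can be generated once rather than typed three times. The following is a minimal sketch, assuming every member uses the http scheme and peer port 2380 as in the commands above; the helper name `initial_cluster` is hypothetical.

```shell
#!/bin/sh
# Build the --initial-cluster value from "name=ip" arguments.
# Assumption: all members use http and peer port 2380, as in this article.
initial_cluster() {
    result=""
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        ip=${pair#*=}      # text after the first "="
        # Append with a comma separator unless this is the first entry.
        result="${result:+$result,}$name=http://$ip:2380"
    done
    echo "$result"
}

initial_cluster n1=10.211.55.6 n2=10.211.55.43 n3=10.211.55.4
```

The printed string can be passed unchanged to --initial-cluster on each node.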
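Dynamic discovery
As the static setup shows, every member has to be listed by hand in --initial-cluster, which becomes inconvenient as the number of nodes grows. CoreOS therefore also provides an etcd discovery service.
First, request a unique discovery URL for the cluster; size is the expected number of members: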
    ~]# curl 'https://discovery.etcd.io/new?size=3'
    https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685
node1
    ~]# ./etcd --name n1 \
      --initial-cluster-token cluster1 \
      --initial-cluster-state new \
      --listen-client-urls http://127.0.0.1:2379,http://10.211.55.6:2379 \
      --listen-peer-urls http://10.211.55.6:2380 \
      --advertise-client-urls http://10.211.55.6:2379 \
      --initial-advertise-peer-urls http://10.211.55.6:2380 \
      --discovery https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685
node2
    ~]# ./etcd --name n2 \
      --initial-cluster-token cluster1 \
      --initial-cluster-state new \
      --listen-client-urls http://127.0.0.1:2379,http://10.211.55.43:2379 \
      --listen-peer-urls http://10.211.55.43:2380 \
      --advertise-client-urls http://10.211.55.43:2379 \
      --initial-advertise-peer-urls http://10.211.55.43:2380 \
      --discovery https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685
node3
    ~]# ./etcd --name n3 \
      --initial-cluster-token cluster1 \
      --initial-cluster-state new \
      --listen-client-urls http://127.0.0.1:2379,http://10.211.55.4:2379 \
      --listen-peer-urls http://10.211.55.4:2380 \
      --advertise-client-urls http://10.211.55.4:2379 \
      --initial-advertise-peer-urls http://10.211.55.4:2380 \
      --discovery https://discovery.etcd.io/21b5178b7787340a178d56cb11c8f685
Verify the result from any node:
    ~]# ./etcdctl member list
    126d5057628cf9e1: name=n2 peerURLs=http://10.211.55.43:2380 clientURLs=http://10.211.55.43:2379 isLeader=true
    6da33978247d0f4c: name=n1 peerURLs=http://10.211.55.6:2380 clientURLs=http://10.211.55.6:2379 isLeader=false
    d143f97740999fa6: name=n3 peerURLs=http://10.211.55.4:2380 clientURLs=http://10.211.55.4:2379 isLeader=false
    ~]# ./etcdctl cluster-health
    member 126d5057628cf9e1 is healthy: got healthy result from http://10.211.55.43:2379
    member 6da33978247d0f4c is healthy: got healthy result from http://10.211.55.6:2379
    member d143f97740999fa6 is healthy: got healthy result from http://10.211.55.4:2379
    cluster is healthy
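For scripted checks, the cluster-health output can be reduced to a count of healthy members and compared against the majority the cluster needs. This is a rough sketch that assumes the output format shown above; `healthy_members` and `has_quorum` are hypothetical helper names, not etcdctl features.

```shell
#!/bin/sh
# Count members reported healthy in `etcdctl cluster-health` output read from stdin.
# Assumption: lines look like "member <id> is healthy: ..." as shown in this article.
healthy_members() {
    grep -c '^member .* is healthy:'
}

# Succeed when the healthy members form a majority of the expected cluster size.
# Usage: ./etcdctl cluster-health | has_quorum 3
has_quorum() {
    expected=$1
    healthy=$(healthy_members)
    [ "$healthy" -ge $(( expected / 2 + 1 )) ]
}

# Sample input copied from the output above.
sample='member 126d5057628cf9e1 is healthy: got healthy result from http://10.211.55.43:2379
member 6da33978247d0f4c is healthy: got healthy result from http://10.211.55.6:2379
member d143f97740999fa6 is healthy: got healthy result from http://10.211.55.4:2379
cluster is healthy'

printf '%s\n' "$sample" | healthy_members
```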
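Cluster parameter tuning
A cluster gives us the ability to scale out, but it also introduces extra factors that affect the service as a whole, such as network jitter between members, clock synchronization, and the storage and network load of data replication. Cluster management therefore needs more care.
Time synchronization
Clock synchronization matters in every distributed cluster. For an etcd cluster, a clock difference of more than 1s between members can cause the Raft protocol to misbehave, so the clocks must be kept in sync.
Heartbeat interval and election timeout
The election timeout must always be longer than the heartbeat interval; at least 5 times as long is generally recommended. Both values are given in milliseconds through the --heartbeat-interval and --election-timeout flags.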
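The 5x rule is easy to check before starting a node. The sketch below only illustrates the ratio described above; `check_raft_timing` is a hypothetical helper, and the values 100 and 1000 are example inputs in milliseconds.

```shell
#!/bin/sh
# Succeed when the election timeout is at least 5 times the heartbeat interval.
# Both arguments are in milliseconds.
check_raft_timing() {
    heartbeat_ms=$1
    election_ms=$2
    [ "$election_ms" -ge $(( heartbeat_ms * 5 )) ]
}

if check_raft_timing 100 1000; then
    echo "ok: --heartbeat-interval=100 --election-timeout=1000"
fi

if ! check_raft_timing 300 1000; then
    echo "rejected: --heartbeat-interval=300 --election-timeout=1000"
fi
```

All members of a cluster should be started with the same pair of values.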
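Snapshot frequency
etcd periodically persists its data as a snapshot. With the default used here, a snapshot is taken once every 10000 changes (the threshold is controlled by --snapshot-count, and the default may differ between releases). Taking a snapshot writes a large amount of data and affects cluster performance while it runs.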
Updating a member
    $ etcdctl member list
    6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:23802 clientURLs=http://127.0.0.1:23792
    924e2e83e93f2560: name=node3 peerURLs=http://localhost:23803 clientURLs=http://127.0.0.1:23793
    a8266ecf031671f3: name=node1 peerURLs=http://localhost:23801 clientURLs=http://127.0.0.1:23791

    # In this example, the peerURLs of the member with ID a8266ecf031671f3 are changed to http://10.0.1.10:2380

    $ etcdctl member update a8266ecf031671f3 http://10.0.1.10:2380
    Updated member with ID a8266ecf031671f3 in cluster
Removing a member
    $ etcdctl member remove a8266ecf031671f3
    Removed member a8266ecf031671f3 from cluster
Adding a member
    $ etcdctl member add infra3 http://10.0.1.13:2380
    added member 9bf1b35fc7761a23 to cluster

    ETCD_NAME="infra3"
    ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
    ETCD_INITIAL_CLUSTER_STATE=existing
Then run the following on the new etcd node:
    $ export ETCD_NAME="infra3"
    $ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
    $ export ETCD_INITIAL_CLUSTER_STATE=existing
    $ etcd --listen-client-urls http://10.0.1.13:2379 \
      --advertise-client-urls http://10.0.1.13:2379 \
      --listen-peer-urls http://10.0.1.13:2380 \
      --initial-advertise-peer-urls http://10.0.1.13:2380 \
      --data-dir %data_dir%
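The ETCD_INITIAL_CLUSTER value for a new member lists the existing members plus the new one. If the printed variables are no longer at hand, the existing part can be derived from `etcdctl member list`. The following is a sketch that assumes the v2 output format shown earlier in this article and one peer URL per member; `members_to_initial_cluster` is a hypothetical helper.

```shell
#!/bin/sh
# Convert `etcdctl member list` output (read from stdin) into a
# comma-separated "name=peerURL" list.
# Assumptions: v2 output format, one peer URL per member. Lines without
# a name= field (for example members that have not started yet) are skipped.
members_to_initial_cluster() {
    awk '{
        name = ""; peer = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^name=/)     name = substr($i, 6)    # strip "name="
            if ($i ~ /^peerURLs=/) peer = substr($i, 10)   # strip "peerURLs="
        }
        if (name != "" && peer != "")
            out = out (out == "" ? "" : ",") name "=" peer
    } END { print out }'
}

# Sample input copied from the member list output earlier in this article.
sample='126d5057628cf9e1: name=n2 peerURLs=http://10.211.55.43:2380 clientURLs=http://10.211.55.43:2379 isLeader=true
6da33978247d0f4c: name=n1 peerURLs=http://10.211.55.6:2380 clientURLs=http://10.211.55.6:2379 isLeader=false
d143f97740999fa6: name=n3 peerURLs=http://10.211.55.4:2380 clientURLs=http://10.211.55.4:2379 isLeader=false'

printf '%s\n' "$sample" | members_to_initial_cluster
```

The entry for the new member still has to be appended to the result by hand before exporting it as ETCD_INITIAL_CLUSTER.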
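Migrating after a failure
First, back up the data from a healthy node. The archive is created relative to the backup directory so that it can later be unpacked directly into a data directory:
    $ ./etcdctl backup --data-dir /var/lib/etcd --backup-dir /tmp/etcd_backup
    $ tar -zcf backup.etcd.tar.gz -C /tmp/etcd_backup .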
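Then restore the etcd data onto any one node of the new cluster and start etcd with the --force-new-cluster flag. This flag resets the cluster ID and all membership information; the member's advertised peer address is reset to the default on localhost, leaving a cluster that contains only this one node.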
    $ tar -zxvf backup.etcd.tar.gz -C /var/lib/etcd
    $ etcd --data-dir=/var/lib/etcd --force-new-cluster ...
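Where the files end up after extraction depends on how the archive was created, so it is worth checking the layout before pointing etcd at the directory. The sketch below uses throwaway directories and a placeholder file instead of real etcd data; `backup_roundtrip` and the directory names under the work directory are made up for the illustration.

```shell
#!/bin/sh
# Archive a stand-in backup directory and unpack it into a stand-in data
# directory, both created under the given work directory.
backup_roundtrip() {
    work=$1
    mkdir -p "$work/etcd_backup/member/snap" "$work/restore"
    echo "stand-in for etcd data" > "$work/etcd_backup/member/snap/placeholder"

    # Archive the contents of the backup directory, with paths relative to it.
    tar -zcf "$work/backup.etcd.tar.gz" -C "$work/etcd_backup" .

    # Unpack into the target directory; member/ lands directly beneath it.
    tar -zxf "$work/backup.etcd.tar.gz" -C "$work/restore"
}

workdir=$(mktemp -d)
backup_roundtrip "$workdir"
ls "$workdir/restore"
```

An archive created from an absolute path such as /tmp/etcd_backup would instead unpack into a tmp/etcd_backup subdirectory of the target, which etcd would not find through --data-dir.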
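Once the single-node etcd is running, first verify the integrity of the data. After confirming that it is correct, change the member's peer address through the etcd API so that it uses the node's external IP address, in preparation for adding the other members. For example:
Use etcdctl to find the ID of the current member.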
    $ etcdctl member list

    98f0c6bf64240842: name=cd-2 peerURLs=http://127.0.0.1:2580 clientURLs=http://127.0.0.1:2579
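If the etcdctl version in use cannot modify member parameters (versions that provide the member update command shown earlier can), the change has to be made through the members API: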
    # Replace the URL in peerURLs with the peer URL the node should advertise externally.
    $ curl http://127.0.0.1:2579/v2/members/98f0c6bf64240842 -XPUT \
      -H "Content-Type: application/json" -d '{"peerURLs":["http://127.0.0.1:2580"]}'
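Note that the etcd documentation recommends restoring the cluster into a temporary directory first: start etcd from the temporary directory, verify that the new data is correct and complete, stop etcd, and only then restore the data into the regular directory.
Finally, once the first member is up, the remaining members can be added with the etcdctl member add command, following the same steps as for expanding a cluster.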