Jusene's Blog

Highly available MySQL with corosync v1 + pacemaker

2017/05/18

corosync

corosync is an implementation based on OpenAIS (the Application Interface Specification). In a high-availability cluster, corosync acts as the message layer. Compared with the heavyweight heartbeat, corosync is lightweight, and it has largely displaced heartbeat as the HA stack of choice. corosync 1.x does have one serious flaw: it has no voting system of its own, so it needs cman to provide quorum voting; corosync 2.x closes this gap completely. Unlike heartbeat, which shipped with its own CRM, corosync's CRM is pacemaker, the cluster resource manager that was split out of heartbeat when heartbeat reached 3.0. The examples below walk through configuring and using corosync + pacemaker.

Install crmsh, corosync and pacemaker

Preparation

  1. Time must be synchronized on both machines
  2. Each machine's hostname must match the output of `uname -n`
  3. Configure name resolution in /etc/hosts
  4. Set up passwordless SSH mutual trust between the two hosts
  5. The resources' init scripts must be disabled from starting at boot
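The checklist above can be sketched as a small preflight script. The node names, addresses, and the `mysqld` service name are this article's examples, not fixed values:

```shell
#!/bin/sh
# Preflight sketch for the checklist above (run on each node).

# 2. The name the cluster uses for this node must match `uname -n`.
if [ "$(hostname)" = "$(uname -n)" ]; then
    echo "node name OK"
else
    echo "node name mismatch: fix HOSTNAME in /etc/sysconfig/network and re-login"
fi

# 3. Hosts resolution (example addresses from this article):
#    echo '10.211.55.48 node1' >> /etc/hosts
#    echo '10.211.55.49 node2' >> /etc/hosts
# 4. Passwordless mutual trust (run on node1, then the mirror image on node2):
#    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
#    ssh-copy-id root@node2
# 5. Cluster-managed resources must not start at boot:
#    chkconfig mysqld off
```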

Environment
centos 6
crmsh 3.0.0
corosync 1.4.7
pacemaker 1.1.5
Nodes
node1 10.211.55.48
node2 10.211.55.49
Resources
mysql 5.1
vip 10.211.55.24

  • Install corosync and pacemaker

As the mainstream HA cluster stack, corosync and pacemaker are already included in the yum repositories.

~]# yum install -y corosync pacemaker;ssh node2 'yum install -y corosync pacemaker'
  • Install crmsh

crmsh is a front end to pacemaker that lets you define and manage pacemaker resources. Starting with pacemaker 1.1.8, crmsh became an independent project and is no longer shipped with pacemaker; it is now maintained by SUSE. Red Hat maintains an equivalent project, pcs, but crmsh is generally the more convenient management tool, so most cluster administration is still done with crmsh.

For RedHat RHEL-6, run the following commands as root:

cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo
yum install crmsh

For CentOS CentOS-7, run the following commands as root:

cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
yum install crmsh

For CentOS CentOS-6, run the following commands as root:

cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo
yum install crmsh

The above are the crmsh download repositories published by SUSE for the different distributions.

Configure corosync and pacemaker

Here we run pacemaker in corosync plugin mode, which only corosync 1.x supports; 2.x no longer allows running pacemaker as a plugin.

~]# cd /etc/corosync
~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
~]# cat /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank  # compatible with the whitetank (0.8x) and earlier releases
totem {
    version: 2 # totem protocol version, do not change
    secauth: on # secure authentication; very CPU-intensive when aisexec is used
    threads: 0 # number of parallel threads used for authentication
    interface {
        ringnumber: 0 # ring number; with multiple NICs this prevents heartbeat loops
        bindnetaddr: 10.211.55.0 # heartbeat network; corosync works out which local IP belongs to this network and uses that interface for multicast heartbeats
        mcastaddr: 239.245.14.1 # multicast address for heartbeat messages (must be identical on all nodes)
        mcastport: 5405    # port used for multicast
        ttl: 1 # heartbeats travel only one hop, preventing multicast loops
    }
}
    # totem defines how the cluster nodes communicate; totem is itself a versioned protocol, used by corosync exclusively between nodes
logging {
    fileline: off # print file and line in log messages
    to_stderr: no # send log messages to standard error (recommended: no)
    to_logfile: yes # write to a log file
    to_syslog: no # write to syslog (those messages would land in /var/log/messages)
    logfile: /var/log/cluster/corosync.log # log file location
    debug: off # keep debug off unless troubleshooting; it is extremely verbose and generates heavy disk I/O
    timestamp: on # print timestamps, useful for tracking down errors, but costs system calls and CPU
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    ver: 0  # version
    name: pacemaker  # module name; start pacemaker together with corosync
}
    # after corosync starts it will automatically start pacemaker (as a plugin)
aisexec {
    user: root
    group: root
}
    # identity used to run the AIS services; the default is root, so this section can be omitted

~]# corosync-keygen   # secauth is enabled, so we must generate a key
~]# scp -p {authkey,corosync.conf} node2:/etc/corosync/
~]# service corosync start;ssh node2 "service corosync start"
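The configuration above uses multicast heartbeats. If multicast is filtered on your switches, corosync 1.4 can also use UDP unicast (udpu); a sketch of the totem section rewritten for this article's two nodes (an untested fragment, adjust the addresses to your network):

```
totem {
    version: 2
    secauth: on
    threads: 0
    transport: udpu            # unicast instead of multicast
    interface {
        ringnumber: 0
        bindnetaddr: 10.211.55.0
        mcastport: 5405        # still used as the UDP port
        member {
            memberaddr: 10.211.55.48
        }
        member {
            memberaddr: 10.211.55.49
        }
    }
}
```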

Note:
The corosync-keygen command uses /dev/random when generating the key.
/dev/random is the Linux random number generator; it draws randomness from an in-kernel entropy pool that is replenished by system interrupts. Encryption and key-generation programs consume large amounts of random data, so the pool can run dry: once the entropy pool is exhausted, /dev/random blocks the calling process until further interrupts produce more randomness.
A 1024-bit key is needed here, so the entropy pool may not hold enough random data and the key generation can block indefinitely. There are three workarounds:
1. Type a large amount of input on the keyboard to generate interrupts (interrupts come slowly, not recommended)
2. Download a large file over the network or from an FTP server (interrupts come quickly, recommended)
3. Use dd to generate continuous disk I/O (useful when there is nothing to download)
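Workaround 3 can be as simple as a loop that keeps the disk busy while corosync-keygen blocks in another terminal; a minimal sketch (the temporary file path is arbitrary):

```shell
#!/bin/sh
# Generate continuous disk I/O; the resulting interrupts feed the kernel
# entropy pool that /dev/random draws from. Run this while corosync-keygen
# is waiting in another terminal, and stop it once the key has been written.
i=0
while [ "$i" -lt 5 ]; do          # a handful of rounds as a demo; keep looping for real use
    dd if=/dev/zero of=/tmp/entropy_feed bs=1M count=8 2>/dev/null
    i=$((i + 1))
done
rm -f /tmp/entropy_feed
echo "I/O loop finished"
```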

Check the corosync startup log:

~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
May 16 11:17:52 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
May 16 11:17:52 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

~]# grep  TOTEM  /var/log/cluster/corosync.log
May 16 11:17:52 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
May 16 11:17:52 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 16 11:17:52 corosync [TOTEM ] The network interface [10.211.55.48] is now up.
May 16 11:17:53 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
May 16 11:17:52 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
May 16 11:17:52 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
May 16 11:17:54 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=15431, rc=100)

These "errors" are just warnings about running pacemaker as a plugin and can be ignored.

~]# grep pcmk_startup /var/log/cluster/corosync.log
May 16 11:17:53 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
May 16 11:17:53 corosync [pcmk  ] Logging: Initialized pcmk_startup
May 16 11:17:53 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
May 16 11:17:53 corosync [pcmk  ] info: pcmk_startup: Service: 9
May 16 11:17:53 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1

This confirms that pacemaker has started.

Configure crmsh

Check the cluster status with crm_mon:

~]# crm_mon
Stack: classic openais (with plugin)
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Tue May 16 12:38:28 2017          Last change: Tue May 16 11:41:36 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 0 resources configured

Online: [ node1 node2 ]

No active resources
~]# crm_verify -L -V
   error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

corosync enables STONITH by default, but this cluster has no STONITH device, so the configuration is invalid; we need to disable STONITH.
crmsh has two working modes:
1. Command-line mode
~]# crm status
Stack: classic openais (with plugin)
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu May 18 03:04:39 2017          Last change: Tue May 16 11:41:36 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 0 resources configured

Online: [ node1 node2 ]

No resources

2. Interactive mode
~]# crm
crm(live)# status
Stack: classic openais (with plugin)
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu May 18 03:05:19 2017          Last change: Tue May 16 11:41:36 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 0 resources configured

Online: [ node1 node2 ]

No resources

A quick tour of the subcommands:

~]# crm
crm(live)# configure
crm(live)configure# help
		node             define a cluster node
		primitive        define a resource
        monitor          add monitor operation to a primitive # add monitoring options to a resource (e.g. timeout, action on start failure)
        group            define a group # bundle several resources so they run together
        clone            define a clone # set the total number of clones and how many may run per node
        ms               define a master-slave resource # only one node in the cluster runs the master instance; the others stand by as slaves
        rsc_template     define a resource template
        location         a location preference # location constraint: which node a resource prefers by default (with equal scores, the node with the higher preference wins)
        colocation       colocate resources # colocation constraint: how strongly resources should run together
        order            order resources # the order in which resources start
        rsc_ticket       resources ticket dependency
        property         set a cluster property
        rsc_defaults     set resource defaults # e.g. default resource stickiness
        fencing_topology node fencing order
        role             define role access rights
        user             define user access rights
        op_defaults      set resource operations defaults
        schema           set or display current CIB RNG schema
        show             display CIB objects
        edit             edit CIB objects # opens the CIB in a vim-style editor
        filter           filter CIB objects
        delete           delete CIB objects
        default-timeouts set timeouts for operations to minimums from the meta-data
        rename           rename a CIB object
        modgroup         modify group
        refresh          refresh from CIB # re-read the CIB
        erase            erase the CIB
        ptest            show cluster actions if changes were committed
        rsctest          test resources as currently configured
        cib              CIB shadow management
        cibstatus        CIB status management and editing
        template         edit and import a configuration from a template
        commit           commit the changes to the CIB # write pending changes into the CIB
        verify           verify the CIB with crm_verify # syntax-check the CIB
        upgrade          upgrade the CIB to version 1.0
        save             save the CIB to a file # the file is written relative to the directory you were in before entering crm
        load             import the CIB from a file
        graph            generate a directed graph
        xml              raw xml
        help             show help (help topics for list of topics)
        end              go back one level # back to crm(live)#
crm(live)configure# cd ..
crm(live)# resource
crm(live)resource# help
		status           show status of resources
        start            start a resource
        stop             stop a resource
        restart          restart a resource
        promote          promote a master-slave resource
        demote           demote a master-slave resource
        manage           put a resource into managed mode
        unmanage         put a resource into unmanaged mode
        migrate          migrate a resource to another node
        unmigrate        unmigrate a resource to another node
        param            manage a parameter of a resource
        secret           manage sensitive parameters
        meta             manage a meta attribute
        utilization      manage a utilization attribute
        failcount        manage failcounts
        cleanup          cleanup resource status
        refresh          refresh CIB from the LRM status # update the CIB (cluster information base) from the LRM (local resource manager)
        reprobe          probe for resources not started by the CRM
        trace            start RA tracing # enable resource agent (RA) tracing
        untrace          stop RA tracing # disable resource agent (RA) tracing
        help             show help (help topics for list of topics)
        end              go back one level # back to crm(live)#
        quit             exit the program
crm(live)resource# cd ..
crm(live)# node
crm(live)node# help
	status           show nodes status as XML
    show             show node # node status in command-line format
    standby          put node into standby # simulate the given node going offline
    online           set node online # bring the node back online
    maintenance      put node into maintenance mode
    ready            put node into ready mode
    fence            fence node
    clearstate       Clear node state # clear the node's state information
    delete           delete node
    attribute        manage attributes
    utilization      manage utilization attributes
    status-attr      manage status attributes
    help             show help (help topics for list of topics)
    end              go back one level
    quit             exit the program
crm(live)node# cd ..
crm(live)# ra
crm(live)ra# help
		classes          list classes and providers # the resource agent classes
        list             list RA for a class (and provider) # show the agents a class provides
        meta             show meta data for a RA # show an agent's available parameters (e.g. meta ocf:heartbeat:IPaddr2)
        providers        show providers for a RA and a class
        help             show help (help topics for list of topics)
        end              go back one level
        quit             exit the program
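For example, before defining the VIP resource in the next section, the ra submenu can be used to inspect the agent we are about to use (a sketch; the comments describe what each command prints on this stack, output omitted):

```
crm(live)# ra
crm(live)ra# classes                     # lists classes such as lsb, ocf (heartbeat, pacemaker), stonith
crm(live)ra# list ocf heartbeat          # agents shipped by the heartbeat provider, including IPaddr
crm(live)ra# meta ocf:heartbeat:IPaddr   # parameters the IPaddr agent accepts (ip, nic, iflabel, ...)
```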

Configure highly available MySQL

~]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node node1
node node2
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-5.el6-e174ec8 \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false
crm(live)configure# primitive mysqlip ocf:heartbeat:IPaddr params ip=10.211.55.24 iflabel=eth0 nic=eth0 op monitor interval=10s timeout=20s
crm(live)configure# primitive mysqlservice lsb:mysqld op monitor interval=10s timeout=20s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu May 18 03:33:15 2017          Last change: Thu May 18 03:32:37 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 mysqlip        (ocf::heartbeat:IPaddr):        Started node1
 mysqlservice   (lsb:mysqld):   Started node2

crm(live)# configure
crm(live)configure# group mysql mysqlip mysqlservice
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu May 18 03:39:37 2017          Last change: Thu May 18 03:39:34 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Resource Group: mysql
     mysqlip    (ocf::heartbeat:IPaddr):        Started node2
     mysqlservice       (lsb:mysqld):   Started node2

A problem arises here: this is a two-node cluster, and when one node stops, the resources disappear instead of failing over to the other node. With only two nodes, as soon as one fails the survivor can no longer win a vote, and the partition ends up "without quorum". There are two ways to handle this:

  1. Add an arbitration (quorum) node
  2. Ignore the loss of quorum
    Note: ignoring quorum can lead to a split-brain cluster and is not recommended in production.
~]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit

Note: no-quorum-policy={stop|freeze|suicide|ignore}; the default is stop

Configure resource constraints

First delete the resource group:

~]# crm
crm(live)# resource
crm(live)resource# stop mysql
crm(live)resource# cd ..
crm(live)# status
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu May 18 04:02:27 2017          Last change: Thu May 18 04:01:49 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 2 resources configured: 4 resources DISABLED and 0 BLOCKED from being started due to failures

Online: [ node1 node2 ]

Full list of resources:

 Resource Group: mysql
     mysqlip    (ocf::heartbeat:IPaddr):        Stopped (disabled)
     mysqlservice       (lsb:mysqld):   Stopped (disabled)

crm(live)# configure
crm(live)configure# delete mysql
crm(live)configure# cd ..
There are changes pending. Do you want to commit them (y/n)? y
crm(live)# status
Stack: classic openais (with plugin)
Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu May 18 04:05:46 2017          Last change: Thu May 18 04:05:38 2017 by root via cibadmin on node1
, 2 expected votes
2 nodes and 2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 mysqlip        (ocf::heartbeat:IPaddr):        Started node1
 mysqlservice   (lsb:mysqld):   Started node2
  • Colocation constraint

    crm(live)# configure
    crm(live)configure# colocation mysqlip_with_mysqlservice inf: mysqlip mysqlservice
    crm(live)configure# commit
    crm(live)configure# cd ..
    crm(live)# status
    Stack: classic openais (with plugin)
    Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
    Last updated: Thu May 18 04:18:48 2017          Last change: Thu May 18 04:14:22 2017 by root via cibadmin on node1
    , 2 expected votes
    2 nodes and 2 resources configured

    Online: [ node1 node2 ]

    Full list of resources:

     mysqlip        (ocf::heartbeat:IPaddr):        Started node2
     mysqlservice   (lsb:mysqld):   Started node2
  • Order constraint

    crm(live)# configure
    crm(live)configure# order mysqlip_after_myserver
    Mandatory   Optional    Serialize
    crm(live)configure# order mysqlip_after_myserver Mandatory: mysqlip mysqlservice
    crm(live)configure# commit

Mandatory means the ordering is enforced: mysqlip and mysqlservice must start in exactly the sequence given. (Optional applies the ordering only when both actions happen in the same transition, and Serialize merely prevents the two actions from running concurrently.)

  • Define a location constraint
    crm(live)configure# location mysqip_prefer_node1 mysqlip rule 100: #uname eq node1
    crm(live)configure# commit
    crm(live)configure# cd ..
    crm(live)# status
    Stack: classic openais (with plugin)
    Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
    Last updated: Thu May 18 04:32:40 2017          Last change: Thu May 18 04:32:36 2017 by root via cibadmin on node2
    , 2 expected votes
    2 nodes and 2 resources configured

    Online: [ node1 node2 ]

    Full list of resources:

     mysqlip        (ocf::heartbeat:IPaddr):        Started node2
     mysqlservice   (lsb:mysqld):   Started node2

    crm(live)# status
    Stack: classic openais (with plugin)
    Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
    Last updated: Thu May 18 04:32:43 2017          Last change: Thu May 18 04:32:36 2017 by root via cibadmin on node1
    , 2 expected votes
    2 nodes and 2 resources configured

    Online: [ node1 node2 ]

    Full list of resources:

     mysqlip        (ocf::heartbeat:IPaddr):        Started node1
     mysqlservice   (lsb:mysqld):   Started node1
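With the constraints committed, failover is easy to verify: put node1 in standby and the resources should move to node2; bring it back online and, because of the score-100 location rule (and no resource stickiness configured), they migrate back to node1. A sketch of the session, to be run against a live cluster:

```
crm(live)# node standby node1    # simulate node1 going offline
crm(live)# status                # mysqlip and mysqlservice should now be Started on node2
crm(live)# node online node1     # bring node1 back
crm(live)# status                # resources return to node1 (location score 100 beats stickiness 0)
```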