MHA deployment and workflow notes

2013-09-09

MHA (MySQL Master High Availability Manager): workflow and deployment notes

I. System environment

** Manager node

   Hostname | mgr.com
    Platform | Linux
     Release | CentOS release 6.4 (Final)
      Kernel | 2.6.32-358.el6.i686
Architecture | CPU = 32-bit, OS = 32-bit

** Node 1

    Hostname | node1.com
      System | innotek GmbH; VirtualBox; v1.2 (Other)
 Service Tag | 0
    Platform | Linux
     Release | CentOS release 5.5 (Final)
      Kernel | 2.6.18-194.el5
Architecture | CPU = 32-bit, OS = 32-bit

** Node 2

    Hostname | node2.com
      System | innotek GmbH; VirtualBox; v1.2 (Other)
 Service Tag | 0
    Platform | Linux
     Release | CentOS release 5.5 (Final)
      Kernel | 2.6.18-194.el5
Architecture | CPU = 32-bit, OS = 32-bit

** Node 3

    Hostname | node3.com
    Platform | Linux
     Release | CentOS release 6.4 (Final)
      Kernel | 2.6.32-358.el6.i686
Architecture | CPU = 32-bit, OS = 32-bit

II. Test environment:

192.168.56.108 (current master)
 +--192.168.56.109
 +--192.168.56.110
 
vip - 192.168.56.200

From:

192.168.56.108 (current master)
 +--192.168.56.109
 +--192.168.56.110

To:

192.168.56.109 (new master)
 +--192.168.56.110

Replication details:

node1.com  192.168.56.108
Version         5.5.30-rel30.2-log
Server ID       199914
Uptime          3+15:39:49 (started 2013-05-30T19:12:20)
Replication     Is not a slave, has 2 slaves connected, is not read_only
Filters         binlog_ignore_db=mysql,test,information_schema,performance_schema
Binary logging  STATEMENT
Slave status    
Slave mode      STRICT
Auto-increment  increment 1, offset 1
InnoDB version  5.5.30-rel30.2
+- 192.168.56.109
   Version         5.5.30-rel30.2-log
   Server ID       134378
   Uptime          3+15:20:38 (started 2013-05-30T19:31:31)
   Replication     Is a slave, has 0 slaves connected, is read_only
   Filters         binlog_ignore_db=mysql,test,information_schema,performance_schema; replicate_ignore_db=mysql,test,information_schema,performance_schema
   Binary logging  STATEMENT
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  5.5.30-rel30.2
+- 192.168.56.110
   Version         5.5.30-rel30.2-log
   Server ID       68842
   Uptime          3+15:43:24 (started 2013-05-30T19:08:46)
   Replication     Is a slave, has 0 slaves connected, is not read_only
   Filters         binlog_ignore_db=mysql,test,information_schema,performance_schema; replicate_ignore_db=mysql,test,information_schema,performance_schema
   Binary logging  STATEMENT
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  5.5.30-rel30.2

III. Configuration (see Parameters for the full description of each parameter)

Global configuration:

[root@mgr tmha]# cat global.conf 
[server default]
user=root
repl_user=replica
port=3306
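# Passwords are not stored in plain text: the init_conf_loads script supplies them, kept base64-encoded (this only obscures them; it is not encryption).
init_conf_load_script=/usr/local/bin/init_conf_loads
ssh_user=root
# MySQL data directory holding the binlogs. Keep it identical on master and slaves; otherwise this parameter must be updated after every master switch.
master_binlog_dir=/web/mysql/node3306/data
# Working directory on each MySQL node where MHA Node writes its working files.
remote_workdir=/web/log/masterha
ping_interval=3
# The stop command is left as a no-op (see the workflow section); the start command moves the VIP.
master_ip_failover_script=/usr/local/bin/master_ip_failover
#shutdown_script=/usr/local/bin/power_manager
# Sends the failover report with the mail command.
report_script=/usr/local/bin/send_report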
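MHA runs init_conf_load_script and reads the `name=value` lines it prints on standard output as extra configuration values, which is what keeps the password out of global.conf. The author's init_conf_loads script is not shown, so the following is only a minimal Python sketch of the idea, assuming the script should emit `password` and `repl_password`; the encoded strings are hypothetical placeholders.

```python
#!/usr/bin/env python
"""Hypothetical init_conf_load_script: prints name=value pairs for MHA.

The encoded values are placeholders (they decode to "xxxxxx" and "yyyyyy").
base64 is an encoding, not encryption: anyone who can read this file can decode it.
"""
import base64

# Hypothetical base64-encoded secrets; replace with your own values.
ENCODED = {
    "password": "eHh4eHh4",
    "repl_password": "eXl5eXl5",
}


def decode_params(encoded):
    """Return sorted 'name=value' lines with each value base64-decoded."""
    lines = []
    for name, value in sorted(encoded.items()):
        plain = base64.b64decode(value).decode("utf-8")
        lines.append("%s=%s" % (name, plain))
    return lines


if __name__ == "__main__":
    # MHA is assumed to read these lines from stdout.
    for line in decode_params(ENCODED):
        print(line)
```

Restrict the script's file permissions (for example to root only), since the decoding step offers no real protection.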

Application configuration (one master/slave group):

[root@mgr tmha]# cat tcase.cnf 
[server default]
manager_workdir=/web/log/mha/tcase1
manager_log=/web/log/mha/tcase1/case1.log

[server1]
hostname=192.168.56.108
# master candidate
candidate_master=1

[server2]
hostname=192.168.56.109
# master candidate
candidate_master=1

[server3]
hostname=192.168.56.110
# never promoted to master
no_master=1
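The effect of `candidate_master` and `no_master` can be illustrated by reading tcase.cnf and sorting the hosts into promotion groups. This is a minimal Python sketch using the standard configparser module; it only looks at `hostname`, `candidate_master` and `no_master`, skips `[server default]`, and is not how MHA itself parses the file.

```python
import configparser

# Copy of tcase.cnf (comments omitted).
TCASE_CNF = """
[server default]
manager_workdir=/web/log/mha/tcase1
manager_log=/web/log/mha/tcase1/case1.log

[server1]
hostname=192.168.56.108
candidate_master=1

[server2]
hostname=192.168.56.109
candidate_master=1

[server3]
hostname=192.168.56.110
no_master=1
"""


def promotion_policy(cnf_text):
    """Split hosts into (preferred candidates, other eligible hosts, excluded hosts)."""
    parser = configparser.ConfigParser()
    parser.read_string(cnf_text)
    preferred, eligible, excluded = [], [], []
    for section in parser.sections():
        if section == "server default" or not section.startswith("server"):
            continue
        host = parser.get(section, "hostname")
        if parser.getboolean(section, "no_master", fallback=False):
            excluded.append(host)
        elif parser.getboolean(section, "candidate_master", fallback=False):
            preferred.append(host)
        else:
            eligible.append(host)
    return preferred, eligible, excluded
```

With this file, 192.168.56.108 and 192.168.56.109 are preferred and 192.168.56.110 is never promoted; once .108 (the current master) has died, .109 is the only candidate left.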

Starting MHA:

nohup masterha_manager --global_conf=/web/tmha/global.conf --conf=/web/tmha/tcase.cnf --ignore_last_failover > /web/tmha/manager.log 2>&1 & 

IV. Workflow
Sequences_of_MHA

 | check replication settings and current master |    note 1
                 |
                 |
            | validate | -- N --> exit.
                 |
                 Y
                 |
        | monitor master | -- alive --> wait until the master dies.  note 2
                 |                           /
               dead                         /
                 |                         /
 | master fails 3 times in a row | -- N --+    note 3
                 |
                 Y
                 |
   | check slave configuration | -- N --> stop with error.  note 4
                 |
                 Y
                 |
         | last failover | [optional] -- Y --> stop (skip this check with --ignore_last_failover)
                 |
                 N
                 |
      | master_ip_failover | [optional] -- Y --> stop the old master (adapt the script to your environment).  note 5
                 |
                 N
                 |
    | recovering new master | -- 1. find the latest slaves -- 2. save the dead master's binlog [optional] -- 3. determine the new master -- 4. recover and promote the new master.  note 6
                 |
    | activating new master | -- switch the virtual IP address [optional]  note 7
                 |
    | recovering rest slaves |
                 |
         | notification | [optional] -- send mail
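The "last failover" step means MHA refuses to start a failover if the previous one failed or finished too recently, unless --ignore_last_failover is given. The Python sketch below models only that decision; the 480-minute window is assumed from the documented default of --last_failover_minute, and the function and argument names are hypothetical.

```python
from datetime import datetime, timedelta

LAST_FAILOVER_MINUTE = 480  # assumed default window: 8 hours


def may_failover(now, last_completed=None, last_failed=False,
                 ignore_last_failover=False, window_minutes=LAST_FAILOVER_MINUTE):
    """Return (allowed, reason) for starting a new failover at time `now`."""
    if ignore_last_failover:
        return True, "last failover check ignored"
    if last_failed:
        # A failed previous failover blocks new ones until it is cleaned up.
        return False, "previous failover failed; clean up the error file first"
    if last_completed is not None and now - last_completed < timedelta(minutes=window_minutes):
        # Avoids "ping-pong" failovers between two hosts.
        return False, "previous failover finished less than %d minutes ago" % window_minutes
    return True, "ok"
```

This is why the start command above passes --ignore_last_failover: in a test setup failovers are repeated often.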

Notes:

1.

MHA::ServerManager::validate_slaves() -- slave checks; MHA::ServerManager::get_current_alive_master() -- determines the current master;
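Note 1's check can be pictured as: every slave must replicate from the same host, and that host must be alive and must not itself be a slave. The Python sketch below works on a hypothetical mapping of each server to its `Master_Host` value from SHOW SLAVE STATUS (None for a non-slave); MHA's real get_current_alive_master() does considerably more.

```python
def find_current_master(master_host_of, alive):
    """Return the single master all slaves point to, or raise ValueError.

    master_host_of: {host: Master_Host from SHOW SLAVE STATUS, or None}
    alive: set of hosts that accepted a connection
    """
    masters = {m for m in master_host_of.values() if m is not None}
    if not masters:
        raise ValueError("no slaves found")
    if len(masters) > 1:
        raise ValueError("slaves replicate from different masters: %s" % sorted(masters))
    master = masters.pop()
    if master not in alive:
        raise ValueError("master %s is not reachable" % master)
    if master_host_of.get(master) is not None:
        raise ValueError("master %s is itself a slave" % master)
    return master


# Topology of the test environment (hypothetical status values).
TOPOLOGY = {
    "192.168.56.108": None,
    "192.168.56.109": "192.168.56.108",
    "192.168.56.110": "192.168.56.108",
}
```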

2.

apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.56.110 --slave_ip=192.168.56.110 --slave_port=3306
MHA::SSHCheck::do_ssh_connection_check() -- SSH check; MHA::MasterMonitor::check_slave_env() -- slave environment check; MHA::ManagerUtil::exec_ssh_cmd() -- slave validity check (runs the command above over SSH);
MHA::HealthCheck::wait_until_unreachable() -- master health check;
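The SSH check in note 2 amounts to running a trivial command non-interactively on each node. A minimal Python sketch using subprocess and standard OpenSSH options (BatchMode, ConnectTimeout); the helper names and the 5-second timeout are assumptions, not MHA's implementation.

```python
import subprocess


def ssh_check_command(host, user="root", timeout=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=%d" % timeout,
        "%s@%s" % (user, host),
        "exit 0",
    ]


def ssh_reachable(host, user="root", timeout=5):
    """Return True if the remote no-op command exits with status 0."""
    try:
        return subprocess.call(ssh_check_command(host, user, timeout)) == 0
    except OSError:
        # The ssh binary itself could not be executed.
        return False
```

In practice, `masterha_check_ssh` run against the same configuration files performs this check for all nodes before the manager is started.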

3.

MHA::HealthCheck::wait_until_unreachable() and MHA::SSHCheck::do_ssh_connection_check() -- SSH check;
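Notes 2 and 3 together describe the health-check loop: the master is pinged every ping_interval seconds (3 in global.conf) and only treated as dead after three consecutive failures, after which SSH reachability is checked. The Python sketch below takes the ping, sleep and SSH checks as functions so it can be tested; these callables are hypothetical stand-ins for MHA's own checks.

```python
import time


def wait_until_unreachable(ping, ping_interval=3, max_failures=3, sleep=time.sleep):
    """Block until ping() fails max_failures times in a row; return the number of pings."""
    failures = 0
    pings = 0
    while failures < max_failures:
        pings += 1
        if ping():
            failures = 0  # any successful ping resets the counter
        else:
            failures += 1
        if failures < max_failures:
            sleep(ping_interval)
    return pings


def master_is_dead(ping, ssh_ok, **kwargs):
    """Wait for the master to die, then record whether it is still reachable over SSH."""
    wait_until_unreachable(ping, **kwargs)
    return {"dead": True, "ssh_reachable": ssh_ok()}
```

Requiring consecutive failures keeps a single dropped ping from triggering a failover.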

4.

MHA::MasterMonitor::wait_until_master_is_dead() -- actions taken after the master is declared dead;

5.

MHA::MasterFailover::do_master_failover() -- master failover;
force_shutdown -- if the old master is reachable over SSH, the failover script is called with --command=stopssh; if SSH is unreachable, it is called with --command=stop:
       /usr/local/bin/master_ip_failover --orig_master_host=192.168.56.108 --orig_master_ip=192.168.56.108 --orig_master_port=3306 --command=stop
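To make the stop/stopssh choice in note 5 concrete, the Python helper below rebuilds the argument list of the logged call. The helper name is hypothetical, and MHA may pass further options (for example --ssh_user) in other situations.

```python
def shutdown_call(script, host, ip, port, ssh_reachable):
    """Build the master_ip_failover_script argv used when shutting down the old master."""
    # stopssh when the dead master can still be reached over SSH, stop otherwise.
    command = "stopssh" if ssh_reachable else "stop"
    return [
        script,
        "--orig_master_host=%s" % host,
        "--orig_master_ip=%s" % ip,
        "--orig_master_port=%d" % port,
        "--command=%s" % command,
    ]
```

Because the stop branch of this setup's script does nothing, the old master keeps the VIP until it is removed by hand or the host goes down; the start branch then has to bring the VIP up on the new master.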

6.

MHA::MasterFailover::check_set_latest_slaves() -- finds the slave(s) with the latest relay logs;
MHA::MasterFailover::save_master_binlog() -- retrieves the old master's binlog; requires SSH access, and only returns a warning if the old master is unreachable;
MHA::MasterFailover::select_new_master() -- selects the new master;
MHA::MasterFailover::recover_master_internal() -- Master Log Apply;
MHA::MasterFailover::recover_master() -- activates the new master; $new_master->disable_read_only() turns off read_only on it;
MHA::ServerManager::get_new_master_binlog_position() -- gets the new master's binlog position;
MHA::MasterFailover::recover_slave() -- recovers the remaining slaves so they are consistent with the new master;
MHA::Server::reset_slave_on_new_master(), reset_slave_info and reset_slave_all -- clear the old replication settings on the new master;
    -> MHA::DBHelper::reset_slave_by_change_master -- clears the existing CHANGE MASTER information; change_master() issues the new CHANGE MASTER statement;
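Note 6 in short: find the most up-to-date slaves, choose the new master (candidate_master hosts first, no_master hosts never), then point the remaining slaves at the new master's binlog position. The Python sketch below is a simplified model of that ordering, not MHA's exact priority rules. Positions are compared as (Master_Log_File, Read_Master_Log_Pos) pairs, assuming binlog names share a prefix and are zero-padded; all binlog coordinates here are hypothetical.

```python
# Hypothetical slave states after the master 192.168.56.108 died.
SLAVES = [
    {"host": "192.168.56.109", "log_file": "mysql-bin.000012", "log_pos": 1200,
     "candidate": True, "no_master": False},
    {"host": "192.168.56.110", "log_file": "mysql-bin.000012", "log_pos": 1350,
     "candidate": False, "no_master": True},
]


def select_new_master(slaves):
    """Prefer candidate_master hosts, then the most up-to-date slave; skip no_master."""
    eligible = [s for s in slaves if not s["no_master"]]
    if not eligible:
        raise ValueError("no slave can be promoted")
    # Latest position over all slaves, including those that may not be promoted.
    latest = max((s["log_file"], s["log_pos"]) for s in slaves)

    def rank(s):
        is_latest = (s["log_file"], s["log_pos"]) == latest
        return (s["candidate"], is_latest, s["log_file"], s["log_pos"])

    return max(eligible, key=rank)


def change_master_sql(host, port, user, log_file, log_pos):
    """CHANGE MASTER statement for a remaining slave (MASTER_PASSWORD omitted here)."""
    return ("CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, MASTER_USER='%s', "
            "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d" % (host, port, user, log_file, log_pos))


if __name__ == "__main__":
    new_master = select_new_master(SLAVES)
    # Hypothetical new-master binlog position; repl_user=replica from global.conf.
    print(change_master_sql(new_master["host"], 3306, "replica", "mysql-bin.000003", 107))
```

In this example .110 has the newest relay log but is excluded by no_master, so .109 is promoted; in MHA the promoted host is first brought up to date before the other slaves are repointed.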

7.

/usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.56.108 \
    --orig_master_ip=192.168.56.108 --orig_master_port=3306 \
    --new_master_host=192.168.56.109 --new_master_ip=192.168.56.109 \
    --new_master_port=3306 --new_master_user='root' --new_master_password='xxxxxx'
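The master_ip_failover script used here is not shown. Following the global.conf comments (stop does nothing, start moves the VIP), below is a dry-run Python sketch of its command dispatch. It only prints the commands it would run over SSH; the interface name eth0, the /24 prefix and the gratuitous ARP via `arping` are assumptions to adapt to your environment.

```python
import argparse
import sys

VIP = "192.168.56.200/24"  # VIP from the test environment; /24 prefix assumed
DEVICE = "eth0"            # assumed interface name


def planned_actions(command, new_master_host=None, ssh_user="root"):
    """Return the shell commands this dry-run failover script would execute."""
    if command in ("stop", "stopssh", "status"):
        # As in this setup: nothing to do for stop; status just reports success.
        return []
    if command == "start":
        if not new_master_host:
            raise ValueError("--new_master_host is required for start")
        target = "%s@%s" % (ssh_user, new_master_host)
        ip_only = VIP.split("/")[0]
        return [
            "ssh %s 'ip addr add %s dev %s'" % (target, VIP, DEVICE),
            # Unsolicited ARP so neighbours update their caches for the VIP.
            "ssh %s 'arping -c 3 -U -I %s %s'" % (target, DEVICE, ip_only),
        ]
    raise ValueError("unknown command: %s" % command)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--command", required=True)
    parser.add_argument("--ssh_user", default="root")
    parser.add_argument("--new_master_host")
    # MHA passes more options (orig_master_*, new_master_ip, ...); ignore them here.
    args, _unknown = parser.parse_known_args(argv)
    try:
        for action in planned_actions(args.command, args.new_master_host, args.ssh_user):
            print(action)  # dry run: print instead of executing
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A non-zero exit status tells MHA the script failed, so a real script should return 0 only after the VIP is actually up.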