MySQL 5.6主从故障处理说明

MySQL 5.6主从故障处理说明

5.6增加GTID特性作为主从复制的新协议, 如果开启需要指定 gtid_mode 为 on, 如果不开启主从复制采用传统的复制协议, 故障处理同5.1, 5.5. 以下讨论采用gtid协议后的故障处理;

GTID配置

http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-howto.html 与传统的复制相比, GTID去掉了文件及位置的参数信息, 改用 MASTER_AUTO_POSITION 替换.

GTID特性

http://www.percona.com/blog/2014/05/09/gtids-in-mysql-5-6-new-replication-protocol-new-ways-to-break-replication/ 明确gtid和传统方式的区别, GTID的限制包括以下:

http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-restrictions.html

1.不要混用存储引擎
2.不要使用 create table … select …
3.不要使用临时表
4.不要使用一些无效的语句, 比如skip_slave_skip_counter对GTID无效, 应该改用gtid_next方式处理;
5.mysqldump默认开启gtid模式, 比如出现以下信息:

Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.

6.如果启用了gtid, mysql_upgrade不会连接MySQL, 除非指定了–write-binlog=OFF选项, 或者在升级的时候关闭gtid, 重启Server升级.

主从故障恢复

http://www.percona.com/blog/2013/02/08/how-to-createrestore-a-slave-using-gtid-replication-in-mysql-5-6/

明确gtid_purged参数的含义, 等效于传统模式的文件及位置信息; slave出现故障的恢复方式; xtrabackup恢复见: http://arstercz.com/innobackupex-restore/

错误更新

http://www.percona.com/blog/2014/05/19/errant-transactions-major-hurdle-for-gtid-based-failover-in-mysql-5-6/

不同于传统的复制, 在gtid中, 手工在slave中更新记录后, 会埋下主从切换失败的隐患, gtid会做相关的校验, 来保证切换后, 原来的master继续执行相应的更新操作; 可以通过注入空事务的方式跳过master中的一条事务;等同于传统模式下的 skip_slave_skip_counter .

MHA切换

https://code.google.com/p/mysql-master-ha/wiki/GTID_Based_Failover

从0.56版本开始支持GTID特性,新版mha增加以下参数:

client_bindir: 指定bin目录; client_libdir: 指定lib目录; multi_tier_slave: 忽略第三层slave信息;

在切换的时候MHA检测实例的GTID是否开启, 再进行切换:

[root@cz-test1 masterha]# masterha_master_switch --master_state=dead  --global_conf=/etc/masterha/app_default.cnf --conf=/etc/masterha/app_56.conf --dead_master_host=10.0.21.7 --dead_master_ip=10.0.21.7 --dead_master_port=3301
...
From:
10.0.21.7(10.0.21.7:3301) (current master)
 +--10.0.21.17(10.0.21.17:3301)

To:
10.0.21.17(10.0.21.17:3301) (new master)

/*检测GTID信息*/
Wed Nov 26 13:37:49 2014 - [info] Retrieved Gtid Set: 62f821d2-7453-11e4-bebb-fa163e43bfe5:1-7
Wed Nov 26 13:37:49 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Nov 26 13:37:49 2014 - [info]   10.0.21.17(10.0.21.17:3301)  Version=5.6.21-69.0-log (oldest major version between slaves) log-bin:enabled
Wed Nov 26 13:37:49 2014 - [info]     GTID ON
Wed Nov 26 13:37:49 2014 - [info]     Replicating from 10.0.21.7(10.0.21.7:3301)
Wed Nov 26 13:37:49 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov 26 13:37:49 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:1473
Wed Nov 26 13:37:49 2014 - [info] Retrieved Gtid Set: 62f821d2-7453-11e4-bebb-fa163e43bfe5:1-7
Wed Nov 26 13:37:49 2014 - [info] Oldest slaves:
Wed Nov 26 13:37:49 2014 - [info]   10.0.21.17(10.0.21.17:3301)  Version=5.6.21-69.0-log (oldest major version between slaves) log-bin:enabled
Wed Nov 26 13:37:49 2014 - [info]     GTID ON
Wed Nov 26 13:37:49 2014 - [info]     Replicating from 10.0.21.7(10.0.21.7:3301)
Wed Nov 26 13:37:49 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov 26 13:37:49 2014 - [info] 
Wed Nov 26 13:37:49 2014 - [info] * Phase 3.3: Determining New Master Phase..
Wed Nov 26 13:37:49 2014 - [info] 
Wed Nov 26 13:37:49 2014 - [info] Searching new master from slaves..
Wed Nov 26 13:37:49 2014 - [info]  Candidate masters from the configuration file:
Wed Nov 26 13:37:49 2014 - [info]   10.0.21.17(10.0.21.17:3301)  Version=5.6.21-69.0-log (oldest major version between slaves) log-bin:enabled
Wed Nov 26 13:37:49 2014 - [info]     GTID ON
Wed Nov 26 13:37:49 2014 - [info]     Replicating from 10.0.21.7(10.0.21.7:3301)
Wed Nov 26 13:37:49 2014 - [info]     Primary candidate for the new Master (candidate_master is set)

...
/*得到binlog的文件及位置信息,及GTID信息*/
Wed Nov 26 13:37:53 2014 - [info] Getting new master's binlog name and position..
Wed Nov 26 13:37:53 2014 - [info]  mysql-bin.000003:1243
Wed Nov 26 13:37:53 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.21.17', MASTER_PORT=3301, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Nov 26 13:37:53 2014 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000003, 1243, 62f821d2-7453-11e4-bebb-fa163e43bfe5:1-7

原master当做slave启动的时候指定以下命令:

All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.21.17', MASTER_PORT=3301, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';

执行后如下:

10.0.21.17(10.0.21.17:3301) (current master)
 +--10.0.21.7(10.0.21.7:3301)