技术开发 频道

通过Heartbert2 让Mysql Replication 具有HA

  6、失败测试

  6.1 网络切换测试

  将ha1 的网线拔掉,在ha2 看资源的切换情况

  Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online

  Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): OFFLINE

  Resource Group: group_1

  IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha2

  Filesystem_2 (ocf::heartbeat:Filesystem): Started ha2

  mysqld_3 (lsb:mysqld): Started ha2

  没有问题,把ha1 网线插上,资源又回切到ha1。

  Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online

  Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): online

  Resource Group: group_1

  IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha1

  Filesystem_2 (ocf::heartbeat:Filesystem): Started ha1

  mysqld_3 (lsb:mysqld): Started ha1

  6.2 宕机切换

  将ha1 强制关机在开机,这和6.2 的情况差不多,能正常切换。

  6.3 服务切换

  将ha1 的mysql 服务强制中断,修改/etc/my.cf

  datadir=/u09/data #实际这个路径不存在

  [root@ha1 ~]# ps -ef|grep mysql

  root 4597 1 0 10:44 ? 00:00:00 /bin/sh ./bin/mysqld_safe

  --datadir=/u01/data --pid-file=/u01/data/ha1.pid

  mysql 4760 4597 0 10:44 ? 00:00:00 /usr/local/mysql/bin/mysqld

  --basedir=/usr/local/mysql --datadir=/u01/data --user=mysql --log-error=/u01/data/ha1.err

  --pid-file=/u01/data/ha1.pid --socket=/tmp/mysql.sock --port=3306

  root 13174 4285 0 11:22 ? 00:00:00 /bin/sh /etc/init.d/mysqld start

  root 13192 4208 0 11:22 pts/0 00:00:00 grep mysql

  [root@ha1 ~]# kill 4597 4760

  [root@ha1 ~]# ps -ef|grep mysql

  root 13532 4208 0 11:23 pts/0 00:00:00 grep mysql

  在ha2 上查看资源情况,已经切换过来。

  Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online

  Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): online

  Resource Group: group_1

  IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha2

  Filesystem_2 (ocf::heartbeat:Filesystem): Started ha2

  mysqld_3 (lsb:mysqld): Started ha2

  Failed actions:

  mysqld_3_monitor_120000 (node=ha1, call=10, rc=1): complete

  mysqld_3_start_0 (node=ha1, call=12, rc=1): complete

  6.4 slave 同步测试:

  将ha1 的heartbeat 关闭。

  [root@ha1 data]# /etc/init.d/heartbeat stop

  在ha2 看资源情况。

  Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online

  Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): OFFLINE

  Resource Group: group_1

  IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha2

  Filesystem_2 (ocf::heartbeat:Filesystem): Started ha2

  mysqld_3 (lsb:mysqld): Started ha2

  资源全部切换到ha2,我们执行一些操作,看是否同步。

  [root@ha2 ~]# /usr/local/mysql/bin/mysql -u root -p -D test

  Enter password:

  Welcome to the MySQL monitor. Commands end with ; or \g.

  Your MySQL connection id is 30

  Server version: 5.1.31-log MySQL Community Server (GPL)

  Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

  root@test >create table t(t1 int);

  root@test >insert into t values(1);

  root@test >select * from t;

  +------+

  | t1 |

  +------+

  | 1 |

  +------+

  1 row in set (0.00 sec)

  在slave 查看是否同步

  [root@mysql2 ~]# /root/cron/lg.sh

  Welcome to the MySQL monitor. Commands end with ; or \g.

  Your MySQL connection id is 7

  Server version: 5.1.31-log MySQL Community Server (GPL)

  Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

  mysql> use test

  Database changed

  mysql> select * from t;

  +------+

  | t1 |

  +------+

  | 1 |

  +------+

  1 row in set (0.00 sec)

  在将资源切回到ha1,在进行操作,看slave 的情况

  [root@ha1 ~]# /etc/init.d/heartbeat start

  Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online

  Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): online

  Resource Group: group_1

  IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha1

  Filesystem_2 (ocf::heartbeat:Filesystem): Started ha1

  mysqld_3 (lsb:mysqld): Started ha1

  资源已经切回到ha1,对其进行操作。

  [root@ha1 ~]# /usr/local/mysql/bin/mysql -u root -p -D test

  Enter password:

  Welcome to the MySQL monitor. Commands end with ; or \g.

  Your MySQL connection id is 3

  Server version: 5.1.31-log MySQL Community Server (GPL)

  Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

  mysql> insert into t values(2);

  mysql> insert into t values(3);

  mysql> select * from t;

  +------+

  | t1 |

  +------+

  | 1 |

  | 2 |

  | 3 |

  +------+

  3 rows in set (0.00 sec)

  在slave 查看是否同步

  mysql> select * from t;

  +------+

  | t1 |

  +------+

  | 1 |

  | 2 |

  | 3 |

  +------+

  3 rows in set (0.00 sec)

  mysql> show slave status\G;

  *************************** 1. row ***************************

  Slave_IO_State:Waiting for master to send event

  Master_Host: 10.0.0.180

  Master_User: slave162

  Master_Port: 3306

  Connect_Retry: 60

  Master_Log_File: master.000003

  Read_Master_Log_Pos: 278

  Relay_Log_File: mysql2-relay-bin.000023

  Relay_Log_Pos: 420

  Relay_Master_Log_File: master.000003

  Slave_IO_Running: Yes

  Slave_SQL_Running: Yes

  Master 切换后,slave 无需任何人工介入,自动同步。

  6.5 模拟写入,切换测试。

  写个简单的循环,

  [root@mysql2 cron]# cat test.sh

  for ((num=1;num<10000000;num=num+1))

  do

  echo

  /usr/local/mysql/bin/mysql -u sg -psg109504 -h 10.0.0.180 -D test -e"

  insert into t values($num);

  "

  if (( $? )); then

  echo $num:no

  else

  echo $num:ok

  fi

  done

  在写入过程中,手工切换,手工reboot 等操作,和6.4 的测试结果一样。在写入过程中强制ha1 断电,需要手工处理slave 同步。将ha1 断电后,资源切到ha2 后,在slave 会报错:

  mysql> show slave status\G;

  *************************** 1. row ***************************

  Slave_IO_State:

  Master_Host: 10.0.0.180

  Master_User: slave162

  Master_Port: 3306

  Connect_Retry: 60

  Master_Log_File: master.000023

  Read_Master_Log_Pos: 657144

  Relay_Log_File: mysql2-relay-bin.000013

  Relay_Log_Pos: 657286

  Relay_Master_Log_File: master.000023

  Slave_IO_Running: No

  Slave_SQL_Running: Yes

  查看slave 报错日志:

  090225 16:54:51 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log

  'master.000023' at postion 657144

  090225 16:54:51 [ERROR] Error reading packet from server: Client requested master to start

  replication from impossible position ( server_errno=1236)

  090225 16:54:51 [ERROR] Got fatal error 1236: 'Client requested master to start replication from

  impossible position' from master when reading data from binary log

  090225 16:54:51 [Note] Slave I/O thread exiting, read up to log 'master.000023', position 657144

  在master.000023 中找不到position 657144。

  查看master 的binlog。

  [root@ha2 data]# /usr/local/mysql/bin/mysqlbinlog master.000023|tail -10

  /*!*/;

  # at 610152

  #090225 16:50:52 server id 180 end_log_pos 610241 Query thread_id=6871

  exec_time=0 error_code=0

  SET TIMESTAMP=1235551852/*!*/;

  insert into t values(6867)

  /*!*/;

  DELIMITER ;

  # End of log file

  ROLLBACK /* added by mysqlbinlog */;

  /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

  在master 上,master.000023 最后值是610152,而在slave 却是657144。Slave 的Log_Pos比master 大,应该是在断电的时候,binlog 有部分没有及时写入磁盘,但从库已经读入。我们需要在slave 执行change maseter 操作,MASTER_LOG_FILE 在原来的基础上加1,Log_Pos 为1。

  mysql> stop slave;

  Query OK, 0 rows affected (0.00 sec)

  mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.180',MASTER_PORT =

  3306,MASTER_USER

  ='slave162',MASTER_PASSWORD='nslave162',MASTER_LOG_FILE='master.000024',MAST

  ER_LOG_POS = 000000001;

  Query OK, 0 rows affected (0.00 sec)

  mysql> start slave;

  Query OK, 0 rows affected (0.00 sec)

  mysql> show slave status\G;

  *************************** 1. row ***************************

  Slave_IO_State:Waiting for master to send event

  Master_Host: 10.0.0.180

  Master_User: slave162

  Master_Port: 3306

  Connect_Retry: 60

  Master_Log_File: master.000024

  Read_Master_Log_Pos: 106

  Relay_Log_File: mysql2-relay-bin.000003

  Relay_Log_Pos: 248

  Relay_Master_Log_File: master.000025

  Slave_IO_Running: Yes

  Slave_SQL_Running: Yes

  在这次断电的切换过程中,将会有657144 减去610152 条binlog 操作在slave 存在,而在masert 没有。在后面的复制中可能会产生错误,不过这些错误影响不大。如果对数据的同步要求很严格,可以在主库添加参数sync_binlog=1,这样最多将只会有1 条数据错误,不过这样将影响mysql 写性能。

0
相关文章