6、失败测试
6.1 网络切换测试
将ha1 的网线拔掉,在ha2 看资源的切换情况
Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online
Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): OFFLINE
Resource Group: group_1
IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha2
Filesystem_2 (ocf::heartbeat:Filesystem): Started ha2
mysqld_3 (lsb:mysqld): Started ha2
没有问题,把ha1 网线插上,资源又回切到ha1。
Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online
Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): online
Resource Group: group_1
IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha1
Filesystem_2 (ocf::heartbeat:Filesystem): Started ha1
mysqld_3 (lsb:mysqld): Started ha1
6.2 宕机切换
将ha1 强制关机在开机,这和6.2 的情况差不多,能正常切换。
6.3 服务切换
将ha1 的mysql 服务强制中断,修改/etc/my.cf
datadir=/u09/data #实际这个路径不存在
[root@ha1 ~]# ps -ef|grep mysql
root 4597 1 0 10:44 ? 00:00:00 /bin/sh ./bin/mysqld_safe
--datadir=/u01/data --pid-file=/u01/data/ha1.pid
mysql 4760 4597 0 10:44 ? 00:00:00 /usr/local/mysql/bin/mysqld
--basedir=/usr/local/mysql --datadir=/u01/data --user=mysql --log-error=/u01/data/ha1.err
--pid-file=/u01/data/ha1.pid --socket=/tmp/mysql.sock --port=3306
root 13174 4285 0 11:22 ? 00:00:00 /bin/sh /etc/init.d/mysqld start
root 13192 4208 0 11:22 pts/0 00:00:00 grep mysql
[root@ha1 ~]# kill 4597 4760
[root@ha1 ~]# ps -ef|grep mysql
root 13532 4208 0 11:23 pts/0 00:00:00 grep mysql
在ha2 上查看资源情况,已经切换过来。
Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online
Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): online
Resource Group: group_1
IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha2
Filesystem_2 (ocf::heartbeat:Filesystem): Started ha2
mysqld_3 (lsb:mysqld): Started ha2
Failed actions:
mysqld_3_monitor_120000 (node=ha1, call=10, rc=1): complete
mysqld_3_start_0 (node=ha1, call=12, rc=1): complete
6.4 slave 同步测试:
将ha1 的heartbeat 关闭。
[root@ha1 data]# /etc/init.d/heartbeat stop
在ha2 看资源情况。
Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online
Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): OFFLINE
Resource Group: group_1
IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha2
Filesystem_2 (ocf::heartbeat:Filesystem): Started ha2
mysqld_3 (lsb:mysqld): Started ha2
资源全部切换到ha2,我们执行一些操作,看是否同步。
[root@ha2 ~]# /usr/local/mysql/bin/mysql -u root -p -D test
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 30
Server version: 5.1.31-log MySQL Community Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
root@test >create table t(t1 int);
root@test >insert into t values(1);
root@test >select * from t;
+------+
| t1 |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
在slave 查看是否同步
[root@mysql2 ~]# /root/cron/lg.sh
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.1.31-log MySQL Community Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> use test
Database changed
mysql> select * from t;
+------+
| t1 |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
在将资源切回到ha1,在进行操作,看slave 的情况
[root@ha1 ~]# /etc/init.d/heartbeat start
Node: ha2 (0c267980-de77-4421-bcb5-9bb6b0743eef): online
Node: ha1 (ab30057d-03f6-4be8-a787-98c5fc7f4c64): online
Resource Group: group_1
IPaddr_10_0_0_180 (ocf::heartbeat:IPaddr): Started ha1
Filesystem_2 (ocf::heartbeat:Filesystem): Started ha1
mysqld_3 (lsb:mysqld): Started ha1
资源已经切回到ha1,对其进行操作。
[root@ha1 ~]# /usr/local/mysql/bin/mysql -u root -p -D test
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.31-log MySQL Community Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> insert into t values(2);
mysql> insert into t values(3);
mysql> select * from t;
+------+
| t1 |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)
在slave 查看是否同步
mysql> select * from t;
+------+
| t1 |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:Waiting for master to send event
Master_Host: 10.0.0.180
Master_User: slave162
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master.000003
Read_Master_Log_Pos: 278
Relay_Log_File: mysql2-relay-bin.000023
Relay_Log_Pos: 420
Relay_Master_Log_File: master.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master 切换后,slave 无需任何人工介入,自动同步。
6.5 模拟写入,切换测试。
写个简单的循环,
[root@mysql2 cron]# cat test.sh
for ((num=1;num<10000000;num=num+1))
do
echo
/usr/local/mysql/bin/mysql -u sg -psg109504 -h 10.0.0.180 -D test -e"
insert into t values($num);
"
if (( $? )); then
echo $num:no
else
echo $num:ok
fi
done
在写入过程中,手工切换,手工reboot 等操作,和6.4 的测试结果一样。在写入过程中强制ha1 断电,需要手工处理slave 同步。将ha1 断电后,资源切到ha2 后,在slave 会报错:
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.0.0.180
Master_User: slave162
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master.000023
Read_Master_Log_Pos: 657144
Relay_Log_File: mysql2-relay-bin.000013
Relay_Log_Pos: 657286
Relay_Master_Log_File: master.000023
Slave_IO_Running: No
Slave_SQL_Running: Yes
查看slave 报错日志:
090225 16:54:51 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log
'master.000023' at postion 657144
090225 16:54:51 [ERROR] Error reading packet from server: Client requested master to start
replication from impossible position ( server_errno=1236)
090225 16:54:51 [ERROR] Got fatal error 1236: 'Client requested master to start replication from
impossible position' from master when reading data from binary log
090225 16:54:51 [Note] Slave I/O thread exiting, read up to log 'master.000023', position 657144
在master.000023 中找不到position 657144。
查看master 的binlog。
[root@ha2 data]# /usr/local/mysql/bin/mysqlbinlog master.000023|tail -10
/*!*/;
# at 610152
#090225 16:50:52 server id 180 end_log_pos 610241 Query thread_id=6871
exec_time=0 error_code=0
SET TIMESTAMP=1235551852/*!*/;
insert into t values(6867)
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
在master 上,master.000023 最后值是610152,而在slave 却是657144。Slave 的Log_Pos比master 大,应该是在断电的时候,binlog 有部分没有及时写入磁盘,但从库已经读入。我们需要在slave 执行change maseter 操作,MASTER_LOG_FILE 在原来的基础上加1,Log_Pos 为1。
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.180',MASTER_PORT =
3306,MASTER_USER
='slave162',MASTER_PASSWORD='nslave162',MASTER_LOG_FILE='master.000024',MAST
ER_LOG_POS = 000000001;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:Waiting for master to send event
Master_Host: 10.0.0.180
Master_User: slave162
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master.000024
Read_Master_Log_Pos: 106
Relay_Log_File: mysql2-relay-bin.000003
Relay_Log_Pos: 248
Relay_Master_Log_File: master.000025
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
在这次断电的切换过程中,将会有657144 减去610152 条binlog 操作在slave 存在,而在masert 没有。在后面的复制中可能会产生错误,不过这些错误影响不大。如果对数据的同步要求很严格,可以在主库添加参数sync_binlog=1,这样最多将只会有1 条数据错误,不过这样将影响mysql 写性能。