

Building a High-Availability MySQL Cluster

by Kris Buytaert
02/16/2006

Today's enterprise depends on the availability of mail and web services. Failure is never far away, whether it comes from a hardware fault or a human error, so we have to make our infrastructure as highly available as possible.

When building highly available clusters, people often choose one extra physical machine per service, creating an A-B failover scheme. With static websites, there is no problem making the application highly available; you can just store the data in two places. However, the moment you add a database to your environment, things become more difficult. The easy way out is to move the database to a different machine and declare that server a SEP field (somebody else's problem).

That's not how we do it. In the old days, when sites became too heavily loaded, we used MySQL replication to create multiple read-only copies of the database, which LVS then load-balanced. This, however, meant modifying the application itself so that it would write only to the master node.

Later, many people tried to build a replication environment that also provided high availability. All of them struggled with the same problem: they couldn't determine exactly where a node had failed, so it was possible to lose records. Recovering from a failover also proved to be a difficult task.

In late 2004 and early 2005, MySQL announced MySQL Cluster--the NDB storage engine from MySQL that plugs into the MySQL environment. With this cluster environment, I built a highly redundant environment. This article discusses how to do it yourself.

Hold on before you decide to implement MySQL Cluster, as it certainly doesn't suit all jobs yet. The MySQL NDB engine currently runs its database completely in memory. This means that you have to be able to fit your database in memory. If you have a 1GB dataset and you want your data spread over two nodes, you need 1GB of memory per node. If you have four nodes, you can deal with 512MB per node. The first thing that becomes clear is that you have to look at your dataset. Giant data-warehousing databases won't fit into the MySQL NDB engine yet. Today, you might need to look at commercial alternatives such as Continuent, the former Emic Networks.
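
As a rough rule of thumb (ignoring index and per-row overhead), you can estimate the memory each data node needs as:

per-node memory ≈ dataset size * NoOfReplicas / number of data nodes

1GB * 2 replicas / 2 nodes = 1GB per node
1GB * 2 replicas / 4 nodes = 512MB per node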

Secondly, this article is about building a truly available MySQL cluster, which means it focuses on high availability, not on high throughput. If your database is suffering from too-high loads, you still might want to look back at replication in combination with LVS.

The MySQL NDB storage engine setup consists of several different parts: a management node (ndb_mgmd) that holds the cluster configuration and watches over the other nodes, data nodes (ndbd) that keep the actual table data in memory, and SQL nodes (mysqld processes with NDB support) that applications talk to.

Standard MySQL clients connect to an SQL node and won't notice the difference from a MyISAM or InnoDB query, so there is no need to change the API.

In order to achieve high availability, you need at least three machines: one management node and two data nodes running ndbd, one for each replica.

Installing MySQL-HA-Cluster

Initially, MySQL didn't provide the NDB build in RPM form, but it did include source RPMs so you could build your own. These days, MySQL has kindly split the binaries into separate packages: MySQL-client-4.1.12-1.i386.rpm, MySQL-devel-4.1.12-1.i386.rpm, MySQL-ndb-extra-4.1.12-1.i386.rpm, MySQL-ndb-management-4.1.12-1.i386.rpm, MySQL-ndb-storage-4.1.12-1.i386.rpm, MySQL-ndb-tools-4.1.12-1.i386.rpm, and MySQL-server-4.1.12-1.i386.rpm.

Install MySQL-ndb-management on the management node. Install MySQL-ndb-storage, MySQL-ndb-tools, MySQL-ndb-extra, and the usual MySQL-client and MySQL-server on the database nodes (here, DB-A and DB-B).
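
For example, a minimal install with rpm (package file names as listed above, run from the directory where you downloaded them):

# on the management node
rpm -ivh MySQL-ndb-management-4.1.12-1.i386.rpm

# on DB-A and DB-B
rpm -ivh MySQL-server-4.1.12-1.i386.rpm MySQL-client-4.1.12-1.i386.rpm \
    MySQL-ndb-storage-4.1.12-1.i386.rpm MySQL-ndb-tools-4.1.12-1.i386.rpm \
    MySQL-ndb-extra-4.1.12-1.i386.rpm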

On the management node, the configuration file (config.ini) resides in the directory /var/lib/mysql/mysql-cluster:

[ndbd default]
NoOfReplicas= 2 

[mgm]
HostName= 172.16.32.1

[ndbd]
HeartbeatIntervalDbDb=600
HeartbeatIntervalDbApi=600
HostName= 172.16.32.7
DataDir= /var/lib/mysql/mysql-cluster/
DataMemory = 100M
IndexMemory = 50M
MaxNoOfConcurrentTransactions = 500
MaxNoOfConcurrentOperations = 250000
TimeBetweenWatchDogCheck = 40000
MaxNoOfOrderedIndexes = 27000
MaxNoOfTables = 9000
MaxNoOfAttributes = 25000

[ndbd]
HeartbeatIntervalDbDb=600
HeartbeatIntervalDbApi=600
HostName = 172.16.32.107
DataDir= /var/lib/mysql/mysql-cluster/
DataMemory = 100M
IndexMemory = 50M
MaxNoOfConcurrentTransactions = 500
MaxNoOfConcurrentOperations = 250000
TimeBetweenWatchDogCheck = 40000
MaxNoOfOrderedIndexes = 27000
MaxNoOfTables = 9000
MaxNoOfAttributes = 25000

[mysqld]
[mysqld]
[mysqld]

This is the main config file you need to get the MySQL NDB cluster up and running. In this setup, ndbd keeps two replicas of the data. The management server listens on 172.16.32.1. In our project, the current needs don't exceed 100MB of data memory and 50MB of index memory per node.

The three unnamed [mysqld] sections refer to the two mysqld processes on the different DB nodes and a possible third mysqld process.

With this config in place, you can start ndb_mgmd so that the other nodes can connect to it.
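
For example, a quick sketch of starting it on the management node (the -f flag points ndb_mgmd at the config file):

cd /var/lib/mysql/mysql-cluster
ndb_mgmd -f config.ini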

Configuring the NDB Nodes

Both data nodes have a file called Ndb.cfg in /var/lib/mysql/mysql-cluster (also symlinked to /var/lib/mysql/):

DB-A:/var/lib/mysql # cat Ndb.cfg
host=172.16.32.1:1186

The first time you start ndbd, run it with the --initial parameter.
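
For example, on each of DB-A and DB-B (--initial wipes the node's local data files, so use it only for the very first start or after certain configuration changes):

DB-A:/var/lib/mysql # ndbd --initial

On later restarts, plain ndbd is enough:

DB-A:/var/lib/mysql # ndbd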

Running ndb_mgm now should show two active ndb nodes:

DB-A:/var/lib/mysql # ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 172.16.32.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]   2 node(s)
id=2  @172.16.32.7 (Version: 4.1.12, Nodegroup: 0)
id=3  @172.16.32.107 (Version: 4.1.12, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1  @172.16.32.1 (Version: 4.1.12)

If you can see this, you've accomplished the first part of the mission. Congratulations! You now have a working NDB engine cluster.

Configuring the SQL Nodes

Now you need to tell the MySQL SQL node that it can use the NDB engine. Modify the MySQL config file my.cnf in /etc/ to add the parameter ndbcluster in the [mysqld] section. Upon restarting mysqld, the output of ndb_mgm should have some additional lines:

[mysqld(API)]  3 node(s)
id=4  @172.16.32.7 (Version: 4.1.12)
id=5  @172.16.32.107 (Version: 4.1.12)
id=6 (not connected, accepting connect from any host)

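For reference, the relevant part of /etc/my.cnf on the SQL nodes now looks something like this (a sketch; the ndb-connectstring line is optional in this setup because Ndb.cfg already points at the management node):

[mysqld]
ndbcluster
ndb-connectstring=172.16.32.1:1186
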
Now test the actual cluster database by creating a table with the option ENGINE=NDB:

mysql> create table blah (i int, a char) engine NDB;
Query OK, 0 rows affected (0.46 sec)

Then on node DB-B, watch the table show up:

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| blah           |
+----------------+
1 row in set (0.03 sec)

On node DB-A, insert a value into the table:

mysql> insert into blah values (1,"x");
Query OK, 1 row affected (0.01 sec)

mysql> select * from blah;
+------+------+
| i    | a    |
+------+------+
|    1 | x    |
+------+------+
1 row in set (0.01 sec)

... and watch it appear in the same table on the second node, DB-B (the first select below ran before the insert arrived, the second one right after):

mysql> select * from blah;
Empty set (0.08 sec)

mysql> select * from blah;
+------+------+
| i    | a    |
+------+------+
|    1 | x    |
+------+------+
1 row in set (0.00 sec)

As you can see, modifying a record on one node makes it automagically appear on the other node as well. This works in both directions, so it really doesn't matter which node you connect to.

When querying the database, the SQL node will contact the management node to see which engine node it should query. If one of the NDB engine nodes disappears, the management node will direct the SQL node to another one. That's why you need to run the management node on a separate machine: if the machine running both your active NDB engine and your MGM node crashes, your SQL engine has nowhere to go. You can restart or take down your management node for maintenance; as long as none of the other nodes goes down, your applications will continue to work. Of course, you can always configure two management nodes.
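
A sketch of what a second management node could look like in config.ini, assuming a second machine at a hypothetical address such as 172.16.32.2 (each management node then gets an explicit Id so the node numbering stays stable):

[mgm]
Id= 1
HostName= 172.16.32.1

[mgm]
Id= 2
HostName= 172.16.32.2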

Adding Heartbeat and Monitoring to the Real DBs

MySQL Cluster makes sure your database stays up, but what if the mysqld you are talking to crashes? What if the physical machine you are talking to crashes? What if your network connection goes down? In these cases, your application, which used to talk to the mysql daemon on DB-A, doesn't know what to do anymore. That's why my team added the mon-plus-Heartbeat combination to our setup.

Heartbeat takes care of monitoring both nodes and makes sure that the active node carries the service IP address your applications (for example, your Java applications) connect to.

Setting up Heartbeat on the cluster nodes means configuring three files in /etc/ha.d:

$ cat authkeys
auth 1
1 md5 WorldDominationFast!

$ more haresources
DB-A    172.17.0.103/24/eth0:1

DB-A:/etc/ha.d # more ha.cf
logfacility   local0
keepalive 2
deadtime 10
warntime 5
initdead 120
nice_failback off
udpport 694
bcast eth0
node DB-A
node DB-B

Make sure these files exist and are identical on both nodes.
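
With Heartbeat running, applications connect to the floating address (172.17.0.103) rather than to either physical machine. A quick check from any client, using a hypothetical user:

mysql -h 172.17.0.103 -u someuser -p -e "show tables" test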

Heartbeat does not monitor the state of the application. You may still be able to ping a DB node, but its mysqld might be completely dysfunctional. That's why we added mon to our setup.

The next change is to mon.cf in /etc/mon/:

hostgroup mysql-node 127.0.0.1

#
# watch definitions
watch mysql-node
    service mysql
        interval 10s
        monitor msql-mysql.monitor --mode mysql --database=yourdatabase \
            --password=yourpasswordgoeshere
        period wd {Mon-Sun}
            alert stop-heartbeat.alert

As this config snippet shows, mon uses its msql-mysql.monitor script to check whether it can actually still connect to the database. Every ten seconds (the configured interval), it checks that it can still do a show tables on the database on the node it is running on. The moment that fails, it runs stop-heartbeat.alert.

When mon decides it can't connect to MySQL anymore, it triggers Heartbeat to move the active IP to the second node, where another MySQL instance is happily running and can access identical data to that of the prior node.
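
The stop-heartbeat.alert script can be as simple as one that stops the local Heartbeat so the floating IP moves to the other node; a minimal sketch:

#!/bin/sh
# stop-heartbeat.alert (sketch): mon runs this when the MySQL check fails.
# Stopping the local Heartbeat releases the floating IP, so the peer node
# takes it over and clients reconnect to the surviving mysqld.
logger "mon: local mysqld unreachable, stopping Heartbeat to force failover"
/etc/init.d/heartbeat stop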

With this setup, we created a highly available MySQL cluster with no single point of failure.

Kris Buytaert is a Linux and open source consultant operating in the Benelux. He currently maintains the openMosix HOWTO.

