One of the most exciting features introduced in MySQL 5 is the federated engine. The ability to access data from a remote server without the constraints of replication tickles every programmer's fancy.
Unfortunately, as of today, the documentation is not quite as detailed as I would like. I have the feeling that the federated engine has been somehow neglected. This article comes from my personal experience with this engine.
This information is also available in the MySQL manual, but for the sake of completeness, I want to say a few words on the basics.
Federated tables are tables with storage in a remote server. When defining a table with this engine, you refer to another table in a different server, using the same structure. The only thing that resides in your local server is the definition of the table, which must be identical to the remote one, except for the engine specification.
On the remote server, you have:
CREATE TABLE `City` (
`ID` int(11) NOT NULL auto_increment,
`Name` char(35) NOT NULL default '',
`CountryCode` char(3) NOT NULL default '',
`District` char(20) NOT NULL default '',
`Population` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`)
) ENGINE=MyISAM;
To use that table in your local server, enter the description:
CREATE TABLE `City` (
`ID` int(11) NOT NULL auto_increment,
`Name` char(35) NOT NULL default '',
`CountryCode` char(3) NOT NULL default '',
`District` char(20) NOT NULL default '',
`Population` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`)
) ENGINE = FEDERATED
connection='mysql://user:pass@remote.com:3306/world/City';
Here, user and pass are valid credentials to
access the table City in the database world on server
remote.com.
With that done, you can query your federated table as if it were in your local server. Issue a query and get a record set.
select * from City where ID = 1;
+----+-------+-------------+----------+------------+
| ID | Name | CountryCode | District | Population |
+----+-------+-------------+----------+------------+
| 1 | Kabul | AFG | Kabol | 1780000 |
+----+-------+-------------+----------+------------+
There are a few limitations concerning federated engine usage, namely:
ALTER TABLE commands on a federated
table.This is basically what the official docs say, no more and no less. Beyond that, it's all unexplored land.
Because you won't have the reassuring backing from the manual in this matter, you should test these features for yourself. Because you need at least two instances of a MySQL server to use a federated table, here is a quick recipe to get the job done. Download the MySQL Sandbox and, from the command line, execute:
$ ./express_install.pl /opt/mysql/5.0.22/ -P 5001 -d server1 \
-c log=general.log -c log-slow-queries=slow_queries.log \
-c long_query_time=1 -c log-queries-not-using-indexes
$ ./express_install.pl /opt/mysql/5.0.22/ -P 5002 -d server2 \
-c log=general.log -c log-slow-queries=slow_queries.log \
-c long_query_time=1 -c log-queries-not-using-indexes
If you want to know more about the Sandbox, you can refer to the MySQL Sandbox tutorial. If you don't work in a Unix environment, you have to install it manually, following MySQL Windows server installation official guidelines.
The previous commands will create two directories under your
$HOME, server1 and server2, each of which
contains a data directory and a few bash scripts to start, stop, and use the
instance. Both servers will have the general log and slow query log enabled, so you can examine what happens when the federated engine exchanges data between them.
Now start them with their appropriate commands:
$ ~/server1/start.sh
$ ~/server2/start.sh
To access the first server, use the command:
$ mysql -h 127.0.0.1 -u datacharmer -pdatacharmer -P 5001
Or just:
$ ~/server1/use.sh
For the second server, change the port to 5002:
$ mysql -h 127.0.0.1 -u datacharmer -pdatacharmer -P 5002
To make things easier to detect, create a user on server1, which server2 will use to access the federated tables.
$ echo \
"grant select,execute on test.* to server2usr identified by 'datacharmer'" \
| ~/server1/use.sh -u root
Now you're ready to get your hands dirty.
|
Your first task is to create two simple tables on server1, one with a key and one without. The reason for these two tables will become apparent later.
create database if not exists test;
use test;
drop table if exists tkey;
create table tkey (
id int not null auto_increment primary key,
word varchar(30) not null,
key (word)
);
drop table if exists tnokey;
create table tnokey (
id int not null primary key,
word varchar(30) not null
);
load data local infile '/usr/share/dict/words'
into table tkey (word);
insert into tnokey select * from tkey;
select count(*) from tkey;
select count(*) from tnokey;
Save this script to fed1.mysql and feed it to server1:
$ ~/server1/use.sh -t < fed1.mysql
+----------+
| count(*) |
+----------+
| 518584 |
+----------+
+----------+
| count(*) |
+----------+
| 518584 |
+----------+
(These numbers may be different in your system, depending on the size of your dictionary.)
On server2, create two federated tables that point to the ones just built on server1.
drop database if exists test;
create database if not exists test;
use test;
drop table if exists tkey_fed;
create table tkey_fed (
id int not null auto_increment primary key,
word varchar(30) not null,
key (word)
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey';
drop table if exists tnokey_fed;
create table tnokey_fed (
id int not null primary key,
word varchar(30) not null
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tnokey';
Save this to fed2.mysql and load it:
$ ~/server2/use.sh -t < fed2.mysql
If you don't get any output, your federated tables were created without problems. If you get any errors, go back to the previous section and make sure that you have done everything, including creating the appropriate user on server1.
|
Now it's time to examine the first reaction from server1 by perusing the general log:
$ tail ~/server1/data/general.log
...
060604 18:22:14 7 Connect server2usr@localhost on test
7 Query SELECT * FROM `tkey` WHERE 1=0
7 Quit
8 Connect server2usr@localhost on test
8 Query SELECT * FROM `tnokey` WHERE 1=0
8 Quit
Rule number 1 of the Federated Missing Manual: The receiving server does not know that it's being federated.
I will show you more evidence for this rule later, but for now it's enough
to note that, when creating a table, the second server issues a query to the
first server with a WHERE clause that will return an empty set.
That's because it needs to be sure that the target table exists.
Notice also that the connection closes immediately. Now proceed with...
Rule number 2: The creation of a federated table is an inexpensive operation.
It's one query on the linking server, plus one quick query on the linked server. That's enough evidence to state that a federated table does not require lots of resources, at least during its creation.
Your first operation on server2 will just check that the table exists.
server2 (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| tkey_fed |
| tnokey_fed |
+----------------+
This simple operation will be visible on the linked server.
$ tail ~/server1/data/general.log
...
060610 23:07:02 6 Connect server2usr@localhost on test
7 Connect server2usr@localhost on test
There are two connections, one for each federated table. Is it possible to confirm this by other means?
server1 (test) > show processlist;
+----+-------------+-----------------+------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+------+-------+------------------+
| 6 | server2usr | localhost:46443 | test | Sleep | 298 | NULL | NULL |
| 7 | server2usr | localhost:46444 | test | Sleep | 298 | NULL | NULL |
| 8 | datacharmer | localhost | test | Query | 0 | NULL | show processlist |
+----+-------------+-----------------+------+---------+------+-------+------------------+
There are indeed two connections from server2usr, which is the user
that appears in the federated table connection string. Now, go one step
further by adding another federated table on server2.
server2 (test) > create table test1 like tkey_fed;
Query OK, 0 rows affected (0.00 sec)
server2 (test) > show create table test1\G
*************************** 1. row ***************************
Table: test1
Create Table: CREATE TABLE `test1` (
`id` int(11) NOT NULL auto_increment,
`word` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `word` (`word`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1
CONNECTION='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey'
1 row in set (0.01 sec)
server2 (test) > show create table tkey_fed\G
*************************** 1. row ***************************
Table: tkey_fed
Create Table: CREATE TABLE `tkey_fed` (
`id` int(11) NOT NULL auto_increment,
`word` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `word` (`word`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1
CONNECTION='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey'
1 row in set (0.00 sec)
Now there are three federated tables on server2: two pointing to
tkey and one pointing to tnokey. The general log will
show the usual inexpensive query, but the surprise comes when checking the
process list again:
server1 (test) > show processlist;
+----+-------------+-----------------+------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+------+-------+------------------+
| 7 | server2usr | localhost:46444 | test | Sleep | 869 | NULL | NULL |
| 8 | datacharmer | localhost | test | Query | 0 | NULL | show processlist |
| 10 | server2usr | localhost:56715 | test | Sleep | 272 | NULL | NULL |
| 11 | server2usr | localhost:60575 | test | Sleep | 265 | NULL | NULL |
+----+-------------+-----------------+------+---------+------+-------+------------------+
There are three connections, for just two tables that are federated by another server, which leads to...
Rule number 3: Each federated table on a remote server will consume one connection, even if there are several remote tables pointing to the same local table.
While the creation is inexpensive, the actual usage starts to look a bit more onerous. If you plan to federate many tables from several servers, prepare to increase your server's connections.
Developer's angle: There will be some improvements. MySQL is already planning to reduce the number of open connections.
Speaking of connections, here's a good chance to try yet another experiment. What happens to server2's tables if you stop server1?
$ ~/server1/stop.sh
server2 (test) > SELECT table_name, engine
-> FROM information_schema.tables
-> WHERE table_schema=schema();
ERROR 1430 (HY000): There was a problem processing the query on the
foreign data source. Data source error:
%The foreign data source you are trying to reference does not exist.
Data source error: : 2003 : Can't connect to MySQL server on '127.0.0.1' (111)
server2 (test) > SELECT count(*)
-> FROM information_schema.tables;
ERROR 1430 (HY000): There was a problem processing the query on the
foreign data source. Data source error:
%The foreign data source you are trying to reference does not exist.
Data source error: : 2003 : Can't connect to MySQL server on '127.0.0.1' (111)
What happened? These queries don't use the federated tables (at least, that's what you might have thought while asking this simple question about table metadata). However, something is complaining about not being able to connect.
Restart server1 and repeat the operation.
$ ~/server1/start.sh
server2 (test) > SELECT table_name, engine
-> FROM information_schema.tables
-> WHERE table_schema=schema();
+------------+-----------+
| table_name | engine |
+------------+-----------+
| test1 | FEDERATED |
| tkey_fed | FEDERATED |
| tnokey_fed | FEDERATED |
+------------+-----------+
The general log reveals that there was a request sent to the linked server. Actually, there is one request for each table:
$ tail ~/server1/data/general.log
Time Id Command Argument
060610 23:43:33 1 Connect server2usr@localhost on test
1 Query SET NAMES latin1
1 Query SHOW TABLE STATUS LIKE 'tkey'
2 Connect server2usr@localhost on test
2 Query SET NAMES latin1
2 Query SHOW TABLE STATUS LIKE 'tkey'
3 Connect server2usr@localhost on test
3 Query SET NAMES latin1
3 Query SHOW TABLE STATUS LIKE 'tnokey'
Rule number 4: Each INFORMATION_SCHEMA request related to a federated table involves a connection.
This rule looks like a bug, and perhaps it is. In the meantime, keep it in
mind. If you are working with federated tables, be careful with your
INFORMATION_SCHEMA requests.
Developer's angle: Unfortunately, it's a feature. It's the only way to get information about the table. For now, there's no way to avoid this problem.
Because federated tables strictly depend on active connections, it's a good idea to check whether the data is reachable before using a table. Here's a function that does just that, by attempting to query the information schema about a federated table and immediately checking whether it caught SQLSTATE 'HY000'.
CREATE FUNCTION fed_table_available(
p_table_schema varchar(50),
p_table_name varchar(50) )
RETURNS BOOLEAN
BEGIN
DECLARE connection_dead BOOLEAN DEFAULT FALSE;
DECLARE how_many int;
DECLARE curx CURSOR FOR
SELECT COUNT(*)
FROM information_schema.TABLES
WHERE
TABLE_SCHEMA = p_table_schema
AND TABLE_NAME = p_table_name;
DECLARE CONTINUE HANDLER FOR SQLSTATE 'HY000'
SET connection_dead = TRUE;
OPEN curx;
IF connection_dead THEN
RETURN FALSE;
ELSE
FETCH curx into how_many;
CLOSE curx;
RETURN (how_many > 0);
END IF;
END
Using this method, your application can exit gracefully if a federated table is not available.
$ ~/server1/stop.sh
server2 (test) > select fed_table_available(schema(), 'tkey_fed');
+--------------------------------------------+
| fed_table_available( schema(), 'tkey_fed') |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
1 row in set (0.05 sec)
$ ~/server1/start.sh
server2 (test) > select fed_table_available(schema(), 'tkey_fed');
+--------------------------------------------+
| fed_table_available( schema(), 'tkey_fed') |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.05 sec)
|
Now that you know something more about the table connections, you can consider the interesting things, such as how the system performs when you actually use the data, which is the purpose of the whole exercise.
Start by issuing a simple query, searching for a word that you may reasonably expect to find in the table.
server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.08 sec)
The target server will dutifully execute exactly what you asked, as the general log shows.
$ tail ~/server1/data/general.log
060612 13:37:18 4 Query SHOW TABLE STATUS LIKE 'tkey'
4 Query SELECT `id`, `word` FROM `tkey` WHERE (`word` = 'condor')
Now try something a tad more ambitious:
server2 (test) > select * from tkey_fed where word like 'condor%';
+-------+-----------+
| id | word |
+-------+-----------+
| 91569 | condor |
| 91570 | condorcet |
| 91571 | condores |
| 91572 | condors |
+-------+-----------+
4 rows in set (0.09 sec)
...and the general log shows:
060612 13:41:47 4 Query SHOW TABLE STATUS LIKE 'tkey'
4 Query SELECT `id`, `word` FROM `tkey` WHERE
( (`word` >= 'condor\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0') )
AND ( (`word` <= 'condor') )
Surprise, surprise! What's this? This query looks wrong, even though the result is as you expected. There must be something more than that. In fact, if you capture the previous query through a packet sniffer, you will see the truth:
0000 a6 00 00 00 03 53 45 4c 45 43 54 20 60 69 64 60 .....SELECT `id`
0010 2c 20 60 77 6f 72 64 60 20 46 52 4f 4d 20 60 74 , `word` FROM `t
0020 6b 65 79 60 20 57 48 45 52 45 20 20 28 20 28 60 key` WHERE ( (`
0030 77 6f 72 64 60 20 3e 3d 20 27 63 6f 6e 64 6f 72 word` >= 'condor
0040 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 \0\0\0\0\0\0\0\0
0050 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 \0\0\0\0\0\0\0\0
0060 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 \0\0\0\0\0\0\0\0
0070 27 29 20 29 20 41 4e 44 20 28 20 28 60 77 6f 72 ') ) AND ( (`wor
0080 64 60 20 3c 3d 20 27 63 6f 6e 64 6f 72 ff ff ff d` <= 'condor...
0090 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00a0 ff ff ff ff ff 27 29 20 29 20 .....') )
That's better, as odd as it may look. The server will find the string
condor, followed by any number of characters between
0x00 and 0xff (decimal 255). There might be some
issue for Unicode support, but leaving that aside for now, it's a reasonably
efficient query.
It's time to examine the ugly part. What happens when you issue the same query against the table without an index?
server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (2.07 sec)
You likely expected the query to be a little slower, but surely not that slow! Inspecting the logs reveals more:
$ tail ~/server1/data/general.log
060612 13:57:33 5 Connect server2usr@localhost on test
5 Query SET NAMES latin1
5 Query SHOW TABLE STATUS LIKE 'tnokey'
5 Query SELECT `id`, `word` FROM `tnokey`
$ tail ~/server1/data/slow_queries.log
# Time: 060612 13:57:35
# User@Host: server2usr[server2usr] @ localhost [127.0.0.1]
# Query_time: 2 Lock_time: 0 Rows_sent: 518584 Rows_examined: 518584
SELECT `id`, `word` FROM `tnokey`;
That's bad! The WHERE clause was irrelevant! As the slow
queries log shows, the federated engine sent half a million rows across the
network, leaving the burden of filtering it to the client. It was obvious that
this query would trigger a full table scan (because there is no index on that
column), and although you have to live with that, sending all the records across
the network is a burden that's best avoided. What can you do?
The truth is, for reasons I can't figure out, federated tables honor the
WHERE clause only if the linking table has an index. This looks
like a bug to me, but there is a workaround: rebuild the
federated table with a phantom key.
drop table if exists tnokey_fed; # remember: no ALTER TABLE support for federated tables
create table tnokey_fed (
id int not null primary key,
word varchar(30) not null,
KEY (word)
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tnokey';
Now test that assumption:
server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.64 sec)
This is much better. A table scan is slower than a direct index lookup (there are exceptions, but in this case the rule stays unchallenged). A look at the general log confirms this approach.
060612 14:12:10 7 Connect server2usr@localhost on test
7 Query SHOW TABLE STATUS LIKE 'tnokey'
7 Query SELECT `id`, `word` FROM `tnokey` WHERE (`word` = 'condor')
Thus, I can confidently state....
Rule number 5: The WHERE clause is honored only when using an index defined on the local server.
Do you want more evidence? Just rebuild tkey_fed without a key
for the word column, and the previous query will be as slow as the
initial one from tnokey_fed.
Developer's angle: The fix is to push down conditions for non-indexed columns, resulting in fewer table scans.
drop table if exists tkey_fed;
create table tkey_fed (
id int not null auto_increment primary key,
word varchar(30) not null
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey';
server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (1.57 sec)
$ tail ~/server1/data/general.log
060612 14:19:07 9 Connect server2usr@localhost on test
9 Query SHOW TABLE STATUS LIKE 'tkey'
9 Query SELECT `id`, `word` FROM `tkey`
Summing up on performance, the situation looks bad, but it's livable. Cope. Adding an index to a normal table is an expensive operation, but adding an index to a federated table is quite cheap, as stated in rule number two. Therefore, whenever you face a performance problem, you can cheat a bit.
Be aware that the index rule is quite crude. The federated engine will only
use (according to my tests) one column in the WHERE clause, even
if you create a compound index.
The notes so far apply also to joins. The federated engine will grab all the data it needs to perform a join, and this can easily lead to a disaster if you don't plan your queries very carefully.
This next test uses the MySQL world sample database on server1 and three federated tables on server2.
server2 (world) > select City.Name
-> from City INNER JOIN Country ON Country.Code=City.CountryCode
-> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name |
+----------------+
| Kabul |
| Qandahar |
| Herat |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.19 sec)
Remember that these are relatively small tables, with just a couple of
hundred records in Country and a few thousand in
City. If you have seen the world database before, you
know that its tables have just a primary key and no secondary indexes. The
general log will show some horrors behind the scenes.
$ tail ~/server1/data/general.log
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VUT')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VAT')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VEN')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'RUS')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VNM')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'EST')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'USA')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VIR')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'ZWE')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'PSE')
Here tail will not be much help, besides telling you that the
federated engine issued a bunch of queries from Country. You have
to dig deeper to find out how many.
$ grep 'FROM `Country`' ~/server1/data/general.log | wc -l
232
$ tail -n 236 ~/server1/data/general.log | head -n 6
Time Id Command Argument
19 Query SELECT `ID`, `Name`, `CountryCode`, `District`, `Population` FROM `City`
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'AFG')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'NLD')
First it asked for all records from City. That's
understandable because there is no key. Then, it created a query for each
CountryCode found in City: 232 queries. For a simple
join, it's quite expensive.
Once more, you can fix the problem by adding appropriate keys to both
tables. An index on City.CountryCode and one on
Country.Name will solve the problem.
server2 (world) > SELECT City.Name
-> FROM City INNER JOIN Country ON Country.Code=City.CountryCode
-> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name |
+----------------+
| Kabul |
| Qandahar |
| Herat |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.04 sec)
$ tail -n 3 ~/server1/data/general.log
25 Query SHOW TABLE STATUS LIKE 'Country'
25 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Name` = 'Afghanistan')
24 Query SELECT `ID`, `Name`, ... FROM `City` WHERE (`CountryCode` = 'AFG')
The outcome was just two queries, both using a WHERE clause.
Still, the engine is not smart enough to make one query only, but that's better
than nothing. Another improvement is to add a real index on
server1, although that is beside the point; the main problem in
this case is not the speed of record retrieval, but the amount of data passed
across the network.
Developer's angle: A bug fix is on its way for join performance.
Here comes the really bad news. No matter how much cheating, when using
COUNT, SUM, MAX, and all other aggregate
functions, the federated engine will read all of the records.
Consider again the first example, with half a million words. Try a seemingly harmless query:
server2 (test) > select count(*) from tkey_fed;
+----------+
| count(*) |
+----------+
| 518585 |
+----------+
1 row in set (2.37 sec)
If you think it's always a good idea to do a SELECT COUNT(*)
before attempting a potentially demanding query, think again.
This query took much longer than expected, and as always, the log inspection shows why:
$ tail ~/server1/data/general.log
060612 20:50:24 2 Query SHOW TABLE STATUS LIKE 'tkey'
2 Query SELECT `id`, `word` FROM `tkey`
Disaster! Five hundred thousand records went across the network just to get a count? There must be good reasons for such behavior, but from the user's point of view, it's really unacceptable. There is a workaround coming, but for now, it's enough to note....
Rule number 6: Aggregate functions with federated tables are extremely expensive.
However, don't lose heart just yet. Some relief is coming your way shortly.
|
So far, I have shown federated tables on a peer-to-peer basis; that is, as a link between two tables in two different MySQL 5.0 servers. That may be the most frequent case, but it does not have to be. There are other objects that you can federate, provided that they look like a MySQL 5.0 table.
One thing that looks like a MySQL 5.0 table (in the federated engine's limited view) is a MySQL 4.x or 3.23 table. Because the federated table course of action is currently quite crude, any table on the other side would do, provided that it belongs to a server that can answer to the federated engine basic requests.
If you don't have a 3.23 server hanging around, install one through the MySQL Sandbox:
$ ./express_install /opt/mysql/3.23.58 \
-d server3 -P 3230 -c log=general.log \
-c log-slow-queries -c log-long-format
Then, after loading the MySQL world sample
database on server3, make three federated tables from
server2, as you did for server1. (Don't forget to add
indexes for City.CountryCode and Country.Name.)
Now you can enjoy the unique pleasure of issuing a nested query on a couple of MySQL 3.23 tables and getting a meaningful result:
server2 (world) > SELECT *
-> FROM City
-> WHERE CountryCode = (
-> SELECT Code FROM Country WHERE Name = 'Afghanistan');
+----+----------------+-------------+----------+------------+
| ID | Name | CountryCode | District | Population |
+----+----------------+-------------+----------+------------+
| 1 | Kabul | AFG | Kabol | 1780000 |
| 2 | Qandahar | AFG | Qandahar | 237500 |
| 3 | Herat | AFG | Herat | 186800 |
| 4 | Mazar-e-Sharif | AFG | Balkh | 127800 |
+----+----------------+-------------+----------+------------+
4 rows in set (0.01 sec)
The federated engine did you a favor, turning your nested query into something that the old engine understands:
$ tail ~/server3/data/general.log
1 Connect server2usr@localhost on world
1 Query SHOW TABLE STATUS LIKE 'City'
2 Connect server2usr@localhost on world
2 Query SHOW TABLE STATUS LIKE 'Country'
2 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Name` = 'Afghanistan')
1 Query SELECT `ID`, `Name`, ... FROM `City` WHERE (`CountryCode` = 'AFG')
Rule number 7: You can federate any MySQL table, regardless of the server version.
Think about the possibilities. You can federate any existing MySQL table and use this link to suit several purposes. For example, you can consolidate data from several sources into a data warehouse, or you can just check from a centralized point whether all of your servers are working properly.
Another thing that looks like a table, from the particular angle of the federated engine, is a view. You can cheat the engine into linking to a view instead of a table, which opens a whole new world of opportunities.
Rule number 8: You can federate a view instead of a table.
For instance, you can mitigate the terrible performance of aggregating
functions by using views. First, create a view that simply counts
tkey records:
server1 (test) > CREATE VIEW tkey_count AS SELECT COUNT(*) AS how_many FROM tkey;
Query OK, 0 rows affected (0.00 sec)
Then, on the second server, federate that view as if it were a table:
create table tkey_count_fed
(how_many int)
ENGINE=FEDERATED
CONNECTION='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey_count';
Now try it. This time the count comes in no time at all.
server2 (test) > SELECT * FROM tkey_count_fed;
+----------+
| how_many |
+----------+
| 518584 |
+----------+
1 row in set (0.00 sec)
server2 (test) > INSERT INTO tkey_fed VALUES (NULL, 'this_was_not_there');
Query OK, 1 row affected (0.00 sec)
server2 (test) > SELECT * FROM tkey_count_fed;
+----------+
| how_many |
+----------+
| 518585 |
+----------+
1 row in set (0.00 sec)
Your satisfaction will be almost total when the general log shows that the request did not trigger any unpleasant side effects.
$ tail ~/server1/data/general.log
8 Query SHOW TABLE STATUS LIKE 'tkey_count'
8 Query SELECT `how_many` FROM `tkey_count`
060613 21:18:24 4 Query INSERT INTO `tkey` (id, word) VALUES (0, 'this_was_not_there')
060613 21:18:28 8 Query SHOW TABLE STATUS LIKE 'tkey_count'
8 Query SELECT `how_many` FROM `tkey_count`
Still, there is room for improvement. This solution does not allow a
WHERE clause for the COUNT query. All worries should
dissipate when you read the next section.
It's time to open the Wizardry department and give the federated engine a workout.
You can execute predefined functions on a remote server by implementing a simple framework based on a federated table, a view, and a few support tables.
On server1, create:
exec_table with just a record,input_table, in which to insert the wanted command and
its parameters,output_table, in which to report the result of the
command,exec_function that reads from the input table and executes the
request, writes the result to output_table, and returns 1, andexec_view that reads the only record from
exec_table and has a call to exec_function in its
WHERE clause.On server2, create:
input_table,output_table, andexec_view.The full implementation of the framework requires some more components to allow different users to execute remote commands concurrently, and some additions to allow the execution of arbitrary commands, but I won't burden you now with such details. I plan to describe the full framework in a separate article.
Rule number 9: You can exploit the federated engine to execute remote commands.
Finally, I should mention that there are plans for improving the federated engine in future releases of MySQL.
This knowledge comes from attending a presentation at the MySQL Users
Conference 2006, where Patrick Galbraith explained that the plan is to
implement a new servers table to hold the connection parameters
for a remote server. Unfortunately, this implementation won't allow a name for
the table being linked, which will have the same name as the local one. This
fact will make the usage of federated tables much harder in some situations
where you need to have, in the same database, a local table X and a link to table
X on one or more remote servers. I appeal to the MySQL developers to implement
this feature in the right way. For example, something along the lines of:
CREATE SERVER server_def (
host : 'my.host.mydomain.org',
port : 3306,
user : 'uname',
password : 'mysecret',
database : 'my_db_name'
);
CREATE TABLE xyz ( ... )
ENGINE = FEDERATED CONNECT TO 'table_name' USING server_def;
Such an enhancement would yield the clear syntax combined with the freedom of defining a table name.
Developer's angle: There will also be a foreign
tables system table, so this shouldn't be a problem
anymore.
Federated tables are one great feature in MySQL. With the right amount of documentation and test units, it can really make a difference, turning MySQL into a distributed DBMS.
Thanks to Roland Bouman and Beat Vontobel for reviewing this draft and
providing useful advice. In particular, I owe Roland for an improvement on the
fed_table_available function.
Thanks to Patrick Galbraith for answering some questions about the future development of this engine.
Giuseppe Maxia is a QA developer in MySQL community team. A system analyst with 20 years of IT experience, he has worked as a database consultant and designer for several years. He is a frequent speaker at open source events and he's the author of many articles. He lives in Sardinia (Italy).
Return to the Databases.
Copyright © 2007 O'Reilly Media, Inc.