MySQL Federated Tables: The Missing Manual
Pages: 1, 2, 3, 4, 5
Usage and Performance
Now that you know something more about the table connections, you can consider the interesting things, such as how the system performs when you actually use the data, which is the purpose of the whole exercise.
Honoring the WHERE Clause
Start by issuing a simple query, searching for a word that you may reasonably expect to find in the table.
server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.08 sec)
The target server will dutifully execute exactly what you asked, as the general log shows.
$ tail ~/server1/data/general.log
060612 13:37:18 4 Query SHOW TABLE STATUS LIKE 'tkey'
4 Query SELECT `id`, `word` FROM `tkey` WHERE (`word` = 'condor')
Now try something a tad more ambitious:
server2 (test) > select * from tkey_fed where word like 'condor%';
+-------+-----------+
| id | word |
+-------+-----------+
| 91569 | condor |
| 91570 | condorcet |
| 91571 | condores |
| 91572 | condors |
+-------+-----------+
4 rows in set (0.09 sec)
...and the general log shows:
060612 13:41:47 4 Query SHOW TABLE STATUS LIKE 'tkey'
4 Query SELECT `id`, `word` FROM `tkey` WHERE
( (`word` >= 'condor\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0') )
AND ( (`word` <= 'condor') )
Surprise, surprise! What's this? This query looks wrong, even though the result is as you expected. There must be something more than that. In fact, if you capture the previous query through a packet sniffer, you will see the truth:
0000 a6 00 00 00 03 53 45 4c 45 43 54 20 60 69 64 60 .....SELECT `id`
0010 2c 20 60 77 6f 72 64 60 20 46 52 4f 4d 20 60 74 , `word` FROM `t
0020 6b 65 79 60 20 57 48 45 52 45 20 20 28 20 28 60 key` WHERE ( (`
0030 77 6f 72 64 60 20 3e 3d 20 27 63 6f 6e 64 6f 72 word` >= 'condor
0040 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 \0\0\0\0\0\0\0\0
0050 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 \0\0\0\0\0\0\0\0
0060 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 \0\0\0\0\0\0\0\0
0070 27 29 20 29 20 41 4e 44 20 28 20 28 60 77 6f 72 ') ) AND ( (`wor
0080 64 60 20 3c 3d 20 27 63 6f 6e 64 6f 72 ff ff ff d` <= 'condor...
0090 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00a0 ff ff ff ff ff 27 29 20 29 20 .....') )
That's better, as odd as it may look. The server will find the string
condor, followed by any number of characters between
0x00 and 0xff (decimal 255). There might be some
issue for Unicode support, but leaving that aside for now, it's a reasonably
efficient query.
It's time to examine the ugly part. What happens when you issue the same query against the table without an index?
server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (2.07 sec)
You likely expected the query to be a little slower, but surely not that slow! Inspecting the logs reveals more:
$ tail ~/server1/data/general.log
060612 13:57:33 5 Connect server2usr@localhost on test
5 Query SET NAMES latin1
5 Query SHOW TABLE STATUS LIKE 'tnokey'
5 Query SELECT `id`, `word` FROM `tnokey`
$ tail ~/server1/data/slow_queries.log
# Time: 060612 13:57:35
# User@Host: server2usr[server2usr] @ localhost [127.0.0.1]
# Query_time: 2 Lock_time: 0 Rows_sent: 518584 Rows_examined: 518584
SELECT `id`, `word` FROM `tnokey`;
That's bad! The WHERE clause was irrelevant! As the slow
queries log shows, the federated engine sent half a million rows across the
network, leaving the burden of filtering it to the client. It was obvious that
this query would trigger a full table scan (because there is no index on that
column), and although you have to live with that, sending all the records across
the network is a burden that's best avoided. What can you do?
Cheating With Bogus Indexes
The truth is, for reasons I can't figure out, federated tables honor the
WHERE clause only if the linking table has an index. This looks
like a bug to me, but there is a workaround: rebuild the
federated table with a phantom key.
drop table if exists tnokey_fed; # remember: no ALTER TABLE support for federated tables
create table tnokey_fed (
id int not null primary key,
word varchar(30) not null,
KEY (word)
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tnokey';
Now test that assumption:
server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.64 sec)
This is much better. A table scan is slower than a direct index lookup (there are exceptions, but in this case the rule stays unchallenged). A look at the general log confirms this approach.
060612 14:12:10 7 Connect server2usr@localhost on test
7 Query SHOW TABLE STATUS LIKE 'tnokey'
7 Query SELECT `id`, `word` FROM `tnokey` WHERE (`word` = 'condor')
Thus, I can confidently state....
Rule number 5: The WHERE clause is honored only when using an index defined on the local server.
Do you want more evidence? Just rebuild tkey_fed without a key
for the word column, and the previous query will be as slow as the
initial one from tnokey_fed.
Developer's angle: The fix is to push down conditions for non-indexed columns, resulting in fewer table scans.
drop table if exists tkey_fed;
create table tkey_fed (
id int not null auto_increment primary key,
word varchar(30) not null
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey';
server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id | word |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (1.57 sec)
$ tail ~/server1/data/general.log
060612 14:19:07 9 Connect server2usr@localhost on test
9 Query SHOW TABLE STATUS LIKE 'tkey'
9 Query SELECT `id`, `word` FROM `tkey`
Summing up on performance, the situation looks bad, but it's livable. Cope. Adding an index to a normal table is an expensive operation, but adding an index to a federated table is quite cheap, as stated in rule number two. Therefore, whenever you face a performance problem, you can cheat a bit.
Be aware that the index rule is quite crude. The federated engine will only
use (according to my tests) one column in the WHERE clause, even
if you create a compound index.
JOINs
The notes so far apply also to joins. The federated engine will grab all the data it needs to perform a join, and this can easily lead to a disaster if you don't plan your queries very carefully.
This next test uses the MySQL world sample database on server1 and three federated tables on server2.
server2 (world) > select City.Name
-> from City INNER JOIN Country ON Country.Code=City.CountryCode
-> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name |
+----------------+
| Kabul |
| Qandahar |
| Herat |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.19 sec)
Remember that these are relatively small tables, with just a couple of
hundred records in Country and a few thousand in
City. If you have seen the world database before, you
know that its tables have just a primary key and no secondary indexes. The
general log will show some horrors behind the scenes.
$ tail ~/server1/data/general.log
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VUT')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VAT')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VEN')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'RUS')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VNM')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'EST')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'USA')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'VIR')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'ZWE')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'PSE')
Here tail will not be much help, besides telling you that the
federated engine issued a bunch of queries from Country. You have
to dig deeper to find out how many.
$ grep 'FROM `Country`' ~/server1/data/general.log | wc -l
232
$ tail -n 236 ~/server1/data/general.log | head -n 6
Time Id Command Argument
19 Query SELECT `ID`, `Name`, `CountryCode`, `District`, `Population` FROM `City`
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'AFG')
20 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Code` = 'NLD')
First it asked for all records from City. That's
understandable because there is no key. Then, it created a query for each
CountryCode found in City: 232 queries. For a simple
join, it's quite expensive.
Once more, you can fix the problem by adding appropriate keys to both
tables. An index on City.CountryCode and one on
Country.Name will solve the problem.
server2 (world) > SELECT City.Name
-> FROM City INNER JOIN Country ON Country.Code=City.CountryCode
-> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name |
+----------------+
| Kabul |
| Qandahar |
| Herat |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.04 sec)
$ tail -n 3 ~/server1/data/general.log
25 Query SHOW TABLE STATUS LIKE 'Country'
25 Query SELECT `Code`, `Name`, ... FROM `Country` WHERE (`Name` = 'Afghanistan')
24 Query SELECT `ID`, `Name`, ... FROM `City` WHERE (`CountryCode` = 'AFG')
The outcome was just two queries, both using a WHERE clause.
Still, the engine is not smart enough to make one query only, but that's better
than nothing. Another improvement is to add a real index on
server1, although that is beside the point; the main problem in
this case is not the speed of record retrieval, but the amount of data passed
across the network.
Developer's angle: A bug fix is on its way for join performance.
Using Aggregate Functions
Here comes the really bad news. No matter how much cheating, when using
COUNT, SUM, MAX, and all other aggregate
functions, the federated engine will read all of the records.
Consider again the first example, with half a million words. Try a seemingly harmless query:
server2 (test) > select count(*) from tkey_fed;
+----------+
| count(*) |
+----------+
| 518585 |
+----------+
1 row in set (2.37 sec)
If you think it's always a good idea to do a SELECT COUNT(*)
before attempting a potentially demanding query, think again.
This query took much longer than expected, and as always, the log inspection shows why:
$ tail ~/server1/data/general.log
060612 20:50:24 2 Query SHOW TABLE STATUS LIKE 'tkey'
2 Query SELECT `id`, `word` FROM `tkey`
Disaster! Five hundred thousand records went across the network just to get a count? There must be good reasons for such behavior, but from the user's point of view, it's really unacceptable. There is a workaround coming, but for now, it's enough to note....
Rule number 6: Aggregate functions with federated tables are extremely expensive.
However, don't lose heart just yet. Some relief is coming your way shortly.







