<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>O&apos;Reilly Databases</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/" />
    <link rel="self" type="application/atom+xml" href="http://www.oreillynet.com/databases/blog/atom.xml" />
   <id>tag:www.oreillynet.com,2013:/databases/blog//6</id>
    <updated>2008-09-25T00:10:07Z</updated>
    <subtitle>O&apos;Reilly Databases Blog</subtitle>    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.21</generator>
 
<entry>
    <title>MySQL backups using ZFS snapshot</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2008/04/mysql_backups_using_zfs_snapsh_1.html" />
    <id>tag:www.oreillynet.com,2008:/databases/blog//6.23422</id>
    
    <published>2008-04-11T20:34:15Z</published>
    <updated>2008-09-25T00:10:07Z</updated>
    
    <summary> One of the significant features in version 2.0 of Zmanda Recovery Manager for MySQL is MySQL backups using Solaris ZFS. Doing MySQL backups using filesystem snapshots has minimal impact on the MySQL databases. The MySQL databases are not available...</summary>
    <author>
        <name>Paddy Sreenivasan</name>
            </author>
            <category term="Articles" />
        <content type="html">
&lt;p&gt;
One of the significant features in version 2.0 of &lt;a href="http://www.zmanda.com/backup-mysql.html"&gt;Zmanda Recovery Manager for MySQL&lt;/a&gt; is MySQL backups using Solaris ZFS. Doing MySQL backups using&lt;br /&gt;
filesystem snapshots has minimal impact on the MySQL databases. The MySQL databases are unavailable for updates for less than a second, and the application impact does not depend on the size of the database.
&lt;/p&gt;
&lt;p&gt;
        ZRM 2.0 can be downloaded from the &lt;a href="http://www.zmanda.com/download-zrm.php"&gt;Zmanda downloads&lt;/a&gt; page. It supports all Linux and Solaris distributions. The documentation is available on the &lt;a href="http://mysqlbackup.zmanda.com/"&gt;ZRM wiki&lt;/a&gt;.&lt;br /&gt;
Questions about the project can be asked on the &lt;a href="http://forums.zmanda.com"&gt;ZRM forums&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
This article shows an example of how to install, configure, back up and restore MySQL databases using Zmanda Recovery Manager (ZRM) for MySQL on OpenSolaris. The example takes advantage of ZFS snapshots to do full backups.
&lt;/p&gt;
&lt;p&gt;The example assumes that the ZRM server and MySQL server are the same machine. We are backing up the MySQL database &amp;#8220;myisamnetflix&amp;#8221; to the same machine, which runs Solaris Express Community Edition snv_77 on an AMD platform.&lt;/p&gt;
&lt;h2&gt;ZRM for MySQL installation&lt;/h2&gt;
&lt;p&gt;* Installation has to be done as super user.&lt;/p&gt;
&lt;p&gt;* ZRM for MySQL works with Perl 5.8.4 from the SUNWperl584core package.&lt;/p&gt;
&lt;p&gt;* Install the perl-DBD and perl-XML-parser modules. SUNWperl-xml-parser is part of Solaris Express Community Edition.&lt;/p&gt;
&lt;p&gt;* Download the ZRM for MySQL Solaris packages from the &lt;a href="http://www.zmanda.com/download-zrm.php"&gt;Zmanda downloads&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;* Install ZRM for MySQL (ZRM server package is sufficient because MySQL server and ZRM server are the same machine).&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 # gunzip MySQLZrm-2.0-SunOS5.10-noarch.pkg.gz&lt;br /&gt;
 # pkgadd -d MySQLZrm-2.0-SunOS5.10-noarch.pkg&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt; MySQL server configuration &lt;/h2&gt;
&lt;p&gt;* Check to see if the MySQL server is running. If the MySQL server is not installed, install the SUNWmysqlr, SUNWmysqlt and SUNWmysqlu packages from the distribution. Set a password for the &amp;#8220;root&amp;#8221; MySQL user using the mysqladmin command (/usr/sfw/bin/mysqladmin --user root password boot12). We are using &amp;#8220;boot12&amp;#8221; as the root password. This user will be used for doing MySQL backups and restores. It is better to use a dedicated user with minimal privileges for MySQL backups instead of the &amp;#8220;root&amp;#8221; MySQL user.&lt;/p&gt;
&lt;p&gt;* The MySQL server has to run as the &amp;#8220;mysql&amp;#8221; user, and the &amp;#8220;mysql&amp;#8221; OS user should belong to the &amp;#8220;mysql&amp;#8221; group. The default installation of ZRM for MySQL requires the MySQL server to run as the &amp;#8220;mysql&amp;#8221; user.&lt;/p&gt;
&lt;p&gt;* Enable binary logging on the MySQL server. Binary logging must be enabled to do incremental backups of the MySQL server.&lt;/p&gt;
&lt;p&gt;* Edit the /etc/my.cnf configuration file. Add &amp;#8220;log-bin&amp;#8221; to the [mysqld] section.&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 [mysqld]&lt;br /&gt;
 log-bin&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;* Create a ZFS filesystem as shown below. We will be storing MySQL data in /testpool/testfs directory.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 # zpool list&lt;br /&gt;
 NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT&lt;br /&gt;
 testpool  29.8G   628M  29.1G     2%  ONLINE  -&lt;br /&gt;
 # zfs list&lt;br /&gt;
 NAME                   USED  AVAIL  REFER  MOUNTPOINT&lt;br /&gt;
 testpool               521M  23.8G  33.2K  /testpool&lt;br /&gt;
 testpool/testfs        521M  23.8G   520M  /testpool/testfs&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;* Change /etc/my.cnf configuration file so that datadir points to the ZFS filesystem.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 [mysqld]&lt;br /&gt;
 datadir = /testpool/testfs/mysql/data/&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
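&lt;p&gt;The two my.cnf settings made so far (binary logging and a datadir on the ZFS filesystem) can be checked before restarting the server. The following is a minimal Python sketch, not part of ZRM: the function name and the sample text are made up for illustration, and it assumes a simple option file without duplicate keys or &lt;code&gt;!include&lt;/code&gt; directives, which Python's configparser does not understand.&lt;/p&gt;
&lt;pre&gt;
```python
import configparser

# Sample my.cnf content mirroring the two settings made in this article.
SAMPLE_MY_CNF = """
[mysqld]
log-bin
datadir = /testpool/testfs/mysql/data/
"""


def check_mysqld_settings(text, zfs_mountpoint):
    """Report whether binary logging is on and datadir lives under the ZFS mount."""
    # allow_no_value lets bare options such as "log-bin" parse without "= value".
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(text)
    section = parser["mysqld"]
    datadir = section.get("datadir", "")
    return {
        "binary_logging": "log-bin" in section,
        "datadir_on_zfs": datadir.startswith(zfs_mountpoint.rstrip("/") + "/"),
    }


result = check_mysqld_settings(SAMPLE_MY_CNF, "/testpool/testfs")
print(result)
```
&lt;/pre&gt;
&lt;p&gt;To check a real system, read the text of /etc/my.cnf and pass it in place of the sample string.&lt;/p&gt;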
&lt;p&gt;* Restart the MySQL server. The MySQL server is listening on the default port 3306.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
    mysql   616   592   0   Mar 03 console    1:2:13 /usr/sfw/sbin/mysqld --basedir=/usr/sfw --datadir=/testpool/testfs/mysql/data/&lt;br /&gt;
&lt;/code&gt;&lt;br /&gt;
* We have a MySQL database &amp;#8220;myisamnetflix&amp;#8221; that contains two tables. We will be backing up this database, which uses the MyISAM storage engine.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 mysql&gt; show databases;&lt;br /&gt;
 +--------------------+&lt;br /&gt;
 &amp;#124; Database           &amp;#124;&lt;br /&gt;
 +--------------------+&lt;br /&gt;
 &amp;#124; information_schema &amp;#124;&lt;br /&gt;
 &amp;#124; myisamnetflix      &amp;#124;&lt;br /&gt;
 &amp;#124; mysql              &amp;#124;&lt;br /&gt;
 +--------------------+&lt;br /&gt;
 3 rows in set (0.00 sec)&lt;/p&gt;
&lt;p&gt; mysql&gt; use myisamnetflix;&lt;br /&gt;
 Reading table information for completion of table and column names&lt;br /&gt;
 You can turn off this feature to get a quicker startup with -A&lt;/p&gt;
&lt;p&gt; Database changed&lt;br /&gt;
 mysql&gt; show tables;&lt;br /&gt;
 +-------------------------+&lt;br /&gt;
 &amp;#124; Tables_in_myisamnetflix &amp;#124;&lt;br /&gt;
 +-------------------------+&lt;br /&gt;
 &amp;#124; MovieID                 &amp;#124;&lt;br /&gt;
 &amp;#124; MovieRatings            &amp;#124;&lt;br /&gt;
 +-------------------------+&lt;br /&gt;
 2 rows in set (0.00 sec)&lt;/p&gt;
&lt;p&gt; mysql&gt; select count(*) from MovieID;&lt;br /&gt;
 +----------+&lt;br /&gt;
 &amp;#124; count(*) &amp;#124;&lt;br /&gt;
 +----------+&lt;br /&gt;
 &amp;#124;    17770 &amp;#124;&lt;br /&gt;
 +----------+&lt;br /&gt;
&lt;/code&gt;&lt;br /&gt;
* The MySQL client commands are expected to be installed in the &lt;i&gt;/usr/bin/&lt;/i&gt; directory. If they are not, configure the client command location and binary log location accordingly in &lt;i&gt;mysql-zrm.conf&lt;/i&gt;.&lt;/p&gt;
&lt;h2&gt; ZRM configuration &lt;/h2&gt;
&lt;p&gt;* This should be done as &lt;i&gt;mysql&lt;/i&gt; user&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 $ id&lt;br /&gt;
 uid=100(mysql) gid=100(mysql)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;* Create the backup set directory. The backup set is called &amp;#8220;zfstest&amp;#8221;.&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 $ mkdir /etc/mysql-zrm/zfstest&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;* Create the mysql-zrm.conf configuration file. The &amp;#8220;myisamnetflix&amp;#8221; database is being backed up. The location of the MySQL client commands has been specified in the &amp;#8220;mysql-binpath&amp;#8221; parameter. The ZFS snapshot plugin has been specified as the backup method.&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 $ cat /etc/mysql-zrm/zfstest/mysql-zrm.conf&lt;br /&gt;
 mysql-binpath="/usr/sfw/bin"&lt;br /&gt;
 host="localhost"&lt;br /&gt;
 databases="myisamnetflix"&lt;br /&gt;
 password="boot12"&lt;br /&gt;
 user="root"&lt;br /&gt;
 snapshot-plugin="/usr/share/mysql-zrm/plugins/zfs-snapshot.pl"&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
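&lt;p&gt;A backup set configuration like the one above is a list of key="value" lines, so it is easy to sanity-check before the first backup. The sketch below is an illustration only: the function names, the treatment of lines starting with # as comments, and the list of keys checked are assumptions, not ZRM behavior.&lt;/p&gt;
&lt;pre&gt;
```python
SAMPLE_CONF = '''
mysql-binpath="/usr/sfw/bin"
host="localhost"
databases="myisamnetflix"
password="boot12"
user="root"
snapshot-plugin="/usr/share/mysql-zrm/plugins/zfs-snapshot.pl"
'''


def parse_zrm_conf(text):
    """Parse key="value" lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_keys(settings, required):
    """Return the required keys that the configuration does not define."""
    return [key for key in required if key not in settings]


conf = parse_zrm_conf(SAMPLE_CONF)
print(missing_keys(conf, ["databases", "user", "password", "snapshot-plugin"]))
```
&lt;/pre&gt;
&lt;p&gt;An empty list means every key named in the check is present in the backup set configuration.&lt;/p&gt;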
&lt;h2&gt; Perform ZRM backups &lt;/h2&gt;
&lt;p&gt;* This should be done as &lt;i&gt;mysql&lt;/i&gt; user&lt;/p&gt;
&lt;p&gt;* Perform full backup of the database immediately using &lt;i&gt;mysql-zrm-scheduler&lt;/i&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 $ mysql-zrm-scheduler --now --backup-set zfstest --backup-level 0&lt;br /&gt;
 schedule:INFO: ZRM for MySQL Community Edition - version 2.0&lt;br /&gt;
 Logging to /var/log/mysql-zrm/mysql-zrm-scheduler.log&lt;br /&gt;
 backup:INFO: ZRM for MySQL Community Edition - version 2.0&lt;br /&gt;
 zfstest:backup:INFO: START OF BACKUP&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Initialization&lt;br /&gt;
 zfstest:backup:INFO: backup-set=zfstest&lt;br /&gt;
 zfstest:backup:INFO: backup-date=20080326053921&lt;br /&gt;
 zfstest:backup:INFO: mysql-server-os=Linux/Unix&lt;br /&gt;
 zfstest:backup:INFO: host=localhost&lt;br /&gt;
 zfstest:backup:INFO: backup-date-epoch=1206535161&lt;br /&gt;
 zfstest:backup:INFO: mysql-zrm-version=ZRM for MySQL Community Edition - version 2.0&lt;br /&gt;
 zfstest:backup:INFO: mysql-version=4.0.24-log&lt;br /&gt;
 zfstest:backup:INFO: backup-directory=/var/lib/mysql-zrm/zfstest/20080326053921&lt;br /&gt;
 zfstest:backup:INFO: backup-level=0&lt;br /&gt;
 zfstest:backup:INFO: backup-mode=raw&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Initialization&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Running pre backup plugin&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Running pre backup plugin&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Flushing logs&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Flushing logs&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Creating snapshot based backup&lt;br /&gt;
 zfstest:backup:INFO: File       Position        Binlog_do_db    Binlog_ignore_db&lt;br /&gt;
 mysql-bin.045   4&lt;br /&gt;
 zfstest:backup:INFO: innodb-data=/testpool/testfs/mysql/data/ibdata1;&lt;br /&gt;
 zfstest:backup:INFO: innodb-logs=/testpool/testfs/mysql/data/./ib_logfile*&lt;br /&gt;
 zfstest:backup:INFO: raw-databases-snapshot=myisamnetflix&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Creating snapshot based backup&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Find table type&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Find table type&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Calculating backup size &amp; checksums&lt;br /&gt;
 zfstest:backup:INFO: next-binlog=mysql-bin.045&lt;br /&gt;
 zfstest:backup:INFO: backup-size=261.87 MB&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Calculating backup size &amp; checksums&lt;br /&gt;
 zfstest:backup:INFO: read-locks-time=00:00:01&lt;br /&gt;
 zfstest:backup:INFO: flush-logs-time=00:00:00&lt;br /&gt;
 zfstest:backup:INFO: backup-time=00:02:19&lt;br /&gt;
 zfstest:backup:INFO: backup-status=Backup succeeded&lt;br /&gt;
 zfstest:backup:INFO: Backup succeeded&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Running post backup plugin&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Running post backup plugin&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Mailing backup report&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Mailing backup report&lt;br /&gt;
 zfstest:backup:INFO: PHASE START: Cleanup&lt;br /&gt;
 zfstest:backup:INFO: PHASE END: Cleanup&lt;br /&gt;
 zfstest:backup:INFO: END OF BACKUP&lt;br /&gt;
 /usr/bin/mysql-zrm started successfully&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
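&lt;p&gt;The log above reports how long read locks were held (read-locks-time) next to the total backup time. As a rough sketch, the key=value lines can be pulled out of the log and compared. The sample text repeats a few lines of the log shown above, and the function names are made up for this illustration.&lt;/p&gt;
&lt;pre&gt;
```python
SAMPLE_LOG = """\
zfstest:backup:INFO: backup-level=0
zfstest:backup:INFO: backup-size=261.87 MB
zfstest:backup:INFO: read-locks-time=00:00:01
zfstest:backup:INFO: flush-logs-time=00:00:00
zfstest:backup:INFO: backup-time=00:02:19
zfstest:backup:INFO: backup-status=Backup succeeded
zfstest:backup:INFO: Backup succeeded
"""


def parse_backup_log(text, backup_set):
    """Collect key=value pairs from the INFO lines of one backup set."""
    prefix = backup_set + ":backup:INFO: "
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        key, sep, value = line[len(prefix):].partition("=")
        if sep:
            fields[key] = value
    return fields


def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


fields = parse_backup_log(SAMPLE_LOG, "zfstest")
locked = to_seconds(fields["read-locks-time"])
total = to_seconds(fields["backup-time"])
print(locked, "of", total, "seconds were spent holding read locks")
```
&lt;/pre&gt;
&lt;p&gt;For this backup, read locks were held for 1 second out of a 139 second run, which is the point of snapshot based backups: the copy itself happens after the locks are released.&lt;/p&gt;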
&lt;p&gt;* Delete some entries from the &amp;#8220;myisamnetflix&amp;#8221; database (so that we can do an incremental backup of the database). We are deleting all movies whose title contains &amp;#8220;Sherlock&amp;#8221;.&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 mysql&gt; use myisamnetflix;&lt;br /&gt;
 Reading table information for completion of table and column names&lt;br /&gt;
 You can turn off this feature to get a quicker startup with -A&lt;/p&gt;
&lt;p&gt; Database changed&lt;/p&gt;
&lt;p&gt; mysql&gt; delete from MovieID where MovieTitle regexp 'Sherlock*';&lt;br /&gt;
Query OK, 31 rows affected (0.13 sec)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;* Perform incremental backup of the backup set.&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 $ mysql-zrm-scheduler --now --backup-set zfstest --backup-level 1&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt; ZRM backup reports &lt;/h2&gt;
&lt;p&gt;* Use &lt;i&gt;mysql-zrm-reporter&lt;/i&gt; to look at the status of backups available&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 $ /usr/bin/mysql-zrm-reporter --where backup-set=zfstest --show backup-status-info&lt;/p&gt;
&lt;p&gt; REPORT TYPE : backup-status-info&lt;/p&gt;
&lt;p&gt;           backup_set  backup_date                  backup_level  backup_status         comment&lt;br /&gt;
 -----------------------------------------------------------------------------------------------------------&lt;br /&gt;
              zfstest  Wed Mar 26 07:10:26 2008                1  Backup succeeded      ----&lt;br /&gt;
              zfstest  Wed Mar 26 05:39:21 2008                0  Backup succeeded      ----&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;* ZRM reports can also show the impact of backups on the MySQL application.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 $ /usr/bin/mysql-zrm-reporter --where backup-set=zfstest --show  backup-app-performance-info&lt;/p&gt;
&lt;p&gt; REPORT TYPE : backup-app-performance-info&lt;/p&gt;
&lt;p&gt;           backup_set  backup_date                  backup_level     backup_size  backup_time   read_locks_time     flush_logs_time&lt;br /&gt;
 -------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
              zfstest  Wed Mar 26 07:10:26 2008                1         0.00 MB  00:00:01      00:00:00            00:00:00&lt;br /&gt;
              zfstest  Wed Mar 26 05:39:21 2008                0       261.87 MB  00:02:19      00:00:01            00:00:00&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt; Database recovery &lt;/h2&gt;
&lt;p&gt;* Use ZRM reporting tool to identify the location of MySQL backup images.&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 $ /usr/bin/mysql-zrm-reporter --where backup-set=zfstest --show restore-info&lt;/p&gt;
&lt;p&gt; REPORT TYPE : restore-info&lt;/p&gt;
&lt;p&gt;           backup_set  backup_date                  backup_level  backup_directory                          backup_status         comment&lt;br /&gt;
 -----------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
              zfstest  Wed Mar 26 07:10:26 2008                1  /var/lib/mysql-zrm/zfstest/20080326071026  Backup succeeded      ----&lt;br /&gt;
              zfstest  Wed Mar 26 05:39:21 2008                0  /var/lib/mysql-zrm/zfstest/20080326053921  Backup succeeded      ----&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
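&lt;p&gt;In the reports and logs above, the last component of each backup directory matches the backup date (for example, backup-date=20080326053921). Assuming that naming holds, a small sketch can pick the most recent full backup from the restore-info rows. The list below is typed in by hand from the report, and the helper names are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import datetime

# (backup_level, backup_directory) pairs copied from the restore-info report.
BACKUPS = [
    (1, "/var/lib/mysql-zrm/zfstest/20080326071026"),
    (0, "/var/lib/mysql-zrm/zfstest/20080326053921"),
]


def backup_time(directory):
    """Read the timestamp encoded in the last path component."""
    stamp = directory.rstrip("/").rsplit("/", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def latest_full_backup(backups):
    """Return the directory of the most recent level 0 (full) backup."""
    full = [path for level, path in backups if level == 0]
    return max(full, key=backup_time)


chosen = latest_full_backup(BACKUPS)
print(chosen, backup_time(chosen).isoformat())
```
&lt;/pre&gt;
&lt;p&gt;The directory it selects is the one to pass to mysql-zrm-restore with --source-directory.&lt;/p&gt;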
&lt;p&gt;* You can parse incremental backups to identify database events of interest. In our example, we will look for the &amp;#8220;DELETE&amp;#8221; event.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 $ /usr/bin/mysql-zrm-parse-binlogs --source-directory /var/lib/mysql-zrm/zfstest/20080326071026  --mysql-binpath /usr/sfw/bin&lt;br /&gt;
 parse-binlogs:INFO: ZRM for MySQL Community Edition - version 2.0&lt;br /&gt;
 ------------------------------------------------------------&lt;br /&gt;
 Log filename &amp;#124; Log Position &amp;#124; Timestamp &amp;#124; Event Type &amp;#124; Event&lt;br /&gt;
 ------------------------------------------------------------&lt;br /&gt;
 /var/lib/mysql-zrm/zfstest/20080326071026/mysql-bin.045 &amp;#124; 4 &amp;#124; 08-03-26 07:08:59 &amp;#124; Query &amp;#124; use myisamnetflix; delete from MovieID where MovieTitle regexp 'Sherlock Holmes*';&lt;br /&gt;
 /var/lib/mysql-zrm/zfstest/20080326071026/mysql-bin.045 &amp;#124; 110 &amp;#124; 08-03-26 07:09:27 &amp;#124; Query &amp;#124; delete from MovieID where MovieTitle regexp 'Sherlock*';&lt;br /&gt;
 /var/lib/mysql-zrm/zfstest/20080326071026/mysql-bin.045 &amp;#124; 209 &amp;#124; 08-03-26 07:10:26 &amp;#124; Rotate to mysql-bin.046  pos: 4 &amp;#124;&lt;br /&gt;
 ------------------------------------------------------------&lt;br /&gt;
&lt;/code&gt;&lt;br /&gt;
* Restore the database from the full backup done at 05:39:21&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
 $ /usr/bin/mysql-zrm-restore --mysql-binpath /usr/sfw/bin --user=root --password=boot12 --source-directory=/var/lib/mysql-zrm/zfstest/20080326053921&lt;br /&gt;
restore:INFO: ZRM for MySQL Community Edition - version 2.0&lt;br /&gt;
 BackupSet1:restore:INFO: Restored database from raw backup: myisamnetflix&lt;br /&gt;
 BackupSet1:restore:INFO: Restore done in 336 seconds.&lt;br /&gt;
 MySQL server has been shutdown. Please restart after verification.&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
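&lt;p&gt;The mysql-zrm-parse-binlogs listing shown earlier in this section is pipe separated, so events of interest can be picked out with a few lines of code. This sketch assumes five fields per event line and no pipe characters inside the SQL text. The sample repeats the three event lines from the output above, and the function name is made up.&lt;/p&gt;
&lt;pre&gt;
```python
SAMPLE_EVENTS = """\
/var/lib/mysql-zrm/zfstest/20080326071026/mysql-bin.045 | 4 | 08-03-26 07:08:59 | Query | use myisamnetflix; delete from MovieID where MovieTitle regexp 'Sherlock Holmes*';
/var/lib/mysql-zrm/zfstest/20080326071026/mysql-bin.045 | 110 | 08-03-26 07:09:27 | Query | delete from MovieID where MovieTitle regexp 'Sherlock*';
/var/lib/mysql-zrm/zfstest/20080326071026/mysql-bin.045 | 209 | 08-03-26 07:10:26 | Rotate to mysql-bin.046  pos: 4 |
"""


def find_events(text, keyword):
    """Return (log file, position, timestamp) for events mentioning keyword."""
    hits = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 5 or not parts[1].isdigit():
            continue  # header, separator, or malformed line
        logfile, position, timestamp, _event_type, event = parts
        if keyword.lower() in event.lower():
            hits.append((logfile, int(position), timestamp))
    return hits


for hit in find_events(SAMPLE_EVENTS, "delete"):
    print(hit)
```
&lt;/pre&gt;
&lt;p&gt;Both DELETE statements are reported with their log positions (4 and 110), while the rotate event is skipped.&lt;/p&gt;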
&lt;p&gt;* Restart the MySQL server. ZRM for MySQL shuts down the database after recovery so that the contents can be verified.&lt;/p&gt;
&lt;p&gt;* Verify database recovery&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
 mysql&gt; use myisamnetflix;&lt;br /&gt;
 Reading table information for completion of table and column names&lt;br /&gt;
 You can turn off this feature to get a quicker startup with -A&lt;/p&gt;
&lt;p&gt; Database changed&lt;br /&gt;
 mysql&gt; select * from MovieID where MovieTitle regexp 'Sherlock*';&lt;br /&gt;
 +---------+------+------------------------------------------+&lt;br /&gt;
 &amp;#124; MovieID &amp;#124; Year &amp;#124; MovieTitle                               &amp;#124;&lt;br /&gt;
 +---------+------+------------------------------------------+&lt;br /&gt;
 &amp;#124;     742 &amp;#124; 2002 &amp;#124; Sherlock: Case of Evil                   &amp;#124;&lt;br /&gt;
 &amp;#124;     757 &amp;#124; 1984 &amp;#124; Sherlock Hound                           &amp;#124;&lt;br /&gt;
 &amp;#124;     777 &amp;#124; 1944 &amp;#124; Sherlock Holmes and the Spider Woman     &amp;#124;&lt;br /&gt;
 &amp;#124;     804 &amp;#124; 1944 &amp;#124; Sherlock Holmes: The Scarlet Claw        &amp;#124;&lt;br /&gt;
 &amp;#124;    1332 &amp;#124; 1954 &amp;#124; Sherlock Holmes                          &amp;#124;&lt;br /&gt;
 &amp;#124;    2688 &amp;#124; 1994 &amp;#124; The Memoirs of Sherlock Holmes           &amp;#124;&lt;br /&gt;
 &amp;#124;    4013 &amp;#124; 1993 &amp;#124; Sherlock Holmes: The Eligible Bachelor   &amp;#124;&lt;br /&gt;
 &amp;#124;    4902 &amp;#124; 1985 &amp;#124; Young Sherlock Holmes                    &amp;#124;&lt;br /&gt;
 &amp;#124;    5569 &amp;#124; 1939 &amp;#124; The Adventures of Sherlock Holmes        &amp;#124;&lt;br /&gt;
 &amp;#124;    6468 &amp;#124; 2000 &amp;#124; The Sherlock Holmes Collection           &amp;#124;&lt;br /&gt;
 &amp;#124;    7159 &amp;#124; 1942 &amp;#124; Sherlock Holmes and the Secret Weapon    &amp;#124;&lt;br /&gt;
 &amp;#124;    7741 &amp;#124; 1943 &amp;#124; Sherlock Holmes in Washington            &amp;#124;&lt;br /&gt;
 &amp;#124;    8654 &amp;#124; 1984 &amp;#124; Sherlock Holmes: The Hound of the Basker &amp;#124;&lt;br /&gt;
 &amp;#124;    8860 &amp;#124; 1984 &amp;#124; The Adventures of Sherlock Holmes        &amp;#124;&lt;br /&gt;
 &amp;#124;   10289 &amp;#124; 1993 &amp;#124; Sherlock Holmes: The Last Vampyre        &amp;#124;&lt;br /&gt;
 &amp;#124;   10450 &amp;#124; 1943 &amp;#124; Sherlock Holmes Faces Death              &amp;#124;&lt;br /&gt;
 &amp;#124;   11142 &amp;#124; 1991 &amp;#124; The Casebook of Sherlock Holmes          &amp;#124;&lt;br /&gt;
 &amp;#124;   12176 &amp;#124; 1999 &amp;#124; The Fall and Rise of Sherlock Holmes     &amp;#124;&lt;br /&gt;
 &amp;#124;   12416 &amp;#124; 1942 &amp;#124; Sherlock Holmes and the Voice of Terror  &amp;#124;&lt;br /&gt;
 &amp;#124;   12871 &amp;#124; 1926 &amp;#124; Our Hospitality / Sherlock Jr.           &amp;#124;&lt;br /&gt;
 &amp;#124;   12929 &amp;#124; 1944 &amp;#124; Sherlock Holmes: The Pearl of Death      &amp;#124;&lt;br /&gt;
 &amp;#124;   13253 &amp;#124; 1946 &amp;#124; Sherlock Holmes: Dressed to Kill         &amp;#124;&lt;br /&gt;
 &amp;#124;   14577 &amp;#124; 1970 &amp;#124; The Private Life of Sherlock Holmes      &amp;#124;&lt;br /&gt;
 &amp;#124;   14636 &amp;#124; 1945 &amp;#124; Sherlock Holmes: In Pursuit to Algiers   &amp;#124;&lt;br /&gt;
 &amp;#124;   14859 &amp;#124; 1986 &amp;#124; The Return of Sherlock Holmes            &amp;#124;&lt;br /&gt;
 &amp;#124;   15353 &amp;#124; 1945 &amp;#124; Sherlock Holmes: The House of Fear       &amp;#124;&lt;br /&gt;
 &amp;#124;   15676 &amp;#124; 1946 &amp;#124; Sherlock Holmes: Terror by Night         &amp;#124;&lt;br /&gt;
 &amp;#124;   15987 &amp;#124; 1987 &amp;#124; Sherlock Holmes: The Sign of Four        &amp;#124;&lt;br /&gt;
 &amp;#124;   16089 &amp;#124; 2004 &amp;#124; Sherlock Holmes and the Case of the Silk &amp;#124;&lt;br /&gt;
 &amp;#124;   16729 &amp;#124; 1945 &amp;#124; Sherlock Holmes: The Woman in Green      &amp;#124;&lt;br /&gt;
 &amp;#124;   17684 &amp;#124; 1992 &amp;#124; Sherlock Holmes: The Master Blackmailer  &amp;#124;&lt;br /&gt;
 +---------+------+------------------------------------------+&lt;br /&gt;
 31 rows in set (0.15 sec)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
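&lt;p&gt;The restored rows include titles such as &amp;#8220;Young Sherlock Holmes&amp;#8221;, because the pattern 'Sherlock*' is not anchored: it matches &amp;#8220;Sherloc&amp;#8221; followed by zero or more &amp;#8220;k&amp;#8221; characters anywhere in the title. The sketch below uses Python's re module as a stand-in for MySQL's regexp operator. For patterns this simple the two behave the same way, apart from MySQL matching case-insensitively on non-binary strings. The last title in the list is invented to show the effect of the trailing star.&lt;/p&gt;
&lt;pre&gt;
```python
import re

TITLES = [
    "Sherlock Hound",
    "Young Sherlock Holmes",
    "The Memoirs of Sherlock Holmes",
    "Our Hospitality / Sherlock Jr.",
    "Sherloc Unlikely Title",  # hypothetical title, not in the data set
]

UNANCHORED = re.compile(r"Sherlock*")  # pattern used in the article
ANCHORED = re.compile(r"^Sherlock")    # matches only at the start of a title

contains = [title for title in TITLES if UNANCHORED.search(title)]
starts_with = [title for title in TITLES if ANCHORED.search(title)]
print(len(contains), len(starts_with))
```
&lt;/pre&gt;
&lt;p&gt;To delete only the titles that begin with &amp;#8220;Sherlock&amp;#8221;, an anchored pattern such as '^Sherlock' would be needed in the DELETE statement.&lt;/p&gt;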
&lt;p&gt;
Now that you have read to the end, it is time to &lt;a href="http://www.zmanda.com/download-zrm.php"&gt;download&lt;/a&gt; ZRM and try it.
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Improved Snapshot interface for MySQL backups</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2008/04/improved_snapshot_interface_fo.html" />
    <id>tag:www.oreillynet.com,2008:/databases/blog//6.23357</id>
    
    <published>2008-04-04T18:41:36Z</published>
    <updated>2008-04-04T18:41:42Z</updated>
    
    <summary>We have significantly improved the snapshot interface for doing MySQL backups using ZRM. This work has been released as part of ZRM 2.0. ZRM 2.0 has couple of snapshot plugins - Linux LVM and Solaris ZFS that uses the interface....</summary>
    <author>
        <name>Paddy Sreenivasan</name>
            </author>
            <category term="News" />
        <content type="html">
&lt;p&gt;We have significantly improved the snapshot interface for doing MySQL backups using &lt;a href="http://www.zmanda.com/backup-mysql.html"&gt;ZRM&lt;/a&gt;. This work has been released as part of ZRM 2.0. ZRM 2.0 has a couple of snapshot plugins, Linux LVM and Solaris ZFS, that use the interface.&lt;/p&gt;
&lt;p&gt;Changes in ZRM 2.0:&lt;br /&gt;
* Solaris packages&lt;br /&gt;
* ZRM clients for Linux (RPM/Debian) and Solaris&lt;br /&gt;
* Tested on Gentoo distribution&lt;br /&gt;
* Improved Snapshot plugin interface&lt;br /&gt;
* Solaris ZFS snapshot plugin&lt;br /&gt;
* Backup of remote servers using snapshots&lt;br /&gt;
* Asynchronous checksum computation for improved backup performance&lt;br /&gt;
* Backup compression on the fly for logical backups&lt;/p&gt;
&lt;p&gt;Download it from the &lt;a href="http://www.zmanda.com/download-zrm.php"&gt;Zmanda&lt;/a&gt; downloads page and give it a try. I will write more about how to use the plugin interface next week.&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Reporting MySQL Internals with Information Schema plug-ins</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2008/02/reporting_mysql_internals_with.html" />
    <id>tag:www.oreillynet.com,2008:/databases/blog//6.22954</id>
    
    <published>2008-02-10T23:04:51Z</published>
    <updated>2008-02-11T20:21:37Z</updated>
    
    <summary>Last week, I described how to use the MySQL plug-in API to write a minimal &apos;Hello world!&apos; information schema plug-in. The main purpose of that plug-in is to illustrate the bare essentials of the MySQL information schema plug-in interface. In...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Articles" />
        <content type="html">
&lt;p&gt;Last week, &lt;a href="http://rpbouman.blogspot.com/2008/02/mysql-information-schema-plugins-best.html" target="_rpb"&gt;I described&lt;/a&gt; how to use the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugin-api.html" target="_mysql"&gt;MySQL plug-in API&lt;/a&gt; to write a minimal &lt;a href="http://www.xcdsql.org/MySQL/Plugin/mysql_is_hello.cc" target="_rpb"&gt;&amp;#8216;Hello world!&amp;#8217; information schema plug-in&lt;/a&gt;. The main purpose of that plug-in is to illustrate the bare essentials of the MySQL information schema plug-in interface. &lt;/p&gt;
&lt;p&gt;In this article, I&amp;#8217;d like to take that to the next level and demonstrate how to write an information schema plug-in that can access some of the internals of the MySQL server. For this particular purpose, we will focus on a plug-in that reports all the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/savepoints.html" target="_mysql"&gt;&lt;code&gt;SAVEPOINT&lt;/code&gt;&lt;/a&gt;s available in the current session. This &lt;a href="http://www.xcdsql.org/MySQL/Plugin/mysql_is_savepoints.cc" target="_rpb"&gt;&lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt;&lt;/a&gt; plug-in may be of some value when debugging scripts and stored routines that rely on complex scenarios using transactions and savepoints. &lt;/p&gt;
&lt;p&gt;In a forthcoming article, I will describe a few information schema plug-ins that are arguably more interesting, such as a plug-in to list the currently existing &lt;code&gt;TEMPORARY&lt;/code&gt; tables, &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/user-variables.html" target="_mysql"&gt;user-defined variables&lt;/a&gt;, and the contents of the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/query-cache.html" target="_href"&gt;query cache&lt;/a&gt;. Although the plug-in described in this article may be of some use, its main purpose is to illustrate the minimal requirements for plug-ins that can access the server&amp;#8217;s internals.&lt;/p&gt;
&lt;h3&gt;A Quick Recapitulation&lt;/h3&gt;
&lt;p&gt;You might recall that:
&lt;ul&gt;
&lt;li&gt;The MySQL plug-in API is one of the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-nutshell.html" target="_mysql"&gt;new features&lt;/a&gt; in &lt;a href="http://dev.mysql.com/downloads/mysql/5.1.html" target="_mysql"&gt;MySQL 5.1&lt;/a&gt;, and forms a &lt;em&gt;generic extension point&lt;/em&gt; of the MySQL database server, allowing privileged database users to add functionality to the MySQL Server by loading a shared library from the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#option_mysqld_plugin_dir" target="_mysql"&gt;plug-in directory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Loading and unloading a plug-in is a completely dynamic process controlled using the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/install-plugin.html" target="_mysql"&gt;&lt;code&gt;INSTALL PLUGIN&lt;/code&gt;&lt;/a&gt; and &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/uninstall-plugin.html" target="_mysql"&gt;&lt;code&gt;UNINSTALL PLUGIN&lt;/code&gt;&lt;/a&gt; syntax, and does &lt;em&gt;not&lt;/em&gt; involve compiling the server or even restarting it&lt;/li&gt;
&lt;li&gt;There are several types of plug-ins, the most well-known being &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html" target="_mysql"&gt;storage engines&lt;/a&gt; and &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugin-full-text-plugins.html" target="_mysql"&gt;full-text parsers&lt;/a&gt;. Less well-known types include information schema and daemon plug-ins.&lt;/li&gt;
&lt;li&gt;An information schema plug-in provides the implementation of a table (or actually, a system view) in the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/information-schema.html" target="_mysql"&gt;information_schema database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Plug-ins are usually implemented in C/C++. To implement a plug-in, the implementor must include the header file &lt;code&gt;plugin.h&lt;/code&gt; and provide an initialized instance of the &lt;code&gt;st_mysql_plugin&lt;/code&gt; structure. In addition, the implementor must provide code to implement the plug-in type dependent part of the interface&lt;/li&gt;
&lt;li&gt;The plug-in type dependent part of the interface for information schema plug-ins consists of two things: the column definitions of the information schema table and a &lt;code&gt;fill_table&lt;/code&gt; function that is called whenever the server wants to retrieve the rows of data from that table.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;How Information Schema plug-ins can access MySQL Server internals&lt;/h3&gt;
&lt;p&gt;Before we discuss the &lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt; information schema plug-in in detail, let&amp;#8217;s take a look at the way information schema plug-ins can obtain access to the internals of the MySQL server.&lt;/p&gt;
&lt;p&gt;As recapitulated above from last week&amp;#8217;s article, the plug-in type dependent interface for information schema plug-ins consists of two things:
&lt;ul&gt;
&lt;li&gt;An array of &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; structures, each of which defines a column of the information schema table&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;fill_table&lt;/code&gt; function that is called by the server when it needs to retrieve the data from the table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The column definitions may be considered a static part of the interface: they simply define the structure of the table, no more, no less. The &lt;code&gt;fill_table&lt;/code&gt; function is a different matter. Let&amp;#8217;s take a look at the signature of that function:
&lt;pre&gt;
int fill_table(THD *thd, TABLE_LIST *tables, COND *cond);
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;TABLE_LIST *tables&lt;/code&gt; argument provides the handle to the information schema table that is being filled, and the &lt;code&gt;COND *cond&lt;/code&gt; argument represents the &lt;code&gt;WHERE&lt;/code&gt; condition of the SQL statement that is currently being handled, allowing the &lt;code&gt;fill_table&lt;/code&gt; function to directly filter rows (instead of relying on the query execution engine to do that). As such, these arguments are occupied with the actual delivery of rows of data to the server.&lt;/p&gt;
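&lt;p&gt;The division of labor between the static column definitions and the &lt;code&gt;fill_table&lt;/code&gt; callback can be modeled outside the server. The following Python sketch is only a conceptual model and not the MySQL API: the real interface is C/C++, the column names are hypothetical, a plain dictionary stands in for the session handle, and a Python callable stands in for the &lt;code&gt;COND&lt;/code&gt; argument.&lt;/p&gt;
&lt;pre&gt;
```python
# Static part: column names of a hypothetical information schema table.
COLUMNS = ("SAVEPOINT_ID", "SAVEPOINT_NAME")


def fill_table(session, rows_out, cond=None):
    """Append one row per savepoint; apply cond here instead of in the caller."""
    for index, name in enumerate(session["savepoints"], start=1):
        row = dict(zip(COLUMNS, (index, name)))
        if cond is None or cond(row):
            rows_out.append(row)
    return 0  # assumption: 0 means success, following the usual C convention


session = {"savepoints": ["before_update", "after_insert"]}  # stand-in for THD
rows = []
status = fill_table(session, rows, cond=lambda row: row["SAVEPOINT_NAME"].startswith("after"))
print(status, rows)
```
&lt;/pre&gt;
&lt;p&gt;Because the condition is evaluated inside the callback, rows that do not match are never handed back to the caller, which mirrors the reason the server passes the &lt;code&gt;WHERE&lt;/code&gt; condition to &lt;code&gt;fill_table&lt;/code&gt;.&lt;/p&gt;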
&lt;p&gt;The first argument to the &lt;code&gt;fill_table&lt;/code&gt; function offers all kinds of interesting opportunities to see what is going on inside the server. We will discuss it in more detail in the next section.&lt;br /&gt;
&lt;h4&gt;Public accessors to the current &lt;code&gt;THD&lt;/code&gt; instance&lt;/h4&gt;
&lt;p&gt;The first argument to the &lt;code&gt;fill_table&lt;/code&gt; function is &lt;code&gt;THD *thd&lt;/code&gt;. This is the so-called &lt;em&gt;thread descriptor&lt;/em&gt;, something that is best thought of as a handle to the current session.  Note that in a MySQL context, the terms &lt;em&gt;connection&lt;/em&gt;, &lt;em&gt;thread&lt;/em&gt; and &lt;em&gt;session&lt;/em&gt; are often used interchangeably. However, I find the term thread too broad, and the term connection too narrow. As there are many parts in &lt;code&gt;THD&lt;/code&gt; that maintain state regarding the events that occurred since the connection was established, it seems most sensible to think of &lt;code&gt;THD&lt;/code&gt; as the server-side implementation of a &lt;em&gt;session&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This session handle or thread descriptor has the form of a pointer to an instance of the &lt;code&gt;THD&lt;/code&gt; class. The &lt;code&gt;plugin.h&lt;/code&gt; header file contains a forward declaration of this class, but the actual declaration is contained in &lt;code&gt;sql/sql_class.h&lt;/code&gt;. The &lt;code&gt;THD&lt;/code&gt; class is one of the key data structures in understanding the workings of the MySQL server, as it is passed as an argument to many internal server functions. Consequently, it provides a wealth of possibilities to create interesting new information schema plug-ins. In fact, the number of possibilities is so great that a number of common usages have been explicitly set aside in the &lt;code&gt;plugin.h&lt;/code&gt; header file.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;plugin.h&lt;/code&gt; header file contains a number of function declarations and macros that provide access to the members of a &lt;code&gt;THD&lt;/code&gt; instance. I will not discuss all of them here, but will highlight a few to give you an idea:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;thd_test_options()&lt;/code&gt; - Find out which options are set. This can be used to find out whether a number of boolean options like &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/server-options.html#option_mysqld_big-tables" target="_mysql"&gt;&lt;code&gt;big_tables&lt;/code&gt;&lt;/a&gt;, (general and binary) logging, and &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-and-autocommit.html" target="_href"&gt;&lt;code&gt;autocommit&lt;/code&gt;&lt;/a&gt; are enabled or disabled.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thd_proc_info()&lt;/code&gt; - Should be used by the plug-in implementor before starting a potentially time-consuming operation so the rest of the world can monitor what this session is doing. The text set here corresponds to the value reported in the &lt;code&gt;State&lt;/code&gt; column by the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/show-processlist.html"&gt;&lt;code&gt;SHOW PROCESSLIST&lt;/code&gt;&lt;/a&gt; statement.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thd_killed()&lt;/code&gt; - Can be used by the plug-in implementor to find out if the thread in which this session lives was killed. If the plug-in is involved in a potentially time-consuming process, the plug-in implementor should periodically check this and gracefully abort the plug-in&amp;#8217;s work when it detects that the thread was killed.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thd_alloc()&lt;/code&gt; - Allocates some memory from this session&amp;#8217;s memory pool. If the plug-in requires some small amount of memory, plug-in implementors should use this rather than the standard &lt;code&gt;malloc()&lt;/code&gt; function. Calling &lt;code&gt;thd_alloc()&lt;/code&gt; is likely to be faster because it takes memory out of a pre-allocated pool, reducing contention. In addition, it is more convenient because the memory need not be explicitly freed: it is automatically reclaimed by the pool after handling the current statement.&lt;/li&gt;
&lt;/ul&gt;
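&lt;p&gt;To make the advice about &lt;code&gt;thd_killed()&lt;/code&gt; a bit more tangible, here is a minimal, self-contained sketch of the &amp;#8220;check periodically and bail out gracefully&amp;#8221; pattern. Mind that it does not call the real &lt;code&gt;plugin.h&lt;/code&gt; functions: &lt;code&gt;FakeSession&lt;/code&gt;, &lt;code&gt;fake_killed()&lt;/code&gt; and &lt;code&gt;sum_until_killed()&lt;/code&gt; are hypothetical stand-ins, made up so the snippet compiles without the MySQL headers:&lt;/p&gt;
&lt;pre&gt;
#include &amp;#60;cstddef&amp;#62;

/* Hypothetical stand-in for a session; this is NOT the real THD class. */
struct FakeSession {
  bool killed;            /* in a real server this is set from another thread */
  const char *proc_info;  /* what the session claims to be doing */
};

/* Hypothetical stand-in playing the role of thd_killed(). */
static int fake_killed(const FakeSession *session)
{
  return session-&gt;killed ? 1 : 0;
}

/*
  Sums the numbers 1..n, checking the kill flag every check_interval steps.
  Returns 0 when the work completed (the sum is written to *result),
  or 1 when the work was aborted (*result is left untouched).
*/
static int sum_until_killed(FakeSession *session, std::size_t n,
                            std::size_t check_interval,
                            unsigned long long *result)
{
  unsigned long long total = 0;
  if (check_interval == 0)
    check_interval = 1;   /* avoid a division by zero below */
  session-&gt;proc_info = "summing";
  for (std::size_t i = 1; i &amp;#60;= n; ++i)
  {
    if (i % check_interval == 0 &amp;&amp; fake_killed(session))
    {
      session-&gt;proc_info = "aborted";
      return 1;
    }
    total += i;
  }
  *result = total;
  session-&gt;proc_info = "done";
  return 0;
}
&lt;/pre&gt;
&lt;p&gt;In a real plug-in you would presumably call &lt;code&gt;thd_killed()&lt;/code&gt; with the &lt;code&gt;THD&lt;/code&gt; pointer where the sketch calls &lt;code&gt;fake_killed()&lt;/code&gt;, and have &lt;code&gt;fill_table&lt;/code&gt; return a non-zero status when the thread turns out to be killed.&lt;/p&gt;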
&lt;p&gt;If you are interested in seeing all these declarations, just open &lt;code&gt;plugin.h&lt;/code&gt; and look for comments like this:
&lt;pre&gt;
/*************************************************************************
  Miscellaneous functions for plugin implementors
*/
&lt;/pre&gt;
&lt;h4&gt;&lt;code&gt;plugin.h&lt;/code&gt; describes a public interface&lt;/h4&gt;
&lt;p&gt;In &lt;code&gt;plugin.h&lt;/code&gt;, the declarations as described in the previous section together form a &lt;em&gt;public interface&lt;/em&gt; to the current session. They are there for the convenience of plug-in implementors and represent a &amp;#8216;safe&amp;#8217; way to work with the &lt;code&gt;THD&lt;/code&gt; pointer passed to the &lt;code&gt;fill_table&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;To say that these form a public interface is to say that these are officially supported by MySQL AB. That is: they will be supported officially once the MySQL 5.1 Server is a generally available release. From that point on you can rely on these functions when writing plug-ins in the sense that you do not have to be afraid that they will change. At least, the public interface will remain the same for all forthcoming builds of the 5.1 server. Any interface changes in future releases will involve a proper process, giving everybody the chance to update their code well in time.&lt;/p&gt;
&lt;p&gt;Unfortunately, not every function declaration in &lt;code&gt;plugin.h&lt;/code&gt; has source code comments. This means that for now, you sometimes need to do some digging in the server&amp;#8217;s source code to find out what you can do with them. I admit that this situation is not exactly perfect. However, the matter has been &lt;a href="http://bugs.mysql.com/bug.php?id=34413" target="_mysql"&gt;reported as a bug&lt;/a&gt;, and hopefully, it will be addressed soon.&lt;br /&gt;
&lt;h4&gt;Beyond the public interface&lt;/h4&gt;
&lt;p&gt;I just described the public interface plug-in implementors can rely on. A distinct advantage of the public interface is that it takes away a lot of the complexity of the underlying internals of the MySQL Server. However, there will always be cases where the public interface does not offer the features you really need. In those cases, you simply need to  be able to work directly on the server internals.  &lt;/p&gt;
&lt;p&gt;The advantage of directly referencing the server&amp;#8217;s internals is that you can access all the interesting nuts and bolts and bits and pieces. The downside is that there is absolutely no guarantee that your code will work in another version of the server. The internals are by definition the parts that are not meant to be exposed. As such, it is possible that your code does not work or behaves unexpectedly in another version of the server. &lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s not dwell too long on the disadvantages. Instead, let&amp;#8217;s focus on the merits of pluggable information schema tables. Granted, it is inconvenient that we may need to make our code resilient to each different build of the server. However, for many applications, it is not very likely that we have to constantly do that. &lt;/p&gt;
&lt;p&gt;Even if we do have to change our code, the burden will be on the developer of the plug-in. For each specific build of the server, your code may need to be different. Even if the code itself does not change, you will probably at least have to recompile your plug-in for each specific build of the server. However, your users are still not required to recompile the server itself. They can still install the plug-in without stopping or restarting the server, which in many cases seems more important than bearing the burden of changing the code.&lt;/p&gt;
&lt;p&gt;You need to break some eggs to bake an omelet - so if you&amp;#8217;re hungry, you better get over it and start breaking some eggs ;-)&lt;br /&gt;
&lt;h4&gt;The &lt;code&gt;MYSQL_SERVER&lt;/code&gt; define&lt;/h4&gt;
&lt;p&gt;In order to access the server&amp;#8217;s internals beyond the public interface, we need to use some C/C++ preprocessor magic and define &lt;code&gt;MYSQL_SERVER&lt;/code&gt;. This define needs to be present before we include any MySQL header (or source) files:
&lt;pre&gt;
#ifndef &lt;b&gt;MYSQL_SERVER&lt;/b&gt;
#define &lt;b&gt;MYSQL_SERVER&lt;/b&gt;
#endif
&lt;/pre&gt;
&lt;p&gt;Throughout the MySQL codebase, there are many sections that are conditionally included or excluded depending on whether &lt;code&gt;MYSQL_SERVER&lt;/code&gt; is defined. It is hard to pinpoint the exact effect of adding this definition, because there are many spots that use this definition to control conditional compilation. &lt;/p&gt;
&lt;p&gt;Normally the &lt;code&gt;MYSQL_SERVER&lt;/code&gt; definition need be present only when compiling the server proper, but in this case we need it to let the plug-in code work with internal structures such as &lt;code&gt;THD&lt;/code&gt; instances directly, that is, without using the accessors provided by the public interface.&lt;/p&gt;
&lt;p&gt;To be absolutely clear: using the &lt;code&gt;MYSQL_SERVER&lt;/code&gt; define in your code does &lt;em&gt;not&lt;/em&gt; mean you must compile your plug-in as part of the server. On the contrary - you can compile your plug-ins separately from the server, and still (un)install them at runtime. The only thing the &lt;code&gt;MYSQL_SERVER&lt;/code&gt; define does is pull in the declarations that are normally considered to be &amp;#8216;internal&amp;#8217;. They will for example allow us to work directly with the members of the &lt;code&gt;THD&lt;/code&gt; class instead of being required to use the public accessors defined in &lt;code&gt;plugin.h&lt;/code&gt;.&lt;br /&gt;
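&lt;p&gt;If the idea of a define that &amp;#8216;pulls in&amp;#8217; declarations feels abstract, the following toy example may help. It is only an analogy of the mechanism and says nothing about how the MySQL headers are actually laid out: the names &lt;code&gt;DEMO_SERVER&lt;/code&gt;, &lt;code&gt;demo_session&lt;/code&gt; and &lt;code&gt;demo_open_tables()&lt;/code&gt; are all made up for the occasion:&lt;/p&gt;
&lt;pre&gt;
/* The define must come BEFORE the "header" part to have any effect. */
#ifndef DEMO_SERVER
#define DEMO_SERVER
#endif

/* ---- imagine that this part lives in a header file ---- */

/* Always visible: only a forward declaration, the members stay hidden. */
struct demo_session;

#ifdef DEMO_SERVER
/* Only visible when DEMO_SERVER is defined: the full declaration. */
struct demo_session {
  int id;
  int open_tables;
};
#endif

/* ---- end of the imaginary header, our own code starts here ---- */

/*
  Direct member access. This compiles only because DEMO_SERVER was defined
  above. Remove the define and the compiler rejects this function, because
  demo_session is then an incomplete type.
*/
int demo_open_tables(const demo_session *session)
{
  return session-&gt;open_tables;
}
&lt;/pre&gt;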
&lt;h3&gt;Implementing the &lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt; Information Schema plug-in&lt;/h3&gt;
&lt;p&gt;Now that we have sketched the background, we can quickly proceed and discuss the implementation of the &lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt; information schema plug-in. (Note that you can download the source code file &lt;a href="http://www.xcdsql.org/MySQL/Plugin/mysql_is_savepoints.cc" target="_rpb"&gt;&lt;code&gt;mysql_is_savepoints.cc&lt;/code&gt; here&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Most of the things are rather similar to what was described in &lt;a href="http://rpbouman.blogspot.com/2008/02/mysql-information-schema-plugins-best.html" target="_mysql"&gt;the article describing the &lt;code&gt;MYSQL_HELLO&lt;/code&gt;&lt;/a&gt; plug-in, for which you can still download the &lt;a href="http://www.xcdsql.org/MySQL/Plugin/mysql_is_hello.cc" target="_rpb"&gt;&lt;code&gt;mysql_is_hello.cc&lt;/code&gt;&lt;/a&gt; source code.&lt;/p&gt;
&lt;p&gt;As we did last week, we will assume the following things are in place on your system:
&lt;ul&gt;
&lt;li&gt;g++, the GNU C++ compiler&lt;/li&gt;
&lt;li&gt;The &lt;a href="http://dev.mysql.com/downloads/mysql/5.1.html#source" target="_mysql"&gt;MySQL 5.1.22 source distribution&lt;/a&gt; - we need to include some of the header files&lt;/li&gt;
&lt;li&gt;A text editor or IDE (like Eclipse with CDT)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Creating the source file&lt;/h4&gt;
&lt;p&gt;First, we need to create a C++ source file. We will assume that the working directory is ~/mysql_is_savepoints/, and that the source file is called mysql_is_savepoints.cc.&lt;br /&gt;
&lt;h4&gt;The &lt;code&gt;MYSQL_SERVER&lt;/code&gt; define&lt;/h4&gt;
&lt;p&gt;As we explained in the previous sections, we need to define &lt;code&gt;MYSQL_SERVER&lt;/code&gt; so we can directly reference the members of the &lt;code&gt;THD&lt;/code&gt; class passed to our &lt;code&gt;fill_table&lt;/code&gt; function.
&lt;pre&gt;
#ifndef MYSQL_SERVER
#define MYSQL_SERVER
#endif
&lt;/pre&gt;
&lt;p&gt;Because this affects how the included files are processed, we do this at the very top of our source file.&lt;br /&gt;
&lt;h4&gt;Include files&lt;/h4&gt;
&lt;p&gt;We can use the same list of includes we used for the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; plug-in - the &lt;code&gt;MYSQL_SERVER&lt;/code&gt; define is responsible for including all the additional things we require to write the &lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt; plug-in.
&lt;pre&gt;
#include &amp;#60;mysql_priv.h&amp;#62;
#include &amp;#60;stdlib.h&amp;#62;
#include &amp;#60;ctype.h&amp;#62;
#include &amp;#60;mysql_version.h&amp;#62;
#include &amp;#60;mysql/plugin.h&amp;#62;
#include &amp;#60;my_global.h&amp;#62;
#include &amp;#60;my_dir.h&amp;#62;
&lt;/pre&gt;
&lt;h4&gt;Defining the columns&lt;/h4&gt;
&lt;p&gt;For the &lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt; plug-in, we will define two columns: &lt;code&gt;SAVEPOINT_ID&lt;/code&gt; and &lt;code&gt;SAVEPOINT_NAME&lt;/code&gt;. At the SQL level, it will look something like this:
&lt;pre&gt;
+----------------+-------------+------+-----+---------+-------+
&amp;#124; Field          &amp;#124; Type        &amp;#124; Null &amp;#124; Key &amp;#124; Default &amp;#124; Extra &amp;#124;
+----------------+-------------+------+-----+---------+-------+
&amp;#124; SAVEPOINT_ID   &amp;#124; bigint(0)   &amp;#124; NO   &amp;#124;     &amp;#124; 0       &amp;#124;       &amp;#124;
&amp;#124; SAVEPOINT_NAME &amp;#124; varchar(64) &amp;#124; NO   &amp;#124;     &amp;#124;         &amp;#124;       &amp;#124;
+----------------+-------------+------+-----+---------+-------+
&lt;/pre&gt;
&lt;p&gt;&amp;#8230;and this is what it looks like in the C/C++ source file:
&lt;pre&gt;
#define COLUMN_SAVEPOINT_ID 0
#define COLUMN_SAVEPOINT_NAME 1

static ST_FIELD_INFO mysql_is_savepoints_field_info[]=
{
  {"SAVEPOINT_ID", 0, MYSQL_TYPE_LONGLONG, 0, 0, "Savepoint Id"},
  {"SAVEPOINT_NAME", 64, MYSQL_TYPE_STRING, 0, 0, "Savepoint Name"},
  {NULL, 0, MYSQL_TYPE_NULL, 0, 0, NULL, 0}
};
&lt;/pre&gt;
&lt;p&gt;This time, in addition to creating the &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; array of column definitions, we also create &lt;code&gt;#define&lt;/code&gt;s for the array entry indexes. The defines allow us to refer to the column definitions using the names rather than the raw, literal integer array indexes. This has the advantage that we do not have to change code should we want to change the positions of the columns. Another advantage is that our &lt;code&gt;fill_table&lt;/code&gt; code will be easier to read: by consistently referring to &lt;code&gt;COLUMN_SAVEPOINT_ID&lt;/code&gt; and &lt;code&gt;COLUMN_SAVEPOINT_NAME&lt;/code&gt; rather than &lt;code&gt;0&lt;/code&gt; and &lt;code&gt;1&lt;/code&gt; it will be much easier to see what is going on.&lt;br /&gt;
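&lt;p&gt;One thing to keep in mind is that the defines and the array must be kept in sync by hand. The sketch below shows a cheap way to check that they still agree. It assumes a trimmed-down, hypothetical &lt;code&gt;demo_field_info&lt;/code&gt; struct instead of the real &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; so that it compiles without the server headers; &lt;code&gt;demo_column_matches()&lt;/code&gt; is likewise made up for this illustration:&lt;/p&gt;
&lt;pre&gt;
#include &amp;#60;cstring&amp;#62;

/* Hypothetical, trimmed-down stand-in for ST_FIELD_INFO. */
struct demo_field_info {
  const char *field_name;
  unsigned int field_length;
};

#define COLUMN_SAVEPOINT_ID 0
#define COLUMN_SAVEPOINT_NAME 1

static demo_field_info demo_fields[]=
{
  {"SAVEPOINT_ID", 0},
  {"SAVEPOINT_NAME", 64},
  {NULL, 0}               /* terminator entry, like in the real array */
};

/* Returns 1 if the column at the given index has the expected name, else 0. */
static int demo_column_matches(int index, const char *expected_name)
{
  return std::strcmp(demo_fields[index].field_name, expected_name) == 0 ? 1 : 0;
}
&lt;/pre&gt;
&lt;p&gt;If we ever swap the two array entries but forget to swap the values of the defines, a check like &lt;code&gt;demo_column_matches(COLUMN_SAVEPOINT_ID, "SAVEPOINT_ID")&lt;/code&gt; will return &lt;code&gt;0&lt;/code&gt; and tell us so.&lt;/p&gt;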
&lt;h4&gt;Filling the table&lt;/h4&gt;
&lt;p&gt;Now we come to the heart of the matter: generating a row for each SQL &lt;code&gt;SAVEPOINT&lt;/code&gt; available in the current session.&lt;/p&gt;
&lt;p&gt;The savepoints for the current session are available in the &lt;code&gt;transaction&lt;/code&gt; member of the &lt;code&gt;THD&lt;/code&gt; class. The &lt;code&gt;transaction&lt;/code&gt; member is an instance of the &lt;code&gt;st_transactions&lt;/code&gt; struct, which is declared locally inside the &lt;code&gt;THD&lt;/code&gt; class:
&lt;pre&gt;
class THD :public Statement,
           public Open_tables_state
{

...many, many lines here...

public:

  struct st_transactions {
    &lt;code&gt;SAVEPOINT *savepoints&lt;/code&gt;;

    ...a few more lines here...

  } transaction;

...many, many more lines here ...

};
&lt;/pre&gt;
&lt;p&gt;Now you might recall that the &lt;code&gt;THD&lt;/code&gt; class is declared in &lt;code&gt;sql/sql_class.h&lt;/code&gt;. However, you might have some trouble locating the &lt;code&gt;transaction&lt;/code&gt; member, because the declaration of the &lt;code&gt;THD&lt;/code&gt; class is extremely large and long-winded: in the MySQL 5.1.22-rc source distribution, it ranges from lines 960 to 1886(!!) - and those 900 something lines make up only the declaration!&lt;/p&gt;
&lt;p&gt;(Although the official explanation for the name of the &lt;code&gt;THD&lt;/code&gt; class is that it is an acronym for &lt;b&gt;TH&lt;/b&gt;read &lt;b&gt;D&lt;/b&gt;escriptor, some developers* explained that it is one of the few class names that is spelled in capitals because it is so incredibly heavy. According to this anecdote, its name should be pronounced as &amp;#8220;&lt;b&gt;&amp;#8230;THUD!!&amp;#8230;THUD!!&amp;#8230;&lt;/b&gt;&amp;#8221; because of the sound it makes each time it is dumped into the argument list of a function that makes up the server&amp;#8217;s source code.)&lt;/p&gt;
&lt;p&gt;* = Thanks to Eric Herman for painting this creative and tangible likeness ;-)&lt;/p&gt;
&lt;p&gt;Anyway, you will find it easier when you look for &lt;code&gt;st_transactions&lt;/code&gt;, or go directly to line 1149, but note that the line number is likely to be different in other versions of the server code.&lt;/p&gt;
&lt;p&gt;Now, we can see that the &lt;code&gt;st_transactions&lt;/code&gt; struct contains a pointer to a &lt;code&gt;SAVEPOINT&lt;/code&gt;, called &lt;code&gt;savepoints&lt;/code&gt;. As we shall see later, this is actually the list of savepoints we need. But what kind of type is this &lt;code&gt;SAVEPOINT&lt;/code&gt; exactly?&lt;/p&gt;
&lt;p&gt;Well, to get past this point, you really need some patience and a set of tools that allow you to search the source code. In the case of &lt;code&gt;SAVEPOINT&lt;/code&gt;, it turns out that this is actually a &lt;code&gt;typedef&lt;/code&gt; for the &lt;code&gt;st_savepoint&lt;/code&gt; structure. Now, the odd thing is that this &lt;code&gt;typedef&lt;/code&gt; appears in &lt;code&gt;sql/handler.h&lt;/code&gt;:
&lt;pre&gt;
typedef struct st_savepoint SAVEPOINT;
&lt;/pre&gt;
&lt;p&gt;But the structure &lt;code&gt;st_savepoint&lt;/code&gt; itself is declared in &lt;code&gt;sql/sql_class.h&lt;/code&gt; - that is, the same file that declares &lt;code&gt;THD&lt;/code&gt;, which seems to prefer &lt;code&gt;SAVEPOINT&lt;/code&gt; rather than &lt;code&gt;st_savepoint&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;Well - it is beyond me why it was done like this. For our purpose it doesn&amp;#8217;t really matter though, so let&amp;#8217;s examine the declaration of &lt;code&gt;st_savepoint&lt;/code&gt; instead:
&lt;pre&gt;struct st_savepoint {
  &lt;b&gt;struct st_savepoint *prev;&lt;/b&gt;
  &lt;b&gt;char                *name;&lt;/b&gt;
  uint                 length, nht;
};&lt;/pre&gt;
&lt;p&gt;Here we can see that each &lt;code&gt;st_savepoint&lt;/code&gt; has a &lt;code&gt;char *&lt;/code&gt; member called &lt;code&gt;name&lt;/code&gt;, which is presumably whatever name the user provided in the savepoint syntax:
&lt;pre&gt;
mysql&amp;#62; SAVEPOINT &lt;b&gt;my_savepoint&lt;/b&gt;;
&lt;/pre&gt;
&lt;p&gt; So in this case, the &lt;code&gt;name&lt;/code&gt; member of the &lt;code&gt;st_savepoint&lt;/code&gt; instance corresponding to this SQL &lt;code&gt;SAVEPOINT&lt;/code&gt; will point to the character string &lt;code&gt;"my_savepoint"&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Apart from the &lt;code&gt;name&lt;/code&gt; we can also see that each &lt;code&gt;st_savepoint&lt;/code&gt; has itself a pointer to another &lt;code&gt;st_savepoint&lt;/code&gt; called &lt;code&gt;prev&lt;/code&gt;. This suggests a singly linked list of savepoints.&lt;/p&gt;
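&lt;p&gt;For those who like to see such a list in isolation first, here is a small sketch that walks a chain of &lt;code&gt;prev&lt;/code&gt; pointers and collects the names. It uses a hypothetical &lt;code&gt;demo_savepoint&lt;/code&gt; struct that copies only the two members we care about, so it is not the server&amp;#8217;s struct, and it assumes that the head of the list is the most recently created savepoint:&lt;/p&gt;
&lt;pre&gt;
#include &amp;#60;string&amp;#62;
#include &amp;#60;vector&amp;#62;

/* Hypothetical copy of the two members we care about; NOT the server's struct. */
struct demo_savepoint {
  demo_savepoint *prev;
  const char     *name;
};

/*
  Follows the prev pointers, starting at the given savepoint.
  Returns the names in the order in which they are encountered.
*/
static std::vector&amp;#60;std::string&amp;#62; demo_savepoint_names(const demo_savepoint *newest)
{
  std::vector&amp;#60;std::string&amp;#62; names;
  for (const demo_savepoint *sv= newest; sv; sv= sv-&gt;prev)
    names.push_back(sv-&gt;name);
  return names;
}
&lt;/pre&gt;
&lt;p&gt;Under the assumption mentioned above, a list that was built by first creating &lt;code&gt;A&lt;/code&gt; and then &lt;code&gt;B&lt;/code&gt; is reported as &lt;code&gt;B&lt;/code&gt; followed by &lt;code&gt;A&lt;/code&gt;, and an empty list simply yields no names at all.&lt;/p&gt;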
&lt;p&gt;This is about all the information we need to implement the &lt;code&gt;fill_table&lt;/code&gt; function. So, here it is:
&lt;pre&gt;
int mysql_is_savepoints_fill_table(THD *thd, TABLE_LIST *tables, COND *cond)
{
  int status = 0;                           /* return value for this func, 0=success, 1=error*/
  CHARSET_INFO *scs= system_charset_info;   /* need this to store field into table */
  TABLE *table= (TABLE *)tables-&gt;table;     /* handle to the I_S table. class declared in table.h */
  uint savepoint_id = 0; 

  SAVEPOINT *sv= thd-&gt;transaction.savepoints;

  while(sv &amp;&amp; !status)
  {
    /* store the savepoint sequence into the table column */
    table-&gt;field[COLUMN_SAVEPOINT_ID]-&gt;store(++savepoint_id, 0);
    /* store the savepoint name into the table column */
    table-&gt;field[COLUMN_SAVEPOINT_NAME]-&gt;store(sv-&gt;name, strlen(sv-&gt;name), scs);

    status= schema_table_store_record(thd, table);
    sv= sv-&gt;prev;
  }
  return status;
}
&lt;/pre&gt;
&lt;p&gt;The biggest difference from the &lt;code&gt;fill_table&lt;/code&gt; function we used in the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; example is that instead of storing just one row, we have a loop that stores one row per iteration. To initialize the loop, we take the &lt;code&gt;savepoints&lt;/code&gt; pointer from the &lt;code&gt;transaction&lt;/code&gt; member of the &lt;code&gt;THD&lt;/code&gt; instance that is passed as the first argument to the &lt;code&gt;fill_table&lt;/code&gt; function, and assign it to a local &lt;code&gt;sv&lt;/code&gt; variable:
&lt;pre&gt;
SAVEPOINT *sv= thd-&gt;transaction.savepoints;
&lt;/pre&gt;
&lt;p&gt;Of course, it is possible that there are no savepoints in the current session, in which case &lt;code&gt;thd-&gt;transaction.savepoints&lt;/code&gt; will be the &lt;code&gt;NULL&lt;/code&gt; pointer. However, if there are savepoints, a pointer to the &lt;em&gt;last&lt;/em&gt; savepoint that was created in the current session will now be stored in &lt;code&gt;sv&lt;/code&gt;. We can now set up the actual loop:
&lt;pre&gt;
  while(sv &amp;&amp; !status)
  {

    ...lines here...

    status= schema_table_store_record(thd, table);
    sv= sv-&gt;prev;
  }
&lt;/pre&gt;
&lt;p&gt;Note that the loop will be entered only if &lt;code&gt;sv&lt;/code&gt; points to a savepoint. If it does, data from the savepoint is written to the columns of our information schema table. &lt;/p&gt;
&lt;p&gt;At the bottom of the loop, we store the current record using the &lt;code&gt;schema_table_store_record&lt;/code&gt; function, which we discussed already for the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; example:
&lt;pre&gt;
    status= &lt;b&gt;schema_table_store_record&lt;/b&gt;(thd, table);
&lt;/pre&gt;
&lt;p&gt;Interestingly, we were required to make a forward declaration to it in the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; example. Now, we don&amp;#8217;t have to do this, presumably because we defined &lt;code&gt;MYSQL_SERVER&lt;/code&gt;. &lt;/p&gt;
&lt;p&gt;You might recall that the &lt;code&gt;schema_table_store_record&lt;/code&gt; function returns &lt;code&gt;0&lt;/code&gt; in case of success and &lt;code&gt;1&lt;/code&gt; in case of failure. Note that if a failure occurs at this point, the loop will not iterate again, as the &lt;code&gt;while&lt;/code&gt; condition requires &lt;code&gt;status&lt;/code&gt; to be not true (that is, zero). &lt;/p&gt;
&lt;p&gt;After storing the row, the last step of the loop is to move back and examine the previous savepoint:
&lt;pre&gt;
    sv= &lt;b&gt;sv-&gt;prev&lt;/b&gt;;
&lt;/pre&gt;
&lt;p&gt;If the end of the list is reached, &lt;code&gt;sv&lt;/code&gt; will be &lt;code&gt;NULL&lt;/code&gt;, preventing the loop from iterating again. However, if there is in fact a previous savepoint, the loop will run once again and create a new row for that savepoint too, on and on until we reach the end of the list of savepoints.&lt;/p&gt;
&lt;p&gt;At the top of the loop, we store data into the columns of our information schema table:
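&lt;p&gt;The interplay between the two loop conditions can be tried out in isolation with the sketch below. Everything in it is a hypothetical stand-in: &lt;code&gt;demo_table&lt;/code&gt; is a fake table that refuses rows once it is &amp;#8216;full&amp;#8217;, and &lt;code&gt;demo_store_record()&lt;/code&gt; merely plays the role of &lt;code&gt;schema_table_store_record&lt;/code&gt; by returning &lt;code&gt;0&lt;/code&gt; or &lt;code&gt;1&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;
/* Hypothetical stand-ins; none of this is server code. */
struct demo_savepoint {
  demo_savepoint *prev;
  const char     *name;
};

struct demo_table {
  int rows_stored;
  int capacity;     /* storing fails once this many rows are in */
};

/* Plays the role of schema_table_store_record(): 0=success, 1=failure. */
static int demo_store_record(demo_table *table)
{
  if (table-&gt;rows_stored &gt;= table-&gt;capacity)
    return 1;
  ++table-&gt;rows_stored;
  return 0;
}

/* Same loop shape as in our fill_table function. */
static int demo_fill_table(demo_table *table, demo_savepoint *sv)
{
  int status= 0;
  while (sv &amp;&amp; !status)
  {
    status= demo_store_record(table);
    sv= sv-&gt;prev;
  }
  return status;
}
&lt;/pre&gt;
&lt;p&gt;With three savepoints and a fake table that has room for only two rows, &lt;code&gt;demo_fill_table()&lt;/code&gt; stores two rows, receives a &lt;code&gt;1&lt;/code&gt; on the third attempt, and returns that &lt;code&gt;1&lt;/code&gt; to its caller without visiting the rest of the list.&lt;/p&gt;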
&lt;pre&gt;
    /* store the savepoint sequence into the table column */
    table-&gt;&lt;b&gt;field[COLUMN_SAVEPOINT_ID]-&gt;store&lt;/b&gt;(++savepoint_id, 0);
    /* store the savepoint name into the table column */
    table-&gt;&lt;b&gt;field[COLUMN_SAVEPOINT_NAME]-&gt;store&lt;/b&gt;(sv-&gt;name, strlen(sv-&gt;name), scs);
&lt;/pre&gt;
&lt;p&gt;This time, we use our defines &lt;code&gt;COLUMN_SAVEPOINT_ID&lt;/code&gt; and &lt;code&gt;COLUMN_SAVEPOINT_NAME&lt;/code&gt; instead of the literal numerical field indexes. We already demonstrated in the MYSQL_HELLO example how to store a string, so we won&amp;#8217;t discuss the line that stores the savepoint&amp;#8217;s name. Instead, let&amp;#8217;s find out how we can store an integer value by looking at the line that stores the savepoint id.&lt;/p&gt;
&lt;p&gt;As you can see, we stipulate the value for the &lt;code&gt;SAVEPOINT_ID&lt;/code&gt; column ourselves by simply adding one for each row:
&lt;pre&gt;
    table-&gt;field[COLUMN_SAVEPOINT_ID]-&gt;store(&lt;b&gt;++savepoint_id&lt;/b&gt;, 0);&lt;/pre&gt;
&lt;p&gt;Savepoints by no means have a numerical ID of their own, but it makes sense to make one up in order to unambiguously indicate the order in which the savepoints were created during this session. Note the second argument to the &lt;code&gt;store&lt;/code&gt; method, which is always &lt;code&gt;0&lt;/code&gt; here:
&lt;pre&gt;
    table-&gt;field[COLUMN_SAVEPOINT_ID]-&gt;store(++savepoint_id, &lt;b&gt;0&lt;/b&gt;);&lt;/pre&gt;
&lt;p&gt;This second argument tells the &lt;code&gt;store&lt;/code&gt; method whether the value should be treated as signed or unsigned: it should be &lt;code&gt;1&lt;/code&gt; for an unsigned value and &lt;code&gt;0&lt;/code&gt; for a signed one. In this case we pass &lt;code&gt;0&lt;/code&gt;, so the value is treated as signed, which is fine for a small positive counter like our savepoint id.&lt;br /&gt;
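&lt;p&gt;To illustrate why such a flag matters at all, here is a small sketch. The &lt;code&gt;demo_render()&lt;/code&gt; helper is hypothetical and only mimics the idea of passing a 64-bit integer together with a signedness flag; it is not the server&amp;#8217;s &lt;code&gt;store&lt;/code&gt; method:&lt;/p&gt;
&lt;pre&gt;
#include &amp;#60;sstream&amp;#62;
#include &amp;#60;string&amp;#62;

/*
  Hypothetical helper: renders the same 64 bits either as a signed or as an
  unsigned number, depending on the flag.
*/
static std::string demo_render(long long nr, bool unsigned_val)
{
  std::ostringstream out;
  if (unsigned_val)
    out &amp;#60;&amp;#60; static_cast&amp;#60;unsigned long long&amp;#62;(nr);
  else
    out &amp;#60;&amp;#60; nr;
  return out.str();
}
&lt;/pre&gt;
&lt;p&gt;For small positive numbers, such as our made-up savepoint ids, both interpretations give the same text. They only start to differ when the highest bit is set: assuming a 64-bit &lt;code&gt;long long&lt;/code&gt;, the value &lt;code&gt;-1&lt;/code&gt; is rendered as &lt;code&gt;-1&lt;/code&gt; when treated as signed, but as &lt;code&gt;18446744073709551615&lt;/code&gt; when treated as unsigned.&lt;/p&gt;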
&lt;h4&gt;The rest of the implementation&lt;/h4&gt;
&lt;p&gt;The remainder of the implementation is quite similar to what was discussed for the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; example. The most important difference is actually the plug-in name, but otherwise the implementation is identical. Therefore, it is not discussed here further.&lt;br /&gt;
&lt;h3&gt;Building and Installing&lt;/h3&gt;
&lt;p&gt;The build and install process is pretty much similar to that for the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; plug-in.&lt;br /&gt;
&lt;h4&gt;Compiling&lt;/h4&gt;
&lt;p&gt;We can compile the plug-in like this:
&lt;pre&gt;g++ -DMYSQL_DYNAMIC_PLUGIN -Wall -shared \
-I/home/user/mysql-5.1.22-rc/include \
-I/home/user/mysql-5.1.22-rc/regex \
-I/home/user/mysql-5.1.22-rc/sql \
-o mysql_is_savepoints.so mysql_is_savepoints.cc
&lt;/pre&gt;
&lt;p&gt;This will create the shared library &lt;code&gt;mysql_is_savepoints.so&lt;/code&gt;.&lt;br /&gt;
&lt;h4&gt;Installing the plug-in&lt;/h4&gt;
&lt;p&gt;You might recall that the shared library needs to be moved to the plug-in directory. After that, we can install the plug-in using the &lt;code&gt;INSTALL PLUGIN&lt;/code&gt; syntax:
&lt;pre&gt;
mysql&amp;#62; INSTALL PLUGIN MYSQL_SAVEPOINTS soname 'mysql_is_savepoints.so';
Query OK, 0 rows affected (0.00 sec)
&lt;/pre&gt;
&lt;h4&gt;Using the plug-in&lt;/h4&gt;
&lt;p&gt;Now we can finally see our plug-in in action. At first, there will be no savepoints present:
&lt;pre&gt;mysql&gt; SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
&lt;b&gt;Empty set (0.02 sec)&lt;/b&gt;
&lt;/pre&gt;
&lt;p&gt;Even if we set one, we won&amp;#8217;t see it immediately:
&lt;pre&gt;
mysql&amp;#62; &lt;b&gt;SAVEPOINT&lt;/b&gt; A;
Query OK, 0 rows affected (0.00 sec)

mysql&amp;#62; SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
&lt;b&gt;Empty set (0.00 sec)&lt;/b&gt;
&lt;/pre&gt;
&lt;p&gt;This is because by default, the session has autocommit enabled. As each statement is wrapped in its own transaction, we will never be able to see any savepoints. So we disable autocommit:
&lt;pre&gt;
mysql&gt; SET &lt;b&gt;autocommit = OFF&lt;/b&gt;;
Query OK, 0 rows affected (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;And now we can really witness the behaviour of our plug-in:
&lt;pre&gt;
mysql&amp;#62; SAVEPOINT A;
Query OK, 0 rows affected (0.00 sec)

mysql&amp;#62; SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
+--------------+----------------+
&amp;#124; SAVEPOINT_ID &amp;#124; SAVEPOINT_NAME &amp;#124;
+--------------+----------------+
&amp;#124;            1 &amp;#124; A              &amp;#124;
+--------------+----------------+
1 row in set (0.00 sec)

mysql&amp;#62; SAVEPOINT B;
Query OK, 0 rows affected (0.00 sec)

mysql&amp;#62; SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
+--------------+----------------+
&amp;#124; SAVEPOINT_ID &amp;#124; SAVEPOINT_NAME &amp;#124;
+--------------+----------------+
&amp;#124;            1 &amp;#124; B              &amp;#124;
&amp;#124;            2 &amp;#124; A              &amp;#124;
+--------------+----------------+
2 rows in set (0.00 sec)

mysql&amp;#62; ROLLBACK TO SAVEPOINT A;
Query OK, 0 rows affected (0.00 sec)

mysql&amp;#62; SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
+--------------+----------------+
&amp;#124; SAVEPOINT_ID &amp;#124; SAVEPOINT_NAME &amp;#124;
+--------------+----------------+
&amp;#124;            1 &amp;#124; A              &amp;#124;
+--------------+----------------+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;h3&gt;Learn More&lt;/h3&gt;
&lt;p&gt;This has been quite a ride! In this article, we demonstrated how information schema plug-ins can be used to report some of the things that are going on inside the current session. As such, the &lt;code&gt;MYSQL_SAVEPOINTS&lt;/code&gt; plug-in is a big step forward compared to the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; plug-in. &lt;/p&gt;
&lt;p&gt;However, this is just the tip of the iceberg - the current session harbours much more interesting information, and in a forthcoming article I will demonstrate a number of other usages. In particular, I will present a plug-in to report the user variables in the current session, and the temporary tables defined in the current session.&lt;/p&gt;
&lt;p&gt;In another article, we will also see that it is sometimes possible to look beyond the current session and report on the status of server-wide structures, such as the query cache.&lt;br /&gt;
&lt;h4&gt;Meet us at the user&amp;#8217;s conference&lt;/h4&gt;
&lt;p&gt;When I discussed the &lt;code&gt;MYSQL_HELLO&lt;/code&gt; plug-in, I already hinted that there will be a lot at the conference for those people that want to learn more about extending the server with (information schema) plug-ins. You can find all those links at the bottom of &lt;a href="http://rpbouman.blogspot.com/2008/02/mysql-information-schema-plugins-best.html" target="_rpb"&gt;that article&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In addition, you can learn a lot about the MySQL Server internals. And&amp;#8230;you can learn it from one of the founding fathers: &lt;a href="http://monty-says.blogspot.com/" target="_monty"&gt;Monty himself&lt;/a&gt; will be doing &lt;a href="http://en.oreilly.com/mysql2008/public/schedule/detail/520" target="_conf"&gt;A tour into MySQL&amp;#8217;s internals&lt;/a&gt;. So, I guess that&amp;#8217;s going to be one of those occasions where you get the opportunity to clear up some of those details in the source code you never quite managed to wrap your head around. &lt;/p&gt;
&lt;p&gt;If you &lt;a href="https://en.oreilly.com/mysql2008/public/register" target="_mysqlconf"&gt;register&lt;/a&gt; before the 26th of February, you&amp;#8217;ll get a $200 discount. There are more discounts available depending on whether you attended before, or if you register together with a number of colleagues; there are special student and non-profit discounts too - &lt;a href="https://en.oreilly.com/mysql2008/public/register" target="_mysqlconf"&gt;check it out here&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;See you at the conference! (Bonus points for the first one to ask Monty in the Q&amp;#38;A why &lt;code&gt;SAVEPOINT&lt;/code&gt; is typedef-ed in &lt;code&gt;sql/handler.h&lt;/code&gt; instead of &lt;code&gt;sql/sql_class.h&lt;/code&gt;; double bonus points for the first one to ask Monty if &lt;code&gt;THD&lt;/code&gt; is really called like that because it is so heavy ;-)
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>MySQL Information Schema Plugins: the best kept secret of MySQL 5.1</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2008/02/mysql_information_schema_plugi.html" />
    <id>tag:www.oreillynet.com,2008:/databases/blog//6.22892</id>
    
    <published>2008-02-01T13:01:08Z</published>
    <updated>2008-02-01T13:18:02Z</updated>
    
    <summary>MySQL 5.1 offers an extremely useful feature called information_schema plugins. This feature allows dynamic runtime loading of a shared library into the MySQL server to implement a table in the information_schema database. The SQL standard (ISO/IEC 9075-11:2003) allows database implementations...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Articles" />
            <category term="Technical" />
        <content type="html">
&lt;p&gt;&lt;a href=""&gt;MySQL 5.1&lt;/a&gt; offers an extremely useful feature called &lt;em&gt;information_schema plugins&lt;/em&gt;. This feature allows dynamic runtime loading of a shared library into the MySQL server to implement a table in the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/information-schema.html" target="_mysql"&gt;information_schema&lt;/a&gt; database. The SQL standard (ISO/IEC 9075-11:2003) allows database implementations to extend the &lt;code&gt;information_schema&lt;/code&gt;. MySQL 5.1 transfers the possibility to do this directly to privileged database users so they can extend the &lt;code&gt;information_schema&lt;/code&gt; themselves, in any way they see fit.&lt;/p&gt;
&lt;p&gt;In this article, we will demonstrate how to create a minimal &lt;a href="http://www.xcdsql.org/MySQL/Plugin/mysql_is_hello.cc" target="_xcdsql"&gt;&amp;#8220;Hello, World!&amp;#8221; MySQL information schema plugin&lt;/a&gt;. In a forthcoming article, we&amp;#8217;ll demonstrate how information schema plugins may be used to report some of the server&amp;#8217;s internals such as the contents of the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/query-cache.html" target="_mysql"&gt;query cache&lt;/a&gt;, session level objects such as the currently defined &lt;code&gt;TEMPORARY&lt;/code&gt; tables, &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/user-variables.html" target="_mysql"&gt;user-defined variables&lt;/a&gt; and &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/savepoints.html" target="_mysql"&gt;&lt;code&gt;SAVEPOINT&lt;/code&gt;s&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;The MySQL Plug-in API&lt;/h3&gt;
&lt;p&gt;Information Schema plug-ins are a subfeature of the MySQL &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugin-api.html" target="_mysql"&gt;plug-in API&lt;/a&gt;.&lt;br /&gt;
The plug-in API is one of the new features in the upcoming release of the &lt;a href="http://www.mysql.com/" target="_mysql"&gt;MySQL&lt;/a&gt; database server, &lt;a href="http://dev.mysql.com/downloads/mysql/5.1.html" target="_mysql"&gt;MySQL 5.1&lt;/a&gt; (which is currently a release candidate). In essence, the MySQL plugin API provides a &lt;em&gt;generic extension point&lt;/em&gt; to the MySQL server. It allows users to load a &lt;em&gt;shared library&lt;/em&gt; in order to add new functionality to the server. &lt;/p&gt;
&lt;p&gt;Plug-ins can be loaded and unloaded using the MySQL-specific &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/install-plugin.html" target="_mysql"&gt;&lt;code&gt;INSTALL PLUGIN&lt;/code&gt;&lt;/a&gt; and &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/uninstall-plugin.html" target="_mysql"&gt;&lt;code&gt;UNINSTALL PLUGIN&lt;/code&gt;&lt;/a&gt; syntax respectively. A key feature is that this process is completely dynamic - the server need not be re-compiled and need not be stopped in order to benefit from the functionality of a new plugin. Hence, new functionality can be added without suffering any downtime.&lt;/p&gt;
&lt;p&gt;In some respects, the new plugin feature resembles the long-supported &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/create-function.html" target="_mysql"&gt;user-defined function&lt;/a&gt; (UDF) feature. Both plugins and UDFs involve dynamically loading a shared library to extend the server&amp;#8217;s functionality. Like UDFs, plug-ins are usually written in C/C++. The difference between UDFs and plug-ins is that the UDF feature can be used only for adding new functions to use within the server&amp;#8217;s SQL dialect. The concept of a plug-in is more broadly applicable and can be used to extend the server in more ways.&lt;/p&gt;
&lt;p&gt;Currently, plug-ins are not supported on Microsoft Windows. MySQL is working to lift this limitation but it is at present unclear when this feature will be available for Windows.&lt;/p&gt;
&lt;h4&gt;Currently supported plug-in types&lt;/h4&gt;
&lt;p&gt;Currently, the MySQL plug-in API supports the following types of plugins:
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html" target="_mysql"&gt;Storage engine&lt;/a&gt;: can be used to implement special-purpose row stores for data, which can then be accessed through SQL. Arguably, the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/pluggable-storage.html" target="_mysql"&gt;Pluggable Storage Engine Interface&lt;/a&gt; is one of the key benefits of MySQL 5.1.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugin-full-text-plugins.html" target="_mysql"&gt;Full-Text parser&lt;/a&gt;: can be used for custom indexing of text-data as well as specialized handling of &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html" target=""&gt;&lt;code&gt;FULLTEXT&lt;/code&gt;&lt;/a&gt; query-expressions.&lt;/li&gt;
&lt;li&gt;Daemon: a daemon plug-in can be used to execute a background process.&lt;/li&gt;
&lt;li&gt;Information Schema table: can be used to implement a &amp;#8216;virtual&amp;#8217; table in the MySQL &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/information-schema.html" target="_mysql"&gt;information_schema&lt;/a&gt; to report the status of, for example, the operating system or the server&amp;#8217;s internals.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently the plug-in API does not provide support for UDFs but it is expected that in due time, the current user-defined function feature will be merged into the plug-in API.&lt;/p&gt;
&lt;h4&gt;A closer look at the plug-in API&lt;/h4&gt;
&lt;p&gt;(Note: in this article I will repeatedly refer to a number of C/C++ header and source files that are part of the &lt;a href="http://dev.mysql.com/downloads/mysql/5.1.html#source" target="_mysql"&gt;MySQL 5.1.22-rc source distribution&lt;/a&gt;. Any path that starts with &lt;code&gt;mysql-5.1.22-rc/&lt;/code&gt; should be taken to mean the corresponding path beneath the root of an unpacked MySQL 5.1.22-rc source distribution.)&lt;/p&gt;
&lt;p&gt;For all plug-in types, the interface comprises a generic &lt;em&gt;plug-in description structure&lt;/em&gt;, which is a &lt;code&gt;struct&lt;/code&gt; called &lt;code&gt;st_mysql_plugin&lt;/code&gt;. This struct is defined in the &lt;code&gt;plugin.h&lt;/code&gt; header file, located in the &lt;code&gt;mysql-5.1.22-rc/include/mysql&lt;/code&gt; directory. &lt;/p&gt;
&lt;p&gt;The declaration of this structure is as follows:
&lt;pre&gt;
/*
  Plugin description structure.
*/

struct st_mysql_plugin
{
  int type;             /* the plugin type (a MYSQL_XXX_PLUGIN value)   */
  void *info;           /* pointer to type-specific plugin descriptor   */
  const char *name;     /* plugin name                                  */
  const char *author;   /* plugin author (for SHOW PLUGINS)             */
  const char *descr;    /* general descriptive text (for SHOW PLUGINS ) */
  int license;          /* the plugin license (PLUGIN_LICENSE_XXX)      */
  int (*init)(void *);  /* the function to invoke when plugin is loaded */
  int (*deinit)(void *);/* the function to invoke when plugin is unloaded */
  unsigned int version; /* plugin version (for SHOW PLUGINS)            */
  struct st_mysql_show_var *status_vars;
  struct st_mysql_sys_var **system_vars;
  void * __reserved1;   /* reserved for dependency checking             */
};
&lt;/pre&gt;
&lt;p&gt;Through this structure, the plug-in implementor provides the following things:
&lt;ul&gt;
&lt;li&gt;A number of data members that convey some metadata about the plug-in. These members are used to list the plug-in in the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugins-table.html" target="_mysql"&gt;&lt;code&gt;information_schema.PLUGINS&lt;/code&gt;&lt;/a&gt; system view once it is loaded. Among these is the member &lt;code&gt;const char *name&lt;/code&gt;. The value that the plug-in implementor provides for the &lt;code&gt;name&lt;/code&gt; member is the name that is to be used for the &lt;code&gt;INSTALL PLUGIN&lt;/code&gt; and &lt;code&gt;UNINSTALL PLUGIN&lt;/code&gt; SQL syntax. For information schema plug-ins, the &lt;code&gt;name&lt;/code&gt; member is also used as the table name of the information schema table implemented by the plug-in.&lt;/li&gt;
&lt;li&gt;Optionally, an array of system variables through which the plug-in may be controlled, and an array of status variables so the plug-in may report its (dynamic) status to the server. &lt;/li&gt;
&lt;li&gt;Pointers to functions that are called when the plug-in is loaded (the &lt;code&gt;plugin_init&lt;/code&gt; function) and unloaded (the &lt;code&gt;plugin_deinit&lt;/code&gt; function)&lt;/li&gt;
&lt;li&gt;A pointer to an object or structure that provides the actual functionality to implement a plug-in of the specific type&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The type-dependent part of a plug-in implementation is usually a struct containing a number of function pointers (&amp;#8216;hooks&amp;#8217;) which are called in a particular sequence that is appropriate for that particular type of plug-in.&lt;/p&gt;
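&lt;p&gt;To make the split between the generic part and the hooks more concrete, here is a minimal, self-contained sketch of the pattern. It does not use the MySQL headers: &lt;code&gt;example_plugin&lt;/code&gt;, &lt;code&gt;example_load&lt;/code&gt;, &lt;code&gt;example_unload&lt;/code&gt; and the two example hooks are hypothetical names made up for illustration, and the way the sketch treats a missing hook is an assumption, not a description of what the server does.&lt;/p&gt;

```cpp
// Hypothetical stand-in for a generic plug-in descriptor; NOT the real st_mysql_plugin.
struct example_plugin
{
  const char *name;        // name used to refer to the plug-in
  void *info;              // type-specific part, opaque to the generic loader
  int (*init)(void *);     // hook called on load, 0 means success
  int (*deinit)(void *);   // hook called on unload, 0 means success
};

// Call the init hook if there is one. In this sketch a missing hook counts as success.
int example_load(const example_plugin *plugin, void *arg)
{
  if (plugin->init == nullptr)
    return 0;
  return plugin->init(arg);
}

// Call the deinit hook if there is one. In this sketch a missing hook counts as success.
int example_unload(const example_plugin *plugin, void *arg)
{
  if (plugin->deinit == nullptr)
    return 0;
  return plugin->deinit(arg);
}

// Example init hook: records that it ran by writing 1 through its argument.
int example_init(void *arg)
{
  *(int *)arg = 1;
  return 0;
}

// Example deinit hook: resets the same marker to 0.
int example_deinit(void *arg)
{
  *(int *)arg = 0;
  return 0;
}
```

&lt;p&gt;The point of the design is that the generic code only ever touches the hooks; it never needs to know what &lt;code&gt;info&lt;/code&gt; points to.&lt;/p&gt;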
&lt;h3&gt;Information Schema Plugins&lt;/h3&gt;
&lt;p&gt;So what is an &lt;em&gt;information_schema plug-in&lt;/em&gt; exactly? Well, the &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/information-schema.html" target="_mysql"&gt;information_schema&lt;/a&gt;&lt;/code&gt; is a &lt;em&gt;virtual database&lt;/em&gt; which is primarily intended as a meta data facility: it acts as a container for a collection of read-only &amp;#8216;tables&amp;#8217; or rather &lt;em&gt;system views&lt;/em&gt; that serve to provide data about the database server itself. As such, it is defined as part of the SQL standard (ISO/IEC 9075-11:2003). &lt;/p&gt;
&lt;p&gt;The SQL standard describes a number of views that should appear in the information_schema, and MySQL provides partial built-in support for these standard information schema views. The standard also expressly allows database implementors to extend the information_schema by adding new views, or extending the specified tables by adding columns. MySQL information schema plug-ins simply form an interface to allow privileged database users to extend the information schema themselves by writing their own information schema table implementations.&lt;br /&gt;
&lt;h4&gt;The type-specific API for Information Schema plug-ins&lt;/h4&gt;
&lt;p&gt;The type-specific part of the plug-in API for information schema plug-ins is formed by the &lt;code&gt;struct&lt;/code&gt; called &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt;, which is defined in &lt;code&gt;mysql-5.1.22-rc/sql/table.h&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;When the plug-in is loaded, the server calls the &lt;code&gt;plugin_init&lt;/code&gt; function associated with the &lt;code&gt;int (*init)(void *)&lt;/code&gt; member of the &lt;code&gt;st_mysql_plugin&lt;/code&gt; struct. In that call, the server passes a pointer to an instance of a &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; struct to the &lt;code&gt;init&lt;/code&gt; function. Inside the &lt;code&gt;plugin_init&lt;/code&gt; function, the plug-in implementor is expected to initialize two members of that structure:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fields_info&lt;/code&gt; - an array of &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; structures. Each entry in this array corresponds to a column in the information schema table. The &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; structure is also declared in &lt;code&gt;mysql-5.1.22-rc/sql/table.h&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fill_table&lt;/code&gt; - a pointer to a function that is called whenever the server wants to obtain rows from the information_schema table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; struct has more members still, but only these two need to be initialized by the plug-in implementor. As long as the plug-in is loaded, the &lt;code&gt;information_schema&lt;/code&gt; contains a new table that has a definition corresponding to the column definitions provided by the array assigned to the &lt;code&gt;fields_info&lt;/code&gt; member. Whenever that table is queried, the function assigned to the &lt;code&gt;fill_table&lt;/code&gt; member is called to actually construct the table&amp;#8217;s rows. &lt;/p&gt;
&lt;p&gt;These two elements really are all there is to the specific API for information schema plug-ins.&lt;br /&gt;
&lt;h3&gt;Writing an Information Schema Plugin&lt;/h3&gt;
&lt;p&gt;In this section, I will demonstrate how to write a minimal &amp;#8220;Hello World!&amp;#8221; MySQL information schema plugin. If you like, you  can &lt;a href="http://www.xcdsql.org/MySQL/Plugin/mysql_is_hello.cc" target="_xcdsql"&gt;download the C++ source code&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Apart from a text editor, writing a simple, bare bones information schema plug-in requires no more than a C++ compiler and a number of MySQL&amp;#8217;s C/C++ header files. &lt;/p&gt;
&lt;p&gt;The following examples assume a GNU/Linux environment, a simple text editor, the &lt;code&gt;g++&lt;/code&gt; compiler and the MySQL header files. In order to get the necessary header files it is best to &lt;a href="http://dev.mysql.com/downloads/mysql/5.1.html#source" target="_mysql"&gt;obtain a MySQL source distribution&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;(For the purpose of this article, we will assume MySQL 5.1.22-rc. Until MySQL 5.1 is generally available, you are advised to always compile your plug-ins using only the header files shipped with the version of the product to which the plug-in will be deployed.)&lt;br /&gt;
&lt;h4&gt;Creating the source file&lt;/h4&gt;
&lt;p&gt;First, we need to create a C++ source file. We will assume that the working directory is &lt;code&gt;~/mysql_is_hello/&lt;/code&gt;, and that the source file is called &lt;code&gt;mysql_is_hello.cc&lt;/code&gt;&lt;br /&gt;
&lt;h4&gt;Includes&lt;/h4&gt;
&lt;p&gt;At the top of our source file, we need to include the following header files:&lt;/p&gt;
&lt;pre&gt;
#include &amp;#60;mysql_priv.h&amp;#62;
#include &amp;#60;stdlib.h&amp;#62;
#include &amp;#60;ctype.h&amp;#62;
#include &amp;#60;mysql_version.h&amp;#62;
&lt;b&gt;#include &amp;#60;mysql/plugin.h&amp;#62;&lt;/b&gt;
#include &amp;#60;my_global.h&amp;#62;
#include &amp;#60;my_dir.h&amp;#62;
&lt;/pre&gt;
&lt;p&gt;The inclusion of &lt;code&gt;mysql/plugin.h&lt;/code&gt; is the most important one, but currently the other headers are needed as well, because their declarations get pulled in at some point. So far, this set of includes seems to do the trick.&lt;br /&gt;
&lt;h4&gt;Defining the columns&lt;/h4&gt;
&lt;p&gt;It was just mentioned that at some point, the plug-in implementor must provide an array of &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; structures that defines the columns of the information schema table. For this example, we&amp;#8217;ll set up a minimal table definition consisting of one &lt;code&gt;VARCHAR(64)&lt;/code&gt; column in order to show a &amp;#8220;Hello, world&amp;#8221; message. If we flash-forward for a moment - this is the structure of the table we&amp;#8217;d like to define:
&lt;pre&gt;
mysql&amp;#62; desc information_schema.MYSQL_HELLO;
+-------+-------------+------+-----+---------+-------+
&amp;#124; Field &amp;#124; Type        &amp;#124; Null &amp;#124; Key &amp;#124; Default &amp;#124; Extra &amp;#124;
+-------+-------------+------+-----+---------+-------+
&amp;#124; HELLO &amp;#124; varchar(64) &amp;#124; NO   &amp;#124;     &amp;#124;         &amp;#124;       &amp;#124;
+-------+-------------+------+-----+---------+-------+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;In order to achieve that, we need to declare our &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; array like this:
&lt;pre&gt;
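static ST_FIELD_INFO mysql_is_hello_field_info[]=
{
  /* name, length, type, value, flags, old name, open method */
  {"HELLO", 64, MYSQL_TYPE_VARCHAR, 0, 0, "Hello", SKIP_OPEN_TABLE},
  /* terminating entry: marks the end of the array */
  {NULL, 0, MYSQL_TYPE_NULL, 0, 0, NULL, 0}
};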
&lt;/pre&gt;
&lt;p&gt;The last entry of this array is a dummy that serves as a marker for the end of the array. It is very important to always conclude the array with one such entry. Without such a trailing entry, the server does not know where the array ends. This would of course be a very bad thing, and is likely to result in a crash as soon as the plugin is loaded or the information schema table is accessed.&lt;/p&gt;
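&lt;p&gt;Why the terminating entry matters is easiest to see from the consumer&amp;#8217;s side. The following is a self-contained sketch, not server code: &lt;code&gt;example_column&lt;/code&gt;, &lt;code&gt;count_columns&lt;/code&gt; and &lt;code&gt;example_columns&lt;/code&gt; are hypothetical names, and the sketch simply assumes that a reader of such an array keeps going until it finds an entry without a name.&lt;/p&gt;

```cpp
// Hypothetical stand-in for a column definition; the real type is ST_FIELD_INFO.
struct example_column
{
  const char *name;   // a null name marks the end of the array
  unsigned length;    // maximum length of the column
};

// Count the entries that come before the terminating entry.
// Without the terminator, this loop would read past the end of the array.
int count_columns(const example_column *columns)
{
  int count = 0;
  while (columns[count].name != nullptr)
    ++count;
  return count;
}

// Two real columns followed by the terminating entry.
static example_column example_columns[] =
{
  {"HELLO", 64},
  {"WORLD", 32},
  {nullptr, 0}
};
```

&lt;p&gt;Nothing in the array itself records how many entries it has, so the terminating entry is the only thing that stops the loop.&lt;/p&gt;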
&lt;p&gt;Now, the first &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; entry in the array defines the actual column. When we look at the declaration of &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; in &lt;code&gt;mysql-5.1.22-rc/sql/table.h&lt;/code&gt; we get a better idea of what an individual column definition is made of:
&lt;pre&gt;
typedef struct st_field_info
{
  const char* field_name;
  uint field_length;
  enum enum_field_types field_type;
  int value;
  uint field_flags;        // Field atributes(maybe_null, signed, unsigned etc.)
  const char* old_name;
  uint open_method;
} ST_FIELD_INFO;
&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;field_name&lt;/code&gt; - this is used as column name&lt;/li&gt;
&lt;li&gt;&lt;code&gt;field_length&lt;/code&gt; - for string-type columns, this is the maximum number of characters. Otherwise, it is the &amp;#8216;display-length&amp;#8217; for the column&lt;/li&gt;
&lt;li&gt;&lt;code&gt;field_type&lt;/code&gt; - a value drawn from &lt;code&gt;enum_field_types&lt;/code&gt;. This enum is declared in &lt;code&gt;mysql-5.1.22-rc/include/mysql_com.h&lt;/code&gt; and denotes a data type for the column. For the most part, there seems to be one entry in the enum for each SQL data type, although there seem to be a number of additional entries in the enum.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;value&lt;/code&gt; - Unfortunately, I haven&amp;#8217;t got the faintest idea what this is supposed to do. It does not seem to be used by any of the built-in information schema tables&lt;/li&gt;
&lt;li&gt;&lt;code&gt;field_flags&lt;/code&gt; - This is used to set column attributes. By default, columns are &lt;code&gt;NOT NULL&lt;/code&gt; and &lt;code&gt;SIGNED&lt;/code&gt;, and you can deviate from the default by setting the appropriate flags. You can use either one of the flags &lt;code&gt;MY_I_S_MAYBE_NULL&lt;/code&gt; and &lt;code&gt;MY_I_S_UNSIGNED&lt;/code&gt; or combine them using the bitwise or operator &lt;code&gt;&amp;#124;&lt;/code&gt;. Both flags are defined in &lt;code&gt;mysql-5.1.22-rc/sql/table.h&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;old_name&lt;/code&gt; - I believe this is used by the built-in information schema tables to denote the column name for the corresponding &lt;code&gt;SHOW&lt;/code&gt; statement. It&amp;#8217;s not really applicable to plug-ins I guess, but I always provide a value here, usually a lower case version of the value for the &lt;code&gt;field_name&lt;/code&gt; member&lt;/li&gt;
&lt;li&gt;&lt;code&gt;open_method&lt;/code&gt; - this should be one of &lt;code&gt;SKIP_OPEN_TABLE&lt;/code&gt;, &lt;code&gt;OPEN_FRM_ONLY&lt;/code&gt; or &lt;code&gt;OPEN_FULL_TABLE&lt;/code&gt;. I admit I don&amp;#8217;t really understand when to choose which option here. It seems likely that it defines in what manner the server must interact with the table, but I can&amp;#8217;t boast any detailed knowledge in this matter. However, I do know that we can simply use &lt;code&gt;SKIP_OPEN_TABLE&lt;/code&gt; for this simple example.&lt;/li&gt;
&lt;/ul&gt;
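&lt;p&gt;As a small illustration of how such flags combine, here is a self-contained sketch. The two constants are stand-ins with values picked for this example only; in a real plug-in you would use the &lt;code&gt;MY_I_S_MAYBE_NULL&lt;/code&gt; and &lt;code&gt;MY_I_S_UNSIGNED&lt;/code&gt; macros from &lt;code&gt;table.h&lt;/code&gt; rather than define your own.&lt;/p&gt;

```cpp
// Stand-in flag values for this sketch only; real code uses the MY_I_S_* macros.
const unsigned EXAMPLE_MAYBE_NULL = 1;
const unsigned EXAMPLE_UNSIGNED   = 2;

// No flags set: the column keeps the defaults (NOT NULL and SIGNED).
const unsigned example_flags_default = 0;

// One flag set: the column may be NULL, but stays SIGNED.
const unsigned example_flags_nullable = EXAMPLE_MAYBE_NULL;

// Both flags set: combine them with the bitwise or operator.
const unsigned example_flags_nullable_unsigned = EXAMPLE_MAYBE_NULL | EXAMPLE_UNSIGNED;
```

&lt;p&gt;The resulting value is what would go into the &lt;code&gt;field_flags&lt;/code&gt; position of a column definition.&lt;/p&gt;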
&lt;h4&gt;Filling the table&lt;/h4&gt;
&lt;p&gt;We explained that apart from defining the table columns, the plug-in implementor must also provide a function to actually deliver the rows. Before we can implement the actual &lt;code&gt;fill_table&lt;/code&gt; function, we first need a forward declaration of a function that stores one single row in an information schema table:
&lt;pre&gt;
bool schema_table_store_record(THD *thd, TABLE *table);
&lt;/pre&gt;
&lt;p&gt;This function is defined in &lt;a href="http://dev.mysql.com/sources/doxygen/mysql-5.1/sql__show_8cc.html" target="_doxy"&gt;&lt;code&gt;mysql-5.1.22-rc/sql/sql_show.cc&lt;/code&gt;&lt;/a&gt;. Its role will become clear in a moment.&lt;/p&gt;
&lt;p&gt;Now we can create the actual &lt;code&gt;fill_table&lt;/code&gt; function:
&lt;pre&gt;&lt;b&gt;int mysql_is_hello_fill_table&lt;/b&gt;(
  THD *thd
, TABLE_LIST *tables
, COND *cond
)
{
  int status;
  CHARSET_INFO *scs= system_charset_info;
  TABLE *table= (TABLE *)tables-&gt;table;

  const char *str = "plugin: hello, information_schema!!!";
  table-&gt;field[0]-&gt;store(
    str
  , strlen(str)
  , scs
  );

  status = schema_table_store_record(
    thd
  , table
  );

  return status;
}&lt;/pre&gt;
&lt;p&gt;The server passes a number of arguments to the &lt;code&gt;fill_table&lt;/code&gt; function:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;THD *thd&lt;/code&gt; - this is a pointer to an instance of the thread descriptor class. In practice, you can think of this as a direct handle to the current session. The &lt;code&gt;THD&lt;/code&gt; is declared in &lt;code&gt;mysql-5.1.22-rc/sql/sql_class.h&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TABLE_LIST *tables&lt;/code&gt; - This is a list of &lt;code&gt;st_table&lt;/code&gt; objects. The first entry in that list corresponds to the runtime representation of the table we are implementing as it appears in a query. &lt;code&gt;TABLE_LIST&lt;/code&gt; and &lt;code&gt;st_table&lt;/code&gt; are defined in &lt;code&gt;mysql-5.1.22-rc/sql/table.h&lt;/code&gt;. From what I could infer, it seems that in many cases &lt;code&gt;TABLE&lt;/code&gt; (which is a &lt;code&gt;typedef&lt;/code&gt; of &lt;code&gt;st_table&lt;/code&gt; defined in &lt;code&gt;mysql-5.1.22-rc/sql/handler.h&lt;/code&gt;) is used rather than &lt;code&gt;st_table&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;COND *cond&lt;/code&gt; - This is used to pass the instance of the &lt;code&gt;Item&lt;/code&gt; class that holds the internal representation of the &lt;code&gt;WHERE&lt;/code&gt;-clause. This could then be used by the plugin to return only the rows that are required as per the &lt;code&gt;WHERE&lt;/code&gt; condition - something like a pushed down condition. We will ignore this argument for now - the plug-in implementor may use it to optimize the &lt;code&gt;fill_table&lt;/code&gt; function, but is not required to do so. So ignoring this argument will not lead to wrong results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fill method first obtains a handle to the runtime representation of the information schema table proper:
&lt;pre&gt;
  TABLE *table= &lt;b&gt;(TABLE *)tables-&gt;table&lt;/b&gt;;
&lt;/pre&gt;
&lt;p&gt;Note that the &lt;code&gt;TABLE&lt;/code&gt; (or &lt;code&gt;st_table&lt;/code&gt;) structure is the &lt;em&gt;runtime&lt;/em&gt; representation of a table, i.e. a table as it appears in a SQL query. This is different from the &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; structure which is used to &lt;em&gt;define&lt;/em&gt; an information schema table.&lt;/p&gt;
&lt;p&gt;We then write the message &lt;code&gt;"plugin: hello, information_schema!!!"&lt;/code&gt; to the table&amp;#8217;s column:
&lt;pre&gt;
  const char *str = "plugin: hello, information_schema!!!";
  &lt;b&gt;table-&gt;field[0]-&gt;store&lt;/b&gt;(
    str
  , strlen(str)
  , scs
  );
&lt;/pre&gt;
&lt;p&gt;So, each &lt;code&gt;TABLE&lt;/code&gt; structure (a.k.a. &lt;code&gt;st_table&lt;/code&gt;) has an array of &lt;code&gt;Field&lt;/code&gt; instances which are the runtime representations of the table&amp;#8217;s columns. Here, one of the &lt;code&gt;store&lt;/code&gt; methods is called on the &lt;code&gt;Field&lt;/code&gt; entry at the 0th index in the &lt;code&gt;field&lt;/code&gt; array. Note that the &lt;code&gt;Field&lt;/code&gt; at the 0th index corresponds to the &lt;em&gt;1st&lt;/em&gt; column in our SQL table.&lt;/p&gt;
&lt;p&gt;Like we just saw for &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; versus &lt;code&gt;TABLE&lt;/code&gt;, the &lt;code&gt;Field&lt;/code&gt; class is a &lt;em&gt;runtime&lt;/em&gt; representation of a table column, not to be confused with the &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; which merely &lt;em&gt;defines&lt;/em&gt; a column. &lt;/p&gt;
&lt;p&gt;The &lt;code&gt;Field&lt;/code&gt; class is declared in &lt;code&gt;mysql-5.1.22-rc/sql/field.h&lt;/code&gt;. It declares a number of overloaded methods to store a value into the current row. In this example we use a variant that is appropriate to store a character string:
&lt;pre&gt;
/* Store functions returns 1 on overflow and -1 on fatal error */
virtual int  store(const char *to, uint length,CHARSET_INFO *cs)=0;
&lt;/pre&gt;
&lt;p&gt;Finally, we get to call the &lt;code&gt;schema_table_store_record&lt;/code&gt; function. This finalizes the process of storing a row into the information schema table. This method returns either &lt;code&gt;0&lt;/code&gt; (in case of success) or &lt;code&gt;1&lt;/code&gt; (in case of error). Because our example plugin stores just one row, we can simply conclude by returning the result of &lt;code&gt;schema_table_store_record&lt;/code&gt; as the result of our &lt;code&gt;fill_table&lt;/code&gt; function. &lt;/p&gt;
&lt;p&gt;In most practical applications, one would likely have a loop to repeatedly store a row in the table, and one would have to interrupt the normal completion of that loop as soon as the &lt;code&gt;schema_table_store_record&lt;/code&gt; returns something not equal to &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;
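&lt;p&gt;Such a loop could look roughly like the following self-contained sketch. It deliberately avoids the MySQL headers: &lt;code&gt;example_store_fn&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;schema_table_store_record&lt;/code&gt;, and &lt;code&gt;example_fill_rows&lt;/code&gt;, &lt;code&gt;example_count_row&lt;/code&gt; and &lt;code&gt;example_fail_after_two&lt;/code&gt; are names invented for this illustration.&lt;/p&gt;

```cpp
// Stand-in for schema_table_store_record: returns 0 on success, non-zero on error.
typedef int (*example_store_fn)(void *context, const char *value);

// Store every value of a null-terminated list as one row each.
// Stops at the first failure and passes the error status on to the caller.
int example_fill_rows(const char *const *values, example_store_fn store, void *context)
{
  for (int i = 0; values[i] != nullptr; ++i)
  {
    int status = store(context, values[i]);
    if (status != 0)
      return status;
  }
  return 0;
}

// Example store function: counts the rows it receives through the context pointer.
int example_count_row(void *context, const char *)
{
  ++*(int *)context;
  return 0;
}

// Example store function that reports an error once two rows have been stored.
int example_fail_after_two(void *context, const char *)
{
  int *stored = (int *)context;
  if (*stored == 2)
    return 1;
  ++*stored;
  return 0;
}
```

&lt;p&gt;In a real &lt;code&gt;fill_table&lt;/code&gt; function, the body of the loop would first write the column values with &lt;code&gt;store&lt;/code&gt; and then call &lt;code&gt;schema_table_store_record&lt;/code&gt; in place of the stand-in.&lt;/p&gt;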
&lt;h4&gt;Putting together the plug-in type-specific implementation&lt;/h4&gt;
&lt;p&gt;At this point we&amp;#8217;ve taken care of the individual elements to implement an information schema table: we created a &lt;code&gt;ST_FIELD_INFO&lt;/code&gt; array to define the columns of the table, and we provided an implementation for the &lt;code&gt;fill_table&lt;/code&gt; function to actually deliver the rows. What we still need to do though is hook these elements up with the &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; structure that forms the type-specific part of an information schema plug-in. &lt;/p&gt;
&lt;p&gt;As mentioned before, a pointer to a &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; instance is passed to the plug-in&amp;#8217;s &lt;code&gt;plugin_init&lt;/code&gt; function. We can now implement the &lt;code&gt;plugin_init&lt;/code&gt; function for our specific plug-in and set it up to use our &lt;code&gt;fields_info&lt;/code&gt; and &lt;code&gt;fill_table&lt;/code&gt; implementations:
&lt;pre&gt;
static int mysql_is_hello_plugin_init(void *p)
{
  ST_SCHEMA_TABLE *schema= (ST_SCHEMA_TABLE *)p;

  schema-&gt;&lt;b&gt;fields_info= mysql_is_hello_field_info;&lt;/b&gt;
  schema-&gt;&lt;b&gt;fill_table= mysql_is_hello_fill_table;&lt;/b&gt;

  return 0;
}
&lt;/pre&gt;
&lt;p&gt;As we can see, there is very little to do here - we simply assign our implementations to the appropriate members of the &lt;code&gt;ST_SCHEMA_TABLE&lt;/code&gt; instance, and return &lt;code&gt;0&lt;/code&gt; to indicate a successful initialization.&lt;/p&gt;
&lt;p&gt;We can immediately take care of the &lt;code&gt;plugin_deinit&lt;/code&gt; function too:
&lt;pre&gt;
static int mysql_is_hello_plugin_deinit(void *p)
{
  return 0;
}
&lt;/pre&gt;
&lt;p&gt;In this case, we can get away with this simple dummy implementation. &lt;/p&gt;
&lt;p&gt;In real-world examples, a plug-in might require some resource like memory or a file. In these cases, the &lt;code&gt;plugin_init&lt;/code&gt; function would claim these resources, and the &lt;code&gt;plugin_deinit&lt;/code&gt; function would be used to free up those resources again.&lt;br /&gt;
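&lt;p&gt;As a rough, self-contained sketch of that pairing, consider a plug-in that needs a small buffer. Everything here is hypothetical: &lt;code&gt;example_buffer&lt;/code&gt;, &lt;code&gt;example_resource_init&lt;/code&gt; and &lt;code&gt;example_resource_deinit&lt;/code&gt; are made-up names, and the sketch ignores allocation failure, which a real plug-in would have to handle and report by returning a non-zero value.&lt;/p&gt;

```cpp
// Hypothetical resource owned by the plug-in for as long as it is loaded.
static char *example_buffer = nullptr;

// Init hook: claim the resource and return 0 to signal success.
static int example_resource_init(void *)
{
  example_buffer = new char[64];
  example_buffer[0] = '\0';
  return 0;
}

// Deinit hook: release the resource again and reset the pointer,
// so that nothing keeps referring to freed memory.
static int example_resource_deinit(void *)
{
  delete[] example_buffer;
  example_buffer = nullptr;
  return 0;
}
```

&lt;p&gt;Keeping the two functions symmetrical like this makes it easy to check that everything claimed on load is given back on unload.&lt;/p&gt;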
&lt;h4&gt;Putting together the generic plug-in implementation&lt;/h4&gt;
&lt;p&gt;With the previous step, we concluded the process of creating the plug-in type specific implementation for an information_schema plug-in. The final touch is that we have to hook this up to the generic part of the plug-in interface, that is, we have to provide a &lt;code&gt;st_mysql_plugin&lt;/code&gt; structure for our plug-in.&lt;/p&gt;
&lt;p&gt;Rather than doing so directly, we use the predefined macros &lt;code&gt;mysql_declare_plugin&lt;/code&gt; and &lt;code&gt;mysql_declare_plugin_end&lt;/code&gt; for that. (Both these macros are defined in &lt;code&gt;mysql-5.1.22-rc/include/mysql/plugin.h&lt;/code&gt;):
&lt;pre&gt;
struct st_mysql_information_schema mysql_is_hello_plugin=
{ MYSQL_INFORMATION_SCHEMA_INTERFACE_VERSION };

&lt;b&gt;mysql_declare_plugin(mysql_is_hello)&lt;/b&gt;
{
  MYSQL_INFORMATION_SCHEMA_PLUGIN,                 /* type constant    */
  &amp;mysql_is_hello_plugin,                          /* type descriptor  */
  &lt;b&gt;"MYSQL_HELLO"&lt;/b&gt;,                                   /* Name             */
  "Roland Bouman (http://rpbouman.blogspot.com/)", /* Author           */
  "Says hello.",                                   /* Description      */
  PLUGIN_LICENSE_GPL,                              /* License          */
  &lt;b&gt;mysql_is_hello_plugin_init&lt;/b&gt;,                      /* Init function    */
  &lt;b&gt;mysql_is_hello_plugin_deinit&lt;/b&gt;,                    /* Deinit function  */
  0x0100,                                          /* Version (1.0)    */
  NULL,                                            /* status variables */
  NULL,                                            /* system variables */
  NULL                                             /* config options   */
}
&lt;b&gt;mysql_declare_plugin_end&lt;/b&gt;;
&lt;/pre&gt;
&lt;p&gt;We pass &lt;code&gt;mysql_is_hello&lt;/code&gt; to the &lt;code&gt;mysql_declare_plugin&lt;/code&gt; macro, and end our plug-in descriptor with the &lt;code&gt;mysql_declare_plugin_end&lt;/code&gt;. As far as I can see, these macros are there to take care of some plumbing to allow definition of multiple plug-ins within the same source file. &lt;/p&gt;
&lt;p&gt;An important element in putting together the plug-in descriptor is the assignment of the &lt;code&gt;plugin_init&lt;/code&gt; and &lt;code&gt;plugin_deinit&lt;/code&gt; functions which we discussed in the previous section. Assigning them here to the appropriate members of the plug-in descriptor ensures that the server knows what it must do in order to instantiate the plug-in. &lt;/p&gt;
&lt;p&gt;Another important element is assigning the plug-in name, which we chose to be &lt;code&gt;MYSQL_HELLO&lt;/code&gt;. We already explained that this name is later used in the &lt;code&gt;INSTALL PLUGIN&lt;/code&gt; and &lt;code&gt;UNINSTALL PLUGIN&lt;/code&gt; syntax, and that it is also used as the table name for the information_schema table.&lt;br /&gt;
&lt;h3&gt;Building and installing the plugin&lt;/h3&gt;
&lt;p&gt;Now that we have created the source file we must compile it and then install the plugin into our server.&lt;br /&gt;
&lt;h4&gt;Compiling the plugin source file&lt;/h4&gt;
&lt;p&gt;Assuming the source file &lt;code&gt;mysql_is_hello.cc&lt;/code&gt; is located in the current working directory and &lt;code&gt;/home/roland/mysql-5.1.22-rc&lt;/code&gt; is the path to the MySQL 5.1.22 source distribution, the following line can be used to compile the source file:
&lt;pre&gt;
g++ &lt;b&gt;-DMYSQL_DYNAMIC_PLUGIN&lt;/b&gt; -Wall -shared
-I/home/roland/mysql-5.1.22-rc/include
-I/home/roland/mysql-5.1.22-rc/regex
-I/home/roland/mysql-5.1.22-rc/sql
&lt;b&gt;-o mysql_is_hello.so&lt;/b&gt; mysql_is_hello.cc&lt;/pre&gt;
&lt;p&gt;Note that this is all on one line - I added line breaks to make it easier to read. If all goes well, this should result in a shared object file called &lt;code&gt;mysql_is_hello.so&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Apart from the common &lt;code&gt;g++&lt;/code&gt; options &lt;code&gt;-I&lt;/code&gt; (include path), &lt;code&gt;-W&lt;/code&gt; (warnings, &lt;code&gt;-Wall&lt;/code&gt; means &amp;#8220;all warnings&amp;#8221;), &lt;code&gt;-shared&lt;/code&gt; (to compile as a shared, dynamically linkable library), and &lt;code&gt;-o&lt;/code&gt; (output file) we see the specific &lt;code&gt;-DMYSQL_DYNAMIC_PLUGIN&lt;/code&gt;. The &lt;code&gt;-D&lt;/code&gt; option is there to define a constant (like a &lt;code&gt;#define&lt;/code&gt; directive). &lt;code&gt;-DMYSQL_DYNAMIC_PLUGIN&lt;/code&gt; triggers some conditional compilation magic that allows the relevant plug-in definitions to be exposed so they are visible from a program that dynamically links the shared object file. So, this option is required to make the plug-in pluggable.&lt;br /&gt;
&lt;h4&gt;The plugin directory&lt;/h4&gt;
&lt;p&gt;Once we have obtained the &lt;code&gt;mysql_is_hello.so&lt;/code&gt; shared object file, we must copy it to the plug-in directory of our installed MySQL 5.1.22 binary. Note that it is not necessary to build MySQL from source - you should be able to install the plug-in in a pre-built MySQL installed from a binary distribution.&lt;/p&gt;
&lt;p&gt;The exact location of the plug-in directory depends on the specific MySQL distribution and configuration. You can find out its current location by querying the value of the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#option_mysqld_plugin_dir" target="_mysql"&gt;plugin_dir&lt;/a&gt; system variable:
&lt;pre&gt;
mysql&amp;#62; show variables like &lt;b&gt;'plugin_dir'&lt;/b&gt;;
+---------------+-----------------------------------------+
&amp;#124; Variable_name &amp;#124; Value                                   &amp;#124;
+---------------+-----------------------------------------+
&amp;#124; plugin_dir    &amp;#124; &lt;b&gt;/home/roland/mysql-5.1.22-dev/lib/mysql&lt;/b&gt; &amp;#124;
+---------------+-----------------------------------------+
1 row in set (0.01 sec)&lt;/pre&gt;
&lt;p&gt;So in this case, the shared library &lt;code&gt;mysql_is_hello.so&lt;/code&gt; should be copied to &lt;code&gt;/home/roland/mysql-5.1.22-dev/lib/mysql&lt;/code&gt;.&lt;br /&gt;
&lt;h4&gt;Installing the plugin&lt;/h4&gt;
&lt;p&gt;Once the shared library is in place we can install it using the &lt;code&gt;INSTALL PLUGIN&lt;/code&gt; syntax:
&lt;pre&gt;
mysql&amp;#62; install plugin &lt;b&gt;MYSQL_HELLO&lt;/b&gt; soname &lt;b&gt;'mysql_is_hello.so'&lt;/b&gt;;
Query OK, 0 rows affected (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;Note that we use the name &lt;code&gt;MYSQL_HELLO&lt;/code&gt; for the plugin: this is what we defined earlier as the &lt;code&gt;name&lt;/code&gt; member of the &lt;code&gt;st_mysql_plugin&lt;/code&gt; plug-in descriptor. Likewise, we use &lt;code&gt;mysql_is_hello.so&lt;/code&gt;, which is the file name of our shared object file as &lt;code&gt;soname&lt;/code&gt;. The plug-in directory is implied - it should not be possible to install a shared library located at any place outside the plug-in directory.&lt;/p&gt;
&lt;p&gt;In order to install a plug-in in this manner, the user has to have privileges to &lt;code&gt;INSERT&lt;/code&gt; into the &lt;code&gt;mysql.plugin&lt;/code&gt; table, or have the &lt;code&gt;SUPER&lt;/code&gt; privilege.&lt;/p&gt;
&lt;p&gt;There is a common problem that might occur at this point:
&lt;pre&gt;
ERROR 1127 (HY000): Can't find symbol '_mysql_plugin_interface_version_' in library&lt;/pre&gt;
&lt;p&gt;If you see a message like this, it is likely that you forgot to include the &lt;code&gt;-DMYSQL_DYNAMIC_PLUGIN&lt;/code&gt; option when compiling the plugin. Adding this option to the &lt;code&gt;g++&lt;/code&gt; compile line is required to create a dynamically loadable plug-in.&lt;br /&gt;
&lt;h4&gt;Verifying installation&lt;/h4&gt;
&lt;p&gt;We can now check if the plug-in is correctly installed. We do this by querying the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugins-table.html" target="_mysql"&gt;&lt;code&gt;PLUGINS&lt;/code&gt;&lt;/a&gt; table in the &lt;code&gt;information_schema&lt;/code&gt;:
&lt;pre&gt;
mysql&amp;#62; select * from &lt;b&gt;information_schema.plugins&lt;/b&gt;
    -&amp;#62; where &lt;b&gt;plugin_name = 'MYSQL_HELLO'&lt;/b&gt;\G
*************************** 1. row ***************************
           PLUGIN_NAME: MYSQL_HELLO
        PLUGIN_VERSION: 0.16
         PLUGIN_STATUS: ACTIVE
           PLUGIN_TYPE: INFORMATION SCHEMA
   PLUGIN_TYPE_VERSION: 50122.0
        PLUGIN_LIBRARY: mysql_is_hello.so
PLUGIN_LIBRARY_VERSION: 1.0
         PLUGIN_AUTHOR: Roland Bouman (http://rpbouman.blogspot.com/)
    PLUGIN_DESCRIPTION: Says hello.
        PLUGIN_LICENSE: GPL
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;h4&gt;Using the plug-in&lt;/h4&gt;
&lt;p&gt;Finally, we get to test our plug-in ;-)
&lt;pre&gt;
mysql&amp;#62; select * from &lt;b&gt;information_schema.mysql_hello&lt;/b&gt;;
+--------------------------------------+
&amp;#124; HELLO                                &amp;#124;
+--------------------------------------+
&amp;#124; plugin: hello, information_schema!!! &amp;#124;
+--------------------------------------+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;Of course, this is a gloriously useless application of information schema plug-ins. In a next installment I&amp;#8217;ll demonstrate that you can do pretty cool stuff with these information schema plug-ins, such as peeking inside the query cache, listing the currently defined savepoints, temporary tables, user variables and much more.&lt;br /&gt;
&lt;h4&gt;Uninstalling the plugin&lt;/h4&gt;
&lt;p&gt;When you get tired of the plugin you can uninstall it using the &lt;code&gt;UNINSTALL PLUGIN&lt;/code&gt; syntax:
&lt;pre&gt;
mysql&amp;#62; uninstall plugin MYSQL_HELLO;
Query OK, 0 rows affected (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;Note that currently, due to &lt;a href="http://bugs.mysql.com/bug.php?id=33731" target="_mysql"&gt;a bug&lt;/a&gt;, you must be sure to use the &lt;em&gt;exact&lt;/em&gt; same name for uninstalling the plugin as you did for installing it. I suspect this will be fixed soon, but for now it is best to simply always use the same name, for example the exact name used in the code, &lt;code&gt;MYSQL_HELLO&lt;/code&gt;.&lt;br /&gt;
&lt;h3&gt;Learn More&lt;/h3&gt;
&lt;p&gt;I will be posting more about information schema plugins shortly. In particular, I will demonstrate how you can report status on server internals such as the query cache to discover which queries are in the cache, the number of blocks they are using and the number of bytes they are actually occupying. However, the best way to learn more about extending the MySQL server, the MySQL plug-in API, and the MySQL information_schema is to visit &lt;a href="http://en.oreilly.com/mysql2008/public/content/home" target="_mysqlconf"&gt;the MySQL user&amp;#8217;s conference&lt;/a&gt;, April 14-17 2008 in Santa Clara CA, USA. &lt;/p&gt;
&lt;p&gt;There are a number of great sessions on this and related topics:
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2008/public/schedule/detail/877"&gt;Extending MySQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2008/public/schedule/detail/367"&gt;Past, Present, and Future of the MySQL Plugin API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2008/public/schedule/detail/116" target="_mysqlconf"&gt;Code generators for MySQL Plugins and User Defined Functions (UDFs)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://en.oreilly.com/mysql2008/public/schedule/detail/65" target="_mysqlconf"&gt;Developing INFORMATION_SCHEMA Plugins&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, once you are at the conference, there will be many MySQL server developers giving you ample opportunity to ask them about some particular details, or maybe have them look at your code. And if you &lt;a href="https://en.oreilly.com/mysql2008/public/register" target="_mysqlconf"&gt;Register&lt;/a&gt; by February 26, 2008 you&amp;#8217;ll save up to $200. &lt;/p&gt;
&lt;p&gt;If you can&amp;#8217;t wait: other great sources of information may be found in the list below:
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/plugin-api.html" target="_mysql"&gt;The MySQL Plugin Interface&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://krow.net/talks/PluggageInformationSchemaVancouver2007.pdf"&gt;How to create a information schema plugins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://krow.livejournal.com/481313.html" target="_krow"&gt;Creating a Daemon Plugin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.markleith.co.uk/?p=18" target="_leith"&gt;Monitoring OS statistics with INFORMATION_SCHEMA plugins&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cheers, and happy hacking ;-)
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Sun buys MySQL</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2008/01/sun_buys_mysql.html" />
    <id>tag:www.oreillynet.com,2008:/databases/blog//6.22779</id>
    
    <published>2008-01-16T14:35:42Z</published>
    <updated>2008-01-16T14:35:43Z</updated>
    
    <summary>Earlier today, Sun announced that it will be acquiring MySQL.  This is an interesting turn of events in the endless battle over MySQL by Oracle.</summary>
    <author>
        <name>Jonah Harris</name>
            </author>
            <category term="News" />
        <content type="html">
&lt;p&gt;Earlier today, Sun announced that it will be acquiring MySQL.  This is an interesting turn of events in Oracle&amp;#8217;s silent battle over MySQL.  With Falcon still years away from being production-ready, and Oracle owning the most popular and stable storage engine for MySQL (InnoDB), what are your thoughts on this acquisition and the effects (both positive and negative) it may bring to end-users?&lt;/p&gt;
&lt;p&gt;Several of the announcements can be found below:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://blogs.sun.com/jonathan/entry/winds_of_change_are_blowing"&gt;http://blogs.sun.com/jonathan/entry/winds_of_change_are_blowing&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://blogs.mysql.com/kaj/sun-acquires-mysql.html/"&gt;http://blogs.mysql.com/kaj/sun-acquires-mysql.html/&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://biz.yahoo.com/bw/080116/20080116005349.html?.v=1"&gt;http://biz.yahoo.com/bw/080116/20080116005349.html?.v=1&lt;/a&gt;&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Calculating the Financial Median in MySQL</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/12/calculating_the_financial_medi.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.22616</id>
    
    <published>2007-12-17T10:47:43Z</published>
    <updated>2007-12-17T10:47:43Z</updated>
    
    <summary>I believe I found a new method to calculate the median in MySQL. I would not be surprised if this method has been figured out by somebody else already. However, I can&apos;t seem to find any resources on the internet...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Technical" />
        <content type="html">
&lt;p&gt;I believe I found a new method to calculate the median in MySQL. I would not be surprised if this method has been figured out by somebody else already. However, I can&amp;#8217;t seem to find any resources on the internet describing this method, so for now I flatter myself by assuming the method is original.&lt;/p&gt;
&lt;p&gt;(Please do post your comments to this blog to correct me on that should I be wrong so I have a chance to rectify.)&lt;/p&gt;
&lt;p&gt;The method I&amp;#8217;m describing is a one-pass, pure SQL method. It does not require subqueries, cursors or user variables. However, it does rely on the MySQL-specific functions &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat" target="_mysql"&gt;GROUP_CONCAT()&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substring-index" target="_mysql"&gt;SUBSTRING_INDEX()&lt;/a&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ll be maintaining &lt;a href="http://forge.mysql.com/snippets/view.php?id=114" target="_mysqlforge"&gt;a snippet&lt;/a&gt; for this method at &lt;a href="http://forge.mysql.com/" target="_mysqlforge"&gt;MySQL Forge&lt;/a&gt;.&lt;br /&gt;
If you want to know what the median is, and how my snippet works, read on.&lt;/p&gt;
&lt;h3&gt;Some background&lt;/h3&gt;
&lt;p&gt;Like the &lt;em&gt;mean&lt;/em&gt; and the &lt;em&gt;mode&lt;/em&gt;, the median is an important metric to characterize the distribution of values in a collection. If we have an ordered collection of (numerical) values, the median is the value for which the number of entries with a higher value is exactly equal to the number of entries with a lower value. If there is an odd number of entries in the collection, the value of the median corresponds to the value of the entry that lies exactly in the middle of the list. If there is an even number of entries, the median is calculated as the mean of the two middle values.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.mysql.com" target="_mysql"&gt;MySQL&lt;/a&gt; offers a number of &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html" target="_mysql"&gt;aggregate functions&lt;/a&gt;. Unfortunately, MySQL does not offer a function to calculate the &lt;a href="http://en.wikipedia.org/wiki/Median" target="_wiki"&gt;median&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;Even though MySQL does not support a &lt;code&gt;MEDIAN()&lt;/code&gt; function natively, it is still possible to calculate it. You can:
&lt;ul&gt;
&lt;li&gt;Install one of the numerous &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html" target="_mysql"&gt;UDF&lt;/a&gt;s floating around on the internet&lt;/li&gt;
&lt;li&gt;Use pure SQL as suggested in &lt;a href="http://books.google.com/books?id=Hi9fMnOoRtAC&amp;printsec=frontcover#PPA514,M1" target="_smarties"&gt;Chapter 23&lt;/a&gt; of &lt;a href="http://www.celko.com/" target="_celko"&gt;Joe Celko&lt;/a&gt;&amp;#8217;s &lt;a href="http://books.google.com/books?id=Hi9fMnOoRtAC" target="_smarties"&gt;SQL for Smarties&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;The snippet&lt;/h3&gt;
&lt;p&gt;Here&amp;#8217;s a snippet that shows how to calculate the median replacement cost for a film in the &lt;a href="http://dev.mysql.com/doc/sakila/en/sakila.html" target="_mysql"&gt;sakila sample database&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;
select
        (
            substring_index(                          -- left median: max value in lower half:
                substring_index(
                    group_concat(                     --   list all values in ascending order
                        f.replacement_cost
                        order by f.replacement_cost
                    )
                ,   ','
                ,   ceiling(count(*)/2)               --   left half of the list
                )
            ,   ','
            ,   -1                                    --   keep only the last value in list
            )
        +   substring_index(                          -- right median: min value in upper half:
                substring_index(
                    group_concat(                     --   list all values in ascending order
                        f.replacement_cost
                        order by f.replacement_cost
                    )
                ,   ','
                ,   -ceiling(count(*)/2)              --   right half of the list
                )
            ,   ','
            ,   1                                     --   keep only the first value in list
            )
        ) / 2                                         -- average of left and right medians
        as median
from    sakila.film f;
&lt;/pre&gt;
&lt;p&gt;(For the latest version, refer to &lt;a href="http://forge.mysql.com/snippets/view.php?id=114" target="_mysqlforge"&gt;MySQL Forge&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;So, how does this snippet work? In the remainder of this post, I&amp;#8217;ll explain the inner workings of this method in a top-down fashion.&lt;br /&gt;
&lt;h3&gt;The mean of the left and right median&lt;/h3&gt;
&lt;p&gt;The method I&amp;#8217;m describing always takes the mean of the &amp;#8216;left&amp;#8217; and &amp;#8216;right&amp;#8217; median.
&lt;pre&gt;
select
        (
            &lt;b&gt;left-median(f.replacement_cost)&lt;/b&gt;
        +   &lt;b&gt;right-median(f.replacement_cost)&lt;/b&gt;
        ) / 2                                         -- average of left and right medians
        as median
from    sakila.film f;
&lt;/pre&gt;
&lt;p&gt;(Note that the usage of &lt;code&gt;left-median()&lt;/code&gt; and &lt;code&gt;right-median()&lt;/code&gt; is just an explanation of the structure - in reality there are not two distinct functions by that name)&lt;/p&gt;
&lt;p&gt;The terms &amp;#8216;left median&amp;#8217; and &amp;#8216;right median&amp;#8217; are not common so they need an explanation. &lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s visualize the process to determine the median. We can do this by imagining that we have an ordered list of values and that we point our left index finger to the lowest value in the list and our right index finger to the highest value in the list. &lt;/p&gt;
&lt;p&gt;Now, we look at our fingers. If there is at least one entry between our right and left finger, we move our left finger one entry to the right and our right finger one entry to the left, and we keep doing that until there are no more entries between our left and right finger. Once we&amp;#8217;re there, the value of the entry that is pointed to by our left finger is the &amp;#8216;left median&amp;#8217; and the value of the entry pointed to by our right finger is the &amp;#8216;right median&amp;#8217;. &lt;/p&gt;
&lt;p&gt;If we had an even number of entries, then the left and right median each correspond to distinct entries - if there was an odd number of entries then the left and right median correspond to one and the same entry.&lt;/p&gt;
&lt;p&gt;At any rate, once we have found the left and right median, it is clear that their mean is the true median. If we have an even number of entries, we have to calculate the mean of the two middle values anyway. If there is an odd number of entries, the left and right median are identical, and the mean of two identical values is of course that same value, which is the correct median.&lt;br /&gt;
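&lt;p&gt;To make the finger-walking concrete, here is a minimal sketch in Python. It is only an illustration of the idea and not part of the SQL snippet: the function names &lt;code&gt;left_right_median&lt;/code&gt; and &lt;code&gt;median&lt;/code&gt; are made up for this example, and the values are assumed to be plain numbers. The fingers keep moving inward for as long as at least one entry lies between them.&lt;/p&gt;

```python
def left_right_median(values):
    """Return (left median, right median) by walking two fingers inward."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("the median of an empty collection is undefined")
    left = 0                     # left index finger: lowest value
    right = len(ordered) - 1     # right index finger: highest value
    # Keep moving while at least one entry lies between the two fingers.
    while max(right - left - 1, 0) != 0:
        left += 1
        right -= 1
    return ordered[left], ordered[right]


def median(values):
    """Mean of the left and right median."""
    left, right = left_right_median(values)
    return (left + right) / 2


print(left_right_median([4, 1, 3, 2]))   # (2, 3): even count, two distinct entries
print(left_right_median([3, 1, 2]))      # (2, 2): odd count, one and the same entry
print(median([4, 1, 3, 2]))              # 2.5
```

&lt;p&gt;With an odd number of entries both fingers end up on the same entry, so averaging changes nothing; with an even number they stop on the two middle entries.&lt;/p&gt;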
&lt;h3&gt;&lt;code&gt;GROUP_CONCAT&lt;/code&gt;: an ordered list of values&lt;/h3&gt;
&lt;p&gt;In the example, we use &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat" target="_mysql"&gt;&lt;code&gt;GROUP_CONCAT&lt;/code&gt;&lt;/a&gt; to generate a list of values in ascending order:
&lt;pre&gt;
&lt;b&gt;GROUP_CONCAT&lt;/b&gt;(                     --   list all values in ascending order
    f.replacement_cost
    &lt;b&gt;ORDER BY&lt;/b&gt; f.replacement_cost
)
&lt;/pre&gt;
&lt;p&gt;This gets us a string consisting of concatenated &lt;code&gt;replacement_cost&lt;/code&gt; values in ascending order, separated by the default separator, which is a comma (&lt;code&gt;','&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;We use the same list in both the calculation of the left and the right median.&lt;/p&gt;
&lt;p&gt;Note that the length of the concatenation result returned by &lt;code&gt;GROUP_CONCAT()&lt;/code&gt; is limited. By default, it is as small as 1024 bytes. Personally I think this is way too small so I have it configured to be 64K by default. You can set the length at runtime too like this:
&lt;pre&gt;
    SET group_concat_max_len := 65535
&lt;/pre&gt;
&lt;p&gt;You can specify larger values than 65535 too, and I suspect that the maximum packet size is the practical maximum:
&lt;pre&gt;
    SET group_concat_max_len := @@max_allowed_packet
&lt;/pre&gt;
&lt;p&gt;To inspect the current value, you can do this:
&lt;pre&gt;
    SELECT @@group_concat_max_len
&lt;/pre&gt;
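&lt;p&gt;Whether the default is large enough is easy to estimate by hand: the list needs room for every value plus one separator between each pair of values. The Python sketch below does that arithmetic. The helper names and the sample data (1000 prices of five characters each) are assumptions for illustration only, and the sketch counts one byte per character. If the list does not fit, MySQL truncates the result and issues a warning, in which case the median expression would be working on an incomplete list.&lt;/p&gt;

```python
def group_concat_length(values, separator=","):
    """Length of the concatenated list, assuming one byte per character."""
    texts = [str(value) for value in values]
    if not texts:
        return 0
    return sum(len(text) for text in texts) + len(separator) * (len(texts) - 1)


def fits_in_limit(values, limit=1024):
    """True when the whole list fits within the limit (1024 is the default)."""
    needed = group_concat_length(values)
    return needed == min(needed, limit)


# Hypothetical data: 1000 prices of five characters each.
prices = ["20.99"] * 1000
print(group_concat_length(prices))     # 5999
print(fits_in_limit(prices))           # False
print(fits_in_limit(prices, 65535))    # True
```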
&lt;h3&gt;Getting the &amp;#8216;left&amp;#8217; half of the list&lt;/h3&gt;
&lt;p&gt;Once we have the list of values, we can split it in two halves with little effort. We do this using the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substring-index" target="_mysql"&gt;&lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt;&lt;/a&gt; function.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt; function processes a string argument and gets a substring based on the position of a particular occurrence of another substring. &lt;/p&gt;
&lt;p&gt;In this case, the comma &lt;code&gt;','&lt;/code&gt; is the substring that separates the values in our ordered list. But of all the commas in the list, which occurrence of the comma do we need to find? &lt;/p&gt;
&lt;p&gt;Suppose our list contains 4 values. Then, these values are separated by three commas:
&lt;pre&gt;
values: '1&lt;b&gt;,&lt;/b&gt;2&lt;b&gt;,&lt;/b&gt;3&lt;b&gt;,&lt;/b&gt;4'
commas:   ^ &lt;b&gt;^&lt;/b&gt; ^
          1 &lt;b&gt;2&lt;/b&gt; 3
&lt;/pre&gt;
&lt;p&gt;If we want to divide this list in two equal halves, then the second comma is the divisor between the left and right halves of our list. With &lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt; this expression would get us the left half of this list:
&lt;pre&gt;
    SUBSTRING_INDEX('1,2,3,4', ',', 2) -- the substring up to the 2nd occurrence of ','
&lt;/pre&gt;
&lt;p&gt;and the result will be:
&lt;pre&gt;
    '1,2'
&lt;/pre&gt;
&lt;p&gt;So now we have the first half of the list, and by definition, the last entry in that list, &lt;code&gt;'2'&lt;/code&gt; is the left median of the original list &lt;code&gt;'1,2,3,4'&lt;/code&gt;. &lt;/p&gt;
&lt;p&gt;Now what if we had an odd number of entries in our list? Suppose our list had been like this:
&lt;pre&gt;
values: '1&lt;b&gt;,&lt;/b&gt;2&lt;b&gt;,&lt;/b&gt;3'
commas:   ^ &lt;b&gt;^&lt;/b&gt;
          1 &lt;b&gt;2&lt;/b&gt;
&lt;/pre&gt;
&lt;p&gt;In this case too, we need the second comma to end up with a left substring that has the left median as last entry in the list (which also happens to be the proper median because this is a list with an odd number of entries).&lt;/p&gt;
&lt;p&gt;As it turns out, we can conveniently generalize the &lt;code&gt;SUBSTRING_INDEX&lt;/code&gt; expression like this:
&lt;pre&gt;
    SUBSTRING_INDEX(list, separator, &lt;b&gt;CEILING(#entries/2)&lt;/b&gt;)
&lt;/pre&gt;
&lt;p&gt;In other words, if we divide the number of entries in our list by two, and then round up to the nearest integer, this gives us the particular occurrence of the separator that we are looking for to halve our list as required. Of course, calculating the number of entries is simply a matter of using the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_count" target="_mysql"&gt;&lt;code&gt;COUNT()&lt;/code&gt;&lt;/a&gt; aggregate function.&lt;br /&gt;
&lt;h3&gt;Excising the &amp;#8216;left&amp;#8217; median&lt;/h3&gt;
&lt;p&gt;To actually obtain the left median itself, we just need to excise the last value from the left half of the list of values. We do this by applying &lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt; again. &lt;/p&gt;
&lt;p&gt;Again, we need to make a substring in terms of the occurrence of the comma that separates the values in our list. This time, we need to get the substring found directly after the last comma in the list. With &lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt; we can conveniently express this in the following manner:
&lt;pre&gt;
    SUBSTRING_INDEX(list, separator, &lt;b&gt;-1&lt;/b&gt;)
&lt;/pre&gt;
&lt;p&gt;This means: search the list from right to left and find the first occurrence of the separator. Return the substring that appears after the separator (that is, the substring appearing on the right hand of the separator).&lt;br /&gt;
&lt;h3&gt;Getting the &amp;#8216;right&amp;#8217; median&lt;/h3&gt;
&lt;p&gt;The process to obtain the right median is a mirror of obtaining the left median: instead of obtaining the &lt;em&gt;last&lt;/em&gt; value in the &lt;em&gt;left&lt;/em&gt; half of the ordered list of values, we now need to obtain the &lt;em&gt;first&lt;/em&gt; value of the &lt;em&gt;right&lt;/em&gt; half of the list. This is actually as simple as reversing the sign of the occurrence argument in the &lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt; calls:
&lt;pre&gt;
    left half:   SUBSTRING_INDEX(list, separator,  CEILING(COUNT(*)/2))
    right half:  SUBSTRING_INDEX(list, separator, &lt;b&gt;-&lt;/b&gt;CEILING(COUNT(*)/2))

    last entry:  SUBSTRING_INDEX(list, separator, &lt;b&gt;-&lt;/b&gt;1)
    first entry: SUBSTRING_INDEX(list, separator,  1)
&lt;/pre&gt;
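&lt;p&gt;The four &lt;code&gt;SUBSTRING_INDEX()&lt;/code&gt; calls can be traced outside the database with a small Python sketch. The function &lt;code&gt;substring_index&lt;/code&gt; below is a rough stand-in written for this illustration, not MySQL&amp;#8217;s own code, and &lt;code&gt;median_via_list&lt;/code&gt; is a made-up name that mirrors the structure of the SQL snippet: build the ordered list, halve it, and pick the last entry of the left half and the first entry of the right half.&lt;/p&gt;

```python
import math


def substring_index(text, delimiter, count):
    """Rough stand-in for SUBSTRING_INDEX(): split on the delimiter and keep
    the first `count` parts (positive) or the last `count` parts (negative)."""
    if count == 0:
        return ""
    parts = text.split(delimiter)
    if count == abs(count):                  # positive: count from the left
        return delimiter.join(parts[:count])
    return delimiter.join(parts[count:])     # negative: count from the right


def median_via_list(values):
    """Median computed the same way as the SQL snippet, on a comma-separated list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("the median of an empty collection is undefined")
    joined = ",".join(str(value) for value in ordered)   # GROUP_CONCAT(... ORDER BY ...)
    half = math.ceil(len(ordered) / 2)                   # CEILING(COUNT(*)/2)
    left = substring_index(substring_index(joined, ",", half), ",", -1)
    right = substring_index(substring_index(joined, ",", -half), ",", 1)
    return (float(left) + float(right)) / 2


print(substring_index("1,2,3,4", ",", 2))     # 1,2
print(substring_index("1,2,3,4", ",", -2))    # 3,4
print(median_via_list([1, 2, 3, 4]))          # 2.5
print(median_via_list([1, 2, 3]))             # 2.0
```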
&lt;h3&gt;A few remarks&lt;/h3&gt;
&lt;p&gt;I think that in many cases, this can be a fair method to calculate the median. The advantage of this method is that it is relatively fast because the query itself is relatively simple.&lt;/p&gt;
&lt;p&gt;It would be interesting to see how this method behaves when handling millions of rows. Maybe I will run some benchmarks on that later on. &lt;/p&gt;
&lt;p&gt;In the meantime, feel free to post your thoughts, suggestions or critique on this blog.
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Random RDBMS and SQL Myths debunked</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/11/random_rdbms_and_sql_myths_deb.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.22311</id>
    
    <published>2007-11-09T09:44:25Z</published>
    <updated>2007-11-09T09:44:36Z</updated>
    
    <summary>A few times now, I&apos;ve been wanting to write this down. I know: a lot of people will go *shrug*. Others may find me pedantic. Some will say I&apos;m being a smart-ass. Whatever...but I just got to write down...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Articles" />
        <content type="html">
&lt;p&gt;A few times now, I&amp;#8217;ve been wanting to write this down. I know: a lot of people will go &lt;b&gt;*shrug*&lt;/b&gt;. Others may find me pedantic. Some will say I&amp;#8217;m being a smart-ass. Whatever&amp;#8230;but I just have to write down a few of these common misconceptions that keep floating around. &lt;/p&gt;
&lt;p&gt;None of these misconceptions are really harmful - in most cases, they do not lead to misunderstanding or miscommunication. However, when you are &lt;em&gt;writing&lt;/em&gt; about these subjects, you&amp;#8217;ll often find that a sloppy definition you used in some place will bite you in the tail, and make it harder to explain something later on. So, that is why I from time to time get kind of obsessed with finding just the right words. &lt;/p&gt;
&lt;p&gt;I&amp;#8217;m not pretending I have the right words though. But there are a few informal ways of saying things that at a glance look right but are in fact wrong. Here&amp;#8217;s a random list of some of them:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;PRIMARY KEY&lt;/code&gt; and &lt;code&gt;UNIQUE&lt;/code&gt; constraints are unique indexes&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Wrong - an index is just a convenient implementation to detect duplicate entries, and all RDBMSes I am familiar with use one to &lt;em&gt;implement&lt;/em&gt; &lt;code&gt;PRIMARY KEY&lt;/code&gt; and &lt;code&gt;UNIQUE&lt;/code&gt; constraints. However, the fact that there is a distinction is evident in, for example, the Oracle SQL syntax: in &lt;code&gt;&lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#i2103845"&gt;ALTER TABLE ... DROP CONSTRAINT&lt;/a&gt;&lt;/code&gt; you can specify whether the associated index should be kept or also discarded.&lt;/p&gt;
&lt;p&gt;Some people argue that it does not make sense to make the distinction in case the RDBMS does not maintain the constraint and index as separate objects. (This is the case in for example &lt;a href="http://www.mysql.com/" target="_mysql"&gt;MySQL&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Well, maybe&amp;#8230;but I disagree. When I think about constraints, I&amp;#8217;m thinking about business rules and guarding them to maintain database integrity. When talking about indexes, I&amp;#8217;m thinking about performance and access paths. Quite different things, and in my opinion it is a shame to throw away the words that express the difference.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;A table consists of rows and columns&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;No - there is nothing wrong with an empty table. In other words, it does not &lt;em&gt;consist&lt;/em&gt; of rows. It may or may not &lt;em&gt;contain&lt;/em&gt; rows, but that is a different story.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;A scalar subquery returns one column and one row&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Wrong - first of all, a scalar subquery may return zero rows, in which case it evaluates to null, which is perfectly valid. But there is more to it.&lt;/p&gt;
&lt;p&gt;Whether something is or is not a subquery is a matter of syntax. The SQL grammar is defined so that if you encounter a query between parentheses where a scalar value is appropriate, then that query (including the parentheses) will be &lt;em&gt;parsed&lt;/em&gt; as a scalar subquery. In other words, the text satisfies the production rule for the non-terminal symbol &amp;#8220;scalar subquery&amp;#8221;.&lt;/p&gt;
&lt;p&gt;The parser will usually be smart enough to verify whether the subquery yields one column, but the number of rows returned is a runtime affair.&lt;/p&gt;
&lt;p&gt;Suppose the query that makes up the scalar subquery would in fact return more than one row&amp;#8230;would it suddenly not be a scalar subquery anymore? Of course not. It is still a scalar subquery - it just happens to be impossible to execute it. In other words, it violates the semantics of a scalar subquery and is therefore invalid. But the mere fact that we can conclude that must imply that it is a scalar subquery.&lt;/p&gt;
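&lt;p&gt;The zero-row case is easy to try out. The sketch below uses SQLite through Python&amp;#8217;s standard &lt;code&gt;sqlite3&lt;/code&gt; module rather than MySQL, so treat it as an illustration under that assumption; the table &lt;code&gt;t&lt;/code&gt; and the function name are made up. The multi-row case is left out on purpose, because engines differ in how they handle it.&lt;/p&gt;

```python
import sqlite3


def scalar_subquery_on_empty_table():
    """Evaluate a scalar subquery that yields zero rows and return its value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")    # a perfectly valid empty table
    row = conn.execute("SELECT (SELECT x FROM t) AS v").fetchone()
    conn.close()
    return row[0]


print(scalar_subquery_on_empty_table())   # None, which is how sqlite3 reports NULL
```

&lt;p&gt;The outer query still returns exactly one row; only the value of the subquery is null.&lt;/p&gt;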
&lt;/dd&gt;
&lt;dt&gt;A subquery is a &lt;code&gt;SELECT&lt;/code&gt; statement that appears as a part of another &lt;code&gt;SELECT&lt;/code&gt; statement&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Wrong - For the same reasons as the previous issue. A &lt;em&gt;statement&lt;/em&gt; is a syntactical construct. It has to do with discovering a pattern in a piece of text so that it satisfies a particular rule in the SQL grammar. That grammar does not have a rule that allows &lt;em&gt;statements&lt;/em&gt; to be nested - not in pure SQL anyway. (Of course, in stored procedures, one can have statement blocks like &lt;code&gt;BEGIN...END&lt;/code&gt;, &lt;code&gt;IF...END IF&lt;/code&gt; etc. that really can contain other statements.)&lt;/p&gt;
&lt;p&gt;Of course, if we would take the &lt;code&gt;SELECT&lt;/code&gt; that makes up the subquery and run it in isolation, it would be a &lt;code&gt;SELECT&lt;/code&gt;-&lt;em&gt;statement&lt;/em&gt;. But that is exactly the heart of the matter: because we are regarding it as part of another statement, it cannot be a statement itself. This is simply a matter of definition of course - most people will immediately understand what is meant.&lt;/p&gt;
&lt;p&gt;What would be better to say though is that a subquery is a &lt;em&gt;query&lt;/em&gt; or &lt;em&gt;query expression&lt;/em&gt; that appears as part of another SQL statement. However, this is also not correct: &lt;code&gt;CREATE VIEW&lt;/code&gt; for example does contain a query expression, but this would most likely not be called a &lt;em&gt;sub&lt;/em&gt;query. For this particular case, you can argue that there is nothing &lt;em&gt;sub&lt;/em&gt;-ish about the query expression, because it is simply an essential part of the &lt;code&gt;CREATE VIEW&lt;/code&gt; statement.&lt;/p&gt;
&lt;p&gt;But what to think of &lt;code&gt;CREATE TABLE...AS SELECT...&lt;/code&gt; and &lt;code&gt;INSERT INTO...SELECT&lt;/code&gt;? The query expression is certainly not an essential part of &lt;code&gt;CREATE TABLE&lt;/code&gt; and &lt;code&gt;INSERT INTO&lt;/code&gt;, and in that sense, the query does look like it is subordinate to the statement it is part of.&lt;/p&gt;
&lt;p&gt;You could argue that a query is a subquery if it appears inside another query. That seems sound, but what to think of &lt;code&gt;UPDATE ... SET ... = (SELECT ...)&lt;/code&gt;? Personally I am reluctant to call an &lt;code&gt;UPDATE&lt;/code&gt; statement a &lt;em&gt;query&lt;/em&gt; - I tend to think of a &lt;em&gt;query&lt;/em&gt; as a &lt;code&gt;SELECT&lt;/code&gt; statement or sometimes a &lt;em&gt;query expression&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I can think of only one thing that really is a defining characteristic of a subquery though - that is that the query expression must appear within parentheses. So, again, a matter of syntax more than a matter of semantics. I must admit I&amp;#8217;m still not very satisfied with this though&amp;#8230;What do you think?&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;NULL&lt;/code&gt; is the absence of a value&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Variants of this statement go like &amp;#8220;&lt;code&gt;NULL&lt;/code&gt; is a missing value&amp;#8221; or &amp;#8220;&lt;code&gt;NULL&lt;/code&gt; is not a value&amp;#8221;.&lt;/p&gt;
&lt;p&gt;With slight doubt, I say: wrong. It certainly is true that many people use &lt;code&gt;NULL&lt;/code&gt; to convey that something is not there or that something is not applicable. But this is a matter of choice; it does not change the meaning of &lt;code&gt;NULL&lt;/code&gt; itself. If we use the same line of reasoning as we used for the subquery myth, we must conclude that &lt;code&gt;NULL&lt;/code&gt; is certainly a valid value expression. It can legally appear anywhere we can put a value. In my opinion, it is also perfectly OK to say things like &amp;#8220;&amp;#8230;that expression evaluates to &lt;code&gt;NULL&lt;/code&gt;&amp;#8221;.&lt;/p&gt;
&lt;p&gt;So what does the SQL standard say? Well, here&amp;#8217;s a quote:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&amp;#8230;the null value is neither equal to any other value nor not equal to any other value &amp;#8212; it is unknown&lt;br /&gt;
whether or not it is equal to any given value&amp;#8230;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;So, I&amp;#8217;m in that camp too: &lt;code&gt;NULL&lt;/code&gt; is a value, and if we have a &lt;code&gt;NULL&lt;/code&gt; in say, the integer domain, we just don&amp;#8217;t know which of all possible integers it is.&lt;/p&gt;
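&lt;p&gt;To make the quote a bit more tangible, here is a minimal sketch in plain Java (not SQL) of how an equality comparison behaves once one of the operands may be null. The class and method names are made up for illustration, and a Java &lt;code&gt;null&lt;/code&gt; result stands in for the UNKNOWN truth value:&lt;/p&gt;

```java
// Hypothetical helper, not part of any SQL library: models SQL's "=" on nullable integers.
final class SqlNullSketch {
    // Returns TRUE or FALSE when both operands are known; returns null (UNKNOWN) otherwise.
    static Boolean sqlEquals(Integer a, Integer b) {
        if (a == null || b == null) {
            return null; // we cannot know which integer the null stands for
        }
        return Boolean.valueOf(a.intValue() == b.intValue());
    }

    public static void main(String[] args) {
        System.out.println(sqlEquals(1, 1));       // true
        System.out.println(sqlEquals(1, 2));       // false
        System.out.println(sqlEquals(null, 1));    // null, i.e. UNKNOWN
        System.out.println(sqlEquals(null, null)); // null, i.e. UNKNOWN
    }
}
```

&lt;p&gt;Note that comparing two nulls does not yield true either, which fits the reading that each null is simply some integer we do not know.&lt;/p&gt;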
&lt;/dd&gt;
&lt;dt&gt;Foreign keys must reference a primary key&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Wrong - in most cases, a unique constraint is just as acceptable.&lt;/p&gt;
&lt;p&gt;In MySQL&amp;#8217;s InnoDB it is even more relaxed - the foreign key only needs to reference a prefix (the leading columns) of an index in the parent table, although this is so exotic that it should probably be ignored.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;This table is not normalized - it still contains redundancy&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Wrong - a table is normalized when it is in the first normal form. There are a few different opinions on what that means exactly, but it usually works to say that a table is not normalized when it contains repeating groups.&lt;/p&gt;
&lt;p&gt;A slightly stronger statement is to say that a table is not normalized when it contains data that is not atomic. This is stronger because it does not cover only repeating groups, but also columns that, for a single row, do not contain a single value. For example, a first name/last name combination in one column is not atomic, and therefore, a table that contains such values is not normalized. (There are opinions that require even more than this, but for practical purposes the sense of atomic values works pretty well.)&lt;/p&gt;
&lt;p&gt;The source of confusion is in what happens beyond the first normal form. Although a table may be normalized, it can still contain redundancy. By removing redundancy, you can progressively achieve a higher normal form. In many cases, one would require at least third normal form or the Boyce-Codd normal form for building database schemas. Many people say &amp;#8220;normalized&amp;#8221; when they actually mean &amp;#8220;in at least the third normal form&amp;#8221;.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;So - what do you think? Pedantic? Have other myths? Maybe you have a good, satisfactory definition for subqueries? Or maybe you find an error in my debunkings? &lt;/p&gt;
&lt;p&gt;Just drop me a comment on this post - thanks in advance.
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Kettle Tip: Using java locales for a Date Dimension</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/09/kettle_tip_using_java_locales.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.21558</id>
    
    <published>2007-09-03T23:18:06Z</published>
    <updated>2007-09-04T20:23:43Z</updated>
    
    <summary>The Date dimension is a well known construct in general data warehousing. In many cases, the data for a date dimension is generated using a database stored procedure or shell-script. Another approach to obtain the data for a date dimension...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Articles" />
            <category term="Technical" />
        <content type="html">
&lt;p&gt;The &lt;a href="http://kimballgroup.com/html/designtipsPDF/KimballDT61HandlingAll.pdf" target="_info"&gt;Date dimension&lt;/a&gt; is a well known construct in general &lt;a href="http://en.wikipedia.org/wiki/Data_warehouse" target="_info"&gt;data warehousing&lt;/a&gt;. In many cases, the data for a date dimension is generated using a database stored procedure or shell-script. &lt;/p&gt;
&lt;p&gt;Another approach to obtain the data for a date dimension is to generate it using an ETL tool like &lt;a href="http://www.pentaho.org/" target="_pentaho"&gt;Pentaho&lt;/a&gt; Data Integration, a.k.a. &lt;a href="http://kettle.pentaho.org/"&gt;Kettle&lt;/a&gt;. I think this approach makes sense for a number of reasons:
&lt;ul&gt;
&lt;li&gt;If you tend to use a particular ETL tool, you will be able to reuse the date dimension generator over and over, and on different database platforms.&lt;/li&gt;
&lt;li&gt;You won&amp;#8217;t need special database privileges beyond the ones you need already. Privileges to create tables and perform DML will usually be available, whereas you might need to convince a DBA that you require extra privileges to create and execute stored procedures.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to these general considerations, you can pull a neat little trick with Kettle to localize the data and format of the date attributes. I wouldn&amp;#8217;t go as far as to say that this feature is Kettle specific: rather, it relies on the localization support built into the &lt;a href="http://java.sun.com/" target="_java"&gt;java platform&lt;/a&gt;  and the way you can put that to use in Kettle transformations.&lt;/p&gt;
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;In this tip, the steps to create a date dimension are described using &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=140317&amp;package_id=186321" target="_kettle"&gt;Kettle 2.5.1&lt;/a&gt; (Generally Available release) and &lt;a href="http://dev.mysql.com/downloads/mysql/5.1.html"&gt;MySQL 5.1.20&lt;/a&gt; (Beta). You will be able to follow the example using earlier (and later) versions of both products though - I am not using any functionality that is specific to these particular versions of the products. The recipe does not really require that you understand anything about data warehouses or date dimensions, but you will probably appreciate it better if you do ;)&lt;/p&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The transformation to generate the data for the date dimension follows a pretty straightforward design. The graphical representation of the transformation is shown below:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1296869769/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1134/1296869769_aebc0549c2_o.png" width="546" height="347" alt="localized_date_dimension1" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;First, the dimension table is created (Prepare). After that, rows are generated to fill it (Input). However, the generated rows are almost empty and barren - we still need to derive and add data to fill the attributes of the date dimension (Transformation). Finally, the data is stored in the date dimension table (Output).&lt;/p&gt;
&lt;h3&gt;Step-by-Step&lt;/h3&gt;
&lt;p&gt;The remainder of this article describes in detail how to build this transformation. Most of the steps are probably not very interesting to moderately experienced Kettle users, but may be of use to beginning users. &lt;/p&gt;
&lt;p&gt;Note for users who are completely new to Kettle - it is advisable to review the first few chapters of the &lt;em&gt;Spoon&lt;/em&gt; user guide (Spoon is the name of the Kettle tool you use to design the ETL process). It explains how to start up the tool, create a new transformation, add and connect steps etc. You can find it in the docs/English directory beneath the Kettle home directory. &lt;/p&gt;
&lt;h4&gt;MySQL JDBC driver: setting the characterEncoding property to UTF8 &lt;/h4&gt;
&lt;p&gt;You need to create a (JDBC) connection to MySQL in the usual, straightforward way:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1296869773/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1120/1296869773_6114278db7_o.png" width="581" height="460" alt="kettle-mysql-connection" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In addition, you need to set the &lt;code&gt;characterEncoding&lt;/code&gt; property of the JDBC driver:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1296869777/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1133/1296869777_5ed7f179c5_o.png" width="581" height="460" alt="kettle-mysql-connection-utf8" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This ensures MySQL will be able to understand the utf8 encoded data that we may produce to generate a date dimension in the, say, Chinese language. Note that you cannot just use a statement like &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html" target="_mysql"&gt;SET NAMES&lt;/a&gt; utf8&lt;/code&gt; to do this. This is not specific to Kettle, but has to do with the way the MySQL JDBC driver (Connector/J) handles character sets. Please refer to the &amp;#8220;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-charsets.html" target="_mysql"&gt;Using character sets and unicode&lt;/a&gt;&amp;#8221; section of the Connector/J documentation for more information on this topic.&lt;br /&gt;
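&lt;p&gt;As a rough illustration of what this setting amounts to outside of Kettle, the sketch below hands the same property to the driver from plain Java. The host, schema name and credentials are placeholders that do not come from this article, and &lt;code&gt;useUnicode&lt;/code&gt; is a second Connector/J property that is often set alongside &lt;code&gt;characterEncoding&lt;/code&gt;. The sketch only builds the settings; it does not open a connection:&lt;/p&gt;

```java
import java.util.Properties;

final class MysqlConnectionSketch {
    // Placeholder URL: host, port and schema name are assumptions for this sketch.
    static final String URL = "jdbc:mysql://localhost:3306/dwh";

    // Collects the driver properties, including the character encoding discussed above.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF8");
        return props;
    }

    // With the Connector/J jar on the classpath, a connection could then be opened with:
    //   java.sql.DriverManager.getConnection(URL, connectionProperties("etl", "secret"));
}
```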
&lt;h4&gt;Creating the date dimension table&lt;/h4&gt;
&lt;p&gt;In this particular case, it seemed convenient to create the dimension table as part of the transformation. This is done using the &amp;#8220;Execute SQL Script&amp;#8221; step shown below:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1303185374/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1125/1303185374_66a778a699_o.png" width="734" height="660" alt="kettle-sql-script" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &amp;#8220;Execute SQL Script&amp;#8221; step executes the following script to create the date dimension table:
&lt;pre&gt;DROP TABLE IF EXISTS dim_date
;
CREATE TABLE IF NOT EXISTS dim_date (
  date_key                smallint unsigned NOT NULL,
  date                    date              NOT NULL,
  date_short              char(12)          NOT NULL,
  date_medium             char(16)          NOT NULL,
  date_long               char(24)          NOT NULL,
  date_full               char(32)          NOT NULL,
  day_in_year             smallint unsigned NOT NULL,
  day_in_month            tinyint  unsigned NOT NULL,
  is_first_day_in_month   char(10)          NOT NULL,
  is_last_day_in_month    char(10)          NOT NULL,
  day_abbreviation        char(3)           NOT NULL,
  day_name                char(12)          NOT NULL,
  week_in_year            tinyint  unsigned NOT NULL,
  week_in_month           tinyint  unsigned NOT NULL,
  is_first_day_in_week    char(10)          NOT NULL,
  is_last_day_in_week     char(10)          NOT NULL,
  month_number            tinyint  unsigned NOT NULL,
  month_abbreviation      char(3)           NOT NULL,
  month_name              char(12)          NOT NULL,
  year2                   char(2)           NOT NULL,
  year4                   year              NOT NULL,
  quarter_name            char(2)           NOT NULL,
  quarter_number          tinyint           NOT NULL,
  year_quarter            char(7)           NOT NULL,
  year_month_number       char(7)           NOT NULL,
  year_month_abbreviation char(8)           NOT NULL,
  PRIMARY KEY(date_key),
  UNIQUE(date)
)
ENGINE=MyISAM
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_unicode_ci&lt;/pre&gt;
&lt;p&gt;This is by no means a complete date dimension. The most important limitation is that it only contains attributes that are immediately derivable from the calendar. So, attributes to denote business specific periods like the fiscal year and holidays are not included.&lt;br /&gt;
&lt;h4&gt;Generating 10 years worth of days&lt;/h4&gt;
&lt;p&gt;The grain of the date dimension is days - a row in the date dimension represents a single day. In this case, the &amp;#8220;Generate Rows&amp;#8221; step is configured to generate 3660 rows, which is roughly enough days to last 10 years:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1313816373/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1094/1313816373_68541842c4_o.png" width="739" height="656" alt="kettle-generating-10-years" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the example, this step is also used to provide parameters to generate the date dimension data. As we&amp;#8217;ll see in a moment, the &lt;code&gt;initial_date&lt;/code&gt; field effectively specifies the first date that goes into the date dimension. The &lt;code&gt;language_code&lt;/code&gt; and &lt;code&gt;country_code&lt;/code&gt; fields are used to localize the textual attributes of the date dimension, and the &lt;code&gt;local_yes&lt;/code&gt; and &lt;code&gt;local_no&lt;/code&gt; fields are used for boolean attributes.&lt;/p&gt;
&lt;p&gt;There are other ways to get these parameters into our transformation. For example, we could have used an &amp;#8220;Add Constants&amp;#8221; step with a similar result. Another possibility would be to get this data from the environment using a &amp;#8220;Get Variables&amp;#8221; step, and this would allow the parameters to be specified at transformation run-time.&lt;br /&gt;
&lt;h4&gt;Counting the days&lt;/h4&gt;
&lt;p&gt;Although we certainly generate enough rows, they are all identical. In order to have each row represent a single distinct day, we need a way to &amp;#8216;count&amp;#8217; the generated rows. We do this by adding an &amp;#8220;Add sequence&amp;#8221; step:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1305401137/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1202/1305401137_744465aa5a_o.png" width="744" height="818" alt="kettle-adding-sequence" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this case, we use the &amp;#8220;Add sequence&amp;#8221; step to generate an incrementing number within the scope of the transformation. As we&amp;#8217;ll see later on, we can add this to our initial date to get a series of consecutive dates.&lt;br /&gt;
&lt;h4&gt;Calculating date dimension Attributes&lt;/h4&gt;
&lt;p&gt;The previous steps form a basis from which we can derive all of the attributes that currently make up our date dimension. To actually calculate the date attributes, we use a &amp;#8220;Modified Java Script Value&amp;#8221; step:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1306307546/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1103/1306307546_c9fd7d7c90_o.png" width="837" height="889" alt="kettle-javascript" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kettle comes with an embedded &lt;a href="http://www.mozilla.org/rhino/" target="_rhino"&gt;Rhino&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Javascript" target="_javascript"&gt;javascript&lt;/a&gt; engine. The &amp;#8220;Modified Java Script Value&amp;#8221; step lets you use it to run javascript code as part of the transformation.&lt;/p&gt;
&lt;p&gt;The javascript code is executed for each row that comes out of the previous steps. In the script code, one can reference the values from the input rows, perform some processing on them, and generate new output fields.&lt;/p&gt;
&lt;p&gt;One of the fortunate characteristics of the Rhino engine is that it lets us use &lt;a href="http://en.wikipedia.org/wiki/Java_%28programming_language%29" target="_java"&gt;java&lt;/a&gt; classes inside the javascript code. Let&amp;#8217;s take a look at the script to see how we can use that to generate the localized data for our date dimension attributes.&lt;br /&gt;
&lt;h5&gt;Initialization&lt;/h5&gt;
&lt;p&gt;The first thing we do in the javascript code is to get data from the current input row. In the &amp;#8220;Generate Rows&amp;#8221; step, we added the &lt;code&gt;language_code&lt;/code&gt; and &lt;code&gt;country_code&lt;/code&gt; fields to specify a locale. Here, in the script, we use the following piece of code to turn that into a &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Locale.html" target="_java"&gt;java.util.Locale&lt;/a&gt;&lt;/code&gt; object:
&lt;pre&gt;//Create a Locale according to the specified language code
var locale = new &lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Locale.html#Locale(java.lang.String,%20java.lang.String)" target="_java"&gt;java.util.Locale&lt;/a&gt;(
    language_code.getString()    //get the &lt;a href="http://ftp.ics.uci.edu/pub/ietf/http/related/iso639.txt" target="_iso"&gt;ISO639&lt;/a&gt; language_code from the input row
,   country_code.getString()     //get the &lt;a href="http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_3166.html" target="_iso"&gt;ISO3166&lt;/a&gt; country_code from the input row
);&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;java.util.Locale&lt;/code&gt; class represents a particular cultural region. It forms a cornerstone of the internationalization support built into the java platform, and provides information to many other classes to generate appropriately localized output. &lt;/p&gt;
&lt;p&gt;We will be using the locale on a number of occasions, but first, we use it to initialize a &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html"&gt;java.util.Calendar&lt;/a&gt;&lt;/code&gt; object:
&lt;pre&gt;
//Create a calendar, use the specified locale
var calendar = new &lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/GregorianCalendar.html#GregorianCalendar(java.util.Locale)" target="_java"&gt;java.util.GregorianCalendar&lt;/a&gt;(locale);
&lt;/pre&gt;
&lt;p&gt;(Note that the java platform currently only provides one concrete Calendar class: the &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/GregorianCalendar.html" target="_java"&gt;java.util.GregorianCalendar&lt;/a&gt;&lt;/code&gt;. Unfortunately, java does not seem to provide a built-in recipe for dealing with, for example, Islamic or Hebrew calendars).&lt;/p&gt;
&lt;p&gt;We require the calendar object to obtain an instance of the &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Date.html" target="_java"&gt;java.util.Date&lt;/a&gt;&lt;/code&gt; Class that represents the date corresponding to the current row. To do that, we first set the calendar&amp;#8217;s current date using the &lt;code&gt;initial_date&lt;/code&gt; field that was specified in the &amp;#8220;Generate Rows&amp;#8221; step:
&lt;pre&gt;//Set the initial date
calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#setTime(java.util.Date)" target="_java"&gt;setTime&lt;/a&gt;(initial_date.getDate());&lt;/pre&gt;
&lt;p&gt;We need this to add the number of days generated by our &amp;#8220;Add Sequence&amp;#8221; step:
&lt;pre&gt;//set the calendar to the current date by adding DaySequence days
calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#add(int,%20int)" target="_java"&gt;add&lt;/a&gt;(calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#DAY_OF_MONTH" target="_java"&gt;DAY_OF_MONTH&lt;/a&gt;,DaySequence.getInteger() - 1);&lt;/pre&gt;
&lt;p&gt;(Note that we subtract &lt;code&gt;1&lt;/code&gt; from the DaySequence value. This is because our sequence starts at &lt;code&gt;1&lt;/code&gt;, and we want the specified initial date to be included in our date dimension).&lt;/p&gt;
&lt;p&gt;We conclude the initialization of the script by retrieving a &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Date.html" target="_java"&gt;java.util.Date&lt;/a&gt;&lt;/code&gt; object that represents the date for the current row.&lt;/p&gt;
&lt;pre&gt;//get the calendar date
var date = new &lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Date.html#Date(long)" target="_java"&gt;java.util.Date&lt;/a&gt;(calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#getTimeInMillis()" target="_java"&gt;getTimeInMillis&lt;/a&gt;());&lt;/pre&gt;
&lt;p&gt;This &lt;code&gt;java.util.Date&lt;/code&gt; instance is assigned to the &lt;code&gt;date&lt;/code&gt; variable in the script, allowing it to be used as an output field of the javascript step. We require this in order to fill the &lt;code&gt;date&lt;/code&gt; column of the date dimension table. We will also be using the &lt;code&gt;date&lt;/code&gt; variable later on in this script to derive the value of other date dimension attributes.&lt;br /&gt;
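&lt;p&gt;For readers who want to try the initialization logic outside of Kettle, the following is a minimal sketch of the same steps in plain Java. The class and method names are made up for this example, the parameters stand in for the input fields of the row, and &lt;code&gt;getTime()&lt;/code&gt; is used as a shorthand for creating a new date from &lt;code&gt;getTimeInMillis()&lt;/code&gt;:&lt;/p&gt;

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class RowDateSketch {
    // daySequence starts at 1, like the value produced by the "Add sequence" step.
    static Date dateForRow(Locale locale, Date initialDate, int daySequence) {
        Calendar calendar = new GregorianCalendar(locale);
        calendar.setTime(initialDate);
        // Subtract 1 so that the first row represents the initial date itself.
        calendar.add(Calendar.DAY_OF_MONTH, daySequence - 1);
        return calendar.getTime();
    }

    public static void main(String[] args) {
        Date initial = new GregorianCalendar(2007, Calendar.JANUARY, 1).getTime();
        // Row 1 is the initial date; row 32 is the first of February.
        System.out.println(dateForRow(Locale.US, initial, 1));
        System.out.println(dateForRow(Locale.US, initial, 32));
    }
}
```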
&lt;h5&gt;Getting Text representations of full dates&lt;/h5&gt;
&lt;p&gt;Our date dimension has a number of attributes to denote a complete date containing day, month and year parts, in various formats: &lt;code&gt;date_short&lt;/code&gt;, &lt;code&gt;date_medium&lt;/code&gt;, &lt;code&gt;date_long&lt;/code&gt; and &lt;code&gt;date_full&lt;/code&gt;. These are all generated using the &lt;code&gt;&lt;a target="_java" href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html"&gt;java.text.DateFormat&lt;/a&gt;&lt;/code&gt; class. &lt;/p&gt;
&lt;p&gt;To do that, we first need to create an appropriate &lt;code&gt;DateFormat&lt;/code&gt; instance using the static &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#getDateInstance(int,%20java.util.Locale)" target="_java"&gt;getDateInstance()&lt;/a&gt;&lt;/code&gt; method, passing our locale object as well as a constant that specifies whether we want the short, medium, long or full format. Then, we can pass the &lt;code&gt;java.util.Date&lt;/code&gt; object for which we want to obtain the textual representation to the &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#format(java.util.Date)" target="_java"&gt;format&lt;/a&gt;&lt;/code&gt; method of the newly created &lt;code&gt;java.text.DateFormat&lt;/code&gt; instance:
&lt;pre&gt;//en-us example: 9/3/07
var date_short  = java.text.DateFormat.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#getDateInstance(int,%20java.util.Locale)" target="_java"&gt;getDateInstance&lt;/a&gt;(
                      java.text.DateFormat.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#SHORT" target="_java"&gt;SHORT&lt;/a&gt;
                  ,   locale
                  ).&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#format(java.util.Date)" target="_java"&gt;format&lt;/a&gt;(date);
//en-us example: Sep 3, 2007
var date_medium = java.text.DateFormat.getDateInstance(
                      java.text.DateFormat.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#MEDIUM" target="_java"&gt;MEDIUM&lt;/a&gt;
                  ,   locale
                  ).format(date);
//en-us example: September 3, 2007
var date_long   = java.text.DateFormat.getDateInstance(
                      java.text.DateFormat.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#LONG" target="_java"&gt;LONG&lt;/a&gt;
                  ,   locale
                  ).format(date);
//en-us example: Monday, September 3, 2007
var date_full   = java.text.DateFormat.getDateInstance(
                      java.text.DateFormat.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/DateFormat.html#FULL" target="_java"&gt;FULL&lt;/a&gt;
                  ,   locale
                  ).format(date);&lt;/pre&gt;
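&lt;p&gt;To get a feeling for what the four styles look like in different locales before wiring them into a transformation, a small stand-alone Java program along the following lines can help. It is only a sketch: the class name, the helper method and the choice of locales are assumptions made for this example, and the exact text for a given locale may differ between java versions:&lt;/p&gt;

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class DateStylesSketch {
    // Formats a date in one of DateFormat.SHORT, MEDIUM, LONG or FULL for the given locale.
    static String format(int style, Locale locale, Date date) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2007, Calendar.SEPTEMBER, 3).getTime();
        int[] styles = { DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL };
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.JAPAN };
        // Print every style for every locale, one per line.
        for (Locale locale : locales) {
            for (int style : styles) {
                System.out.println(locale + ": " + format(style, locale, date));
            }
        }
    }
}
```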
&lt;h5&gt;Formatting date parts&lt;/h5&gt;
&lt;p&gt;Extracting and formatting different date parts is most easily done by applying the &lt;code&gt;format&lt;/code&gt; function on a subclass of &lt;code&gt;java.text.DateFormat&lt;/code&gt;, the &lt;code&gt;&lt;a target="_java" href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html"&gt;java.text.SimpleDateFormat&lt;/a&gt;&lt;/code&gt; class. The &lt;code&gt;java.text.SimpleDateFormat&lt;/code&gt; class allows formatting of dates based on date and time &lt;em&gt;patterns&lt;/em&gt;:
&lt;pre&gt;//day in year: 1..366
var simpleDateFormat = new java.text.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html#SimpleDateFormat(java.lang.String,%20java.util.Locale)" target="_java"&gt;SimpleDateFormat&lt;/a&gt;(&lt;b&gt;"D"&lt;/b&gt;,locale);
var day_in_year              = simpleDateFormat.format(date);
&lt;/pre&gt;
&lt;p&gt;In this example, we pass both the locale and a date pattern to the constructor to create an instance of the &lt;code&gt;java.text.SimpleDateFormat&lt;/code&gt; class. The pattern is passed as the string &lt;code&gt;"D"&lt;/code&gt;, specifying a day-in-year format. &lt;/p&gt;
&lt;p&gt;Once we have created the &lt;code&gt;java.text.SimpleDateFormat&lt;/code&gt; instance, we can apply a new pattern to it using the &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html#applyPattern(java.lang.String)" target="_java"&gt;applyPattern()&lt;/a&gt;&lt;/code&gt; method. Calling the &lt;code&gt;format&lt;/code&gt; method again, we obtain the date in the desired format:
&lt;pre&gt;//day in month: 1..31
simpleDateFormat.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html#applyPattern(java.lang.String)" target="_java"&gt;applyPattern&lt;/a&gt;("d");
var day_in_month       = simpleDateFormat.format(date);
//en-us example: "Monday"
simpleDateFormat.applyPattern("EEEE");
var day_name           = simpleDateFormat.format(date);
//en-us example: "Mon"
simpleDateFormat.applyPattern("E");
var day_abbreviation   = simpleDateFormat.format(date);
//week in year, 1..53
simpleDateFormat.applyPattern("ww");
var week_in_year       = simpleDateFormat.format(date);
//week in month, 1..5
simpleDateFormat.applyPattern("W");
var week_in_month      = simpleDateFormat.format(date);
//month number in year, 1..12
simpleDateFormat.applyPattern("MM");
var month_number       = simpleDateFormat.format(date);
//en-us example: "September"
simpleDateFormat.applyPattern("MMMM");
var month_name         = simpleDateFormat.format(date);
//en-us example: "Sep"
simpleDateFormat.applyPattern("MMM");
var month_abbreviation = simpleDateFormat.format(date);
//2 digit representation of the year, example: "07" for 2007
simpleDateFormat.applyPattern("yy");
var year2              = simpleDateFormat.format(date);
//4 digit representation of the year, example:  2007
simpleDateFormat.applyPattern("yyyy");
var year4              = simpleDateFormat.format(date);&lt;/pre&gt;
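&lt;p&gt;The patterns can also be tried out in isolation. The sketch below is plain Java rather than the script from the step; the class name and the &lt;code&gt;part&lt;/code&gt; helper are made up for this example, and it creates a new formatter per pattern instead of reusing one with &lt;code&gt;applyPattern()&lt;/code&gt;, purely to keep the example short. Note that it uses &lt;code&gt;"yy"&lt;/code&gt; for the two digit year, which should not depend on how a particular java version treats a single &lt;code&gt;"y"&lt;/code&gt;:&lt;/p&gt;

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class DatePartsSketch {
    // Applies one SimpleDateFormat pattern to a date for the given locale.
    static String part(String pattern, Locale locale, Date date) {
        return new SimpleDateFormat(pattern, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2007, Calendar.SEPTEMBER, 3).getTime();
        String[] patterns = { "D", "d", "EEEE", "E", "ww", "W", "MM", "MMMM", "MMM", "yy", "yyyy" };
        // Print each pattern next to the text it produces for the en-us locale.
        for (String pattern : patterns) {
            System.out.println(pattern + " = " + part(pattern, Locale.US, date));
        }
    }
}
```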
&lt;h5&gt;Dealing with Quarters&lt;/h5&gt;
&lt;p&gt;Although the &lt;code&gt;java.text.SimpleDateFormat&lt;/code&gt; class is useful, it does not provide any functionality for working with quarters. We do want our date dimension to contain attributes to represent the quarter, so we have to resort to computing these manually:
&lt;pre&gt;//handling Quarters is a DIY
var quarter_name = "Q";
var quarter_number;
//pass radix 10: month_number is zero-padded, and "08" or "09" must not be parsed as octal
switch(parseInt(month_number, 10)){
    case 1: case 2: case 3: quarter_number = "1"; break;
    case 4: case 5: case 6: quarter_number = "2"; break;
    case 7: case 8: case 9: quarter_number = "3"; break;
    case 10: case 11: case 12: quarter_number = "4"; break;
}
quarter_name += quarter_number;&lt;/pre&gt;
&lt;p&gt;Although this will do for now, this solution doesn&amp;#8217;t really cut it because it does not produce localized output. Anyway, it is better than nothing so we&amp;#8217;ll just have to make do with it.&lt;br /&gt;
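&lt;p&gt;As an aside, the same mapping can be written as a bit of integer arithmetic instead of a switch. The sketch below shows this in plain Java; the class and method names are made up for the example, and the result is just as unlocalized as the version above:&lt;/p&gt;

```java
final class QuarterSketch {
    // month is 1..12; integer division maps 1-3 to 1, 4-6 to 2, 7-9 to 3 and 10-12 to 4.
    static int quarterNumber(int month) {
        return (month - 1) / 3 + 1;
    }

    static String quarterName(int month) {
        return "Q" + quarterNumber(month);
    }

    public static void main(String[] args) {
        // "09" is what the "MM" pattern yields for September; Integer.parseInt reads it as decimal.
        System.out.println(quarterName(Integer.parseInt("09"))); // Q3
    }
}
```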
&lt;h5&gt;Period demarcation flags&lt;/h5&gt;
&lt;p&gt;Our date dimension has a few attributes that are used to indicate the start and end of week and month periods. We use simple yes/no type flags, but we allow the actual &amp;#8220;yes&amp;#8221; and &amp;#8220;no&amp;#8221; values to be specified by the user in the &amp;#8220;Generate Rows&amp;#8221; step. We retrieve them with the following piece of code:
&lt;pre&gt;//get the local yes/no values
var yes = local_yes.getString();
var no = local_no.getString();&lt;/pre&gt;
&lt;p&gt;We can now use these to flag the start and end of week and month periods.&lt;/p&gt;
&lt;p&gt;The start (and of course, also the end) of the week are subject to the locale. In order to find out if we are dealing with the first day of a week, we use the &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#getFirstDayOfWeek()" target="_java"&gt;getFirstDayOfWeek()&lt;/a&gt;&lt;/code&gt; method of the &lt;code&gt;java.util.Calendar&lt;/code&gt; class. By comparing its return value with the day of week of the current row, we can see if we happen to be dealing with the first day of the week:
&lt;pre&gt;//initialize for week calculations
var first_day_of_week = calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#getFirstDayOfWeek()" target="_java"&gt;getFirstDayOfWeek&lt;/a&gt;();
var day_of_week = java.util.Calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#DAY_OF_WEEK" target="_java"&gt;DAY_OF_WEEK&lt;/a&gt;;

//find out if this is the first day of the week
var is_first_day_in_week;
if(first_day_of_week==calendar.&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#get(int)" target="_java"&gt;get&lt;/a&gt;(day_of_week)){
    is_first_day_in_week = yes;
} else {
    is_first_day_in_week = no;
}&lt;/pre&gt;
&lt;p&gt;Note that we obtain the current day of the week by passing the value of the &lt;code&gt;&lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#DAY_OF_WEEK" target="_java"&gt;DAY_OF_WEEK&lt;/a&gt;&lt;/code&gt; constant to the &lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Calendar.html#get(int)" target="_java"&gt;get&lt;/a&gt; method of the &lt;code&gt;java.util.Calendar&lt;/code&gt; object that we initialized at the start of the script.&lt;/p&gt;
&lt;p&gt;In order to set the value for the &lt;code&gt;is_last_day_in_week&lt;/code&gt; attribute of the date dimension, we simply find out if the &lt;em&gt;next day&lt;/em&gt; happens to be the first day of the week. If it is, then by definition, the current row represents the last day of the week:
&lt;pre&gt;//calculate the next day
calendar.add(calendar.DAY_OF_MONTH,1);

//get the next calendar date
var next_day = new java.util.Date(calendar.getTimeInMillis());

//if the next day is the first day of the week, this is the last day of the week
var is_last_day_in_week;
if(first_day_of_week==calendar.get(day_of_week)){
    is_last_day_in_week = yes;
} else {
    is_last_day_in_week = no;
}&lt;/pre&gt;
&lt;p&gt;(Note that we have already used similar code to add a day to a date when we added the day sequence to the initial date.)&lt;/p&gt;
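&lt;p&gt;To see the effect of the locale on the week boundaries, the logic can be exercised in plain Java as sketched below. The class and method names are hypothetical, and unlike the script above, the last-day check works on a copy of the calendar so that the original keeps pointing at the current day:&lt;/p&gt;

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

final class WeekFlagsSketch {
    // Builds a calendar for the given locale and date (month is a Calendar month constant).
    static Calendar calendarFor(Locale locale, int year, int month, int day) {
        Calendar calendar = new GregorianCalendar(locale);
        calendar.clear();
        calendar.set(year, month, day);
        return calendar;
    }

    static boolean isFirstDayInWeek(Calendar calendar) {
        return calendar.get(Calendar.DAY_OF_WEEK) == calendar.getFirstDayOfWeek();
    }

    // The current day is the last of its week when the next day starts a new week.
    static boolean isLastDayInWeek(Calendar calendar) {
        Calendar next = (Calendar) calendar.clone();
        next.add(Calendar.DAY_OF_MONTH, 1);
        return isFirstDayInWeek(next);
    }

    public static void main(String[] args) {
        // Monday, September 3, 2007: starts the week in Germany, but not in the US.
        System.out.println(isFirstDayInWeek(calendarFor(Locale.GERMANY, 2007, Calendar.SEPTEMBER, 3))); // true
        System.out.println(isFirstDayInWeek(calendarFor(Locale.US, 2007, Calendar.SEPTEMBER, 3)));      // false
    }
}
```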
&lt;p&gt;We can use similar logic to calculate the values for the &lt;code&gt;is_first_day_in_month&lt;/code&gt; and &lt;code&gt;is_last_day_in_month&lt;/code&gt; indicators. This is actually easier, because the first day in the month is not dependent upon the locale (at least - not within one calendar). So, we only need to find out if the day of month is equal to one:
&lt;pre&gt;//find out if this is the first day of the month
var is_first_day_in_month;
if(day_in_month == 1){
    is_first_day_in_month = yes;
} else {
    is_first_day_in_month = no;
}

//find out if this is the last day in the month
var is_last_day_in_month;
if(new java.text.SimpleDateFormat("d",locale).format(next_day)==1){
    is_last_day_in_month = yes;
} else {
    is_last_day_in_month = no;
}&lt;/pre&gt;
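&lt;p&gt;The month flags are easy to check against a leap year. The following plain Java sketch (hypothetical class and method names again) applies the same next-day trick, reading the day of the month directly from the calendar rather than formatting it as text first:&lt;/p&gt;

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

final class MonthFlagsSketch {
    static boolean isFirstDayInMonth(Calendar calendar) {
        return calendar.get(Calendar.DAY_OF_MONTH) == 1;
    }

    // The current day is the last of its month when the next day is day 1.
    static boolean isLastDayInMonth(Calendar calendar) {
        Calendar next = (Calendar) calendar.clone();
        next.add(Calendar.DAY_OF_MONTH, 1);
        return isFirstDayInMonth(next);
    }

    public static void main(String[] args) {
        System.out.println(isLastDayInMonth(new GregorianCalendar(2007, Calendar.FEBRUARY, 28))); // true
        System.out.println(isLastDayInMonth(new GregorianCalendar(2008, Calendar.FEBRUARY, 28))); // false, leap year
    }
}
```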
&lt;h5&gt;A few more date attributes&lt;/h5&gt;
&lt;p&gt;We conclude the computation of the date attributes by adding a few more useful labels:
&lt;pre&gt;//a few useful labels
var year_quarter            = year4 + "-" + quarter_name;
var year_month_number       = year4 + "-" + month_number;
var year_month_abbreviation = year4 + "-" + month_abbreviation;&lt;/pre&gt;
&lt;p&gt;As with the quarters, this is actually not a very good method because the results will not be localized. That said, the result will make sense for many locales, and we don&amp;#8217;t really have a better way to deal with it right now.&lt;br /&gt;
&lt;h5&gt;Defining the step outputs&lt;/h5&gt;
&lt;p&gt;We just calculated all the required values to fill the attributes of our date dimension. We just need to get them out of the script and into the outputs of the step.&lt;/p&gt;
&lt;p&gt;Every variable declared in the javascript (using the &lt;code&gt;var&lt;/code&gt; keyword) can be used as an output field of the javascript step. The easiest way to generate the outputs is by hitting the &amp;#8220;Get Variables&amp;#8221; button at the bottom of the dialog. This simply adds an output field for each variable declared in the script:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1314242017/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1102/1314242017_9c7a26b005_o.png" width="628" height="462" alt="kettle-javascript-outputs" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By default, the data type for all the outputs added in this way is set to the String type. Although it is good practice to choose a more specific data type, it is almost always unnecessary in this case, as all integer type values will be correctly converted implicitly when we insert them into the database. There is one exception in this case, and that is the &lt;code&gt;date&lt;/code&gt; output. Inside the script, it is an instance of a &lt;code&gt;java.util.Date&lt;/code&gt; class, and we must set the type to &amp;#8220;Date&amp;#8221; in the output too. Otherwise, the (java) string representation of the &lt;code&gt;java.util.Date&lt;/code&gt; object will be sent as output, and this is not automatically recognized as a &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/datetime.html" target="_mysql"&gt;date&lt;/a&gt;&lt;/code&gt; by MySQL.&lt;br /&gt;
&lt;h4&gt;Discarding Fields&lt;/h4&gt;
&lt;p&gt;We are now almost ready to insert the rows into the date dimension table. We only need to discard all fields in the stream that do not correspond with any of the columns in our date dimension table. We use a &amp;#8220;Select Values&amp;#8221; step to do that:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1315316092/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1289/1315316092_90337c2dc6_o.png" width="504" height="376" alt="kettle-select-values" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We use the &amp;#8220;Get Fields To Select&amp;#8221; button to pull in all available fields, and after that, simply select and delete each field that we do not need. As a final step, we rename the &lt;code&gt;DaySequence&lt;/code&gt; field to &lt;code&gt;date_key&lt;/code&gt; to map it to the &lt;code&gt;date_key&lt;/code&gt; column in our date dimension table.&lt;br /&gt;
&lt;h4&gt;Inserting data into the table&lt;/h4&gt;
&lt;p&gt;In the final step, we add  the generated data to the &lt;code&gt;dim_date&lt;/code&gt; table we created in the very first step of the transformation:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1315443700/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1326/1315443700_5b7cd320de_o.png" width="556" height="560" alt="kettle-table-output" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We only need to specify the connection and the table name here, and the step will then automatically attempt to map the fields of the incoming rows to table columns. &lt;/p&gt;
&lt;p&gt;We could also have used the &amp;#8220;Insert / Update&amp;#8221; step, or even the &amp;#8220;Execute SQL Script&amp;#8221; step, to write the data to the dimension table, but that would require a little bit of extra work.&lt;/p&gt;
&lt;h3&gt;Running the transformation&lt;/h3&gt;
&lt;p&gt;After building the transformation, you can run it by hitting the &amp;#8220;running man&amp;#8221; icon on the toolbar. This will open a dialog where you can set a number of properties for the transformation. Hit the &amp;#8220;Launch&amp;#8221; button there and after that, the transformation will be executed:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.flickr.com/photos/15655867@N00/1315520722/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1344/1315520722_9be28e22ee_o.png" width="732" height="685" alt="kettle-run-transformation" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Closing Notes&lt;/h3&gt;
&lt;p&gt;I hope you enjoyed this tip. If you want to, you can &lt;a target="_xcdsql" href="http://www.xcdsql.org/Pentaho/Kettle/localized_date_dimension/LOCALIZED_DATE_DIMENSION.ktr"&gt;download the kettle transformation here&lt;/a&gt;, and use it as you see fit.&lt;/p&gt;
&lt;p&gt;If you are interested in open source data warehousing, register for the &lt;a href="http://www.mysql.com/news-and-events/web-seminars/display-41.html" target="_mysql"&gt;MySQL Enterprise Data Warehousing Seminar&lt;/a&gt;, Thursday, September 06, 2007 and hear what Robin Schumacher has to say about that subject. (Note that this is a general MySQL data warehousing seminar - this post and the seminar are unrelated)
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Top 5 considerations while setting up your MySQL backup</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/08/top_5_considerations_while_set.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.21449</id>
    
    <published>2007-08-22T00:03:22Z</published>
    <updated>2007-08-22T01:29:36Z</updated>
    
    <summary>List of top 5 items that have to be considered before deciding on a MySQL backup implementation are: * How fast and how easy do you want the MySQL Recovery process to be? * What will be the impact of...</summary>
    <author>
        <name>Paddy Sreenivasan</name>
            </author>
            <category term="Articles" />
        <content type="html">
&lt;p&gt;The top 5 items to consider before deciding on a MySQL backup implementation are:&lt;br /&gt;
* How fast and how easy do you want the MySQL recovery process to be?&lt;br /&gt;
* What will be the impact of the MySQL backup process on your application?&lt;br /&gt;
* What will your backup configuration look like (the What, Where, When, and How of MySQL backup)?&lt;br /&gt;
* How will you manage your backup process and backed up data?&lt;br /&gt;
* What kind of tracking, reporting and compliance requirements does your business have from your MySQL backup implementation?&lt;/p&gt;
&lt;p&gt;The &lt;a href="http://www.zmanda.com/mysql-backup-considerations.html"&gt;white paper&lt;/a&gt; provides detailed insights about the above considerations. Your feedback is welcome.&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Seeking MySQL backup console feedback</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/06/seeking_mysql_backup_console_f.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.20726</id>
    
    <published>2007-06-26T21:59:16Z</published>
    <updated>2007-06-26T21:59:16Z</updated>
    
    <summary>We are working on Zmanda Management Console for our MySQL backup product line: Zmanda Recovery Manager (ZRM) for MySQL. ZRM for MySQL is an enterprise backup and recovery solution for MySQL. ...</summary>
    <author>
        <name>Paddy Sreenivasan</name>
            </author>
            <category term="News" />
        <content type="html">
&lt;p&gt;We are working on &lt;a title="Zmanda Management Console for MySQL backup" target="_blank" href="http://www.zmanda.com/images/home/ZMC_for_MySQL_backup.jpg"&gt;Zmanda Management Console&lt;/a&gt; for our MySQL backup product line: &lt;a href="http://www.zmanda.com/backup-mysql.html"&gt;Zmanda Recovery Manager (ZRM) for MySQL&lt;/a&gt;. ZRM for MySQL is an enterprise backup and recovery solution for MySQL.&lt;/p&gt;
&lt;p&gt;&lt;img title="Zmanda Management Console for MySQL backup" alt="Zmanda Management Console for MySQL backup" src="http://www.zmanda.com/images/home/ZMC_for_MySQL_backup1.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;We are looking for MySQL administrators in the San Francisco Bay Area who would be interested in providing functionality and usability feedback for the user interface. We are particularly interested in MySQL administrators responsible for implementing a backup solution for their MySQL databases. Please send an email to me (paddy-at-zmanda-dot-com) with your availability and why you would meet our requirements. We will pay a small stipend for your time.&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>What MySQL can do to enter the off-line Web</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/06/what_mysql_can_do_to_enter_the.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.20437</id>
    
    <published>2007-06-03T10:16:10Z</published>
    <updated>2008-03-03T21:45:50Z</updated>
    
    <summary>Disclaimer - views expressed in this blog (and this entry) are my own and do not necessarily reflect the views of MySQL AB Ever since I wrote my blog entry about Google Gears and the query tool for the browser...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Opinion" />
        <content type="html">
&lt;p&gt;&lt;em&gt;Disclaimer - views expressed in this blog (and this entry) are my own and do not necessarily reflect the views of MySQL AB&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Ever since I wrote &lt;a href="http://rpbouman.blogspot.com/2007/06/google-gears-webbrowser-embedded.html" target="_blog"&gt;my blog entry&lt;/a&gt; about &lt;a href="http://code.google.com/apis/gears/index.html" target=""&gt;Google Gears&lt;/a&gt; and the &lt;a href="http://xcdsql.org/ggqt/" target="_ggqt"&gt;query tool&lt;/a&gt; for the browser embedded offline &lt;a href="http://code.google.com/apis/gears/api_database.html" target="_gg"&gt;Google Gears database service&lt;/a&gt;, I have been wondering how &lt;a href="http://www.mysql.com/" target="_mysql"&gt;MySQL&lt;/a&gt; might fit in here.&lt;/p&gt;
&lt;p&gt;I have heard an idea to write a MySQL storage engine for &lt;a href="http://www.sqlite.org/" target="_sqlite"&gt;SQLite&lt;/a&gt; and although I do not think this is necessarily a bad idea, I don&amp;#8217;t think it will be immediately useful for typical applications powered by Google Gears. Personally, I think the following things might be of more use:
&lt;dl&gt;
&lt;dt&gt;A modification of the Google Gears browser extension that allows a local MySQL database to be used instead of the embedded SQLite database&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;This would be interesting if you need a particular MySQL feature that is not available in SQLite. I&amp;#8217;m thinking mainly of &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/stored-procedures.html" target="_mysql"&gt;stored procedures&lt;/a&gt; but more so about &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/replication.html" target="_mysql"&gt;replication&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;The local MySQL server could be set up as a master that replicates to a remote slave. The remote slave would be hosted by your company and that backend would somehow merge all the private slaves from its employees into the corporate database. &lt;/p&gt;
&lt;p&gt;Of course, the idea for the SQLite storage engine could fit here too - a local MySQL Server with the SQLite storage engine could be set up as master to replicate to the corporate slave.&lt;/p&gt;&lt;/dd&gt;
&lt;dt&gt;A Google Gears &lt;a href="http://code.google.com/apis/gears/api_workerpool.html"&gt;Worker Pool&lt;/a&gt; application that synchronizes the embedded SQLite database with the remote, corporate MySQL database&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;This idea would of course require some server component that may be accessed by the Worker application, and it would presumably take the form of a web-service. This could be implemented either as a piece of middleware that sits in an HTTP server or as a component that is part of the MySQL server itself. This product would achieve in part the same functionality as replication from the client&amp;#8217;s master to the remote slave but I think there are a number of advantages here.&lt;/p&gt;
&lt;p&gt;First of all, the implementation would not be intrusive to Google Gears - users can still use the ordinary Google Gears extension, and MySQL AB does not need to ensure that their modifications to the browser extension are compatible with future Google Gears developments. &lt;/p&gt;
&lt;p&gt;Second, this application could in principle allow a true &lt;em&gt;synchronization&lt;/em&gt; rather than just replication. In other words, it could pull data from the corporate database as needed, and push local modifications to be merged with the remote database. Obviously the possibilities here would depend on the flexibility of the server-side component accessed by the worker pool application.&lt;/p&gt;
&lt;p&gt;Third, users would not need to install an extra MySQL server on their webclient host. Arguably this could be seen as a disadvantage for MySQL AB as it would result in fewer installations of MySQL Server, but personally I don&amp;#8217;t think this is the right angle. I think that for most people, an extra MySQL Server that sits on their machine just for the purpose of replication will not be very attractive.&lt;/p&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;To me, using the Google Gears Worker Pool to drive synchronization seems like the most sensible choice, and one that fits well with the whole offline-web idea. But what do you think? Can MySQL play a role in the offline web like SQLite can now?
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Google Gears: Webbrowser embedded Database</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/06/post_1.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.20432</id>
    
    <published>2007-06-02T15:10:08Z</published>
    <updated>2007-06-02T15:15:07Z</updated>
    
    <summary>Google Gears is an open source browser extension created by Google. It provides a framework that allows the creation of offline webbrowser applications. At the moment it provides three services: Local Server A data store for static resources. This allows...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Articles" />
        <content type="html">
&lt;p&gt;&lt;a href="http://code.google.com/apis/gears/index.html" target="_gg"&gt;Google Gears&lt;/a&gt; is an open source browser extension created by Google. It provides a framework that allows the creation of offline webbrowser applications. At the moment it provides three services:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;a href="http://code.google.com/apis/gears/api_localserver.html" target="_gg"&gt;Local Server&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;A data store for static resources. This allows efficient caching of images, scripts and webpages&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://code.google.com/apis/gears/api_database.html" target="_gg"&gt;Database&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;An embedded relational database management system, based on &lt;a href="http://www.sqlite.org/" target="_sqlite"&gt;SQLite&lt;/a&gt;. SQLite should be familiar to most &lt;a href="http://www.php.net/" target="_php"&gt;PHP&lt;/a&gt; developers, as it has been &lt;a href="http://www.php.net/manual/en/ref.sqlite.php" target="_php"&gt;shipped with PHP&lt;/a&gt; since version 5&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://code.google.com/apis/gears/api_workerpool.html" target="_gg"&gt;Worker Pool&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;A form of threading support inside the browser that allows webapplications to initiate long running processes without hampering the responsiveness of the user interface.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;All these services can be accessed from within the browser using a javascript API.&lt;/p&gt;
&lt;p&gt;Users only need to &lt;a href="http://code.google.com/apis/gears/install.html" target="_gg"&gt;install the extension&lt;/a&gt; in order for the browser to be able to access the services when browsing pages.&lt;/p&gt;
&lt;p&gt;If you want to get an immediate taste of the database service, be sure to install Google Gears and take a look at my offline, &lt;a href="http://www.xcdsql.org/ggqt/" target="_ggqt"&gt;browser-based database client&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Currently, the extension is available for Mozilla &lt;a href="http://www.mozilla.com/en-US/firefox/" target="_ff"&gt;Firefox&lt;/a&gt; and Internet Explorer.&lt;br /&gt;
&lt;h3&gt;What does this buy us?&lt;/h3&gt;
&lt;p&gt;Google Gears offers an enormous potential for web application developers, for a number of reasons:
&lt;ul&gt;
&lt;li&gt;The local server can be used to set up a specialized cache that can help to run web applications off-line.&lt;/li&gt;
&lt;li&gt;The database service offers not only a structural solution for maintaining state at the client side, it even allows for persistence. That is, if you close the browser, shut down the computer, and restart, the application can pick up where it left off!&lt;/li&gt;
&lt;li&gt;I am not really sure about the worker pool but I can imagine this can be very useful to perform tasks that require a connection with a server. For example, a worker could be instantiated that attempts to synchronize the local state and data gathered by the application to a webservice that acts as the terminal destination.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, we all have to see real-world results, but I think this truly is a revolutionary development. It finally makes sense to start thinking of &amp;#8220;browser applications&amp;#8221; instead of &amp;#8220;web applications&amp;#8221;, because the application does not necessarily have to connect to an HTTP server anymore to be useful. &lt;/p&gt;
&lt;p&gt;I would not be surprised if this would initiate a new wave of emancipation of relational databases. Literally everybody that can operate a webbrowser will now own a database, and all the glue to make it work together with the user interface is already there. Also, a number of mobile service applications might be implemented on top of the Google Gears embedded SQLite database instead of a specific standalone database product. &lt;/p&gt;
&lt;p&gt;The web browser offers a reasonably sufficient environment for creating data oriented end-user applications, and the combination might prove to be a killer solution. The SQLite database can still be accessed outside the browser too, and be integrated with other applications.&lt;br /&gt;
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;There are a few preparations to make before you can use Google Gears in a web page. The browser that will view the page must have the Google Gears &lt;a href="http://code.google.com/apis/gears/install.html" target="_gg"&gt;browser extension&lt;/a&gt; installed. The page itself needs to include the &lt;a href="http://code.google.com/apis/gears/resources/gears_init.js" target="_gg"&gt;&lt;code&gt;gears_init.js&lt;/code&gt;&lt;/a&gt; script in order to use the javascript API to access the services. This single line in the &lt;code&gt;&amp;#60;head&amp;#62;&lt;/code&gt; element of the page allows access to the API:
&lt;pre&gt;
    &amp;#60;script type="text/javascript" src="&lt;b&gt;gears_init.js&lt;/b&gt;"&amp;#62;&amp;#60;/script&amp;#62;
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;gears_init.js&lt;/code&gt; script needs to reside at the same location as the page in this case, but it may be referenced from any location. &lt;/p&gt;
&lt;p&gt;One of the fascinating things is that neither the page nor the script need be on a webserver. You can simply store the script and the page on the local filesystem. &lt;/p&gt;
&lt;p&gt;When opening the page in the browser, you will be prompted to allow the page to access the Google Gears services:&lt;br /&gt;
&lt;a href="http://www.flickr.com/photos/15655867@N00/525310126/" title="Photo Sharing"&gt;&lt;img src="http://farm2.static.flickr.com/1062/525310126_f9e8324ae6_o.png" width="348" height="241" alt="GG_Warning" /&gt;&lt;/a&gt;&lt;br /&gt;
So the user needs to allow access for each different site that wants to use the services.&lt;/p&gt;
&lt;h3&gt;A closer look at the Database service&lt;/h3&gt;
&lt;p&gt;The Google Gears database service can be accessed using a simple &lt;a href="http://code.google.com/apis/gears/api_database.html" target="_gg"&gt;API&lt;/a&gt;.  All of the Google Gears APIs are javascript (ecmascript) APIs.&lt;/p&gt;
&lt;h4&gt;Creating a database object and opening a database&lt;/h4&gt;
&lt;p&gt;First a database object needs to be created using a line like this:
&lt;pre&gt;
    var db = &lt;b&gt;google.gears.factory.create&lt;/b&gt;("beta.database", "1.0");
&lt;/pre&gt;
&lt;p&gt;The variable &lt;code&gt;db&lt;/code&gt; can then be used as a handle to open the actual database. The following line will create a database named &lt;code&gt;GGDB&lt;/code&gt;:
&lt;pre&gt;
    db.&lt;b&gt;open&lt;/b&gt;("GGDB");
&lt;/pre&gt;
&lt;p&gt;Database names are unique per application, or rather &lt;em&gt;origin&lt;/em&gt;. All pages that are located in a particular &lt;em&gt;scheme&lt;/em&gt;, &lt;em&gt;host&lt;/em&gt; and &lt;em&gt;port&lt;/em&gt; are considered to be part of the same application. (For the web location &lt;code&gt;http://www.foo.com/&lt;/code&gt;, &lt;code&gt;http&lt;/code&gt; is the scheme, &lt;code&gt;www.foo.com&lt;/code&gt; is the host and the port is the default port for the scheme, which is 80 in this case). Pages can only access databases that were created by a page in the same origin. &lt;/p&gt;
&lt;p&gt;This &amp;#8216;same origin&amp;#8217; restriction is an understandable security measure: if we visit &lt;code&gt;https://www.mybank.com/&lt;/code&gt;, we do not want the data stored in any of the databases used by that site to be accessible by pages from &lt;code&gt;http://www.bankrobbers.com/&lt;/code&gt;.&lt;br /&gt;
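&lt;p&gt;To make the scheme, host and port comparison concrete, here is a small sketch. It is not part of the Google Gears API: it uses the standard &lt;code&gt;URL&lt;/code&gt; object found in current browsers and in Node.js, and the names &lt;code&gt;originOf&lt;/code&gt;, &lt;code&gt;sameOrigin&lt;/code&gt; and &lt;code&gt;DEFAULT_PORTS&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

```javascript
// Default ports for the two schemes mentioned in the text (an assumption of this sketch)
var DEFAULT_PORTS = { "http:": "80", "https:": "443" };

// Build a "scheme//host:port" string for a web location
function originOf(address) {
    var url = new URL(address);
    // URL.port is an empty string when the scheme's default port is used
    var port = url.port || DEFAULT_PORTS[url.protocol] || "";
    return url.protocol + "//" + url.hostname + ":" + port;
}

// Two pages belong to the same application if scheme, host and port all match
function sameOrigin(a, b) {
    return originOf(a) === originOf(b);
}

var samePages = sameOrigin("http://www.foo.com/a.html", "http://www.foo.com:80/b/c.html");
var bankVsRobbers = sameOrigin("https://www.mybank.com/", "http://www.bankrobbers.com/");
var schemeDiffers = sameOrigin("http://www.foo.com/", "https://www.foo.com/");
```

&lt;p&gt;Note that the path plays no role: only the scheme, the host and the port decide whether two pages share their databases.&lt;/p&gt;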
&lt;h4&gt;Creating a table&lt;/h4&gt;
&lt;p&gt;The database object implements the &lt;code&gt;execute&lt;/code&gt; method, which allows one to execute SQL statements. For example, if we want to create a table to store contacts, we could try something like this:
&lt;pre&gt;
db.&lt;b&gt;execute&lt;/b&gt;(
    "CREATE TABLE persons("
+   "    id         INTEGER PRIMARY KEY AUTOINCREMENT"
+   ",   first_name"
+   ",   last_name"
+   ")"
);&lt;/pre&gt;
&lt;p&gt;If you are not used to SQLite&amp;#8217;s &lt;a href="http://www.sqlite.org/datatype3.html" target="_sqlite"&gt;specific philosophy&lt;/a&gt; of data storage, the previous statement must&amp;#8217;ve made you frown at least once. &lt;/p&gt;
&lt;p&gt;Yup people, SQLite does not require us to specify column data types. It is allowed to include an identifier where you&amp;#8217;d normally put a data type name. However, this &amp;#8216;data type&amp;#8217; does not restrict the values that may be stored in that column. There is only one exception to that rule: when a column is defined with &lt;code&gt;INTEGER PRIMARY KEY AUTOINCREMENT&lt;/code&gt;, it denotes a surrogate key that will store only integral values.&lt;/p&gt;
&lt;p&gt;In spite of the fact that SQLite does not use data types for table definitions, SQLite does have a type system. SQLite has a concept called storage classes: each value entered into the database is associated with one of the following storage classes:
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;NULL&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;A special storage class for SQL &lt;code&gt;NULL&lt;/code&gt; values&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;INTEGER&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Stores integral values&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;REAL&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Stores numerical floating point values&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;TEXT&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Variable length strings&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;BLOB&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Variable length binary strings&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;However, a storage class is bound on the level of a &lt;em&gt;value&lt;/em&gt;, not a column. So, the values stored in one column can belong to different storage classes: each value is associated with its own storage class.&lt;br /&gt;
&lt;h4&gt;Inserting data&lt;/h4&gt;
&lt;p&gt;In Google Gears, all interaction with the database is performed by issuing SQL statements through the &lt;code&gt;execute&lt;/code&gt; method of the &lt;code&gt;Database&lt;/code&gt; object. So, if we want to insert data into our &lt;code&gt;persons&lt;/code&gt; table, we need to do this with a SQL &lt;code&gt;INSERT&lt;/code&gt; statement.&lt;/p&gt;
&lt;p&gt;The first argument of the &lt;code&gt;execute&lt;/code&gt; method is the actual SQL statement, and we can use this to pass a complete &lt;code&gt;INSERT&lt;/code&gt; statement, including any value literals. However, we can also use parameterized statements, using placeholders instead of value literals. The remainder of the argument list must then be used to provide values for the placeholders in the SQL statement text:
&lt;pre&gt;
db.execute(
    "INSERT INTO persons (first_name,last_name)"
+   " VALUES (&lt;b&gt;?&lt;/b&gt;,&lt;b&gt;?&lt;/b&gt;)"
,   &lt;b&gt;"Roland"&lt;/b&gt;
,   &lt;b&gt;"Bouman"&lt;/b&gt;
);&lt;/pre&gt;
&lt;p&gt;This manner of passing values offers the advantage that you do not need to escape quotes from the values: this is automatically taken care of by SQLite.&lt;br /&gt;
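&lt;p&gt;To see what the placeholders save us from, consider a rough sketch of the alternative: building the statement text by hand. In SQL string literals, a single quote inside the value has to be escaped by doubling it. The helper name &lt;code&gt;quoteLiteral&lt;/code&gt; is an assumption made for illustration and not part of Google Gears; the parameterized form remains the better choice.&lt;/p&gt;

```javascript
// What we would have to do by hand without placeholders:
// wrap the value in single quotes and double any single quote inside it.
function quoteLiteral(value) {
    return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build a complete INSERT statement with value literals
var statement = "INSERT INTO persons (first_name,last_name) VALUES ("
    + quoteLiteral("Tim") + "," + quoteLiteral("O'Reilly") + ")";
```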
&lt;h4&gt;Processing resultsets&lt;/h4&gt;
&lt;p&gt;A call to the &lt;code&gt;execute&lt;/code&gt; method always returns a &lt;code&gt;ResultSet&lt;/code&gt; object. In the previous examples, execution of the statements did not result in a set of rows, but a javascript &lt;code&gt;ResultSet&lt;/code&gt; object is returned regardless.&lt;/p&gt;
&lt;p&gt;The following snippet illustrates how you can process the rows in the &lt;code&gt;ResultSet&lt;/code&gt; object:
&lt;pre&gt;
var result = db.execute(                 //execute query, get result
    "SELECT * FROM persons"
);
var fieldCount = result.fieldCount();   //get number of columns
while(result.isValidRow()){             //as long as we did not process all rows
    for(var i=0; i&amp;#60;fieldCount; i++){    //loop over all columns
        var value = result.field(i);    //get value of the current column
    }
    result.next();                      //process the next row
}
result.close();
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;fieldCount&lt;/code&gt; method of the &lt;code&gt;ResultSet&lt;/code&gt; object can be used to return the number of columns associated with the &lt;code&gt;ResultSet&lt;/code&gt; object. If the SQL statement that yielded the &lt;code&gt;ResultSet&lt;/code&gt; object did not return a set of rows, the &lt;code&gt;fieldCount&lt;/code&gt; method returns &lt;code&gt;0&lt;/code&gt;. &lt;/p&gt;
&lt;p&gt;The &lt;code&gt;isValidRow()&lt;/code&gt; method of the &lt;code&gt;ResultSet&lt;/code&gt; object returns a boolean that indicates whether there are still rows associated with the &lt;code&gt;ResultSet&lt;/code&gt; object that may be processed. The &lt;code&gt;next()&lt;/code&gt; method of the &lt;code&gt;ResultSet&lt;/code&gt; object proceeds to the next row in the resultset. After calling the &lt;code&gt;next()&lt;/code&gt; method, the &lt;code&gt;isValidRow()&lt;/code&gt; method must be used to detect whether the resultset is already exhausted. Together, the &lt;code&gt;isValidRow()&lt;/code&gt; and &lt;code&gt;next()&lt;/code&gt; methods can be used to drive a loop that iterates over all the rows associated with the &lt;code&gt;ResultSet&lt;/code&gt; object.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;field(columnPosition)&lt;/code&gt; method of the &lt;code&gt;ResultSet&lt;/code&gt; object can be used to obtain the value for the column at the specified position. The &lt;code&gt;fieldByName(columnName)&lt;/code&gt; method of the &lt;code&gt;ResultSet&lt;/code&gt; object can also be used to obtain the value of a column, but as indicated by the method name, the column must be specified using the name of the column. The &lt;code&gt;fieldName(columnPosition)&lt;/code&gt; may be used to retrieve the name of the column at the specified position.&lt;/p&gt;
&lt;p&gt;When we are done processing the resultset, we are required to call the &lt;code&gt;close()&lt;/code&gt; method of the &lt;code&gt;ResultSet&lt;/code&gt; object. There are plans to implement a way to automatically close the &lt;code&gt;ResultSet&lt;/code&gt; object when it isn&amp;#8217;t used anymore, but even then it can&amp;#8217;t hurt to close the &lt;code&gt;ResultSet&lt;/code&gt; object explicitly.&lt;br /&gt;
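&lt;p&gt;The methods described above can be combined into a small helper that collects all rows into an array of javascript objects. This is only a sketch: the names &lt;code&gt;fetchAll&lt;/code&gt; and &lt;code&gt;fakeResultSet&lt;/code&gt; are assumptions made for illustration, and the fake object merely imitates the &lt;code&gt;ResultSet&lt;/code&gt; methods named in the text so that the example runs without Google Gears.&lt;/p&gt;

```javascript
// Collect all rows of a ResultSet into an array of plain objects,
// using only the ResultSet methods described in the text.
function fetchAll(result) {
    var rows = [];
    var fieldCount = result.fieldCount();      //get number of columns
    try {
        while (result.isValidRow()) {          //as long as there are rows left
            var row = {};
            for (var i = 0; i !== fieldCount; i++) {
                //store each value under its column name
                row[result.fieldName(i)] = result.field(i);
            }
            rows.push(row);
            result.next();                     //proceed to the next row
        }
    } finally {
        result.close();                        //always close the resultset
    }
    return rows;
}

// Hypothetical in-memory stand-in for a ResultSet object
function fakeResultSet(names, data) {
    var position = 0;
    return {
        closed: false,
        fieldCount: function () { return names.length; },
        fieldName: function (i) { return names[i]; },
        field: function (i) { return data[position][i]; },
        isValidRow: function () { return position !== data.length; },
        next: function () { position++; },
        close: function () { this.closed = true; }
    };
}

var fake = fakeResultSet(["id", "first_name", "last_name"], [[1, "Roland", "Bouman"]]);
var persons = fetchAll(fake);
```

&lt;p&gt;With a real database object, the call would presumably look like &lt;code&gt;fetchAll(db.execute("SELECT * FROM persons"))&lt;/code&gt;.&lt;/p&gt;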
&lt;h4&gt;Handling Errors&lt;/h4&gt;
&lt;p&gt;Although Google Gears is great, we still are not in a perfect world. Runtime errors may and will occur, and to handle them you need to use the javascript &lt;code&gt;try...catch&lt;/code&gt; syntax. This is not so much a Google Gears thing, but as the database API will throw a runtime error for syntax errors, database constraint violations etcetera, you simply must use &lt;code&gt;try...catch&lt;/code&gt; blocks everywhere. &lt;/p&gt;
&lt;p&gt;The following snippet illustrates how to do this:
&lt;pre&gt;
&lt;b&gt;try&lt;/b&gt; {

    //stuff that can go wrong    

} &lt;b&gt;catch(e)&lt;/b&gt;{

    //use the message property
    //from the Exception object
    alert(e.message);

&lt;b&gt;}&lt;/b&gt;&lt;/pre&gt;
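&lt;p&gt;One way to avoid repeating the same block everywhere is to wrap the call in a helper that reports failure as a value. The sketch below is an assumption about how that might look: &lt;code&gt;tryExecute&lt;/code&gt; and &lt;code&gt;fakeDb&lt;/code&gt; are hypothetical names, and the fake database only exists so that the example runs without Google Gears.&lt;/p&gt;

```javascript
// Run a statement and report failure instead of letting the exception escape.
// db is any object with an execute method, as described in the text.
function tryExecute(db, sql) {
    try {
        return { ok: true, result: db.execute(sql) };
    } catch (e) {
        //use the message property from the Exception object
        return { ok: false, message: e.message };
    }
}

// Hypothetical stand-in for the database object:
// it throws for anything that does not start with SELECT
var fakeDb = {
    execute: function (sql) {
        if (sql.indexOf("SELECT") !== 0) {
            throw new Error("syntax error near: " + sql);
        }
        return "result set placeholder";
    }
};

var good = tryExecute(fakeDb, "SELECT * FROM persons");
var bad = tryExecute(fakeDb, "SELEKT * FROM persons");
```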
&lt;h3&gt;A Quick and Dirty Command Line Client&lt;/h3&gt;
&lt;p&gt;To play around with the Google Gears Database service, I made a quick and dirty browser-based command line client for the SQLite database. You can use the &lt;a href="http://www.xcdsql.org/ggqt/" target="_ggqt"&gt;online version&lt;/a&gt; here. If you want to use it offline, no problem, just save the page anywhere on your local disk. Just be sure to save a copy of the &lt;a href="http://code.google.com/apis/gears/resources/gears_init.js" target="_gg"&gt;&lt;code&gt;gears_init.js&lt;/code&gt;&lt;/a&gt; script in the same directory. That&amp;#8217;s it&amp;#8230;.on to the offline web!
&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Debunking GROUP BY Myths</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.20247</id>
    
    <published>2007-05-21T18:16:07Z</published>
    <updated>2008-03-03T21:48:19Z</updated>
    
    <summary>There is a popular myth about the SQL GROUP BY clause. The myth holds that &apos;standard SQL&apos; requires columns referenced in the SELECT list of a query to also appear in the GROUP BY clause, unless these columns appear exclusively...</summary>
    <author>
        <name>Roland Bouman</name>
            </author>
            <category term="Articles" />
        <content type="html">
&lt;p&gt;There is a popular myth about the SQL &lt;code&gt;GROUP BY&lt;/code&gt; clause. The myth holds that &amp;#8216;standard SQL&amp;#8217; requires columns referenced in the &lt;code&gt;SELECT&lt;/code&gt; list of a query to also appear in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, unless these columns appear exclusively in an aggregated expression. &lt;a href="http://www.mysql.com/" target="_mysql"&gt;MySQL&lt;/a&gt; is often accused of violating this standard.&lt;/p&gt;
&lt;p&gt;In this article I will attempt to debunk this myth, and to provide a more balanced view regarding MySQL&amp;#8217;s treatment of &lt;code&gt;GROUP BY&lt;/code&gt; at the same time.&lt;/p&gt;
&lt;p&gt;To do that, I will first demonstrate that MySQL can be instructed to only accept &lt;code&gt;GROUP BY&lt;/code&gt; clauses that include all non-aggregated expressions referred to in the &lt;code&gt;SELECT&lt;/code&gt; list, thus making MySQL&amp;#8217;s behaviour conform more closely to that of other well-known RDBMS products.&lt;/p&gt;
&lt;p&gt;Second, I will show that it is very important to clearly define which version of the SQL-standard is being referred to. The two most recent versions use a rather sophisticated way of defining the required relationships between expressions appearing in the &lt;code&gt;GROUP BY&lt;/code&gt; clause and the &lt;code&gt;SELECT&lt;/code&gt; list. Contrary to popular belief, these standards do not literally require that all non-aggregated &lt;code&gt;SELECT&lt;/code&gt; list columns appear in the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;p&gt;Third, I will use a simple yet realistic example to illustrate in an informal manner what I believe is the intent expressed in the more recent versions of the SQL standard. Hopefully I will be able to convince you that it may even be better not to blindly include all non-aggregated columns from the &lt;code&gt;SELECT&lt;/code&gt; list in the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;p&gt;Before we dive into the details, I&amp;#8217;ll start with a brief introduction to &lt;code&gt;GROUP BY&lt;/code&gt; for those who are not too familiar with the construct. In the introduction, I will illustrate why most database products require all non-aggregated columns that are referenced in the &lt;code&gt;SELECT&lt;/code&gt; list to appear in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, and why users sometimes run into trouble due to MySQL&amp;#8217;s treatment of the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;h3&gt;The &lt;code&gt;GROUP BY&lt;/code&gt;-clause&lt;/h3&gt;
&lt;p&gt;So, what is the &lt;code&gt;GROUP BY&lt;/code&gt;-clause, and what does it do? &lt;/p&gt;
&lt;p&gt;The &lt;code&gt;GROUP BY&lt;/code&gt;-clause is an optional element of SQL &lt;code&gt;SELECT&lt;/code&gt; expressions. Syntactically, the &lt;code&gt;GROUP BY&lt;/code&gt;-clause consists of the keyword sequence &lt;code&gt;GROUP BY&lt;/code&gt;, followed by a comma-separated list of (scalar) expressions. If the &lt;code&gt;SELECT&lt;/code&gt; expression contains a &lt;code&gt;GROUP BY&lt;/code&gt;-clause, it must appear after the &lt;code&gt;WHERE&lt;/code&gt; clause. (If the &lt;code&gt;WHERE&lt;/code&gt; clause is omitted, the &lt;code&gt;GROUP BY&lt;/code&gt; clause will immediately follow after the &lt;code&gt;FROM&lt;/code&gt; clause.)&lt;/p&gt;
&lt;p&gt;When included, &lt;code&gt;GROUP BY&lt;/code&gt; specifies that rows from the intermediate result set are to be divided in a number of groups, returning one single row for each such group. The list of expressions provided in the &lt;code&gt;GROUP BY&lt;/code&gt; list defines how the grouping takes place. All rows that have the same combination of values for all expressions specified in the &lt;code&gt;GROUP BY&lt;/code&gt; are in the same group.&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s do a few simple example queries to illustrate the effect of the &lt;code&gt;GROUP BY&lt;/code&gt;-clause. (For these examples I&amp;#8217;ll use the &lt;code&gt;pet&lt;/code&gt; table from the &lt;a href="http://downloads.mysql.com/docs/menagerie-db.zip" target="_mysql"&gt;&lt;code&gt;menagerie&lt;/code&gt;&lt;/a&gt; database.) The following query retrieves all rows from the &lt;code&gt;pet&lt;/code&gt; table in the &lt;code&gt;menagerie&lt;/code&gt; database: &lt;/p&gt;
&lt;pre&gt;
SELECT  *
FROM    menagerie.pet
&lt;/pre&gt;
&lt;p&gt;The query returns a result that might look like this:&lt;/p&gt;
&lt;pre&gt;
+----------+--------+---------+------+------------+------------+
&amp;#124; name     &amp;#124; owner  &amp;#124; species &amp;#124; sex  &amp;#124; birth      &amp;#124; death      &amp;#124;
+----------+--------+---------+------+------------+------------+
&amp;#124; Fluffy   &amp;#124; Harold &amp;#124; cat     &amp;#124; f    &amp;#124; 1993-02-04 &amp;#124; NULL       &amp;#124;
&amp;#124; Claws    &amp;#124; Gwen   &amp;#124; cat     &amp;#124; m    &amp;#124; 1994-03-17 &amp;#124; NULL       &amp;#124;
&amp;#124; Buffy    &amp;#124; Harold &amp;#124; dog     &amp;#124; f    &amp;#124; 1989-05-13 &amp;#124; NULL       &amp;#124;
&amp;#124; Fang     &amp;#124; Benny  &amp;#124; dog     &amp;#124; m    &amp;#124; 1990-08-27 &amp;#124; NULL       &amp;#124;
&amp;#124; Bowser   &amp;#124; Diane  &amp;#124; dog     &amp;#124; m    &amp;#124; 1979-08-31 &amp;#124; 1995-07-29 &amp;#124;
&amp;#124; Chirpy   &amp;#124; Gwen   &amp;#124; bird    &amp;#124; f    &amp;#124; 1998-09-11 &amp;#124; NULL       &amp;#124;
&amp;#124; Whistler &amp;#124; Gwen   &amp;#124; bird    &amp;#124; NULL &amp;#124; 1997-12-09 &amp;#124; NULL       &amp;#124;
&amp;#124; Slim     &amp;#124; Benny  &amp;#124; snake   &amp;#124; m    &amp;#124; 1996-04-29 &amp;#124; NULL       &amp;#124;
&amp;#124; Puffball &amp;#124; Diane  &amp;#124; hamster &amp;#124; f    &amp;#124; 1999-03-30 &amp;#124; NULL       &amp;#124;
+----------+--------+---------+------+------------+------------+
&lt;/pre&gt;
&lt;p&gt;(Because we did not specify an &lt;code&gt;ORDER BY&lt;/code&gt;-clause, the rows are returned in some order determined by the database, so your results might not look exactly like this. However, for this example, the actual rows are important - not the order.)&lt;/p&gt;
&lt;p&gt;Now, suppose we want to make groups for each species. The following addition of the &lt;code&gt;GROUP BY&lt;/code&gt;-clause does just that:&lt;/p&gt;
&lt;pre&gt;
SELECT   species
FROM     menagerie.pet
&lt;b&gt;GROUP BY species&lt;/b&gt; -- make one group for each species
&lt;/pre&gt;
&lt;p&gt;The query returns this result:&lt;/p&gt;
&lt;pre&gt;
+---------+
&amp;#124; species &amp;#124;
+---------+
&amp;#124; bird    &amp;#124;
&amp;#124; cat     &amp;#124;
&amp;#124; dog     &amp;#124;
&amp;#124; hamster &amp;#124;
&amp;#124; snake   &amp;#124;
+---------+
&lt;/pre&gt;
&lt;p&gt;At a glance, it seems as if the &lt;code&gt;GROUP BY&lt;/code&gt; clause does nothing more than scan for unique occurrences in the &lt;code&gt;species&lt;/code&gt; column and return those. However, it is better to think of each row in the &lt;code&gt;GROUP BY&lt;/code&gt; result as a summary row that represents a group of rows that have the same value in the &lt;code&gt;species&lt;/code&gt; column. So, in this case, the &lt;code&gt;bird&lt;/code&gt; row represents the group of pets that are birds (&amp;#8220;Chirpy&amp;#8221; and &amp;#8220;Whistler&amp;#8221;); the &lt;code&gt;cat&lt;/code&gt; row represents the group of pets that are cats (&amp;#8220;Fluffy&amp;#8221; and &amp;#8220;Claws&amp;#8221;), and so on.&lt;/p&gt;
&lt;h4&gt;Calculating Aggregates for a group of rows&lt;/h4&gt;
&lt;p&gt;A &lt;code&gt;GROUP BY&lt;/code&gt; query allows one to apply &lt;em&gt;aggregate functions&lt;/em&gt; on the collection of rows associated with each group defined by the &lt;code&gt;GROUP BY&lt;/code&gt; clause. An aggregate function can process expressions for each row in a group of rows to compute a single return value. A number of well-known standard aggregate functions are &lt;code&gt;COUNT&lt;/code&gt;, &lt;code&gt;MIN&lt;/code&gt;, &lt;code&gt;MAX&lt;/code&gt;, and &lt;code&gt;SUM&lt;/code&gt;. &lt;/p&gt;
&lt;p&gt;(Aggregate functions can also be used without a &lt;code&gt;GROUP BY&lt;/code&gt; clause, in which case the entire intermediate resultset is treated as one big group. Try imagining the effect of the &lt;code&gt;GROUP BY&lt;/code&gt; operation with an empty &lt;code&gt;GROUP BY&lt;/code&gt; list: the query will return just one row that summarizes all rows from the intermediate result set).&lt;/p&gt;
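&lt;p&gt;As a minimal sketch of that case, the query below applies aggregate functions without any &lt;code&gt;GROUP BY&lt;/code&gt; clause. It assumes the &lt;code&gt;pet&lt;/code&gt; table holds exactly the nine rows shown earlier:&lt;/p&gt;
&lt;pre&gt;
SELECT   COUNT(*)    -- number of pets in the entire table
,        MIN(birth)  -- birthdate of the oldest pet overall
,        MAX(birth)  -- birthdate of the youngest pet overall
FROM     menagerie.pet
&lt;/pre&gt;
&lt;p&gt;Under that assumption, the query returns a single row containing 9, 1979-08-31 and 1999-03-30: the whole table is treated as one group.&lt;/p&gt;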
&lt;p&gt;Expanding our previous example query for &lt;code&gt;GROUP BY&lt;/code&gt;, the following example illustrates the effect of some of these aggregate functions:&lt;/p&gt;
&lt;pre&gt;
SELECT   species
,        &lt;b&gt;GROUP_CONCAT(name)&lt;/b&gt; -- make a list of pets per species
,        &lt;b&gt;COUNT(*)&lt;/b&gt;           -- count pets per species
,        &lt;b&gt;MIN(birth)&lt;/b&gt;         -- birthdate of oldest pet per species
,        &lt;b&gt;MAX(birth)&lt;/b&gt;         -- birthdate of youngest pet per species
FROM     menagerie.pet
GROUP BY species
&lt;/pre&gt;
&lt;p&gt;This example also includes usage of the &lt;a href="http://www.mysql.com/" target="_mysql"&gt;MySQL&lt;/a&gt;-specific &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat" target="_mysql"&gt;GROUP_CONCAT&lt;/a&gt;&lt;/code&gt; aggregate function, which will prove to be very useful to illustrate the effect of the &lt;code&gt;GROUP BY&lt;/code&gt;-clause.&lt;/p&gt;
&lt;p&gt;The result looks something like this:&lt;/p&gt;
&lt;pre&gt;
+---------+--------------------+----------+------------+------------+
&amp;#124; species &amp;#124; GROUP_CONCAT(name) &amp;#124; COUNT(*) &amp;#124; MIN(birth) &amp;#124; MAX(birth) &amp;#124;
+---------+--------------------+----------+------------+------------+
&amp;#124; bird    &amp;#124; Chirpy,Whistler    &amp;#124;        2 &amp;#124; 1997-12-09 &amp;#124; 1998-09-11 &amp;#124;
&amp;#124; cat     &amp;#124; Fluffy,Claws       &amp;#124;        2 &amp;#124; 1993-02-04 &amp;#124; 1994-03-17 &amp;#124;
&amp;#124; dog     &amp;#124; Buffy,Fang,Bowser  &amp;#124;        3 &amp;#124; 1979-08-31 &amp;#124; 1990-08-27 &amp;#124;
&amp;#124; hamster &amp;#124; Puffball           &amp;#124;        1 &amp;#124; 1999-03-30 &amp;#124; 1999-03-30 &amp;#124;
&amp;#124; snake   &amp;#124; Slim               &amp;#124;        1 &amp;#124; 1996-04-29 &amp;#124; 1996-04-29 &amp;#124;
+---------+--------------------+----------+------------+------------+
&lt;/pre&gt;
&lt;p&gt;Again, we see one row for each group of rows that have an identical value in the &lt;code&gt;species&lt;/code&gt; column, but this time, we also see the effect of processing the individual rows for each species using aggregate functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;GROUP_CONCAT&lt;/code&gt; function was applied to the &lt;code&gt;name&lt;/code&gt; column. For each species in our pet table, there may be multiple pets, and &lt;code&gt;GROUP_CONCAT&lt;/code&gt; concatenates their names, separating the individual names by default with a comma. Thus in this example, the &lt;code&gt;GROUP_CONCAT&lt;/code&gt; expression reveals the make-up of each group of pets of a single species.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;COUNT&lt;/code&gt; function is used with a wildcard &lt;code&gt;*&lt;/code&gt;, instructing it to count the number of rows associated with each group. Verifying this with the previous column, we can immediately see that the number of rows in the group is consistent with the number of names concatenated by the &lt;code&gt;GROUP_CONCAT&lt;/code&gt; expression.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;MIN&lt;/code&gt; and &lt;code&gt;MAX&lt;/code&gt; functions are applied to the &lt;code&gt;birth&lt;/code&gt; column and for each species, respectively report the birth date of whichever pet is oldest (&lt;code&gt;MIN(birth)&lt;/code&gt;, the birth date that is smaller than any of the other birth dates) and youngest (&lt;code&gt;MAX(birth)&lt;/code&gt;, the birth date that is larger than any of the other birth dates).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, aggregate functions are applied to expressions taken from a group of rows and have the effect of &amp;#8216;condensing&amp;#8217; (aggregating) the group, yielding a single value. Most aggregate functions calculate or determine some kind of statistical metric which serves to characterize the group as a whole. The MySQL specific &lt;code&gt;GROUP_CONCAT&lt;/code&gt; function is an exception: it simply enumerates all members in the group passed to the function, and as such it is not a statistical function. However, it still exposes the main property of aggregate functions, namely the ability to turn the expressions from a group of rows into a single value.&lt;/p&gt;
&lt;h3&gt;Running into trouble with &lt;code&gt;GROUP BY&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;So far, we&amp;#8217;ve seen a few examples with &lt;code&gt;GROUP BY&lt;/code&gt; that make perfect sense. Yet it is easy to run into trouble with &lt;code&gt;GROUP BY&lt;/code&gt;. Take a look at the following query:&lt;/p&gt;
&lt;pre&gt;
SELECT   species
,        MIN(&lt;b&gt;birth&lt;/b&gt;)  -- birthdate of oldest pet per species
,        MAX(&lt;b&gt;birth&lt;/b&gt;)  -- birthdate of youngest pet per species
,        &lt;b&gt;birth&lt;/b&gt;       -- birthdate of ... uh oh...!
FROM     menagerie.pet
GROUP BY species
&lt;/pre&gt;
&lt;p&gt;This query is similar to the previous query, where we calculated a few aggregates for each group of pets belonging to the same species. However, this time, we also include a plain reference to the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;SELECT&lt;/code&gt; list.&lt;/p&gt;
&lt;p&gt;An attempt to run this query on Oracle results in an error:&lt;/p&gt;
&lt;pre&gt;
SQL&amp;#62; SELECT  species
 2  ,        MIN(&lt;b&gt;birth&lt;/b&gt;)
 3  ,        MAX(&lt;b&gt;birth&lt;/b&gt;)
 4  ,        &lt;b&gt;birth&lt;/b&gt;
 5  FROM     sakila.pet
 6  GROUP BY species;
,       birth
       *
ERROR at line 4:
&lt;b&gt;ORA-00979: not a GROUP BY expression&lt;/b&gt;
&lt;/pre&gt;
&lt;p&gt;Running this query on MySQL however does return a result, which may look something like this:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT   species
    -&amp;#62; ,        MIN(birth)  -- birthdate of oldest pet per species
    -&amp;#62; ,        MAX(birth)  -- birthdate of youngest pet per species
    -&amp;#62; ,        birth       -- birthdate of ... uh oh...!
    -&amp;#62; FROM     menagerie.pet
    -&amp;#62; GROUP BY species;
+---------+------------+------------+------------+
&amp;#124; species &amp;#124; MIN(birth) &amp;#124; MAX(birth) &amp;#124; birth      &amp;#124;
+---------+------------+------------+------------+
&amp;#124; bird    &amp;#124; 1997-12-09 &amp;#124; &lt;b&gt;1998-09-11&lt;/b&gt; &amp;#124; &lt;b&gt;1998-09-11&lt;/b&gt; &amp;#124;
&amp;#124; cat     &amp;#124; &lt;b&gt;1993-02-04&lt;/b&gt; &amp;#124; 1994-03-17 &amp;#124; &lt;b&gt;1993-02-04&lt;/b&gt; &amp;#124;
&amp;#124; dog     &amp;#124; 1979-08-31 &amp;#124; 1990-08-27 &amp;#124; &lt;b style="color: red"&gt;1989-05-13&lt;/b&gt; &amp;#124;
&amp;#124; hamster &amp;#124; 1999-03-30 &amp;#124; 1999-03-30 &amp;#124; 1999-03-30 &amp;#124;
&amp;#124; snake   &amp;#124; 1996-04-29 &amp;#124; 1996-04-29 &amp;#124; 1996-04-29 &amp;#124;
+---------+------------+------------+------------+
5 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;What is happening here? Why do we see such different behaviours? In fact, what is MySQL&amp;#8217;s behaviour in this case? Sometimes, the &lt;code&gt;birth&lt;/code&gt; column reports a value that looks like the maximum value for &lt;code&gt;birth&lt;/code&gt; within the species (first row), and sometimes we see the minimum value (second row). We even see one case where the returned value is in between the minimum and maximum values (row 3). How can we explain this seemingly random behaviour?&lt;/p&gt;
&lt;h3&gt;Understanding the Problem&lt;/h3&gt;
&lt;p&gt;It&amp;#8217;s not too hard to deduce what is happening here. All we need to do is go back to our explanation of the effect of the &lt;code&gt;GROUP BY&lt;/code&gt; clause, and see how it applies to our last query. &lt;/p&gt;
&lt;p&gt;It was already explained that the &lt;code&gt;GROUP BY&lt;/code&gt; clause returns one row for each group of rows in the intermediate result, and that the groups are defined by the expression list defined in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. So, in this case, we are creating one result row for each group of rows that belong to the same species because the &lt;code&gt;GROUP BY&lt;/code&gt; list only contains the &lt;code&gt;species&lt;/code&gt; column. Yet there are several pets that may belong to a specific species, so the &lt;code&gt;birth&lt;/code&gt; column may have (and often has) a different value for each row in a particular species group. &lt;/p&gt;
&lt;p&gt;Having realized that, we can now ask ourselves: assuming there are multiple values in the &lt;code&gt;birth&lt;/code&gt; column for a particular value of &lt;code&gt;species&lt;/code&gt;, which one should be returned? What did we mean when we specified the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;SELECT&lt;/code&gt; list?&lt;/p&gt;
&lt;p&gt;There is no good answer to this question. It is certainly possible to select just one of the possible values for the &lt;code&gt;birth&lt;/code&gt; column: in fact, this is exactly what MySQL does. However, it is impossible to define the significance of whichever of the possible values is chosen. That is, it makes no sense to want to mix the plain values from the rows that belong to the group with the group itself. Therefore it does not make sense to even include the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;SELECT&lt;/code&gt; list of the query.&lt;/p&gt;
&lt;p&gt;Another way of looking at it is to say that the &amp;#8216;grain&amp;#8217; of &lt;code&gt;species&lt;/code&gt; values is different (and therefore, incompatible) with the grain of the &lt;code&gt;birth&lt;/code&gt; values. It does not mean we cannot access the values in the &lt;code&gt;birth&lt;/code&gt; column; it merely means we must use an aggregate function to compute the &amp;#8216;right&amp;#8217; value from a whole group of them. &lt;/p&gt;
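&lt;p&gt;One way to make the intent explicit is to wrap &lt;code&gt;birth&lt;/code&gt; in an aggregate function. The sketch below assumes that what we wanted was either the earliest birth date or the complete list of birth dates per species; other intentions would call for other aggregates:&lt;/p&gt;
&lt;pre&gt;
SELECT   species
,        MIN(birth)                          -- one well-defined value: the earliest birth date
,        GROUP_CONCAT(birth ORDER BY birth)  -- or all birth dates in the group, as one string
FROM     menagerie.pet
GROUP BY species
&lt;/pre&gt;
&lt;p&gt;Either expression yields exactly one value per group, so the result no longer depends on which row the server happens to pick.&lt;/p&gt;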
&lt;h4&gt;Avoiding the Problem&lt;/h4&gt;
&lt;p&gt;What about the behaviour exhibited by Oracle? Doesn&amp;#8217;t it make more sense to issue an error message rather than returning nonsense data? Put this way, most people will probably agree. Of course, the error message itself is somewhat puzzling:&lt;/p&gt;
&lt;pre&gt;
,       birth
       *
ERROR at line 4:
ORA-00979: &lt;b&gt;not a GROUP BY expression&lt;/b&gt;
&lt;/pre&gt;
&lt;p&gt;This seems to suggest that the problem is that the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;SELECT&lt;/code&gt; list is not included in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. In turn, this raises the question whether the problem would be solved if we had included the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. &lt;/p&gt;
&lt;p&gt;Well, including the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;GROUP BY&lt;/code&gt; clause certainly gets rid of the error message. However, a lot of users with entry level skills in SQL fail to understand that this yields quite a different query. The original &lt;code&gt;GROUP BY species&lt;/code&gt; yields one group for each species, whereas &lt;code&gt;GROUP BY species, birth&lt;/code&gt; yields a group for each combination of values in &lt;code&gt;species&lt;/code&gt; and &lt;code&gt;birth&lt;/code&gt; - most probably not at all what is intended.&lt;/p&gt;
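&lt;p&gt;To see the difference, consider this sketch of the &amp;#8216;fixed&amp;#8217; query, again assuming the nine &lt;code&gt;pet&lt;/code&gt; rows shown earlier:&lt;/p&gt;
&lt;pre&gt;
SELECT   species
,        birth
,        COUNT(*)        -- number of pets per (species, birth) combination
FROM     menagerie.pet
GROUP BY species, birth  -- one group per distinct pair, not one per species
&lt;/pre&gt;
&lt;p&gt;In that data every pet has a distinct birth date, so this query would return nine rows, each with a count of 1. The error is gone, but so is the per-species summary.&lt;/p&gt;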
&lt;p&gt;On the other hand, we cannot expect the database management system to know what we were thinking when we included the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;SELECT&lt;/code&gt; list in the first place. So, even though the error message may seem a bit puzzling, it is still preferable to silently returning nonsense data. &lt;/p&gt;
&lt;p&gt;But do we really have to put up with this? The answer is &amp;#8220;No!&amp;#8221;. &lt;/p&gt;
&lt;h3&gt;Including &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; in MySQL&amp;#8217;s &lt;code&gt;sql_mode&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Nowadays, MySQL is capable of detecting this problem too, and it is perfectly possible to make MySQL reject the previous query to avoid the problem of returning nonsense data. This is achieved by including &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; in the &lt;code&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html" target="_mysql"&gt;sql_mode&lt;/a&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Like many server settings, we can specify the &lt;code&gt;sql_mode&lt;/code&gt; using the &lt;code&gt;--sql-mode&lt;/code&gt; &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/server-options.html" target="_mysql"&gt;command line argument&lt;/a&gt; to the MySQL server executable (&lt;a href="http://dev.mysql.com/doc/refman/5.1/en/mysqld.html" target="_mysql"&gt;&lt;code&gt;mysqld&lt;/code&gt;&lt;/a&gt;), or we can include it in an &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/option-files.html" target="_mysql"&gt;option file&lt;/a&gt;. For example, including the following line in the option file will enable &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; when the server starts up:&lt;/p&gt;
&lt;pre&gt;
sql_mode=ONLY_FULL_GROUP_BY
&lt;/pre&gt;
&lt;p&gt;Beginning with MySQL 4.1, it is also possible to set the &lt;code&gt;sql_mode&lt;/code&gt; at runtime using the &lt;code&gt;SET&lt;/code&gt; syntax. In this way, the &lt;code&gt;sql_mode&lt;/code&gt; can be set globally or at the session level. The latter is the most useful, as it allows one to set up the &lt;code&gt;sql_mode&lt;/code&gt; most suited for a particular application without affecting any other applications that run on the server. (Some applications don&amp;#8217;t expect anything other than the default setting and may run into trouble with a particular &lt;code&gt;sql_mode&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;The following snippet illustrates how to include &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; in the &lt;code&gt;sql_mode&lt;/code&gt; at runtime:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SET sql_mode := CONCAT(&lt;b&gt;'ONLY_FULL_GROUP_BY,'&lt;/b&gt;,@@sql_mode);
Query OK, 0 rows affected (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;@@sql_mode&lt;/code&gt; server variable contains a possibly empty, comma-separated string of current &lt;code&gt;sql_mode&lt;/code&gt; settings. The &lt;code&gt;CONCAT&lt;/code&gt; expression prepends whatever the current setting is of the &lt;code&gt;sql_mode&lt;/code&gt; with &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt;. Note the comma immediately following &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt;. If the value of &lt;code&gt;@@sql_mode&lt;/code&gt; is the empty string, the value of the &lt;code&gt;CONCAT&lt;/code&gt; expression will have a trailing comma, but this is allowed in the assignment (trimming off the comma in the process).&lt;/p&gt;
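&lt;p&gt;To inspect the setting, or to choose its scope explicitly, something along the following lines should work. This is only a sketch: it assumes MySQL 4.1 or later, and the &lt;code&gt;SET GLOBAL&lt;/code&gt; statement requires administrative privileges and only affects connections that are opened afterwards:&lt;/p&gt;
&lt;pre&gt;
SELECT @@SESSION.sql_mode;  -- show the mode of the current connection
SELECT @@GLOBAL.sql_mode;   -- show the server-wide default for new connections

-- enable ONLY_FULL_GROUP_BY for the current connection only
SET SESSION sql_mode := CONCAT('ONLY_FULL_GROUP_BY,', @@SESSION.sql_mode);

-- enable ONLY_FULL_GROUP_BY for connections opened from now on
SET GLOBAL sql_mode := CONCAT('ONLY_FULL_GROUP_BY,', @@GLOBAL.sql_mode);
&lt;/pre&gt;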
&lt;p&gt;When we now attempt to execute our query again, it fails with an error message:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT   species
    -&amp;#62; ,        MIN(birth)  -- birthdate of oldest pet per species
    -&amp;#62; ,        MAX(birth)  -- birthdate of youngest pet per species
    -&amp;#62; ,        birth       -- birthdate of ... uh oh...!
    -&amp;#62; FROM     menagerie.pet
    -&amp;#62; GROUP BY species;
&lt;b&gt;ERROR 1055 (42000): 'menagerie.pet.birth' isn't in GROUP BY&lt;/b&gt;
&lt;/pre&gt;
&lt;p&gt;The error message indicates that we did not include the &lt;code&gt;birth&lt;/code&gt; column in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. Now, MySQL behaves similarly to Oracle for this query.&lt;/p&gt;
&lt;p&gt;(Alas, MySQL&amp;#8217;s &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; is not as clever as it should be, and there are particular cases where &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; is too restrictive in enforcing only full &lt;code&gt;GROUP BY&lt;/code&gt; clauses. The details are described &lt;a href="http://bugs.mysql.com/bug.php?id=8510" target="_bugs"&gt;here&lt;/a&gt;. The good news is that the community can help to fix this bug! Go to MySQL Forge and check out &lt;a href="http://forge.mysql.com/worklog/task.php?id=2489" target="_forge"&gt;Worklog task 2489&lt;/a&gt;.)&lt;/p&gt;
&lt;h3&gt;What does &amp;#8216;the&amp;#8217; SQL standard say?&lt;/h3&gt;
&lt;p&gt;In the previous sections, we&amp;#8217;ve seen how Oracle and MySQL react very differently to the same SQL &lt;code&gt;GROUP BY&lt;/code&gt; query. But what do the standards say? How is the &lt;code&gt;GROUP BY&lt;/code&gt; clause supposed to behave? &lt;/p&gt;
&lt;p&gt;In the introduction of this article, I claimed I would debunk a popular myth that holds that&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&amp;#8230;standard SQL requires columns referenced in the &lt;code&gt;SELECT&lt;/code&gt; list of a query to also appear in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, unless these columns appear exclusively in an aggregated expression.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Now, I don&amp;#8217;t want to pretend I&amp;#8217;m an expert as far as the SQL standard (ISO/IEC 9075) is concerned. In fact, I&amp;#8217;ve noticed repeatedly that the sheer volume of the documentation as well as the persistent formal wording prevent me from obtaining a clear overview of it. But, let&amp;#8217;s give it a try anyway. &lt;/p&gt;
&lt;p&gt;The &lt;a href="http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt"&gt;1992 version of the standard&lt;/a&gt;, 7.9 - 7 states that:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;If &lt;em&gt;T&lt;/em&gt; is a grouped table, then each &amp;#60;column reference&amp;#62; in each &amp;#60;value expression&amp;#62; that references a column of &lt;em&gt;T&lt;/em&gt; shall reference a grouping column or be specified within a &amp;#60;set function specification&amp;#62;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;As said, I&amp;#8217;m not an expert in this area, but the way I read it, it boils down to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Queries that include a &lt;code&gt;GROUP BY&lt;/code&gt; clause can only include column references in &lt;code&gt;SELECT&lt;/code&gt;-ed expressions if the column appears in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, or if that column appears as part of an aggregate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, in 7.12 - 15 of the 2003 version of the standard we find this:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;
If T is a grouped table, then let G be the set of grouping columns of T. In each &amp;#60;value expression&amp;#62; contained in &amp;#60;select list&amp;#62;, each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a &amp;#60;set function specification&amp;#62; whose aggregation query is QS.
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The 1999 version of the standard contains a similar rule. The important thing to note here is that both versions that succeeded the 1992 version stopped requiring explicitly that all non-aggregated columns in the &lt;code&gt;SELECT&lt;/code&gt; list must be present in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. Instead they require that any non-aggregated column appearing in the &lt;code&gt;SELECT&lt;/code&gt; list is &lt;em&gt;functionally dependent&lt;/em&gt; upon the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;h3&gt;Functional dependencies&lt;/h3&gt;
&lt;p&gt;What would the 1999 and 2003 versions of the SQL standard mean by the term &amp;#8220;functionally dependent&amp;#8221;? The answer to that question is also defined by the standard. Unfortunately, it cannot be illustrated by a simple quote, as the formal definition of what exactly constitutes a functional dependency according to the standard is fairly extensive and complicated. &lt;/p&gt;
&lt;p&gt;Luckily, the concept of functional dependencies can be easily illustrated in a less formal way. Suppose we have two expressions, &lt;em&gt;A&lt;/em&gt; and &lt;em&gt;B&lt;/em&gt;. Now, B is functionally dependent upon A if B has exactly one value for a particular value of A. Consider this snippet of code:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT @A:=1      AS A
    -&amp;#62; ,      @B:=&lt;b&gt;@A + 1&lt;/b&gt; AS B;
+---+------+
&amp;#124; A &amp;#124; B    &amp;#124;
+---+------+
&amp;#124; 1 &amp;#124;    2 &amp;#124;
+---+------+
&lt;/pre&gt;
&lt;p&gt;Here, the column &lt;code&gt;B&lt;/code&gt; is functionally dependent upon column &lt;code&gt;A&lt;/code&gt;. The value of &lt;code&gt;B&lt;/code&gt; can be derived from the value of &lt;code&gt;A&lt;/code&gt; in a very straightforward manner, namely by adding &lt;code&gt;1&lt;/code&gt; to whatever the value of &lt;code&gt;A&lt;/code&gt; happens to be. We know how to compute the value of &lt;code&gt;B&lt;/code&gt; for any given value of &lt;code&gt;A&lt;/code&gt;, and for any given value of &lt;code&gt;A&lt;/code&gt;, the corresponding value of &lt;code&gt;B&lt;/code&gt; will always be the same. &lt;/p&gt;
&lt;p&gt;The functional dependency concept can also be applied to multiple columns:&lt;/p&gt;
&lt;pre&gt;
SELECT CONCAT(A,B) C
FROM   someTable
&lt;/pre&gt;
&lt;p&gt;Column &lt;code&gt;C&lt;/code&gt; is defined as the result of the &lt;code&gt;CONCAT(A,B)&lt;/code&gt; expression. If we have the values of both &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; we can compute the result of &lt;code&gt;CONCAT(A,B)&lt;/code&gt;. Of course, for a given pair of values for &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; the result of &lt;code&gt;CONCAT(A,B)&lt;/code&gt; will always be the same. Therefore, &lt;code&gt;C&lt;/code&gt; is functionally dependent upon the column pair &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note that it is not enough just to only know the method to calculate &lt;code&gt;B&lt;/code&gt; out of &lt;code&gt;A&lt;/code&gt;. Consider this example:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT @A:=1           AS A
    -&amp;#62; ,      @B:=&lt;b&gt;@A + RAND()&lt;/b&gt; AS B;
&lt;/pre&gt;
&lt;p&gt;Here, we know the recipe to derive the value of &lt;code&gt;B&lt;/code&gt; from the value of &lt;code&gt;A&lt;/code&gt;: we have to take the value of &lt;code&gt;A&lt;/code&gt; and add the value returned by a call to the &lt;code&gt;RAND()&lt;/code&gt; function. However, the &lt;code&gt;RAND()&lt;/code&gt; function will return a different value each time it is called. Therefore, the value of &lt;code&gt;B&lt;/code&gt; will be different every time too. There is thus not a single value for &lt;code&gt;B&lt;/code&gt; for a given value of &lt;code&gt;A&lt;/code&gt;, and hence &lt;code&gt;B&lt;/code&gt; is not functionally dependent upon &lt;code&gt;A&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;Functional dependency and Normalization&lt;/h4&gt;
&lt;p&gt;The term &lt;code&gt;functional dependency&lt;/code&gt; is also used with regard to &lt;a href="http://en.wikipedia.org/wiki/Database_normalization" target="_wiki"&gt;normalization&lt;/a&gt;. Part of the normalization process involves discovering functional dependencies between different groups of columns. &lt;/p&gt;
&lt;p&gt;Normalization requires that each table has at least one &lt;em&gt;key&lt;/em&gt;. A key is a column or group of columns that may be used to identify a single record in the table. By definition, if we have a key, all non-key columns are functionally dependent upon the key. Another way to think about this is to imagine that we look up a row using a key. If the key entry exists, this will result in exactly one row. By definition, each column in that row has exactly one value, so the value of each non-key column can be determined by the key.&lt;/p&gt;
&lt;p&gt;Functional dependencies between a group of columns that makes up a key and any other group of columns are allowed, but functional dependencies between two groups of non-key columns are eliminated by the normalization process. (This is done by splitting off the groups of columns that expose the functional dependency to a new table, making one of the groups of columns a key in the new table.)&lt;/p&gt;
&lt;h4&gt;Functional dependency and &lt;code&gt;GROUP BY&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;We have just seen that the 1999 and 2003 versions of the SQL standard require that the columns appearing in the &lt;code&gt;SELECT&lt;/code&gt; list are functionally dependent upon the groups defined by the &lt;code&gt;GROUP BY&lt;/code&gt; clause. In other words, if we know that a column contains only one value for any given combination of values in the columns appearing in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, we may reference the column in the &lt;code&gt;SELECT&lt;/code&gt; list even if it does not appear in an aggregate expression. &lt;/p&gt;
&lt;p&gt;We&amp;#8217;ve also seen that if we have a (primary or unique) key, all columns that are not included in the key are by definition functionally dependent upon the key. This means that if we include all key columns in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, we can reference any column we like in the &lt;code&gt;SELECT&lt;/code&gt; list, even if it appears outside an aggregate expression.&lt;/p&gt;
&lt;p&gt;The following example from the &lt;a href="http://dev.mysql.com/doc/sakila/en/sakila.html" target="_mysql"&gt;sakila&lt;/a&gt; sample database might help to illustrate this:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT   film_id       -- primary key
    -&amp;#62; ,        &lt;b&gt;title&lt;/b&gt;         -- non-key column
    -&amp;#62; ,        COUNT(*)      -- one row per group
    -&amp;#62; FROM     sakila.film
    -&amp;#62; &lt;b&gt;GROUP BY film_id&lt;/b&gt;;      -- group by on primary key
+---------+-----------------------------+----------+
&amp;#124; film_id &amp;#124; title                       &amp;#124; COUNT(*) &amp;#124;
+---------+-----------------------------+----------+
&amp;#124;       1 &amp;#124; ACADEMY DINOSAUR            &amp;#124;        1 &amp;#124;
.         .                             .          .
.         .                             .          .
[...not showing 998 rows...]
.         .                             .          .
.         .                             .          .
&amp;#124;    1000 &amp;#124; ZORRO ARK                   &amp;#124;        1 &amp;#124;
+---------+-----------------------------+----------+
1000 rows in set (0.05 sec)
&lt;/pre&gt;
&lt;p&gt;Here, we query the &lt;code&gt;film&lt;/code&gt; table. The primary key of the &lt;code&gt;film&lt;/code&gt; table consists of only the &lt;code&gt;film_id&lt;/code&gt; column. The &lt;code&gt;GROUP BY&lt;/code&gt; clause contains the &lt;code&gt;film_id&lt;/code&gt; column. As a result, the query returns a collection of groups, each of which summarizes only one row. Of course, because there is only one row per group, there can be only one value in each of the other columns of the &lt;code&gt;film&lt;/code&gt; table. Therefore, it is safe to include whatever column we like in the &lt;code&gt;SELECT&lt;/code&gt; list. For this reason, it is perfectly ok to include the &lt;code&gt;title&lt;/code&gt; column in the &lt;code&gt;SELECT&lt;/code&gt; list.&lt;/p&gt;
&lt;p&gt;Of course, the &lt;code&gt;GROUP BY&lt;/code&gt; in the previous query does not make sense logically. Because the primary key of the &lt;code&gt;film&lt;/code&gt; table consists of only the &lt;code&gt;film_id&lt;/code&gt; column, we already know there can be only one row for any given value of &lt;code&gt;film_id&lt;/code&gt;. However, it becomes interesting when we include another table in the query. Consider the next example:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT    f.film_id
    -&amp;#62; ,         &lt;b&gt;f.title&lt;/b&gt;
    -&amp;#62; ,         COUNT(fa.actor_id)
    -&amp;#62; FROM      film        f
    -&amp;#62; LEFT JOIN film_actor  fa
    -&amp;#62; ON        f.film_id = fa.film_id
    -&amp;#62; &lt;b&gt;GROUP BY  f.film_id&lt;/b&gt;;
+---------+-----------------------------+--------------------+
&amp;#124; film_id &amp;#124; title                       &amp;#124; COUNT(fa.actor_id) &amp;#124;
+---------+-----------------------------+--------------------+
&amp;#124;       1 &amp;#124; ACADEMY DINOSAUR            &amp;#124;                 10 &amp;#124;
.         .                             .                    .
.         .                             .                    .
[...not showing 998 rows...]
.         .                             .                    .
.         .                             .                    .
&amp;#124;    1000 &amp;#124; ZORRO ARK                   &amp;#124;                  3 &amp;#124;
+---------+-----------------------------+--------------------+
1000 rows in set (0.02 sec)
&lt;/pre&gt;
&lt;p&gt;Here, we have added a &lt;code&gt;LEFT JOIN&lt;/code&gt; to calculate the number of actors per film. This time, the &lt;code&gt;GROUP BY&lt;/code&gt; clause on the &lt;code&gt;film_id&lt;/code&gt; does make sense: for each film, we now get one group containing the actors that play a role in it. At the same time, we know that all columns in the &lt;code&gt;film&lt;/code&gt; table are functionally dependent upon the &lt;code&gt;film_id&lt;/code&gt; column. Each group returned by the &lt;code&gt;GROUP BY&lt;/code&gt; clause corresponds to exactly one row from the &lt;code&gt;film&lt;/code&gt; table, and this means that for each group, there is only one value in any column of the &lt;code&gt;film&lt;/code&gt; table. So, it&amp;#8217;s perfectly safe to reference the columns from the &lt;code&gt;film&lt;/code&gt; table in the &lt;code&gt;SELECT&lt;/code&gt; list, even if we don&amp;#8217;t use them in an aggregate expression.&lt;/p&gt;
&lt;p&gt;It is important to realize that we cannot reference just any column in the &lt;code&gt;SELECT&lt;/code&gt; list: we can only reference those columns that are functionally dependent upon the &lt;code&gt;film_id&lt;/code&gt; column of the &lt;code&gt;film&lt;/code&gt; table. This means that it is wrong to reference any column of the &lt;code&gt;film_actor&lt;/code&gt; table in the &lt;code&gt;SELECT&lt;/code&gt; list directly: we may do so only in an aggregate expression.&lt;/p&gt;
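&lt;p&gt;As a sketch of this rule, the following pair of statements contrasts the two cases against the sakila schema. The column aliases &lt;code&gt;actor_count&lt;/code&gt; and &lt;code&gt;lowest_actor_id&lt;/code&gt; are made up for illustration, and no output is shown because the &lt;code&gt;actor_id&lt;/code&gt; that the first statement returns for each film is not defined:&lt;/p&gt;
&lt;pre&gt;
-- Not safe: fa.actor_id is not functionally dependent upon f.film_id,
-- so the server may pick any actor_id from each group
SELECT    f.film_id
,         f.title
,         fa.actor_id
FROM      film        f
LEFT JOIN film_actor  fa
ON        f.film_id = fa.film_id
GROUP BY  f.film_id;

-- Safe: columns of film_actor appear only inside aggregate expressions
SELECT    f.film_id
,         f.title
,         COUNT(fa.actor_id) AS actor_count
,         MIN(fa.actor_id)   AS lowest_actor_id
FROM      film        f
LEFT JOIN film_actor  fa
ON        f.film_id = fa.film_id
GROUP BY  f.film_id;
&lt;/pre&gt;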
&lt;p&gt;The previous example demonstrates a pattern. The &lt;code&gt;film&lt;/code&gt; table acts as the so-called &lt;em&gt;master&lt;/em&gt;, and the &lt;code&gt;film_actor&lt;/code&gt; table acts as the &lt;em&gt;detail&lt;/em&gt;. The master-detail pattern is very common: Order and Order Items, Vendor and Products, Country and Cities are all examples of this pattern. &lt;/p&gt;
&lt;h3&gt;So why would I do that?&lt;/h3&gt;
&lt;p&gt;Ok, hopefully, I&amp;#8217;ve been able to explain under which circumstances it is safe for &lt;code&gt;GROUP BY&lt;/code&gt; queries to reference columns in the &lt;code&gt;SELECT&lt;/code&gt; list directly. One might wonder though what the advantages and disadvantages are. I mean, just because you can doesn&amp;#8217;t mean you should, right?&lt;/p&gt;
&lt;h4&gt;Disadvantages?&lt;/h4&gt;
&lt;p&gt;Well, there certainly are reasons for always writing a full &lt;code&gt;GROUP BY&lt;/code&gt; clause. &lt;/p&gt;
&lt;p&gt;First of all, many RDBMS products will only allow a full &lt;code&gt;GROUP BY&lt;/code&gt; clause anyway, so in those cases there really is no choice. In MySQL however, we do have the choice as long as we are not using &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; in the &lt;code&gt;sql_mode&lt;/code&gt;. &lt;/p&gt;
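&lt;p&gt;To try out how a query behaves under the stricter rule, the mode can be switched on for the current session only. This is a minimal sketch; note that assigning the mode this way replaces any other modes the session had set:&lt;/p&gt;
&lt;pre&gt;
-- Enable the stricter GROUP BY checking for this session only
SET SESSION sql_mode = &amp;#39;ONLY_FULL_GROUP_BY&amp;#39;;

-- Verify which modes are now active
SELECT @@SESSION.sql_mode;
&lt;/pre&gt;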
&lt;p&gt;Another reason for always writing the &lt;code&gt;GROUP BY&lt;/code&gt; clause in full is that other developers might not understand under which circumstances it is ok to use a partial &lt;code&gt;GROUP BY&lt;/code&gt; clause. In many cases, they&amp;#8217;ve spent considerable time to learn to blindly repeat all &lt;code&gt;SELECT&lt;/code&gt; columns in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, and they will usually point out that it is wrong to not adhere to that rule. &lt;/p&gt;
&lt;p&gt;Of course, it is impossible to distinguish between a query that intentionally omits columns from the &lt;code&gt;GROUP BY&lt;/code&gt; clause and one that accidentally forgot to include them. When it is the intention to always write a full &lt;code&gt;GROUP BY&lt;/code&gt; clause, it is easy to verify whether the column references in the &lt;code&gt;GROUP BY&lt;/code&gt; clause and the &lt;code&gt;SELECT&lt;/code&gt; list match. &lt;/p&gt;
&lt;p&gt;I have heard people argue that including only the key in the &lt;code&gt;GROUP BY&lt;/code&gt; clause will lead to problems when the definition of the key is changed. Personally, I think this is a bogus argument. When you are considering changing the definition of the key, you are most likely going to review all your queries anyway, because all your joins will need to reflect this change too. I just mean to say that changing a few &lt;code&gt;GROUP BY&lt;/code&gt;&amp;#8217;s here and there is probably the least of your problems when you are considering changing the definition of a key.&lt;/p&gt;
&lt;p&gt;Another argument I have heard is that it is somehow &amp;#8216;more clear&amp;#8217;, &amp;#8216;cleaner&amp;#8217; or &amp;#8216;prettier&amp;#8217; to repeat all columns referenced in the &lt;code&gt;SELECT&lt;/code&gt; list in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. Personally, I think this is a bogus argument too. At the very least, these are all a matter of opinion.&lt;/p&gt;
&lt;h4&gt;Advantages&lt;/h4&gt;
&lt;p&gt;Personally, I feel it is more clear and prettier to &lt;code&gt;GROUP BY&lt;/code&gt; only on key columns where possible. I argued that it is a matter of opinion what is &amp;#8216;clear&amp;#8217; or &amp;#8216;pretty&amp;#8217;, so I must discard this argument likewise.&lt;/p&gt;
&lt;p&gt;I would argue that full &lt;code&gt;GROUP BY&lt;/code&gt; clauses are harder to maintain. Many changes will require two edits of the code instead of one. Of course, this might or might not outweigh any of the advantages of a full &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;p&gt;A full &lt;code&gt;GROUP BY&lt;/code&gt; clause might be slower than a partial one. The following query finds all film titles that gathered more than $300 worth of payments:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT    &lt;b&gt;f.film_id&lt;/b&gt;
    -&amp;#62; ,         &lt;b&gt;f.title&lt;/b&gt;
    -&amp;#62; ,         sum(p.amount) sum_amount
    -&amp;#62; FROM      film f
    -&amp;#62; LEFT JOIN inventory i
    -&amp;#62; ON        f.film_id = i.film_id
    -&amp;#62; LEFT JOIN rental r
    -&amp;#62; ON        i.inventory_id = r.inventory_id
    -&amp;#62; LEFT JOIN payment p
    -&amp;#62; ON        r.rental_id = p.rental_id
    -&amp;#62; &lt;b&gt;GROUP BY  f.film_id&lt;/b&gt;
    -&amp;#62; HAVING    sum_amount &amp;#62; 300;
Empty set (0.18 sec)
&lt;/pre&gt;
&lt;p&gt;Using only the &lt;code&gt;film_id&lt;/code&gt; column in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, it takes &lt;code&gt;0.18&lt;/code&gt; seconds to discover there are no film titles that satisfy this criterion. Now, let&amp;#8217;s compare this to the equivalent query using a full &lt;code&gt;GROUP BY&lt;/code&gt; clause:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT    &lt;b&gt;f.film_id&lt;/b&gt;
    -&amp;#62; ,         &lt;b&gt;f.title&lt;/b&gt;
    -&amp;#62; ,         sum(p.amount) sum_amount
    -&amp;#62; FROM      film f
    -&amp;#62; LEFT JOIN inventory i
    -&amp;#62; ON        f.film_id = i.film_id
    -&amp;#62; LEFT JOIN rental r
    -&amp;#62; ON        i.inventory_id = r.inventory_id
    -&amp;#62; LEFT JOIN payment p
    -&amp;#62; ON        r.rental_id = p.rental_id
    -&amp;#62; &lt;b&gt;GROUP BY  f.film_id&lt;/b&gt;
    -&amp;#62; ,         &lt;b&gt;f.title&lt;/b&gt;
    -&amp;#62; HAVING    sum_amount &amp;#62; 300;
Empty set (0.51 sec)
&lt;/pre&gt;
&lt;p&gt;This query takes almost three times as long to complete! With &lt;code&gt;EXPLAIN&lt;/code&gt; we can retrieve the execution plans for these queries. Without a full &lt;code&gt;GROUP BY&lt;/code&gt; clause, we see a fairly normal execution plan:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; EXPLAIN
    -&amp;#62; SELECT    f.film_id
    -&amp;#62; ,         f.title
    -&amp;#62; ,         sum(p.amount) sum_amount
    -&amp;#62; FROM      film f
    -&amp;#62; LEFT JOIN inventory i
    -&amp;#62; ON        f.film_id = i.film_id
    -&amp;#62; LEFT JOIN rental r
    -&amp;#62; ON        i.inventory_id = r.inventory_id
    -&amp;#62; LEFT JOIN payment p
    -&amp;#62; ON        r.rental_id = p.rental_id
    -&amp;#62; GROUP BY  f.film_id
    -&amp;#62; HAVING    sum_amount &amp;#62; 300
    -&amp;#62; \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: f
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 953
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: i
         type: ref
possible_keys: idx_fk_film_id
          key: idx_fk_film_id
      key_len: 2
          ref: sakila.f.film_id
         rows: 2
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: r
         type: ref
possible_keys: idx_fk_inventory_id
          key: idx_fk_inventory_id
      key_len: 3
          ref: sakila.i.inventory_id
         rows: 1
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
         type: ref
possible_keys: fk_payment_rental
          key: fk_payment_rental
      key_len: 5
          ref: sakila.r.rental_id
         rows: 1
        Extra:
4 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;p&gt;With a full &lt;code&gt;GROUP BY&lt;/code&gt; list, we notice a difference for the &lt;code&gt;film&lt;/code&gt; table:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; EXPLAIN
    -&amp;#62; SELECT    f.film_id
    -&amp;#62; ,         f.title
    -&amp;#62; ,         sum(p.amount) sum_amount
    -&amp;#62; FROM      film f
    -&amp;#62; LEFT JOIN inventory i
    -&amp;#62; ON        f.film_id = i.film_id
    -&amp;#62; LEFT JOIN rental r
    -&amp;#62; ON        i.inventory_id = r.inventory_id
    -&amp;#62; LEFT JOIN payment p
    -&amp;#62; ON        r.rental_id = p.rental_id
    -&amp;#62; GROUP BY  f.film_id
    -&amp;#62; ,         f.title
    -&amp;#62; HAVING    sum_amount &amp;#62; 300
    -&amp;#62; \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: f
         type: index
possible_keys: NULL
          key: idx_title
      key_len: 767
          ref: NULL
         rows: 953
        &lt;b&gt;Extra: Using index; Using temporary; Using filesort&lt;/b&gt;
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: i
         type: ref
possible_keys: idx_fk_film_id
          key: idx_fk_film_id
      key_len: 2
          ref: sakila.f.film_id
         rows: 2
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: r
         type: ref
possible_keys: idx_fk_inventory_id
          key: idx_fk_inventory_id
      key_len: 3
          ref: sakila.i.inventory_id
         rows: 1
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
         type: ref
possible_keys: fk_payment_rental
          key: fk_payment_rental
      key_len: 5
          ref: sakila.r.rental_id
         rows: 1
        Extra:
4 rows in set (0.01 sec)
&lt;/pre&gt;
&lt;p&gt;In case you did not yet notice, the &lt;code&gt;film&lt;/code&gt; table now has &lt;code&gt;Extra: Using index; Using temporary; Using filesort&lt;/code&gt;. &lt;/p&gt;
&lt;p&gt;What I think is happening is that MySQL takes the &lt;code&gt;GROUP BY&lt;/code&gt; clause literally and performs the &lt;code&gt;GROUP BY&lt;/code&gt; algorithm for each of the specified expressions. MySQL implements &lt;code&gt;GROUP BY&lt;/code&gt; by sorting the rows according to the &lt;code&gt;GROUP BY&lt;/code&gt; expressions. In this particular case, adding the &lt;code&gt;title&lt;/code&gt; column to the &lt;code&gt;GROUP BY&lt;/code&gt; clause does not allow the server to sort the rows in-memory, and forces the &lt;code&gt;GROUP BY&lt;/code&gt; to be evaluated using a temporary table and a file sort. This requires extra IO operations, which causes a decrease in performance.&lt;/p&gt;
&lt;p&gt;Of course, it would be nice if MySQL were smart enough to deduce that the result cannot possibly be different due to including the &lt;code&gt;title&lt;/code&gt; column in the &lt;code&gt;GROUP BY&lt;/code&gt; list. It could attempt to detect that a key of the &lt;code&gt;film&lt;/code&gt; table is included in the &lt;code&gt;GROUP BY&lt;/code&gt; list and that the &lt;code&gt;title&lt;/code&gt; column can be completely ignored when evaluating the &lt;code&gt;GROUP BY&lt;/code&gt; clause, because there will be exactly one value in the &lt;code&gt;title&lt;/code&gt; column for each value of the &lt;code&gt;film_id&lt;/code&gt; column. But, then again, MySQL does not require us to write a full &lt;code&gt;GROUP BY&lt;/code&gt; list. So if performance is paramount, be smart and do not write a full &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;h4&gt;Aggregating on functionally dependent columns&lt;/h4&gt;
&lt;p&gt;We just argued that it is safe to include columns in the &lt;code&gt;SELECT&lt;/code&gt; list as long as these columns are functionally dependent upon the &lt;code&gt;GROUP BY&lt;/code&gt; list. The reasoning is that since the columns are functionally dependent upon the &lt;code&gt;GROUP BY&lt;/code&gt; clause, there will be one value for each result group anyway. It was also shown that it may be a bad idea to include these columns in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, as it can hurt performance.&lt;/p&gt;
&lt;p&gt;For some people, it may still seem unacceptable to include functionally dependent columns in the &lt;code&gt;SELECT&lt;/code&gt; list without referencing these columns in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. For example, you might be using an RDBMS that requires all columns that are not referenced in the &lt;code&gt;GROUP BY&lt;/code&gt; to be aggregated. In those cases it might be better to apply an aggregate function to the functionally dependent column rather than including it in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. Consider the following query:&lt;/p&gt;
&lt;pre&gt;
mysql&amp;#62; SELECT    f.film_id
    -&amp;#62; ,         &lt;b&gt;MAX(f.title) AS title&lt;/b&gt;
    -&amp;#62; ,         sum(p.amount) sum_amount
    -&amp;#62; FROM      film f
    -&amp;#62; LEFT JOIN inventory i
    -&amp;#62; ON        f.film_id = i.film_id
    -&amp;#62; LEFT JOIN rental r
    -&amp;#62; ON        i.inventory_id = r.inventory_id
    -&amp;#62; LEFT JOIN payment p
    -&amp;#62; ON        r.rental_id = p.rental_id
    -&amp;#62; &lt;b&gt;GROUP BY  f.film_id&lt;/b&gt;
    -&amp;#62; HAVING    sum_amount &amp;#62; 300;
Empty set (0.20 sec)
&lt;/pre&gt;
&lt;p&gt;Again, we &lt;code&gt;GROUP BY&lt;/code&gt; on the &lt;code&gt;film_id&lt;/code&gt; column, which makes up the primary key of the &lt;code&gt;film&lt;/code&gt; table. This time however, we apply the &lt;code&gt;MAX&lt;/code&gt; aggregate function on the &lt;code&gt;title&lt;/code&gt; column. We know that there is only one value in the &lt;code&gt;title&lt;/code&gt; column for each value in the &lt;code&gt;film_id&lt;/code&gt; column, so the aggregation will not influence the result. In fact, we could&amp;#8217;ve used &lt;code&gt;MIN&lt;/code&gt; equally well. &lt;/p&gt;
&lt;p&gt;These aggregate functions will return the right value for exactly the same reason that it is safe to not include the functionally dependent column in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. Technically the aggregation is therefore unnecessary: it is just a trick to fool the RDBMS.&lt;/p&gt;
&lt;p&gt;Most likely, applying the aggregate function will be somewhat slower than not applying the aggregate. However, it will in most cases be faster than including the functionally dependent column in the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Contrary to popular belief, the SQL standard does not require &lt;code&gt;GROUP BY&lt;/code&gt; queries to reference all non-aggregated columns from the &lt;code&gt;SELECT&lt;/code&gt; list in the &lt;code&gt;GROUP BY&lt;/code&gt; clause. As of the 1999 version of the SQL standard, it is explicitly allowed for the &lt;code&gt;SELECT&lt;/code&gt; list to reference non-aggregated expressions as long as they are functionally dependent upon the &lt;code&gt;GROUP BY&lt;/code&gt; list. &lt;/p&gt;
&lt;p&gt;Each expression that has exactly one value for each group defined by the &lt;code&gt;GROUP BY&lt;/code&gt; clause is functionally dependent upon the &lt;code&gt;GROUP BY&lt;/code&gt; clause. Functional dependencies can be witnessed in a common query pattern: whenever we have a join between a master and a detail table to calculate aggregates over the detail rows for each row from the master, we can &lt;code&gt;GROUP BY&lt;/code&gt; over the primary or unique key from the master. All non-key columns of the master row will be functionally dependent upon the key, and can thus appear in the &lt;code&gt;SELECT&lt;/code&gt;-list outside an aggregate expression.&lt;/p&gt;
&lt;p&gt;In MySQL, one can write &lt;code&gt;GROUP BY&lt;/code&gt; queries that reference non-aggregated columns in the &lt;code&gt;SELECT&lt;/code&gt; list that are not included in the &lt;code&gt;GROUP BY&lt;/code&gt; clause, even if these columns are not functionally dependent upon the &lt;code&gt;GROUP BY&lt;/code&gt; clause. This behaviour conforms to none of the SQL standard&amp;#8217;s versions. It is possible to avoid this behaviour by including &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; in the &lt;code&gt;sql_mode&lt;/code&gt; server setting, but it might make more sense to take advantage of the ability to write only partial &lt;code&gt;GROUP BY&lt;/code&gt; clauses. &lt;/p&gt;
&lt;p&gt;In a nutshell:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is completely safe to write partial &lt;code&gt;GROUP BY&lt;/code&gt; clauses as long as all non-aggregated columns in the &lt;code&gt;SELECT&lt;/code&gt; list are functionally dependent upon the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/li&gt;
&lt;li&gt;A partial &lt;code&gt;GROUP BY&lt;/code&gt; list can result in better performance, because it keeps the server from evaluating the entire &lt;code&gt;GROUP BY&lt;/code&gt; list.&lt;/li&gt;
&lt;li&gt;If one does not want to write partial &lt;code&gt;GROUP BY&lt;/code&gt; clauses, consider using &lt;code&gt;MIN&lt;/code&gt; or &lt;code&gt;MAX&lt;/code&gt; to &amp;#8216;aggregate&amp;#8217; the functionally dependent columns in the &lt;code&gt;SELECT&lt;/code&gt; list rather than moving the functionally dependent columns to the &lt;code&gt;GROUP BY&lt;/code&gt; clause.&lt;/li&gt;
&lt;/ul&gt;
    </content>
</entry>
<entry>
    <title>Hacking MySQL table logs</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/05/hacking_mysql_table_logs.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.20042</id>
    
    <published>2007-05-08T18:30:23Z</published>
    <updated>2007-11-26T18:06:01Z</updated>
    
    <summary>Shortly before the MySQL Users Conference I announced that I would cover new ground in table logs management. I am keeping that promise, and in addition I am also showing some related hacks. The announced facts from last year usability...</summary>
    <author>
        <name>Giuseppe Maxia</name>
            </author>
            <category term="Technical" />
        <content type="html">
&lt;p&gt;Shortly before the MySQL Users Conference I &lt;a href="http://datacharmer.blogspot.com/2007/04/logs-on-demand-dbas-prayer-come-true.html"&gt;announced&lt;/a&gt; that I would cover new ground in table logs management.&lt;br /&gt;
I am keeping that promise, and in addition I am also showing some related hacks.&lt;/p&gt;
&lt;p&gt;The announced facts from &lt;a href="http://datacharmer.org/drafts/usability_report.html"&gt;last year&amp;#39;s usability report&lt;/a&gt; were that you can&amp;#39;t change log tables at will, as you can do with log files, and you can&amp;#39;t change the log table engine to &lt;span class="caps"&gt;FEDERATED.&lt;/span&gt; Both claims, as it turned out, were incorrect. You can do such things, albeit not in a straightforward manner. As a bonus side effect, you can also:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;add triggers to log tables;&lt;/li&gt;
&lt;li&gt;filter log tables depending on user defined criteria, such as query type, user database, or time;&lt;/li&gt;
&lt;li&gt;centralize logs from several servers.  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;span class="caps"&gt;DISCLAIMER&lt;/span&gt;&lt;/strong&gt;. The information in this article is essentially a hack. It is neither recommended nor endorsed by MySQL &lt;span class="caps"&gt;AB.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h2&gt;Overview: switching logs basics&lt;/h2&gt;
&lt;p&gt;Switching log files is easily done in just one step, assuming that you are using files as log output.&lt;br /&gt;
To enable file usage, you must enable the &lt;em&gt;general_log&lt;/em&gt; variable and set the &lt;em&gt;log_output&lt;/em&gt; variable to &amp;#39;FILE&amp;#39;.&lt;br /&gt;
In this situation, changing to a dedicated  &lt;em&gt;sometask.log&lt;/em&gt; file requires just one command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
set global general_log_file=&amp;#39;/tmp/sometask.log&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Going back to the default log file, or to any other, is just a command away:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
set global general_log_file=&amp;#39;/usr/local/mysql/data/general.log&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;
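&lt;p&gt;For completeness, the two settings that file logging depends on can be sketched as follows, assuming a server version in which both variables can be changed at runtime:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# send the general log to a file and switch it on
set global log_output=&amp;#39;FILE&amp;#39;;
set global general_log=&amp;#39;ON&amp;#39;;

# check the current state and file name
show variables like &amp;#39;general_log%&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;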
&lt;p&gt;Switching tables is a little trickier. The default log table location (&lt;em&gt;mysql.general_log&lt;/em&gt;) is not negotiable. Thus, we need to adopt a more devious approach.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;First, we create a table with the same structure as the default one.
&lt;pre&gt;&lt;code&gt;
USE mysql;
DROP TABLE IF EXISTS gl0, gl1;
CREATE TABLE mysql.gl1 LIKE mysql.general_log;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;If needed, we change the engine to something more efficient than &lt;span class="caps"&gt;CSV.&lt;/span&gt; This basically means MyISAM, since any other engine will be refused. But the following hacks will circumvent this limitation as well. One of the reasons for changing the engine is that we may want to add a key that we plan to use later for searches.
&lt;pre&gt;&lt;code&gt;
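# user_host is a mediumtext column, so the key needs a prefix length
# (64 is an arbitrary choice)
ALTER TABLE gl1 ENGINE=MyISAM, ADD KEY (user_host(64));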
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;Then, we rename both tables at once:
&lt;pre&gt;&lt;code&gt;
RENAME TABLE general_log to gl0, gl1 to general_log;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That&amp;#39;s all. The default table always has the same name, but we can switch from one table to another if we need to log the activity of a given task.&lt;br /&gt;
The above method is also important because it is the basis for the following hacks, which make dynamic logging much more interesting.&lt;/p&gt;
&lt;h2&gt;How to: smarter logs&lt;/h2&gt;
&lt;p&gt;We are now ready to take the hack to a new dimension. The original usability report mentions that you can only use the &lt;span class="caps"&gt;CSV &lt;/span&gt;and MyISAM engines for logging. Any attempt to assign a different engine will fail. While this is technically accurate, we can use the above trick to cheat the server into using a &lt;span class="caps"&gt;FEDERATED &lt;/span&gt;table for logging.&lt;/p&gt;
&lt;p&gt;If you have dealt with general logs before, you know that logs introduce two administrative problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;they grow large very quickly, eating up precious storage space;&lt;/li&gt;
&lt;li&gt;if the server crashes because of a hardware failure, your logs are lost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A simple way of dealing with the first problem is using the Archive storage engine for logging. You can&amp;#39;t assign it directly, but you can rename an existing table.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Squeezing table logs
DROP TABLE IF EXISTS gl0, gl1;
CREATE TABLE gl1 like general_log;
ALTER TABLE gl1 ENGINE=ARCHIVE;
RENAME TABLE general_log to gl0, gl1 to general_log;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;span class="caps"&gt;ARCHIVE &lt;/span&gt;table will work exactly like the &lt;span class="caps"&gt;CSV &lt;/span&gt;one, with the important difference that it will only take 15 to 20% of its storage. Since log tables only need to support &lt;span class="caps"&gt;INSERT &lt;/span&gt;statements, using the Archive engine makes sense.&lt;/p&gt;
&lt;h2&gt;How to: centralized remote logs&lt;/h2&gt;
&lt;p&gt;Another way to deal with log growth is to send the logs to a different server, and the second problem (losing the logs when the server crashes) can be solved with the same trick. The current table logging implementation does not allow triggers, but once you have federated your log table to a remote one, nothing prevents you from using triggers there as well.&lt;/p&gt;
&lt;p&gt;To create remote logs, act as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install a MySQL server on a box with sufficient storage for your purposes. The version is not important, but if you want to use triggers, version 5.0 or higher is necessary.&lt;/li&gt;
&lt;li&gt;On the remote server, create a database for logging.&lt;/li&gt;
&lt;li&gt;Inside that database, create a table with the same data structure as the general_log table, using your favorite storage engine.&lt;/li&gt;
&lt;li&gt;In the local server, create a table with the same structure as the general log, federated to the remote server.&lt;/li&gt;
&lt;li&gt;Use the technique illustrated in the previous paragraph to rename the new table to the general_log table.&lt;/li&gt;
&lt;li&gt;Enable table logging and start seeing the effects.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;
# remote server
CREATE SCHEMA logs;
USE logs;
CREATE TABLE mylog LIKE mysql.general_log;
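# user_host is a mediumtext column, so the key needs a prefix length
# (64 is an arbitrary choice)
ALTER TABLE mylog ENGINE=MyISAM, ADD KEY (user_host(64));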

# local server
create server logserver foreign data wrapper mysql options (
host &amp;#39;remote_server.net&amp;#39;,
database &amp;#39;logs&amp;#39;,
port 3306,
user &amp;#39;remote_user_name&amp;#39;,
password &amp;#39;remote_secret&amp;#39;);

use mysql;
DROP TABLE IF EXISTS gl1, gl0;
CREATE TABLE gl1 (
event_time timestamp NOT NULL,
user_host mediumtext,
thread_id int DEFAULT NULL,
server_id int DEFAULT NULL,
command_type varchar(64) DEFAULT NULL,
argument mediumtext
) ENGINE = FEDERATED
CONNECTION = &amp;#39;logserver/mylog&amp;#39;;
# Note: MySQL 5.1.18 is required for the above syntax
# For older versions, you can use the old syntax instead:
# CONNECTION=&amp;#39;mysql://user:password@hostname:port/schema/table&amp;#39;

RENAME TABLE general_log to gl0, gl1 to general_log;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the preparation is over, your general log will end up in the remote server.&lt;/p&gt;
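&lt;p&gt;The last step of the list, enabling table logging and checking the result, can be sketched as follows. It assumes the &lt;em&gt;logs.mylog&lt;/em&gt; table created above and a server version in which both variables can be changed at runtime:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# local server: send the general log to the log table and switch it on
SET GLOBAL log_output = &amp;#39;TABLE&amp;#39;;
SET GLOBAL general_log = &amp;#39;ON&amp;#39;;

# remote server: the most recent entries should start showing up here
SELECT event_time, user_host, command_type, argument
FROM logs.mylog
ORDER BY event_time DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;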
&lt;p&gt;Once this organization is in place, you may start thinking about expanding it. If you can relocate log activity from one server, you can do the same for multiple ones. &lt;br /&gt;
&lt;img src="http://datacharmer.org/img/logs1.jpg" /&gt;&lt;br /&gt;
What happens if you repeat the same steps for another server, pointing to the same remote server, schema and table? Exactly what you would expect. All the queries from both servers will end up in the same table. You will be centralizing your logs, but you need a way of telling the queries from different servers apart. It can be easily accomplished. The log table includes a &lt;em&gt;server_id&lt;/em&gt; column, which records the unique ID of the server. If you assign a different server ID to each server, then you can use a centralized remote log for several servers.&lt;br /&gt;
On each server, issue the command&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
SET GLOBAL server_id = &amp;#60;a unique number&amp;#62;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and your centralized logs will also be easily searchable.&lt;/p&gt;
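&lt;p&gt;For instance, queries along these lines show how many events each server contributes, and then narrow the search to a single server. This is only a sketch: it runs on the remote server, assumes the &lt;em&gt;logs.mylog&lt;/em&gt; table from the earlier example, and uses 2 as an illustrative server ID.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# remote server: number of logged events per originating server
SELECT server_id, COUNT(*) AS events
FROM logs.mylog
GROUP BY server_id;

# remote server: latest queries from the server whose ID is 2 (example value)
SELECT event_time, user_host, argument
FROM logs.mylog
WHERE server_id = 2
  AND command_type = &amp;#39;Query&amp;#39;
ORDER BY event_time DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;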
&lt;h2&gt;More hacks: filtering with triggers&lt;/h2&gt;
&lt;p&gt;We mentioned before that log tables don&amp;#39;t allow triggers. That&amp;#39;s a limitation built into the &lt;em&gt;mysql&lt;/em&gt; database, where no table can have triggers. However, we are federating a table to a remote server, in a &lt;em&gt;normal&lt;/em&gt; database, where triggers are not restricted.&lt;br /&gt;
Thus, although we can&amp;#39;t add a trigger to the log table itself, nothing prevents us from adding one to the shadow table in the remote server.&lt;/p&gt;
&lt;p&gt;Let&amp;#39;s consider the common case where the &lt;span class="caps"&gt;DBA &lt;/span&gt;wants to log events related to sales and customers in two separate tables, and discard all the rest.&lt;br /&gt;
Filtering the wanted records and inserting them into the appropriate tables is simple: it can be done inside a trigger. Discarding the rest is trickier, and it cannot be done with standard &lt;span class="caps"&gt;SQL.&lt;/span&gt; You need to assign the remote log table to the BlackHole storage engine, which simply discards every row it receives. The data is still passed to the table, and thus a &lt;em&gt;&lt;span class="caps"&gt;BEFORE INSERT&lt;/span&gt;&lt;/em&gt; trigger will work just fine.&lt;br /&gt;
The steps to achieve our goal are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create the remote log tables that will store the wanted records;&lt;/li&gt;
&lt;li&gt;Alter the remote log table, so that it uses the BlackHole engine;&lt;/li&gt;
&lt;li&gt;Create a trigger that filters the records and inserts the right ones in the auxiliary tables.&lt;/li&gt;
&lt;/ol&gt;
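&lt;p&gt;The first two steps can be sketched as follows. This assumes that the remote log table is the &lt;em&gt;logs.mylog&lt;/em&gt; table created earlier, that the auxiliary tables are named &lt;em&gt;log_customers&lt;/em&gt; and &lt;em&gt;log_sales&lt;/em&gt; (the names the trigger uses), and that the BlackHole engine is enabled on the remote server. The order matters: the auxiliary tables are copied from the log table while it still uses MyISAM, so they keep a real storage engine.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# remote server
USE logs;

# check that BLACKHOLE is listed as supported
SHOW ENGINES;

# step 1: auxiliary tables with the same structure as the log table
CREATE TABLE log_customers LIKE mylog;
CREATE TABLE log_sales LIKE mylog;

# step 2: from now on, rows inserted into mylog are discarded
ALTER TABLE mylog ENGINE = BLACKHOLE;
&lt;/code&gt;&lt;/pre&gt;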
&lt;p&gt;In the following example, the trigger performs a crude check to see if the query contains &amp;#39;customer&amp;#39; or &amp;#39;sales&amp;#39;. In a real-world situation you may want to perform a more thorough check, but for now this will do.&lt;/p&gt;
&lt;p&gt;&lt;img src="http://datacharmer.org/img/logs2.jpg" /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
delimiter //

create trigger logs_bi
    BEFORE INSERT
    ON mylog
    FOR EACH ROW
begin
    if (new.argument regexp &amp;#39;customer&amp;#39;)
    then
        insert into log_customers
            (event_time, user_host, thread_id,
            server_id, command_type, argument)
        values
            (new.event_time, new.user_host, new.thread_id,
            new.server_id, new.command_type, new.argument);
    end if;
    if (new.argument regexp &amp;#39;sales&amp;#39;)
    then
        insert into log_sales
            (event_time, user_host, thread_id,
            server_id, command_type, argument)
        values
            (new.event_time, new.user_host, new.thread_id,
            new.server_id, new.command_type, new.argument);
    end if;
end //

delimiter ;
&lt;/code&gt;&lt;/pre&gt;
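&lt;p&gt;To check that the filter behaves as intended, you could run something like the following on the remote server after some traffic has been logged. This is a sketch that assumes the remote log table is the &lt;em&gt;mylog&lt;/em&gt; table created earlier, now using the BlackHole engine.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# remote server
USE logs;

# the BlackHole table itself stores nothing, so this count stays at 0
SELECT COUNT(*) FROM mylog;

# the filtered events are kept in the auxiliary tables
SELECT COUNT(*) FROM log_customers;
SELECT event_time, server_id, argument
FROM log_sales
ORDER BY event_time DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;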
&lt;h2&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;Log tables are an excellent addition to MySQL 5.1. They allow logging on demand, which is one of the features &lt;span class="caps"&gt;DBA&lt;/span&gt;s appreciate most. &lt;br /&gt;
If you add some flexibility in your choice of storage engines for logging, this feature can be a true blessing for demanding administrative tasks.&lt;/p&gt;
&lt;p&gt;It would be desirable to change the storage engine of the log tables without much fuss. MySQL developers are working on it. In the meantime, you can use the workarounds in this article to twist your logs the way you want them.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;I would like to thank Tobias Asplund and Petr Chardin for useful hints to complete these hacks.&lt;/i&gt;&lt;/p&gt;
    </content>
</entry>
<entry>
    <title>Your Turn to Play Stump the Author...</title>
    <link rel="alternate" type="text/html" href="http://www.oreillynet.com/databases/blog/2007/05/your_turn_to_play_stump_the_au.html" />
    <id>tag:www.oreillynet.com,2007:/databases/blog//6.20041</id>
    
    <published>2007-05-08T17:51:05Z</published>
    <updated>2007-05-08T17:51:06Z</updated>
    
    <summary>Thought I&apos;d pass along this note I received, that may be of interest to the MySQL addicts out there: &gt;&gt; Sasha Pachev, whose book Understanding MySQL Internals was released &gt;&gt; last month by O&apos;Reilly, is leading an online seminar at...</summary>
    <author>
        <name>James Turner</name>
            </author>
            <category term="News" />
        <content type="html">
&lt;p&gt;Thought I&amp;#8217;d pass along this note I received, that may be of interest to the MySQL addicts out there:&lt;/p&gt;
&lt;p&gt;&gt;&gt; Sasha Pachev, whose book Understanding MySQL Internals was released&lt;br /&gt;
&gt;&gt; last month by O&amp;#8217;Reilly, is leading an online seminar at MySQL AB on&lt;br /&gt;
&gt;&gt; &amp;#8220;Improving query performance through a better understanding of the&lt;br /&gt;
&gt;&gt; optimizer&amp;#8221;:&lt;br /&gt;
&gt;&gt;&lt;br /&gt;
&gt;&gt;   http://www.mysql.com/news-and-events/web-seminars/sasha.php&lt;br /&gt;
&gt;&gt;&lt;br /&gt;
&gt;&gt; You can present Sasha with your own SQL queries during this webinar&lt;br /&gt;
&gt;&gt; and learn how to interpret output of the EXPLAIN command to improve&lt;br /&gt;
&gt;&gt; your performance. This webinar is also a useful accompaniment to&lt;br /&gt;
&gt;&gt; Understanding MySQL Internals, which contains extensive information&lt;br /&gt;
&gt;&gt; on EXPLAIN and the behavior of the optimizer exposed by it&lt;/p&gt;
    </content>
</entry>
</feed> 
