Technical Archives

Roland Bouman

AddThis Social Bookmark Button

MySQL 5.1 offers an extremely useful feature called information_schema plugins. This feature allows dynamic runtime loading of a shared library into the MySQL server to implement a table in the information_schema database. The SQL standard (ISO/IEC 9075-11:2003) allows database implementations to extend the information_schema. MySQL 5.1 transfers the possibility to do this directly to privileged database users so they can extend the information_schema themselves, in any way they see fit.

In this article, we will demonstrate how to create a minimal “Hello, World!” MySQL information schema plugin. In a forthcoming article, we’ll demonstrate how information schema plugins may be used to report some of the server’s internals such as the contents of the query cache, session level objects such as the currently defined TEMPORARY tables, user-defined variables and SAVEPOINTs.

Roland Bouman

AddThis Social Bookmark Button

I believe I found a new method to calculate the median in MySQL. I would not be surprised if this method has been figured out by somebody else already. However, I can’t seem to find any resources on the internet describing this method, so for now I flatter myself by assuming the method is original.

(Please do post your comments to this blog to correct me on that should I be wrong so I have a chance to rectify.)

The method I’m describing is a one-pass, pure SQL method. It does not require subqueries, cursors or user variables. However, it does rely on the MySQL specific functions GROUP_CONCAT() and SUBSTRING_INDEX()

I’ll be maintaining a snippet for this method at MySQL Forge.
If you want to know what the median is, and how my snippet works, read on.

Roland Bouman

AddThis Social Bookmark Button

The Date dimension is a well known construct in general data warehousing. In many cases, the data for a date dimension is generated using a database stored procedure or shell-script.

Another approach to obtain the data for a date dimension is to generate it using an ETL tool like Pentaho Data Integration, a.k.a. Kettle. I think this approach makes sense for a number of reasons:

  • When you tend to use a particular ETL tool, you will be able to reuse the date dimension generator over an over, and on different database platforms.
  • You won’t need special database privileges beyond the ones you need already. Privileges for creating tables and to perform DML will usually be available, whereas you might need to convince a DBA that you require extra privileges to create and execute stored procedures.

In addition to these general considerations, you can pull a neat little trick with Kettle to localize the data and format of the date attributes. I wouldn’t go as far as to say that this feature is Kettle specific: rather, it relies on the localization support built into the java platform and the way you can put that to use in Kettle transformations.

Giuseppe Maxia

AddThis Social Bookmark Button

Shortly before MySQL Users Conference I announced that I would be cover new ground in table logs management.
I am keeping that promise, and in addition I am also showing some related hacks.

The announced facts from last year usability report were that you can't change log tables at will, as you can do with log files, and you can't change the log table engine to FEDERATED. Both claims, as it turned out, were incorrect. You can do such things, albeit not in a straightforward manner. As a bonus side effect, you can also:

  • add triggers to log tables;
  • filter log tables depending on user defined criteria, such as query type, user database, or time;
  • centralize logs from several servers.
Roland Bouman

AddThis Social Bookmark Button

Some people believe that the ability to commit or rollback multiple statements as a single unit is a defining characteristic of the concept of transactions. They draw the -false- conclusion that enabling autocommit is the same as disabling transactions. It is easy to demonstrate why this is not true.

chromatic

AddThis Social Bookmark Button

When Chris Dolan isn’t helping the fantastic Perl::Critic project, he’s automating away other painful rough edges of the world’s software. He’s just posted a journal entry on improving MySQL’s foreign key syntax. I’ll borrow that technique for certain. Kudos also have to go to the SQL::Translator team.

AddThis Social Bookmark Button

There are several methods to do live MySQL database backups. These methods are either storage engine specific (InnoDB hot backup, mysqlhotcopy), or require read locks (mysqldump), or require additional hardware (backup using replication slaves, LVM snapshot). Some of these backup methods can do backups of remote MySQL servers also.  So far, there is no backup  method that will provide storage engine agnostic, consistent full backups of local and remote servers.   Goal of Zmanda Recovery Manager (ZRM) of MySQL is to consolidate all these and future methods of MySQL backup and use the optimal method for the MySQL configuration.

One of exciting developments in MySQL is the development of MySQL Online backup. The functional specification for the MySQL Online Backup APIs are available in MySQL forge and currently, the initial implementation of ARCHIVE storage engine backup and recovery is available in the bitkeeper source tree.

You can read more about MySQL Online Backup and ZRM for MySQL in our blog.

Giuseppe Maxia

AddThis Social Bookmark Button

MySQL has recently started a campaign of open contribution, inviting the community to participate to the MySQL project in many ways.

The next target, also considering the higer stakes coming from the MySQL Enterprise challenge, will be Quality Assurance.

AddThis Social Bookmark Button

MySQL databases are integral part of Web based applications such as Wikis, Forums, Blogs and even enterprise applications such as Sugar CRM. These applications allow for minimal downtime and are expected to provide web application services 24 hours a day and 365 days a year. MySQL administrators have a difficult time in figuring out when to do the back ups. For example: If they schedule the application backup at midnight local time, it will be prime time for users in other side of the world.

ZRM for MySQL 1.1 release provides an interesting feature that allows administrators to create flexible scheduling plugins. The plugins can be used to delay a backup run or retry the backup run later or even the skip the backup run. The plugins can use application load (for example: number of user sessions in a Wiki) or server load (for example: number of unique visitors accessing the webserver) or any other performance load measurement to determine when to start the backup run. The plugin interface is simple. Administrator can create a simple plugin script to measure the application performance and return information on when to schedule the backup run.

This plugin provides a flexible backup window for the administrator that keeps the impact of backup process on application to the minimum. Another use case for the plugin is to avoid doing backups when there are application changes (for example: doing scheduled backups in the middle of application upgrade).

Roland Bouman

AddThis Social Bookmark Button

Peter Zaitsev’s blog entry on Duplicate indexes and redundant indexes certainly made a bit of a stir! It has already led to a new (mini) project by Baron Schwartz, the Duplicate index/foreign key finder which is publicly available on MySQLForge as a perl script. Daniel Schneller has entered the arena as well, devoting an entire blog to his own java implementation to tackle this and other index trouble.

I figured it would be fun to add a pure SQL solution for the problem. This is not too hard to implement with the STATISTICS system view found in the MySQL information_schema database. This solution is limited in that it will only work for MySQL 5.0 (and higher). On the other hand, it has the advantage that it does not require any external programming languages or runtime environments.

The solution I’m presenting here also takes the uniqueness of indexes into account, something that - to the best of my knowledge - has not been discussed by the other authors (although I believe that Daniel Schneller’s tool does take uniqueness into account).

Those that don’t feel like reading the article, hey - you decide. The code is available on MySQLForge as a snippet. The article just explains a little bit of the background of the problem, and explores some techniques to solve it using a pure SQL solution. It finishes with a detailed explanation of the actual query, which you may or may not find interesting to read.

AddThis Social Bookmark Button

With the use of MySQL in the enterprise applications and Web 2.0 applications, Backup and recovery of MySQL databases has become one of the important tasks for a database administrator. There are lots of methods to do application consistent MySQL database backups. The backup method used depends on MySQL configuration as well as the MySQL storage engines used by the MySQL database. These backup methods mostly deal with full
backup of the database.

MySQL maintains binary logs for supporting MySQL server replication. These binary logs can be used to selectively recover MySQL databases. MySQL binary logs contain all database events (SQL statements) that modify data or could have modified data. These binary logs can be used as incremental backups
of the MySQL server.

MySQL server does not create binary logs by default. It has to be enabled as a command line option to MySQL server. Enabling MySQL binary logging has minimal impact on the MySQL database application performance. MySQL provides mysqlbinlog tool to process binary logs. This tool can be used for selectively restoring database events based on when it occurred or position in the binary log.

Zmanda Recovery Manager(ZRM) for MySQL , a comprehensive MySQL backup and recovery manager, provides a mechanism to list the database events, time when the event occurred and the position in the binary log. This information can be used for selective recovery of database to any point in time between backups. I’m one of the ZRM for MySQL project developers. This is an open source project (GPL).

Selective recovery of databases can also be used for recovery from operator errors such as execution of DROP TABLE or DROP DATABASE statements or intentional sabotage of the database by an user. Of course, without regular backups, it is not possible to do selective recovery. It is also important to do restore audits of the backup images so that the backups are available when things go wrong.

chromatic

AddThis Social Bookmark Button

MySQL’s Brian Aker posts about removing the BDB storage engine from MySQL. There’s no conspiracy, just a dual desire to remove a bottleneck from maintaining unmaintained code and the desire to move everything behind useful and usable interfaces. Brian calls it eating their own dogfood, at least for using the plugin API.

I like days when I can remove code and add features.

Giuseppe Maxia

AddThis Social Bookmark Button

Sometimes my customers (especially the ones who are used to different database servers,) ask me about the CPU and memory consumption of a MySQL server process.
Initially I thought that this kind of requests were rather odd, mostly because the whole box was dedicated to the database, thus measuring the global usage of CPU and memory could give a rough estimate of how well the database server is performing.
With time, I came to appreciate the request, and I started digging into the operating system intricacies to find out as much as I could about the resources used by a single process. This information can be useful if there are different services running in the same machine, and it could be of paramount importance to find out if your DBMS process can exceed the operating system resources.

My measurements are referred to the GNU/Linux operating system. Other systems have different ways of gathering the same data.

Giuseppe Maxia

AddThis Social Bookmark Button

This documentation is OBSOLETE.
Please refer to https://launchpad.net/mysql-sandbox for up-to date material.

Installing a side instance of MySQL for testing purpose is a task that many administrators can perform without breaking a sweat. If you need to do that only once in a while, you need just to read the manual carefully, or to have some experience in this matter, and the task is accomplished quite easily.

If, however, your skills are below the Guru level, even to get this task done just once you may find yourself in trouble. And, let’s face it, even experienced administrators, when they need to do this several times, with different versions of MySQL, may have trouble doing it right. It would be nice to have a tool that takes care of the dirty details for you and gets the job done quietly, without interfering with existing installations, and without side effects.

Such a tool exists, it’s The MySQL Sandbox. It is a framework for testing features under any version of MySQL from 3.23 to 5.1. Whitout fuss, it will install one server under your home directory, and it will provide some useful commands to start and stop it, and to use it within the sandbox.

There are many reasons for installing a side server. One is testing a potentially dangerous application, and you don’t want to try it on a production server. Another reason is to try different versions of MySQL on a piece of code when hunting a bug. Or you are a consultant, your customers are all using different versions of the DBMS, and you need to test your procedures in an environment that is as close as possible to the your clients are using. I don’t know about you, but in my job I have all the above needs, sometimes all at once.

Giuseppe Maxia

AddThis Social Bookmark Button

One of the recent MySQL enhancements was the INFORMATION_SCHEMA data dictionary, which gives a more coherent view on the DBMS meta-data.

Useful as it is, the information schema is becoming widely used, and many DBAs are learning how to use it as an administration and debugging tool. So useful, in fact, that it would be convenient, from time to time, to export its contents and pass it to a colleague to have a second opinion when a problem arises.

Unfortunately, there is no built-in tool to achieve this result. The standard mysqldump utility will skip the information schema on purpose. Since they are tables that are generated on-the-fly at server start-up, dumping them would not make sense to the normal user. However, the DBA is not a normal user, and such a dump would be highly desirable.

Here is a quick workaround that will produce a dump suitable for transferring the data to a new database.

Stéphane Faroult

AddThis Social Bookmark Button

All right, in the natural vs surrogate key religious war, I rather feel on the natural key side (I have nothing against using a surrogate key as a shorthand for a complex primary key, but I have seen too many people adding to a table a system-incremented column and calling it primary key without other analysis).
But I must admit that counters, whether you call them identity or auto-increment columns, or simply sequences, are extremely valuable for relating the techy’s to the bean counter’s vision of activity. In most applications, there is at least one such counter that represents fairly acurately the business activity - whether it is an increasing order number, invoice number, transaction identifier or whatever. Rather than execute complex queries with conditions on date columns to collect those so precious “key performance indicators”, it may be easier to check the data dictionary, look for sequences or identity columns, and take a daily (or anything) snapshot of the current highest values, together with statistics about the number of executed statements and whatever your DBMS has to offer.

chromatic

AddThis Social Bookmark Button

Suppose you’re a college student performing a research project. Part of that research is a survey with several types of questions. To make your life easier, you’ve decided to create a small web application to ask the questions and record the answers — this also gives you a SQL database you can query to analyze the results.

(No, I’m not a college student — a friend is and the question stumped me.)

Stéphane Faroult

AddThis Social Bookmark Button

I have been recently involved in trying to improve the performance of an application that primarily handles messages (more and more of them). It is a kind of state machine. Three different types of entities are managed (one of the types happens to be sets of another type), and messages are received that tell what is the most recent status of each entity. So far so good, except that everything revolves around the current status of each entity, and that the main bottleneck seems to be queries that hit repeatedly a table MESSAGE_HISTORY(msg_id, entity1_id, entity2_id, entity3_id, status, changed_by, timestamp), finding out which is the most recent status for a given entity1_id, entity2_id or entity3_id (as you may have guessed, only one of those contains a value in each row), associated with the greatest msg_id for the enity in question.
The solution that looked best was what was presented to me as “denormalizing”, and associating its current status to each and every entity.