Pitfalls of Transactions with PHP
by Kimberlee Jensen
Transactions -- a database management system (DBMS) feature that enables data recovery on internal errors -- raise significant controversy among developers. Some view transactions as a crutch, while others view them as a necessity.
Using transactions can save time and money, because transactions enable databases to restore themselves. But inappropriate use of transactions means wasted overhead and impeded performance. And transactions do not replace programming; sadly, some developers have foregone error trapping and data validation, thinking that transactions will catch all errors.
Transaction support in a DBMS or its table types is defined using the ACID test:
- Atomicity is an all-or-none proposition.
- Consistency guarantees that a transaction never leaves your database in a half-finished state.
- Isolation keeps transactions separated until they're finished.
- Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
Initially, the MySQL DBMS did not support transactions. As of version 3.23-max, MySQL supports transactions through two additional table types, InnoDB and BDB. InnoDB is recommended for its CPU efficiency and its support for row-level locking and multi-version concurrency. Now that MySQL has full transaction support, PHP developers need to incorporate transactions effectively. Although PHP 4.x lacks native transaction functions, transaction statements such as BEGIN, COMMIT, and ROLLBACK can be sent to the server the same way as standard SQL queries.
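Because PHP 4 has no transaction API of its own, a script simply passes the transaction statements through mysql_query() like any other SQL. A minimal sketch, in which the connection credentials, database name, and the UPDATE itself are illustrative assumptions:

```php
<?php
// Assumed connection details, for illustration only.
$link = mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('mp3store', $link);

// To PHP, BEGIN, COMMIT, and ROLLBACK are ordinary queries.
mysql_query('BEGIN', $link);

$ok = mysql_query('UPDATE orders SET total = 9.99 WHERE order_ID = 1', $link);

if ($ok) {
    mysql_query('COMMIT', $link);   // make the pending change permanent
} else {
    mysql_query('ROLLBACK', $link); // undo the pending change
}
?>
```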
When to Use Transactions
To maximize the use of transactions, developers should follow these simple guidelines:
- Use transactions for mission-critical actions that must all complete. Completing an online purchase is mission-critical. Browsing a list of top sellers is not.
- Use transactions for data subject to race conditions. Even though the database server and operating system are multi-threaded and can process multiple queries concurrently, the queries in such a group must be processed in order, not independently and out of order.
- Use transactions in conjunction with, not as a replacement for, data validation and programmatic error trapping.
By supporting transactions, the DBMS provides an extreme level of reliability and drastically reduces the need for manual recovery and data integrity validation. However, any transaction, by definition, adds overhead because at some level, the DBMS must lock data and take a snapshot of the data at the outset. So transactions must be used appropriately.
Transactions Versus Errors
Developers sometimes misunderstand the definition of a transaction failure. A transaction fails when the database engine itself has an error that causes a query or queries not to complete. The most common misconception is that 0 rows returned or affected constitutes a failure. For example, if DELETE FROM tablex WHERE fieldy = 10 returns 0 affected rows, that is a successful query by the transaction definition, because the query completed and reported its affected rows. Whether the number of affected rows matches the expected result is a matter for programming, not transactions.
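In code, that distinction means checking mysql_affected_rows() yourself. A short sketch, assuming an open connection in $link: the DELETE below succeeds as far as the transaction is concerned even when it matches nothing.

```php
<?php
// Succeeds from the DBMS's standpoint even if no rows match.
mysql_query('DELETE FROM tablex WHERE fieldy = 10', $link);

// Whether 0 affected rows is a problem is application logic,
// not something a transaction will catch.
if (mysql_affected_rows($link) == 0) {
    echo 'No rows had fieldy = 10; nothing was deleted.';
}
?>
```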
Some developers believe transactions mean the elimination of programming, even using transactions in lieu of programmatic data validation. This is wrong: transactions are intended to recover from errors made internally by the database engine, or externally by extreme events such as a power outage. Transactions were never intended to account for user errors.
Web developers must trap errors because transactions do not prevent data from being lost between the database and the web application. Theoretically, a transaction could complete successfully, but the results of the transaction could fail to reach the web application. If the web application connects to the database server through a local host connection (using the UNIX socket), this is generally not an issue unless the socket becomes corrupted or disabled. However, if the web application connects to the database via the TCP/IP socket, a packet can be dropped in transmission. From the database standpoint, the transaction has succeeded, but if the web application has lost data, it has failed.
For example, the database could execute a SELECT query successfully and return a result set, but if those data rows are lost in the transfer to the web application, the application would show 0 as the number of rows. This is why, before looping through a result set, a developer always needs to verify that mysql_num_rows(result identifier) is non-zero, to make sure the web application has received the data successfully.
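A sketch of that check, assuming a song table with the columns shown and an open connection in $link:

```php
<?php
$result = mysql_query('SELECT song_name, price FROM song', $link);

// Guard against an empty (or lost) result set before looping.
if ($result && mysql_num_rows($result) > 0) {
    while ($row = mysql_fetch_array($result)) {
        echo $row['song_name'] . ': $' . $row['price'] . "\n";
    }
} else {
    echo 'No rows received from the database.';
}
?>
```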
Assuming transactions cover all possible failures is a mistake. Programmers still need to program!
As well, database developers still need to plan. Before the programmers begin to code the web application, the database developers must make informed decisions about when to use a transaction-safe table. MyISAM table types process SELECT queries much faster than InnoDB types. According to recent PHP Oracle/MySQL benchmarks, InnoDB table types handle more inserts per second than BDB table types. For this example we will use InnoDB for our transaction-essential tables and MyISAM for tables not requiring transactions.
An Illustrated Example
Let's model a fictional e-commerce site that sells MP3 files. The database will need a table to store customer contact information and another table to represent each individual MP3 file the customer is purchasing. We will also assume the use of temporary data to hold the user's shopping cart information. We will then transfer the information to permanent tables upon checkout. Any shopping cart information that is not transferred to the permanent tables can be purged. We will use temporary tables instead of sessions to hold the shopping cart data because we want to keep statistics on user behavior, such as placing items in a shopping cart but not completing the transaction. (Once our fictional web site is as successful as Amazon, we may switch to sessions that hold shopping cart data).
Here are two of the tables involved in completing an order:
CREATE TABLE order_detail (
    detail_ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    order_ID INT UNSIGNED,
    mp3_ID INT UNSIGNED
);

CREATE TABLE orders (
    order_ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_ID INT UNSIGNED,
    order_date TIMESTAMP(8),
    total FLOAT(6, 2),
    temp_order_ID INT UNSIGNED
);
Transactions in the Example
The pseudo-code for what transpires during checkout reads as such:
- Insert customer data into the customer table.
- Transfer data from the temporary order tables to the permanent order tables.
- Update the permanent order tables to reflect their relationships to the new customer record.
- Delete data from the temporary order tables (if necessary).
This process exemplifies a transaction, as all stages contribute to a single goal — the completion of a sale. If the database crashes during a single step of this process, we immediately want to recover and restore the information. For example, if the customer information fails to load, we want to revert the database state back to the previous state, preserving the shopping cart data in the temporary order tables.
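Wrapped in a transaction, the checkout steps might be sketched as follows. The customer and temporary-table names (customer, temp_orders, temp_order_detail), the cart ID, and all column values are assumptions for illustration; only orders and order_detail appear in the schema above.

```php
<?php
// All-or-none checkout: if any step fails, roll everything back so
// the shopping-cart data survives (assuming the temporary tables are
// also InnoDB). $link is an assumed open connection.
mysql_query('BEGIN', $link);

$ok = mysql_query("INSERT INTO customer (name, email)
                   VALUES ('J. Doe', 'jdoe@example.com')", $link);

if ($ok) {
    // Transfer the cart from the temporary to the permanent tables.
    $customer_ID = mysql_insert_id($link);
    $ok = mysql_query("INSERT INTO orders (customer_ID, total, temp_order_ID)
                       SELECT $customer_ID, total, temp_order_ID
                       FROM temp_orders WHERE temp_order_ID = 42", $link);
}

if ($ok) {
    // Link the detail rows to the newly created order.
    $order_ID = mysql_insert_id($link);
    $ok = mysql_query("INSERT INTO order_detail (order_ID, mp3_ID)
                       SELECT $order_ID, mp3_ID
                       FROM temp_order_detail WHERE temp_order_ID = 42", $link);
}

if ($ok) {
    // Purge the now-transferred cart data.
    $ok = mysql_query('DELETE FROM temp_orders WHERE temp_order_ID = 42', $link)
       && mysql_query('DELETE FROM temp_order_detail WHERE temp_order_ID = 42', $link);
}

if ($ok) {
    mysql_query('COMMIT', $link);
} else {
    mysql_query('ROLLBACK', $link); // the cart data is preserved
}
?>
```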
These queries must execute in the specified order. If the deletion of data from the temporary tables executes before the data is transferred to the permanent tables, the operation will fail. Using transaction-safe tables will prevent that problem. To ensure that these tables use the InnoDB handler, the developer can use an ALTER TABLE statement to change the table type after creation, or tack the table type onto the end of the CREATE TABLE statement. The latter is preferred.
CREATE TABLE orders (
    order_ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_ID INT UNSIGNED,
    order_date TIMESTAMP(8),
    total FLOAT(6, 2),
    temp_order_ID INT UNSIGNED
) TYPE = InnoDB;
One database can use numerous table types. Assuming we are selling MP3 files legitimately for download, we can assume a table to hold the name, description, and price of each song. We will also assume separate linking tables to link songs to categories (assuming a many-to-many relationship), songs to artists, songs to CDs, and songs to record labels. These tables will be searched frequently. The process of adding to the inventory is done in batch format when traffic is slow. Also, since these are MP3 files, purchasing a song does not affect inventory; adding a song to a shopping cart does not impact the song table whatsoever. Therefore, these tables should be MyISAM, which processes SELECT queries the fastest. This is the default table type in MySQL as of version 3.23.
The programming logic can catch any SQL queries that fail or return no results. To make these SELECT queries that search the inventory part of a transaction would be a waste of overhead.
The best strategy for working with transactions is to apply the try/catch programming strategy from Java. Let's examine a model of troubleshooting with PHP.
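PHP 4 has no native try/catch, but the pattern can be emulated: run each query of the transaction in turn ("try"), break out on the first failure ("throw"), and roll back in the failure branch ("catch"). A generic sketch, with an illustrative query list and an assumed open connection in $link:

```php
<?php
// Emulated try/catch around a transaction in PHP 4.
$queries = array(
    "INSERT INTO customer (name) VALUES ('J. Doe')",
    "DELETE FROM temp_orders WHERE temp_order_ID = 42"
);

mysql_query('BEGIN', $link);
$failed = false;

// "try": stop at the first query the engine rejects.
foreach ($queries as $sql) {
    if (!mysql_query($sql, $link)) {
        $failed = true;
        break;
    }
}

// "catch": undo everything on failure, otherwise commit.
if ($failed) {
    mysql_query('ROLLBACK', $link);
    echo 'Checkout failed: ' . mysql_error($link);
} else {
    mysql_query('COMMIT', $link);
}
?>
```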
Figure 1. Web application coding model.