Data Protection for LAMP Applicationsby Paddy Sreenivasan
The number of web and enterprise applications based on the LAMP application stack is growing. LAMP applications use Linux or BSD as the operating system, Apache as the web server, MySQL or PostgreSQL as the application database, and PHP, Perl, or Python as the programming language. Other applications use JBoss as the middleware. These layers have the following common attributes:
- They are open source and have a large developer and user community behind them.
- They are available on multiple hardware platforms and are very well integrated.
An often overlooked aspect in the LAMP application solution is the protection of the application and configuration data. This article examines how to use available open source tools to protect the LAMP application data. The security aspects of the application data and securing the LAMP application servers is beyond the scope of this article. It is also important to test the data recovery scenarios before the actual need arises.
LAMP Application Data Characteristics
From a data protection perspective, one of the key characteristics of LAMP application and configuration data is that they reside on filesystems or in MySQL or PostgreSQL databases. The Apache configuration is usually in a directory such as /etc/httpd/ directory, and PHP configuration files are often in /etc/php/. Both MySQL and PostgreSQL data, index, and other configuration files are also commonly stored in filesystems. Another characteristic is that LAMP applications may be distributed on multiple servers. Distributed LAMP application servers provide more scalability and availability for the applications. Enterprise and web applications may require high levels of availability, with very little downtime available for backups.
These characteristics of LAMP applications data (database and filesystem storage, as well as cross-server distribution) can make regular backups challenging. The system administrator needs to pay attention to the data in all layers of the LAMP application in order to get a consistent data backup for the whole LAMP application stack. Figure 1 shows a sample application and its data.
Figure 1. A LAMP application and its data
The type of data protection solution implemented depends on the impact on application performance, application availability, type of failures to recover from, and the cost of implementing the solution. One of the key requirements for backed up data is the level of data consistency required. There are two levels of data consistency:
- Application consistency: The backed up data is completely consistent. That is, the application does not need to do any internal recovery after restoring backed up data.
- Crash consistency: The backed up data is equivalent to the data in the persistent storage when the application fails or aborts in a unexpected manner. The application (or other components such as filesystem) will have to do internal recovery after a restoration.
Very few applications support the capability of temporarily quiescing the application I/O at a consistent point in time to allow for backups. Database servers provide such capability.
Backup recovery often takes place to recover from operator errors, disk/filesystem failures, application data corruption due to bugs, or complete application host failures. The data protection solution you choose depends on the type of failures from which you want to recover and how quickly you need to recover your data.
Consistent MySQL Backups
MySQL provides two tools for backing up the databases. These tools are open source and are available as part of the MySQL distribution. The
mysqldump tool creates a logical backup of the database whereas
mysqlhotcopy tool creates a physical (or raw) backup of the database. The logical backup archive contains the MySQL statements to reconstruct the database data, whereas the physical backup archive contains database data, indexes, and logs in raw format (database specific format). Neither utility maintains database referential integrity; they do not back up all of the tables at the same time or same consistency point.
The tool used to backup MySQL depends on the storage engines used by the tables in the database. Transaction-capable storage engines, including InnoDB, BDB, Solid, and MySQL's Falcon (in the future), support transactional backups without any effect on the database application's availability. These backups are usually referred to as "hot" or "live" backups. Storage engines that do not support transactions, such as MyISAM, ARCHIVE, and MERGE, require obtaining a global read lock on the database and flushing active database data and logs to the disk.
Logical Full Backups with
Logical backup consists of SQL statements (e.g.,
INSERT INTO) that can re-create the complete database or a table in the database. The
mysqldump tool creates logical backups of MySQL database or tables for all the storage engines. The necessary command-line options depend on whether the underlying storage engine supports transactions:
mysqldump [options] db_name [tables] mysqldump [options] --databases db_name1 [db_name2 db_name3...] mysqldump [options] --all-databases
mysqldump options are
--extended-insert. They generate optimized SQL statements. These options will speed up the restore operation and create smaller backup files. The
--opt option is necessary to create correct DDL statements (
--flush-logs lock all tables in the database and flush the data. All pending transactions are committed. These options are necessary for non-transactional storage engines such as
MERGE. Obtaining a read lock on all tables in an active database during the backup may not be feasible.
If you are using snapshot or replication data consistency mechanisms, it may not be necessary to lock all tables and flush logs.
--single-transaction can create consistent backups for transactional storage engines such as InnoDB and BDB. The
lock-all-tables option or the
single-transaction option should be used.
--master-data=2 writes the binary logfile name and position to the backup output as an SQL statement. It can help in point-in-time recovery.
For example, to back up all databases containing only MyISAM tables:
mysqldump --opt --extended-insert --lock-all-tables --flush-logs --routines --triggers --master-data=2 --all-databases
One of the advantages of the logical backup is that you can restore the data to another database server. Logical backup is independent of the server architecture and filesystem type. It is also possible to correct operator errors (such as erroneous
DROP TABLE statements) from logical backups. This is significant advantage compared to raw backups.
You can also run the
mysqldump tool on a different machine from the MySQL server. The disadvantage of logical backups is the backup file size can be larger than the actual database (sometimes twice the size). The restoration time from the logical backup can also be significant; all SQL statements in the logical backup have to be re-run during the restore process.