O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Listen Print Discuss Subscribe to Databases Subscribe to Newsletters

Data Protection for LAMP Applications
Pages: 1, 2, 3

Consistent Backups from PostgreSQL

PostgreSQL provides multi-version concurrency control. Multi-version concurrency control makes it easier to create database consistent backups. Snapshots are also possible; this is equivalent to recovering from a PostgreSQL server crash.



Logical Full Backups

Two tools--pg_dump and pg_dumpall--create logical backups of the database into a backup file. To avoid file size restrictions due to filesystem limits, I sometimes send the output of these commands to the Unix split command. These tools can run on any machine (not just on the machine running PostgreSQL postmaster server). For example, here's how to dump a Postgres database to multiple files with names starting with pg_backup:

# pg_dump postgresdb | split -b 1024m - pg_backup

Raw (Physical) Full Backups: Continuous Archiving

PostgreSQL 8.1 keeps track of all changes made to the database data files in Write-Ahead Logs (WALs) under the pg_xlog subdirectory of the cluster data directory. You can archive these WALs in a different directory before the database reuses them by copying them to a directory backed up by a network backup tool.

To copy the logs to the /backup/postgres/WAL directory, add a following shell command to your postgresql.conf file:

archive_command = 'cp -i %p /backup/postgres/WAL/%f < /dev/null'

Note: It is necessary to backup configuration files such as postgresql.conf, pg_hba.conf, and pg_ident.conf separately. To do a full raw online backup, as the database superuser:

  1. Execute the SQL command SELECT pg_start_backup('backup_id1'); The backup information (backup label, time of command execution, name of first WAL segment file) will be in the the /backup/postgres/backup_id1 file.
  2. Perform a backup of the filesystem backup of the database data directory (/usr/local/pgsql/data). Exclude the WAL files under the pg_xlog subdirectory. The archive command will back up these files.
  3. Execute the SQL command to inform that backup is complete: SELECT pg_stop_backup(); Stopping the backup creates a backup history file in the WAL archive location. The starting WAL file is used as part of the backup history file.

After executing the stop command, the WAL archival operation will be complete. Now you can back up all files under the WAL archival and the database data directory using a network backup tool.

Incremental Backups

All automatically archived WAL logs are usable as incremental backups. Use the network backup tool to perform incremental backups of your WAL archive directory.

Network-based Backup and Recovery

After getting a consistent data copy using replication or snapshots, you can use network-based backup and recovery tools to backup the data into various media, such as tapes, disks, optical devices, and NAS appliances.

Amanda is the most popular open source backup and archiving tool. Amanda is an active SourceForge project with tens of thousands of users and hundreds of developers. It has been part of the public domain since 1991 and is available as part of various Linux and BSD distributions. Latest packages are available from the Amanda downloads page. As of this writing, Amanda 2.5 is the latest stable release.

Amanda can backup multiple servers running the LAMP applications in the same backup run or in different backup runs. You can run an Amanda server on one of the LAMP application servers or on a dedicated Amanda backup server. Amanda will be active only during a backup run and will be idle at other times.

One of the unique features of Amanda is the consistent backup window. Amanda attempts to backup the same amount of data from the various backup clients running LAMP applications in each backup run. Amanda will distribute the full backup for each client (actually each backup unit) over the backup cycle days to achieve same amount of backup size each run. For system administrators it provides a consistent backup window for each backup run without having to tune the backup configuration. Figure 2 shows three client filesystems backed up daily by Amanda. In the first backup cycle, Amanda performs a full backup for all filesystems. Amanda, based on the amount of backup daily media available and amount of data changes in each client filesystem, constructs a backup plan of backup levels (full, incremental backup levels 1-9) for each run. A full backup is backup level 0. All filesystems have at least a full backup done every backup cycle days (Amanda configuration parameter). As the figure shows, the amount of data backed up is almost the same for each run resulting in consistent backup windows.

amount of data backed up each daily run
Figure 2: The amount of data backed up for each daily run

Amanda uses platform tools for backup, data compression, and encryption. It uses tools such as GNU tar, filesystem dump, and Schily tar for backup and can use any tool that is available in the client. Amanda uses the tool's data format on the backup media. This feature allows the possibility of restoring data without using Amanda. All Amanda backup media is readable using dd and mt commands. In fact, the Amanda media file header has the command to restore the media as a text string.

Here's how to use dd to display Amanda tape file header:

# dd if=<amanda_media_dev> bs=32k count=1
AMANDA: FILE 20060228 natasha /boot  lev 1 comp N program /bin/gtar

To restore this backup, position the tape at the start of the file and run:

dd if=<tape> bs=32k skip=1 |   /bin/gtar -f...  -
1+0 records in
1+0 records out

Amanda can backup to tapes, disks, optical media, media changers, and RAIT. RAIT stands for redundant array of tapes--in fact, it works for any media volume. In concept it is similar to RAID and works by striping the backup data across multiple media volumes and the parity information in another media volume. RAIT with a two-volume set is same as mirroring backup data across two media volumes, e.g., you can backup to disk (for immediate recovery needs) and to tape (for archival needs) simultaneously.

Amanda allows a lot of configuration flexibility in specifying how to perform backups, even down to the individual file granularity level. Users can specify the type of compression and encryption, whether to perform them on the Amanda client or server, and which network to be used for backups for each backup unit.

To reduce the complexity introduced by the configuration flexibility, configuration tools are available for first-time Amanda users. Amanda supports both data consistency mechanisms for LAMP application backup--snapshots as well as replication. Amanda server is scalable and can backup hundreds of LAMP application servers in each backup run. The commands to prepare the data for consistent backups can be done as a pre-backup action before the Amanda backup. Because Amanda can use any platform command for backing up applications using Amanda, it is easy to use different data protection mechanisms for different LAMP application servers in the same Amanda configuration.

Amanda developers are working on an application API that will make addition of new backup programs for applications easier. Some Amanda users backup LAMP application data, including MySQL databases, to a temporary directory in a filesystem. Later, Amanda's backup run grabs the filesystem directory. This method allows quicker recovery from the temporary filesystem and recovery from Amanda media for longer term needs. Amanda users and developers use the Amanda wiki for documenting the project and incorporating Amanda into the overall IT processes.

Recovery Process

To recover data from the backup archive, use the Amanda recovery tools: amrecover and amrestore. The amrecover tool allows users to browse the index database and choose the backups to restore. Any machine with the Amanda client software installed can use this tool. It is always advisable to restore backups to a temporary directory location. Application files can be restored to the correct location on the LAMP application server as well as MySQL and PostgreSQL databases from mysqldump and pg_dump backup files, respectively. Remember that you should perform full a backup restoration only when the database and application process are not running.

Here's how to perform a complete restoration for database database1 using mysqldump and the backup file mysql_backup:

# mysql database1 < mysql_backup

Here's how to perform a complete restoration of a Postgres database from pg_dump using backup files starting with pg_backup:

# cat pg_backup.* | psql postgresdb

To restore a PostgreSQL database from the raw backup files, recreate the PostgreSQL cluster data directory from the restored files. It is advisable to make copy of the existing contents before restoring files from the backup. The pg_xlog sub-directory can contain WAL that were not archived. The recovery process uses the recovery.conf file in the cluster data directory. This file contains the location of the archived WAL logs. Here is an example recovery.conf file:

restore_command = 'cp /backup/postgres/WAL/%f %p'

Incremental Restores

You can perform incremental restores from the MySQL binary logfiles to the temporary directory by Amanda. The incremental restore can take place at a start time to an end time or from a starting log position to ending log position in the binary logs. The log position can be used to fix operator errors using the backup files.

Here's an incremental restore of all MySQL database changes performed up to May 1, 2006:

# mysqlbinlog --stop-date="2006-05-01 12:00:00" backup-logs.[0-9]* |
     mysql -u <user name> -p <password>

Perform PostgreSQL incremental restores by using settings in recovery.conf in the cluster data directory. It is possible to specify end timestamps for recovery or end transaction ID for recovery. When the postmaster process restarts, it will use the restore_command setting to retrieve the archived WAL segments up to the specified stop point--either the date/time stamp or the specific transaction ID.

It is critical to test the data recovery procedure for LAMP applications on a regular basis.

Backup Security

Security is important for any data protection process. The data must be secure in the backup media, which often means using encryption. You should track the keys and keep them in a secure location also under backup. Amanda's flexibility in encrypting data on the backup client where LAMP applications are running or on the Amanda server provides more options to secure the data. Use secure communication during the backup with tools such as OpenSSH or through network tunneling.

The MySQL and PostgreSQL database backup and restore users should have only the minimum privileges required to back up and restore the database. For example, the minimum privileges for a backup user for the mysqlhotcopy and mysqldump tools are SELECT, RELOAD, and LOCK TABLES.

Conclusions

Several open source data protection tools are available for LAMP applications. These tools can help you create secure backups for LAMP applications. Increased use of LAMP technologies for web applications, data warehousing applications, and enterprise applications make it necessary to protect the customer data used by these applications.

The many new and exciting open source developments in the data protection field, such as creating backups while maintaining database referential integrity and using storage available over the internet as backup media, allow the creation of just the right LAMP backup solution for an individual site's needs.

Resources

  • MySQL backup tools
  • MySQL backup forums
  • PostgreSQL backup tools
  • Amanda forums

Paddy Sreenivasan is the Vice President of Engineering and a co-founder of Zmanda, Inc.


Return to O'Reilly Databases


Do you have any other backup and data protection tips?
You must be logged in to the O'Reilly Network to post a talkback.
Post Comment


Tagged Articles

Be the first to post this article to del.icio.us

Sponsored Resources

  • Inside Lightroom

Related to this Article

Understanding Oracle Clinical Understanding Oracle Clinical
by Joan M. Johnson
June 2009
$9.99 USD

New Features in Oracle 9i New Features in Oracle 9i
by Howard J. Rogers
June 2009
$5.95 USD

Advertisement
O'Reilly Media

©2009, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
About O'Reilly
Academic Solutions
Authors
Contacts
Customer Service
Jobs
Newsletters
O'Reilly Labs
Press Room
Privacy Policy
RSS Feeds
Terms of Service
User Groups
Writing for O'Reilly
Content Archive
Business Technology
Computer Technology
Google
Microsoft
Mobile
Network
Operating System
Digital Photography
Programming
Software
Web
Web Design
More O'Reilly Sites
O'Reilly Radar
Ignite
Tools of Change for Publishing
Digital Media
Inside iPhone
makezine.com
craftzine.com
hackszine.com
perl.com
xml.com

Partner Sites
InsideRIA
java.net
O'Reilly Insights on Forbes.com