Published on O'Reilly (http://oreilly.com/)

Oracle SQL*Loader: The Definitive Guide

SQL*Loader: A Case Study in Tuning

by Stephen Andert, technical reviewer for Oracle SQL*Loader: The Definitive Guide

I am responsible for monitoring a handful of pipelines that load data from other systems into my employer's Oracle database. This data comes in via flat files, which we load into Oracle using SQL*Loader. When I was offered the opportunity to review the upcoming book Oracle SQL*Loader: The Definitive Guide, by Jonathan Gennick and Sanjay Mishra, I jumped at the chance to do so, hoping to learn more about SQL*Loader in the process. I found the whole book to be very informative, and although some areas were geared toward the novice user, most of the book contained valuable information for all levels of experience.


In my experience, people do not usually run loads using SQL*Loader's default settings. Rather, an attempt is usually made to modify one or more of the default settings in an effort to improve performance. Once performance becomes tolerable, further tuning is usually abandoned in favor of other more pressing or exciting tasks. In our case, because we were in the middle of a large, mission-critical migration from another database vendor, we made no adjustments to the SQL*Loader jobs that I manage after we initially set them up. After reading Sanjay's performance tuning chapter in the book, however, I was inspired to have another go at tuning these jobs. Just how good, I wondered, was Sanjay's advice?

A common first attempt at tuning a SQL*Loader job is to approach the process with the understanding that increasing the number of rows that get loaded at one time will make the load run more efficiently. A look at the Oracle documentation reveals the ROWS parameter, which allows you to increase the number of rows loaded at one time. Increasing the value of ROWS to improve performance is sound logic up to a point, but in the case of SQL*Loader there are two factors you need to consider when you adjust this parameter:

The number of rows that can be loaded at one time depends on the size of the bind array. Confusingly, the size of the bind array in turn depends on the number of rows being loaded at one time. Not only can you use the ROWS parameter to specify the number of rows to load at one time, you can also use the BINDSIZE parameter to specify the size of the bind array directly. Whether you specify values for these parameters or rely on their defaults, SQL*Loader calculates the size of the bind array as follows:

mem = ROWS * size of a single row

size of bind array = lesser of (BINDSIZE, mem)


This calculation means that an insufficient BINDSIZE setting becomes a bottleneck on an otherwise good ROWS setting. For example, if you set the ROWS parameter to 5000 and BINDSIZE to 128000, but 5000 rows require more memory than 128000 bytes, SQL*Loader caps the bind array at the BINDSIZE value and loads only as many rows per array as will fit. Consequently, it's important to match the memory requirements implied by your ROWS setting with your BINDSIZE value.
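
The calculation can be sketched in a few lines of Python. The 100-byte row size below is purely an assumption for illustration; SQL*Loader reports the real figure in its log file.

```python
# Sketch of SQL*Loader's bind array sizing, per the formula above.
def bind_array_size(rows: int, bindsize: int, row_size: int) -> int:
    """Return the bind array size SQL*Loader would actually use."""
    mem = rows * row_size          # memory needed for the requested ROWS
    return min(bindsize, mem)      # BINDSIZE acts as a ceiling

# The example above: ROWS=5000, BINDSIZE=128000, with an assumed
# 100-byte row. 5000 rows would need 500,000 bytes, so BINDSIZE wins.
print(bind_array_size(5000, 128000, 100))   # -> 128000
print(128000 // 100)                        # -> 1280 rows actually loaded per array
```

In other words, when BINDSIZE is the binding constraint, the extra ROWS you asked for are simply never used.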

The other factor to consider when increasing the ROWS setting is its effect on rollback segment usage. If you are running loads at the same time as other batch jobs, or when there is a high volume of user activity, you may get into trouble by running out of space in your rollback segments. You may also have the opportunity to experience the joy of the ORA-01555, snapshot too old: rollback segment too small error. I'll talk more about sizing rollback segments later in this article.

In my attempt to maximize load performance, I wanted to load up to 5000 rows at a time. In order to use the formula above, I needed to determine the row width for the data that I was loading. To do that, as the book suggests, I ran a test load against an empty data set.

This test load did not actually load any data (because there was none to load), but it did generate a SQL*Loader log file, and SQL*Loader records the size of the bind array in the log file. Because the bind array in this case represented only one row, I was able to determine from the log file that my row width was 2056 bytes. I then created the spreadsheet shown in Figure 1, which allowed me to determine the actual memory requirements given various values for the BINDSIZE and ROWS parameters. Armed with this information, I could make informed choices about the values to use for these parameters.
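
A null-data control file of the kind described might look like the sketch below. The table name, columns, and file name are invented for illustration; `empty.dat` stands for a zero-byte data file, so nothing is loaded, and the OPTIONS (ROWS=1) clause asks for a one-row bind array so that the size reported in the log is the width of a single row.

```
-- Hypothetical sketch only: table, columns, and file names are invented.
-- empty.dat is a zero-byte file, so no rows are actually loaded.
OPTIONS (ROWS=1)
LOAD DATA
INFILE 'empty.dat'
INTO TABLE weekly_feed
(
  cust_id    POSITION(01:10) CHAR,
  sale_date  POSITION(11:20) DATE "YYYY-MM-DD",
  amount     POSITION(21:32) DECIMAL EXTERNAL
)
```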

Figure 1. SQL*Loader memory requirements

Lines 9-11 in the spreadsheet shown in Figure 1 provide an estimate of the required rollback segment space for each load scenario. The rule of thumb that I used was to estimate the rollback segment requirement at 30 percent more than the size of the bind array that will be used. By planning your load and estimating the rollback usage as described in the book, you can avoid the ORA-01555 error. Given our current configuration, I determined that none of the three scenarios shown in Figure 1 would have a negative impact on our rollback segments.
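
The rule of thumb is simple arithmetic; a minimal sketch, using the three bind array sizes that appear later in Table 1 as example inputs:

```python
# Rule-of-thumb rollback estimate: 30 percent more than the bind array size.
def rollback_estimate(bind_array_bytes: int) -> float:
    return bind_array_bytes * 1.3

# The three bind array sizes actually used in my tests (see Table 1):
for used in (63984, 1310640, 5242560):
    print(f"{rollback_estimate(used):,.0f} bytes")
# -> 83,179 bytes
# -> 1,703,832 bytes
# -> 6,815,328 bytes
```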

Armed with the data from my spreadsheet, I took a subset of data from a load that we do weekly and, using a non-indexed test table, performed the same load using different ROWS and BINDSIZE values to assess the difference in load times. Between each load, I truncated the destination table and renamed the log file. The results are shown in Table 1. Test #1 was my first test run, for which I used the Oracle default settings for ROWS and BINDSIZE. These defaults were 64 and 65,536 respectively. Table 1 shows 31 rows for Test #1, because that's all the 2056-byte rows that the default bind array size will accommodate. Test #2 was my second run, and for that I used 5000 and 1,310,720 bytes for ROWS and BINDSIZE respectively.

Finally, in Test #3, I specified 5000 ROWS and a 5,242,880-byte BINDSIZE. The values that you see in Table 1 for ROWS and BINDSIZE are the values actually used by SQL*Loader after going through the calculation described earlier.

                               Test #1 (defaults)   Test #2           Test #3
BINDSIZE used                  63,984 bytes         1,310,640 bytes   5,242,560 bytes
ROWS used                      31                   635               2540
Elapsed time (min:sec)         32:07.74             04:02.20          02:28.37
Reduction from Test #1
  (min:sec)                    --                   28:05.54          29:39.37
Reduction from Test #1 (%)     --                   87%               92%

Table 1. SQL*Loader performance test results
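
The reduction figures in Table 1 follow directly from the elapsed times; a quick check in Python:

```python
# Verify Table 1's reduction figures from the raw elapsed times.
def to_seconds(mmss: str) -> float:
    """Convert a 'MM:SS.ss' elapsed time to seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + float(s)

base = to_seconds("32:07.74")            # Test #1, default settings
for t in ("04:02.20", "02:28.37"):       # Tests #2 and #3
    saved = base - to_seconds(t)
    pct = saved / base * 100
    print(f"saved {saved:.2f}s ({pct:.0f}%)")
# -> saved 1685.54s (87%)
# -> saved 1779.37s (92%)
```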

Looking at Table 1 in conjunction with the planning worksheet from Figure 1, you can see that in all three cases the BINDSIZE values actually used are less than the BINDSIZE values specified. The situation is similar for the ROWS values. This is because my BINDSIZE settings always limited the number of rows that could fit in the bind array: the ROWS value actually used is the number of rows that would fit within the BINDSIZE that I specified, and the BINDSIZE value actually used is the memory consumed by those rows. Another notable item in Table 1 is the inverse relationship between the ROWS used for a load and the elapsed time for the load. That is to say, the more rows that were used in each array, the less time the whole load took. This supports the theory I stated earlier: "The more rows you load at one time, the better your performance will be."
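
The "used" figures in Table 1 can be reproduced with a few lines of arithmetic. One assumption on my part: each "BINDSIZE used" figure divides evenly by the corresponding "ROWS used" at exactly 2064 bytes per row, which suggests roughly 8 bytes of per-row overhead on top of the 2056-byte row width reported in the log.

```python
# Reproduce the "used" figures from Table 1. The 2064-byte per-row
# size is inferred (2056-byte row width plus apparent 8-byte overhead).
PER_ROW = 2064

for bindsize in (65536, 1310720, 5242880):   # BINDSIZE specified per test
    rows_used = bindsize // PER_ROW          # whole rows that fit in the array
    print(rows_used, rows_used * PER_ROW)    # ROWS used, BINDSIZE used
# -> 31 63984
# -> 635 1310640
# -> 2540 5242560
```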

Notice the huge drop in elapsed time between the first and third loads. To load 192,671 rows with the SQL*Loader default settings took over 32 minutes. When I increased the BINDSIZE and ROWS values to generate a bind array size of 5,242,560 bytes, the load time dropped to two minutes and 28 seconds. When you consider that the entire table from which this sample is taken contains around 55 million records, and that we completely reload that table on a weekly basis, you can see that this time savings is very significant indeed. This magnitude of time savings makes for a much more efficient process, and therefore less work for me, the DBA.
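
A back-of-envelope extrapolation shows why this matters at full scale. This assumes, optimistically, that the load rate from the 192,671-row sample scales linearly to the whole table:

```python
# Extrapolate sample load rates to the full 55-million-row table,
# assuming the rows-per-second rate scales linearly (an assumption).
rows_tested = 192_671
default_secs = 32 * 60 + 7.74     # Test #1 elapsed time
tuned_secs = 2 * 60 + 28.37       # Test #3 elapsed time
full_table = 55_000_000

for label, secs in (("default", default_secs), ("tuned", tuned_secs)):
    rate = rows_tested / secs                 # rows per second
    hours = full_table / rate / 3600
    print(f"{label}: {rate:,.0f} rows/s, roughly {hours:,.1f} hours for 55M rows")
```

Under that assumption, the tuned settings turn a multi-day reload into an overnight one.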

As you can see, loading as many records at a time as your machine, operating system, and database can handle is a reasonable path to follow. One recommendation the authors provide is to aim for between 1,000 and 5,000 rows. Do keep in mind, however, the two factors described in this article: the BINDSIZE setting can be a bottleneck on the ROWS value, and the size of the bind array directly affects the amount of rollback segment space required for a load.

Another option for speeding up SQL*Loader processes is to use the DIRECT parameter to request a direct-path load. Because our loads require the use of SQL functions such as RTRIM() and TO_DATE() to massage the data being loaded, I am unable to take advantage of the direct-path load feature. However, I encourage you to consider it if your circumstances permit.

Jonathan and Sanjay's book gave me a better understanding of the whole SQL*Loader process, which I've found to be very helpful when troubleshooting and tuning. As far as I'm concerned, the chapter with tips on tuning alone is well worth the price of the book. By applying a couple of formulas found in that chapter, I was able to reduce the run time of our daily pipeline jobs by as much as 60 percent in some cases, so Sanjay's advice actually turned out to be quite good.

Stephen Andert is a database administrator for First Health Group Corporation, and has ten years of experience working with database technologies. He was a technical reviewer for Oracle SQL*Loader: The Definitive Guide and Oracle Net8 Configuration and Troubleshooting, both from O'Reilly & Associates. He can be reached by email at sandert@wiredglobal.com.

O'Reilly & Associates will soon release (April 2001) Oracle SQL*Loader: The Definitive Guide.


Copyright © 2009 O'Reilly Media, Inc.