SQL*Loader: A Case Study in Tuning
by Stephen Andert, technical reviewer for Oracle SQL*Loader: The Definitive Guide
I am responsible for monitoring a handful of pipelines that load data from other systems into my employer's Oracle database. This data comes in via flat files, which we load into Oracle using SQL*Loader. When I was offered the opportunity to review the upcoming book Oracle SQL*Loader: The Definitive Guide, by Jonathan Gennick and Sanjay Mishra, I jumped at the chance to do so, hoping to learn more about SQL*Loader in the process. I found the whole book to be very informative, and although some areas were geared toward the novice user, most of the book contained valuable information for all levels of experience.
In my experience, people do not usually run loads using SQL*Loader's default settings. Rather, an attempt is usually made to modify one or more of the default settings in an effort to improve performance. Once performance becomes tolerable, further tuning is usually abandoned in favor of other more pressing or exciting tasks. In our case, because we were in the middle of a large, mission-critical migration from another database vendor, we made no adjustments to the SQL*Loader jobs that I manage after we initially set them up. After reading Sanjay's performance tuning chapter in the book, however, I was inspired to have another go at tuning these jobs. Just how good, I wondered, was Sanjay's advice?
A common first attempt at tuning a SQL*Loader job is to approach the process with the understanding that increasing the number of rows that get loaded at one time will make the load run more efficiently. A look at the Oracle documentation reveals the ROWS parameter which appears to allow you to increase the number of rows that get loaded at one time. Increasing the value for ROWS in order to increase performance represents fairly sound logic to a point, but in the case of SQL*Loader there are two factors that you need to consider when you adjust this parameter:
The calculation that Oracle uses to determine how much memory is allocated to the bind array used for the loading process involves not only the ROWS parameter, but also the BINDSIZE parameter.
Increasing the number of rows loaded at one time has an impact on rollback segment usage.
The number of rows that can be loaded at one time is dependent on the size of the bind array. To make things somewhat confusing, the size of the bind array is somewhat dependent on the number of rows being loaded at one time. Not only can you use the ROWS parameter to specify the number of rows to load at one time, but you can also use the BINDSIZE parameter to specify the size of the bind array directly. Regardless of whether you specify values for these parameters, or use their default values, SQL*Loader calculates the size of the bind array as follows:
mem = ROWS * size of a single row
Size of bind array = lesser of (BINDSIZE, mem)
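The calculation above can be sketched in a few lines of Python. This is a toy model of the sizing rule, not SQL*Loader's exact internal arithmetic (which also accounts for some per-row and per-field overhead):

```python
def bind_array_size(rows, bindsize, row_width):
    """Approximate SQL*Loader's bind array sizing: the memory needed
    for ROWS rows, capped by the BINDSIZE parameter."""
    mem = rows * row_width     # mem = ROWS * size of a single row
    return min(bindsize, mem)  # lesser of (BINDSIZE, mem)

# 5000 rows of 2056 bytes would need 10,280,000 bytes of memory,
# so a 128,000-byte BINDSIZE becomes the effective limit.
print(bind_array_size(5000, 128000, 2056))  # 128000
```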
This calculation means that an insufficient BINDSIZE setting can become a bottleneck on an otherwise good ROWS setting. For example, if you set the ROWS parameter to 5000 but the memory required to hold 5000 rows is higher than your BINDSIZE setting of, say, 128,000 bytes, SQL*Loader will cap the bind array at a size less than or equal to BINDSIZE, and fewer than 5000 rows will actually be loaded at a time. Consequently, it's important to try to match the memory requirements derived from the ROWS setting with the BINDSIZE value.
The other factor to consider when increasing the ROWS setting is its effect on rollback segment usage. If you are running loads at the same time as other batch jobs, or when there is a high volume of user activity, you may get into trouble by running out of space in your rollback segments. You may also have the opportunity to experience the joy of the ORA-01555 "snapshot too old: rollback segment too small" error. I'll talk more about sizing rollback segments later in this article.
In my attempt to maximize load performance, I attempted to load up to 5000 rows at a time. In order to use the formula above, I needed to determine the row width for the data that I was loading. To do that, I ran a test load on a null data set. As per the book, I used a control file with the following characteristics:
I placed ROWS=1 in a control file OPTIONS clause to force SQL*Loader to create a bind array sized to fit exactly one row.
I used INFILE * to tell SQL*Loader to read data from the control file itself rather than from an external file.
I used a null data set. In other words, I had no data following the BEGINDATA clause in the control file.
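Put together, the control file for the null test load looked something like the following. The table and column names here are hypothetical placeholders, not the ones from my actual load:

```
OPTIONS (ROWS=1)
LOAD DATA
INFILE *
INTO TABLE test_table
FIELDS TERMINATED BY ','
(customer_id   CHAR,
 customer_name CHAR,
 balance       CHAR)
BEGINDATA
```

With no data after BEGINDATA, the run loads nothing but still writes a log file containing the one-row bind array size.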
This test load did not actually load any data (because there was none to load), but it did generate a SQL*Loader log file, and SQL*Loader records the size of the bind array in the log file. Because the bind array in this case represented only one row, I was able to determine by reading the log file that my row width was 2056 bytes. I then created the spreadsheet shown in Figure 1, which allowed me to determine the actual memory requirements given various values for the BINDSIZE and ROWS parameters. Armed with this information, I could make informed choices regarding the values to use for these parameters.
Figure 1. SQL*Loader memory requirements
Lines 9-11 in the spreadsheet shown in Figure 1 provide an estimate of the required rollback segment space for each load scenario. The rule of thumb that I used was to estimate the rollback segment requirement at 30 percent more than the size of the bind array that will be used. By planning your load and estimating the rollback usage as described in the book, you can avoid the ORA-01555 error. Due to our current configuration, I was able to determine that there would be no negative impact on rollback segments for any of the three scenarios shown in Figure 1.
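As a quick sketch, here is that rule of thumb in Python, using integer math to avoid rounding surprises. The 65,536-byte figure is SQL*Loader's default BINDSIZE, used purely as an example input:

```python
def rollback_estimate(bind_array_bytes):
    """Estimate rollback segment space as 30 percent more than
    the bind array size (the rule of thumb from the article)."""
    return bind_array_bytes * 13 // 10

print(rollback_estimate(65536))  # 85196
```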
Armed with the data from my spreadsheet, I took a subset of data from a load
that we do weekly, and using a non-indexed test table, performed the same load
three times with different ROWS and BINDSIZE values to assess the
difference in load times. In between each load, I truncated the destination
table and renamed the log file. The results are shown in Table 1. Test #1 was
my first test run, for which I used the Oracle default settings for ROWS and
BINDSIZE. These defaults were 64 and 65,536 bytes,
respectively. Table 1 shows 31 rows for test #1, because that's all the
2056-byte rows that the default bind array size will accommodate. Test #2 was
my second run, and for that I used 5000 and 1,310,720 bytes for ROWS and
BINDSIZE. Finally, in Test #3, I specified 5000
ROWS and a 5,242,880-byte
BINDSIZE. The values that you see in Table 1 for ROWS and
BINDSIZE are the values actually used by SQL*Loader after going
through the calculation described earlier.
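For reference, both parameters can be passed on the sqlldr command line (or placed in the control file's OPTIONS clause). A Test #3-style invocation would look something like the following; the userid, control file, and log file names are placeholders:

```
sqlldr userid=scott/tiger control=weekly_load.ctl log=test3.log \
       rows=5000 bindsize=5242880
```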
Table 1. Load test results

|                        | TEST #1      | TEST #2         | TEST #3         |
| BINDSIZE used          | 63,984 bytes | 1,310,640 bytes | 5,242,560 bytes |
| Elapsed time (min:sec) | 32:07.74     | 04:02.20        | 02:28.37        |
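The "BINDSIZE used" figures in Table 1 can be reproduced once you account for SQL*Loader rounding the bind array down to a whole number of rows. The per-row figure that makes the arithmetic work out is 2,064 bytes: the 2,056-byte row plus what appears to be a small amount of per-row overhead. That overhead figure is my inference from the table values, not something stated in the log file:

```python
ROW_BYTES = 2064  # 2056-byte row plus inferred per-row overhead

def bindsize_used(bindsize_specified):
    """Round the specified BINDSIZE down to a whole number of rows."""
    rows_that_fit = bindsize_specified // ROW_BYTES
    return rows_that_fit * ROW_BYTES

for spec in (65536, 1310720, 5242880):  # Tests #1, #2, #3
    print(spec, "->", bindsize_used(spec))
# 65536 -> 63984, 1310720 -> 1310640, 5242880 -> 5242560
```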
Looking at Table 1 in conjunction with the planning worksheet from Figure 1,
you can see that in all three cases the
BINDSIZE values actually used
are less than the
BINDSIZE that was specified. The situation is
similar for the
ROWS values. This is because my BINDSIZE
settings always limited the number of rows that could fit in the bind array.
The ROWS value actually used represents the number of rows that
would fit within the
BINDSIZE that I specified. Another notable item
in Table 1 is the inverse relationship between the
ROWS used for a load
and the elapsed time for that load. That is to say, the more rows that were used
in each array, the less time the whole load took. This supports the theory I
stated earlier: the more rows you load at one time, the better your
performance will be.
Notice the huge drop in elapsed time between the first and third loads. To
load 192,671 rows with the SQL*Loader default settings took over 32 minutes.
When I increased the
ROWS and BINDSIZE values to generate a
bind array size of 5,242,560 bytes, the load time dropped to two minutes and
28 seconds. When you consider that the entire table from which this sample is
taken contains around 55 million records, and that we completely reload that
table on a weekly basis, you can see that this time savings is very significant
indeed. This magnitude of time savings makes for a much more efficient process,
and therefore less work for me, the DBA.
As you can see, the logic of loading as many records at a time as your machine,
operating system, and database are capable of handling seems to be a
reasonable path to follow. One recommendation that the authors provide is to
aim for between 1,000 and 5,000 rows. Do keep in mind, however, the two factors
that I described in this article: the
BINDSIZE setting can be a
bottleneck on the
ROWS value, and the size of the bind array directly
affects the amount of rollback segment space required for a load.
Another option for speeding up SQL*Loader processes involves using the
DIRECT parameter to specify that a direct-path load be performed.
Because our loads require the use of SQL functions such as
TO_DATE() to massage the data being loaded, I am
unable to take advantage of the direct path load feature. However, I encourage
you to consider this if your circumstances will permit it.
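For the curious, applying a SQL function in a conventional-path load looks like this in the control file: the quoted SQL expression is applied to each incoming field value as it is loaded. The table and column names below are hypothetical:

```
LOAD DATA
INFILE 'orders.dat'
INTO TABLE orders
FIELDS TERMINATED BY ','
(order_id,
 order_date  "TO_DATE(:order_date, 'YYYY-MM-DD')")
```

It is exactly this use of SQL strings that rules out the direct path for our loads.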
Jonathan and Sanjay's book gave me a better understanding of the whole SQL*Loader process, which I've found to be very helpful when troubleshooting and tuning. As far as I'm concerned, the chapter with tips on tuning alone is well worth the price of the book. By applying a couple of formulas found in that chapter, I was able to reduce the run time of our daily pipeline jobs by as much as 60 percent in some cases, so Sanjay's advice actually turned out to be quite good.
Stephen Andert is a database administrator for First Health Group Corporation, and has ten years of experience working with database technologies. He was a technical reviewer for Oracle SQL*Loader: The Definitive Guide and Oracle Net8 Configuration and Troubleshooting, both from O'Reilly & Associates. He can be reached by email at firstname.lastname@example.org.
O'Reilly & Associates will soon release (April 2001) Oracle SQL*Loader: The Definitive Guide.
Sample Chapter 1, Introduction to SQL*Loader, is available free online.