Oracle 9i Release 2 Developments for PL/SQL Collections
Pages: 1, 2, 3, 4
Direct Lookup in index-by-varchar2 Table
Of course, a yet more elaborate ambition pre-Oracle 9i Database Release 2 would be- after studying the relevant computer science textbooks-to implement a B*-tree structure in PL/SQL, horror of wheel re-invention notwithstanding!
The datastructure might look like this:
type Node_t is record (
value varchar2(20),
left_child binary_integer /* refer to the array... */,
right_child binary_integer /* ...index of the... */,
parent binary_integer /* ...relevant element. */ );
type Btree_t is table of Node_t index by binary_integer;
the_tree Btree_t;
However, the implementation would be very far from trivial, and is certainly too long and complex for inclusion in this article.
A far better approach-newly possible in Oracle 9i Database Release 2-is to use
precisely the B*-tree organization of the values but to do so implicitly via a language
feature, the index-by-varchar2 table.
The index-by-varchar2 table is optimized for efficiency of lookup on a non-numeric
key, where the notion of sparseness isn't really applicable. The index-by-*_integer table
(where now *_integer can be either pls_integer or binary_integer), in contrast, is optimized for compactness of storage on the assumption that the data is dense.
This implies that there might even be cases where, even though the key is inherently
numeric, it's better to represent it as an index-by-varchar2 table via a To_Char
conversion.
You can think of the index-by-varchar2 table as the in-memory PL/SQL version of the
schema-level index organized table. Using this approach, we can produce a simple,
intuitive package for both populating and looking up the French translations of English
words. See Example 5 and the following table for an explanation of this virtually
transparent code.
| Description | |
| 3-5 | Declare a VARCHAR2-indexed collection type, and an instance of that type.
In this case, each row of the collection contains a word in French, and the
index into that row is the word in English. |
| 7-12 | The lookup function simply returns the value found in the row for that English word. |
| 15-18 | The initialization section of the package deposits the French word in to the row indicated by the English word. |
Example 5. Using the VARCHAR2-indexed collection approach.
1 CREATE OR REPLACE PACKAGE BODY vocab4
2 IS
3 TYPE word_list IS TABLE OF translations.french%TYPE
4 INDEX BY translations.english%type;
5 g_english_french word_list;
6
7 FUNCTION lookup (p_english IN VARCHAR2)
8 RETURN VARCHAR2
9 IS
10 BEGIN
11 RETURN g_english_french (p_english);
12 END lookup;
13
14 BEGIN /* package initialization */
15 FOR j IN (SELECT english, french FROM translations)
16 LOOP
17 g_english_french (j.english) := j.french;
18 END LOOP;
19* END vocab4;
You can't really get much more straightforward than that! In fact, the body of function Lookup is now so trivial that you may decide to dispense with it altogether (suppressing a mild qualm about violating some rule of best practice, since you're exposing the collection in the package specification) and use this stripped-down implementation (see vocab5.sql):
CREATE OR REPLACE PACKAGE vocab5
IS
TYPE word_list IS
TABLE OF translations.french%TYPE
INDEX BY translations.english%type;
lookup word_list;
END vocab5;
/
CREATE OR REPLACE PACKAGE BODY vocab5
IS
BEGIN /* package initialization */
FOR indx IN (
SELECT english, french
FROM translations)
LOOP
lookup (indx.english) :=
indx.french;
END LOOP;
END vocab5;
/
Note that it's not yet possible to use an index-by-varchar2 table as the target or source
of a bulk-binding construct. We can't, in other words, take advantage of the BULK
COLLECT syntax to deposit all the rows of translations directly into the lookup
collection in a single roundtrip.
And the Winner Is...
Whew. OK. So we have now a total of five different implementations we can test for optimal performance:
- Database lookup (vocab1.sql)
- Linear search using
FORloop (vocab2.sql) - Hash index (vocab3.sql)
- Associative array (
VARCHAR2 index) via a function call (vocab4.sql) - Associative array (
VARCHAR2 index) via direct access (vocab5.sql)
To compare performance, we take advantage of the DBMS_UTILITY.GET_TIME
function, which helps us calculate elapsed time down to the hundredth of a second. We've
encapsulated this function inside a "timer object," which is created in the tmr.ot script.
This object type allows us to easily declare, start, and stop multiple virtual timers inside
our PL/SQL code, as shown here (a fragment of the vocab.tst timing script):
DECLARE
v translations.french%TYPE;
db_tmr tmr_t :=
tmr_t.make ('DB lookup', 10000);
...
BEGIN
db_tmr.go;
FOR indx IN 1 .. 10000
LOOP
v := vocab1.lookup ('computer');
END LOOP;
db_tmr.STOP;
When we run the vocab.tst script for 5,000, 7,500, and 10,000 iterations of the FOR loops (as shown earlier), we get results that are typified by Table 1. Here are some
observations on these results:
Linear searches of collections should always be avoided, especially since, as of Oracle 9i Release 2, the alternatives are so easy to write. Not only is the linear search very much slower than any of the alternatives for a given number of iterations, but also (and even more to its detriment) it doesn't scale linearly as the number of iterations increases.
Note: In the test as constructed here, the time for the linear lookup scales as "sum of 1-to-N" for N iterations. It's easy to show that our reported numbers follow this pattern, either by doing some algebra or by writing a trivial program to sum 1-to- 5000, 1-to-7500, and 1-to-10000. This is left as an exercise for the reader!
Database lookups supported by an appropriate index are very efficient (10,000 executions of the query-based lookup function took less than two seconds)-they just aren't the most efficient path to take.
The associative array and hash-based lookups are all dramatically faster (an order of magnitude or higher) than the database lookup. Their improvement over linear search is far greater.
Removal of the function interface for the associative array (
VARCHAR2 index) didn't appreciably improve performance. You should avoid direct access to data structures and instead hide them behind functions to improve overall maintainability of your code.Perhaps most impressive, the elapsed time for the index-supported database lookup and both the hash-based and associative array lookups scaled linearly as the number of iterations increased by an order of magnitude. These are all very stable algorithms.
Table 1. Performance comparison of five lookup approaches.
These measurements were made on a stand-alone, high-end laptop running Windows 2000. The times are in seconds, and are the raw results of a single run of the test. Of course, they vary from run to run. Forgive us for not taking the next step and quoting our figures with appropriate precision and standard deviations!
| Method | 5000 Iterations | 7500 Iterations | 10000 Iterations |
| DB lookup | 0.80 | 1.13 | 1.52 |
| Linear search | 18.91 | 43.13 | 78.45 |
| Hash Index | 0.09 | 0.13 | 0.17 |
| Assoc Array | 0.05 | 0.09 | 0.11 |
| Assoc Array (without function) | 0.04 | 0.06 | 0.08 |
The bottom line for PL/SQL developers: By extending the flexibility of the collection syntax, storage, and access, we're able to write much simpler code that is much more efficient than implementations possible in earlier versions. It's incumbent upon all of us to become aware of these impressive new features, and then figure out how to best integrate them into existing and new application development projects.
This article was originally published in the July 2002 issue of Oracle Professional. The material in Feuerstein's articles--and those he cowrote with Bryn Llewellyn--are based on Oracle Corporation white papers originally prepared by Llewellyn for Oracle OpenWorld 2001 in San Francisco and OracleWorld Copenhagen in June 2002, and Feuerstein's book, Oracle PL/SQL Programming, 3rd Edition.
Steven Feuerstein is considered one of the world's leading experts on the Oracle PL/SQL language, having written ten books on the subject. Steven is a Senior Technology Advisor with Quest Software and has been developing software since 1980.
Bryn Llewellyn is PL/SQL Product Manager, Database and Application Server Technologies Development Group, at Oracle Corporation Headquarters.
O'Reilly & Associates recently released (September 2002) Oracle PL/SQL Programming, 3rd Edition.
Sample Chapter 10, Dates and Timestamps, is available free online.
You can also look at the Full Description of the book.
For more information, or to order the book, click here.
Return to the O'Reilly Network.



