Oracle 9i Release 2 Developments for PL/SQL Collections
Pages: 1, 2, 3, 4
The Pure SQL Approach
The vocab1 package offers the traditional, pure SQL approach to solving this problem:
CREATE OR REPLACE PACKAGE BODY vocab1
IS
FUNCTION lookup (
p_english IN VARCHAR2)
RETURN VARCHAR2
IS
v_french
translations.french%TYPE;
BEGIN
SELECT french
INTO v_french
FROM translations
WHERE english = p_english;
RETURN v_french;
END lookup;
...
END;
This is a wonderfully simple implementation and demonstrates the elegance of the SQL language. Each time Lookup is invoked, we make a roundtrip between PL/SQL and SQL, involving a "context switch." Oracle has been working hard to reduce the overhead of this switch, but you still pay a price. It would also be very convenient if this most straightforward of approaches is also the most efficient. We'll soon find out!
Linear Search in index-by-binary_integer Table
If we identify the context switch between PL/SQL and SQL as a potential performance problem, then we should seek ways to avoid that switch. One way to do this is to cache all the data in program memory, thereby removing the need for repeated SQL access. To do this, we must:
Retrieve all the data from our static vocabulary table.
Store the data in a session-persistent, PL/SQL data structure (the most appropriate in this case being the index-by table).
Scan through that collection to find the French word and then return the associated English word.
This technique has been available ever since Oracle 7.3.4. The implementation is contained in the vocab2.sql script file and is shown in Example 3. Here are high-level explanations of the major areas of functionality:
| Description | |
| 3-10 | Declare the index-by table type, the collection itself, and a query used to initialize the collection. |
| 12-23 | Use a FOR loop to scan through the collection linearly. As soon as a match
is found, RETURN that value. For those of you who prefer a more structured
approach using a WHILE loop, you'll find such an implementation in the
vocab2.sql script file. It's slightly less efficient than the FOR loop. |
| 26-35 | Initialize the collection by transferring the contents of each row to the collection. This happens only the first time the lookup function (or any other program in the package) is called. Then the data persists in the collection. |
Example 3. Using a linear search algorithm.
1 CREATE OR REPLACE PACKAGE BODY vocab2
2 IS
3 TYPE word_list IS TABLE OF translations%ROWTYPE
4 INDEX BY BINARY_INTEGER /* can't use pls_integer pre-9.2 */;
5
6 g_english_french word_list;
7
8 CURSOR trans_cur
9 IS
10 SELECT english, french FROM translations;
11
12 FUNCTION lookup (p_english IN VARCHAR2)
13 RETURN VARCHAR2
14 IS
15 BEGIN
16 FOR indx IN 1 .. g_english_french.LAST ()
17 LOOP
18 IF g_english_french (indx).english = p_english
19 THEN
20 RETURN g_english_french (indx).french;
21 END IF;
22 END LOOP;
23 END lookup;
24
25 BEGIN /* package initialization */
26 DECLARE
27 indx PLS_INTEGER := 0;
28 BEGIN
29 FOR rec IN trans_cur
30 LOOP
31 indx := indx + 1;
32 g_english_french (indx).english := rec.english;
33 g_english_french (indx).french := rec.french;
34 END LOOP;
35 END;
36 END vocab2;
As you can see, the algorithm is trivial, but does require a fair amount of coding. In
Oracle 9i Release 2, we can simplify and improve the performance of the initialization
section, by the way, by using the BULK SELECT to directly populate the collection, as is
shown here:
BEGIN
OPEN trans_cur;
FETCH trans_cur BULK COLLECT INTO g_english_french;
CLOSE trans_cur;
This depends on the exciting new functionality in Oracle 9i that allows much wider
use of RECORD binds in SQL statements than previously was possible. (Here,
g_english_french is a table of RECORDs of the same shape as the translations table.) This will be the subject of our next article.
Hash-Based Lookup in index-by Table
The linear search algorithm suffers from the well-known disadvantage that on average we'll examine half the elements in the index-by table before we find a match. A possible improvement is to maintain the elements in lexical sort order and to use a binary chop algorithm: Compare the search target to the half-way element to determine which half it's in; repeat this test recursively on the relevant half. This requires much more elaborate coding-and testing-especially if you also need to write update and insert procedures that modify the database table and the cached collection. This is the sort of code that one writes as an exercise in a computer science class. One shouldn't have to go through such efforts when writing production code with a language as robust and mature as PL/SQL.
And, in fact, we don't-even prior to Oracle 9i Release 2. An alternative path is
available to us by taking advantage of the Oracle hashing algorithm provided by the
DBMS_UTILITY.GET_HASH_VALUE. Without going into all the details, a hash
algorithm transforms a string into a number. If you use the algorithm correctly (give it
enough distinct integer values to choose from), there's a very good chance (but no
guarantee; see the end of this section for details on "conflict resolution") that every
distinct string will convert to a distinct integer value. It's possible, therefore, to use
hashing to construct our own string-based index, which can then be used to look up the
French word from the English.
You'll find the hash-based implementation in the vocab3.sql script, the key elements of which are shown in Example 4. While it's beyond the scope of this article to offer a thorough explanation of the technique, the following table provides an overview of those key elements.
| Description | |
| 3-8 | Declarations of variables and constants used for consistent hashing, as well as the index-by table holding the French translations. In this case, the row number of the collection will be the hashed value of the English word. |
| 12-18 | The lookup function takes the English word, hashes it to an integer, and then returns the French word found in that row. This makes more sense when you look at the initialization section. |
| 21-26 | The initialization section. In this case, for each row retrieved from the table, we hash the English word into an integer and then deposit the corresponding French word into that row in the collection. |
Example 4. Key elements of the hash-based lookup technique.
1 CREATE OR REPLACE PACKAGE BODY vocab3
2 IS
3 hash BINARY_INTEGER;
4 g_hash_base CONSTANT NUMBER := 1;
5 g_hash_size CONSTANT NUMBER := 1000000 ;
6
7 TYPE word_list IS TABLE OF translations.french%TYPE
8 INDEX BY BINARY_INTEGER;
9
10 g_english_french word_list;
11
12 FUNCTION lookup (p_english IN VARCHAR2) RETURN VARCHAR2
13 IS
14 BEGIN
15 hash := DBMS_UTILITY.get_hash_value (
16 p_english, g_hash_base, g_hash_size);
17 RETURN g_english_french (hash);
18 END lookup;
19
20 BEGIN /* package initialization */
21 FOR rec IN (SELECT english, french FROM translations)
22 LOOP
23 hash := DBMS_UTILITY.get_hash_value (
24 rec.english, g_hash_base, g_hash_size);
25 g_english_french (hash) := rec.french;
26 END LOOP;
27 END vocab3;
What a neat and clean algorithm! Unfortunately, it's a bit naive and definitely not
suited for real-world use. The sad fact of the matter is that no one has yet come up with a
hashing algorithm that can guarantee that two distinct values for the name IN parameter
to get_hash_value will always produce distinct hash or integer values. For hashing to be
"ready for prime time," one has to implement a "conflict resolution" algorithm (in other
words, if "TABLE" and "CHAIR" both hash to the same number, you have to detect this and store the information in separate, traceable rows).
Oracle doesn't provide built-in conflict resolution. On the other hand, it's not all that difficult to implement this logic; one of the simplest techniques is called "linear probe." The altind.pkg script contains an implementation of such conflict resolution logic; you'll only need to make minimal changes to adapt it to your own circumstances.



