O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Listen Print Subscribe to Databases Subscribe to Newsletters

Oracle 9i Release 2 Developments for PL/SQL Collections
Pages: 1, 2, 3, 4

The Pure SQL Approach

The vocab1 package offers the traditional, pure SQL approach to solving this problem:



CREATE OR REPLACE PACKAGE BODY vocab1
IS 
   FUNCTION lookup (
      p_english IN VARCHAR2)
      RETURN VARCHAR2
   IS 
      v_french   
         translations.french%TYPE;
   BEGIN
      SELECT french
        INTO v_french
        FROM translations
       WHERE english = p_english;

      RETURN v_french;
   END lookup;
   ...
END;

This is a wonderfully simple implementation and demonstrates the elegance of the SQL language. Each time Lookup is invoked, we make a roundtrip between PL/SQL and SQL, involving a "context switch." Oracle has been working hard to reduce the overhead of this switch, but you still pay a price. It would also be very convenient if this most straightforward of approaches is also the most efficient. We'll soon find out!

Linear Search in index-by-binary_integer Table

If we identify the context switch between PL/SQL and SQL as a potential performance problem, then we should seek ways to avoid that switch. One way to do this is to cache all the data in program memory, thereby removing the need for repeated SQL access. To do this, we must:

  1. Retrieve all the data from our static vocabulary table.

  2. Store the data in a session-persistent, PL/SQL data structure (the most appropriate in this case being the index-by table).

  3. Scan through that collection to find the French word and then return the associated English word.

This technique has been available ever since Oracle 7.3.4. The implementation is contained in the vocab2.sql script file and is shown in Example 3. Here are high-level explanations of the major areas of functionality:

Line(s) Description
3-10 Declare the index-by table type, the collection itself, and a query used to initialize the collection.
12-23 Use a FOR loop to scan through the collection linearly. As soon as a match is found, RETURN that value. For those of you who prefer a more structured approach using a WHILE loop, you'll find such an implementation in the vocab2.sql script file. It's slightly less efficient than the FOR loop.
26-35 Initialize the collection by transferring the contents of each row to the collection. This happens only the first time the lookup function (or any other program in the package) is called. Then the data persists in the collection.

Example 3. Using a linear search algorithm.

 1  CREATE OR REPLACE PACKAGE BODY vocab2
 2  IS
 3     TYPE word_list IS TABLE OF translations%ROWTYPE
 4        INDEX BY BINARY_INTEGER /* can't use pls_integer pre-9.2 */;
 5
 6     g_english_french   word_list;
 7
 8     CURSOR trans_cur
 9     IS
10        SELECT english, french FROM translations;
11
12     FUNCTION lookup (p_english IN VARCHAR2)
13        RETURN VARCHAR2
14     IS
15     BEGIN
16        FOR indx IN 1 .. g_english_french.LAST ()
17        LOOP
18           IF g_english_french (indx).english = p_english
19           THEN
20              RETURN g_english_french (indx).french;
21           END IF;
22        END LOOP;
23     END lookup;
24
25  BEGIN /* package initialization */
26     DECLARE
27        indx   PLS_INTEGER := 0;
28     BEGIN
29        FOR rec IN trans_cur
30        LOOP
31           indx := indx + 1;
32           g_english_french (indx).english := rec.english;
33           g_english_french (indx).french := rec.french;
34        END LOOP;
35     END;
36  END vocab2;

As you can see, the algorithm is trivial, but does require a fair amount of coding. In Oracle 9i Release 2, we can simplify and improve the performance of the initialization section, by the way, by using the BULK SELECT to directly populate the collection, as is shown here:

BEGIN
   OPEN trans_cur;
   FETCH trans_cur BULK COLLECT INTO g_english_french;
   CLOSE trans_cur;

This depends on the exciting new functionality in Oracle 9i that allows much wider use of RECORD binds in SQL statements than previously was possible. (Here, g_english_french is a table of RECORDs of the same shape as the translations table.) This will be the subject of our next article.

Hash-Based Lookup in index-by Table

The linear search algorithm suffers from the well-known disadvantage that on average we'll examine half the elements in the index-by table before we find a match. A possible improvement is to maintain the elements in lexical sort order and to use a binary chop algorithm: Compare the search target to the half-way element to determine which half it's in; repeat this test recursively on the relevant half. This requires much more elaborate coding-and testing-especially if you also need to write update and insert procedures that modify the database table and the cached collection. This is the sort of code that one writes as an exercise in a computer science class. One shouldn't have to go through such efforts when writing production code with a language as robust and mature as PL/SQL.

And, in fact, we don't-even prior to Oracle 9i Release 2. An alternative path is available to us by taking advantage of the Oracle hashing algorithm provided by the DBMS_UTILITY.GET_HASH_VALUE. Without going into all the details, a hash algorithm transforms a string into a number. If you use the algorithm correctly (give it enough distinct integer values to choose from), there's a very good chance (but no guarantee; see the end of this section for details on "conflict resolution") that every distinct string will convert to a distinct integer value. It's possible, therefore, to use hashing to construct our own string-based index, which can then be used to look up the French word from the English.

You'll find the hash-based implementation in the vocab3.sql script, the key elements of which are shown in Example 4. While it's beyond the scope of this article to offer a thorough explanation of the technique, the following table provides an overview of those key elements.

Line(s) Description
3-8 Declarations of variables and constants used for consistent hashing, as well as the index-by table holding the French translations. In this case, the row number of the collection will be the hashed value of the English word.
12-18 The lookup function takes the English word, hashes it to an integer, and then returns the French word found in that row. This makes more sense when you look at the initialization section.
21-26 The initialization section. In this case, for each row retrieved from the table, we hash the English word into an integer and then deposit the corresponding French word into that row in the collection.

Example 4. Key elements of the hash-based lookup technique.

 1  CREATE OR REPLACE PACKAGE BODY vocab3
 2  IS
 3     hash                   BINARY_INTEGER;
 4     g_hash_base   CONSTANT NUMBER := 1;
 5     g_hash_size   CONSTANT NUMBER := 1000000 ;
 6
 7     TYPE word_list IS TABLE OF translations.french%TYPE
 8        INDEX BY BINARY_INTEGER;
 9
10     g_english_french word_list;
11
12     FUNCTION lookup (p_english IN VARCHAR2) RETURN VARCHAR2
13     IS
14     BEGIN
15        hash := DBMS_UTILITY.get_hash_value (
16                   p_english, g_hash_base, g_hash_size);
17        RETURN g_english_french (hash);
18     END lookup;
19
20  BEGIN /* package initialization */
21     FOR rec IN (SELECT english, french FROM translations)
22     LOOP
23        hash := DBMS_UTILITY.get_hash_value (
24                   rec.english, g_hash_base, g_hash_size);
25        g_english_french (hash) := rec.french;
26     END LOOP;
27  END vocab3;

What a neat and clean algorithm! Unfortunately, it's a bit naive and definitely not suited for real-world use. The sad fact of the matter is that no one has yet come up with a hashing algorithm that can guarantee that two distinct values for the name IN parameter to get_hash_value will always produce distinct hash or integer values. For hashing to be "ready for prime time," one has to implement a "conflict resolution" algorithm (in other words, if "TABLE" and "CHAIR" both hash to the same number, you have to detect this and store the information in separate, traceable rows).

Oracle doesn't provide built-in conflict resolution. On the other hand, it's not all that difficult to implement this logic; one of the simplest techniques is called "linear probe." The altind.pkg script contains an implementation of such conflict resolution logic; you'll only need to make minimal changes to adapt it to your own circumstances.

Pages: 1, 2, 3, 4

Next Pagearrow




Tagged Articles

Be the first to post this article to del.icio.us

Sponsored Resources

  • Inside Lightroom

Related to this Article

Understanding Oracle Clinical Understanding Oracle Clinical
by Joan M. Johnson
June 2009
$9.99 USD

New Features in Oracle 9i New Features in Oracle 9i
by Howard J. Rogers
June 2009
$5.95 USD

Advertisement
O'Reilly Media

©2009, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
About O'Reilly
Academic Solutions
Authors
Contacts
Customer Service
Jobs
Newsletters
O'Reilly Labs
Press Room
Privacy Policy
RSS Feeds
Terms of Service
User Groups
Writing for O'Reilly
Content Archive
Business Technology
Computer Technology
Google
Microsoft
Mobile
Network
Operating System
Digital Photography
Programming
Software
Web
Web Design
More O'Reilly Sites
O'Reilly Radar
Ignite
Tools of Change for Publishing
Digital Media
Inside iPhone
makezine.com
craftzine.com
hackszine.com
perl.com
xml.com

Partner Sites
InsideRIA
java.net
O'Reilly Insights on Forbes.com