advertisement

Listen Print Discuss

Using the New MODEL Clause in Oracle Database 10g
Pages: 1, 2, 3, 4

Along with the cardinality error, in 9.2.0.1 there was a bug when trying to either join a pipelined function to another table or use it in the WHERE clause as an argument to the IN operator to filter multiple rows. The 9.2.0.4 patch fixed those problems but the cardinality error remained, and large tables that were joined with these table functions were being full-table-scanned. Regardless of the number of rows returned (which was usually very small), the full scans were still being performed.

For this particular problem, the MODEL clause proved to be a nice solution. By incorporating the MODEL clause, we were able to:

  1. Avoid the need for hints
  2. Remove the PL/SQL function
  3. Remove the object type
  4. Remove the array type
  5. Perform the procedural logic directly in SQL
  6. Obtain the correct cardinality when performing critical joins

Here's the MODEL version:


select empno,
       s power_score,
       list
  from (
select score,
       empno,
       lag(score) over (partition by empno order by score) ls
  from emp_score
       )
 where ls is not null
 model
   dimension by (empno)
   measures (score s, ls, 0 tmp, cast(ls||','||score as varchar2(20)) list)
   rules iterate(3) (
     -- save the current score
      tmp[any] = s[cv()],
     -- compute the new score
        s[any] = s[cv()] + ls[cv()],
     -- update the lag score 
       ls[any] = tmp[cv()], 
-- list has been initialized with the first two scores, 
     -- append the computed score
     list[any] = list[cv()]||','||s[cv()] 
   )
 order by 2 desc, 1;


     EMPNO POWER_SCORE LIST
---------- ----------- --------------------
      7698          22 2,6,8,14,22
      7844          22 2,6,8,14,22
      7934          22 2,6,8,14,22
      7654          21 3,5,8,13,21
      7499          19 2,5,7,12,19
      7788          19 2,5,7,12,19
      7566          17 1,5,6,11,17
      7369          16 2,4,6,10,16
      7782          16 2,4,6,10,16
      7876          16 2,4,6,10,16
      7521          14 1,4,5,9,14
      7839          14 1,4,5,9,14
      7900          14 1,4,5,9,14
      7902          14 1,4,5,9,14

14 rows selected.

Let's briefly examine the example above before moving on. While the inline comments let you follow the logical flow of the code, I'd like to elaborate a bit on certain areas. The meaning behind the MODEL-specific syntax is not immediately obvious but is covered in great detail in the Oracle documentation, and it makes sense once you begin using it.

First, I've used the analytic function LAG(). For those not familiar with LAG(), it allows you to access prior rows in your result set without having to use a self-join. So if the results were initially like this:


     EMPNO      SCORE 
---------- ---------- 
      7369          2 
      7369          4 

LAG lets me access scores 2 and 4 at the same time without a self-join.

You'll also notice ITERATE(3) in the RULES clause. In this case, 3 could have been any number (as long as it's a constant, not a variable or expression--hopefully this will be changed soon).

That instructs the MODEL clause to perform the code in the RULES clause three times.

Let's break down the first rule:


tmp[any] = s[cv()]
  1. tmp[] is our array, and its values default to 0 for every row; that is, tmp[7839] has a value of 0 initially.
  2. tmp[any] The ANY keyword lets you reference all empnos; that is, "for any empno in the table" (ALL might have been more intuitive).
  3. s[cv()] s[] is our array and defaults to the last score in emp_score for every empno; that is, s[7839] has a value of 4. (Only the last score is kept in s[]; the first score is kept in ls[].) cv() allows you to reference the current value of the dimension. I've used empty parentheses so the position will indicate the value, but you can be explicit: s[cv(empno)]

Let's put it all together for employee 7839:

Before we execute any rules, tmp[7839] is 0.

Through the first iteration, tmp[7839] is set to the second score, 4.

Through the second iteration, tmp[7839] is set to 5 (the second score of 4 + the prior score of 1).

Through the third iteration, tmp[7839] is set to 9 (the new score of 5 + the second score of 4).

Now that we know what is going on, let's see what AUTOTRACE says:


SQL> set autotrace traceonly
SQL> /
 

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=28 Bytes=1092)
   1    0   SORT (ORDER BY) (Cost=5 Card=28 Bytes=1092)
   2    1     SQL MODEL (ORDERED) (Cost=5 Card=28 Bytes=1092)
   3    2       VIEW (Cost=4 Card=28 Bytes=1092)
   4    3         WINDOW (SORT) (Cost=4 Card=28 Bytes=196)
   5    4           TABLE ACCESS (FULL) OF 'EMP_SCORE' (TABLE) (Cost=3 Card=28 
Bytes=196)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        738  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         14  rows processed

According to autotrace, the performance is about the same, but notice there are no recursive calls since this is just SQL and the cardinality estimates are correct as well. By using the MODEL clause, not only do we help the optimizer make better decisions, but we also get (some) flexibility of procedural programming while keeping the set-based power.

The example above demonstrates that the MODEL clause gives us the ability to:

  • Easily build lists of values
  • Perform recursion
  • Have array access to data
  • Use local variables or placeholders in SQL (if you want to think of measures that way)
  • Reduce the amount of code we write

Look at the syntax! This opens doors to new thinking when dealing with relational data. Things that were impossible or extremely inefficient to implement in SQL may now be as simple as using SELECT. I'm alluding to the possibility of performing matrix (eigenvalue) calculations or truly complex temporal functions directly in SQL. There is the potential for some great things here, and it's all in SQL.

Pages: 1, 2, 3, 4

Next Pagearrow