Using the New MODEL Clause in Oracle Database 10g
Pages: 1, 2, 3, 4
Along with the cardinality error, in 9.2.0.1 there was a bug when trying to
either join a pipelined function to another table or use it in the WHERE clause
as an argument to the IN operator to filter multiple rows. The 9.2.0.4 patch
fixed those problems but the cardinality error remained, and large tables that
were joined with these table functions were being full-table-scanned. Regardless of
the number of rows returned (which was usually very small), the full scans were
still being performed.
For this particular problem, the MODEL clause proved to be a nice
solution. By incorporating the MODEL clause, we were able to:
- Avoid the need for hints
- Remove the PL/SQL function
- Remove the object type
- Remove the array type
- Perform the procedural logic directly in SQL
- Obtain the correct cardinality when performing critical joins
Here's the MODEL version:
select empno,
s power_score,
list
from (
select score,
empno,
lag(score) over (partition by empno order by score) ls
from emp_score
)
where ls is not null
model
dimension by (empno)
measures (score s, ls, 0 tmp, cast(ls||','||score as varchar2(20)) list)
rules iterate(3) (
-- save the current score
tmp[any] = s[cv()],
-- compute the new score
s[any] = s[cv()] + ls[cv()],
-- update the lag score
ls[any] = tmp[cv()],
-- list has been initialized with the first two scores,
-- append the computed score
list[any] = list[cv()]||','||s[cv()]
)
order by 2 desc, 1;
EMPNO POWER_SCORE LIST
---------- ----------- --------------------
7698 22 2,6,8,14,22
7844 22 2,6,8,14,22
7934 22 2,6,8,14,22
7654 21 3,5,8,13,21
7499 19 2,5,7,12,19
7788 19 2,5,7,12,19
7566 17 1,5,6,11,17
7369 16 2,4,6,10,16
7782 16 2,4,6,10,16
7876 16 2,4,6,10,16
7521 14 1,4,5,9,14
7839 14 1,4,5,9,14
7900 14 1,4,5,9,14
7902 14 1,4,5,9,14
14 rows selected.
Let's briefly examine the example above before moving on. While the inline
comments let you follow the logical flow of the code, I'd like to elaborate
a bit on certain areas. The meaning behind the MODEL-specific syntax is not immediately obvious but is covered in great detail in
the Oracle documentation, and it makes sense once you begin using it.
First, I've used the analytic function LAG(). For those not familiar with LAG(),
it allows you to access prior rows in your result set without having to use
a self-join. So if the results were initially like this:
EMPNO SCORE
---------- ----------
7369 2
7369 4
LAG lets me access scores 2 and 4 at the same time without a self-join.
You'll also notice ITERATE(3) in the RULES clause. In this case,
3 could have been any number (as long as it's a constant, not a variable or
expression--hopefully this will be changed soon).
That instructs the MODEL clause to perform the code in the RULES
clause three times.
Let's break down the first rule:
tmp[any] = s[cv()]
tmp[]is our array, and its values default to 0 for every row; that is,tmp[7839]has a value of 0 initially.tmp[any]The ANY keyword lets you reference all empnos; that is, "for any empno in the table" (ALL might have been more intuitive).s[cv()]s[]is our array and defaults to the last score inemp_scorefor everyempno; that is,s[7839]has a value of 4. (Only the last score is kept ins[]; the first score is kept inls[].)cv()allows you to reference the current value of the dimension. I've used empty parentheses so the position will indicate the value, but you can be explicit:s[cv(empno)]
Let's put it all together for employee 7839:
Before we execute any rules, tmp[7839] is 0.
Through the first iteration, tmp[7839] is set to the second score, 4.
Through the second iteration, tmp[7839] is set to 5 (the second score of 4 + the prior
score of 1).
Through the third iteration, tmp[7839] is set to 9 (the new score of 5 + the second score of
4).
Now that we know what is going on, let's see what AUTOTRACE says:
SQL> set autotrace traceonly
SQL> /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=28 Bytes=1092)
1 0 SORT (ORDER BY) (Cost=5 Card=28 Bytes=1092)
2 1 SQL MODEL (ORDERED) (Cost=5 Card=28 Bytes=1092)
3 2 VIEW (Cost=4 Card=28 Bytes=1092)
4 3 WINDOW (SORT) (Cost=4 Card=28 Bytes=196)
5 4 TABLE ACCESS (FULL) OF 'EMP_SCORE' (TABLE) (Cost=3 Card=28
Bytes=196)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
738 bytes sent via SQL*Net to client
512 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed
According to autotrace, the performance is about the same, but notice there
are no recursive calls since this is just SQL and the cardinality estimates
are correct as well. By using the MODEL clause, not only do we help
the optimizer make better decisions, but we also get (some) flexibility of procedural
programming while keeping the set-based power.
The example above demonstrates that the MODEL clause gives us the ability to:
- Easily build lists of values
- Perform recursion
- Have array access to data
- Use local variables or placeholders in SQL (if you want to think of measures that way)
- Reduce the amount of code we write
Look at the syntax! This opens doors to new thinking when dealing with relational data. Things that were impossible or extremely inefficient to implement in SQL may now be as simple as using SELECT. I'm alluding to the possibility of performing matrix (eigenvalue) calculations or truly complex temporal functions directly in SQL. There is the potential for some great things here, and it's all in SQL.



