Using the New MODEL Clause in Oracle Database 10g
Pages: 1, 2, 3, 4
Recursive Logic and CSV
My "discovery" of what the MODEL clause can do came about while
trying to improve the performance of an existing pipelined table function. The
original requirement was to create a query to compute a "power score" for employees
and to display the score progression in a comma-separated value (CSV) list.
I'll use Scott's (the demo schema that comes with every oracle database) standard emp table along with a table called EMP_SCORE as
defined below to help me explain further:
create table emp_score (empno number(4), score number, create_date date);
insert into emp_score ( empno,score,create_date )
select empno, round(dbms_random.value(1,3)),sysdate
from emp
union all
select empno, round(dbms_random.value(4,6)),sysdate
from emp;
SQL> select * from emp_score order by 1;
EMPNO SCORE CREATE_DA
---------- ---------- ---------
7369 2 25-JUL-04
7369 4 25-JUL-04
7499 2 25-JUL-04
7499 5 25-JUL-04
7521 1 25-JUL-04
7521 4 25-JUL-04
7566 1 25-JUL-04
7566 5 25-JUL-04
7654 3 25-JUL-04
7654 5 25-JUL-04
7698 2 25-JUL-04
7698 6 25-JUL-04
7782 2 25-JUL-04
7782 4 25-JUL-04
7788 2 25-JUL-04
7788 5 25-JUL-04
7839 1 25-JUL-04
7839 4 25-JUL-04
7844 2 25-JUL-04
7844 6 25-JUL-04
7876 2 25-JUL-04
7876 4 25-JUL-04
7900 1 25-JUL-04
7900 4 25-JUL-04
7902 1 25-JUL-04
7902 4 25-JUL-04
7934 2 25-JUL-04
7934 6 25-JUL-04
28 rows selected.
The SCORE column represents the employee's scores during two evaluations.
The "power score" is computed by summing the two prior scores n times (for this example, after the initial sum, we'll just sum twice to calculate the power score). So, for example, if an employee scored 1 and 5, his power score would be 17, because 1+5=6, 6+5=11, and 11+6=17. The CSV list would display all the numbers involved in getting to the final score, which in this case is 1,5,6,11,17.
Based on the data in EMP_SCORE, the results for employee 7369 should look like
this:
EMPNO POWER_SCORE LIST
---------- ----------- --------------------
7369 16 2,4,6,10,16
Due to the recursive nature of the computation (we see Fibonacci in there),
my first attempt made use of the analytic LAG along with the WITH clause to
calculate the power score, while the CSV list was constructed in a hierarchical
fashion. The CSV was easy enough, but the power score was tough to compute efficiently
because future rows depended on rows created through past computation (rows
that didn't yet exist). After some testing using only SQL, the performance proved
to be poor and also a bit inaccurate.
I finally settled on a pipelined table function, much like the one below:
create type emp_score_obj as object ( empno number, score number, list
varchar2(20) );
/
create type emp_score_array as table of emp_score_obj;
/
create function get_emp_power_score
return emp_score_array pipelined
as
l_data emp_score_array := emp_score_array();
l_score1 number := 0;
l_score2 number := 0;
l_tmp number := 0;
l_list varchar2(20);
begin
for i in (
select emp_score_obj (empno,score,null) emp_row
from emp_score
order by empno
)
loop
/* this is the first loop iteration set l_data to the first row in the loop */
if ( l_data.count() = 0 )
then
l_data.extend();
l_data(l_data.last()) := i.emp_row;
elsif ( l_data(l_data.last()).empno = i.emp_row.empno )
then
/* this is the next score, the current empno is the same as
* the prior, compute the power score and build the csv list
*/
l_score2 := l_data(l_data.last()).score;
l_score1 := i.emp_row.score;
l_tmp := l_score1 + l_score2;
l_list := l_data(l_data.last()).score || ',' || i.emp_row.score ||
',' || l_tmp || ',';
for j in 1 .. 2
loop
l_score2 := l_score1;
l_score1 := l_tmp;
l_tmp := l_score1 + l_score2;
l_list := l_list || l_tmp || ',';
end loop;
l_data(l_data.last()).score := l_tmp;
l_data(l_data.last()).list := rtrim(l_list,',');
else
/* reached a new employee, pipe the row and reset l_data */
pipe row (l_data(l_data.last()));
l_data(l_data.last()) := i.emp_row;
end if;
end loop;
/* pipe out the last row */
pipe row (l_data(l_data.last()));
return;
end get_emp_power_score;
/
Since we were returning the rows in a pipelined (streaming) fashion, the performance
was fine initially. It was when the function was called constantly and then
joined with other tables that we ran into trouble. We can get a glimpse of the
potential problems even when using the tiny emp_score table:
SQL> set autotrace on
SQL> select * from table( get_emp_power_score() ) order by 2 desc, 1;
EMPNO SCORE LIST
---------- ---------- --------------------
7698 22 2,6,8,14,22
7844 22 2,6,8,14,22
7934 22 2,6,8,14,22
7654 21 3,5,8,13,21
7499 19 2,5,7,12,19
7788 19 2,5,7,12,19
7566 17 1,5,6,11,17
7369 16 2,4,6,10,16
7782 16 2,4,6,10,16
7876 16 2,4,6,10,16
7521 14 1,4,5,9,14
7839 14 1,4,5,9,14
7900 14 1,4,5,9,14
7902 14 1,4,5,9,14
14 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=26 Card=8168 Bytes=16336)
1 0 SORT (ORDER BY) (Cost=26 Card=8168 Bytes=16336)
2 1 COLLECTION ITERATOR (PICKLER FETCH) OF 'GET_EMP_POWER_SCORE'
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
732 bytes sent via SQL*Net to client
512 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed
For frequently executing SQL, recursive calls can be problematic, but the main problem
here is the erroneous cardinality estimate (which will vary based on your db
block size). The solution is to use the CARDINALITY hint. At the time, this
had proven to be a huge help, but this "solution" had two fundamental problems:
- It's a hint.
- It's static, so the cardinality you pick at time t1 might be wrong at time t2.



