advertisement

Listen Print Discuss

Using the New MODEL Clause in Oracle Database 10g
Pages: 1, 2, 3, 4

Recursive Logic and CSV

My "discovery" of what the MODEL clause can do came about while trying to improve the performance of an existing pipelined table function. The original requirement was to create a query to compute a "power score" for employees and to display the score progression in a comma-separated value (CSV) list.

I'll use Scott's (the demo schema that comes with every oracle database) standard emp table along with a table called EMP_SCORE as defined below to help me explain further:


create table emp_score (empno number(4), score number, create_date date); 

insert into emp_score ( empno,score,create_date )
select empno, round(dbms_random.value(1,3)),sysdate
  from emp
union all
select empno, round(dbms_random.value(4,6)),sysdate
  from emp;


SQL> select * from emp_score order by 1;

     EMPNO      SCORE CREATE_DA
---------- ---------- ---------
      7369          2 25-JUL-04
      7369          4 25-JUL-04
      7499          2 25-JUL-04
      7499          5 25-JUL-04
      7521          1 25-JUL-04
      7521          4 25-JUL-04
      7566          1 25-JUL-04
      7566          5 25-JUL-04
      7654          3 25-JUL-04
      7654          5 25-JUL-04
      7698          2 25-JUL-04
      7698          6 25-JUL-04
      7782          2 25-JUL-04
      7782          4 25-JUL-04
      7788          2 25-JUL-04
      7788          5 25-JUL-04
      7839          1 25-JUL-04
      7839          4 25-JUL-04
      7844          2 25-JUL-04
      7844          6 25-JUL-04
      7876          2 25-JUL-04
      7876          4 25-JUL-04
      7900          1 25-JUL-04
      7900          4 25-JUL-04
      7902          1 25-JUL-04
      7902          4 25-JUL-04
      7934          2 25-JUL-04
      7934          6 25-JUL-04

28 rows selected.

The SCORE column represents the employee's scores during two evaluations.

The "power score" is computed by summing the two prior scores n times (for this example, after the initial sum, we'll just sum twice to calculate the power score). So, for example, if an employee scored 1 and 5, his power score would be 17, because 1+5=6, 6+5=11, and 11+6=17. The CSV list would display all the numbers involved in getting to the final score, which in this case is 1,5,6,11,17.

Based on the data in EMP_SCORE, the results for employee 7369 should look like this:


     EMPNO POWER_SCORE LIST
---------- ----------- --------------------
      7369          16 2,4,6,10,16

Due to the recursive nature of the computation (we see Fibonacci in there), my first attempt made use of the analytic LAG along with the WITH clause to calculate the power score, while the CSV list was constructed in a hierarchical fashion. The CSV was easy enough, but the power score was tough to compute efficiently because future rows depended on rows created through past computation (rows that didn't yet exist). After some testing using only SQL, the performance proved to be poor and also a bit inaccurate.

I finally settled on a pipelined table function, much like the one below:

 
create type emp_score_obj as object ( empno number, score number, list 
varchar2(20) );
/

create type emp_score_array as table of emp_score_obj;
/

create function get_emp_power_score
return emp_score_array pipelined
as

    l_data      emp_score_array := emp_score_array();
    l_score1    number := 0;
    l_score2    number := 0;
    l_tmp       number := 0;
    l_list      varchar2(20);

begin

    for i in (
        select emp_score_obj (empno,score,null) emp_row
          from emp_score
         order by empno
    )
    loop

/* this is the first loop iteration set l_data to the first row in the loop */
        if ( l_data.count() = 0 )
        then

            l_data.extend();
            l_data(l_data.last()) := i.emp_row;

        elsif ( l_data(l_data.last()).empno = i.emp_row.empno )
        then
       /* this is the next score, the current empno is the same as
        * the prior, compute the power score and build the csv list
        */
            l_score2 := l_data(l_data.last()).score;
            l_score1 := i.emp_row.score;
            l_tmp := l_score1 + l_score2;
            l_list := l_data(l_data.last()).score || ',' || i.emp_row.score || 
',' || l_tmp || ',';
            for j in 1 .. 2
            loop
                l_score2 := l_score1;
                l_score1 := l_tmp; 
                l_tmp    := l_score1 + l_score2;
                l_list   := l_list || l_tmp || ','; 
            end loop;   
            l_data(l_data.last()).score := l_tmp;
            l_data(l_data.last()).list  := rtrim(l_list,',');     
        else
            /* reached a new employee, pipe the row and reset l_data */
            pipe row (l_data(l_data.last()));
            l_data(l_data.last()) := i.emp_row;
        end if;
   
    end loop;

    /* pipe out the last row */
    pipe row (l_data(l_data.last()));
   
    return;

end get_emp_power_score;
/

Since we were returning the rows in a pipelined (streaming) fashion, the performance was fine initially. It was when the function was called constantly and then joined with other tables that we ran into trouble. We can get a glimpse of the potential problems even when using the tiny emp_score table:


SQL> set autotrace on
SQL> select * from table( get_emp_power_score() ) order by 2 desc, 1;

     EMPNO      SCORE LIST
---------- ---------- --------------------
      7698         22 2,6,8,14,22
      7844         22 2,6,8,14,22
      7934         22 2,6,8,14,22
      7654         21 3,5,8,13,21
      7499         19 2,5,7,12,19
      7788         19 2,5,7,12,19
      7566         17 1,5,6,11,17
      7369         16 2,4,6,10,16
      7782         16 2,4,6,10,16
      7876         16 2,4,6,10,16
      7521         14 1,4,5,9,14
      7839         14 1,4,5,9,14
      7900         14 1,4,5,9,14
      7902         14 1,4,5,9,14

14 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=26 Card=8168 Bytes=16336)
   1    0   SORT (ORDER BY) (Cost=26 Card=8168 Bytes=16336)
   2    1     COLLECTION ITERATOR (PICKLER FETCH) OF 'GET_EMP_POWER_SCORE'


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        732  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         14  rows processed
 

For frequently executing SQL, recursive calls can be problematic, but the main problem here is the erroneous cardinality estimate (which will vary based on your db block size). The solution is to use the CARDINALITY hint. At the time, this had proven to be a huge help, but this "solution" had two fundamental problems:

  1. It's a hint.
  2. It's static, so the cardinality you pick at time t1 might be wrong at time t2.

Pages: 1, 2, 3, 4

Next Pagearrow