O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)
 http://www.oreillynet.com/pub/a/network/2003/01/22/feuerstein.html
 See this if you're having trouble printing code examples


Oracle PL/SQL Programming, 3rd Edition

Table Functions and Cursor Expressions

by Steven Feuerstein, coauthor of Oracle PL/SQL Programming, 3rd Edition and Bryn Llewellyn
01/22/2003

Editor's note: In their last article, Steven Feuerstein and Bryn Llewellyn covered native compilation of PL/SQL, CASE statements and CASE expressions, and bulk binding in native dynamic SQL. In this article, the pair introduces cursor expressions and then shows how those expressions can be utilized with table functions. (The material in this article is based on an Oracle Corporation white paper originally presented by Bryn Llewellyn at Oracle OpenWorld 2001 in San Francisco.)

Introduction

Cursor expressions (sometimes known as cursor subqueries) are an element of the SQL language. In pre-Oracle 9i they were supported in SQL and by certain programming environments but not by PL/SQL. Oracle 9i introduces PL/SQL support for cursor expressions. For example, a cursor expression can now be used in the SELECT statement used to open a PL/SQL cursor, and manipulated appropriately thereafter. It can also be used as an actual parameter to a PL/SQL procedure or function, which has great significance in connection with table functions.

Related Reading

Oracle PL/SQL Programming

Oracle PL/SQL Programming
By Steven Feuerstein

Table of Contents
Index
Sample Chapter

Read Online--Safari Search this book on Safari:
 

Code Fragments only

Table functions were also supported (in rudimentary form) in pre-Oracle 9i, but a number of major enhancements have been made at Oracle 9i. A table function can now be written to deliver rows in pipeline fashion as soon as they're computed, dramatically improving response time in a "first-rows" scenario. A function can now be written to accept a SELECT statement as input, allowing an indefinite number of transformations to be daisy-chained, avoiding the need for storage of intermediate results. And it can now be written so that its computation can be parallelized to leverage Oracle's parallel query mechanism.

The enabling of parallel execution of a table function means that it's now practical to implement the ETL (Extraction, Transformation, Load) phase of data warehouse applications in PL/SQL. (A pre-Oracle 9i table function caused serialization.)

Let's take a look first at cursor expressions.

Manipulating Cursor Expressions in PL/SQL

Consider the following task: List the department names, and, for each department, list the names of the employees in that department. It can be simply implemented by a classical sequential programming approach, as shown in Example 1.

Example 1. Use of nested FOR loops.

begin
   for department in (select department_id, department_name
                          from departments
                      order by department_name)
   loop
      show (department.department_name);

      for employee in (select employee_id, last_name
                           from employees
                          where department_id = department.department_id
                       order by last_name)
      loop
         show (employee.last_name);
      end loop;
   end loop;
end;

A note on the code: Whenever two authors collaborate, they determine how best to share the work and present their contributions. We have two different formatting styles: Steven prefers to capitalize key and reserved words, while lowercasing everything else; Bryn likes to present everything in lowercase. As Bryn wrote the vast majority of the code for this article, we'll use his preferred style.

The CURSOR expression, introduced in Oracle 8i but not available within PL/SQL, allows you to more concisely express the desired result set in a single query, such as:

select
  department_name,
  cursor (
     select last_name
     from employees e
     where e.department_id = d.department_id
     order by last_name
  ) the_employees
  from departments d
  order by department_name;

Now with Oracle 9i, you can take advantage of the CURSOR expression in your PL/SQL; Example 2 shows a rewrite of the original nested FOR loop using these expressions. Though this approach requires more lines of code, and is arguably less easy to proofread than the sequentially programmed implementation, it has this advantage: There's only one SQL statement, and so it can be optimized more effectively than (what the SQL engine sees as) two unconnected SQL statements. Table 1 below presents an explanation of the more interesting code lines in Example 2.

Example 2. Use of CURSOR expression to retrieve all data in a single query.

 1  declare
 2      cursor the_departments is
 3        select
 4            department_name,
 5            cursor (
 6                     select last_name
 7                       from employees e
 8                       where e.department_id = d.department_id
 9                       order by last_name
10                   )
11          from departments d
12          where department_name in ( 'executive', 'marketing' )
13          order by department_name;
14
15     v_department_name       departments.department_name%type;
16     the_employees           sys_refcursor;
17
18     type employee_last_names_t is table of employees.last_name%type
19        index by binary_integer;
20
21     v_employee_last_names   employee_last_names_t;
22  begin
23     open the_departments;
24
25     loop
26        fetch the_departments into v_department_name, the_employees;
27        exit when the_departments%notfound;
28
29        show (v_department_name);
30
31        fetch the_employees bulk collect into v_employee_last_names;
32
33        for indx in v_employee_last_names.first .. V_employee_last_names.last
34        loop
35           show (v_employee_last_names (indx));
36        end loop;
37     end loop;
38
39     close the_departments;
40     end;

Table 1. Explanation of code in Example 2.

LineDescription
5CURSOR expression inside the query means that the second element in the SELECT list for the cursor is actually a cursor or result set in its own right.
16Use of the Oracle 9i, pre-defined weak REF CURSOR type, SYS_REFCURSOR. This way, we don't have to declare our own TYPE xxx IS REF CURSOR. These types are used to declare cursor variables (the_employees is a cursor variable).
26Our fetch statement fetches into the cursor variable. This means that it now points to cursor, which in turn identifies a result set of its own.
31This cursor (the_employees) is opened implicitly by Oracle, so we can immediately perform a fetch (in this case, a bulk fetch) to extract all the employee last names into our local collection (defined on lines 18-19).
33-36Use a numeric FOR loop to scan through the bulk-collected names and process them.

Using a Cursor Expression as an Actual Parameter to a PL/SQL Function

A cursor variable (that is, a variable of type ref cursor) points to an actual cursor, and may be used as a formal parameter to a PL/SQL procedure or function. A cursor expression defines an actual cursor. (Both these statements are true pre-Oracle 9i.) So we'd expect that it would be possible to invoke a PL/SQL procedure or function that has a formal parameter of type ref cursor with a cursor expression as its actual parameter, thus:

My_Function ( cursor ( select my_col from my_tab ) )

In fact, this wasn't allowed under any circumstances pre-Oracle 9i (attempts would result in the ORA-22902 exception). New in Oracle 9i, it is now allowed under certain circumstances--namely, when the function (it can't be a procedure) is invoked in a top- level SQL statement.

Given a function that can be invoked as follows:

declare
  the_cursor  sys_refcursor;
  n number;
begin
  open the_cursor for
    select my_col from my_tab;
  n := My_function (the_cursor);
  close the_cursor;
end;

it can now be called directly within the query, like this:

select 'my_function' my_function 
  from dual
 where 
    my_function ( 
     cursor ( 
        select my_col 
            from my_tab ) ) 
    = 1;

or:

select 'my_function' my_function 
  from dual
 order by 
    my_function ( 
     cursor ( 
        select my_col 
            from my_tab ) );

Most significantly, this syntax is now allowed in the invocation of a table function in the FROM list of a SELECT statement, which we'll explore later in the article.

The "Young Managers" Scenario

Consider the requirement to find those managers in the employees table, the majority of whose direct reports were hired before the manager. The algorithm depends on finding the direct reports for each manager and comparing the number who were hired before him with the number who were hired after him. This can be programmed straightforwardly in PL/SQL using classical techniques. (Note that, seeking to use enhanced Oracle 9i functionality, this is implemented using a single SQL SELECT that has a cursor subquery for the reports of a given manager.) This approach allows the production of a report, or as is illustrated, populating a table with the results.

But suppose the requirement is more subtle: to create a VIEW to represent managers as specified, so that it can be leveraged in ad hoc queries representing the current state of the underlying data. In fact, the requirement in this scenario can be implemented in pure SQL using only SQL functions such as SUM and DECODE. There are some rules that are too complex to implement by DECODE, in which case the user could write his own function.

But this approach, though it works, feels back-to-front! Unlike the classical approach described previously, it doesn't model the simple statement of the algorithm, and is therefore hard to write and to proofread. A more comfortable approach is to define a view as follows:

create view young_managers as
  select ...
    from employees managers
   where most_reports_before_manager( 
     < stuff for this manager > ) = 1;

We can do this classically like so:

create view young_managers as
  select ...
    from employees managers
    where most_reports_before_manager
      (
        managers.employee_id, managers.hire_date
      ) = 1;

or by passing a cursor expression as the actual parameter to a function whose formal parameter is of type REF CURSOR (see Example 3), giving us something more like this:

create view young_managers as
  select ...
    from employees managers
    where most_reports_before_manager
      (
        cursor ( < select statement > ),
        managers.hire_date
      ) = 1;

Example 3. Using a function with a ref cursor parameter in a WHERE clause.

create or replace function Most_Reports_Before_Manager (
  report_hire_dates_cur  in sys_refcursor,
  manager_hire_date      in date ) 
  return number
is
  type report_hire_date_t   is table of employees.hire_date%type
                               index by binary_integer;
  report_hire_dates         report_hire_date_t;
  before integer:=0; after integer:=0;
begin
  fetch report_hire_dates_cur bulk collect into report_hire_dates;
  if report_hire_dates.count > 0
  then
    for j in report_hire_dates.first..report_hire_dates.last
    loop
      case report_hire_dates(j) < manager_hire_date
        when                                   true then before:=before+1;
        else                                             after:=after+1;
      end case;
    end loop;
  end if;
  case before > after
    when         true then return 1;
    else                   return 0;
  end case;
end Most_Reports_Before_Manager;

create or replace view young_managers as
  select managers.employee_id manager_employee_id
    from employees managers
    where Most_Reports_Before_Manager
      (
        cursor ( select reports.hire_date from employees reports
                   where reports.manager_id = managers.employee_id
               ),
        managers.hire_date
      ) = 1;

The Example 3 approach isn't possible before Oracle 9i. Its advantage over the approach of using a classical function in a WHERE clause is marginal rather than dramatic: It offers greater potential for reuse in that its logic is expressed in terms of, and depends only on, the select list for an arbitrary SELECT, whereas the classical approach hard-codes the SELECT. And, since there's only one SQL statement, this can be optimized more effectively than (what the SQL engine sees as) two unconnected SQL statements (as discussed earlier).

The dramatic benefit of the new Oracle 9i feature allowing a cursor expression as an actual parameter to a PL/SQL function comes in connection with table functions, which we'll now explore.

Table Functions: Recap

Suppose we have two schema-level types, a tuple analogous to a table row and a table of these, defined as follows:

create type lookup_row as 
  object ( idx number, text varchar2(20) );
create type lookups_tab as 
  table of lookup_row;

We can then write a PL/SQL function that returns an instance of the table as shown in Example 4. While the example doesn't reflect a "real-world" scenario, it's intended to emphasize the fact that you can model arbitrarily complex logic within the function, something that can be quite the challenge in pure SQL.

Example 4. A function that returns a collection of information.

create or replace function lookups_fn return lookups_tab
is 
   v_table   lookups_tab;
begin
   /*
  to extend a nested table, you must use the built-in 
  procedure extend, but to extend an index-by table, 
  you just specify larger subscripts. 
  */
   v_table := lookups_tab (lookup_row (1, 'one'));

   for j in 2 .. 9
   loop
      v_table.extend;

      if j = 2
      then
         v_table (j) := lookup_row (2, 'two');
      elsif j = 3
      then
         v_table (j) := lookup_row (3, 'three');
      elsif j = 4
      then
         v_table (j) := lookup_row (4, 'four');
      elsif j = 5
      then
         v_table (j) := lookup_row (5, 'five');
      elsif j = 6
      then
         v_table (j) := lookup_row (6, 'six');
      elsif j = 7
      then
         v_table (j) := lookup_row (7, 'seven');
      else
         v_table (j) := lookup_row (j, 'other');
      end if;
   end loop;

   return v_table;
end lookups_fn;

We can then invoke it in the FROM list of a SELECT statement thus:

select * 
  from table ( 
         cast ( lookups_fn() 
           as lookups_tab ) );

This allows a table to be synthesized by computation. For example, the function might call Utl_File procedures (to parse data that can't be handled by the SQL*Loader utility or by the external table feature), or it might call C routines (via the callout framework) that access arbitrary external data sources. Or it might access database tables and perform transformations that can't be expressed with pure SQL and SQL functions. The SELECT statement can be used to define a view, and/or combined with other tables in the FROM list in an arbitrarily complex SQL statement.

A table function, then, is a PL/SQL function that can be invoked in the FROM clause of a SQL SELECT clause. We'll see later that a table function that exploits new Oracle 9i functionality, which we expect all table functions to do, can only be invoked in the FROM clause of a SQL SELECT clause.

Pipelined Table Functions: New in Oracle 9i

The preceding functionality is available pre-Oracle 9i. However, it has the limitation that the function must run to completion, storing all the rows it computes in the PL/SQL table before even the first row can be delivered. (There are other limitations, discussed shortly.) Oracle 9i introduces the pipelined construct that allows the procedure to be rewritten as shown in Example 5.

Example 5. A pipelined function.

create or replace function lookups_fn return lookups_tab
  pipelined
is
  v_row lookup_row;
begin
  for j in 1..10
  loop
    v_row := 
      case j
        when 1 then lookup_row ( 1, 'one' )
        ...
        when 7 then lookup_row ( 7, 'seven' )
        else        lookup_row ( j, 'other' )
      end;
    pipe row ( v_row );
  end loop;
  return;
end lookups_fn;

Thus each row is delivered as soon as it's ready, so that the response time characteristics of a table function are symmetrical with those of a row source based on a table scan or an index scan. (For performance, the PL/SQL runtime system delivers the rows from a pipelined table function in batches.)

Note: The procedure body now mentions only rows (that is, not the table), and the table is just implied by the return type. (For elegance, the IF construct has been replaced with the new CASE formulation.) The same syntax as shown earlier can be used to select from the table function, but it can now be simplified thus:

select * from table ( Lookups_Fn );

The invocation will be written Lookups_Fn() in the following to emphasize its status as a function.

Oracle 9i also introduces the possibility to create a table function that returns a PL/SQL type (that is, one that's defined in a PL/SQL block rather than at schema level), as shown in Example 6.

Example 6. A table function that returns a PL/SQL type.

create or replace package My_Types is
  type lookup_row is record ( idx number, text varchar2(20) );
  type lookups_tab is table of lookup_row;
end My_Types;

create or replace function Lookups_Fn return My_Types.lookups_tab
  pipelined
is
  v_row My_Types.lookup_row;
begin
  for j in 1..10
  loop
    case j
      when 1 then v_row.idx := 1; v_row.text := 'one';
      ...
      when 7 then v_row.idx := 7; v_row.text := 'seven';
      else        v_row.idx := j; v_row.text := 'other';
    end case;
    pipe row ( v_row );
  end loop;
  return;
end Lookups_Fn;

In the limit, a PL/SQL type may be defined in the declare section of an anonymous block and hence have no persistence. However, to be useful in connection with table functions, the PL/SQL types must be declared in a package, and so when discussing table functions, they're usually referred to as package-level types (in contrast to schema-level types).

Note: A table function that returns a package-level type must be pipelined. Moreover, the simpler SELECT syntax (without the CAST) must be used.

Piping Data from One Table Function to the Next

Also new to Oracle 9i, a table function may now be defined with an input parameter of type ref cursor and invoked with a cursor expression as the actual parameter. Consider the code shown in Example 7.

Example 7. A transformative pipelined function.

create or replace function Mappings_Fn ( p_input_rows in sys_refcursor )
  return My_Types.lookups_tab
  pipelined
is
  v_in_row  My_Types.lookup_row;
  v_out_row My_Types.lookup_row;
begin
/*
  The following causes...
  PLS-00361: IN cursor 'P_INPUT_ROWS' cannot be OPEN'ed
  (The system opens the cursor on invoking the function.)
*/
--open p_input_rows;
  loop
    fetch p_input_rows into v_in_row;
    exit when p_input_rows%notfound;

    case v_in_row.idx
      when 1 then v_out_row.idx :=  1*2; v_out_row.text := 'was one';
      when 2 then v_out_row.idx :=  2*3; v_out_row.text := 'was TWO';
      when 3 then v_out_row.idx :=  3*4; v_out_row.text := 'was three';
      when 4 then v_out_row.idx :=  4*5; v_out_row.text := 'was FOUR';
      when 5 then v_out_row.idx :=  5*6; v_out_row.text := 'was five';
      when 6 then v_out_row.idx :=  6*7; v_out_row.text := 'was SIX';
      when 7 then v_out_row.idx :=  7*8; v_out_row.text := 'was seven';
      else        v_out_row.idx :=
                        v_in_row.idx*10; v_out_row.text := 'was other';
    end case;
    pipe row ( v_out_row );
  end loop;
  close p_input_rows;
  return;
end Mappings_Fn;

Suppose t is a table that supports a select list compatible with My_Types.lookup_row. We can now invoke the table function thus:

select * 
  from table ( 
     mappings_fn ( 
        cursor ( 
         select idx, text 
         from t ) ) );

Of course, t might have been a view defined thus:

create or replace view t 
as
  select * from table ( lookups_fn() );

which implies the more compact syntax shown in Example 8.

Example 8. Passing a query as an argument.

create or replace view v as
  select *
    from table ( Mappings_Fn ( cursor ( select * from table ( Lookups_Fn() ) ) ) );

Data can be piped from one to the next of an arbitrary number of table functions daisy- chained in succession. And, due to the pipelining feature, storage of intermediate results is avoided. Table functions can thus be used to implement the Extraction, Transformation, and Load (a.k.a. ETL) operation for building a data warehouse from OLTP data. In the limit, the extraction table function would access a foreign data source as discussed earlier.

The "Young Managers" Scenario Revisited: Table Function Approach

We can now use yet another approach! The complete solution can be implemented in a table function. This has the usability advantage of keeping all the logic in one place, and the performance advantage of invoking the function only once rather than once per row in the table (see Example 9). This was derived "mechanically" simply by creating an appropriate PL/SQL table type and by creating the block as a pipelined function to return that type, substituting pipe row ( manager_employee_id ) for insert into young_managers values ( manager_employee_id ).

Example 9. Using a table function.

create or replace package My_Types is
  type employee_ids_tab is table of employees.employee_id%type;
end My_Types;

create or replace function Young_Managers_Fn
  return My_Types.employee_ids_tab
  pipelined
is
  cursor managers is
    select
      employee_id, hire_date,
      cursor (
              select hire_date
                from employees reports
                where reports.manager_id = managers.employee_id
             )
      from employees managers;

  manager_employee_id       employees.employee_id%type;
  manager_hire_date         employees.hire_date%type;
  reports                   sys_refcursor;
  type report_hire_date_t   is table of employees.hire_date%type
                               index by binary_integer;
  report_hire_dates         report_hire_date_t;
  before integer; after integer;
begin
  open managers;
  loop
    before:=0; after:=0;
    fetch managers into manager_employee_id, manager_hire_date, reports;
    exit when managers%notfound;
    fetch reports bulk collect into report_hire_dates;
    if report_hire_dates.count > 0
    then
      for j in report_hire_dates.first..report_hire_dates.last
      loop
        case report_hire_dates(j) < manager_hire_date
          when true then before:=before+1;
          else           after:=after+1;
        end case;
      end loop;
    end if;
    if before > after then
      pipe row ( manager_employee_id ); end if;
  end loop;
  close managers;
  return;
end Young_Managers_Fn;

create or replace view young_managers as
  select column_value manager_employee_id from table ( Young_Managers_Fn() );

The function can be made more general by giving it a ref cursor input parameter and by passing in the cursor expression as the actual parameter (see Example 10). This would allow it to be "pointed at" any table that expressed a hierarchy where both parent and child have a date.

Example 10. Using a table function with a ref cursor Input parameter.

create or replace function Young_Managers_Fn ( managers in sys_refcursor )
  return My_Types.employee_ids_tab
  pipelined
is
  manager_employee_id       employees.employee_id%type;
  manager_hire_date         employees.hire_date%type;
  reports                   sys_refcursor;
  type report_hire_date_t   is table of employees.hire_date%type
                               index by binary_integer;
  report_hire_dates         report_hire_date_t;
  before integer; after integer;
begin
  loop
    before:=0; after:=0;
    fetch managers into manager_employee_id, manager_hire_date, reports;
    exit when managers%notfound;
    fetch reports bulk collect into report_hire_dates;
    if report_hire_dates.count > 0
    then
      for j in report_hire_dates.first..report_hire_dates.last
      loop
        case report_hire_dates(j) < manager_hire_date
          when                                    true then before:=before+1;
          else                                              after:=after+1;
        end case;
      end loop;
    end if;
    if before > after then
      pipe row ( manager_employee_id ); end if;
  end loop;
  close managers;
  return;
end Young_Managers_Fn;

select column_value manager_employee_id from table
 (
   Young_Managers_Fn
     (
       cursor
         (
            select
              employee_id, hire_date,
              cursor (
                        select hire_date
                          from employees reports
                          where reports.manager_id = 
                            managers.employee_id
                     )
              from employees managers
         )
     )
 );

Note: An attempt to create a view as this SELECT statement currently fails with "ORA-22902: CURSOR expression not allowed," where the exception is raised because the SELECT statement that's the argument of the CURSOR formal parameter to the table function itself has a cursor expression (a.k.a. cursor subquery). A view can be created when the SELECT statement doesn't have a cursor subquery (see the earlier Mappings_Fn example).

Fanout: Using Table Functions with Side Effects

Sometimes the specification for the transformation to be implemented as a table function explicitly excludes source data with certain characteristics. In such cases, it's useful to report on the excluded source data and often most convenient to direct the report to the database for further analysis. A table function may do DML, provided that this is done within an autonomous transaction, as shown in Example 11.

Example 11. Using autonomous transactions to allow for intended "side effects."

create or replace function lookups_fn_with_side_effect
  return my_types.lookups_tab
  pipelined
/*
  uses...
  create table exclusions ( n number );
*/
is
  pragma autonomous_transaction;
  v_row My_Types.lookup_row;
begin
  for j in 1..15
  loop
    case
      when j < 11 then
        case j
          when 1 then v_row.idx := 1; v_row.text := 'one';
          when 2 then v_row.idx := 2; v_row.text := 'TWO';
          when 3 then v_row.idx := 3; v_row.text := 'three';
          when 4 then v_row.idx := 4; v_row.text := 'FOUR';
          when 5 then v_row.idx := 5; v_row.text := 'five';
          when 6 then v_row.idx := 6; v_row.text := 'SIX';
          when 7 then v_row.idx := 7; v_row.text := 'seven';
          else        v_row.idx := j; v_row.text := 'other';
        end case;
        pipe row ( v_row );
      else
        insert into exclusions values ( j );
    end case;
  end loop;
  commit;
  return;
end Lookups_Fn_With_Side_Effect;

Parallelizing Table Function Execution: New in Oracle 9i

It's beyond the scope of this article to describe the details of Oracle's parallel query feature. Suffice it to say that when certain environment conditions are met (especially a hardware environment that supports multiple concurrently executing processes making concurrent disk accesses, and a user environment close to single-user) and when the objects referenced in a query have appropriate parallel attributes, then the elapsed time for long-running queries can be cut in direct proportion to the number of available CPUs. This is especially significant in Decision Support Systems (a.k.a. DSS) both at query time and in the Extraction, Transformation, and Load (a.k.a. ETL) operations to populate them.

Oracle 9i introduces table function features to allow their execution to be parallelized.

These features require (with one small exception, discussed shortly) that the table function has exactly one strongly typed ref cursor input parameter. Let's take a look at the different ways in which table functions can be designed for parallel execution.

Special Case: Function Behavior is Independent of Input Data Partitioning

Consider a function that processes each row from its input cursor individually. (Such a transformation, which generates two or more output rows from each input row- generically referred to as piviotting-benefits particularly from a table function implementation.) The syntax to parallelize this is straightforward and is shown in Example 12.

Example 12. Parallel execution for arbitrary partitioning of data.

create or replace function Rowwise_Xform_Fn ( 
      p_input_rows in SYS_REFCURSOR )
  return My_Types.xforms_tab
  pipelined
  parallel_enable ( partition p_input_rows by any )
is
  v_in_row  My_Types.input_row;
  v_out_row My_Types.xform_row;
begin
  loop
    fetch p_input_rows into v_in_row;
    exit when p_input_rows%notfound;
    v_out_row.n := v_in_row.n*2; 
    v_out_row.typ := 'a';
    pipe row ( v_out_row );

    v_out_row.n := v_in_row.n*3; 
    v_out_row.typ := 'b';
    pipe row ( v_out_row );
  end loop;
  close p_input_rows;
  return;
end Rowwise_Xform_Fn;

See Example 13 for the complete working example. They keyword any expresses the programmer's assertion that the results are independent of the order in which the function gets the input rows. When this keyword is used, the runtime system randomly partitions the data among the query slaves. This keyword is appropriate for use with functions that take in one row, manipulate its columns, and generate output row(s) based on the columns of this row only. (Of course, if this assertion doesn't hold, the output won't be predictable.) This is the small exception referred to earlier: The input ref cursor need not be strongly typed to be partitioned by any.

Example 13. Algorithm is independent of the ordering of the source rows.

create or replace package My_Types is
  type input_row  is record ( n number );
  type cur_t      is ref cursor return input_row;

  type xform_row  is record ( n number, typ char(1) );
  type xforms_tab is table of xform_row;
end My_Types;

create table t ( n number );
begin
  for j in 1..1000
    loop insert into t ( n ) values (  j  ); end loop;
  commit;
end;

create or replace function Rowwise_Xform_Fn ( 
    p_input_rows in My_Types.cur_t )
  return My_Types.xforms_tab
  pipelined
  parallel_enable ( partition p_input_rows by any )
is
  v_in_row  My_Types.input_row;
  v_out_row My_Types.xform_row;
begin
  loop
    fetch p_input_rows into v_in_row;
    exit when p_input_rows%notfound;
    v_out_row.n := v_in_row.n * 2; v_out_row.typ := 'a';
    pipe row ( v_out_row );
    v_out_row.n := v_in_row.n * 3; v_out_row.typ := 'b';
    pipe row ( v_out_row );
  end loop;
  close p_input_rows;
  return;
end Rowwise_Xform_Fn;

select * from table ( 
    Rowwise_Xform_Fn ( cursor ( select n from t ) ) )
  where rownum < 11;

The ability to exploit the parallel potential of a table function depends on whether the source can be parallelized.

General Case: Function Behavior Depends on Input Data Partitioning

Consider a transformation along the lines of:

select avg ( salary ), department_id 
  from employees 
 group by department_id;

where the aggregation operation to be performed on the set of salaries for a given department is arbitrarily complex such that a classical SQL implementation is impossible, slow by virtue of a function invocation for each row of the source table, or prohibitively challenging to write and debug. For example, it might be that the cost to the employer of paying a given salary depends on the hire date because of changes in benefits packages that affect only employees hired after the date of change. This is illustrated in Example 14, but to avoid obscuring it with a complicated algorithm, the aggregation is simply the sum for the salary for each distinct department. This has the general form shown in Example 15.

Example 14. Algorithm requires only that the source rows are clustered.

Note: In order to avoid having to make the algorithm distractingly complex, the following DELETE should be issued:

delete from employees where department_id is null;

before continuing thus:

create or replace package My_Types is
  type dept_sal_row   is record 
                        ( sal number(8,2), dept number(4) );
  type cur_t          is ref cursor return dept_sal_row;

  type dept_sals_tab  is table of dept_sal_row;
end My_Types;
 
create or replace function Aggregate_Xform 
  ( p_input_rows in My_Types.cur_t )
  return My_Types.dept_sals_tab
  pipelined
--[ cluster / order ] p_input_rows by (dept)
--parallel_enable
--  ( partition p_input_rows by [ hash / range] (dept) )
is
  g_in_row        My_Types.dept_sal_row;
  g_out_row       My_Types.dept_sal_row;
  g_first_time    boolean := true;
  g_last_dept     number := null;
  g_got_a_row     boolean;
  g_new_dept      boolean;
  g_current_dept  employees.department_id%type;
  g_prev_dept     employees.department_id%type;
  v_total_sal     number;

  procedure Get_Next_Row is begin
    fetch p_input_rows into g_in_row;
    g_got_a_row := not p_input_rows%notfound;
    if g_got_a_row
    then
      case g_first_time
        when true then
          g_first_time := false;
          g_new_dept   := false;
        else
          g_new_dept := g_prev_dept <> g_in_row.dept;
      end case;
      g_prev_dept := g_in_row.dept;
    end if;
    return;
  end Get_Next_Row;

  function Got_Next_Dept return boolean is begin
    g_current_dept := g_in_row.dept;
    g_new_dept := false;
    return g_got_a_row;
  end Got_Next_Dept;

  function Got_Next_Row_In_Dept return boolean is begin
    return ( not g_new_dept ) and g_got_a_row;
  end Got_Next_Row_In_Dept;

begin
  Get_Next_Row();
  while Got_Next_Dept()
  loop
    v_total_sal := 0;
    while Got_Next_Row_In_Dept()
    loop
      v_total_sal := v_total_sal + g_in_row.sal;
      Get_Next_Row();
    end loop;
    g_out_row.sal := v_total_sal; 
    g_out_row.dept := g_current_dept;
    pipe row ( g_out_row );
  end loop;
  close p_input_rows;
  return;
end Aggregate_Xform;

Example 15. General form of a function performing an aggregate transformation.

create or replace function Aggregate_Xform ( 
    p_input_rows in My_Types.cur_t )
  return My_Types.dept_sals_tab
  pipelined
is
  ...
begin
  Get_Next_Row();
  while Got_Next_Dept() /* relies on assumption that
                           all rows for given dept are 
                           delivered consecutively */
  loop
    v_total_sal := 0;
    while Got_Next_Row_In_Dept()
    loop
      v_total_sal := v_total_sal + g_in_row.sal;
      Get_Next_Row();
    end loop;
    g_out_row.sal := v_total_sal; 
    g_out_row.dept := g_current_dept;
    pipe row ( g_out_row );
  end loop;
  close p_input_rows;
  return;
end Aggregate_Xform;

Given that the input rows will be partitioned between different slaves, the integrity of the algorithm requires that all the rows for a given department go to the same slave, and that all these rows are delivered consecutively. (Strictly speaking, the requirement for consecutive delivery is negotiable, but the design of the algorithm to handle this case would need to be much more elaborate. For that reason, Oracle commits to consecutive delivery.) We use the term clustered to signify this type of delivery, and cluster key for the column (in this case, "department") on which the aggregation is done. But significantly, the algorithm does not care in what order of cluster key it receives each successive cluster, and Oracle doesn't guarantee any particular order here.

This allows the possibility of a quicker algorithm than if rows were required to be clustered and delivered in order of the cluster key. It scales as order N rather than order N.log(N), where N is the number of rows. The syntax to accomplish this is shown here:

create or replace function aggregate_xform ( 
    p_input_rows in my_types.cur_t )
  return my_types.dept_sals_tab
  pipelined
  cluster p_input_rows by (dept)
  parallel_enable
    ( partition p_input_rows by hash (dept) )
is ...

We can choose between hash (dept) and range (dept) depending on what we know about the distribution of the values. (Hash will be quicker than range and is the natural choice to be used with cluster... by.) Here, to be partitioned by a specified column, the input ref cursor must be strongly typed. Cluster... by isn't allowed without parallel_enable (partition... by).

Note: At version 9.0.1, it's necessary to include ORDER BY on the cluster key in the SELECT used to invoke the table function as follows, in order to preserve correctness of behavior, but this restriction will be removed when the order N clustering algorithm is productized:

select * from table (
  Aggregate_Xform (
     cursor (
       select salary, department_id 
         from employees
         where department_id is not null
         order by department_id   ) ) );

Order By Versus Cluster By

This alternative syntax is also allowed:

create or replace function my_fn ( 
      p_input_rows in my_types.cur_t )
  return my_types.items_tab
  pipelined
  order p_input_rows by (c1)
  parallel_enable
    ( partition p_input_rows by range (c1) )
is...

This means that those rows that are delivered to a particular slave as directed by partition... by will be locally sorted by that slave, thus parallelizing the sort. Therefore, there should be no ORDER BY in the SELECT used to invoke the table function. (To have one would subvert the attempt to parallelize the sort.) Thus it's natural to use the range option together with the order by option. This will be slower than cluster by, and so should be used only when the algorithm depends on it.

Note: The cluster... by construct can't be used together with the order... by in the declaration of a table function. This means that an algorithm that depends on clustering on one key, c1, and then on ordering within the set row for a given value of c1 by, say, c2 would have to be parallelized by using the order... by in the declaration in the table function. Here, we'd use:

create or replace function median ( 
     p_input_rows in my_types.cur_t )
  return my_types.items_tab
  pipelined
  order p_input_rows by (c1,c2)
  parallel_enable
    ( partition p_input_rows by range (c1) )
is...

The current restriction preventing using cluster... by together with order... by implies no loss of functionality, but only a missed opportunity to leverage the order N sort.

Caution: It's possible to design an algorithm for a table function that would deliver a different number of rows according to the degree of parallelism. The simplest example is a function that returns a table of NUMBER representing the count of the rows its input cursor delivered. A non-parallelized version would deliver just one row giving count(*) for the input table. A parallelized version would deliver N rows (where N is the degree of parallelism), the sum of whose values would give count(*) for the input table. However, this breaks the parallel query abstraction. Oracle recommends against programming this way.

Syntax for Table Function Based on Schema-Level Type

When a table function is written to return a schema-level type, the syntax required to invoke it is somewhat verbose. For completeness, it's illustrated here.

The Lookups_Fn and Mappings_Fn Example

(This example has been rewritten to return schema-level types)

Since the query syntax for an object table is rather verbose, we recap it here using a table.

create table t of lookup_row;
insert into t values ( lookup_row (  1, 'one'   ) );
insert into t values ( lookup_row (  2, 'TWO'   ) );
insert into t values ( lookup_row (  3, 'three' ) );
insert into t values ( lookup_row (  4, 'FOUR'  ) );
insert into t values ( lookup_row (  5, 'five'  ) );
insert into t values ( lookup_row (  6, 'SIX'   ) );
insert into t values ( lookup_row (  7, 'seven' ) );
insert into t values ( lookup_row (  8, 'other' ) );
insert into t values ( lookup_row (  9, 'other' ) );
insert into t values ( lookup_row ( 10, 'other' ) );
commit;

/* this is how an object query should be written */
select VALUE(a) rec from t a;

/* because it's verbose, it's convenient to define a view */
create or replace view v as
  select value(a) rec from t a;

/* test the view */
select * from v;

Now the example proper:

create type lookup_row as 
  object ( idx number, text varchar2(20) );

create type lookups_tab as table of lookup_row;

create or replace function Lookups_Fn
  return lookups_tab
  pipelined
is
  v_row lookup_row;
begin
  for j in 1..10
  loop
    v_row := 
      case j
        when 1 then lookup_row ( 1, 'one' )
        when 2 then lookup_row ( 2, 'TWO' )
        when 3 then lookup_row ( 3, 'three' )
        when 4 then lookup_row ( 4, 'FOUR' )
        when 5 then lookup_row ( 5, 'five' )
        when 6 then lookup_row ( 6, 'SIX' )
        when 7 then lookup_row ( 7, 'seven' )
        else        lookup_row ( j, 'other' )
      end;
    pipe row ( v_row );
  end loop;
  return;
end Lookups_Fn;

Note the syntax of the query. Since the table function returns an object, it follows from the syntax against an object table above. Again, it's convenient to encapsulate it in a view:

create or replace view lookups as
  select value(a) rec
    from table
      (
       cast ( Lookups_Fn() as lookups_tab )
      ) a;

select * from lookups; 

create or replace function Mappings_Fn 
  ( p_input_rows in sys_refcursor )
  return lookups_tab
  pipelined
is
  v_in_row  lookup_row;

  /* always initialize an object type using a 
     type constructor or user defined constructor */
  v_out_row lookup_row := lookup_row( 1, 'x' );
begin
  loop
    fetch p_input_rows into v_in_row;
    exit when p_input_rows%notfound;
    case v_in_row.idx
      when 1 then v_out_row.idx := 1*2; 
        v_out_row.text := 'was one';
      when 2 then v_out_row.idx := 2*3; 
        v_out_row.text := 'was TWO';
      when 3 then v_out_row.idx := 3*4; 
        v_out_row.text := 'was three';
      when 4 then v_out_row.idx := 4*5; 
        v_out_row.text := 'was FOUR';
      when 5 then v_out_row.idx := 5*6; 
        v_out_row.text := 'was five';
      when 6 then v_out_row.idx := 6*7; 
        v_out_row.text := 'was SIX';
      when 7 then v_out_row.idx := 7*8; 
        v_out_row.text := 'was seven';
      else        v_out_row.idx :=
                  v_in_row.idx*10; 
                  v_out_row.text := 'was other';
    end case;
    pipe row ( v_out_row );
  end loop;
  close p_input_rows;
  return;
end Mappings_Fn;

Note the syntax of the query. It's most compactly expressed using the views v or lookups defined earlier.

select value(b)
  from table
    (
      cast
        (
          Mappings_Fn
            (
              cursor ( select * from lookups )
            )
          as lookups_tab
        )
    ) b;

For completeness, here's how it looks without the view:

select value(b) from table
  (
    cast
      (
        Mappings_Fn
          (
             cursor
                ( select value(a) from table
                    (
                      cast ( Lookups_Fn() as lookups_tab )
                    ) a
                )
           )
         as lookups_tab
      )
  ) b;

We can create a VIEW mapped_lookups with this SELECT statement, and then access it without restriction from PL/SQL. For example:

declare
  cursor table_fn_cur is
    select * from mapped_lookups;
  rec lookup_row;
begin
  open table_fn_cur;
  loop
    fetch table_fn_cur into rec;
    exit when table_fn_cur%notfound;
    Print ( rec.idx, rec.text ); 
  end loop;
  close table_fn_cur;
end;

Note, however, that this simpler syntax doesn't work here:

for rec in ( select * from mapped_lookups ) loop

Business Benefits of Table Functions and Cursor Expressions

Table functions and cursor expressions expand in important new ways our ability as PL/SQL developers both to improve performance and reuse critical business logic. These features are particularly important if you rely on parallelization in your application and want to fully exploit PL/SQL-based functionality.

With Oracle 9i you can now nest collections within collections, also referred to as support for "multi-level collections." In the next article, Steven Feuerstein and Bryn Llewellyn show how to use multi-level collections and demonstrate with some examples.

This article was originally published in the May 2002 issue of Oracle Professional. The material in Feuerstein's articles--and those he cowrote with Bryn Llewellyn--are based on Oracle Corporation white papers originally prepared by Llewellyn for Oracle OpenWorld 2001 in San Francisco and OracleWorld Copenhagen in June 2002, and Feuerstein's book, Oracle PL/SQL Programming, 3rd Edition.

Steven Feuerstein is considered one of the world's leading experts on the Oracle PL/SQL language.

Bryn Llewellyn is PL/SQL Product Manager, Database and Application Server Technologies Development Group, at Oracle Corporation Headquarters.


O'Reilly & Associates recently released (September 2002) Oracle PL/SQL Programming, 3rd Edition.


Return to the O'Reilly Network.

Copyright © 2007 O'Reilly Media, Inc.