|
Table Functions and Cursor Expressionsby Steven Feuerstein, coauthor of Oracle PL/SQL Programming, 3rd Edition and Bryn Llewellyn01/22/2003 |
Editor's note: In their last article, Steven Feuerstein and Bryn Llewellyn covered native compilation of PL/SQL, CASE statements and CASE expressions, and bulk binding in native dynamic SQL. In this article, the pair introduces cursor expressions and then shows how those expressions can be utilized with table functions. (The material in this article is based on an Oracle Corporation white paper originally presented by Bryn Llewellyn at Oracle OpenWorld 2001 in San Francisco.)
Cursor expressions (sometimes known as cursor subqueries) are an element of the SQL
language. In pre-Oracle 9i they were supported in SQL and by certain programming
environments but not by PL/SQL. Oracle 9i introduces PL/SQL support for cursor
expressions. For example, a cursor expression can now be used in the SELECT statement used to open a PL/SQL cursor, and manipulated appropriately thereafter. It can also be
used as an actual parameter to a PL/SQL procedure or function, which has great
significance in connection with table functions.
|
Related Reading
|
Table functions were also supported (in rudimentary form) in pre-Oracle 9i, but a
number of major enhancements have been made at Oracle 9i. A table function can now be
written to deliver rows in pipeline fashion as soon as they're computed, dramatically
improving response time in a "first-rows" scenario. A function can now be written to
accept a SELECT statement as input, allowing an indefinite number of transformations to be daisy-chained, avoiding the need for storage of intermediate results. And it can now be
written so that its computation can be parallelized to leverage Oracle's parallel query
mechanism.
The enabling of parallel execution of a table function means that it's now practical to implement the ETL (Extraction, Transformation, Load) phase of data warehouse applications in PL/SQL. (A pre-Oracle 9i table function caused serialization.)
Let's take a look first at cursor expressions.
Consider the following task: List the department names, and, for each department, list the names of the employees in that department. It can be simply implemented by a classical sequential programming approach, as shown in Example 1.
Example 1. Use of nested FOR loops.
begin
for department in (select department_id, department_name
from departments
order by department_name)
loop
show (department.department_name);
for employee in (select employee_id, last_name
from employees
where department_id = department.department_id
order by last_name)
loop
show (employee.last_name);
end loop;
end loop;
end;
A note on the code: Whenever two authors collaborate, they determine how best to share the work and present their contributions. We have two different formatting styles: Steven prefers to capitalize key and reserved words, while lowercasing everything else; Bryn likes to present everything in lowercase. As Bryn wrote the vast majority of the code for this article, we'll use his preferred style.
The CURSOR expression, introduced in Oracle 8i but not available within PL/SQL, allows you to more concisely express the desired result set in a single query, such as:
select
department_name,
cursor (
select last_name
from employees e
where e.department_id = d.department_id
order by last_name
) the_employees
from departments d
order by department_name;
Now with Oracle 9i, you can take advantage of the CURSOR expression in your PL/SQL; Example 2 shows a rewrite of the original nested FOR loop using these
expressions. Though this approach requires more lines of code, and is arguably less easy
to proofread than the sequentially programmed implementation, it has this advantage:
There's only one SQL statement, and so it can be optimized more effectively than (what
the SQL engine sees as) two unconnected SQL statements. Table 1 below presents an
explanation of the more interesting code lines in Example 2.
Example 2. Use of CURSOR expression to retrieve all data in a single query.
1 declare
2 cursor the_departments is
3 select
4 department_name,
5 cursor (
6 select last_name
7 from employees e
8 where e.department_id = d.department_id
9 order by last_name
10 )
11 from departments d
12 where department_name in ( 'executive', 'marketing' )
13 order by department_name;
14
15 v_department_name departments.department_name%type;
16 the_employees sys_refcursor;
17
18 type employee_last_names_t is table of employees.last_name%type
19 index by binary_integer;
20
21 v_employee_last_names employee_last_names_t;
22 begin
23 open the_departments;
24
25 loop
26 fetch the_departments into v_department_name, the_employees;
27 exit when the_departments%notfound;
28
29 show (v_department_name);
30
31 fetch the_employees bulk collect into v_employee_last_names;
32
33 for indx in v_employee_last_names.first .. V_employee_last_names.last
34 loop
35 show (v_employee_last_names (indx));
36 end loop;
37 end loop;
38
39 close the_departments;
40 end;
Table 1. Explanation of code in Example 2.
| Line | Description |
| 5 | CURSOR expression inside the query means that the second element in the
SELECT list for the cursor is actually a cursor or result set in its own right. |
| 16 | Use of the Oracle 9i, pre-defined weak REF CURSOR type, SYS_REFCURSOR. This way, we don't have to declare our own TYPE xxx IS REF CURSOR. These types are used to declare cursor variables (the_employees is a cursor variable). |
| 26 | Our fetch statement fetches into the cursor variable. This means that it now points to cursor, which in turn identifies a result set of its own. |
| 31 | This cursor (the_employees) is opened implicitly by Oracle, so we can
immediately perform a fetch (in this case, a bulk fetch) to extract all the
employee last names into our local collection (defined on lines 18-19). |
Use a numeric FOR loop to scan through the bulk-collected names and process them. |
|
A cursor variable (that is, a variable of type ref cursor) points to an actual cursor, and may be used as a formal parameter to a PL/SQL procedure or function. A cursor expression
defines an actual cursor. (Both these statements are true pre-Oracle 9i.) So we'd expect that
it would be possible to invoke a PL/SQL procedure or function that has a formal
parameter of type ref cursor with a cursor expression as its actual parameter, thus:
My_Function ( cursor ( select my_col from my_tab ) )
In fact, this wasn't allowed under any circumstances pre-Oracle 9i (attempts would result in the ORA-22902 exception). New in Oracle 9i, it is now allowed under certain circumstances--namely, when the function (it can't be a procedure) is invoked in a top- level SQL statement.
Given a function that can be invoked as follows:
declare
the_cursor sys_refcursor;
n number;
begin
open the_cursor for
select my_col from my_tab;
n := My_function (the_cursor);
close the_cursor;
end;
it can now be called directly within the query, like this:
select 'my_function' my_function
from dual
where
my_function (
cursor (
select my_col
from my_tab ) )
= 1;
or:
select 'my_function' my_function
from dual
order by
my_function (
cursor (
select my_col
from my_tab ) );
Most significantly, this syntax is now allowed in the invocation of a table function in
the FROM list of a SELECT statement, which we'll explore later in the article.
Consider the requirement to find those managers in the employees table, the majority of
whose direct reports were hired before the manager. The algorithm depends on finding the
direct reports for each manager and comparing the number who were hired before him
with the number who were hired after him. This can be programmed straightforwardly in
PL/SQL using classical techniques. (Note that, seeking to use enhanced Oracle 9i functionality, this is implemented using a single SQL SELECT that has
a cursor subquery for the reports of a given manager.) This approach allows the
production of a report, or as is illustrated, populating a table with the results.
But suppose the requirement is more subtle: to create a VIEW to represent managers as specified, so that it can be leveraged in ad hoc queries representing the current state of the underlying data. In fact, the requirement in this scenario can be implemented in pure SQL using only SQL functions such as SUM and DECODE. There are some rules that are too complex to implement by DECODE, in which case the user could write his own function.
But this approach, though it works, feels back-to-front! Unlike the classical approach described previously, it doesn't model the simple statement of the algorithm, and is therefore hard to write and to proofread. A more comfortable approach is to define a view as follows:
create view young_managers as
select ...
from employees managers
where most_reports_before_manager(
< stuff for this manager > ) = 1;
We can do this classically like so:
create view young_managers as
select ...
from employees managers
where most_reports_before_manager
(
managers.employee_id, managers.hire_date
) = 1;
or by passing a cursor expression as the actual parameter to a function whose formal
parameter is of type REF CURSOR (see Example 3), giving us something more like this:
create view young_managers as
select ...
from employees managers
where most_reports_before_manager
(
cursor ( < select statement > ),
managers.hire_date
) = 1;
Example 3. Using a function with a ref cursor parameter in a WHERE clause.
create or replace function Most_Reports_Before_Manager (
report_hire_dates_cur in sys_refcursor,
manager_hire_date in date )
return number
is
type report_hire_date_t is table of employees.hire_date%type
index by binary_integer;
report_hire_dates report_hire_date_t;
before integer:=0; after integer:=0;
begin
fetch report_hire_dates_cur bulk collect into report_hire_dates;
if report_hire_dates.count > 0
then
for j in report_hire_dates.first..report_hire_dates.last
loop
case report_hire_dates(j) < manager_hire_date
when true then before:=before+1;
else after:=after+1;
end case;
end loop;
end if;
case before > after
when true then return 1;
else return 0;
end case;
end Most_Reports_Before_Manager;
create or replace view young_managers as
select managers.employee_id manager_employee_id
from employees managers
where Most_Reports_Before_Manager
(
cursor ( select reports.hire_date from employees reports
where reports.manager_id = managers.employee_id
),
managers.hire_date
) = 1;
The Example 3 approach isn't possible before Oracle 9i. Its advantage over the approach of using a classical function in a WHERE clause is marginal
rather than dramatic: It offers greater potential for reuse in that its logic is expressed in
terms of, and depends only on, the select list for an arbitrary SELECT, whereas the
classical approach hard-codes the SELECT. And, since there's only one SQL statement,
this can be optimized more effectively than (what the SQL engine sees as) two
unconnected SQL statements (as discussed earlier).
The dramatic benefit of the new Oracle 9i feature allowing a cursor expression as an actual parameter to a PL/SQL function comes in connection with table functions, which we'll now explore.
Suppose we have two schema-level types, a tuple analogous to a table row and a table of these, defined as follows:
create type lookup_row as
object ( idx number, text varchar2(20) );
create type lookups_tab as
table of lookup_row;
We can then write a PL/SQL function that returns an instance of the table as shown in Example 4. While the example doesn't reflect a "real-world" scenario, it's intended to emphasize the fact that you can model arbitrarily complex logic within the function, something that can be quite the challenge in pure SQL.
Example 4. A function that returns a collection of information.
create or replace function lookups_fn return lookups_tab
is
v_table lookups_tab;
begin
/*
to extend a nested table, you must use the built-in
procedure extend, but to extend an index-by table,
you just specify larger subscripts.
*/
v_table := lookups_tab (lookup_row (1, 'one'));
for j in 2 .. 9
loop
v_table.extend;
if j = 2
then
v_table (j) := lookup_row (2, 'two');
elsif j = 3
then
v_table (j) := lookup_row (3, 'three');
elsif j = 4
then
v_table (j) := lookup_row (4, 'four');
elsif j = 5
then
v_table (j) := lookup_row (5, 'five');
elsif j = 6
then
v_table (j) := lookup_row (6, 'six');
elsif j = 7
then
v_table (j) := lookup_row (7, 'seven');
else
v_table (j) := lookup_row (j, 'other');
end if;
end loop;
return v_table;
end lookups_fn;
We can then invoke it in the FROM list of a SELECT statement thus:
select *
from table (
cast ( lookups_fn()
as lookups_tab ) );
This allows a table to be synthesized by computation. For example, the function might
call Utl_File procedures (to parse data that can't be handled by the SQL*Loader utility or
by the external table feature), or it might call C routines (via the callout framework) that
access arbitrary external data sources. Or it might access database tables and perform
transformations that can't be expressed with pure SQL and SQL functions. The SELECT
statement can be used to define a view, and/or combined with other tables in the FROM
list in an arbitrarily complex SQL statement.
A table function, then, is a PL/SQL function that can be invoked in the FROM clause
of a SQL SELECT clause. We'll see later that a table function that exploits new Oracle 9i
functionality, which we expect all table functions to do, can only be invoked in the FROM
clause of a SQL SELECT clause.
|
The preceding functionality is available pre-Oracle 9i. However, it has the limitation that the function must run to completion, storing all the rows it computes in the PL/SQL table before even the first row can be delivered. (There are other limitations, discussed shortly.) Oracle 9i introduces the pipelined construct that allows the procedure to be rewritten as shown in Example 5.
Example 5. A pipelined function.
create or replace function lookups_fn return lookups_tab
pipelined
is
v_row lookup_row;
begin
for j in 1..10
loop
v_row :=
case j
when 1 then lookup_row ( 1, 'one' )
...
when 7 then lookup_row ( 7, 'seven' )
else lookup_row ( j, 'other' )
end;
pipe row ( v_row );
end loop;
return;
end lookups_fn;
Thus each row is delivered as soon as it's ready, so that the response time characteristics of a table function are symmetrical with those of a row source based on a table scan or an index scan. (For performance, the PL/SQL runtime system delivers the rows from a pipelined table function in batches.)
Note: The procedure body now mentions only rows (that is, not the table), and the
table is just implied by the return type. (For elegance, the IF construct has been replaced
with the new CASE formulation.) The same syntax as shown earlier can be used to select
from the table function, but it can now be simplified thus:
select * from table ( Lookups_Fn );
The invocation will be written Lookups_Fn() in the following to emphasize its status
as a function.
Oracle 9i also introduces the possibility to create a table function that returns a PL/SQL type (that is, one that's defined in a PL/SQL block rather than at schema level), as shown in Example 6.
Example 6. A table function that returns a PL/SQL type.
create or replace package My_Types is
type lookup_row is record ( idx number, text varchar2(20) );
type lookups_tab is table of lookup_row;
end My_Types;
create or replace function Lookups_Fn return My_Types.lookups_tab
pipelined
is
v_row My_Types.lookup_row;
begin
for j in 1..10
loop
case j
when 1 then v_row.idx := 1; v_row.text := 'one';
...
when 7 then v_row.idx := 7; v_row.text := 'seven';
else v_row.idx := j; v_row.text := 'other';
end case;
pipe row ( v_row );
end loop;
return;
end Lookups_Fn;
In the limit, a PL/SQL type may be defined in the declare section of an anonymous block and hence have no persistence. However, to be useful in connection with table functions, the PL/SQL types must be declared in a package, and so when discussing table functions, they're usually referred to as package-level types (in contrast to schema-level types).
Note: A table function that returns a package-level type must be pipelined. Moreover,
the simpler SELECT syntax (without the CAST) must be used.
Also new to Oracle 9i, a table function may now be defined with an input parameter of type ref cursor and invoked with a cursor expression as the actual parameter. Consider the code shown in Example 7.
Example 7. A transformative pipelined function.
create or replace function Mappings_Fn ( p_input_rows in sys_refcursor )
return My_Types.lookups_tab
pipelined
is
v_in_row My_Types.lookup_row;
v_out_row My_Types.lookup_row;
begin
/*
The following causes...
PLS-00361: IN cursor 'P_INPUT_ROWS' cannot be OPEN'ed
(The system opens the cursor on invoking the function.)
*/
--open p_input_rows;
loop
fetch p_input_rows into v_in_row;
exit when p_input_rows%notfound;
case v_in_row.idx
when 1 then v_out_row.idx := 1*2; v_out_row.text := 'was one';
when 2 then v_out_row.idx := 2*3; v_out_row.text := 'was TWO';
when 3 then v_out_row.idx := 3*4; v_out_row.text := 'was three';
when 4 then v_out_row.idx := 4*5; v_out_row.text := 'was FOUR';
when 5 then v_out_row.idx := 5*6; v_out_row.text := 'was five';
when 6 then v_out_row.idx := 6*7; v_out_row.text := 'was SIX';
when 7 then v_out_row.idx := 7*8; v_out_row.text := 'was seven';
else v_out_row.idx :=
v_in_row.idx*10; v_out_row.text := 'was other';
end case;
pipe row ( v_out_row );
end loop;
close p_input_rows;
return;
end Mappings_Fn;
Suppose t is a table that supports a select list compatible with My_Types.lookup_row. We can now invoke the table function thus:
select *
from table (
mappings_fn (
cursor (
select idx, text
from t ) ) );
Of course, t might have been a view defined thus:
create or replace view t
as
select * from table ( lookups_fn() );
which implies the more compact syntax shown in Example 8.
Example 8. Passing a query as an argument.
create or replace view v as
select *
from table ( Mappings_Fn ( cursor ( select * from table ( Lookups_Fn() ) ) ) );
Data can be piped from one to the next of an arbitrary number of table functions daisy- chained in succession. And, due to the pipelining feature, storage of intermediate results is avoided. Table functions can thus be used to implement the Extraction, Transformation, and Load (a.k.a. ETL) operation for building a data warehouse from OLTP data. In the limit, the extraction table function would access a foreign data source as discussed earlier.
We can now use yet another approach! The complete solution can be implemented in a
table function. This has the usability advantage of keeping all the logic in one place, and
the performance advantage of invoking the function only once rather than once per row in
the table (see Example 9). This was derived "mechanically" simply by creating an appropriate PL/SQL table type and by creating the block as a pipelined function to return that type, substituting pipe row ( manager_employee_id ) for insert into young_managers values ( manager_employee_id ).
Example 9. Using a table function.
create or replace package My_Types is
type employee_ids_tab is table of employees.employee_id%type;
end My_Types;
create or replace function Young_Managers_Fn
return My_Types.employee_ids_tab
pipelined
is
cursor managers is
select
employee_id, hire_date,
cursor (
select hire_date
from employees reports
where reports.manager_id = managers.employee_id
)
from employees managers;
manager_employee_id employees.employee_id%type;
manager_hire_date employees.hire_date%type;
reports sys_refcursor;
type report_hire_date_t is table of employees.hire_date%type
index by binary_integer;
report_hire_dates report_hire_date_t;
before integer; after integer;
begin
open managers;
loop
before:=0; after:=0;
fetch managers into manager_employee_id, manager_hire_date, reports;
exit when managers%notfound;
fetch reports bulk collect into report_hire_dates;
if report_hire_dates.count > 0
then
for j in report_hire_dates.first..report_hire_dates.last
loop
case report_hire_dates(j) < manager_hire_date
when true then before:=before+1;
else after:=after+1;
end case;
end loop;
end if;
if before > after then
pipe row ( manager_employee_id ); end if;
end loop;
close managers;
return;
end Young_Managers_Fn;
create or replace view young_managers as
select column_value manager_employee_id from table ( Young_Managers_Fn() );
The function can be made more general by giving it a ref cursor input parameter and
by passing in the cursor expression as the actual parameter (see Example 10). This would
allow it to be "pointed at" any table that expressed a hierarchy where both parent and child
have a date.
Example 10. Using a table function with a ref cursor Input parameter.
create or replace function Young_Managers_Fn ( managers in sys_refcursor )
return My_Types.employee_ids_tab
pipelined
is
manager_employee_id employees.employee_id%type;
manager_hire_date employees.hire_date%type;
reports sys_refcursor;
type report_hire_date_t is table of employees.hire_date%type
index by binary_integer;
report_hire_dates report_hire_date_t;
before integer; after integer;
begin
loop
before:=0; after:=0;
fetch managers into manager_employee_id, manager_hire_date, reports;
exit when managers%notfound;
fetch reports bulk collect into report_hire_dates;
if report_hire_dates.count > 0
then
for j in report_hire_dates.first..report_hire_dates.last
loop
case report_hire_dates(j) < manager_hire_date
when true then before:=before+1;
else after:=after+1;
end case;
end loop;
end if;
if before > after then
pipe row ( manager_employee_id ); end if;
end loop;
close managers;
return;
end Young_Managers_Fn;
select column_value manager_employee_id from table
(
Young_Managers_Fn
(
cursor
(
select
employee_id, hire_date,
cursor (
select hire_date
from employees reports
where reports.manager_id =
managers.employee_id
)
from employees managers
)
)
);
Note: An attempt to create a view as this SELECT statement currently fails with
"ORA-22902: CURSOR expression not allowed," where the exception is raised because
the SELECT statement that's the argument of the CURSOR formal parameter to the table function itself has a cursor expression (a.k.a. cursor subquery). A view can be created
when the SELECT statement doesn't have a cursor subquery (see the earlier Mappings_Fn
example).
|
Sometimes the specification for the transformation to be implemented as a table function explicitly excludes source data with certain characteristics. In such cases, it's useful to report on the excluded source data and often most convenient to direct the report to the database for further analysis. A table function may do DML, provided that this is done within an autonomous transaction, as shown in Example 11.
Example 11. Using autonomous transactions to allow for intended "side effects."
create or replace function lookups_fn_with_side_effect
return my_types.lookups_tab
pipelined
/*
uses...
create table exclusions ( n number );
*/
is
pragma autonomous_transaction;
v_row My_Types.lookup_row;
begin
for j in 1..15
loop
case
when j < 11 then
case j
when 1 then v_row.idx := 1; v_row.text := 'one';
when 2 then v_row.idx := 2; v_row.text := 'TWO';
when 3 then v_row.idx := 3; v_row.text := 'three';
when 4 then v_row.idx := 4; v_row.text := 'FOUR';
when 5 then v_row.idx := 5; v_row.text := 'five';
when 6 then v_row.idx := 6; v_row.text := 'SIX';
when 7 then v_row.idx := 7; v_row.text := 'seven';
else v_row.idx := j; v_row.text := 'other';
end case;
pipe row ( v_row );
else
insert into exclusions values ( j );
end case;
end loop;
commit;
return;
end Lookups_Fn_With_Side_Effect;
It's beyond the scope of this article to describe the details of Oracle's parallel query feature. Suffice it to say that when certain environment conditions are met (especially a hardware environment that supports multiple concurrently executing processes making concurrent disk accesses, and a user environment close to single-user) and when the objects referenced in a query have appropriate parallel attributes, then the elapsed time for long-running queries can be cut in direct proportion to the number of available CPUs. This is especially significant in Decision Support Systems (a.k.a. DSS) both at query time and in the Extraction, Transformation, and Load (a.k.a. ETL) operations to populate them.
Oracle 9i introduces table function features to allow their execution to be parallelized.
These features require (with one small exception, discussed shortly) that the table function
has exactly one strongly typed ref cursor input parameter. Let's take a look at the different
ways in which table functions can be designed for parallel execution.
Consider a function that processes each row from its input cursor individually. (Such a transformation, which generates two or more output rows from each input row- generically referred to as piviotting-benefits particularly from a table function implementation.) The syntax to parallelize this is straightforward and is shown in Example 12.
Example 12. Parallel execution for arbitrary partitioning of data.
create or replace function Rowwise_Xform_Fn (
p_input_rows in SYS_REFCURSOR )
return My_Types.xforms_tab
pipelined
parallel_enable ( partition p_input_rows by any )
is
v_in_row My_Types.input_row;
v_out_row My_Types.xform_row;
begin
loop
fetch p_input_rows into v_in_row;
exit when p_input_rows%notfound;
v_out_row.n := v_in_row.n*2;
v_out_row.typ := 'a';
pipe row ( v_out_row );
v_out_row.n := v_in_row.n*3;
v_out_row.typ := 'b';
pipe row ( v_out_row );
end loop;
close p_input_rows;
return;
end Rowwise_Xform_Fn;
See Example 13 for the complete working example. They keyword any expresses the
programmer's assertion that the results are independent of the order in which the function
gets the input rows. When this keyword is used, the runtime system randomly partitions
the data among the query slaves. This keyword is appropriate for use with functions that
take in one row, manipulate its columns, and generate output row(s) based on the columns
of this row only. (Of course, if this assertion doesn't hold, the output won't be predictable.)
This is the small exception referred to earlier: The input ref cursor need not be strongly
typed to be partitioned by any.
Example 13. Algorithm is independent of the ordering of the source rows.
create or replace package My_Types is
type input_row is record ( n number );
type cur_t is ref cursor return input_row;
type xform_row is record ( n number, typ char(1) );
type xforms_tab is table of xform_row;
end My_Types;
create table t ( n number );
begin
for j in 1..1000
loop insert into t ( n ) values ( j ); end loop;
commit;
end;
create or replace function Rowwise_Xform_Fn (
p_input_rows in My_Types.cur_t )
return My_Types.xforms_tab
pipelined
parallel_enable ( partition p_input_rows by any )
is
v_in_row My_Types.input_row;
v_out_row My_Types.xform_row;
begin
loop
fetch p_input_rows into v_in_row;
exit when p_input_rows%notfound;
v_out_row.n := v_in_row.n * 2; v_out_row.typ := 'a';
pipe row ( v_out_row );
v_out_row.n := v_in_row.n * 3; v_out_row.typ := 'b';
pipe row ( v_out_row );
end loop;
close p_input_rows;
return;
end Rowwise_Xform_Fn;
select * from table (
Rowwise_Xform_Fn ( cursor ( select n from t ) ) )
where rownum < 11;
The ability to exploit the parallel potential of a table function depends on whether the source can be parallelized.
Consider a transformation along the lines of:
select avg ( salary ), department_id
from employees
group by department_id;
where the aggregation operation to be performed on the set of salaries for a given department is arbitrarily complex such that a classical SQL implementation is impossible, slow by virtue of a function invocation for each row of the source table, or prohibitively challenging to write and debug. For example, it might be that the cost to the employer of paying a given salary depends on the hire date because of changes in benefits packages that affect only employees hired after the date of change. This is illustrated in Example 14, but to avoid obscuring it with a complicated algorithm, the aggregation is simply the sum for the salary for each distinct department. This has the general form shown in Example 15.
Example 14. Algorithm requires only that the source rows are clustered.
Note: In order to avoid having to make the algorithm distractingly complex, the following
DELETE should be issued:
delete from employees where department_id is null;
before continuing thus:
create or replace package My_Types is
type dept_sal_row is record
( sal number(8,2), dept number(4) );
type cur_t is ref cursor return dept_sal_row;
type dept_sals_tab is table of dept_sal_row;
end My_Types;
create or replace function Aggregate_Xform
( p_input_rows in My_Types.cur_t )
return My_Types.dept_sals_tab
pipelined
--[ cluster / order ] p_input_rows by (dept)
--parallel_enable
-- ( partition p_input_rows by [ hash / range] (dept) )
is
g_in_row My_Types.dept_sal_row;
g_out_row My_Types.dept_sal_row;
g_first_time boolean := true;
g_last_dept number := null;
g_got_a_row boolean;
g_new_dept boolean;
g_current_dept employees.department_id%type;
g_prev_dept employees.department_id%type;
v_total_sal number;
procedure Get_Next_Row is begin
fetch p_input_rows into g_in_row;
g_got_a_row := not p_input_rows%notfound;
if g_got_a_row
then
case g_first_time
when true then
g_first_time := false;
g_new_dept := false;
else
g_new_dept := g_prev_dept <> g_in_row.dept;
end case;
g_prev_dept := g_in_row.dept;
end if;
return;
end Get_Next_Row;
function Got_Next_Dept return boolean is begin
g_current_dept := g_in_row.dept;
g_new_dept := false;
return g_got_a_row;
end Got_Next_Dept;
function Got_Next_Row_In_Dept return boolean is begin
return ( not g_new_dept ) and g_got_a_row;
end Got_Next_Row_In_Dept;
begin
Get_Next_Row();
while Got_Next_Dept()
loop
v_total_sal := 0;
while Got_Next_Row_In_Dept()
loop
v_total_sal := v_total_sal + g_in_row.sal;
Get_Next_Row();
end loop;
g_out_row.sal := v_total_sal;
g_out_row.dept := g_current_dept;
pipe row ( g_out_row );
end loop;
close p_input_rows;
return;
end Aggregate_Xform;
Example 15. General form of a function performing an aggregate transformation.
create or replace function Aggregate_Xform (
p_input_rows in My_Types.cur_t )
return My_Types.dept_sals_tab
pipelined
is
...
begin
Get_Next_Row();
while Got_Next_Dept() /* relies on assumption that
all rows for given dept are
delivered consecutively */
loop
v_total_sal := 0;
while Got_Next_Row_In_Dept()
loop
v_total_sal := v_total_sal + g_in_row.sal;
Get_Next_Row();
end loop;
g_out_row.sal := v_total_sal;
g_out_row.dept := g_current_dept;
pipe row ( g_out_row );
end loop;
close p_input_rows;
return;
end Aggregate_Xform;
Given that the input rows will be partitioned between different slaves, the integrity of the algorithm requires that all the rows for a given department go to the same slave, and that all these rows are delivered consecutively. (Strictly speaking, the requirement for consecutive delivery is negotiable, but the design of the algorithm to handle this case would need to be much more elaborate. For that reason, Oracle commits to consecutive delivery.) We use the term clustered to signify this type of delivery, and cluster key for the column (in this case, "department") on which the aggregation is done. But significantly, the algorithm does not care in what order of cluster key it receives each successive cluster, and Oracle doesn't guarantee any particular order here.
This allows the possibility of a quicker algorithm than if rows were required to be
clustered and delivered in order of the cluster key. It scales as order N rather than order
N.log(N), where N is the number of rows. The syntax to accomplish this is shown here:
create or replace function aggregate_xform (
p_input_rows in my_types.cur_t )
return my_types.dept_sals_tab
pipelined
cluster p_input_rows by (dept)
parallel_enable
( partition p_input_rows by hash (dept) )
is ...
We can choose between hash (dept) and range (dept) depending on what we know about the distribution of the values. (Hash will be quicker than range and is the natural
choice to be used with cluster... by.) Here, to be partitioned by a specified column, the
input ref cursor must be strongly typed. Cluster... by isn't allowed without parallel_enable (partition... by).
Note: At version 9.0.1, it's necessary to include ORDER BY on the cluster key in the
SELECT used to invoke the table function as follows, in order to preserve correctness of
behavior, but this restriction will be removed when the order N clustering algorithm is
productized:
select * from table (
Aggregate_Xform (
cursor (
select salary, department_id
from employees
where department_id is not null
order by department_id ) ) );
|
This alternative syntax is also allowed:
create or replace function my_fn (
p_input_rows in my_types.cur_t )
return my_types.items_tab
pipelined
order p_input_rows by (c1)
parallel_enable
( partition p_input_rows by range (c1) )
is...
This means that those rows that are delivered to a particular slave as directed by
partition... by will be locally sorted by that slave, thus parallelizing the sort. Therefore,
there should be no ORDER BY in the SELECT used to invoke the table function. (To have one would subvert the attempt to parallelize the sort.) Thus it's natural to use the
range option together with the order by option. This will be slower than cluster by, and so
should be used only when the algorithm depends on it.
Note: The cluster... by construct can't be used together with the order... by in the
declaration of a table function. This means that an algorithm that depends on clustering on
one key, c1, and then on ordering within the set row for a given value of c1 by, say, c2 would have to be parallelized by using the order... by in the declaration in the table function. Here, we'd use:
create or replace function median (
p_input_rows in my_types.cur_t )
return my_types.items_tab
pipelined
order p_input_rows by (c1,c2)
parallel_enable
( partition p_input_rows by range (c1) )
is...
The current restriction preventing using cluster... by together with order... by implies
no loss of functionality, but only a missed opportunity to leverage the order N sort.
Caution: It's possible to design an algorithm for a table function that would deliver a
different number of rows according to the degree of parallelism. The simplest example is a
function that returns a table of NUMBER representing the count of the rows its input
cursor delivered. A non-parallelized version would deliver just one row giving count(*) for
the input table. A parallelized version would deliver N rows (where N is the degree of parallelism), the sum of whose values would give count(*) for the input table. However, this breaks the parallel query abstraction. Oracle recommends against programming this
way.
When a table function is written to return a schema-level type, the syntax required to invoke it is somewhat verbose. For completeness, it's illustrated here.
(This example has been rewritten to return schema-level types)
Since the query syntax for an object table is rather verbose, we recap it here using a table.
create table t of lookup_row;
insert into t values ( lookup_row ( 1, 'one' ) );
insert into t values ( lookup_row ( 2, 'TWO' ) );
insert into t values ( lookup_row ( 3, 'three' ) );
insert into t values ( lookup_row ( 4, 'FOUR' ) );
insert into t values ( lookup_row ( 5, 'five' ) );
insert into t values ( lookup_row ( 6, 'SIX' ) );
insert into t values ( lookup_row ( 7, 'seven' ) );
insert into t values ( lookup_row ( 8, 'other' ) );
insert into t values ( lookup_row ( 9, 'other' ) );
insert into t values ( lookup_row ( 10, 'other' ) );
commit;
/* this is how an object query should be written */
select VALUE(a) rec from t a;
/* because it's verbose, it's convenient to define a view */
create or replace view v as
select value(a) rec from t a;
/* test the view */
select * from v;
Now the example proper:
create type lookup_row as
object ( idx number, text varchar2(20) );
create type lookups_tab as table of lookup_row;
create or replace function Lookups_Fn
return lookups_tab
pipelined
is
v_row lookup_row;
begin
for j in 1..10
loop
v_row :=
case j
when 1 then lookup_row ( 1, 'one' )
when 2 then lookup_row ( 2, 'TWO' )
when 3 then lookup_row ( 3, 'three' )
when 4 then lookup_row ( 4, 'FOUR' )
when 5 then lookup_row ( 5, 'five' )
when 6 then lookup_row ( 6, 'SIX' )
when 7 then lookup_row ( 7, 'seven' )
else lookup_row ( j, 'other' )
end;
pipe row ( v_row );
end loop;
return;
end Lookups_Fn;
Note the syntax of the query. Since the table function returns an object, it follows from the syntax against an object table above. Again, it's convenient to encapsulate it in a view:
create or replace view lookups as
select value(a) rec
from table
(
cast ( Lookups_Fn() as lookups_tab )
) a;
select * from lookups;
create or replace function Mappings_Fn
( p_input_rows in sys_refcursor )
return lookups_tab
pipelined
is
v_in_row lookup_row;
/* always initialize an object type using a
type constructor or user defined constructor */
v_out_row lookup_row := lookup_row( 1, 'x' );
begin
loop
fetch p_input_rows into v_in_row;
exit when p_input_rows%notfound;
case v_in_row.idx
when 1 then v_out_row.idx := 1*2;
v_out_row.text := 'was one';
when 2 then v_out_row.idx := 2*3;
v_out_row.text := 'was TWO';
when 3 then v_out_row.idx := 3*4;
v_out_row.text := 'was three';
when 4 then v_out_row.idx := 4*5;
v_out_row.text := 'was FOUR';
when 5 then v_out_row.idx := 5*6;
v_out_row.text := 'was five';
when 6 then v_out_row.idx := 6*7;
v_out_row.text := 'was SIX';
when 7 then v_out_row.idx := 7*8;
v_out_row.text := 'was seven';
else v_out_row.idx :=
v_in_row.idx*10;
v_out_row.text := 'was other';
end case;
pipe row ( v_out_row );
end loop;
close p_input_rows;
return;
end Mappings_Fn;
Note the syntax of the query. It's most compactly expressed using the views v or lookups defined earlier.
select value(b)
from table
(
cast
(
Mappings_Fn
(
cursor ( select * from lookups )
)
as lookups_tab
)
) b;
For completeness, here's how it looks without the view:
select value(b) from table
(
cast
(
Mappings_Fn
(
cursor
( select value(a) from table
(
cast ( Lookups_Fn() as lookups_tab )
) a
)
)
as lookups_tab
)
) b;
We can create a VIEW mapped_lookups with this SELECT statement, and then access it without restriction from PL/SQL. For example:
declare
cursor table_fn_cur is
select * from mapped_lookups;
rec lookup_row;
begin
open table_fn_cur;
loop
fetch table_fn_cur into rec;
exit when table_fn_cur%notfound;
Print ( rec.idx, rec.text );
end loop;
close table_fn_cur;
end;
Note, however, that this simpler syntax doesn't work here:
for rec in ( select * from mapped_lookups ) loop
Cursor expressions allow encapsulation of logic for reuse in compatible query situations, giving increased developer productivity and application reliability.
Table functions give increased functionality by allowing sets of tuples from arbitrary
external data sources and sets of tuples synthesized from arbitrary computations to be
invoked (as if they were a table) in the FROM list of a SELECT clause. For
convenience, they can be used to define a VIEW, giving new functionality.
Table functions can be used to deliver the rows from an arbitrarily complex PL/SQL
transformation sourced from Oracle tables (including, therefore, other table functions)
as a "VIEW," without storage of the calculated rows. This gives increased speed and
scalability, and increased developer productivity and application reliability.
Taking the "VIEW" metaphor a step further, the input parameters to the table function
allow the "VIEW" to be parameterizable, increasing code reusability and therefore
increasing developer productivity and application reliability.
A table function with a ref cursor input parameter can be invoked with another table function as the data source. Thus table functions can be daisy-chained, allowing modular program design and hence increased ease of programming, reuse, and application robustness.
Table function execution can be parallelized giving improved speed and scalability.
This, combined with the daisy-chaining feature, makes table functions particularly
suitable in data warehouse applications for implementing Extraction, Transformation,
and Load operations.
Fanout (DML from an autonomous transaction in the table function) adds functionality
of particular interest in data warehouse applications.
A table function allows data stored in nested tables to be queried as if it were stored
relationally, and data stored relationally to be queried as if it were stored as nested
tables. (This will be illustrated in the code samples for the next article.) This allows
genuine independence between the format for the persistent storage of data and the
design of the applications that access it. (A VIEW can be defined on a table function,
and INSTEAD OF triggers can be created on the VIEW to complete the picture.)
Table functions and cursor expressions expand in important new ways our ability as PL/SQL developers both to improve performance and reuse critical business logic. These features are particularly important if you rely on parallelization in your application and want to fully exploit PL/SQL-based functionality.
With Oracle 9i you can now nest collections within collections, also referred to as support for "multi-level collections." In the next article, Steven Feuerstein and Bryn Llewellyn show how to use multi-level collections and demonstrate with some examples.
This article was originally published in the May 2002 issue of Oracle Professional. The material in Feuerstein's articles--and those he cowrote with Bryn Llewellyn--are based on Oracle Corporation white papers originally prepared by Llewellyn for Oracle OpenWorld 2001 in San Francisco and OracleWorld Copenhagen in June 2002, and Feuerstein's book, Oracle PL/SQL Programming, 3rd Edition.
Steven Feuerstein is considered one of the world's leading experts on the Oracle PL/SQL language.
Bryn Llewellyn is PL/SQL Product Manager, Database and Application Server Technologies Development Group, at Oracle Corporation Headquarters.
O'Reilly & Associates recently released (September 2002) Oracle PL/SQL Programming, 3rd Edition.
Sample Chapter 10, Dates and Timestamps, is available free online.
You can also look at the Table of Contents, the Index, and the Full Description of the book.
For more information, or to order the book, click here.
Return to the O'Reilly Network.
Copyright © 2007 O'Reilly Media, Inc.