O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)
 http://www.oreillynet.com/pub/a/network/2002/12/13/mastersql.html
 See this if you're having trouble printing code examples


Mastering Oracle SQL

A Large Order of PL/SQL Please, Easy on the PL

by Alan Beaulieu, coauthor of Mastering Oracle SQL
12/13/2002

PL/SQL is a great language. In the eight years that I have been designing and implementing Oracle databases and associated applications, I have used PL/SQL to great advantage in every single project. The problem I have with PL/SQL is not the language itself, but the ease with which a programmer can dive down to the row-level when set manipulation might prove more efficient. Indeed, with the various enhancements to the SQL language made by Oracle over the past five years, complex business logic that would have required a combination of SQL and a procedural language can now generally be handled by SQL alone. To illustrate, we will use PL/SQL to implement a specific business need, but we will attack the problem from two different angles: the business logic perspective, whereby a series of steps are performed against a set of data in order to incorporate multiple business rules, and the data perspective, in which data sets are manipulated in order to absorb multiple business rules simultaneously.

Here's the problem: for various purposes (such as client billing, risk exposure), a financial services firm needs to track the number of trades executed in each country. Depending on the type of security being traded, the data might be tracked either by country code or by currency code. An operational data store tracks the number of trades each day by account, transaction type, and country/currency, and our assignment is to roll up the transaction counts each month to the country level, using a translation table to convert currency codes to country codes [1]. Additionally, transaction types AB and XY always get rolled into the US bucket, regardless of the country/currency stored in the tables. Here's the schema for our problem:

Given the above schema, our task is to roll up the txn_count column for each country from the Account_Currency_Txns and Account_Country_Txns tables and insert rows into the Country_Txn_Summary table. The next two sections will follow the thought processes of two programmers assigned to solve this problem: one coding from the business logic perspective, and the other from the data perspective.

Business Logic Perspective

We need to add each record from the two transaction tables to the appropriate summary bucket. Therefore, for each transaction table, we will sum the number of transactions by country and transaction type, check to see if the transaction type requires that the country code be remapped to US, and then add the transaction count to the appropriate bucket in the Country_Txn_Summary table. To facilitate reuse, we will create two stored procedures within a PL/SQL package: one procedure that loops through each of the transaction tables, and another private procedure that updates the appropriate row in the summary table:

/* package definition */
create or replace package pkg_monthly_summary as
  procedure summarize_txn_data(v_smry_start_date in date);
end pkg_monthly_summary;

/* package body */
create or replace package body pkg_monthly_summary as
  /* forward declaration of private procedure */
  procedure update_txn_bucket(v_cntry_cd in varchar2, 
    v_smry_start_date in date, v_txn_type in varchar2,
    v_txn_count in number);

  /* procedure to loop through transaction records */
  /* and add to appropriate country bucket         */
  procedure summarize_txn_data(v_smry_start_date in date) is
    /* country data cursor */
    cursor cntry_cur(cv_smry_start_date in date) is
    select cntry_cd, txn_type, sum(txn_count) tot_txns
    from account_country_txns
    where exec_date >= cv_smry_start_date
      and exec_date < add_months(cv_smry_start_date, 1)
    group by cntry_cd, txn_type;

    /* currency data - map to country via country_currency table */
    cursor crrncy_cur(cv_smry_start_date in date) is
    select cc.cntry_cd cntry_cd, act.txn_type txn_type, 
      sum(act.txn_count) tot_txns
    from account_currency_txns act, country_currency cc
    where act.exec_date >= cv_smry_start_date
      and act.exec_date < add_months(cv_smry_start_date, 1)
      and act.crrncy_cd = cc.crrncy_cd
    group by cc.cntry_cd, act.txn_type;
  begin
    /* add country records to buckets */
    for cntry_rec in cntry_cur(v_smry_start_date) loop
      update_txn_bucket(cntry_rec.cntry_cd, v_smry_start_date, 
        cntry_rec.txn_type, cntry_rec.tot_txns);
    end loop;

    /* add currency records to buckets */
    for crrncy_rec in crrncy_cur(v_smry_start_date) loop
      update_txn_bucket(crrncy_rec.cntry_cd, v_smry_start_date, 
        crrncy_rec.txn_type, crrncy_rec.tot_txns);
    end loop;

    commit;
  exception
    when OTHERS then
      rollback;
      raise_application_error(-20001, 'Oops');
  end summarize_txn_data;

  /* procedure to update appropriate country summary bucket */
  procedure update_txn_bucket(v_cntry_cd in varchar2, 
    v_smry_start_date in date, v_txn_type in varchar2,
    v_txn_count in number) is
    v_targ_cntry_cd varchar2(4);
  begin
    /* map all AB and XY transactions to country US */
    if v_txn_type = 'AB' or v_txn_type = 'XY' then
      v_targ_cntry_cd := 'US';
    else
      v_targ_cntry_cd := v_cntry_cd;
    end if;

    /* attempt to update existing bucket */
    update country_txn_summary
    set txn_count = txn_count + v_txn_count
    where cntry_cd = v_targ_cntry_cd
      and year = to_char(v_smry_start_date, 'YYYY')
      and month = to_char(v_smry_start_date, 'MM');

    if SQL%ROWCOUNT = 0 then
      /* bucket doesn't exist yet, so create it */
      insert into country_txn_summary (cntry_cd, 
        year, month, txn_count)
      values (v_targ_cntry_cd, to_char(v_smry_start_date, 'YYYY'),
        to_char(v_smry_start_date, 'MM'), v_txn_count);
    end if;
  end update_txn_bucket;
end pkg_monthly_summary;

To summarize data for the month of October, 2002, we would execute the following:

pkg_monthly_summary.summarize_txn_data(to_date('01-OCT-2002','DD-MON-YYYY'))

Data Perspective

We have two sets of data that we need to combine and roll up into a new set. As we combine the sets, we will translate currency codes to country codes and remap country codes based on the transaction type. The combined set will then be aggregated, and the resulting data will be inserted into a third table. We will create a PL/SQL package containing a single stored procedure to execute our logic:

/* package definition */
create or replace package pkg_monthly_summary as
  procedure summarize_txn_data(v_smry_start_date in date);
end pkg_monthly_summary;

/* package body */
create or replace package body pkg_monthly_summary as
  procedure summarize_txn_data(v_smry_start_date in date) is
  begin
    -- step #5
    insert into country_txn_summary
      (cntry_cd, year, month, txn_count)
    select cntry_data.cntry_cd,
      to_char(v_smry_start_date, 'YYYY'), 
      to_char(v_smry_start_date, 'MM'),
      sum(cntry_data.txn_count)
    from
     /* inline view cntry_data combines country/currency */
     /* transactions while translating currency codes to */
     /* country codes and mapping AB/XY txn types to US  */
     ( -- step #1
      select decode(txn_type, 'AB', 'US', 
        'XY', 'US', cntry_cd) cntry_cd, 
        sum(txn_count) tot_txns
      from account_country_txns
      where exec_date >= v_smry_start_date
        and exec_date < add_months(v_smry_start_date, 1)
      group by decode(txn_type, 'AB', 'US', 
        'XY', 'US', cntry_cd)
      union all
      -- step #2
      select decode(act.txn_type, 'AB', 'US', 
        'XY', 'US', cc.cntry_cd) cntry_cd, 
        sum(act.txn_count) tot_txns
      from account_currency_txns act, country_currency cc
      where act.exec_date >= v_smry_start_date
        and act.exec_date < add_months(v_smry_start_date, 1)
        and act.crrncy_cd = cc.crrncy_cd
      group by decode(act.txn_type, 'AB', 'US', 
        'XY', 'US', cc.cntry_cd)
      ) cntry_data -- step #3
    group by cntry_data.cntry_cd; -- step #4

    commit;
  exception
    when OTHERS then
      rollback;
      raise_application_error(-20001, 'Oops');
  end summarize_txn_data;
end pkg_monthly_summary;

To summarize data for the month of October, 2002, we would execute the following:

pkg_monthly_summary.summarize_txn_data(to_date('01-OCT-2002','DD-MON-YYYY'))

Because this code is nontrivial, we will break the statement down and describe each component. As with any SQL statement containing inline views, the best way to decompose the statement is from the inside out. For each step below, you can find the matching code by looking for the step #x comment in the above INSERT statement:

  1. Sum the txn_count column in the Account_Country_Txns table for each country, first translating the AB and XY transaction types to the US country code.

  2. Sum the txn_count column in the Account_Currency_Txns table for each country, first translating currency codes to country codes and translating the AB and XY transaction types to the US country code.

  3. Combine the two sets generated by steps 1 and 2 above using union all. Create an inline view called cntry_data to hold the combined set.

  4. Since the same country can appear multiple times in the cntry_data set, we must group by the cntry_cd column and sum the tot_txns column in the cntry_data inline view.

  5. Insert the results in the Country_Txn_Summary table.

Comparison

In the example written from the business logic perspective, the programmer codes a series of steps in order to reach the desired result, often revisiting the same rows in a table multiple times. If the amount of data being processed is relatively small, this approach is perfectly viable and has the advantage of being easy to follow, especially if it is well-commented. If the size of the data sets are large, however, this approach suffers from the fact that the data from the two tables is accessed one row at a time.

The example written from the data perspective is more compact, and, if you are working with large data sets and have access to servers with multiple processors, it allows Oracle to break the task into multiple pieces and execute the pieces in parallel across multiple CPUs. On the other hand, coding from the data perspective can be more difficult to maintain, and comments sprinkled around and within a 50-line SQL statement may offer little solace to programmers who are not experienced with tackling large SQL statements. Finally, SQL statements written from the data perspective may harbor logic errors that are difficult to identify; for example, using union instead of union all in the cntry_data inline view would generate incorrect results whenever the same number of transactions had been logged to a country/currency pair [2].

In my experience, most of the code I come across while tuning or modifying existing systems has been written from the business logic perspective. While this is generally acceptable, systems that work with significant amounts of data can see an improvement of ten times and better by reworking code from a business logic perspective to a data perspective. Since this frequently means the difference between success and failure, the disadvantages mentioned above are simply the price that must be paid to ensure that a system can finish execution within its allotted time window. Therefore, there is a place for both programming styles in most projects, and the savvy programmer will consider the need for clarity versus efficiency in order to know when to apply each style.

1. For the sake of simplicity, we will assume that all currencies map to exactly one country. Thus, we can either pretend the Euro doesn't exist, or we can create a country code of EU and map it to the Euro currency.

2. For example, if ten transactions had been logged to both country JP in the Account_Country_Txns table and currency JPY in the Account_Currency_Txns table, our final result for country JP would be off by 10. This is because using union mandates that duplicates be removed from the final set, whereby union all does not remove duplicates.

Alan Beaulieu has been designing, building, and implementing custom database applications for over 13 years.

Mastering Oracle SQL

Related Reading

Mastering Oracle SQL
By Sanjay Mishra, Alan Beaulieu

Table of Contents
Index
Sample Chapter

Read Online--Safari
Search this book on Safari:
 

Code Fragments only

O'Reilly & Associates recently released (April 2002) Mastering Oracle SQL.


Return to ONJava.com.

Copyright © 2007 O'Reilly Media, Inc.