O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)
 See this if you're having trouble printing code examples


Oracle SQL*Plus Pocket Reference, 2nd Edition

What's in a Condition?

by Jonathan Gennick, author of Oracle SQL*Plus Pocket Reference, 2nd Edition
10/01/2002

Oracle9i Database introduced support for SQL92 join syntax. This newly supported syntax allows you to explicitly specify join conditions in the FROM clause of a SELECT statement. SQL92 join syntax has been supported by other databases for years, but is new to those of us who have been living in an Oracle-centric world. Recently, while researching the behavior of this new (to Oracle) syntax, I had to significantly adjust my mental model of joins to incorporate a new understanding of just what a join condition is. I was surprised and you might be too. Read on to learn of a subtle, but important-to-understand facet of this newly supported syntax.

The Old Way

Up until Oracle9i, you expressed a join by listing two tables in the FROM clause of a SELECT statement and writing one or more join conditions in the WHERE clause of that same statement. The join conditions served to link the two tables together, and each condition referenced one column from each of the two tables involved in the join. For example, the following is a typical outer-join using the traditional Oracle syntax:

SQL> SELECT c.city_name, c.population, b.business_name
  2  FROM up_city c, up_business b
  3* WHERE c.city_name = b.city_name(+);

CITY_NAME            POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Ishpeming                  6686 Da Yoopers Tourist Trap
Munising                   2539
Sault Ste. Marie          16542 Museum Ship Valley Camp

This query returns a list of Upper Peninsula (of Michigan) cities and businesses. The "(+)" following the b.city_name column in the WHERE clause makes the up_business table the optional table in the join (the b.city_name column refers to the up_business table). The results include one city, Munising, for which no businesses are known, but because this is an outer-join, and because the up_business table is the optional table, Munising is listed anyway.

Related Articles:

Learn about SQL92 join syntax by reading Jonathan's Oracle Magazine article ANSI Standard SQL Joins. Also read Sanjay Mishra's Full Outer Joins in Oracle9i on the O'Reilly Network.

I chose an outer-join for this example, because it's the SQL92 syntax's impact on outer-joins that might surprise you. The join condition for this query is implicit. You recognize the join condition by the fact that it links columns from the two tables via a common column, but otherwise there's nothing in the syntax to identify it as a join condition. It's no different than any other condition you might find in the WHERE clause. Up until very recently, my mental definition of a join condition went something like:

A join condition is any condition that uses a comparison operator (such as =) to link a column from one table to a column from another table.

This mental definition of mine came about because I've always worked with joins that were specified implicitly. SQL92 joins are explicit, and that's key to my new understanding of how joins work.

The New Way

The new syntax allows us to specify join conditions in the FROM clause of a SELECT statement. Thus, I can rewrite the outer-join shown in the previous section as follows:

SQL> SELECT c.city_name, c.population, b.business_name 
  2  FROM up_city c LEFT OUTER JOIN up_business b
  3    ON c.city_name = b.city_name;

CITY_NAME            POPULATION BUSINESS_NAME
-------------------- ---------- ------------------------
Ishpeming                  6686 DA Yoopers Tourist Trap
Sault Ste. Marie          16542 Museum Ship Valley Camp
Munising                   2539

No surprises here, not yet anyway. The results are in a different order, but that doesn't matter as I didn't specify an ORDER BY clause. The data returned is identical to that returned by the previous query. What is different is that the join condition is explicit from the syntax of the statement. The ON clause specifically identifies c.city_name = b.city_name as the join condition. This is a key point to understand, as you'll soon see.

The Surprise

The surprise for me came when a reader asked whether it might be more efficient to put what would otherwise be a non-join condition into the FROM clause in the hopes that it might be able to eliminate rows earlier in the query execution process, thus making the query more efficient. Consider the following query, which is an extension of our previous query that restricts the results to only those cities with populations 5,000 and higher:

SQL> SELECT c.city_name, c.population, b.business_name
  2  FROM up_city c LEFT OUTER JOIN up_business b
  3  ON c.city_name = b.city_name
  4* WHERE c.population >= 5000;

CITY_NAME             POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Ishpeming                  6686 DA Yoopers Tourist Trap
Sault Ste. Marie          16542 Museum Ship Valley Camp

This query works just as expected. Now, let's try that reader's suggestion and move the restriction on population into the FROM clause. By doing that, we explicitly make c.population >= 5000 one of the join conditions. It's one of the join conditions even though it doesn't link columns from two tables. Following are the results:

SQL> SELECT c.city_name, c.population, b.business_name
  2  FROM up_city c LEFT OUTER JOIN up_business b
  3    ON c.city_name = b.city_name
  4*  AND c.population >= 5000;

CITY_NAME            POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Munising                   2539
Ishpeming                  6686 DA Yoopers Tourist Trap
Sault Ste. Marie          16542 Museum Ship Valley Camp

Look at this: Munising is back in the game. Why? My query still contains the condition: c.population >= 5000. Why does Munising, which clearly violates my condition, still show up in the results? This behavior came as a complete and utter shock to me. I was quite dumbfounded and convinced I'd uncovered a rather serious bug in Oracle's software. But it was my mental model which needed adjusting, not Oracle's software.

The Explanation

Remember my mental definition for join condition. I had thought that the use of the ON clause versus the WHERE clause was mere eye-candy, and that all conditions were treated identically no matter where they were placed in the query. That's not the case. By moving the restriction on population from the WHERE clause to the FROM clause, I made it one of the join conditions, thus changing the semantics of my query. To understand why that affected the results the way it did, you need to understand conceptually how a query with an outer-join is executed:

  1. First, a Cartesian product is produced from the two tables involved in the join. This results in all combinations of all rows. The join conditions are then evaluated for each row in the Cartesian-product.
  2. If the join conditions evaluate to TRUE, the row in question is retained in the result set.
  3. If the join conditions evaluate to FALSE, the row is still retained. The columns from the non-optional table (up_city in my case) are preserved, and the columns from the optional table (up_business) are set to NULL.
  4. Any duplicate rows as a result of step 3 are ultimately removed.
  5. The WHERE clause is evaluated, and any rows not satisfying the conditions in the WHERE clause are removed.

The key to why Munising shows up in the results lies in step 3. Join conditions are explicit in the SQL92 syntax. By moving c.population >= 5000 into the FROM clause, I made that condition one of the join conditions, and moved it's evaluation up from step 5 to step 3. Thus, in step 3, for my query, Oracle is evaluating the following:

c.city_name = b.city_name AND c.population >= 5000

Clearly for Munising, with its population of only 2,539, this expression evaluates to FALSE. However, it doesn't matter, because the expression is evaluated as part of step 3. The row is still retained.

Updating My Mental Model

You can see that the ability to make join conditions explicit broadens the scope of what can be considered a join condition. I had to update my mental model of how joins work to include the following definition:

A join condition is any condition that I explicitly specify in the FROM clause, regardless of what columns, if any, it references. Join conditions may still be specified in the WHERE clause (for backwards compatibility), in which case they are those that use a comparison operator (such as =) to link a column from one table to a column from another table.

Related Reading

Oracle SQL Plus Pocket Reference
A Guide to SQL*Plus Syntax
By Jonathan Gennick

It's worth pointing out that with an inner-join, it won't make a difference if you move a condition from the WHERE clause to the FROM clause. That's because an inner-join doesn't allow for optionality: all join conditions must evaluate to TRUE or a row is rejected.

I don't know about the efficiency aspect of that one reader's question as to whether it might sometimes be more efficient to evaluate a condition in the FROM clause rather than the WHERE clause of a query. He hails from the DB2 world where, he tells me, queries do sometimes execute faster if you push restrictions into the FROM clause. I've run a few tests with Oracle, and always Oracle comes up with the same execution plan each way. I'm keeping an open mind on the issue, but at the same time my advice is not to play such games with your queries. Aim for clarity. Don't obfuscate things by moving what should be a non-join condition into the FROM clause.

New Possibilities

Now that I've aligned my mental model of joins to reality, I realize that there is a benefit to the way things work. The ANSI committee wasn't sitting around trying to make things confusing for us. Using Oracle's old syntax for outer-joins, it would never have been possible for a condition such as c.population >= 5000 to be ignored. That condition would never have been considered a join condition, because it doesn't reference one column from each table. Only using the SQL92 syntax do you have the choice of making it a join condition. And choice is what's important.

Acknowledgments

My thanks to the reader from the DB2 world (you know who you are) who first posed the question about pushing conditions into the FROM clause, which ended up pushing me down the path of discovery. Thanks also to Joe Celko, author of SQL for Smarties and former member of the ANSI SQL standards committee, for enlightening me as to the conceptual execution of an outer-join.

Jonathan Gennick is an O'Reilly Media, Inc. editor specializing in database and programming titles.


Return to the O'Reilly Network.

Copyright © 2009 O'Reilly Media, Inc.