O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Print Subscribe to Databases Subscribe to Newsletters
Oracle SQL*Plus Pocket Reference, 2nd Edition

What's in a Condition?

by Jonathan Gennick, author of Oracle SQL*Plus Pocket Reference, 2nd Edition
10/01/2002

Oracle9i Database introduced support for SQL92 join syntax. This newly supported syntax allows you to explicitly specify join conditions in the FROM clause of a SELECT statement. SQL92 join syntax has been supported by other databases for years, but is new to those of us who have been living in an Oracle-centric world. Recently, while researching the behavior of this new (to Oracle) syntax, I had to significantly adjust my mental model of joins to incorporate a new understanding of just what a join condition is. I was surprised and you might be too. Read on to learn of a subtle, but important-to-understand facet of this newly supported syntax.

The Old Way

Up until Oracle9i, you expressed a join by listing two tables in the FROM clause of a SELECT statement and writing one or more join conditions in the WHERE clause of that same statement. The join conditions served to link the two tables together, and each condition referenced one column from each of the two tables involved in the join. For example, the following is a typical outer-join using the traditional Oracle syntax:

SQL> SELECT c.city_name, c.population, b.business_name
  2  FROM up_city c, up_business b
  3* WHERE c.city_name = b.city_name(+);

CITY_NAME            POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Ishpeming                  6686 Da Yoopers Tourist Trap
Munising                   2539
Sault Ste. Marie          16542 Museum Ship Valley Camp

This query returns a list of Upper Peninsula (of Michigan) cities and businesses. The "(+)" following the b.city_name column in the WHERE clause makes the up_business table the optional table in the join (the b.city_name column refers to the up_business table). The results include one city, Munising, for which no businesses are known, but because this is an outer-join, and because the up_business table is the optional table, Munising is listed anyway.

Related Articles:

Learn about SQL92 join syntax by reading Jonathan's Oracle Magazine article ANSI Standard SQL Joins. Also read Sanjay Mishra's Full Outer Joins in Oracle9i on the O'Reilly Network.

I chose an outer-join for this example, because it's the SQL92 syntax's impact on outer-joins that might surprise you. The join condition for this query is implicit. You recognize the join condition by the fact that it links columns from the two tables via a common column, but otherwise there's nothing in the syntax to identify it as a join condition. It's no different than any other condition you might find in the WHERE clause. Up until very recently, my mental definition of a join condition went something like:

A join condition is any condition that uses a comparison operator (such as =) to link a column from one table to a column from another table.

This mental definition of mine came about because I've always worked with joins that were specified implicitly. SQL92 joins are explicit, and that's key to my new understanding of how joins work.

The New Way

The new syntax allows us to specify join conditions in the FROM clause of a SELECT statement. Thus, I can rewrite the outer-join shown in the previous section as follows:

SQL> SELECT c.city_name, c.population, b.business_name 
  2  FROM up_city c LEFT OUTER JOIN up_business b
  3    ON c.city_name = b.city_name;

CITY_NAME            POPULATION BUSINESS_NAME
-------------------- ---------- ------------------------
Ishpeming                  6686 DA Yoopers Tourist Trap
Sault Ste. Marie          16542 Museum Ship Valley Camp
Munising                   2539

No surprises here, not yet anyway. The results are in a different order, but that doesn't matter as I didn't specify an ORDER BY clause. The data returned is identical to that returned by the previous query. What is different is that the join condition is explicit from the syntax of the statement. The ON clause specifically identifies c.city_name = b.city_name as the join condition. This is a key point to understand, as you'll soon see.

The Surprise

The surprise for me came when a reader asked whether it might be more efficient to put what would otherwise be a non-join condition into the FROM clause in the hopes that it might be able to eliminate rows earlier in the query execution process, thus making the query more efficient. Consider the following query, which is an extension of our previous query that restricts the results to only those cities with populations 5,000 and higher:

SQL> SELECT c.city_name, c.population, b.business_name
  2  FROM up_city c LEFT OUTER JOIN up_business b
  3  ON c.city_name = b.city_name
  4* WHERE c.population >= 5000;

CITY_NAME             POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Ishpeming                  6686 DA Yoopers Tourist Trap
Sault Ste. Marie          16542 Museum Ship Valley Camp

This query works just as expected. Now, let's try that reader's suggestion and move the restriction on population into the FROM clause. By doing that, we explicitly make c.population >= 5000 one of the join conditions. It's one of the join conditions even though it doesn't link columns from two tables. Following are the results:

SQL> SELECT c.city_name, c.population, b.business_name
  2  FROM up_city c LEFT OUTER JOIN up_business b
  3    ON c.city_name = b.city_name
  4*  AND c.population >= 5000;

CITY_NAME            POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Munising                   2539
Ishpeming                  6686 DA Yoopers Tourist Trap
Sault Ste. Marie          16542 Museum Ship Valley Camp

Look at this: Munising is back in the game. Why? My query still contains the condition: c.population >= 5000. Why does Munising, which clearly violates my condition, still show up in the results? This behavior came as a complete and utter shock to me. I was quite dumbfounded and convinced I'd uncovered a rather serious bug in Oracle's software. But it was my mental model which needed adjusting, not Oracle's software.

The Explanation

Remember my mental definition for join condition. I had thought that the use of the ON clause versus the WHERE clause was mere eye-candy, and that all conditions were treated identically no matter where they were placed in the query. That's not the case. By moving the restriction on population from the WHERE clause to the FROM clause, I made it one of the join conditions, thus changing the semantics of my query. To understand why that affected the results the way it did, you need to understand conceptually how a query with an outer-join is executed:

  1. First, a Cartesian product is produced from the two tables involved in the join. This results in all combinations of all rows. The join conditions are then evaluated for each row in the Cartesian-product.
  2. If the join conditions evaluate to TRUE, the row in question is retained in the result set.
  3. If the join conditions evaluate to FALSE, the row is still retained. The columns from the non-optional table (up_city in my case) are preserved, and the columns from the optional table (up_business) are set to NULL.
  4. Any duplicate rows as a result of step 3 are ultimately removed.
  5. The WHERE clause is evaluated, and any rows not satisfying the conditions in the WHERE clause are removed.

The key to why Munising shows up in the results lies in step 3. Join conditions are explicit in the SQL92 syntax. By moving c.population >= 5000 into the FROM clause, I made that condition one of the join conditions, and moved it's evaluation up from step 5 to step 3. Thus, in step 3, for my query, Oracle is evaluating the following:

c.city_name = b.city_name AND c.population >= 5000

Clearly for Munising, with its population of only 2,539, this expression evaluates to FALSE. However, it doesn't matter, because the expression is evaluated as part of step 3. The row is still retained.

Updating My Mental Model

You can see that the ability to make join conditions explicit broadens the scope of what can be considered a join condition. I had to update my mental model of how joins work to include the following definition:

A join condition is any condition that I explicitly specify in the FROM clause, regardless of what columns, if any, it references. Join conditions may still be specified in the WHERE clause (for backwards compatibility), in which case they are those that use a comparison operator (such as =) to link a column from one table to a column from another table.

Related Reading

Oracle SQL Plus Pocket Reference
A Guide to SQL*Plus Syntax
By Jonathan Gennick

It's worth pointing out that with an inner-join, it won't make a difference if you move a condition from the WHERE clause to the FROM clause. That's because an inner-join doesn't allow for optionality: all join conditions must evaluate to TRUE or a row is rejected.

I don't know about the efficiency aspect of that one reader's question as to whether it might sometimes be more efficient to evaluate a condition in the FROM clause rather than the WHERE clause of a query. He hails from the DB2 world where, he tells me, queries do sometimes execute faster if you push restrictions into the FROM clause. I've run a few tests with Oracle, and always Oracle comes up with the same execution plan each way. I'm keeping an open mind on the issue, but at the same time my advice is not to play such games with your queries. Aim for clarity. Don't obfuscate things by moving what should be a non-join condition into the FROM clause.

New Possibilities

Now that I've aligned my mental model of joins to reality, I realize that there is a benefit to the way things work. The ANSI committee wasn't sitting around trying to make things confusing for us. Using Oracle's old syntax for outer-joins, it would never have been possible for a condition such as c.population >= 5000 to be ignored. That condition would never have been considered a join condition, because it doesn't reference one column from each table. Only using the SQL92 syntax do you have the choice of making it a join condition. And choice is what's important.

Acknowledgments

My thanks to the reader from the DB2 world (you know who you are) who first posed the question about pushing conditions into the FROM clause, which ended up pushing me down the path of discovery. Thanks also to Joe Celko, author of SQL for Smarties and former member of the ANSI SQL standards committee, for enlightening me as to the conceptual execution of an outer-join.

Jonathan Gennick is an O'Reilly Media, Inc. editor specializing in database and programming titles.


Return to the O'Reilly Network.


Comments on this article
Full Threads Oldest First

Showing messages 1 through 10 of 10.

  • Good Article!!!
    2006-04-19 12:27:05  Santosh_Gaikwad [View]

    I found this article interesting. The author has detailed the mechanics of joins in a very elaborate way. Very easy to understand and was a good refreshing moment reading the same.
    Thanks,
    Santosh Gaikwad
  • For too much text from small change in concept
    2002-12-10 11:32:47  anonymous2 [View]

    I am very sorry, but are a bit disappointed about the noise and bravo that comes with the subject of this article. It is quite a clear thing, not that difficult to grasp. The condition on inhabitants > 5000 is just being moved to an inline view, which is being outer joined to the base table.
  • Why so long?
    2002-10-24 09:45:30  anonymous2 [View]

    Why did it take Oracle ten years to implement this vital syntax?
  • Pushing the WHERE clause into the FROM clause
    2002-10-07 12:23:49  anonymous2 [View]

    Although I was already aware of the behaviour being described, I enjoyed this article because it explained it a lot more clearly than I would have managed.

    However (just to be picky) pushing the WHERE clause into the FROM statement to me implies structuring this SQL statement:

    SELECT c.city_name, c.population, b.business_name
    FROM up_city c LEFT OUTER JOIN up_business b
    ON c.city_name = b.city_name
    WHERE c.population >= 5000;

    to something like this:

    SELECT c.city_name, c.population, b.business_name
    FROM ( select *
    from up_city
    where c.population >= 5000
    ) c LEFT OUTER JOIN up_business b
    ON c.city_name = b.city_name;

    In principle if the SQL is processed according to the conceptual model then the filtering should take place before the join is performed.

    Adrian Miley

  • Donald Bales photo Refreshing
    2002-10-04 07:44:48  Donald Bales | O'Reilly Author [View]

    This is an excellent article. It will help clarify the mechanics of SQL92 outer join syntax for anyone who uses SQL to access a relational database, Oracle or otherwise. It is especially helpful for relational database users whose primary experience is using Oracle, because Oracle has been slow to implement this syntax (no doubt because their existing syntax has been used since the beginning of relational database technology, so why change something that has works?). Even more impressive though is Jonathan's courage to admit that he doesn't know-it-all and is willing to share his learnings whilst taking on the risk that some know-it-all would publicly scold him because he admitted he learned something new. Thank-you Jonathan! Public ridicule is why adults stop learning. Most adults won't make themselves vulnerable to "ignorance" ridicule and therefore stop learning. AUTARCH, rather than ridicule people for admitting they've learned something and are willing to share what they've learned, be productive, take the risk, and share what you know.
    • Refreshing
      2002-10-04 15:42:31  Dave Rolsky | O'Reilly Author [View]

      Exactly where did I ridicule the author? I didn't call him ignorant, or scold him, or any of the other things you're implying I did.

      You might also notice that I pointed out that for a while, I too did not understand the difference between join and non-join conditions. But because he is an author and editor about _databases_, I would simply expect him to have known this a long time ago. That hardly constitutes public ridicule!

      Overall, I think that this is symptomatic of a larger problem. Part of this problem is the fact that SQL is not relational (so there are no real relational databases available). Given that most database practitioners learn solely from practice, combined with such fundamentally flawed tools, leads to a fundamental lack of understanding of basic principles. Is this all the fault of the author? Not by any means. Is it a problem? Yes.
  • This is a bit scary ...
    2002-10-02 12:39:50  Dave Rolsky | O'Reilly Author [View]

    I will confess that for quite a while, when I first started working with databases, I did not really understand that joins weren't just another part of the where clause.

    But they're not even remotely such a thing and frankly, I find it disturbing that someone billed as an "editor specializing in database and programming titles" might lack that fundamental knowledge about what relational databases are all about. This ignores, of course, that there are no truly relational products out there, including Oracle. Rather, they're all SQL databases, and SQL isn't truly relational.

    I think it behooves someone editing books about databases to understand that difference. It's pretty sad that I didn't know this for so long, but anyone involved in the creation of database books should have a pretty solid understanding of relational theory.
    • John W. Adams photo This is a bit scary ...
      2002-10-03 05:55:48  John W. Adams | O'Reilly Blogger [View]

      I think you are confusing the persona in which the article is written with the actual person who wrote it. When I'm explaining something, I often assume a persona in which I, too, am confused about something. I then give an explanation starting from the same point as those to whom I am explaining. That looks to me like what is being done here. Good pedagogy sometimes involves not being a showoff.
      • Jonathan Gennick photo Learning is nothing to be scared of
        2002-10-03 08:09:16  Jonathan Gennick | O'Reilly AuthorO'Reilly Blogger [View]

        I think you are confusing the persona in which the article is written with the actual person who wrote it.

        Thanks. I appreciate the support. However, the article is substantively correct. I wasn't at all confused about what a join is, but I was indeed surprised by the result of moving what I took to be a non-join condition from the WHERE clause into the FROM clause of a query that was doing an outer-join.

        To address the first poster, I don't see why anyone should find it "scary" that I managed to learn something new that I didn't know before. Just because I edit and write database books doesn't make me omniscient. And it would be utterly foolish for me to stop learning just because I'm an editor. For that matter, I'm actually learning a lot this week: I'm in San Francisco attending a Hotsos Clinic (about Oracle tuning) put on by Cary Millsap.

        The scary part, for me at least, was to write the article in a way that conveyed my initial ignorance, and then my subseqent enlightenment. I could have written the article such that I appeared to know all along what I was talking about. Instead, I shared my learning experience just as it really happened.
        • Learning is nothing to be scared of
          2002-10-03 13:39:49  Dave Rolsky | O'Reilly Author [View]

          But it still reflects a lack of knowledge about relational theory that is a bit unsettling to me.

          I think the key quote is this:

          I had thought that the use of the ON clause versus the WHERE clause was mere eye-candy, and that all conditions were treated identically no matter where they were placed in the query.

          Well, if you only know Oracle and SQL, that's pretty much what I'd expect. But you _should_ know more than that, shouldn't you?

          SQL (at least the version used by Oracle) obscures the difference between join and non-join conditions, but you should have known that already.

          And yes, learning is good, no matter what your position, but I'd still expect someone who writes and edits books about databases to know better.


Tagged Articles

Be the first to post this article to del.icio.us

Sponsored Resources

  • Inside Lightroom

Related to this Article

Access 2013 For Dummies Access 2013 For Dummies
March 2013
$24.99 USD

PHP, MySQL, JavaScript & HTML5 All-in-One For Dummies PHP, MySQL, JavaScript & HTML5 All-in-One For Dummies
by Steve Suehring
March 2013
$44.99 USD

Advertisement
Sign up today to receive special discounts,
product alerts, and news from O'Reilly.
Privacy Policy >
View Sample Newsletter >
  • Youtube
  • http://www.youtube.com/OreillyMedia
  • Twitter
  • Subscribe
  • View All RSS Feeds >
O'Reilly Media

800-889-8969 or 707-827-7019
Monday-Friday 7:30am-5pm PT
©2011, O'Reilly Media, Inc.
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  • About O'Reilly
  • Academic Solutions
  • Contacts
  • Customer Service
  • Careers
  • Press Room
  • Privacy Policy
  • Terms of Service
  • Writing for O'Reilly
  • Community
  • Authors
  • Forums
  • Membership
  • Newsletters
  • RSS Feeds
  • User Groups
  • Partner Sites
  • makezine.com
  • makerfaire.com
  • craftzine.com
  • igniteshow.com
  • PayPal Developer Zone
  • O'Reilly Insights on Forbes.com