In the last column, I introduced you to one of the most important workhorse commands in SQL -- the
JOIN command. Now we'll take a good look at what joins are and how to create them by focusing on the
INNER JOIN, the most commonly used SQL
JOIN clause is probably one of the most used and most confusing parts of SQL, I'll do my best to demystify it over the next few columns.
One of the fundamental precepts of SQL is that databases are relational. As we discussed in previous columns on database design, one of the goals of good database development is to use relationships to decompose data tables into the most atomic constituents possible.
While that's the point of good database design, the point of a good SQL statement is to reverse the process and pull that data back into some semblance of its original shape. In essence, we must create a virtual table from the information stored in two (or more!) difference tables in the database.
As I discussed last week, you can use the
SELECT ... WHERE statement to combine two different tables on a common field:
SELECT * FROM Artist,CD WHERE (CD.ArtistID=Artist.ArtistID)
which can be translated into the equivalent
SELECT * FROM Artist INNER JOIN CD ON CD.ArtistID=Artist.ArtistID
What are your questions, or your comments, about using the JOIN statements.
Also in aboutSQL:
In both cases, we're using a
SELECT statement to pull data from the tables in the database, so the only real difference is syntax, right? Well, yes. These queries test the equality of the tables -- finding the places where the two columns in the join overlap. So obviously the question becomes ...
Well, there isn't a completely compelling reason to prefer one way of writing a
JOIN command over another. Didn't think that would be my answer, did you? But's important that you understand that they are equivalent statements if you work with SQL created by other people -- as long as you can translate in your head you should be fine.
BUT ... Oracle doesn't give you the option of using the
INNER JOIN syntax. Oracle uses the symbolic syntax exclusively and will throw an error if you try to use the inner join method. Most other databases are more forgiving.
One argument in favor of using the explicit
JOIN syntax, especially as we learn more about the various kinds of joins, is that the text makes it explicit which kind of join we're trying. I find
x LEFT OUTER JOIN y clearer than the symbolic
x *= y. The downside of this approach is that your SQL code will simply not work in Oracle. Here's one area where your choice of databases makes a difference in how you'd develop SQL code.
You'll notice that in both examples of the SQL
JOIN, the join was performed where
Note that we've fully-qualified the fields that will be joined -- this is required whenever a column name is ambiguous. This has not been a problem when we've been dealing with single tables (DBMSes make sure column names are unique in a single table), but will become an increasingly common issue as we join tables together. DBMSes will generate an error when the query is executed if the column names are ambiguous.
While we're looking at syntax, we can also examine using an alias for a table name. Typing SQL statements like:
SELECT * FROM Artist,CD WHERE CD.ArtistID=Artist.ArtistID
can become more time-consuming as more tables are added to the
JOIN statement. It's extremely common to see DBAs to create queries that have shorter table names by aliasing the table name to a one- or two-letter abbreviation. For example,
SELECT * FROM Artist AS A, CD AS C WHERE C.ArtistID=A.ArtistID
AS statement can be used in many DBMS to assign an alias to a table name that can be used throughout the SQL statement. Of course, Oracle is slightly different (and many other databases support this notation as well).
SELECT * FROM Artist A, CD C WHERE C.ArtistID=A.ArtistID
AS keyword, otherwise everything is the same.
If we were after just the name of the artist and the album for a summary list, we could combine all these techniques together into the following block of SQL
SELECT A.ArtistName, C.Title FROM Artist A, CD C WHERE C.ArtistID=A.ArtistID
This doesn't look like that much of a savings, but wait until we're writing complex multi-table joins with aggregate functions and filtering rules and then you'll appreciate these tips!
In this article, I showed you a little bit about the
INNER JOIN statement and some general syntax that applies to all of the SQL
JOIN statements. Next week, I'll jump into the various other types of joins --
RIGHT OUTER, and
Until then, feel free to contact me with comments and questions about this column, or use the Talk Back feature to share your comments with other readers. Believe it or not, I read all the e-mail you send me and answer the vast majority of it. I'm contemplating using this column to address real-world SQL problems -- be it a thorny SQL challenge, a poorly written SQL query you inherited from another developer, or a newbie question.
If you'd like to see that sort of analysis, let me know -- and if you need that sort of analysis, drop me a line. I won't promise to solve your problem, but I'd like to see what questions are out there. I've gotten some mail already and am incorporating the suggestions into planning for future columns.
John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.
Read more aboutSQL columns.
Return to ONLamp.com.
Copyright © 2009 O'Reilly Media, Inc.