“I thought the whole point of Rails was that it hid the database and just let me write Ruby code! Why do I need to know about these things?”
This excerpt is from Learning Rails . Most Rails books are written for programmers looking for information on data structures. Learning Rails targets web developers whose programming experience is tied directly to the Web. Rather than begin with the inner layers of a Rails web application -- the models and controllers -- this unique book approaches Rails development from the outer layer: the application interface. You can start from the foundations of web design you already know, and then move more deeply into Ruby, objects, and database structures.
Rails has all kinds of features for building web applications, but its foundation component is the way that it lets you get information into and out of relational databases. You can build simple applications without knowing much about databases, just telling Rake to do a few things and making sure you gave Rails the right data type for each field. You don’t need to know Structured Query Language (SQL), the classic language for working with databases.
Building a more complex Rails application, though, really demands at least a basic understanding of how relational databases work. It helps to think about tables and their connections when defining Rails models, at least when you first set them up.
The foundational idea underneath relational databases is a simple but powerful structure. Each table is a set of sets, and within a single table all of these sets have the same data structure, containing a list of named fields and their values. For convenience, each set within a table is called a row, and each field within that row is part of a larger named column, as shown in Figure B.1, “The classic row–column approach to tables”. It looks a lot like a spreadsheet with named columns and unnamed rows.
The resemblance to a spreadsheet is only superficial, however. Spreadsheets are built on grids, but those grids can have anything in them that any user wants to put in any given place in the spreadsheet. It’s possible to build a spreadsheet that is structured like a database table, but it’s definitely not required. Databases offer much less of that kind of flexibility, and in return can offer tremendous power because of their obsession with neatly ordered data. Every row within a table has to have the same structure for its data, and calculations generally take place outside of the tables, not within them. Tables just contain data.
You also don’t normally interact with database tables as directly as you do spreadsheets, though sometimes applications offer a spreadsheet-like grid view as an option for editing them. Instead, you define the table structures with a schema, like that shown in Table B.1, A schema for the table in Figure B.1, “The classic row–column approach to tables””, and move data in and out with code.
Table B.1. A schema for the table in Figure B.1, “The classic row–column approach to tables”
Depending on the database, schemas can be very simple and terse or very complicated and precisely defined. Rails isn’t that interested in the details of database schema implementations, however, because its “choose your own database backend” approach limits how tightly it can bond to any particular one. As a result, Rails takes the terse and simple approach, supporting only these basic data types:
:string :text :integer :float :decimal :datetime :timestamp :time :date :binary :boolean
Rails won’t create a database schema much more complicated than the one shown in Figure B.2, Multiple but unconnected tables in a database”, though it will probably add some extra pieces to the schema that you don’t need to worry about. There are timestamps, which Rails adds even when you don’t ask for them, and IDs, which you don’t control but which come up in URLs all the time. The Rails ID serves another function inside the database: it’s a primary key, a unique identifier for that row in the table. Databases can find information very rapidly when given that key.
You can build many simple applications on a single-table database, but at some point, working within a single table is just way too constraining. The next step might be add another table to the application, say for some completely separate set of issues. A users table that identifies users and their administrative roles might be the next thing you add to an application, as shown in Figure B.2, “Multiple but unconnected tables in a database”.
With these tables, you can write an application that checks to see if users have the rights to make changes to the other table. You could add lots of other disconnected tables to the database as well (and sometimes you’ll have disconnected tables), but at the same time, this isn’t taking advantage of the real power of relational databases. They’re much more than a place to store information in tables: they’re a place to managed related information effectively and efficiently.
So, how does that work? Remember the primary key? Rails uses it to get to records quickly, but the database can also use it internally. That means that it’s easy for data in one table to refer to a row in another using that same primary key. That yields structures like the one shown in Figure B.3, “Connected tables in a database”.
Establishing connections between tables is simple—one just has to reference the other using its key. When you link to a record in another table by storing the key for that record in your own table, that key is called a foreign key. By using foreign keys to connect to primary keys, databases can assemble related information very quickly. Whose “2007 Best Handwriting” award was that? Student 1, who we can find out is Giles Boschwick by checking the other table.
You can link tables to tables to tables. You might, for example, have a table that lists who presented each award, which links to the award table the same way that the award table linked to the students table, as shown in Figure B.4, “Connected tables in a database”.
With tables linked this way, you can ask questions like, “Which presenters gave Jules Bloss Miller awards in 2007?” and get the answer of, “Dr. Milo Jonstein, DDS” and “Mr. James Withers.” You—or more likely a program—can follow the IDs and the links to those IDs to come up with the right answer.
These kinds of links allow the table doing the pointing to establish one connection per row. That might lead to no connections to some rows in the targeted table, one connection to a row, or even many connections to given rows in the targeted table. You can constrain those options, but there’s one kind of connection that isn’t supported by this simple mechanism. It doesn’t allow for many-to-many relationships.
A classic many-to-many relationship is students and classes. Often, each student takes many classes. Each class contains many students. The mechanism shown in Figures B.3 and B.4 isn’t very good at this. You could create multiple fields for holding multiple links to the same table, but any time you have more than one field pointing at the same table, you’re setting yourself up for some complicated processing. It’s hard to know how many pointers you’ll need, and all of your code would have to look in multiple different places to establish connections. None of this is fun.
It’s fine, even normal, to have multiple foreign keys in a table, as long as they all reference different tables.
There is, however, a convenient way to represent many-to-many relationships without creating a tangle. Instead of putting pointers from one table to another inside of the table, you create a third table that contains pointers to the two other tables. If you need to represent multiple relationships between different rows in the two tables to be joined, it’s easy—just add another row specifying the connection in the table representing connections.
Figure B.5, “Connected tables in a database” shows the students table, a new courses table, and a new table connecting them. (For convenience of drawing, the courses table has its ID values on the right side, and the join table has its mostly useless ID in the middle, but it doesn’t really matter. You can leave IDs out of join tables entirely if you want.)
If you work through the connections, you can see that course 5125, Mathematical Opera, is popular, at least in these tiny fragments of what is probably a larger data set. It has Jules Miller, Greva James, and Giles Boschwick in it. Working the other direction, you can also see that Jules Miller is taking both Mathematical Opera and Lavatory Decorations of Ancient Rome. Using this approach, students can have many courses, and courses can have many students, and all our queries need to do is ask for all of the connections.
In addition to linking through keys, there’s one other critical aspect of database table design that you should know before embarking on writing applications: data granularity matters! If you read traditional explanations of relational databases, you’ll see a lot about normalization, which is the process of creating tables that can be easily manipulated through code.
Much of normalization is about reducing duplication, which is usually best done by breaking data into multiple tables, as shown earlier. Another key part, however, is deciding how small (or large) each field in a table should be.
In the students table, shown originally in Figure B.1, “The classic row–column approach to tables”, each piece of a student’s name had a separate field. Why? Well, it’s pretty ordinary to want to sort a list of students by last name. It’s also normal to leave out middle names in most correspondence. That’s much easier to do when the pieces are already broken out, and avoids the problem of figuring out which part of a name is which algorithmically. In the presenter’s table in Figure B.4, “Connected tables in a database”, it probably wasn’t worth breaking out those pieces—the name would go on a certificate once and never be examined again.
Doubtless, some purists would want those presenters’ titles and names broken into smaller pieces, and you could do that. The question, though, is always going to be what you want to do with the data. If you’re not interested in sorting the presenters’ names, it may not be worth the extra effort on your part of fragmenting them. Similarly, if you only use street addresses for mailing, it might make sense to keep them as one field rather than separating house number from street number.
Problems, of course, arise when you realize that you really did need to sort a list of addresses by street or presenters by last name. Splitting existing data into smaller pieces once you’ve already built an application can be extremely annoying. For your first few applications, you may want to err on the side of breaking things up, as it’s easier to recombine separate fields than to split them out again.
For more than a decade, most web applications that used a database used Structured Query Language (SQL) to move information into and out of databases. SQL is a powerful tool for creating and manipulating database structures, as well as for moving information in and out of those structures, but it’s tightly focused on database projects only. You can’t build a complete web application using SQL, so historically developers have written the bulk of their applications in another language, and then made SQL calls against a database. Developers needed to know both SQL and the other language.
Rails changes all of this, taking the position that it’s better to
manage data and logic in the same language, in this case Ruby.
ActiveRecord abstracts the SQL calls away, though they still exist if
you look through the development logs. At the same time, Rake and
migrations handle the care and feeding of the database, defining and
creating (or removing) tables. You define the tables in Ruby, and call
rake db:migrate to make things
If you already know SQL, you have a bit of an advantage when it
comes to debugging Rails applications by checking logs and tinkering
inside of the database. You may, however, have a disadvantage in getting
started with Rails, as Rails pretty much expects developers to put the
SQL toolkit away. There may be times when SQL is still actually
necessary, so Rails supports a
find_by_sql method, but
in general, if you find yourself writing SQL, odds are good that you
just haven’t found a better way to do things in Rails itself.
You do have one critical choice to make regarding databases, however: which database to use with Rails. By default, since Rails 2.0.2, SQLite is the default database. It’s easy to use with minimal configuration, keeps its information in a single (easily transferred) file, and is widely available.
For many applications, though, you will want to consider heavier-duty options that can handle more simultaneous connections. For many people, MySQL will be the right choice—heftier than SQLite, but not as intimidating as PostgreSQL. Bindings for all three are built into Rails by default, so that part’s relatively easy, and bindings for many other databases are available as plug-ins.
You don’t need to be a database expert to learn Rails. You will want to have administrators who know how to manage, optimize, and backup whatever database system you choose to use for deployment—but those issues should get addressed after you’ve finished learning Rails. You may want to pick up Learning MySQL (O’Reilly, 2006) if you’re new to relational databases and you want to take your knowledge to the next level.
If you enjoyed this excerpt, buy a copy of Learning Rails .
Copyright © 2009 O'Reilly Media, Inc.