Visualizing Database Information with Tableauby Ben Lorica
Tableau Software is a provider of visual analysis and reporting tools. Although there are several tools available to help users efficiently and easily create pivot tables or cross-tabulations, Tableau allows you to take it one step further: visualize the cross-tabulations in real time. With Tableau, you can drill-down database tables, visually.
Tableau's products allow a business user to plow through large dimensional databases quickly. As we discovered, it quickly generates graphs even against large database tables. By giving an analyst the ability to spot trends and relationships visually, information buried in the data becomes more readily available.
Tableau's products came out of R&D projects of Stanford University professor Pat Hanrahan. Hanrahan was a founding employee of Pixar and was the chief architect of RenderMan--a graphics protocol widely used in the film industry. Led by Hanrahan and Chris Stolte, the technologists at Tableau developed VizQLTM (Visual Query Language), a declarative language that allows a user to interact with a database and get graphical/visual results. Tableau's easy-to-use and intuitive GUI hides VizQL, although there are plans to expose it in future versions.
Installation and Data Access
Tableau is a Windows client application. Installation on a 512MB Windows XP machine was straightforward. Unlike other analytic and reporting products, installation and connecting to your data sources is simple: your team of analysts can be up and running and using the tool in minutes.
The current version can access Microsoft Data Sources (Excel, Access, SQL Server), MySQL, Oracle, Hyperion Essbase, IBM OLAP Server, and delimited text files. The software comes with some sample data in Excel. After installation, we had no trouble accessing the MySQL server installed on the same machine.
The quickest way to familiarize yourself with Tableau is to take a few minutes to go through the Getting Started Tour, found under the Help menu.
In Version 1.0, Tableau assumed that all of your data was in one database table or delimited text file. Version 2.0 places no limits on the number of table joins. Table joins are easy to perform through the GUI, and both inner and outer joins are available.
Version 2.0 allows the user to perform analysis on a subset of a larger data set: for extremely large databases, users can perform an initial analysis on a subset of a table (or the resulting join of multiple tables) before performing the final analysis on all the data. You can extract the subset in three ways: using filters, randomly, or by simply limiting the number of records. Exploratory analysis on a smaller random sample speeds up performance, and Tableau also allows offline analysis on the extracted sample. This convenient feature allows you to perform offline analysis and resume analysis using your most current data once you are able to reconnect to your network.
Once you connect to a data source, Tableau automatically divides the fields of your data source into dimensions or measures. In our example, we connected to a MySQL table consisting of some fake weekly book sales data, and joined to another table (using the ISBN), which contained attributes of a book: the programming language, DBMS, operating system, and where it lies in a product hierarchy (two database fields representing the two levels in a hierarchy).
In Tableau, you perform analysis by dragging fields into "shelves," which are places in the user interface where the fields being used are displayed. We found this paradigm to be very intuitive and quickly adopted to this mode of analysis. Cross-tabs and pivot tables are easy to generate in Tableau. In our example involving the join between two tables, we found that Tableau generated such tables quickly (see Figure 1).
Most business intelligence tools are capable of generating pivot tables and cross-tabs, but with Tableau, you are able to do so within minutes of installing it: simply point it to a data source and you are ready to pivot and drill-down to your heart's delight. Other tools require the creation of cubes or other special data structures.
Cross-tabs and pivot tables are interesting, but it is frequently difficult to discern patterns when staring at numerical information. Processing large quantities of information in graphical form is easier. Tableau allows you to start analyzing the same information visually, and it generates graphs as quickly as it generates tables (see Figure 2).
Small dimensions (dimensions with a few distinct categories) are easy to work with in tabular form, and the information become easier to process visually using Tableau's graphs. Large dimensions are a challenge even with Tableau, but the ability to filter and work through large dimensions in smaller chunks helps. A new feature of Version 2.0 (animation or data playback), expedites the analysis of large dimensions.
As a default, Tableau uses the same scale range (the minimum and maximum of a scale) for a group of graphs, a conservative approach that guards against confusion and the potential for misinterpretation. However, for a large dimension with many categories, a consistent scale can lead to some visually unsatisfactory graphs. Tableau's answer is a variable scale axis feature that allows a user to set the scale range for each individual chart in a group.
New features in Version 2.0 address the scaling problem: dynamic calculations and animation. In our example, we created pages of graphs using the categories and used the animation feature to scan all the categories separately. Animation also allows you to process large dimensions in an efficient manner. Setting up an animation takes only two mouse-clicks. You can also take advantage of a new feature called dynamic calculations. In our example, we quickly plotted the year-over-year percentage growth by category, thus allowing us to compare categories using the same scale. Note that Tableau allows you to do these dynamic calculations in a few steps; you don't have to perform these complex calculations in advance in your database (see Figure 3).
Drilling-down is the process of adding more detail. As an example, you can start with sales by year, for a region and drill-down geographically by adding states. In Tableau, you can accomplish this by adding more fields in the appropriate shelves. Although other tools allow you to drill-down in tabular format, Tableau allows you to drill-down visually and in tabular form. Being able to visually explore our data in this manner allowed us to detect patterns and trends quickly. Tableau is also able to recognize the hierarchies present in multidimensional data sources, in which case drilling down amounts to clicking on the plus sign next to a field when it is on an appropriate shelf. Tableau automatically creates hierarchies for date fields (year, quarter, month, and day) and you can easily reset the start of the fiscal year.
There are two other new features in Version 2.0 worth mentioning: dynamic data inspection (DDI) and exploratory modeling. DDI allows you to isolate outliers by highlighting them in a graph and excluding them from further analysis. A simple mouse-click allows you to re-introduce the excluded points in a later analysis. DDI also allows you to highlight and inspect regions of a graph. Exploratory modeling makes linear models and trendlines available in Tableau. These easy-to-use analytic tools are now available to your team of analysts, reducing the need for other statistical packages. The analytic tools also fit nicely inside the rest of Tableau: analytics and trendlines get recomputed with the ease and speed Tableau generates families of graphs.
In the course of data exploration, an analyst might uncover a graph that she wants to disseminate. In Tableau, you can easily copy an image and paste it to any Office application. Like a web browser, there is a back button that allows you to go back to previous steps in the analysis. As with a database reporting tool, an analyst can save graphs and tables in a workbook and rerun them later on the same database, or run them on a different database with the same field names.
Tableau takes minutes to install, is easy to learn, and definitely fills a need. We can think of no other tool that allows an analyst to explore dimensional data visually with so little setup, just as well as it allows traditional tabular exploration. Version 2.0 now adds some analytic tools, better data access through table joins, visual data exclusion and exploration, and animation. The animation feature is well suited for viewing large categories and time-series data.
Tableau is a proprietary Windows client application. As Linux and Mac users, we eagerly await ports to our favorite operating systems. Geographic mapping is not available, although there are plans to incorporate it in future versions. The open source PostgreSQL database is currently not supported. Although it provides some simple modeling and forecasting capabilities, if you need sophisticated predictive or statistical modeling, stick to your favorite statistical or data mining software.
Tableau is a welcome tool in the arsenal of an analyst working on dimensional data. Don't take our word for it! Download the full-fledged Tableau product as part of a two-week trial, and be amazed at how quickly you can install and use it. Once you go down the path of visual exploration, you may want to replace your current business intelligence and reporting tools.
- Microsoft Windows XP, or Windows 2000
- 128MB RAM recommended
- 50MB free disk space for installation
- Web browser (for example, Microsoft Internet Explorer or Netscape Navigator)
- Microsoft Excel or MS Access (versions 2000 or later)
Database compatibility requirements:
- Microsoft Excel: Excel 2000 or later
- Microsoft Access: MS Access 2000 or later
- MS SQL Server: SQL Server 2000 or later
- MS Analysis Services: MSAS 2000 or later
- MySQL v3.23 or later
- Oracle v9i or later
- Hyperion Essbase: Version 7.1, 7.x or later
- IBM DB2 OLAP Server: Version 8.1 or later
Ben Lorica is the Senior Analyst in the Market Research Group at O'Reilly Media, Inc..
Return to O'Reilly Databases