Creating neat, informative summaries out of huge lists of raw data is a common challenge. And while Excel gives you all the tools you need to create such summaries, the actual work of writing formulas, cutting and pasting information, and organizing your totals into a new table can be extremely tedious. Even worse, this approach isn't very flexible. For example, once you've created the perfect summary that compares, say, sales in different regions, if you want to compare sales across different product lines or different customers, you'll need to start from scratch and build a whole new report.
Fortunately, Excel has a feature called pivot tables that can solve all these problems. Pivot tables quickly summarize long lists of data. By using a pivot table, you can calculate summary information without writing a single formula or copying a single cell. But the most notable feature of pivot tables is that you can arrange them dynamically. For example, say you create a pivot table summary using raw census data. With the drag of a mouse, you can easily rearrange the pivot table so that it summarizes the data based on gender or age groupings or geographic location. This process of rearranging your table is known as pivoting your data: you're turning the same information around to examine it from different angles.
Pivot tables are a hidden gem in Excel. Many otherwise experienced spreadsheet fans avoid them because they seem too complicated at first glance. The real problem is that pivot tables are rarely explained properly. Most books and the online Excel help use no end of cryptic jargon like "cross-tabulated computations" and "n-dimensional analysis." But if you stick with this chapter, you'll discover that pivot tables are really just a convenient way to build intelligent, flexible summary tables—nothing more, and nothing less.
When you analyze large amounts of data, you can look at the same information in many different ways. How you organize and group the data often determines whether you find or overlook important trends.
For example, consider the small table of information shown in Figure 1. This table lists all the customers of a small business, along with information about their gender, the city they live in, and their level of education. Looking at this table, an important question comes to mind: is there a relationship between these different pieces of information and the amount of money a customer spends?
Figure 1. In this example, there are only 10 records, so patterns aren't difficult to spot. However, if you extended this list to hundreds or thousands of rows, you would definitely need a summary table to spot any relationships that might exist.
To look for trends and patterns in the customer list, it helps to build a summary table—a table that tallies key amounts, like the average amount spent for a customer in a specific city, education level, or gender. However, there are several potentially important relationships, and, therefore, several types of summary tables that you could create. Pivot tables are the perfect tool because they give you almost unlimited flexibility when you want to figure out different relationships. But before you learn about how to build pivot tables, it first helps to understand what life is like in Excel without them—because only then can you see why pivot tables make sense and decide whether you need them in one of your own workbooks.
The most basic way to calculate summary information is to
SUMIF( ) and
) functions. For
example, to find the average annual purchases for an individual in
New York, you first calculate the total purchases made by all New
York customers. Here's the formula you need:
=SUMIF(C2:C11, "New York", E2:E11)
This formula scans the City column (C2 to C11) looking for the text. Every time it finds a match, it adds the corresponding purchase amount from the Annual Purchases column (E2 to E11).
Once you have the total purchases, you need to divide this figure by
the number of New York customers. To count the number of New York
customers, you use the
COUNTIF( ) function.
Here's the revised formula for finding the average annual purchase amount:
=SUMIF(C2:C11, "New York", E2:E11)/COUNTIF(C2:C11, "New York")
Finally, you can make this formula more generic so that you don't need to create a new version for each city. Instead, change the formula so that it retrieves the text it should match (the city's name) from a cell just to the left of where you're going to place the formula. (As shown in Figure 2, this group of "city" formulas resides in cells B15, B16, and B17.) Then, you can copy your formula into multiple rows. The formula won't change, but, in each row, the cell on the left will have a different value, instructing the formula to search for customers in a different city.
Figure 2. These three summary tables retrieve their data from the table shown in Figure 1 and calculate average purchases by grouping the customers into different categories. The purchase-by-city summary shows that there isn't a significant amount of difference based on location (although Seattle customers tend to spend a little less). The purchase-by-education summary demonstrates a dramatic difference, with less educated individuals making smaller purchases. Finally, the purchase-by-gender comparison turns up no variance at all.
Here's the perfected formula:
=SUMIF($C$2:$C$11, A15, $E$2:$E$11)/COUNTIF($C$2:$C$11, A15)
Note that in this formula, all the search ranges are fixed as absolute references using the dollar sign ($). The search text isn't fixed—that way, when you copy the formula to a new row, the formula uses the search text on that row.
Figure 2 shows several summary tables that use this technique for calculating averages.
In this example, building each table isn't terribly difficult, but the situation clearly becomes more tedious the more ways you want to compare the same data. It's not difficult to imagine a more realistic scenario where you might want to look at customer purchases based on age bracket, income level, and the answers to different customer survey questions. To get the full picture with this information, you would need to build each table from scratch.
In fact, in some of these more complex scenarios, you might need to group and then subgroup your information. Figure 3 shows a more advanced example of a table that calculates the variance in average annual purchases by city, and then shows the subdivided totals in each city by gender.
Figure 3. This summary table is a little more advanced. It groups and then subgroups data, which lets you find hidden trends. For example, in this case it identifies New York females as the best purchasers. Of course, there are far too few rows in this example for the results to be truly persuasive. In a table with thousands of rows, however, a grouped and subgrouped breakdown becomes much more meaningful.
The summary in Figure 3 performs
a two-dimensional comparison. In other words, it
compares two different groupings—one by city in different rows,
and one by gender in different columns. This is a step up from the
one-dimensional summary tables you saw earlier, but
it's also far more difficult to correctly calculate.
You could use the
SUMPRODUCT( ) function, or create the table
in two steps (first grouping the records by city, and then totaling them by gender).
In either case, life gets much more complicated, and
that's when pivot tables really shine.
With a pivot table, you can create summary tables like the ones shown in Figure 2 and Figure 3 just by choosing the columns you want to compare. And once you've built your summary, you don't need to stick with it—instead, you can transform a purchase-by-education table into a purchase-by-city table just by dragging and dropping. You can even drill down into the details, apply filter criteria, and apply advanced subtotaling calculations. The following sections describe all of these techniques.
Now that you've learned the role that pivot tables play in summarizing data, it's time to create your own. Before you begin, you need to have a long list of raw data that you want to summarize. You could use the customer list from the previous example, but it's too small to really demonstrate the benefits of pivot tables. A better example is something like the list of order information shown in Figure 4.
TIP: You can also build a pivot table based on records that you select from an external database. But before you take this step, you must configure your database as an Excel data source.
Not all data is suited for a pivot table. To work well, your data needs to meet a few criteria:
Figure 4. This worksheet shows some entries from a list of 2,155 grocery store items that have been ordered. Lists like this make for great pivot table candidates.
It must include at least one column that has duplicate values. For example, in the order table shown in Figure 4, there are multiple records with the same value in the Customer column. Accordingly, you can create a separate group of ordered items for each customer..
It must include some numeric information. This is the information you'll use to create subtotals. Often, you'll be interested in generating a simple count, total, or average, although you can also find maximum, minimum, and standard deviation, and use your own formulas.
NOTE: It's technically possible to create subtotals without using a numeric column. In this case, the subtotals just count the number of values in the group. This approach is occasionally useful, but it's not as powerful as other types of subtotals.
The order information table is perfect for a pivot table because there are several columns you can use to group the order rows. These include:
Product and Category. Find out how well specific products are selling or what the hottest product categories are.
Customer. Find out who's making the most purchases.
Ship City and Ship Country. Find out where the majority of your customers reside.
Remember, a pivot table can handle all of these comparisons. You don't need to choose one or another before you start building it.
TIP: The best way to learn about pivot tables is to perform the steps in this chapter, and then start experimenting. If you don't happen to have a table with hundreds of records on hand, you can download the workbook shown in Figure 4 from the "Missing CD" page at MissingManuals.com. It gives you 2,155 rows to summarize and pivot to your heart's content.
Creating a new pivot table is a two-step process. First you need to run the PivotTable and PivotChart Wizard, which asks you to identify the data you want to summarize and select the location where you want to place the pivot table. The next step is to actually define the structure of the pivot table and try out different ways of organizing and grouping your data.
The following steps lead you through the first step in creating a new pivot table:
Select the range of cells that you want to use for your pivot table.
If you plan to add more rows later on, it might be worthwhile to use a data list instead. That way, when you add rows to the data list and refresh the pivot table, Excel automatically includes them in the pivot table.
Select Data --> PivotTable and PivotChart Report.
The first step of the PivotTable and PivotChart Wizard appears. (Figure 5 shows all three steps in the pivot table creation process.)
Select the type of data you're using. Usually, this is "Microsoft Office Excel list or database," which corresponds to a range of cells or a data list on the worksheet.
If you want to create a pivot table based on rows from an external database, choose "External data source." You can also create a pivot table that uses multiple separate ranges, which is occasionally useful if you need to combine reports in separate worksheets or workbooks. In this case, select "Multiple consolidation ranges."
Choose PivotTable for the type of report.
You can also create a pivot chart and pivot table in one step by selecting "PivotChart report (with PivotTable report)." The end of this chapter explains pivot charts.
Click Next to continue.
In the final step, you can choose a location for the pivot table.
Select "New worksheet" to create a new worksheet for your pivot table, which is usually the easiest.
Pivot tables are fairly complex creations, so the easiest way to manage them (and keep them separate from the rest of your data) is to pop them into a new worksheet. Alternatively, you can choose "Existing worksheet" to insert your pivot table on a worksheet that's already in your workbook. In this case, you need to specify the cell reference for the top-left corner of the pivot table. If there's data under this cell or to the right of it, Excel may overwrite it as it generates the pivot table. Usually, the best approach is to place a new pivot table on a separate worksheet.
The last step of the wizard also gives you the chance to set some miscellaneous options (by clicking the Options button) or start laying out your pivot table (by clicking the Layout button). Stay away from these options for now. You'll find that it's easier to organize a pivot table on your worksheet after you create it.
Figure 6. This worksheet shows a newly created pivot table that doesn't yet contain any information. When you select a region inside the pivot table, Excel displays the PivotTable toolbar and shows the PivotTable Field List (as shown on the right), which contains a list of all the columns in the data that you're summarizing.
To build a pivot table, you simply drag columns from the PivotTable Field List on the right side of the Excel window, and drop them into one of the regions on the pivot table. As you work, Excel generates the pivot table, updating it dynamically as you add, rearrange, or remove columns.
NOTE: Excel refers to all your source data's columns as fields.
To understand how to fill up a pivot table with data, you need to know how each region works. Altogether, a pivot table includes four regions:
Data Items. These are the fields that you want to subtotal. Usually, this is the numeric information you use to build averages and totals.
Column Fields. Often, you want to subdivide your data in more than one way at once. Like row fields, column fields create groups. Excel adds one column for each group, according to the field you choose. For example, if you use Category for a row field and Ship Country for a column field, you'll wind up with a table that divides sales figures into rows by product category, and then divides each category row into columns by country.
It really doesn't matter whether you use a field for row grouping or column grouping. The pivot table shows the same data either way, but one approach may be more readable than another. For example, if you have a field with extremely long names, it will probably work better as a row field than as a column field (where it would stretch out the width of the whole column).
Also, consider how many different groups you want to create. For example, if you want your pivot table to compare sales by category and country, and your list features 5 categories and 20 countries, you'll probably be best off if you use the country field as a row field and the category field as a column field. That keeps the table long and narrow, which is easier to read and print.
To get a better understanding of how to create a pivot table, it helps to follow along with a basic example. These steps lead you through the process of creating a summary that compares the products and shipping locations shown in Figure 4.
From the PivotTable Field List, drag the Ship Country field to the Drop Column Fields Here region.
When you drop the field, Excel fills in the names of all the countries from the list from left to right, in alphabetical order. In other words, each country is listed in its own column.
TIP: If dragging and dropping is a little too awkward, there's another way to lay out a pivot table. In the PivotTable Field List window, simply select the field you want to add to the pivot table, and then, in the Add To list underneath, choose where you want to place the field. Then, double-click the field name to add it.
Figure 7. In this example, the list of products has already been added to the row area (column A), and the second grouping criteria (the list of columns for each country) is being dragged onto the chart. Notice that once a field is linked, its name is listed in boldface in the PivotTable Field List.
Now you need to choose what data you want to examine. Drag the Quantity field over the Drop Data Items Here region.
This step is designed to actually fill the table with data—specifically, the numbers of products that were ordered by customers in various countries. It's helpful to remember what's in the table just prior to this step: a list of products in column A and, from column B on, each column is labeled with a different country.
Once you complete this step, Excel generates the pivot table. However, Excel doesn't necessarily perform the calculations you want. In this case, Excel's standard choice is to perform a row count in each cell, which calculates how many order records it found for a given product and shipped to a given country. However, in this case, you actually want to count the total number of units shipped, not just the number of orders. That means that instead of counting the number of matching rows, you want to add the Quantity value for each row in the group. You'll make this change in the next step.
Double-click the Count of Quantity field header, which appears at the top-left corner of the table. In the "Summarize by" list, choose Sum instead of Count.
This option tells Excel to total the Quantity rows instead of counting how many there are.
NOTE: Excel provides support for a wide range of automatic subtotaling calculations. You can use counts, averages, sums, or find the maximum value, minimum value, or standard deviation.
Excel refreshes the pivot table with the new information, as shown in Figure 8.
Figure 8. Top: When you drag a data item onto your pivot table, Excel makes a quick decision about what type of subtotaling it thinks makes most sense. In this example, it chooses to simply count the number of orders for a given product and sent to a given country. In other words, if you look in the Argentina country column and the Boston Crab Meat product row, you'll see that there's one order. Bottom: In this case, it may make more sense to total the quantity of units in every order. Once you make this change—by double-clicking the Count of Quantity field header and, in the "Summarize by" list, choosing Sum instead of Count—you'll see that 20 units of Boston Crab Meat were shipped to Argentina.
NOTE: Pivot tables also calculate row and column subtotals. For example, if you want to find the total number of units shipped for a given product across all countries, scroll to the far right end of the chart. If you want to find the total number of units sold in a given country, scroll to the totals at the bottom of the chart.
This example built a fairly sophisticated two-dimensional pivot table. Most of the pivot tables you'll see in real life are two-dimensional, which means that they compare two different groupings (one represented with rows, and the other represented with columns). Of course, there's no reason you can't create simpler one-dimensional pivot tables. All you need to do is leave out either the column or row fields. For example, Figure 9 shows a pivot table that simply totals the number of units sold for particular products.
Unlike most other elements in Excel, pivot tables don't refresh themselves automatically. That means that if you change the source data, the pivot table may show out-of-date totals. To correct this problem, you can refresh the pivot table by moving to one of the cells in the pivot table and selecting Data → Refresh Data (or by clicking the exclamation mark on the PivotTable toolbar). This action tells Excel to scan the source data and regenerate the pivot table.
Figure 9. This pivot table is one-dimensional. It groups orders by product, but it doesn't subgroup them by country. When creating this pivot table, don't drop anything in the Drop Column Fields Here region.
So far, you've seen how to use a pivot table to quickly build complex summaries. However, pivot tables have another key benefit: flexibility. There's no limit to how many times you can move fields or recalculate your summary so that it performs different calculations.
To change a pivot table, you can use the following techniques:
To remove a field from a pivot table, click the field header (which looks like a gray box containing the name of the field) and drag it back to the PivotTable Field List window. The mouse pointer changes to an X symbol to indicate you're removing the field.
To move a field from one position to another, just drag the field header to the correct region. For example, you can reverse the example shown earlier by dragging the column field (Country) to the row region, and dragging the row field (Product) to the column region.
Figure 10 shows one way you could rearrange the pivot table shown in Figure 8.
Figure 10. Another way to organize the data shown in Figure 8 is to see how the product categories perform in each of the countries, as shown here. This summary table makes it easy to spot categories that do unusually well in certain countries.
Editor's note: The content for this article was excerpted from Chapter 20 of O'Reilly's Excel: The Missing Manual, which covers Excel 2002 and 2003, the most recent versions for Windows at the time of this writing. If you use Excel, this book should be on your shelf.
Matthew MacDonald is a developer, author, and educator in all things Visual Basic and .NET. He's worked with Visual Basic and ASP since their initial versions, and written over a dozen books on the subject, including The Book of VB .NET (No Starch Press) and Visual Basic 2005: A Developer's Notebook (O'Reilly). His web site is http://www.prosetech.com/.
Return to the Windows DevCenter.
Copyright © 2009 O'Reilly Media, Inc.