Recently I conducted a survey of members of a professional group, using an online survey service. The data from the survey was provided in a spreadsheet, along with simple breakdowns of the results (for example, numbers of males and females, or numbers of people in each age group).

In addition, I needed to make tables or graphs showing combinations of the data (for example, number of males in each age group or number of females in each salary group).

I’ve had very little experience setting up formulas in spreadsheets. But I wasn’t concerned, because lots of people do this sort of basic analysis, so I assumed that information on how to do it would be readily available. Surely I could find an appropriate spreadsheet function or formula, or even better a template, and apply it to my data.

No such luck. I’m convinced that the information is out there on the Web somewhere, but I haven’t been able to find it. I wasn’t surprised to not find much detail about OpenOffice.org Calc usage, but I did expect to find a clue in Excel literature. Despite some differences in notation, Excel and Calc use essentially the same function names.

I waded through the descriptions of functions in the OpenOffice.org online help and in some Excel books, but many of the descriptions didn’t make much sense to me and even when they did make sense, the examples didn’t seem to fit my situation.

The closest match I could find was DCOUNT. I could get it to work on any one combination of cells (for example, females in age group 3), but not in a way that would generate a table of results without doing a lot of manual work. Perhaps I’ve missed some trick to make this function do what I want.

COUNTIF also seemed like a good candidate, but I couldn’t find a way to get that to work on two columns of data instead of just one.

SUMPRODUCT does the job

Finally I stumbled upon a great article by John Walkenbach titled “Count and Sum Your Data in Excel 2002″ (http://www.microsoft.com/office/previous/xp/columns/column10.asp), which gave me the answer. Scroll down in the article until you get to “Conditional Counting and Summing Using Multiple Conditions.” This does exactly what I want.

Here’s the basic formula for the Gender-Age Group table:

=SUMPRODUCT((A2:A141=1)*(B2:B141=1))

The parameters vary depending on the number of rows of data and the value of the variable I’m analyzing. To make this work on an array of data, press CTRL+SHIFT+ENTER after entering the formula.

Replicating the formula into other cells

Now that I had the basic formula to do the calculation for one cell of my results table, I needed to replicate the formula in the other results cells. Here are the formulas for the other cells in the first row of the Gender-Age Group table:
=SUMPRODUCT((A2:A141=1)*(B2:B141=2))
=SUMPRODUCT((A2:A141=1)*(B2:B141=3))
=SUMPRODUCT((A2:A141=1)*(B2:B141=4))

And here’s the formulas for the second row of that table:

=SUMPRODUCT((A2:A141=2)*(B2:B141=1))
=SUMPRODUCT((A2:A141=2)*(B2:B141=2))
=SUMPRODUCT((A2:A141=2)*(B2:B141=3))
=SUMPRODUCT((A2:A141=2)*(B2:B141=4))

These were easy to input by copying and pasting the basic formula and changing the parameters as needed (remembering to press CTRL+SHIFT+ENTER), but it’s still a tedious and time-consuming process when you have lots of survey questions.

I’m still hoping to find a trick to doing it with less manual fiddling about.

I’ve put on my website a longer version of this article, including some sample data and results and a more detailed explanation. http://www.taming-openoffice-org.com/calc/surveydata.htm

Do you have a better way to accomplish this job using a spreadsheet? I’d love to hear what it is!