O’Reilly & Associates has developed some extremely useful tools for analyzing the current state of the technology book market using nothing but open source technologies. This morning at O’Reilly’s FOO Camp 2004, Tim O’Reilly spent some time going over exactly what they’ve done and how they went about it.
Their internally developed tools begin with a data warehouse based on MySQL. They’ve designed the warehouse with a traditional star schema to allow reporting and analysis of data across many dimensions or categories.
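To make the star-schema idea concrete, here is a minimal sketch of what such a warehouse might look like. It is not O’Reilly’s actual design: every table name, column name and sales figure below is invented for illustration, and Python’s built-in `sqlite3` module stands in for MySQL so the example runs without a server. One central fact table holds the unit sales, and each dimension table (book, week, sales channel) describes one way of slicing them.

```python
import sqlite3

# Hypothetical dimensions: name -> (dimension table, join key, label column).
# None of these names come from O'Reilly's real warehouse.
DIMENSIONS = {
    "topic": ("dim_book", "book_id", "topic"),
    "week": ("dim_week", "week_id", "week_start"),
    "channel": ("dim_channel", "channel_id", "name"),
}


def build_warehouse():
    """Create an in-memory star schema: one fact table, three dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_book (book_id INTEGER PRIMARY KEY, title TEXT, topic TEXT);
        CREATE TABLE dim_week (week_id INTEGER PRIMARY KEY, week_start TEXT);
        CREATE TABLE dim_channel (channel_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            book_id INTEGER REFERENCES dim_book(book_id),
            week_id INTEGER REFERENCES dim_week(week_id),
            channel_id INTEGER REFERENCES dim_channel(channel_id),
            units INTEGER NOT NULL
        );
    """)
    return conn


def load_sample(conn):
    """Insert made-up rows; the numbers are placeholders, not real sales."""
    conn.executemany("INSERT INTO dim_book VALUES (?, ?, ?)",
                     [(1, "Placeholder Java Title", "Java"),
                      (2, "Placeholder PHP Title", "PHP")])
    conn.executemany("INSERT INTO dim_week VALUES (?, ?)",
                     [(1, "2004-09-06"), (2, "2004-09-13")])
    conn.executemany("INSERT INTO dim_channel VALUES (?, ?)",
                     [(1, "retail"), (2, "online")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 10), (1, 1, 2, 5), (1, 2, 1, 8),
                      (2, 1, 2, 3), (2, 2, 1, 4), (2, 2, 2, 6)])
    conn.commit()


def units_by(conn, dimension):
    """Total units sold, grouped by one of the whitelisted dimensions."""
    table, key, label = DIMENSIONS[dimension]  # KeyError for unknown names
    sql = (f"SELECT d.{label}, SUM(f.units) FROM fact_sales AS f "
           f"JOIN {table} AS d ON f.{key} = d.{key} "
           f"GROUP BY d.{label} ORDER BY d.{label}")
    return dict(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = build_warehouse()
    load_sample(conn)
    for dim in DIMENSIONS:
        print(dim, units_by(conn, dim))
```

The same fact table answers "sales by topic", "sales by week" and "sales by channel" with one join each, which is the main attraction of the star layout for this kind of reporting.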
One of the really innovative aspects of what they’ve done is how they’ve augmented the traditional weekly sales data provided by Nielsen BookScan with other data feeds from book sales sites on the Internet.
This ability to creatively find and harvest data from other places lets them find relationships between book sales and other events or trends, relationships that aren’t visible unless you have all the data in front of you and the right tools to analyze and display it.
Their visualization tool is very similar to the one Smart Money magazine provides for its Map of the Market. The Map of the Market lets users of the Smart Money website analyze stocks in different segments of the market and see which segments are performing better or worse than others. It also lets people quickly find individual stocks that are performing well above or below average.
Using open source database and visualization tools, O’Reilly has now created similar functionality for tracking the technical book market.
So what are some of the big trends in the technical book market? Here are some of the trends that Tim identified:
- About half of all books sold in America are sold through Amazon.com, B&N, Borders and B&N.com.
- .NET, PHP and C# book sales are currently growing while Java is falling off (although Java is still the largest category).
- Books on open source technologies (if you count books on Java and related open source Java technologies as ‘open source’) outsell books on all ‘proprietary’ technologies combined.
- Sales of books on Red Hat Linux (including Fedora) have dropped off considerably, although the overall sales of Linux books have remained pretty constant. It seems that books on other distros and general Linux tech books are growing fast enough to take up the slack.
The challenge for them now is to hack the ‘collective intelligence’ of the Internet: to work out what data might be lurking out there that would help them understand causes and effects better. For example, would analyzing current job postings on the “help wanted” sites of major newspapers help them see how the technology book market is reacting?
What data do you think might help understand or forecast what’s happening in the technology book market?
What data can be collected from the Internet to help understand trends in book sales?