I always enjoy a conversation with the folks from Splunk, a company I wrote up at LinuxWorld Expo 2006. Three of the major enhancements in their 3.0 release, being announced today at Interop, demonstrate a very flexible, interactive (dare I say it, Web 2.0-style) approach to their model of collaborative system troubleshooting.
One key to Splunk has always been its enormous repository of material about errors and events, submitted by system administrators. Sysadmins search this repository, called Splunk Base, for anything that can help them debug problems on their own systems: error messages, IP addresses, port numbers, and so forth.
Each company maintains a Splunk server (or several servers) on the organization’s local network to collect that site’s data. Users can connect to the Splunk server using a browser to search their data and interact with the information on Splunk Base. The server software has been downloaded by 100,000 users, though obviously a smaller number of them contribute to Splunk Base.
Splunk Base is already organized as a wiki, but users of 3.0 can now also create and upload self-contained bundles that others can download and add to their own Splunk servers. In this way, Splunk hopes to get more people to submit useful saved searches, reports, event classifications, scripts, and other sharable solutions.
Splunk has also hired a cohort of 30 experts to act as facilitators for particular areas. These experts will seed the areas with interesting content, find and encourage sysadmins to do more, clean up tags, and generally try to foster the kind of community feeling that makes end-users keep coming back, as well as contributing. Like many other sites (including O’Reilly), Splunk has found that you need these seeds in order to form the crystals of community, as well as to maintain order.
Scripts for storage
Splunk has always been able to index data from logfiles and network sources like syslog and let its users search that data in real time. Now Splunk has added a general data access feature: you can write a script to retrieve information (say, from the standard ps and vmstat commands) and send it directly into your Splunk server, then search it and correlate it with errors and other log events.
Michael Baum, CEO and co-founder of Splunk, says, “Haven’t you ever wished you could go back in time and look at the state your system was in when something started to go wrong?” The new scripting system allows you to do just that.
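To make the idea concrete, here is a minimal sketch of the kind of collection script described above. It is not taken from Splunk's documentation: I am assuming the server simply indexes whatever lines the script writes to standard output, and the names format_events and snapshot, the source= tag, and the choice of ps and vmstat flags are all my own illustration.

```python
import subprocess
import sys
from datetime import datetime, timezone


def format_events(source, output, now=None):
    """Prefix each non-empty line of command output with a UTC timestamp and a source tag."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"{stamp} source={source} {line.strip()}"
        for line in output.splitlines()
        if line.strip()
    ]


def snapshot(source, command):
    """Run one command and return its standard output as timestamped event lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return format_events(source, result.stdout)


if __name__ == "__main__":
    # Hypothetical command choices; the available flags differ between Unix variants.
    for source, command in (("ps", ["ps", "aux"]), ("vmstat", ["vmstat"])):
        try:
            for event in snapshot(source, command):
                print(event)
        except OSError:
            print(f"{source}: command could not be run", file=sys.stderr)
```

Run on a schedule, a script like this leaves behind a searchable record of what the process table and memory statistics looked like at each moment, which is what makes the "go back in time" question answerable.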
Real-time report generation
Ad hoc search is useful for investigating problems, but Splunk users also need bird's-eye views of their environments. The real power comes not only from retrieving data, but also from summarizing the output in structured reports and dashboards.
Now Splunk can combine the results of searches into charts in real time. They’ve always used an Ajax interface so sysadmins can drill down quickly through screens’ worth of data, but a new Flash interface allows the data to be combined and viewed in various graphical formats (which are quite attractive, I must add). Now, for instance, you can see how an activity changed over time on the ten sites where it was most active. You can map the trajectory of a problem on your system and compare it to the trajectory of somebody who went through the same thing earlier. You can create your own dashboard and move the reports you want to it.
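The "ten most active sites over time" report boils down to a simple aggregation, sketched below. This is only my illustration of the arithmetic behind such a chart, not Splunk's reporting engine; the function name top_sources_over_time and the (time bucket, source) event shape are assumptions.

```python
from collections import Counter, defaultdict


def top_sources_over_time(events, top_n=10):
    """Count events per time bucket for the top_n busiest sources.

    events is a list of (time_bucket, source) pairs, one per matching event.
    Returns {source: {time_bucket: count}}, busiest source first.
    """
    totals = Counter(source for _, source in events)
    busiest = [source for source, _ in totals.most_common(top_n)]
    series = defaultdict(Counter)
    for bucket, source in events:
        if source in busiest:
            series[source][bucket] += 1
    return {source: dict(series[source]) for source in busiest}


# Made-up sample data: three hosts, two hourly buckets.
sample = [
    ("09:00", "web1"), ("09:00", "web1"), ("10:00", "web1"),
    ("09:00", "db1"), ("10:00", "db1"),
    ("10:00", "mail1"),
]
print(top_sources_over_time(sample, top_n=2))
```

Each inner dictionary is one line on the chart: a count per time bucket for a single source.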
Community input, aggregation of readily available data that was previously allowed to slip away, tools for real-time graphing: these are common factors in many modern information systems. Splunk seems to get it.