Companies are constantly opening new veins of ore as they attempt to mine the Internet for useful information. Developers and open source system users will be particularly interested in a SourceLabs announcement of a service called Self-Support Suites that has been in beta since December. This tool combines enormous amounts of information indexed by SourceLabs from bug trackers, technical mailing lists, and other sites to help open source users diagnose problems. They’ve just put up a free download.

The proof of concept I heard from Byron Sebastian, CEO of SourceLabs, concerned a site that spent two weeks trying to track down the failure of an Apache Project module. SourceLabs’s system found a bug report containing the fix in a few minutes by matching a stack trace provided by the user against a stack trace posted with a question in a public forum. This search was more difficult than it might sound, because stack traces don’t match precisely and their contents are not unique strings that are easy to search for. Sebastian says that stack traces and log files tend to have the most useful information, but if other information were organized better, it might rise in value.
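
To give a feel for why inexact stack trace matching is harder than a string search, here is a minimal sketch in Python. It is not SourceLabs's method, which I have not seen. It assumes Java-style frames, throws away the line numbers that tend to differ between versions, and scores two traces by how much of their frame sequence they share. The function names, the regular expression, and the sample traces are all made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Matches a Java-style frame such as "at org.example.Foo.bar(Foo.java:42)"
# and captures only the fully qualified method name (assumed trace format).
FRAME_RE = re.compile(r"^\s*at\s+([\w.$<>]+)\(")


def normalize_frames(trace):
    """Return the method names of the frames, dropping files and line numbers."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    return frames


def trace_similarity(trace_a, trace_b):
    """Score two traces from 0.0 (nothing shared) to 1.0 (same frame sequence)."""
    frames_a = normalize_frames(trace_a)
    frames_b = normalize_frames(trace_b)
    if not frames_a or not frames_b:
        return 0.0  # nothing recognizable to compare
    return SequenceMatcher(None, frames_a, frames_b).ratio()


# Hypothetical traces: same failure, different line numbers, one extra frame.
user_trace = """java.lang.NullPointerException
    at org.example.cache.Store.get(Store.java:118)
    at org.example.web.Handler.handle(Handler.java:57)
    at org.example.web.Server.run(Server.java:210)"""

forum_trace = """Exception in thread "main" java.lang.NullPointerException
    at org.example.cache.Store.get(Store.java:121)
    at org.example.web.Handler.handle(Handler.java:60)
    at org.example.web.Dispatcher.dispatch(Dispatcher.java:33)
    at org.example.web.Server.run(Server.java:214)"""

unrelated_trace = """java.io.IOException
    at org.other.Reader.read(Reader.java:12)"""

score = trace_similarity(user_trace, forum_trace)
unrelated_score = trace_similarity(user_trace, unrelated_trace)
```

An exact string search for the user's trace would miss the forum post entirely, while the normalized comparison still rates the two as close. A real service would also need an index so it does not compare every pair of traces, which this sketch ignores.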

The current Self-Support Suites databases are aimed at Java developers and Linux users. The service is used mostly by individuals, although SourceLabs is talking to managers about site licenses as well.

Self-Support Suites is reminiscent of Splunk, a company I profiled in 2006 and 2007. Splunk is devoted to system administration, and provides tools for making sense of the loads of log files, error messages, and other information that system administrators routinely collect. Splunk facilitates searches on this often fuzzy data, and lets users flag common problems as “transactions” so they can track and handle them better.

For instance, one Splunk transaction might be “a spike of connections on a server, followed by a lost message.” Users can automate searches and reports, package them up neatly in order to run them regularly, and share them with other Splunk users through its Splunk Base.
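
The kind of pattern described in that example can be sketched in a few lines of Python. This is not Splunk's search language or its API, just an illustration of the idea. The `Event` record, the `find_transactions` function, and the thresholds (three connections within ten seconds, followed by a lost message within thirty seconds) are assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some arbitrary start
    kind: str         # "connection" or "lost_message" (hypothetical labels)


def find_transactions(events, spike_size=3, spike_window=10.0, follow_window=30.0):
    """Return (spike_start, lost_message_time) pairs.

    A pair is reported when at least `spike_size` connections arrive within
    `spike_window` seconds, and a lost message occurs no more than
    `follow_window` seconds after the first connection of that spike.
    """
    conns = sorted(e.timestamp for e in events if e.kind == "connection")
    losts = sorted(e.timestamp for e in events if e.kind == "lost_message")
    found = []
    for lost in losts:
        # Connections that happened shortly before this lost message.
        recent = [t for t in conns if lost - follow_window <= t <= lost]
        # Slide over them looking for a tight cluster of spike_size connections.
        for i in range(len(recent) - spike_size + 1):
            if recent[i + spike_size - 1] - recent[i] <= spike_window:
                found.append((recent[i], lost))
                break
    return found


# Made-up event log: a burst of connections at 100-105s, then a lost message.
log = [
    Event(0.0, "connection"),
    Event(50.0, "connection"),
    Event(100.0, "connection"),
    Event(102.0, "connection"),
    Event(105.0, "connection"),
    Event(120.0, "lost_message"),
    Event(300.0, "connection"),
    Event(400.0, "lost_message"),
]

transactions = find_transactions(log)
```

Only the first lost message is flagged, because the second one is not preceded by a burst of connections. Naming a pattern like this once and rerunning it on fresh logs is roughly what makes it worth packaging and sharing.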

The company has also recently released a set of APIs so users can write applications that search their internal databases of events for such purposes as security tracking and compliance. These applications can also be shared on Splunk Base.

Black Duck Software is also experimenting with culling publicly available data for useful business applications. It established its reputation as a validation service that assures companies they are using open source licenses properly. As its offerings evolved, it effectively turned into a service that let companies advise their developers which open source projects they could reuse software from.

With this process in place, Black Duck added security to licensing as a criterion for companies to use when evaluating software. Black Duck combines security reports such as CERT advisories (using some heuristics I don’t know) to create a measurement of each project’s security to guide companies in choosing software.

Soon they plan to add another criterion, support, based on factors that include the size and participation of the community.

Services such as these from SourceLabs, Splunk, and Black Duck should lead to changes in the tools that developers and users employ to submit bug reports. The more data users provide, and the more that data conforms to a format that is easy to search and index, the more likely it is to turn up in searches later.

I have already suggested, for educational purposes, a more structured repository for user questions. Bug tracking systems are highly structured and therefore very easy to search. A similar standard for user forums would encourage users with questions to collect and provide useful information from the start, which would make it easier to answer their questions and leave the information in a more useful format for later searching.

The section on Splunk was updated on March 22 after discussions with Christina Noren, VP of Product Management.