My company rolls its own version of Xerces for our products. We can add fixes or enhancements without fear of conflicts. Over the years, the list of things to do has shrunk. Now it is just about down to removing the HTML stuff, adding a SAX feature to ignore all entity references (editors need this), and customizing the horrible validation messages (humans need this). One part of maintaining the in-house fork is running program checkers on each Xerces release to gauge its quality.
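For context, stock Xerces already exposes a standard SAX2 feature that gets part of the way to ignoring entity references: telling the parser not to fetch external general entities, so they are reported via `skippedEntity()` instead of being resolved. A minimal sketch (the class name and sample document are mine, not from our fork, which adds its own feature on top of this):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipEntities {
    // Parse a document that references an external entity, without fetching it.
    static boolean parseIgnoringExternalEntities() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Standard SAX2 feature: do not include external general entities.
        // The parser calls ContentHandler.skippedEntity() for them instead.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE doc [<!ENTITY ext SYSTEM \"http://example.invalid/missing.txt\">]>"
                + "<doc>&ext;</doc>";
        // Succeeds without ever trying to resolve http://example.invalid/...
        reader.parse(new InputSource(new StringReader(xml)));
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parsed: " + parseIgnoringExternalEntities());
    }
}
```

That only covers external entities, which is why an editor-friendly "ignore all entity references" still needs a custom feature.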
Verdict? Pretty good. The best yet, as far as automated software checks go: as a user of Xerces, it was a very encouraging result for me.
Here are the results for Xerces 2.9 (released late November 2006):
Changing the JLint settings to suppress half the problem categories, it reports 1460 issues. Most of these are to do with thread-safety or scoping, and so not interesting for my requirements. But there are a few concerning variables that may hold null values that are more reasonable. And it flags a handful of possible array out-of-bounds accesses: if one is not a bug, it is at least worthy of a comment in the code to explain why it never happens.
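The shape of finding JLint is after here is roughly this (a made-up illustration, not actual Xerces code):

```java
public class NullAndBounds {
    // A lookup that can legitimately return null.
    static String find(String[] table, String key) {
        for (String entry : table) {
            if (entry.equals(key)) return entry;
        }
        return null;
    }

    public static void main(String[] args) {
        String[] table = {"alpha", "beta"};

        // JLint-style complaint: find() may return null, so calling a method
        // on the result without a check risks a NullPointerException.
        String hit = find(table, "gamma");
        String safe = (hit == null) ? "<none>" : hit.toUpperCase();
        System.out.println(safe);

        // JLint-style complaint territory: index arithmetic near the bound.
        // table.length - 1 is the last valid index; table[table.length] would
        // throw -- and if a bound really is safe, a comment should say why.
        int last = table.length - 1;
        System.out.println(table[last]);
    }
}
```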
Changing JLint to report the other half of the problem categories, I get 1499 issues. A lot of variable shadowing, a lot of strings compared as objects, a lot of hashCode() overridden without equals(): nothing that screams for attention.
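Concretely, those last two patterns look like this (illustrative code, not lifted from Xerces):

```java
import java.util.HashSet;
import java.util.Set;

public class JLintPatterns {
    // hashCode() overridden without equals(): equal-looking instances still
    // compare by identity, so a HashSet happily stores "duplicates".
    static final class Symbol {
        final String name;
        Symbol(String name) { this.name = name; }
        @Override public int hashCode() { return name.hashCode(); }
        // equals() deliberately not overridden -- the JLint complaint.
    }

    public static void main(String[] args) {
        // Strings compared as objects: == tests identity, not contents.
        String a = new String("xerces");
        String b = new String("xerces");
        System.out.println(a == b);        // false: two distinct objects
        System.out.println(a.equals(b));   // true: same characters

        Set<Symbol> set = new HashSet<>();
        set.add(new Symbol("id"));
        set.add(new Symbol("id"));
        System.out.println(set.size());    // 2, not the 1 you might expect
    }
}
```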
While that is a lot of issues, they seem to be the kinds of nannying issues you expect from JLint. It does suggest, however, that there are still a lot of i's to dot and t's to cross in the code; I suspect fixing these would mostly make JLint shut up rather than fix real errors, but errors are always where you didn't expect them. So from JLint, the verdict for non-threaded use is good. Threaded use is another matter: I think what JLint's numbers suggest is that it would be worthwhile for some organization that uses Xerces in a threaded server to do a more complete code review of the synchronization and deadlock issues.
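A typical example of the kind of synchronization finding I mean is unguarded lazy initialization — harmless in single-threaded use, a race under threads (again my own illustration, not Xerces code):

```java
public class LazyInit {
    private static Object instance;   // shared mutable state, not volatile

    // JLint-style report: callable from multiple threads but not
    // synchronized; two threads can both see null and both construct,
    // handing out two different "singletons".
    static Object getUnsafe() {
        if (instance == null) {
            instance = new Object();
        }
        return instance;
    }

    // One safe fix: the initialization-on-demand holder idiom. The JVM
    // guarantees Holder's static initializer runs exactly once.
    private static final class Holder {
        static final Object INSTANCE = new Object();
    }

    static Object getSafe() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(getSafe() == getSafe());  // true: one instance
    }
}
```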
I bump the out-of-range metrics defaults up to double their usual values: I'm only interested in extremes here, and if I recall correctly the Eclipse metrics tool fails to exclude switch statements from cyclomatic complexity and the like, so the numbers will often be overstated anyway. As I'd expect, the classes for regular expressions and grammars are off the scale for the McCabe metric and for the lines-of-code metric: this is the switch-statement effect. Actually, since Xerces is mainly made up of parsers and switches of various kinds, the metrics are pretty unusable for it. I used to run a proprietary tool from India that was much better. The conclusion: many of the Java files are too big or too complex for comfort, and the lack of comments only makes it worse. But the measures are unreliable.
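The switch-statement effect is easy to see: every case adds a decision point to the McCabe count, so a flat dispatch over a token alphabet scores as "complex" even though it is trivially regular. A table-driven rewrite of the same dispatch scores as simple (a sketch with my own names, not Xerces internals):

```java
import java.util.HashMap;
import java.util.Map;

public class SwitchEffect {
    // Each case adds one to cyclomatic complexity, so this grows linearly
    // with the alphabet even though nothing here is genuinely complex.
    static String describeBySwitch(char c) {
        switch (c) {
            case '<': return "open tag";
            case '>': return "close tag";
            case '&': return "entity start";
            case ';': return "entity end";
            default:  return "text";
        }
    }

    // Same behavior, table-driven: one decision point, one map lookup.
    private static final Map<Character, String> TOKENS = new HashMap<>();
    static {
        TOKENS.put('<', "open tag");
        TOKENS.put('>', "close tag");
        TOKENS.put('&', "entity start");
        TOKENS.put(';', "entity end");
    }

    static String describeByTable(char c) {
        String s = TOKENS.get(c);
        return (s == null) ? "text" : s;
    }

    public static void main(String[] args) {
        System.out.println(describeBySwitch('<'));  // open tag
        System.out.println(describeByTable('x'));   // text
    }
}
```

Either way the program does the same work, which is exactly why a raw McCabe number is a poor guide for switch-heavy parser code like Xerces.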