Prof. Rob Cameron of Simon Fraser University has just announced on the XML-DEV mailing list his open-source Parabix XML parser, which uses the SIMD instructions of modern processors and seems to set new benchmarks for parsing speed.
I am particularly interested in this because a year ago, when Cameron released u8u16, the UTF-8 converter that trialled his approach, I said:
I would love to see an XML parser that combines Cameron’s SIMD work with the optimizations from IBM’s XML Screamer, which seem to increase the speed of Java processing two- or threefold.
Time permitting, I’ll have a more detailed look at this over the next few days. There are not many areas in text processing where new work is being done: the 60s and 70s saw most of the basic work and data structures, so I think this may be quite a startling development. Well done, Rob!
Intel has also been working on hardware speed-ups for parsing. Is anyone else doing research in this area?