Geekfodder!

Rob Cameron, a professor at Simon Fraser University, has released u8u16 as an open source beta. It is a really exciting library that implements an “iconv”-like transcoder (i.e. it converts data from one character set and encoding to another) using the SIMD instructions that modern CPUs have.

I think I was the first person to write something on this technique, certainly on the Internet, in my blog item Using C++ Intrinsic Functions for Pipelined Text Processing a couple of years ago. I gather that was only because the idea was too obvious to people involved with DSP to write about: of course you can use intrinsic functions for text processing! My code just used C++ intrinsics as an optimization on top of C++ code. But Cameron takes it to another level: his code abstracts out the features of the most common SIMD devices so that his algorithms can be written against this abstraction and compiled for a wide range of target processors, and he can dispense with the code. He reports speed increases of 4 to 25 times, depending on the data, which is very promising.
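To give a flavour of what "intrinsics as an optimization on top of C++ code" can look like, here is a minimal sketch. It is not Cameron's algorithm and not code from u8u16 or from my earlier blog item. It only shows one easy case of transcoding: widening a run of plain ASCII bytes to UTF-16 code units, 16 bytes at a time, using SSE2 intrinsics. The function name `widen_ascii_prefix` and the `HAVE_SSE2` macro are made up for this illustration, and the code assumes a little-endian x86 target when the SIMD path is compiled in; on anything else it falls back to the plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1     // hypothetical macro, used only in this sketch
#else
#define HAVE_SSE2 0
#endif

// Hypothetical helper: appends the leading run of ASCII bytes in `src` to
// `out` as UTF-16 code units. Returns the number of bytes consumed; it stops
// at the first byte >= 0x80, which a real transcoder would then have to
// decode as part of a multi-byte UTF-8 sequence.
std::size_t widen_ascii_prefix(const unsigned char* src, std::size_t len,
                               std::u16string& out) {
    assert(src != nullptr || len == 0);
    std::size_t i = 0;
#if HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    while (len - i >= 16) {
        // Load 16 input bytes (unaligned load is fine here).
        __m128i block =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // movemask collects the top bit of every byte; any set bit means
        // the block contains a non-ASCII byte, so leave it to the slow path.
        if (_mm_movemask_epi8(block) != 0) break;
        // Interleave with zero bytes: each input byte becomes one
        // little-endian 16-bit code unit.
        __m128i lo = _mm_unpacklo_epi8(block, zero);
        __m128i hi = _mm_unpackhi_epi8(block, zero);
        char16_t buf[16];
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf), lo);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + 8), hi);
        out.append(buf, 16);
        i += 16;
    }
#endif
    // Scalar tail (and the whole job when SSE2 is not available).
    while (i < len && src[i] < 0x80) {
        out.push_back(static_cast<char16_t>(src[i]));
        ++i;
    }
    return i;
}
```

The point of the contrast is that this sketch is tied to one instruction set by hand, with an `#if` around it. Cameron's approach, as I read it, is to write the algorithm once against an abstraction of SIMD operations and let that be compiled for each target.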

I would love to see an XML parser that combines Cameron’s SIMD work with the optimizations from IBM’s XML Screamer, which seem to increase the speed of Java processing two- or three-fold. Cameron’s work is important because it gives a working abstraction that can inform decision-making on building SIMD-using capabilities into Java’s text processing.