What's New with Regular Expressions
by Jeffrey E. F. Friedl, author of Mastering Regular Expressions, 2nd Edition
Not long after finishing the first edition of Mastering Regular Expressions in early 1997, I started to work for Yahoo, writing programs that processed and managed financial news and data. I worked the industry-standard 20-hour days, using regular expressions day in and day out to parse data feeds.
Yet despite this long-term, intensive use of regular expressions, my job left no time for keeping up with how regular expressions were evolving in the larger world beyond the one I spent my time in. So, when I started work on the second edition of Mastering Regular Expressions and began refocusing on the field, I was rather shocked to find out how much had really changed. Originally, I'd naively thought that the second edition would require only a short update (perhaps three months), mostly consisting of adding HTML-related examples. In the end, it became almost a complete rewrite, taking two years.
Yes, the new edition has many more HTML-related examples, but there's so much more than that. This article touches on some of the high-level changes between the first and second editions of the book.
Perhaps the largest and easiest-to-notice change is the new coverage of languages that have risen in prominence since the first edition.
Five years ago, there were no regular-expression packages for Java, but today there are many. Sun now even includes one of its own as of Java 1.4. Is it the best one? What others are popular? Which are good for what I want to do? What are the tradeoffs?
Java wasn't even mentioned in the first edition, but it receives a thorough treatment throughout the second, with its own chapter devoted to Java-specific issues. In it, I look at no fewer than seven different packages (including the popular Apache Regexp and Jakarta ORO packages), and help guide you in choosing which is best for you.
The .NET Framework
Whether you love Microsoft or hate it, there's no denying the popularity of Visual Basic. With the regular-expression package in the .NET Framework, Microsoft provides a package that can be used by VB.NET, C#, Visual C++, and any other language that wants to link to it -- even Python and Perl! The consistency is appealing, but even more important is the package itself: it's powerful and fast, and it can hold its head up high next to Perl or any other regex package out there.
Like any package, it has its good points and bad points, and its share of bugs. A full chapter on .NET-specific regex issues helps to clarify things, and helps to make up for the exceedingly poor documentation that Microsoft provides with the package.
Other languages touched on in the second edition that were not mentioned in the first include Ruby, PHP, and even procmail and MySQL.