Lexing Your Data (16 tags)
Perl is famous for its text-processing capabilities. Sometimes, however, the data you want to process is too complicated for regular expressions, and you reach for a parser for HTML, RTF, or another common format. What happens when you don't have a predefined parser, but the text you need to work with is still too complicated for regular expressions? Curtis Poe shows how to do proper lexing with Perl.
An Introduction to StAX (14 tags)
StAX, the Streaming API for XML, is a new API for pull-parsing of XML, developed under the Java Community Process as JSR 173. O'Reilly author Elliotte Rusty Harold gives an introduction to this API, which combines the efficiency of SAX with the ease of use of tree-based APIs.
Analyzing HTML with Perl (8 tags)
Kendrew Lau taught HTML development to business students. Grading web pages by hand was tedious--but Perl came to the rescue. Here's how Perl and HTML parsing modules helped make teaching fun again.
Using PHP 5's SimpleXML (4 tags)
Unless you've worked with SGML, you may find it ironic that XML can be hard to parse. Most choices boil down to event-based parsing, bulky tree-walking, or writing more XML. The upcoming PHP 5 has another option, SimpleXML, that can take the pain out of simple and common XML uses. Adam Trachtenberg explains.
Parsing an XML Document with XPath (4 tags)
Pulling just a single node value or attribute from an XML document can be inefficient if you have to walk a whole list of nodes you don't want just to reach the one you do. XPath can be much more efficient, by letting you specify the path to the desired node up front. J2SE adds XPath support, and the JDOM API also offers support through an XPath class. Deepak Vohra looks at both approaches.
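To give a flavor of the XPath approach described above, here is a minimal sketch using the standard `javax.xml.xpath` package. It is not taken from the article: the class name `XPathSketch`, the helper `valueAt`, and the sample catalog document are all made up for illustration. The sketch only shows how one path expression can stand in for a hand-written loop over nodes; it makes no claim about performance.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and returns the string value of the first matching node ("" if none match).
    static String valueAt(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Illustrative sample document, not from the article.
        String xml =
            "<catalog>"
          + "<journal id=\"a\"><title>One</title></journal>"
          + "<journal id=\"b\"><title>Two</title></journal>"
          + "</catalog>";

        // Select the title of the journal whose id attribute is "b".
        System.out.println(valueAt(xml, "/catalog/journal[@id='b']/title"));

        // Attributes are addressed with @name.
        System.out.println(valueAt(xml, "/catalog/journal[1]/@id"));
    }
}
```

The expression names the target node directly, so the calling code never iterates over the journals it does not care about.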