After finding a tag, we have to transform it into a DOM node. The structure of a start tag is defined as follows:
A start tag begins with an opening angle bracket followed by the element's name. It may also contain a set of attributes: key/value pairs, each joined by an equals sign, separated from the name and from each other by white space. The tag is closed with a right angle bracket. Empty elements are indicated by a slash before the closing angle bracket.
This pattern can be expressed easily with a regular expression like this (we'll handle empty elements later):
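As a rough illustration, a pattern along the following lines would match the structure just described. The names `START_TAG` and `isStartTag` and the exact character classes are assumptions made for this sketch; the library's own expression may well differ.

```javascript
// Hypothetical pattern for a start tag (empty elements are not covered).
// Group 1: the element name.
// Group 2: the raw attribute string, or "" if the tag has no attributes.
const START_TAG =
  /^<([A-Za-z][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>$/;

// Returns true if the string looks like a non-empty-element start tag.
function isStartTag(tag) {
  return START_TAG.test(tag);
}
```

With this sketch, `isStartTag('<a href="x.html">')` is true, while an end tag such as `</a>` or an empty element such as `<br/>` is rejected.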
The parentheses are used to group sub-patterns which can be accessed after executing the regular expression:
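To make this concrete, here is a small sketch that assumes a pattern with two groups, one for the element name and one for the raw attribute string. The helper name `parseStartTag` and the pattern itself are hypothetical and only serve the illustration.

```javascript
// Hypothetical helper: returns the two captured groups of a start tag,
// or null if the string does not match the assumed pattern.
function parseStartTag(tag) {
  const pattern =
    /^<([A-Za-z][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>$/;
  const match = pattern.exec(tag);
  if (match === null) {
    return null;
  }
  // match[0] is the whole tag; match[1] and match[2] are the groups.
  return { name: match[1], attributes: match[2].trim() };
}

// In a browser, the name can then be handed to the document:
//   const element = document.createElement(parseStartTag(tag).name);
```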
Now that we have created an element, the next step is finding the attributes and adding each to the element. If present, the complete set of white-space separated attributes can be found in
Using a loop, we access all items and split each one again, this time at the equals sign that separates the key from the value. Then we create a new attribute node from the key. Like elements, attribute nodes are created through a method of the document object. Before we can actually set the value, we have to strip the quotes surrounding the value string:
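The loop described above might look roughly like the following sketch. `parseAttributes` is a hypothetical name, and the code assumes, as the white-space split implies, that attribute values themselves contain no white space.

```javascript
// Hypothetical helper: turns 'href="x.html" id=\'top\'' into
// [{ key: "href", value: "x.html" }, { key: "id", value: "top" }].
// Assumption: values contain no white space, since items are split on it.
function parseAttributes(attributeString) {
  const result = [];
  const items = attributeString.trim().split(/\s+/);
  for (const item of items) {
    const eq = item.indexOf("=");
    if (eq === -1) {
      continue; // skips the empty item produced by an empty input string
    }
    const key = item.slice(0, eq);
    // Remove one leading and one trailing quote character, if present.
    const value = item.slice(eq + 1).replace(/^["']|["']$/g, "");
    result.push({ key, value });
  }
  return result;
}
```

Splitting at the first equals sign only, rather than at every one, keeps values such as `a=b` intact.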
That's it. After connecting the new attribute node to our formerly created element node, we're done:
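Put together, the element and attribute steps could be sketched as follows. `buildElement` is a hypothetical helper, not the library function; `doc` is assumed to be a DOM Document (in a browser, pass `document`), and `attributes` is assumed to be a list of key/value pairs whose quotes have already been stripped.

```javascript
// Sketch only: creates an element and connects one attribute node per pair.
function buildElement(doc, name, attributes) {
  const element = doc.createElement(name);
  for (const { key, value } of attributes) {
    // Attribute nodes, like elements, are created by the document.
    const attribute = doc.createAttribute(key);
    attribute.value = value;
    // Connect the new attribute node to the element.
    element.setAttributeNode(attribute);
  }
  return element;
}
```

In a browser, `buildElement(document, "a", [{ key: "href", value: "x.html" }])` would return an `a` element carrying that `href` attribute.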
The actual library function additionally performs some checks; I only outlined the basic process here. See the complete source code of
The algorithm is simple but effective and can handle any valid markup.
A few final words ...
Actually, the DOM Level 1 spec defines an interface for sub-trees, called DocumentFragment. Unfortunately, the implementations we are currently working with don't support it very well.
Of course, this wouldn't be a real browser application unless we had to deal with some incorrect behavior in a DOM implementation.
Fortunately, only a single line of this application had to be changed to make it work in Internet Explorer as well. This is because IE doesn't implement the