I had noticed Dan Sickles' link to LINQ in his follow-up post earlier today (e4(x)linq), but assumed it pointed back to the LINQ entry page on MSDN, and didn’t click it. However, I needed to reference the E4X specification for some related work, and given I still had his post open in a tab, it was easier to switch to that tab than to search for “E4X” (I didn’t have it bookmarked; now I do :).

However, in hovering over the general area (the links are separated by “(x)”), I noticed that “linq” was linked to a URI other than the MSDN home page I had assumed. Out of curiosity I clicked through and found an interesting post from Steve Eichert that apparently solved *ALL* of Scoble’s problems. Intrigued, I quickly skimmed it looking for a Green M&M sorting algorithm of some sort, but unless I’m missing something, there doesn’t seem to be one.

Not to fret, however, as what seems to be in place of the missing algorithm is something even better: something that the remaining 6,642,658,382 of us (based on the formula (World Population on August 23rd, 2006 @ 11:00PM EDT) - 100) can use to our advantage… A URI Filter!

Okay, so maybe not all 6,642,658,382 of us… But close!

Setting this potential time-wasting point of argument aside, and always curious to see how one might go about solving XML-based problems using native XML-processing techniques, I threw together a quick-and-dirty solution using XSLT. It is XSLT 2.0, though mostly because of xsl:result-document (in which I use the resolve-uri function to dynamically generate the value of href); the actual processing code is 1.0 compliant. I’ve checked the result into the GoogleCode-based XSLT project repository @ http://xslt.googlecode.com/svn/trunk/Modules/DataFilter/.

A couple of notes: I had no desire to dynamically access what is currently a 17+ MB data file, so I downloaded the changes.xml file from http://weblogs.com and used the local copy for the transformation process. If you want a more dynamic solution, you will need to adjust the value of filter/@xml:base in the init.xml file accordingly.

Also worth noting: While I REALLY like the solution Steve developed using LINQ, the result below, which uses one data file containing the list of filter keys and a separate processing file, comes to 29 lines of XML in total (data XML + processing XML, AKA XSLT), compared to 48 lines of LINQ.

However, to be fair, if you remove the empty lines from the LINQ version but leave each “}” in place to keep the structure (for ease of reading, as well as for a more accurate “apples to apples” comparison of line count), you could get it down to 43 lines.

With the above in mind, the code:

init.xml


<?xml version="1.0" encoding="UTF-8"?>
<filter xmlns="http://xsltransformations.com/webfeed/filter" xml:base="file:///G:/xslt/trunk/data/" src="changes.xml" output-file="filtered-data.xml">
    <key value="spaces.live.com" />
    <key value="typepad.com" />
    <key value="blogspot.com" />
    <key value="wordpress.com" />
</filter>

process.xsl


<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="2.0" xmlns:filter="http://xsltransformations.com/webfeed/filter" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:result-document href="{resolve-uri(filter:filter/@output-file, filter:filter/@xml:base)}">
            <filtered-url-list>
                <xsl:apply-templates select="filter:filter/filter:key">
                    <xsl:with-param name="data" select="document(resolve-uri(filter:filter/@src, filter:filter/@xml:base))"/>
                </xsl:apply-templates>
            </filtered-url-list>
        </xsl:result-document>
    </xsl:template>
    <xsl:template match="filter:key">
        <xsl:param name="data"/>
        <xsl:variable name="filtered-data" select="$data/weblogUpdates/weblog[contains(@url, current()/@value)]"/>
        <key value="{@value}" count="{count($filtered-data)}">
            <xsl:apply-templates select="$filtered-data"/>
        </key>
    </xsl:template>
    <xsl:template match="weblog">
        <blog url="{@url}"/>
    </xsl:template>
</xsl:transform>

Enjoy!

Quick Update: To be more accurate, you should really implement either a RegEx string evaluation or, depending on whether you prefer RegEx or the XSLT 1.0 compliant string processing functions, a combination of substring-before() and substring-after(). For the latter, first append a “/” to the end of each URI (it seems most URIs in changes.xml are sub-domain based rather than path based, e.g. http://foo.spaces.live.com), then substring your way after http:// or https:// and before the first “/”. While I actually have the XSLT 1.0-based code to process a URI into all of its proper fragment types, for this solution it would have made things more complex than they really needed to be. While it’s true that any of the keyed values of each domain filter could exist in the path portion of the URI of someone’s blog, it’s an edge case, and one that wasn’t worth stressing over.
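A minimal sketch of that substring approach, for illustration only (this is not the code checked into the repository). It assumes a hypothetical file named process-host.xsl saved next to process.xsl, which it imports so that only the filter:key template is overridden. The host is taken as everything after “://” and before the first “/”.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- process-host.xsl (hypothetical name): overrides only the filter:key template of process.xsl -->
<xsl:transform version="2.0" xmlns:filter="http://xsltransformations.com/webfeed/filter" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Reuse the root template and the weblog template from the original stylesheet -->
    <xsl:import href="process.xsl"/>
    <xsl:template match="filter:key">
        <xsl:param name="data"/>
        <!-- Host = text after "://" and before the first "/"; the appended "/" covers URIs that have no path -->
        <xsl:variable name="filtered-data" select="$data/weblogUpdates/weblog[contains(substring-before(concat(substring-after(@url, '://'), '/'), '/'), current()/@value)]"/>
        <key value="{@value}" count="{count($filtered-data)}">
            <xsl:apply-templates select="$filtered-data"/>
        </key>
    </xsl:template>
</xsl:transform>
```

Run it against init.xml exactly as you would process.xsl. Because only the host portion is tested, a key that appears solely in the path of a URI no longer counts as a match.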

That said, I just realized that RegEx makes the task of matching the keyed value of each domain filter only when it appears in the sub-domain.domain portion of the URI MIND NUMBINGLY SIMPLE.
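One way the RegEx version might look, again as a sketch under assumptions rather than the checked-in code: a hypothetical process-regex.xsl that imports process.xsl and overrides the filter:key template using the XPath 2.0 matches() and replace() functions. The pattern shape is my own guess at what “sub-domain.domain only” should mean: an optional run of sub-domains, the key itself, an optional port, then either “/” or the end of the string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- process-regex.xsl (hypothetical name): overrides only the filter:key template of process.xsl -->
<xsl:transform version="2.0" xmlns:filter="http://xsltransformations.com/webfeed/filter" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:import href="process.xsl"/>
    <xsl:template match="filter:key">
        <xsl:param name="data"/>
        <!-- Escape the dots in the key so "typepad.com" does not match "typepadXcom";
             other regex metacharacters in a key are NOT escaped in this sketch -->
        <xsl:variable name="pattern" select="concat('^https?://([^/]*\.)?', replace(@value, '\.', '\\.'), '(:[0-9]+)?(/|$)')"/>
        <!-- The 'i' flag makes the host comparison case-insensitive -->
        <xsl:variable name="filtered-data" select="$data/weblogUpdates/weblog[matches(@url, $pattern, 'i')]"/>
        <key value="{@value}" count="{count($filtered-data)}">
            <xsl:apply-templates select="$filtered-data"/>
        </key>
    </xsl:template>
</xsl:transform>
```

With this pattern, http://foo.spaces.live.com and http://spaces.live.com/bar match the spaces.live.com key, while http://example.com/spaces.live.com does not. Requiring a “.” (or the start of the host) before the key also keeps a host such as notblogspot.com from matching blogspot.com, which the plain contains() test would let through.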

Yet one more reason why XSLT 2.0 ROCKS! :D