Related link: http://www.rdfdata.org
I love RDF. It’s great for implementing linking technology because it lets you specify typed relationships between addressable resources. It even lets you specify attributes of the relationships themselves. It’s also great for many new classes of database applications, because of the unstructured way it lets you accumulate property values and the ease with which distributed collections can be aggregated.
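To make the aggregation point concrete, here is a minimal sketch that uses plain Python tuples as stand-ins for RDF triples. It is not a real RDF toolkit: the example.org URIs, the two "sites," and the helper names `aggregate` and `objects` are all assumptions made up for illustration, and attributes on the relationships themselves are left out.

```python
# Illustrative only: plain tuples stand in for RDF triples, and every
# example.org URI below is made up for this sketch.
DOC1 = "http://example.org/doc1"
DOC2 = "http://example.org/doc2"
CITES = "http://example.org/terms/cites"
TITLE = "http://example.org/terms/title"

# Two collections published independently; neither knows about the other.
site_a = {
    (DOC1, CITES, DOC2),                # a typed link between two resources
    (DOC1, TITLE, "First document"),
}
site_b = {
    (DOC2, TITLE, "Second document"),
    (DOC1, TITLE, "First document"),    # same statement as in site_a
}

def aggregate(*graphs):
    """Merge any number of triple sets; duplicate statements collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def objects(graph, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

merged = aggregate(site_a, site_b)
print(len(merged))
print(objects(merged, DOC1, CITES))
```

Because each statement carries its own subject, predicate, and object, merging needs no schema negotiation. The two sets are simply unioned, and the statement that both sites make is kept once.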
Discussions of RDF often mention the Semantic Web as well. The first mention I can find of the two of them together is in a February 1999 Society for Technical Communication article by Tim Berners-Lee. Five and a half years later, we see lots of Semantic Web talk, tools, and FOAF files, but outside of the FOAF files, where’s the data upon which this web will be created? Where are these machine-readable facts that will be linked into this Semantic Web? (I’m talking about a Semantic World Wide Web here. If you’re using Semantic Web tools to manage RDF data on your company intranet or on your personal local storage, I’m happy to hear RDF success stories, but the Semantic Web vision I’ve always heard about described connections between widely dispersed data contributed by systems that were unaware of each other. In other words, one big Semantic Web, not a collection of smaller, unconnected Semantic Webs. This requires data to be available on the public internet.)
RSS files as we know them can’t play much of a role in any web, because their data is too transient. Yes, use cases exist for the value of transient data, such as looking up movie times, but a format designed to notify one system about new resources available on another system isn’t the best way to do this, and people aren’t doing it anyway. With very few exceptions, such as Monkeyfist and the Center for Science in the Public Interest, sites don’t archive their RSS files. If a feed holds ten items, then after the next ten appear, all the data currently in that feed will be lost.
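The ten-item arithmetic can be sketched with a fixed-size queue. This is only a model of an unarchived feed, assuming the feed keeps the ten most recent items and nothing else; the item names are invented.

```python
from collections import deque

FEED_SIZE = 10  # assumed number of items the feed keeps

# A deque with maxlen silently drops the oldest entry when a new one
# arrives, which is how an unarchived feed behaves.
feed = deque(maxlen=FEED_SIZE)

for n in range(1, 11):
    feed.append(f"item-{n}")
first_snapshot = set(feed)      # what a reader sees today

for n in range(11, 21):
    feed.append(f"item-{n}")    # ten newer items are published

# Items from the first snapshot that are still retrievable from the feed.
surviving = first_snapshot & set(feed)
print(sorted(surviving))
```

Unless someone saved the first snapshot, nothing from it can be retrieved from the feed afterward.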
But enough of my complaining. I decided to really look for publicly available RDF and to accumulate a list. When I saw that the domain name rdfdata.org wasn’t taken, I couldn’t resist grabbing it. With some help from the rdf-interest mailing list, some Google tricks, and a wiki page, I’ve accumulated an initial list. I try to spend some time each day searching for new entries, and I hope to see more suggestions added to the wiki page.
The rdfdata.org site includes an RSS feed to notify people about new entries, and you can download all of its entries as a single RDF file. While entries that point directly to RDF files are distinguished from the rest, most rdfdata.org entries point to HTML files and directory listings that include links to multiple RDF files. In many cases, the RDF files are zipped or gzipped, making them a little less useful for a live Semantic Web, but any large collection of publicly available RDF helps.
FOAF files tend to be small, and a list of individual FOAF files on rdfdata.org would be redundant with other lists out there, so rdfdata.org points to the existing lists instead of to individual files. I’m mostly interested in collections of RDF that weigh in at 90K or greater. (If we’re interested in the semantic content of these RDF files, then RSS files bulked up to that size by CDATA sections don’t really count—when you tell an XML parser “don’t treat any of this as structural markup,” which is what CDATA delimiters do, then that section has little if any semantic value in the context of that document.)
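A short sketch of the CDATA point, using Python's standard `xml.etree.ElementTree` parser. The element names `item`, `description`, and `note` are made up for this example and are not taken from any particular RSS feed.

```python
import xml.etree.ElementTree as ET

# The same content twice: once hidden inside a CDATA section,
# once written as real markup.
SAMPLE = (
    "<item>"
    "<description><![CDATA[<b>Bold</b> claim]]></description>"
    "<note><b>Bold</b> claim</note>"
    "</item>"
)

item = ET.fromstring(SAMPLE)
description = item.find("description")
note = item.find("note")

# Inside CDATA the parser sees only characters, so there are no children.
print(len(description), repr(description.text))

# Without CDATA the <b> element is part of the document structure.
print(len(note), note[0].tag, repr(note[0].text), repr(note[0].tail))
```

To the parser, the CDATA version is one opaque string, so any structure inside it is invisible to tools that work with the parsed document.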
Perhaps it’s a bit bombastic to assert that the September 2004 RDF Semantic Web is little more than talk, tools, and FOAF files, but I don’t see a whole lot of data outside of those FOAF files that can be used by those tools. I’d love to be proven wrong. Show me the data! I want URLs. Add some to the wiki, and I’ll move them to the rdfdata.org collection. Then, hopefully, the rdfdata.org list of resources will grow large enough that people will easily find plenty of machine-readable data to work with as they build a real Semantic Web.