There are numerous misconceptions about the Semantic Web, largely caused by a misunderstanding of its aims and technologies. I’ve created this simple FAQ to help dispel some of the myths.
I have basic technical knowledge. Tell me what the Semantic Web is.
The Semantic Web is a vision of joining together dispersed bits of data on the internet, much as web pages are currently joined (linked) together. As with the current web of pages, anyone can create data on the Semantic Web, and anyone can “join” one bit of data to another.
Data isn’t forced into using a specific structure (like web pages are with HTML), so the data can be about anything; people, weather, books, movies, currency exchange rates or the distribution of geese in Europe.
Put simply, it’s like installing a huge relational database on the internet, where anyone can add tables to the database, and anyone can add data to a table. Tables and data can map to one another if you want, like foreign and primary keys.
Note that the Semantic Web (capital S, capital W) is often differentiated from the semantic web (small s, small w), which is basically a smaller-scale vision of making the current web more ‘semantic’ (e.g. marking up the ‘meaning’ of words on a page, rather than how they should look).
What’s the point of that?
If you’ve ever used a relational database — such as MySQL or SQL Server — you’ll know how useful they are, which is why relational databases are used ‘behind the scenes’ on most websites today. By storing distinct data-sets that inter-relate (for example, books, authors, and sales), you can very quickly find data that matches certain rules, such as books by a particular author, the top five best selling books, or top five best selling authors. Although each data-set can be maintained separately, they become increasingly powerful as they are joined together.
Imagine the possibilities if these data-sets were not just inter-related inside one database, but each database in the world was also inter-related. The authors table from a bookstore could be mapped to a birth records table in a government department (so, for example, you could get top five best selling books by nationality of the author). A historical database of world conflict could then be included to show the top five best selling books by authors who had been born in a country during time of war. And so on.
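As a rough illustration of the bookstore and birth-records example above, here is a minimal Python sketch. The two data sets, their contents, and the function name are all made up for illustration. It assumes both sources happen to identify authors by the same name, which real data sets rarely do without some mapping.

```python
from collections import defaultdict

# Hypothetical bookstore data: (title, author, copies_sold).
book_sales = [
    ("Book A", "Author One", 500),
    ("Book B", "Author Two", 300),
    ("Book C", "Author One", 200),
    ("Book D", "Author Three", 400),
]

# Hypothetical birth-records data held by a different organisation.
birth_records = {
    "Author One": "British",
    "Author Two": "French",
    "Author Three": "British",
}


def top_books_by_nationality(sales, births, limit=5):
    """Group books by the author's nationality, best sellers first."""
    grouped = defaultdict(list)
    for title, author, sold in sales:
        nationality = births.get(author)
        if nationality is None:
            continue  # no matching birth record, so the row cannot be joined
        grouped[nationality].append((sold, title))
    return {
        nationality: [title for _, title in sorted(rows, reverse=True)[:limit]]
        for nationality, rows in grouped.items()
    }


result = top_books_by_nationality(book_sales, birth_records)
```

Neither data set needed to know about the other in advance; the join only works because they share an identifier for each author.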
By treating the many large sets of data on the web as a single database, we’ll be able to create some incredibly powerful and valuable tools. The current trend of ‘mash-ups’ goes some way towards exploring this idea (usually mixing the data from only two data sets).
It sounds like a lot of hard work. I don’t want to have to re-do everything.
If everything goes to plan, you won’t have to do anything more than you already do. The Semantic Web doesn’t rely on any new data being created; there are already millions of suitable databases being used on the web. These databases just need to be ‘made available’ in a Semantic Web friendly format, so it’s more a job for application developers to make this happen. Application databases can already be published in a variety of formats (usually including SQL, XML, CSV, and so on), so adding another format to this list isn’t earth-shattering.
So what is a “Semantic Web friendly format”, as you put it?
This is where it gets a bit tricky… There are a number of “formats” suitable for the Semantic Web, the most popular of which is RDF/XML. It doesn’t really matter that there are other formats also in use; nearly all of them are based on the same ‘data model’ (RDF), so they can be easily combined and made interoperable.
There are two basic approaches for making this data available to the Semantic Web. You could publish the whole database as a big RDF/XML ‘dump’, like dmoz do. Alternatively, you could make your database accessible via a SPARQL interface, which basically allows people to query your database via a web service, and have the relevant results returned in RDF (Microsoft have adopted this approach for their Profile Manager).
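To give a feel for the second approach, the sketch below builds the kind of URL a client might request from a SPARQL interface. The endpoint address and the ex: vocabulary are hypothetical. The SPARQL protocol allows a query to be sent as a `query` parameter on an HTTP GET, and a CONSTRUCT query asks for the results as RDF. The code only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# Ask for up to five book/author statements, returned as RDF.
# The ex: vocabulary is made up for this example.
QUERY = """
PREFIX ex: <http://example.org/terms/>
CONSTRUCT { ?book ex:author ?author }
WHERE { ?book ex:author ?author }
LIMIT 5
"""


def build_query_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
```

A real client would then fetch this URL, typically stating in the Accept header which RDF format it would like back.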
Remember that — unless you’re an application developer — you won’t have to worry about this, as this functionality will be provided by the software you use.
As it happens, I am an application developer. The Semantic Web technologies are too confusing.
Well, yes, they are a bit. The documentation doesn’t help, and there aren’t enough high-level libraries around. Hopefully this will change now that organisations such as Microsoft and Adobe are seriously investing in the technologies.
The RDF model is fairly simple, and basically boils down to three bits of information: something (the ‘subject’) has a something (the attribute or ‘predicate’) of a certain value (often referred to as the ‘object’). So, Dog X has a Height of 2 feet. Dan Zambonini has a Nationality of British. “Don’t Make Me Think” has an Author of Steve Krug. You get the idea; anyone familiar with metadata or databases should recognise the basic model.
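The three statements above can be written down as plain (subject, predicate, object) tuples. This is only a sketch of the model: the `Triple` type and `match` helper are made-up names, and real RDF identifies subjects and predicates with URIs rather than the bare strings used here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # 'object' in RDF terms; renamed to avoid shadowing the Python builtin


triples = [
    Triple("Dog X", "Height", "2 feet"),
    Triple("Dan Zambonini", "Nationality", "British"),
    Triple("Don't Make Me Think", "Author", "Steve Krug"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


authors = match(triples, predicate="Author")
```

Leaving a field as None acts as a wildcard, which is roughly how pattern-based queries over RDF data work.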
RDF/XML is where it gets a little tricky… Don’t worry about all the little intricacies; read up a little on the striped syntax, and you’ll be able to start creating RDF/XML files (XML files that encode RDF-modelled data) in next to no time.
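For a first taste of RDF/XML, the sketch below writes a single statement using Python’s standard XML library. The ex: vocabulary and the book URI are hypothetical; the rdf: namespace is the standard one. The ‘stripes’ are the alternating layers of thing (rdf:Description) and property (ex:author).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms/"  # hypothetical vocabulary

# Use readable prefixes instead of auto-generated ones when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def to_rdf_xml(subject_uri, properties):
    """Serialise one subject and its literal-valued properties as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": subject_uri},
    )
    for name, value in properties.items():
        ET.SubElement(description, f"{{{EX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


document = to_rdf_xml(
    "http://example.org/books/dont-make-me-think",
    {"author": "Steve Krug"},
)
```

The result is an rdf:RDF element containing one rdf:Description for the book, which in turn contains one ex:author element holding the author’s name.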
If you’re an application developer, you almost certainly know Object Oriented (OO) programming. Keeping the model of OO programming in mind, take a quick glance over the RDF Schema documentation, and you should recognise some of it. It basically lets you define ‘classes’ and ‘sub-classes’ to use in your RDF, along with a level of ‘data-typing’ (so, for example, you could say that ‘British’ could be used as a Nationality, but not as a Height).
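The idea can be sketched in a few lines of Python. All the class and property names below are invented, and the dictionaries only loosely mirror rdfs:subClassOf, rdfs:range and rdf:type. One caveat: RDF Schema itself uses a range declaration to infer the type of a value rather than to reject data, so the strict check here is a simplification that follows the ‘data-typing’ reading above.

```python
# Hypothetical class hierarchy: child class -> parent class (cf. rdfs:subClassOf).
SUBCLASS_OF = {"Dog": "Animal", "Person": "Animal"}

# Hypothetical range declarations: property -> class of allowed values (cf. rdfs:range).
RANGE = {"Nationality": "NationalityValue", "Height": "Length"}

# Hypothetical typing of values: value -> class it belongs to (cf. rdf:type).
TYPE_OF = {"British": "NationalityValue", "2 feet": "Length"}


def is_subclass(cls, ancestor):
    """True if cls is the ancestor itself or descends from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def value_allowed(prop, value):
    """Check a value against the declared range of a property."""
    expected = RANGE.get(prop)
    if expected is None:
        return True  # no range declared, so anything goes
    actual = TYPE_OF.get(value)
    return actual is not None and is_subclass(actual, expected)
```

With these declarations, value_allowed("Nationality", "British") holds, while value_allowed("Height", "British") does not.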
Don’t worry if this is getting confusing; you can probably start churning out RDF/XML without worrying too much about the schema; just take your lead from existing RDF/XML data (e.g. FOAF data). If, on the other hand, you really dig data-typing and the OO paradigm, you may even want to venture into OWL, which follows on from RDF Schema.
These “top down” approaches never work; you’ll never get everyone to agree on how things relate to one another.
The Semantic Web doesn’t rely on a top down approach. It can start with individual groups, or even individuals, putting their own data out there, without worrying about how it relates to what other people are doing. We’ve seen this with FOAF, and RSS did it for a while (when it was RDF based). There’s no reason that many of the newer formats emerging today couldn’t have made themselves RDF compatible (e.g. the sitemaps protocol wouldn’t have been that much different if it had been RDF based).
If you’re designing a new XML format, it’s worth considering making it RDF/XML compatible; you’ll still get all the benefits of XML, plus you’ll be fully equipped to take advantage of RDF’s extensibility and interoperability.