When it comes to the Semantic Web, you might call me a disillusioned advocate. I’ve been dipping in and out of the technologies for the last 5 years or so, but am increasingly frustrated by the lack of any visible progress.
This entry should be regarded as constructive criticism of the Semantic Web. I still believe in it, but need to bring the flaws (as I see them) into the open, in the hope that discussion and communication are the first steps towards resolving problems.
1. Not all Semantic Web data are created equal.
If you do a ball-park estimate with Google, you’ll find about 5-10 million RDF files on the web, in various formats. Many of these are FOAF, most are RSS (the RDF versions). Sounds pretty successful, doesn’t it? On top of that, we have a whole bunch of semantic web data being created by life sciences groups, policing departments, and others.
But semantic web data does not necessarily build the Semantic Web (big S, big W) - the (or, at least, my) vision of a Utopian web of data. How come?
All of that data being created by the life sciences, and other similar groups, is behind closed doors. It isn’t available to the public, and doesn’t hook into any web. As far as the Semantic Web is concerned, they could be using Turbo Pascal, rather than RDF and SPARQL.
And RSS? Well, there are two types of semantic web data - useful, and not so useful (with regard to building the hooked-up ‘Semantic Web’). And I’d guess that 99.9% of RSS data falls into the latter category.
RDF data really needs URIs in all three parts of the triple in order to fully enable the ‘joining up’ of data. But the vast majority - in fact, probably all - of RSS data has at least one ‘string literal’, e.g. the plain text “Science/Nature” where a URI could have been used.
Which isn’t too handy when it comes to hooking this data up to news items about “Science” or “Technology” or “Nature” - or basically any string literal that doesn’t match “Science/Nature”. So the RSS data doesn’t really help.
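To make the string-literal problem concrete, here is a minimal sketch in plain Python (no RDF library). The item and topic URIs under example.org are invented for illustration, and the two "feeds" are assumptions rather than real data; only the dc:subject property URI is a real one. It simply checks which objects two datasets have in common for the same predicate.

```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str   # "uri" or "literal"
    value: str


def uri(value: str) -> Term:
    return Term("uri", value)


def lit(value: str) -> Term:
    return Term("literal", value)


DC_SUBJECT = uri("http://purl.org/dc/elements/1.1/subject")

# Hypothetical feeds that describe their topic with free-text literals.
feed_a = [(uri("http://example.org/a/item1"), DC_SUBJECT, lit("Science/Nature"))]
feed_b = [(uri("http://example.org/b/item9"), DC_SUBJECT, lit("Science"))]

# The same items, but pointing at a shared (made-up) topic URI instead.
TOPIC = uri("http://example.org/topics/science")
linked_a = [(uri("http://example.org/a/item1"), DC_SUBJECT, TOPIC)]
linked_b = [(uri("http://example.org/b/item9"), DC_SUBJECT, TOPIC)]


def shared_objects(left, right):
    """Return the objects used with the same predicate in both datasets."""
    left_pairs = {(p, o) for _, p, o in left}
    right_pairs = {(p, o) for _, p, o in right}
    return {o for _, o in left_pairs & right_pairs}


literal_join = shared_objects(feed_a, feed_b)    # nothing in common
uri_join = shared_objects(linked_a, linked_b)    # the shared topic URI
print(literal_join, uri_join)
```

Two literals that happened to be spelled identically would match here too, of course; the point is only that nothing obliges them to be, whereas a shared URI matches by construction.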
2. A technology is only as good as developers think it is.
Let’s take a look at what developers - and the public - think of Semantic Web technologies.
Hmm, so more people search on Google for Prolog and Fortran than Semantic Web topics? Not a particularly good sign, and the trend doesn’t seem to be getting any better, but let’s keep looking…
Hmm, so the same number of books on SMIL as on RDF, on Amazon? And AJAX, which has been around for less than half the time, has really got developers excited enough to write books.
(Note that these graphs are not all to the same scale). Ah, the primary measure of how interested developers are in a technology - the amount of blogging on a subject. And the semantic web technologies don’t seem to fare too well. Comparing it to AJAX/Web 2.0 is a bit harsh, but you get the idea - it’s not grabbing any headlines.
So why is this?
Another reason could be that the semantic web community is (or at least appears to be) academic, inward-looking and uninviting.
I’ll take a recent post from the general interest W3C Semantic Web mailing list - this is a mailing list that any of us should go to if we want to discuss semantic web issues (not just technical issues):
“IMHO, RDF/RDFS/OWL are not well suited to proving validity, due to the open world assumption. There are usually too many possibilities to prevent any incorrect interpretations.
On the other hand, it does just fine with consistency. The only trick is that people are often surprised that many constructs can be considered consistent… again due to the open world assumption. As an example, I first learnt this when I discovered that (under OWL) a predicate with cardinality of 1 for class C can be used multiple times on a single instance of C.”
This very technical discussion could certainly be making beginners nervous. Maybe a less technical mailing list should also be made available?
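For anyone puzzled by the cardinality remark in that post, here is a toy sketch of the difference, written in plain Python. It is not an OWL reasoner, and the names (alice, carol, mrs_smith, hasMother) are made up for illustration. A database-style, closed-world check treats two values for a cardinality-1 property as an error; an open-world reading instead concludes that the two values must name the same thing, unless they have been explicitly declared different.

```python
from itertools import combinations

# Hypothetical data: a property limited to one value, used twice on one subject.
MAX_CARDINALITY = {"hasMother": 1}
triples = [
    ("alice", "hasMother", "carol"),
    ("alice", "hasMother", "mrs_smith"),
]


def closed_world_violations(triples, limits):
    """Database-style check: more distinct values than the limit is an error."""
    values = {}
    for s, p, o in triples:
        values.setdefault((s, p), set()).add(o)
    return [(s, p) for (s, p), objs in values.items()
            if p in limits and len(objs) > limits[p]]


def open_world_inferences(triples, limits, different_from=frozenset()):
    """Open-world reading for a limit of 1: two values are taken to be the
    same individual unless they are explicitly stated to be different."""
    values = {}
    for s, p, o in triples:
        if limits.get(p) == 1:
            values.setdefault((s, p), set()).add(o)
    same_as, inconsistent = [], []
    for objs in values.values():
        for a, b in combinations(sorted(objs), 2):
            if frozenset((a, b)) in different_from:
                inconsistent.append((a, b))
            else:
                same_as.append((a, b))
    return same_as, inconsistent


violations = closed_world_violations(triples, MAX_CARDINALITY)
same_as, inconsistent = open_world_inferences(triples, MAX_CARDINALITY)

# Only an explicit "these are different" statement produces an inconsistency.
declared = frozenset({frozenset(("carol", "mrs_smith"))})
same_as_2, inconsistent_2 = open_world_inferences(triples, MAX_CARDINALITY, declared)
```

In the first call the open-world function reports no inconsistency at all and simply infers that carol and mrs_smith are the same person, which is the surprise the poster describes.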
As for the academic/inward-looking part, well I don’t think I’m the only one who feels like this. The brilliant Uche Ogbuji has previously stated that:
“I get the feeling that in trying to achieve the ontological purity needed for the Semantic Web, it’s starting to leave the desperate hacker behind.”
And another brilliant (semantic web) developer, Christopher Schmidt, has said:
“I’ve lost a lot of my interest in working with the Semantic Web lately, and I don’t see it coming back anytime in the near future.”
“…frustration with evangalizing being part of the process of proceeding in the Semantic Web world. Every time I take a step forward with some code, I find another 5 steps I have to take back in order to defend my position and the way I’ve done it.”
We need to win these people back, not brush these complaints under the carpet.
3. Complex Systems must be built from successively simpler systems.
I read recently about how - no matter how much resource Microsoft throw at developing a search engine - they can’t beat Google. The explanation given was that Google learnt by building up slowly, learning lessons as they went, and creating an increasingly complex system. You can’t take a short cut and go straight into constructing a complex system from day 1.
A similar problem faces the Semantic Web:
With the semantic web, there’s a much higher first rung to the ladder. Getting to grips with RDF/XML, SPARQL, and the other core technologies is a big ask for most developers. To then get useful semantic web applications out of these takes a couple more exhausting jumps of complexity.
And it seems like this complexity is starting to show - Swoogle (http://swoogle.umbc.edu/) estimates that about a third of the ‘pure’ RDF files that it has harvested contain errors.
4. A new solution should stop an obvious pain.
So what kind of thing will the Semantic Web provide? Amongst other things:
- Finding information/services (search)
- An automation infrastructure
- Data mining opportunities
Although there are minor grumblings about Google, the majority are happy with it for searching. And nobody lies awake at night worrying about a lack of consistent annotation functionality. The Semantic Web needs to prove what problem(s) it’s going to solve, and not just show that it can create pictures showing you that you know your friends.
5. People aren’t perfect.
Creating metadata and classifications is difficult (let’s not get started on Ontologies). People are biased (whether they mean to be or not), and fallible. Metadata, which the Semantic Web relies on, is not always going to be of great quality (anyone remember putting ‘free’ and other dubious key words into their meta HTML tags in the late 90s, when search engines still used them?).
Maybe this one isn’t such a big issue though - much of the data will be coming from existing databases (rather than ‘hand created’), and it’s not like HTML doesn’t have problems with noise and signal. But I do worry that solving this problem with “Trust Policies” and other technologies will just add to the suffocating complexity that already surrounds the Semantic Web.
6. You don’t need an Ontology of Everything. But it would help.
Sorry, I don’t mean to go all Clay Shirky, but the Semantic Web would be a lot easier if we did have a central ontology. All the semantic web developers and groups will disagree with me, but from my experience in trying to implement Semantic Web technologies in the real world, it would be a great enabler.
My clients don’t want to create ontologies. They don’t want to map one set of data to another. They want to use something that’s out there and ready for them to use, and will give them the maximum benefit (so if the Imperial War Museum say that they have a tank from “World War One” and the Science Museum has a video of the firing mechanism from a gun from “World War One”, they can both use the same term/URI).
I know - it’s an authoritarian, top-down view that’s next to impossible, but maybe we should try it, rather than thinking that we don’t need it because it’s hard to achieve? I’m guessing people had the same doubts over Wikipedia, but that seems to be doing OK.
7. Philanthropy isn’t commercially viable.
We need organisations all over the world to buy into the Semantic Web vision, and to start exposing their data in a compatible format (actually, getting them to expose their data is going to be pretty tricky…). But why should they? What reason has any organisation currently got to invest in RDF, SPARQL or any Semantic Web enabling technology? We need that killer application! Anyone got any bright ideas?