Weblog:   SPARQL My Opera!
Subject:   how?
Date:   2005-12-01 10:26:34
From:   Kendall
Response to: how?

So how does one go about providing such a thing? Are there any open source implementations one could build on? Maybe a drop-in Rails generator, or PHP proxying script?


Yes, there are lots of open source SPARQL implementations, including Sesame 2, ARQ, Redland, RDF::Query, and some others. There will be more and more of these as the spec gets finalized.


As far as I know, there's nothing as simple as a drop-in Rails generator yet.


Also, are there performance concerns with providing an openly queryable API? Security/privacy concerns aside, the main reason not to provide an openly queryable API is performance.


Indeed there are. There are lots of things you can do, most of which are orthogonal and depend on what kind of service you're fronting. Off the top of my head:


1. The SPARQL Protocol defines a QueryRequestRefused fault, which a SPARQL service may return if a query is impractical to answer.


One might decide which queries those are through static or dynamic query analysis and optimization, or with some kind of process monitor running a simple timer, etc.
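A minimal Python sketch of the "process monitor with a simple timer" option: the query runs in a worker thread, and if it hasn't finished within a budget the service answers with a QueryRequestRefused fault instead. The `evaluate` callable, the two-second budget, and the `(status, body)` return shape are hypothetical; my understanding is that the protocol's HTTP binding reports QueryRequestRefused with a 500 status, but check the spec before relying on that.

```python
import concurrent.futures
import time

# Hypothetical budget: any query still running after this many seconds is refused.
QUERY_TIME_LIMIT_SECONDS = 2.0

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def run_with_timer(evaluate, query, limit=QUERY_TIME_LIMIT_SECONDS):
    """Run evaluate(query) in a worker thread and return (http_status, body).

    If the result isn't ready within `limit` seconds, answer with a
    QueryRequestRefused fault instead of waiting any longer.
    """
    future = _executor.submit(evaluate, query)
    try:
        return 200, future.result(timeout=limit)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the job hasn't started yet; a running Python
        # thread can't be killed, so the engine itself needs a cancellation hook.
        future.cancel()
        return 500, "QueryRequestRefused: query exceeded the time budget"

if __name__ == "__main__":
    def slow_engine(query):
        time.sleep(1.0)  # stand-in for an expensive evaluation
        return "results"

    print(run_with_timer(slow_engine, "SELECT ?s ?p ?o WHERE {?s ?p ?o}", limit=0.1))
```

The timer only protects the client from waiting; to protect the server, the refusal has to reach the engine (see the SQLite progress-handler variant for the SQL route).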


2. For a site like Flickr or delicious, if I were in charge, I'd build an RDF adapter layer over my RDBMS data. Then I'd start by segregating all of my data by, say, user. A query would always run against a single user's data, never against all of the data extant. If you want aggregations across users, you retrieve the relevant RDF representations, merge them on the requester side, and query the merged graph.
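To make the per-user segregation concrete, here is a toy Python sketch with graphs as sets of triples. The store contents, the `None`-as-wildcard pattern format, and the function names are all hypothetical. The server only ever matches against one user's graph; cross-user questions are answered by fetching each user's graph and merging on the requester side. Merging here is plain set union, which is only correct because the sketch has no blank nodes (real RDF merge renames blank nodes apart).

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard

# Hypothetical server-side store: one RDF graph (a set of triples) per user.
STORE: Dict[str, Set[Triple]] = {
    "alice": {("photo:1", "dc:creator", "alice"), ("photo:1", "tag", "paris")},
    "bob": {("photo:7", "dc:creator", "bob"), ("photo:7", "tag", "paris")},
}

def match(graph: Set[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield the triples in `graph` that agree with every bound position of `pattern`."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple

def query_user(user: str, pattern: Pattern) -> Set[Triple]:
    """Server side: a query only ever sees a single user's graph."""
    return set(match(STORE.get(user, set()), pattern))

def fetch_graph(user: str) -> Set[Triple]:
    """Server side: hand out a copy of one user's RDF representation."""
    return set(STORE.get(user, set()))

def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Requester side: union of triples (valid only because there are no blank nodes)."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged
```

For example, "which photos by alice or bob are tagged paris?" becomes `match(merge(fetch_graph("alice"), fetch_graph("bob")), (None, "tag", "paris"))`, and the cost of the aggregation lands on the requester, not the server.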


If the strategy is to translate SPARQL queries into SQL, you can then apply the usual optimizations and analyses (or process monitoring) to the SQL queries; the database literature and market know a great deal about this.
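A sketch of the SPARQL-to-SQL route using Python's standard `sqlite3` module. One triple pattern is translated into a query over a hypothetical `triples(s, p, o)` table, and SQLite's progress handler acts as the process monitor: it is called periodically while a statement runs, and returning a nonzero value aborts the statement. The table layout, the deadline, and the treatment of a timeout as QueryRequestRefused are assumptions; repeated variables, joins, and FILTERs are deliberately left out.

```python
import sqlite3
import time

def pattern_to_sql(pattern):
    """Translate one triple pattern into SQL over a triples(s, p, o) table.

    Terms starting with '?' are variables; anything else is a constant that is
    bound as a query parameter. Returns (sql, params, variable_names).
    """
    select, where, params, names = [], [], [], []
    for column, term in zip(("s", "p", "o"), pattern):
        if term.startswith("?"):
            select.append(column)
            names.append(term)
        else:
            where.append(f"{column} = ?")
            params.append(term)
    sql = "SELECT " + (", ".join(select) or "1") + " FROM triples"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params, names

def execute_with_deadline(conn, sql, params, seconds):
    """Run `sql`, aborting it once `seconds` have elapsed; returns rows or None."""
    deadline = time.monotonic() + seconds
    # SQLite calls the handler every 1000 VM instructions; a nonzero return aborts.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError:
        return None  # the caller could map this to QueryRequestRefused
    finally:
        conn.set_progress_handler(None, 1000)  # None removes the handler
```

Because the abort happens inside the database engine, an expensive query actually stops consuming resources, which the thread-timer approach cannot guarantee.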


3. Smaller sites might consider using an RDF-native database instead of an RDBMS, which still allows segregating KBs along various vectors. And RDF query engines will have to mature enough to do query cost analysis anyway if they're going to be serious players.
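As a rough illustration of static query cost analysis, here is a Python sketch of a crude cost model: keep per-position term counts for the store, and estimate each triple pattern by its most selective bound term. That estimate is a true upper bound on the triples one pattern can match; summing it over patterns approximates scan work only, not the size of a join result, which can be much larger. The class, the scoring, and the budget are hypothetical, not how any particular engine does it.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

class CostEstimator:
    """Per-position term counts (subject, predicate, object) over a triple store."""

    def __init__(self, triples: Iterable[Triple]):
        self.total = 0
        self.counts = (Counter(), Counter(), Counter())
        for triple in triples:
            self.total += 1
            for counter, term in zip(self.counts, triple):
                counter[term] += 1

    def estimate(self, pattern: Triple) -> int:
        """Upper bound on matches: the smallest count among the pattern's bound terms.

        Terms starting with '?' are variables; an all-variable pattern costs the whole store.
        """
        bounds = [counter[term] for counter, term in zip(self.counts, pattern)
                  if not term.startswith("?")]
        return min(bounds, default=self.total)

def refuse(estimator: CostEstimator, patterns: List[Triple], budget: int) -> bool:
    """Refuse when the summed per-pattern estimates exceed a hypothetical budget."""
    return sum(estimator.estimate(p) for p in patterns) > budget
```

The counts can be maintained incrementally on insert and delete, so the check runs before any evaluation and costs almost nothing.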


4. There are a few simple things anyone can do. For instance, I'm surprised that the Opera service allows the query SELECT ?s ?p ?o WHERE {?s ?p ?o}, since it's equivalent to transferring the entire KB, which is going to be an expensive operation.
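The simplest version of that check is textual. This Python sketch flags any group graph pattern consisting solely of one all-variable triple, which catches the Opera example and trivial variants (`$` variables, a trailing dot, a missing WHERE keyword). It is a crude regular-expression heuristic, not a SPARQL parser: it misses equivalent queries written differently (for instance with a FILTER in the same group), so treat it as a first line of defence only.

```python
import re

# Crude heuristic (not a SPARQL parser): a group "{ ... }" whose only content is a
# single triple pattern of three variables. SPARQL variables may start with ? or $.
_ALL_VARIABLE_GROUP = re.compile(r"\{\s*[?$]\w+\s+[?$]\w+\s+[?$]\w+\s*\.?\s*\}")

def looks_like_full_dump(query: str) -> bool:
    """True if the query contains a group that matches every triple in the KB."""
    return _ALL_VARIABLE_GROUP.search(query) is not None
```

A service could reject such queries outright with QueryRequestRefused, or point the client at a static, cacheable dump of the KB instead.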


5. Finally, if you don't want to segregate KBs, you can segregate requests or users by requiring authentication at the HTTP level (since the SPARQL Protocol's HTTP binding is a simple GET), where you can log, trace, ban (or charge more!) users who insist on expensive queries.
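A minimal WSGI sketch of that last option, using only the Python standard library: HTTP Basic authentication in front of a SPARQL backend, with per-user logging, a ban list, and a query counter that could feed tracing or billing. The user table, ban list, realm name, and `backend` callable are hypothetical. Basic credentials travel in the clear, so this belongs behind HTTPS.

```python
import base64
import hmac
import logging
from urllib.parse import parse_qs

log = logging.getLogger("sparql-gate")

# Hypothetical user store and ban list; a real service would use an account database.
USERS = {"alice": "s3cret", "mallory": "hunter2"}
BANNED = {"mallory"}
query_counts = {}  # per-user tally, e.g. for tracing abuse or charging more

def authenticated_user(environ):
    """Return the user named in a valid HTTP Basic Authorization header, else None."""
    header = environ.get("HTTP_AUTHORIZATION", "")
    if not header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    user, _, password = decoded.partition(":")
    expected = USERS.get(user)
    if expected is None or not hmac.compare_digest(expected.encode(), password.encode()):
        return None
    return user

def make_gate(backend):
    """Wrap backend(query) -> bytes in a WSGI app that authenticates every GET."""
    def app(environ, start_response):
        user = authenticated_user(environ)
        if user is None:
            start_response("401 Unauthorized", [
                ("Content-Type", "text/plain"),
                ("WWW-Authenticate", 'Basic realm="sparql"'),
            ])
            return [b"authentication required"]
        if user in BANNED:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"banned"]
        # The protocol's GET binding carries the query text in the `query` parameter.
        query = parse_qs(environ.get("QUERY_STRING", "")).get("query", [""])[0]
        query_counts[user] = query_counts.get(user, 0) + 1
        log.info("user=%s query=%r", user, query)
        start_response("200 OK", [("Content-Type", "application/sparql-results+xml")])
        return [backend(query)]
    return app
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, make_gate(my_backend)).serve_forever()` would serve it, where `my_backend` is whatever hypothetical function actually evaluates the query.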