The New Bloglines Web Services
by Marc Hedlund
Bloglines today announced a set of new web services APIs, allowing developers to write applications for reading RSS and Atom feeds by drawing data directly from the Bloglines databases. This is a very significant change in the landscape of RSS/Atom aggregators, the newsreading applications that have become more popular over the past few years. Along with the release of its web services, Bloglines announced that several desktop RSS/Atom aggregators, including FeedDemon, NetNewsWire, and Blogbot, will begin using these APIs to provide additional capabilities in their applications. The Bloglines Web Services make it very easy for developers to use RSS and Atom content for many purposes, and the services will also ease the traffic pileup that aggregators are beginning to cause for many large RSS/Atom publishers.
This article will take a look at the new Bloglines Web Services and their effect on the RSS/Atom landscape. We'll look at the bandwidth issues surrounding RSS/Atom aggregators and how the Bloglines Web Services help conserve bandwidth; then examine the APIs and what they offer; and finally, present a complete, three-pane desktop RSS/Atom reader written in just 150 lines of code, using the Groovy programming language.
The RSS Overload
eWeek recently reported on the bandwidth problems RSS/Atom aggregators have been causing for Web publishers. Spurred in part by Microsoft's announcement that even it was having trouble keeping up with requests for its blogs.msdn.com feeds, publishers have been talking about how much traffic a popular RSS/Atom feed can bring to bear. As one publisher in the eWeek article put it, "Any site that becomes popular is going to be killed by their RSS."
So what's the problem? Haven't web sites been able to keep up with traffic from all over the world for years now? It's true that web servers and protocols are very scalable, but RSS/Atom readers present a new kind of challenge. With a web browser, users visit a web site only while they are in front of their computer and reading that site--in other words, when they are actively browsing. An individual will visit some very large sites (such as My Yahoo or Google News) repeatedly throughout the day, but such sites are usually commercially run and able to support larger streams of traffic. The difference with an RSS/Atom aggregator is that it automatically pulls information from a publisher's site on a regular basis--sometimes as often as once every 5 minutes. Regardless of whether the site has changed or the user is out to lunch or home for the evening, the aggregator will update itself continuously as long as it is running, to ensure that it is able to present the latest information when called on by the user.
Some people joke that a popular RSS site is indistinguishable from a security attack. In security circles, a large number of clients repeatedly making requests to the point of overload is known as a distributed denial-of-service attack, and attacks of this sort have taken down the largest sites on the Web, including Yahoo, eBay, and Amazon. For a small Web publisher, even a moderately popular RSS/Atom feed can cause serious bandwidth consumption, running up ISP bills and preventing users from reaching any part of the site. For larger publishers, RSS/Atom feeds can bring in many more users but can also consume extensive resources.
While many in the RSS/Atom developer community have long recognized the bandwidth overload problem, the possible solutions require that nearly all aggregators adhere to a variety of "polite" practices to ensure that servers are not overwhelmed. As of yet, not all aggregators have done so. Even where developers have made determined efforts, users want very fresh news and therefore often configure their aggregators to poll very frequently.
Bloglines As a Feed Cache
Bloglines is different from most other RSS/Atom aggregators. Like NewsGator, Bloglines is a server-side aggregator. This means that Bloglines maintains a database of RSS/Atom feeds in the same way Google maintains a database of web pages. Bloglines users query that database instead of polling individual RSS/Atom publishers from their desktop machines. In other words, Bloglines appears to publishers--and consumes bandwidth--like one single RSS/Atom aggregator but is able to serve tens of thousands of users.
By offering web services APIs, Bloglines is opening up its database of feeds for anyone to use. Any developer making an RSS/Atom-based application can draw from the Bloglines database, avoiding bandwidth overload for RSS/Atom publishers.
Bandwidth savings, though, is not the only reason to use Bloglines as a feed cache. RSS and Atom are emerging formats on the Internet, and there are many variations on feed formats to deal with. By drawing feeds from the Bloglines database, developers are presented with a single format--Bloglines normalizes all of the feeds it collects before distributing feed content. Another benefit is one that Bloglines users have long enjoyed: synchronization across computers. If you read news on one computer at work and on another at home, using a server-based aggregator lets you have the same set of feeds on both machines, and allows you to update those feeds as you read them from any machine. Using the Bloglines Web Services, client-side (desktop) aggregators can provide this same functionality. You could even use, say, FeedDemon on Windows and NetNewsWire on Macintosh, and share the state of your feeds between them through Bloglines.
While not all of the Bloglines features are available through its web services, many of the key benefits for publishers and users are--and developers now have less work to do when building aggregators, too.
The Bloglines API Calls
The Bloglines Web Services APIs are made available through two HTTP calls: listsubs and getitems. Both of these calls, as well as other APIs that Bloglines provides, are documented at www.bloglines.com/services/api.
We'll first walk through the setup of a Bloglines API application,
then each of the calls in turn. Finally, we'll look at a sample
Bloglines API application.
Before getting started with your Bloglines API application, collect the following:
- All users of Bloglines API applications must have their own Bloglines account. For development, if you do not already have a Bloglines account, register for one now. If you plan to distribute your application to other users, make sure they know they need to get an account, and prompt them for their account email address and password. Once you have a Bloglines account, subscribe to one or more feeds so your account will have data in it.
- All Bloglines API calls are authenticated using Basic HTTP Authentication. Whatever programming language you use to develop your application, make sure you have a client HTTP library that provides authentication capabilities, or read up on how to implement authentication yourself (which isn't hard). Java and Groovy users will probably want to use HTTPClient; Perl users will want to use LWP. Other languages have similar libraries available. To authenticate, use the email address and password for your Bloglines account.
- The returned information from the API calls is an XML document containing the information you requested in the call. You will need to have an XML parser available, or you can parse the returned document yourself with regular expressions or otherwise.
When you have a Bloglines account, an authenticating HTTP library, and a way to parse XML results, you're ready to start making API calls.
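Before wiring up a full client, it can help to see what Basic HTTP Authentication amounts to on the wire: a single Authorization header whose value is "Basic" plus the base64 encoding of "email:password". Libraries like HTTPClient and LWP build this for you; the following is only a minimal Python sketch, with placeholder credentials, of what they produce.

```python
import base64

def basic_auth_header(email, password):
    """Return the Authorization header value that Basic HTTP
    Authentication sends: 'Basic ' + base64('email:password')."""
    token = base64.b64encode(f"{email}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

# Attach the header to any request your HTTP library makes, e.g.:
#   Request(url, headers={"Authorization": basic_auth_header(email, pw)})
```

Checking the response to an authenticated request (200 versus 401) then tells you whether the credentials were accepted.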
The listsubs Call
The listsubs call is used to list all of the subscriptions for a given user account. The APIs do not provide a way to add feed subscriptions to your account, nor do they provide methods for editing or updating feeds. In order to create or modify subscriptions, you must go to the Bloglines site and use the Bloglines web interface. After your subscription list is registered with Bloglines, you may access that list using listsubs.
The listsubs call is simple and takes no parameters. Every call to listsubs looks the same:
http://rpc.bloglines.com/listsubs
listsubs will return by way of HTTP, using the HTTP response code to indicate the status of the request--200 OK to indicate success, and 401 Unauthorized to indicate that the given email address or password is not valid. Be sure to check the response code from the request and prompt the user to correct his address or password as needed. If the listsubs call succeeds, the HTTP response will contain an XML document with a list of the user's subscriptions. This response document is in OPML format but also contains some Bloglines-specific extensions to OPML: additional attributes on <outline> tags, indicating the state of that subscription in the Bloglines user account.
- BloglinesSubId is an identifier for the subscription within Bloglines' database--you'll need this later to request feed content.
- BloglinesUnread shows the number of items in the feed that the user has not yet read.
- BloglinesIgnore (1 means ignore and 0 means don't ignore) indicates whether the user wants to be notified of new items on that subscription.
One item to note: Bloglines, like many RSS/Atom aggregators,
lets users organize subscriptions within subfolders. As a result, the OPML file that
listsubs returns may contain several
levels of nested
<outline> tags, some representing
folders and some representing feeds. One good way to check an outline
tag to see whether it represents a folder or a feed is to look for the
presence of an
xmlUrl attribute in the
tag. If an
xmlUrl is present, it's a feed; if not, it's a folder.
Note that listsubs will return a BloglinesSubId for both folders and feeds, so you can't use that as a distinguishing factor.
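The folder-versus-feed test is easy to apply while walking the OPML tree. Here is a Python sketch that keeps only the outlines carrying an xmlUrl attribute; the sample OPML document is a simplified, hypothetical response (real Bloglines output carries more attributes and nesting).

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down listsubs response for illustration only.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.0">
  <body>
    <outline title="News" BloglinesSubId="100">
      <outline title="Example Feed" BloglinesSubId="270"
               xmlUrl="http://example.com/rss.xml" BloglinesUnread="3"/>
    </outline>
  </body>
</opml>"""

def feeds_from_opml(opml_text):
    """Walk every <outline>: those with an xmlUrl attribute are feeds,
    the rest are folders. Returns (title, BloglinesSubId, unread) tuples."""
    feeds = []
    for outline in ET.fromstring(opml_text).iter("outline"):
        if "xmlUrl" in outline.attrib:  # feeds have xmlUrl; folders do not
            feeds.append((outline.get("title"),
                          outline.get("BloglinesSubId"),
                          int(outline.get("BloglinesUnread", "0"))))
    return feeds
```

Running feeds_from_opml over the sample picks out only the "Example Feed" entry, skipping the "News" folder even though the folder also carries a BloglinesSubId.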
A sample response to the listsubs call, along with more documentation of the call responses, is provided on the Bloglines site.
The getitems Call
The getitems call is used to retrieve all unread items on a feed to which the user is subscribed, or all items since a given date and time. In order to make a getitems call, you are required to know the BloglinesSubId for the feed you want to retrieve, so you will need to make a listsubs call first and get the BloglinesSubId from the response.
Especially during development, you may not want to have your Bloglines API application update your feed states on the Bloglines server--you may want to test-read your feed in your application, and then later read the feeds on Bloglines itself. You can control whether getitems will update the read status of the feed you request by adding n=1 (update) or n=0 (do not update) to the end of the getitems URL call. (A call to listsubs does not affect your read status at any time.)
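Building the request URL is then a matter of appending query parameters. In this Python sketch, the endpoint path and the `s` parameter name for the subscription ID are assumptions for illustration; only the n=0/n=1 toggle comes from the description above.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names, for illustration only.
GETITEMS = "http://rpc.bloglines.com/getitems"

def getitems_url(sub_id, update_read_status=False):
    """Build a getitems URL. n=1 marks the feed's items read on the
    server; n=0 leaves the read state untouched (useful in development)."""
    params = {"s": sub_id, "n": 1 if update_read_status else 0}
    return GETITEMS + "?" + urlencode(params)
```

During testing you would call getitems_url(sub_id) with the default n=0, then switch to n=1 once your application is ready to track read state for real.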
If you do not specify a date parameter to the getitems call, the call will return an RSS 2.0 document containing all of the unread items in that feed. The number of items should be the same as was listed in the BloglinesUnread attribute returned by listsubs for that feed, but it may contain more--for instance, if another item arrives between the time of the listsubs call and the time of the getitems call. It could also contain fewer items, if the same user has read items through another application (the Bloglines web interface or another Bloglines API application). If there are no unread items on the feed, getitems will respond to the HTTP request with a 304 Not Modified response code.
You can retrieve items that you've previously read by specifying a
d=DATE parameter to
getitems. The date
should be given in Unix time--that is, the number of seconds since
January 1, 1970. As before, if there are no items on that feed after
the date you specified, you will receive a
304 Not Modified
response and an empty response body.
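Producing that Unix-time value is a one-liner in most languages. A small Python sketch, assuming the timestamp you pass to d=DATE should be computed from a UTC time:

```python
from datetime import datetime, timezone

def unix_time(dt):
    """Seconds since January 1, 1970 (UTC), in the form the
    d=DATE parameter expects. `dt` must be timezone-aware."""
    return int(dt.timestamp())
```

For example, one full day after the epoch works out to 86,400 seconds, so d=86400 would request items newer than January 2, 1970.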
Here's what a typical getitems call might look like:
http://rpc.bloglines.com/getitems?s=270&n=0
This call says that you want to retrieve all unread items for BloglinesSubId 270 and that you do not want your read status updated by this call.
Other examples of the getitems call format, and an example return document, are available in the documentation for getitems on the Bloglines site.