The New Bloglines Web Services

by Marc Hedlund
09/28/2004

Bloglines today announced a set of new web services APIs, allowing developers to write applications for reading RSS and Atom feeds by drawing data directly from the Bloglines databases. This is a very significant change in the landscape of RSS/Atom aggregators, the newsreading applications that have become more popular over the past few years. Along with the release of its web services, Bloglines announced that several desktop RSS/Atom aggregators, including FeedDemon, NetNewsWire, and Blogbot, will begin using these APIs to provide additional capabilities in their applications. The Bloglines Web Services make it very easy for developers to use RSS and Atom content for many purposes, and the services will also ease the traffic pileup that aggregators are beginning to cause for many large RSS/Atom publishers.

This article will take a look at the new Bloglines Web Services and their effect on the RSS/Atom landscape. We'll look at the bandwidth issues surrounding RSS/Atom aggregators and how the Bloglines Web Services help conserve bandwidth; then examine the APIs and what they offer; and finally, present a complete, three-pane desktop RSS/Atom reader written in just 150 lines of code, using the Groovy programming language.

The RSS Overload

eWeek recently reported on the bandwidth problems RSS/Atom aggregators have been causing for Web publishers. Spurred in part by Microsoft's announcement that even it was having trouble keeping up with requests for its blogs.msdn.com feeds, publishers have been talking about how much traffic a popular RSS/Atom feed can bring to bear. As one publisher in the eWeek article put it, "Any site that becomes popular is going to be killed by their RSS."

So what's the problem? Haven't web sites been able to keep up with traffic from all over the world for years now? It's true that web servers and protocols are very scalable, but RSS/Atom readers present a new kind of challenge. With a web browser, users visit a web site only while they are in front of their computer and reading that site--in other words, when they are actively browsing. An individual will visit some very large sites (such as My Yahoo or Google News) repeatedly throughout the day, but such sites are usually commercially run and able to support larger streams of traffic. The difference with an RSS/Atom aggregator is that it automatically pulls information from a publisher's site on a regular basis--sometimes as often as once every 5 minutes. Regardless of whether the site has changed or the user is out to lunch or home for the evening, the aggregator will update itself continuously as long as it is running, to ensure that it is able to present the latest information when called on by the user.

Some people joke that a popular RSS site is indistinguishable from a security attack. In security circles, a large number of clients repeatedly making requests to the point of overload is known as a distributed denial-of-service attack, and attacks of this sort have taken down the largest sites on the Web, including Yahoo, eBay, and Amazon. For a small Web publisher, even a moderately popular RSS/Atom feed can cause serious bandwidth consumption, running up ISP bills and preventing users from reaching any part of the site. For larger publishers, RSS/Atom feeds can bring in many more users but can also consume extensive resources.

While many in the RSS/Atom developer community have long recognized the bandwidth overload problem, the possible solutions require that nearly all aggregators adhere to a variety of "polite" practices, such as conditional HTTP requests and reasonable polling intervals, to ensure that servers are not overwhelmed. As yet, not all aggregators do so. Even where developers have made determined efforts, users want very fresh news and therefore often configure their aggregators to poll very frequently.

Bloglines As a Feed Cache

Bloglines is different from most other RSS/Atom aggregators. Like NewsGator, Bloglines is a server-side aggregator. This means that Bloglines maintains a database of RSS/Atom feeds in the same way Google maintains a database of web pages. Bloglines users query that database instead of polling individual RSS/Atom publishers from their desktop machines. In other words, Bloglines appears to publishers--and consumes bandwidth--like one single RSS/Atom aggregator but is able to serve tens of thousands of users.
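To put rough numbers on the difference, here is a back-of-the-envelope sketch. The client count of 10,000 is an illustrative assumption rather than a figure from Bloglines, and the function name is made up for this example; the five-minute interval is the aggressive polling rate mentioned earlier.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(poll_interval_seconds, clients=1):
    """Requests a single feed receives per day from `clients` pollers."""
    return clients * (SECONDS_PER_DAY // poll_interval_seconds)


# One poller checking every 5 minutes (how a server-side aggregator looks
# to a publisher, however many readers sit behind it).
single_poller = requests_per_day(300)

# 10,000 desktop aggregators (an assumed number), each polling on its own.
many_pollers = requests_per_day(300, clients=10_000)

print(single_poller)  # 288
print(many_pollers)   # 2880000
```

Under these assumptions the publisher sees 288 requests a day from a server-side aggregator instead of nearly three million from the desktop clients.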

By offering web services APIs, Bloglines is opening up its database of feeds for anyone to use. Any developer making an RSS/Atom-based application can draw from the Bloglines database, avoiding bandwidth overload for RSS/Atom publishers.

Bandwidth savings, though, are not the only reason to use Bloglines as a feed cache. RSS and Atom are emerging formats on the Internet, and there are many variations on feed formats to deal with. By drawing feeds from the Bloglines database, developers are presented with a single format--Bloglines normalizes all of the feeds it collects before distributing feed content. Another benefit is one that Bloglines users have long enjoyed: synchronization across computers. If you read news on one computer at work and on another at home, using a server-based aggregator lets you have the same set of feeds on both machines, and allows you to update those feeds as you read them from any machine. Using the Bloglines Web Services, client-side (desktop) aggregators can provide this same functionality. You could even use, say, FeedDemon on Windows and NetNewsWire on Macintosh, and share the state of your feeds between them through Bloglines.

While not all of the Bloglines features are available through its web services, many of the key benefits for publishers and users are, and developers have less work to do when building aggregators, too.

The Bloglines API Calls

The Bloglines Web Services APIs are made available through two simple REST-based URLs: listsubs and getitems. Both of these calls, as well as other APIs that Bloglines provides, are documented at www.bloglines.com/services/api. We'll first walk through the setup of a Bloglines API application, then each of the calls in turn. Finally, we'll look at a sample Bloglines API application.

Setup

Before getting started with your Bloglines API application, collect the following:

  1. All users of Bloglines API applications must have their own Bloglines account. For development, if you do not already have a Bloglines account, register for one now. If you plan to distribute your application to other users, make sure they know they need to get an account, and prompt them for their account email address and password. Once you have a Bloglines account, subscribe to one or more feeds so your account will have data in it.
  2. All Bloglines API calls are authenticated using Basic HTTP Authentication. Whatever programming language you use to develop your application, make sure you have a client HTTP library that provides authentication capabilities, or read up on how to implement authentication yourself (which isn't hard). Java and Groovy users will probably want to use HTTPClient; Perl users will want to use LWP. Other languages have similar libraries available. To authenticate, use the email address and password for your Bloglines account.
  3. The API calls return an XML document containing the information you requested. You will need to have an XML parser available, or you can parse the returned document yourself, with regular expressions or some other method.

When you have a Bloglines account, an authenticating HTTP library, and a way to parse XML results, you're ready to start making API calls.
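For readers who want to see what Basic HTTP Authentication involves, here is a minimal sketch in Python using only the standard library. The function names are made up for this example and are not part of any Bloglines library. It assumes the email address and password are sent as the username and password of a standard Basic Authorization header.

```python
import base64
import urllib.request


def basic_auth_header(email, password):
    """Build the value of a Basic Authorization header."""
    credentials = f"{email}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def authenticated_request(url, email, password):
    """Return a Request object carrying the credentials; nothing is sent yet."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(email, password))
    return request
```

Basic authentication only encodes the credentials and does not encrypt them, so anyone who can observe the plain HTTP request can recover the password.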

listsubs

The listsubs call is used to list all of the subscriptions for a given user account. The APIs do not provide a way to add feed subscriptions to your account, nor do they provide methods for editing or updating feeds. In order to create or modify subscriptions, you must go to the Bloglines site and use the Bloglines web interface. After your subscription list is registered with Bloglines, you may access that list using listsubs.

The listsubs call is simple and takes no parameters. Every call to listsubs looks the same:


  GET http://rpc.bloglines.com/listsubs

listsubs reports the outcome of a request through the HTTP response code: 200 OK indicates success, and 401 Unauthorized indicates that the given email address or password is not valid. Be sure to check the response code and prompt the user to correct the address or password as needed.

If the listsubs call succeeds, the HTTP response will contain an XML document with a list of the user's subscriptions. This response document is in OPML format but also contains some Bloglines-specific extensions to OPML: BloglinesSubId, BloglinesUnread, and BloglinesIgnore attributes on <outline> tags, indicating the state of that subscription in the Bloglines user account. BloglinesSubId is an identifier for the subscription within Bloglines' database--you'll need this later to request feed content. BloglinesUnread shows the number of items in the feed that the user has not yet read. BloglinesIgnore (where 1 means ignore and 0 means don't ignore) indicates whether the user wants to be notified of new items on that feed.

One item to note: Bloglines, like many RSS/Atom aggregators, lets users organize subscriptions within subfolders. As a result, the OPML file that listsubs returns may contain several levels of nested <outline> tags, some representing folders and some representing feeds. One good way to check an outline tag to see whether it represents a folder or a feed is to look for the presence of an xmlUrl attribute in the <outline> tag. If an xmlUrl is present, it's a feed; if not, it's a folder. listsubs will return a BloglinesSubId for both folders and feeds, so you can't use that as a distinguishing factor.
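The folder-versus-feed check can be sketched as a short recursive walk over the outline tree. The sample document below is invented for illustration and is not a real Bloglines response; the attribute names come from the description above, while the function name and the keys of the returned dictionaries are this example's own choices.

```python
import xml.etree.ElementTree as ET

# Hypothetical listsubs response, invented for illustration only.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Bloglines Subscriptions</title></head>
  <body>
    <outline title="Subscriptions">
      <outline title="Tech" BloglinesSubId="100" BloglinesIgnore="0">
        <outline title="Example Feed" type="rss"
                 xmlUrl="http://example.com/index.xml"
                 BloglinesSubId="270" BloglinesUnread="3" BloglinesIgnore="0"/>
      </outline>
      <outline title="Quiet Feed" type="rss"
               xmlUrl="http://example.org/rss"
               BloglinesSubId="271" BloglinesUnread="0" BloglinesIgnore="1"/>
    </outline>
  </body>
</opml>"""


def collect_feeds(opml_text):
    """Return one dict per feed, remembering the folders it is nested in."""
    root = ET.fromstring(opml_text)
    feeds = []

    def walk(node, folders):
        for outline in node.findall("outline"):
            title = outline.get("title", "")
            if outline.get("xmlUrl") is not None:
                # An xmlUrl attribute marks a feed.
                feeds.append({
                    "title": title,
                    "folders": folders,
                    "sub_id": int(outline.get("BloglinesSubId")),
                    "unread": int(outline.get("BloglinesUnread", "0")),
                    "ignored": outline.get("BloglinesIgnore") == "1",
                    "xml_url": outline.get("xmlUrl"),
                })
            else:
                # No xmlUrl: treat the outline as a folder and descend into it.
                walk(outline, folders + [title])

    body = root.find("body")
    if body is not None:
        walk(body, [])
    return feeds


for feed in collect_feeds(SAMPLE_OPML):
    print("/".join(feed["folders"]), feed["title"], feed["sub_id"], feed["unread"])
```

Keeping the folder path alongside each feed makes it straightforward to rebuild the folder tree in a user interface later.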

An example response to the listsubs call, along with more documentation of the call responses, is provided on the Bloglines site.

getitems

The getitems call is used to retrieve all unread items on a feed to which the user is subscribed, or all items since a given date and time. To make a getitems call, you need to know the BloglinesSubId for the feed you want to retrieve, so you will need to make a listsubs call first and get the BloglinesSubId from the listsubs result.

Especially during development, you may not want to have your Bloglines API application update your feed states on the Bloglines server--you may want to test-read your feed in your application, and then later read the feeds on Bloglines itself. You can control whether a getitems call updates the read status of the feed you request by adding n=1 (update) or n=0 (do not update) to the end of the getitems URL. (A call to listsubs does not affect your read status at any time.)

If you do not specify a date parameter to the getitems call, the call will return an RSS 2.0 document containing all of the unread items in that feed. The number of items should be the same as was listed in the BloglinesUnread attribute returned by listsubs for that feed, but the document may contain more--for instance, if another item arrives between the time of the listsubs call and the time of the getitems call. It could also contain fewer items, if the same user has read items through another application (the Bloglines web interface or another Bloglines API application). If there are no unread items on the feed, getitems will respond to the HTTP request with a 304 Not Modified response code.

You can retrieve items that you've previously read by specifying a d=DATE parameter to getitems. The date should be given in Unix time--that is, the number of seconds since January 1, 1970. As before, if there are no items on that feed after the date you specified, you will receive a 304 Not Modified response and an empty response body.
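Some HTTP libraries report 304 and 401 as errors rather than as ordinary responses, so the "nothing new" case needs explicit handling. The sketch below shows one way to do that with Python's urllib, which raises HTTPError for both codes. The function name and the opener parameter are this example's own inventions; the parameter exists only so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_items(request, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None when there are no items.

    `request` is a URL string or a prepared urllib.request.Request.
    """
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: no unread items, or none after the given date.
            return None
        if err.code == 401:
            raise PermissionError(
                "Bloglines rejected the email address or password"
            ) from err
        raise
```

A caller can treat a None result as "nothing to display" and a PermissionError as the cue to prompt for credentials again.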

Here's what a typical getitems call might look like:


  GET http://rpc.bloglines.com/getitems?s=270&n=0

This call says that you want to retrieve all unread items for BloglinesSubId 270 and that you do not want your read status updated by this call.
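As a sketch of how an application might assemble such URLs, the helper below builds the query string from the s, n, and d parameters described above and converts a Python datetime to Unix time. The function and argument names are made up for this example. It assumes the date is supplied as a timezone-aware datetime, because a naive one would be interpreted in the local time zone.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

GETITEMS_URL = "http://rpc.bloglines.com/getitems"


def getitems_url(sub_id, mark_read=False, since=None):
    """Build a getitems URL for one subscription.

    sub_id    -- the BloglinesSubId taken from a listsubs response
    mark_read -- True sends n=1 (update read status), False sends n=0
    since     -- optional timezone-aware datetime, sent as d=<Unix time>
    """
    params = {"s": sub_id, "n": 1 if mark_read else 0}
    if since is not None:
        if since.tzinfo is None:
            raise ValueError("since must be a timezone-aware datetime")
        params["d"] = int(since.timestamp())
    return GETITEMS_URL + "?" + urlencode(params)


print(getitems_url(270))
# http://rpc.bloglines.com/getitems?s=270&n=0

print(getitems_url(270, since=datetime(1970, 1, 2, tzinfo=timezone.utc)))
# http://rpc.bloglines.com/getitems?s=270&n=0&d=86400
```

Defaulting mark_read to False follows the advice above for development: test reads leave the read status on the Bloglines server untouched unless the caller asks otherwise.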

Other examples of the getitems call format, and an example return document, are available on the documentation page for getitems on the Bloglines site.
