December 2003 Archives

William Grosso


Related link: http://www.oreillynet.com/pub/wlg/4105

Last year, a bunch of the O’Reilly webloggers made predictions. That didn’t work out so well (by my count, I scored 2.5 out of 5, with big misses at the number 1 and 2 slots). So this year, when Steve Mallet asked “what are you going to work on during the coming year,” I thought “This is easier than predictions.” So here are the technologies I’m planning on learning more about in the first 6 months of 2004:

  • Lucene and Nutch. Call me a slap-happy fool but I think that understanding search and indexing algorithms, knowing when they’ll work and when they’ll fail, and being able to use them effectively, is going to be more and more important. I’ve already used Lucene quite a bit; I want to play with Nutch to hone my intuition (plus, there are some very interesting scalability issues when you try to build a search engine).
  • XQuery. Jason Hunter gave a talk on XQuery at the Emerging Technology SIG and I thought “Oh shit. That’s compelling.” It just feels like a piece of infrastructure that’s suddenly going to become ubiquitous.
  • MySQL. I’ve never actually looked deeply into MySQL (never installed it on my home box, never actually put the pedal to the metal and played with it). I’m playing catch-up here, but I really ought to know a lot more about MySQL than I do.
  • Java 1.5. I’m a Java guy. That means that downloading the 1.5 alpha and playing with it is important to me.


The omission here is web-services. I’ve used (and quite like) XML-RPC. And I understand SOAP, and have used Axis. But the vast amount of stuff that sits above SOAP is still mostly unknown to me, and I’m not really curious about it right now (it feels like a huge stack of cards waiting to tumble. When it comes to middleware as complex as the “web-services stack” is turning out to be, I start yearning for the good old days).

What about you? What new technologies are you going to learn about (and why)?

William Grosso


Related link: http://blog.santa.com

The case against extending copyrights has been made, clearly and eloquently, by numerous people. The idea of the creative commons, of building on the artistic works of prior generations and thereby creating an ever more grand and rich cultural infrastructure, is a compelling vision.


But, sometimes, I wonder.


Santa apparently has a weblog. In which:

  • There are new elves with clever names like Rock Tock (Rock Tock likes to begin his sentences with “Dude”) and Cuckoo.
  • The elves are gender segregated. Boy elves work on boy toys; girl elves (named “Twinkie” and “Dazzle”) work on girl toys.
  • The boy elves are, ahem, interested in the girl elves and can be convinced to work on the girl toys (the girls fell behind in their work and needed the boys to help them out) because of that.

  • Comet and Vixen (the reindeer) are married and expecting.
  • Rudolph has a girlfriend named Clarice. In fact, Rudolph gets Clarice a position on the sleigh team because she’s his girlfriend.
  • Product placements abound. In ALL CAPS, of course (side note: I’m also a little distressed by the very existence of the Barbie Cruise Ship).
  • References to actual traditional Christmas themes or stories are non-existent. I’m not saying Santa’s weblog should be a litany of Christian thought. But maybe it could reach a little beyond elves making toys and various forms of pair-bonding (Rock Tock likes Dazzle. Vixen and Comet are married. Rudolph is in love with Clarice, and so on). Occasionally reinforce a platitude or two, that sort of thing.


Now, you might look at Santa’s weblog and think “Hmmm. A weblog attributed to Santa that makes frequent reference to BARBIE CRUISE SHIPS or KICK ‘N DRIVE GYMS [the weblog has them in all caps]. Kinda depressing, but not entirely surprising.”


Which is what really bothers me. Isn’t the current state of Christmas, that we’re not surprised that some guy in a marketing department somewhere took advantage of the latest communication tools to shill for profits (and do so in a way that continues the stripping of all meaning from the event), distressing? Doesn’t it hint, a little, at something being wrong with allowing anyone to build on common cultural themes? Isn’t it, even a little bit, a refutation of the notion of a creative commons?

Are there some cultural themes that should be off-limits, or more tightly controlled?

Eric M. Burke


I’ve been using OptimizeIt pretty heavily over the past week or so in an effort to reduce the memory footprint of a rather huge Swing client GUI application. We are finding quite a few memory leaks and making great headway. In a nutshell, I cannot imagine a company developing Java software without a license of OptimizeIt available to at least someone on the team.

In my previous weblog, I mentioned one complaint: lack of JBuilder support. They fixed this, providing pretty clear directions on configuring OptimizeIt to work with JBuilder. We found several performance bottlenecks in our Data Access Objects that would have been very difficult to find without a profiling tool.

Are you considering OptimizeIt? My advice…

  • You only need a few licenses. This is not the sort of tool you run every day. I find myself using it maybe once a month, in 2-3 day bursts of activity. It then sits unused for another month or so.
  • While OptimizeIt is a great tool, tracking down performance bottlenecks and memory leaks is damn hard. Again, don’t buy a license for every team member. Experienced Java developers will get the most mileage. Beginners probably won’t have much luck.

There are other profilers out there. I evaluated JProbe but found it consumed too many system resources — on my machine, it ran at least 4 times slower than OptimizeIt. I have not looked at other tools, but I’m sure they are out there.

William Grosso


Related link: http://www.bloglines.com

The other day, at the Emerging Technology SIG, Doug Cutting gave a talk on Lucene and Nutch. Before the talk, Doug casually mentioned that he used a server-based RSS aggregator. Similarly, in the responses to my blog entry on RSS Aggregators, someone mentioned they use bloglines.


This is interesting to me. In my mind, and I was probably guided by the intuition that a “web browser is a client,” RSS Aggregators were naturally client side. By which I mean, my first inclination was that RSS Aggregators naturally run on the end-user’s machine, rather than on a centralized server farm. There are counterexamples, though. For example, Bloglines is an RSS Aggregator that runs out there somewhere and returns your results as a web page (and, by the way, Scott Rosenberg likes Bloglines).


Which led me to spend some time pondering: what’s the boundary line between a “standalone” application and a “server-based” application? That is, when should an application live entirely on an end-user’s machine, and when should it live on a server and be accessed through a client program? (This distinction gets hazier in the case of RSS Aggregators, which are, in a loose sense, web-clients anyway.)


The classic reasons for making an application a server-based application are:

  • Application Load. The application has some memory or CPU requirement that makes end-user machines unsuitable. For example, an application that briefly requires 1 Gig of memory for efficient processing of an intermediate data structure.
  • Resource Sharing. The application enables users to effectively amortize the cost of some computational resource. For example, Google amortizes the cost of spidering and indexing.
  • Data Sharing. Many users, or applications, are using the same data set. In addition to search engines (sharing the index), this is the classic database-driven application. Things like an authentication server (“single sign-on”) live here too.
  • Connectivity requirements. The app has to be there on a 24 x 7 basis, or some simulation thereof. E-mail servers shouldn’t go off line (as end-user machines often do).
  • Manageability (it’s often easier to manage a data center whose configuration you control than it is to repeatedly deploy complex functionality on thousands of desktops).
  • Accessibility. It’s easier to access your information if it’s stored in a central repository. It’s easier to access an application if it’s running on a server.
  • Security. If some information needs to have access restricted, it’s easier to manage that control centrally.


The classic reasons for making an application stand-alone are:

  • Responsiveness. A local application has the potential for a better user experience. Any time you insert round-trips to a server, you add the potential for the user to wonder “What’s it doing?”
  • Application load. While the individual client might not need a lot of resources, the overhead of serving many clients can overwhelm a server-based design.
  • Sheer performance. Some applications (read: games and complicated spreadsheets) are simply infeasible in a server-based model. This is actually a combination of the first two, but I think it deserves its own bullet point.
  • Personal information. It’s difficult to store deep amounts of personal context on a server. If an application truly benefits from a large amount of personal context, then it’s probably a standalone application.
  • Security. The user might have qualms about storing personal data somewhere remote. In addition, a security hole in a server can compromise many people in one exploit.
  • Standalone aspects. What if the machine isn’t connected to the server? If someone is going to be intermittently connected, or in low-bandwidth situations, standalone might be the way to go.


Of course, I’m blurring the lines and ignoring fat clients that do more than provide a better GUI (i.e., which slide some “server” functionality over to the client). It’s a simple list. And there’s nothing in here about P2P applications or the ways in which the faster release cycles engendered by web-based applications can be a significant competitive advantage. But I still think it captures a lot of the considerations and so I’d like to ask:

Did I miss anything in these lists?


Now the interesting thing is to think about RSS Aggregators. Why is Bloglines an internet-service and why is FeedDemon a standalone application?

Obvious things

Let’s start by making the easy comparisons. From the end-user’s perspective, the standalone approach has the following advantages:

  • A richer user interface (although note that Tim Bray doesn’t think this is obvious).
  • Better performance on small feed sets. There’s a caveat here: I’ve only played around with small feed sets on current applications (approx 100 feeds) where the feeds get updated frequently. If you have a lot of feeds which are infrequently updated, the bandwidth of fetching old feeds might be significant (unless people are starting to use last-modified again, which would be nice).
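The last-modified point can be made concrete. Here is a minimal sketch of a conditional fetch in Python, assuming the weblog's server honors the HTTP If-Modified-Since header; the function names are mine, invented for illustration, and not taken from any aggregator.

```python
import urllib.error
import urllib.request


def build_feed_request(url, last_modified=None):
    """Build a GET request that lets the server answer 304 Not Modified."""
    request = urllib.request.Request(url)
    if last_modified:
        # Echo back the Last-Modified value saved from the previous fetch.
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, last_modified=None):
    """Return (body, last_modified); body is None when the feed is unchanged."""
    request = build_feed_request(url, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Unchanged feed: only headers crossed the wire.
            return None, last_modified
        raise
```

An aggregator would store the returned Last-Modified value per feed and pass it back on the next poll, so infrequently updated feeds cost almost no bandwidth.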


From the developer’s perspective, the standalone approach has the following advantages:

  • No need to worry about scalability concerns.
  • No need to create and administer a server farm.
  • Better support from IDEs and other development tools.


From the end-user’s perspective, the server-based approach has the following advantages:

  • Location and OS transparency. You can use it from anywhere (or, at least, from any PC. There’s not a lot of “use it from the cellphone” going on yet).
  • Ability to use a customized browser (for example, one with advanced pop-up blocking, tabbed browsing, or searching functionality). Similarly, integration into the user’s standard browser (ability to bookmark an article for later) seems like an advantage.


From the developer’s perspective, the server-based approach has the following advantages:

  • No need to worry about deployment of complex applications to uncontrolled environments.
  • Ability to use large, server-side libraries and pieces of functionality.
  • Fast release cycles. The ability to quickly modify and update code.

Applying the Server-Based / Standalone Bullet Points


With that out of the way, let’s talk horse-racing. Given that you can build an RSS aggregator that’s server-based or standalone, how do they compete with each other? How will they evolve?

Server-based designs


How do you, as the designer of bloglines, make your application compelling? Well, you want to build something that is a classic server-based application (cause you’re server based and it makes sense to leverage that). You want to add features that require resource sharing, data sharing, or connectivity (you’ve already got the accessibility thing nailed).


What do those look like? You might think connectivity’s a nice one. If you can stay up 24 x 7, and you can cache RSS feeds, then people can find out about blogs which are currently off-line, but have changed. The problem is: this assumes the feed indicated a change, but then the site went off-line. And if a user is interested enough to wonder whether a feed changed, they might want to be able to fetch the article. Which means this isn’t that big an advantage (the feed, or the site, being down is pretty much a bummer, unless your aggregator’s going to cache a lot of data for people).


Data sharing? Well, there’s potential here in that the RSS feeds are fetched much less often. This is a very good thing for authors with low-capacity servers and interesting weblogs. But it’s not so compelling for the end-user. Unless we run into a scenario where a significant percentage of weblogs are up, but responding slowly. Or, a scenario which is perhaps more likely, a significant percentage of weblogs decide to give higher priority to server-based RSS aggregators on the theory that doing so will decrease their overall load.
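To see why the feeds get fetched less often, here is a rough sketch of the idea, not a description of how bloglines actually works: the server fetches each feed once per polling cycle and hands the same copy to every subscriber. The class and function names are hypothetical.

```python
class SharedFeedCache:
    """Fetch each feed at most once per polling cycle, however many users subscribe."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable: url -> feed content
        self._cache = {}           # url -> content fetched during this cycle
        self.origin_requests = 0   # requests that actually reached a weblog

    def get(self, url):
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.origin_requests += 1
        return self._cache[url]

    def start_new_cycle(self):
        """Forget cached content so the next poll refetches every feed once."""
        self._cache.clear()


def poll_for_users(cache, subscriptions):
    """subscriptions maps user -> list of feed URLs; returns user -> {url: content}."""
    return {
        user: {url: cache.get(url) for url in urls}
        for user, urls in subscriptions.items()
    }
```

With three users subscribed to the same two feeds, the weblogs see two requests per cycle instead of six; with thousands of users the saving is what matters to an author on a low-capacity server.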


Resource sharing? Here’s where the server-based designs have a chance to shine. Bloglines has features like Top Blogs, Blog Recommendations, and the ability to subscribe to a search which are hard to imagine incorporating into a standalone design.
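Something like Top Blogs falls out almost for free once every user's subscription list lives in one place. A minimal sketch, assuming the server holds a simple user-to-feeds mapping (the data layout and function name are my assumptions, not bloglines internals):

```python
from collections import Counter


def top_blogs(subscriptions, limit=10):
    """Rank feeds by subscriber count.

    subscriptions maps user -> iterable of feed URLs.
    Returns a list of (feed_url, subscriber_count), most popular first.
    """
    counts = Counter()
    for urls in subscriptions.values():
        counts.update(set(urls))  # count each user at most once per feed
    return counts.most_common(limit)
```

A standalone aggregator sees only one user's list, which is why this kind of feature is hard to imagine on the client side.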

I think these resource sharing functions are the compelling advantage bloglines has. The interesting thing is, of course, that other applications which aren’t RSS Aggregators (like Feedster) also offer some of them.

Standalone designs


How about the other side? How do you, as the designer of FeedDemon, make your application compelling? Well, you want to build something that is a classic standalone application (cause you’re standalone). You want to add features that require significant personal application load, personal information, or enable you to run even when you’re not connected to the net (you’ve already got the performance thing nailed).


The last of these is the easiest– it probably means building a local database and having a “fetch my web” feature for offline RSS browsing. Given that even the FeedDemon help is on-line right now (the help system sends you to online help pages), it would appear that this isn’t a priority (in spite of the “work offline” button, which seems to simply prevent FeedDemon from attempting to talk to the world).

“Fetch my web” seems nice even when you’re on-line too. Wouldn’t it be great to improve the performance of the web by having a predictive cache? Of course it would. And by subscribing to feeds, I’m telling the web browser exactly how to build the cache. The software gets simpler, and better.


In slogan form: UI is Better than AI.
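As a sketch of what “the subscriptions tell you what to cache” might look like: pull the item links out of each subscribed feed and prefetch the ones not already stored locally. This assumes plain RSS 2.0 without XML namespaces (RSS 1.0/RDF feeds would need namespace handling), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def links_to_prefetch(rss_xml, already_cached=()):
    """Return item links from an RSS 2.0 document that are not cached yet."""
    root = ET.fromstring(rss_xml)
    cached = set(already_cached)
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in cached:
            links.append(link)
            cached.add(link)  # skip duplicates within the same feed
    return links
```

No prediction model is involved: the user's explicit choices drive the cache.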


How about significant load or significant personal information? What could you add to an RSS Aggregator that would make it more useful along these lines? Well, the obvious thing is memory: Suppose the RSS Aggregator not only knew about your feeds, it knew about which articles you fetched over time, and was somehow taking advantage of that big database of information. Suppose you could search the database for old blog articles (though, in a shameless personal plug, I’ll point out that you can do this for bloglines by incorporating the toolbar I helped build into your web browser)?
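To make the “search your own reading history” idea concrete, here is a toy inverted index over fetched articles. It is a sketch only: a real aggregator would more likely embed a proper search library such as Lucene, and the class below is hypothetical.

```python
import re
from collections import defaultdict


class ArticleIndex:
    """A tiny in-memory inverted index: term -> URLs of articles containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _terms(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, url, text):
        """Index one fetched article."""
        for term in self._terms(text):
            self._postings[term].add(url)

    def search(self, query):
        """Return URLs of articles that contain every term in the query."""
        terms = self._terms(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Each article the user opens would be passed to `add`; a search box in the aggregator would then call `search`.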

Platform Thoughts


Another point, which isn’t necessarily client or server based, is that applications are platforms. By building a server-based application, and relying on a web browser for your client, you are doing two things: you are limiting the extensions that third parties can make to your application to browser-based plugins AND you are enabling the existing browser-based plugins to augment your application.


On the other hand, if you built a robust plug-in architecture into your standalone aggregator, it’s possible that you could harness an intermediate-to-long-term competitive advantage– as RSS grows in importance, and we all believe it will, people will want to customize their RSS experience (on the other hand, you have to support a developer community. Uuugh).

Are RSS aggregators naturally standalone or server-based? And where do P2P, the Semantic Web, and Worse is Better fit into all of this?

William Grosso


Related link: http://www.bradsoft.com/feeddemon/index.asp


Over the past month or so, I’ve been polling people and asking “Who uses an RSS aggregator” (and, quite often, backing off to “Who knows about RSS”). It turns out that, even in the software community, not a lot of people are using aggregators (though it’s starting to occur).


So, to help out in a very small way, I offer the following advice: WildGrape is a very good “starting aggregator” (people tend to “get” the idea very quickly). On the downside, it requires .NET 1.1, and the project looks a little moribund right now.


On the slightly less-friendly-but-possibly-more-advanced front, FeedDemon is the best aggregator I’ve found (for Windows). Note that I’m not claiming categorical coolness, or “best possible,” just that it suits me.

Note that both of these really do need pop-up blockers. Why would anyone build an RSS aggregator (or any other application that views web pages) and not include a popup blocker?


But since the starter feeds in both of those are limited, I figured I’d go one step further and offer a new starter set, right here, on this blog.


Click here to download them.


There are only 100 or so feeds here (I cleaned a few out, to make this a more manageable list) and the categorization is a bit off-kilter (I tend to put people with strong voices in the “pundits” category) but … if you download an aggregator and want to quickly add some feeds, these might help.


The obvious disclaimers: I don’t get any money from any of this. I don’t know the people who built the RSS aggregators, and I don’t know most of the people writing on the weblogs mentioned here. And if I left your site off the list, it’s not because you’re not special. It’s because I felt like limiting it to a small set of “central” feeds.

Where do you go to find feeds that might be of interest?

William Grosso


Related link: http://news.com.com/2100-7355_3-5119440.html?tag=nefd_top

Ever have one of those days where it feels like the universe is disintegrating around you? Ever feel like a Roman, staring out over the city walls at the vast teeming hordes of barbarians?

Item: I already get 200 or so spams a day. Enough that, even with filtering, e-mail’s become much less useful.


Item: I’ve already blocked Windows Messages, because of Windows Messaging Spam.


Item: In the past month, I’ve noticed a dramatic upswing in AOL Spam, even though I haven’t exposed my IM id any more than usual. At this point, roughly once an hour while I’m on IM, “Aimee 12779” (or the equivalent fake address) sends me an enthusiastic IM about sex with farm animals (or similar topics).


Item: Bruce Schneier is speculating that the MSBlaster virus might have helped sink the power grid back in August.


And now, Secunia is saying “Don’t follow links from untrusted sources.”

My first reaction when I heard Secunia’s advice was “Ummm. Yeah. That’s advice I can follow.”


But it raises an interesting question…. We already have ways to turn off “adult content” in google. Maybe there’s a way to tune search engines to only return trustworthy links.


If I hand you a link, could you tell me whether you would follow it?


What makes a link trustworthy? My first guess is that, if a link’s been there a long time, it’s more likely to be safe. Links on pages that are part of large sites are more likely to be safe, as are links on pages that come from large companies.
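Just to show how those three guesses could be combined, here is a deliberately naive scoring sketch. Everything in it is an assumption made for illustration: the field names, the one-year and 10,000-page cutoffs, and the equal weighting are invented, not measured.

```python
from dataclasses import dataclass


@dataclass
class LinkEvidence:
    days_link_has_existed: int      # how long the link has been on the page
    pages_on_hosting_site: int      # rough size of the site that hosts the page
    hosted_by_large_company: bool   # whether the page comes from a large company


def trust_score(evidence):
    """Return a score in [0, 1]; higher means more likely to be safe.

    The cutoffs (one year, 10,000 pages) and the equal weights are arbitrary.
    """
    age = min(evidence.days_link_has_existed / 365, 1.0)
    size = min(evidence.pages_on_hosting_site / 10_000, 1.0)
    company = 1.0 if evidence.hosted_by_large_company else 0.0
    return (age + size + company) / 3
```

A search engine could then hide results whose score falls below a user-chosen threshold, in the same spirit as the “adult content” switch.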


But what else? What determines whether a link is trustworthy? And is detecting “links you don’t trust” really any different to, or harder than, detecting spam?

What other parameters are there for determining link trustworthiness?
