September 2003 Archives

Eric M. Burke

People like Microsoft Word. It is easy to use and widely available. So you end up working on teams where lots of technical documents are created as Word documents. Things like project standards, procedures, tutorials, use cases, requirements, meeting minutes, etc.

I contend that project-related documentation should be designed for easy collaboration. Word documents are easy for the original author to create, but lousy for collaboration.

Some technical complaints…

Word docs are binary. This means they don’t play well with version control tools. It is next to impossible to determine what changed between successive revisions of binary files. You also end up in situations where only one person at a time can update the document. If concurrent edits occur, you have to merge changes manually.
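
To make the version-control point concrete, here is a minimal sketch using Python's standard difflib module. The two revisions are made-up lines of plain-text documentation, not taken from any real project. The point is only that plain text gives you a readable line-by-line diff, which a binary file format does not.

```python
import difflib

# Two hypothetical revisions of a plain-text project document.
rev1 = ["Coding standard", "Indent with tabs.", "Use CVS for all source."]
rev2 = ["Coding standard", "Indent with four spaces.", "Use CVS for all source."]


def changed_lines(old, new):
    """Return only the added and removed lines between two revisions."""
    diff = list(difflib.unified_diff(old, new, fromfile="rev1",
                                     tofile="rev2", lineterm=""))
    # The first two diff lines are the "---"/"+++" file headers; skip them,
    # then keep only lines marked as removed ("-") or added ("+").
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


for line in changed_lines(rev1, rev2):
    print(line)
```

Run against two saved versions of a Word file, the same approach has nothing useful to compare line by line, which is the complaint above.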

Word docs are slow. When documentation lives in a web site, you just click on hyperlinks and view HTML in your browser. Although you can embed links to Word docs in your HTML, those docs can get quite large and take too much time to load.

Linking is lousy. With HTML, you can cross-reference docs quite easily. Word docs are much more self-contained, making it very difficult to cross reference a specific portion of a given document.

Searching is lousy. As you build a library of project documentation, how do you search for information when it is scattered across lots of isolated Word documents?

etc…I could really come up with a lot more reasons why the most popular documentation tool introduces lots of problems, but I’m running out of time. So here is an alternative.

Another option

I believe tools like TWiki offer an excellent alternative to Word documents for many kinds of documentation. I suspect that most non-technical people haven’t been exposed to Wikis and don’t know what they are missing. Why not set up a Wiki for your next project and see how it goes?

NOTE: I’m picking on Word, but this same set of arguments is equally applicable to any word processor. While word processors are good for creating formatted documents easily, they are not the best choice for creating a “web” of easily accessible and maintainable project documentation.

William Grosso

Related link: http://research.microsoft.com/~Gray/

I went to see Jim Gray speak the other night. He was the first speaker in this fall’s Distinguished Speaker series at SDForum.
I liked the talk a lot. In particular, I very much enjoyed the part of his talk dealing with Distributed Computing Economics.


The argument itself is basic economic analysis, and can be boiled down to the notion that since everything costs money, you should consider the costs of everything when building applications. In particular, Gray focuses on the costs of cpu time (small, and dropping all the time) and the cost of network bandwidth (not so small, and decreasing at a slower rate). By putting actual dollar values on things, Gray is able to draw some startling conclusions about when it makes sense to use grid-computing techniques, and when it makes sense to either use a LAN-based system or a single machine (as opposed to distributing the computation over a WAN, or using “on-demand” computing).


In particular, he says the following: the break-even point is 10,000 instructions per byte of network traffic, or about a minute of computation per MB of network traffic. That is, unless the cpu time at the other end of the pipe is free, and you get a minute of computation for every MB of data you send to it, you’re better off doing the computation locally.
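
Here is a rough sketch of how that rule of thumb might be applied. The only figure taken from the talk is the break-even ratio of 10,000 instructions per byte; the function names and the two sample jobs are hypothetical, made up purely for illustration.

```python
# Break-even ratio quoted above: 10,000 instructions per byte moved.
BREAK_EVEN_INSTRUCTIONS_PER_BYTE = 10_000
MB = 1_000_000  # bytes


def instructions_per_byte(instructions, bytes_moved):
    """How much computing a job does for each byte it ships over the network."""
    return instructions / bytes_moved


def better_done_remotely(instructions, bytes_moved):
    """True when the job computes enough per byte moved to justify the network cost."""
    return instructions_per_byte(instructions, bytes_moved) >= BREAK_EVEN_INSTRUCTIONS_PER_BYTE


# Instructions you need to get back for every MB you send, at break-even.
print(BREAK_EVEN_INSTRUCTIONS_PER_BYTE * MB)

# Hypothetical job A: ships 1 MB, does 500 million instructions of work.
print(better_done_remotely(500_000_000, MB))

# Hypothetical job B: ships 1 KB, does a billion instructions of work.
print(better_done_remotely(1_000_000_000, 1_000))
```

Job A does only 500 instructions per byte, so by this rule it belongs on the local machine. Job B does a million instructions per byte, which is the kind of compute-heavy, data-light work that makes sense to farm out.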


There’s an interesting reverse to this. If you’re running a database on the wire, you’d much rather someone ask you to do a computation than ask you to send a large amount of data in response to a query (the economics apply when you’re sending an answer as well).


Two things struck me while Gray was speaking. The first is that the analysis isn’t very different from that in Gray’s classic papers on the five minute rule. But despite the fact that a Turing Award winner repeatedly uses this style of argument, I don’t see it being applied very often in other areas.


The second is that I think it very much applies to the semantic web. If you’ll recall, the idea of the semantic web is to create a giant distributed knowledge-base, with lots of information encoded in RDF triples so that the machines, as well as the humans, can process the data.


Now along comes Gray, making an argument that, when you think about it, implies that the semantic web, as currently conceived, might just be all wrong. His basic point is that it’s far cheaper to vend high-level APIs than to give access to the data (because the cost of shipping large amounts of data around is prohibitive). Since the semantic web is basically a data web, one wonders: why doesn’t Gray’s argument apply?


Here are three possible counterarguments:

  1. The idea of the semantic web is that there are literally hundreds of thousands of data sources. In such a universe, the only feasible programming model is to gather data into a central location and then perform the computation (coordinating a distributed application on such a scale is simply not feasible).

  2. The point of the semantic web is that it concerns data which is inherently impossible to gather in one location. Gray’s economic argument doesn’t apply because it assumes that it is possible to put the application on a LAN (or to use high-level APIs) instead of fetching data over a WAN. Clearly that’s not currently the case for web applications like Google, and the proposed semantic web applications (what are they, anyway?) are more Google-like than not.

  3. Gray’s argument assumes infinite divisibility of computing resources. While it may be true that, once you’ve bought a computer, the cost of computation is cheap, you can’t buy a single unit of computation– you have to buy the entire computer and then amortize. So, depending on cash flow considerations, and the amount of computing power you really need, some applications might still make sense in an on-demand model.


My point? In everything I’ve read about the semantic web, nobody’s addressed Gray’s implicit question. Have I missed a large section of papers? Is it obvious that one of the above three arguments is the “killer rejoinder” to “vend high-level APIs, not data”? Is the semantic web really about APIs (and I just missed it)? Or is there a crucial hole in the roadmap to the semantic web?

So what’s the deal? What applications will the semantic web make both possible and economically feasible?

William Grosso

Related link: www.google.com

A friend and I had an interesting conversation the other day. It turns out that both of us think that, in general, searching the internet for high quality partially specified information is harder than it was a couple of years ago. Putting that into English: We use search engines less, and are more frustrated when we use them. In particular, I find Google frustrating these days– with the exception of the cases noted below, I always wind up making three or four searches to find anything (not “looking at three or four results” — “making three or four searches”).


So we both use search engines less. And we’re both building lists of “reliable sites” (frequently weblogs) that we read regularly, and decreasing our reliance on search engines these days.


Of course, there are lots of useful cases where the internet has gotten easier to search. Here, for example, are two searches I make all the time:

  • Want to find something that you know is on MSDN? Use Google with MSDN as a keyword string. It’s wonderful. Google indexes MSDN much better than Microsoft does (which might well explain Microsoft’s recent efforts). In general, if you know enough details about the thing you’re looking for, and there’s a single canonical result, most search engines do fine.

  • If you know the Java class name, the easiest way to pull up the javadocs for the class is often to use Google with three keywords: “java”, “class” and the actual class name.


But, in general, I’ve felt that things have slid backwards in the past year (in fact, Seruku grew out of personal frustration). And I’ve been noticing that other people are starting to voice the same opinion.


And what I’m wondering is:

Is this really true? Is it getting harder to find things on the web? Are search engines really getting more frustrating?


The reason I’m asking is, of course, that it’s entirely possible that I might simply be spoiled. Since the conversation I mentioned above, I’ve asked 13 more people. 9 agreed with me; 3 think the quality of search engines has held constant, and 1 thinks that things have improved in the last year.


But that’s my circle of acquaintances. I’m still not sure what’s really going on in the wider world.

What do you think? How has your “search engine experience” changed in the past year?

William Grosso

Related link: http://www.nationalreview.com


This entry, while it mentions politics, is not about politics. I’m not particularly political, in any case. What I am is something of a politics junkie.
And, as a politics junkie who works in technology, one thing I’ve noticed is that conservative groups (magazines and political orgs) have, in general, been a bit faster on the technology uptake than the other political varieties.


Case in point: National Review has made the transition to mostly electronic form, and has done so in a measured and reasonable fashion.


They’ve had a weblog for quite some time. They call it the corner, and it’s a refreshingly impromptu sort of place. In addition to the occasional shilling for their print magazine (charmingly referred to as “National Review On Dead Tree” or “NRODT”) and some rather standard conservative fare (including a number of dismayingly dense discussions about why it’s hard for Republicans to win California elections), it’s contained a number of links to political debates going on around the net, hosted any number of fairly obscure discussion threads, and sports the occasional bizarre non-sequitur (yesterday, for example, it was the enigmatic one-line entry: “I went to high school with the dwarf king of Mordor. He was a good dude.” I read the corner fairly often, and I have no idea where that came from).


In addition, yesterday they started offering National Review Magazine Online. That is, you can now buy the entire print magazine, offered as a PDF file, for a substantial discount over NRODT.


And they’ve introduced it well. Here’s an example from one of the articles heralding the new version of their magazine.

When I first came to work at NRODT we still used manual typewriters (Royal Standards). Great machines, but time has gathered them to the passenger pigeon and the Great Auk. Join National Review in the next phase of journalism in its digital incarnation.


The point is: National Review is open to technology, and they’re moving (whether they realize it or not) to an all-electronic presence.


And they’re not alone. Their political compadres are all doing this too. The right wing gets the web in a way that the left simply doesn’t (as far as I can tell).


And what I’m wondering is: why?

Can you name a left-wing publication or organization that really gets the web?

William Grosso

Related link: http://www.accelerating.org


Prodded by the inestimable Mark Finnern, I signed up for the accelerating change conference. Immediately after I registered, they asked me to answer three simple questions about the future and I thought– I’ll ask everyone who reads this weblog to answer them too.


Here they are:



  1. Passions and Futures.
    What subjects fascinate you most? Where will the world, and you, be in the next 30 years? Do you expect continual acceleration of technology? What are the risks and opportunities? What should be our development priorities?


  2. Projects.
    Existing or potential projects or unsolved problems you’d like to work on or are working on. Areas of collaborative opportunity. Business, social, and personal issues of accelerating change and technological development you find challenging, and want to discuss in the group.


  3. Resources to Recommend.
    Personal web page, if any. Groups you are affiliated with or promote. Web community and other info sources you use and recommend (e.g., sites you regularly read/participate in, news sources, magazines, tools, techniques, courses, other “conversations” you value).

I’ll post my answers in a couple of days. First, I want to hear what you think.

How would you answer these questions?

William Grosso

Related link: http://www.sdforum.org/sigs/emerging

SDForum’s Emerging Technology SIG is a monthly meeting focused on the short-term (2 to 4 year outlook) future of technology. The goal is to talk about technologies that aren’t quite here yet, to discuss trends in software technologies, and, in general, to examine what’s coming down the pipeline. We meet on the second Tuesday of each month in Palo Alto, CA.


Our next meeting will feature William Jolitz speaking on TV Quality Reliable Wireless Video Streaming.


Here’s the abstract:


Everybody wants to watch movies on a computer, but nobody does. It’s worse on wireless, and even more hobbled on a cellphone or PDA. So what’s a service provider to do? The customer expects TV or the service is toast. Is this possible?


The problem’s maintaining continuous service delivery quality. We know this, because downloaded movies play just fine. But download movies have other problems - copyright, storage demand, and unpredictable viewing availability. While there are cures, the cure is often worse - killing the emerging business. Let’s get real - download is really “plan ahead, pay once, use forever”. Streaming is “pay per view” - what the service provider wants. It would be so much easier if reliable streaming worked over wireless.


Today we either roll out unreliable wireless services or demo reliable services in a fantasy sandbox - the real world is a hard test. When you still can’t get good cellphone coverage on the financial mecca of Sand Hill Road, can you expect to pull out a PDA at Starbucks and get Shrek even half as good as a cheap TV can? Yet large scale wireless deployment is happening worldwide - the business opportunity of delivering on the services that get a ROI on this immense gamble is extreme.


We can make reliable streaming over wire connections right now using bandwidth surplus, fast failover switching, and careful network design. But it still takes remarkably little to destabilize a single stream, let alone thousands in competition for transport resource. In studying the stable stream we’ve found a novel hardware/software approach for wireless video.


The network hardware is InterProphet’s Silicon TCP stack used as distributed transport that can achieve on-time delivery of video. A software framework that provides for reliability and embedded monitoring is a critical part of the deployment story in managing the infrastructure. With it, customer quality is maintained while challenges to the service are acted upon before a loss of quality can occur. Such a framework is repurposed from delivering reliable web services, as described in Jolitz’s article on “Web Services in Datacenter Environments” in the April edition of Dr. Dobb’s Journal.

What emerging technologies do you think we should be covering?

William Grosso

Related link: http://www.craigslist.org/eby/sof/15657614.html


Let me be clear. I have nothing against “sweat equity.” A number of companies were started by people with a dream and very little cash. There’s an honorable tradition of starting off in a garage. And I think it makes a lot of sense that, during a downturn, a lot of engineers will work for equity, trading their skills for a stake in something that could be big.


But a lot of the advertisements being posted these days seem to go way over the line.


Case in point: a recent advertisement on craigslist. Before I spend a lot of time ripping it to shreds, I’d like to point out that I don’t know these people, I have no idea who they are, and, for all I know, this could be a case of good people with great intentions simply writing an advertisement that I’m misreading into oblivion.


But let’s look at it.


It’s an advertisement for programmers who will be working from home at your own time and dime. That means that you provide all the infrastructure, the equipment, and the tools. I wouldn’t be surprised if one programmer runs the CVS server on a personal machine, and another runs the build server on their personal machine, and so on (assuming, heh heh, that there is a CVS server).


What sort of programmer? Well, from the sound of

You should have deep experience and application of object-oriented methodologies and design patterns. You should have very strong experience with J2EE technologies including EJB, Servlet, JSP, Application Servers, Oracle


They’re looking for someone who’s at least an intermediate level programmer (depending on what deep means). Let’s say the minimum is 5 years of experience with a substantial project or two under their belt.


In addition, they want the programmer to be:

Highly intelligent and a quick learner with good problem solving skills and ability to think outside of box


So, we’re talking about a good programmer with some experience and lots of potential.


What are they offering? Well, they’re explicit: this is a volunteer participation offering. That means no money, no equity, nothing tangible at all. The only promise you get is that the later stages will evolve into equity compensation based on performance.


Interesting, that. Leaves you completely at their mercy as far as equitable arrangements go. And, of course, I’d bet that you have to sign an NDA and a non-compete agreement in order to participate as well.


“What about intangibles,” you say. “Maybe if they were well-known visionaries with a history of inventing the future, this might be interesting?”


Nope. They’re a group of [anonymous] executives from ‘Blue-Chip’ companies. Let’s parse that further. They’re people from a non-technological background who want to build Business Applications, and who need programmers. Presumably, their application isn’t based on technological innovation, but on their ability to identify a latent market that no-one else has served (yet).


Since they’re blue-chip executives, let’s assume they’ve correctly identified a potential market. But, and this is the interesting part, they don’t have a lot of faith in their vision. They aren’t willing to risk any of their blue-chip-executive-level salaries on their idea, and either haven’t tried or haven’t been able to raise any money from anywhere else.


Not to mention they’re, ummm, naive if they think that relying on highly qualified volunteers will work — paradoxically, someone who matches the above description and signs on isn’t going to contribute much. Either she already has a paying job, or is actively looking for a paying one. And the next guy to volunteer is going to spend a lot of time figuring out what the last guy did, because software is knowledge-intensive. It’s not like, well, production lines at blue-chip companies, where you can replace the guy who left pretty easily, and the new guy’s startup time is minimal.


So we’ve got:

  • (definitely true) Want very good people.
  • (definitely true) Offering no monetary or equity compensation.
  • (very likely) Little or no faith in the business opportunity.
  • (very likely) Little or no experience in managing a software company.

Why would you sign up at all? Well, they do mention that

You will be a vital part of the engineering team and will participate in all aspects of software development, including architecture/design, coding, and testing. You can expect to work in a fun and fast-paced startup environment, where you will make a key contribution, interact with star engineers and expand your skill set horizons.


The only problem is: this sounds … optimistic. You get to sit at home, in front of your computer, working for them. “Fun and fast-paced” ? Yeah, and putting Velveeta on a Dorito is a party in your mouth. You get to work mostly by yourself and that’s certainly better than some work environments. But will it be fun to donate your efforts to risk-averse blue-chip executives in the hopes that one day they might decide to give you some equity? Personally, I’d find it demoralizing.


Well how about those “star engineers” ? Well, working from home limits the potential interaction in two crucial ways. First, star engineers are usually busy, and you’ll be far away. You can’t knock on their door or cubicle wall and say “Hey! Let’s talk about the architecture cause there’s something I don’t get.” And, second, interactions over DSL tend to be impoverished anyway (e.g. they tend to be IM messages). So the interaction with “star engineers” (if there are actually “star engineers” involved, and I truly doubt it), will be limited to brief bursts of instant messages, followed by long periods of aching loneliness.


So, brought back to Earth, this really says

We’ll assign you various tasks that are all over the map, depending on what we think we need right now (and, remember, we’ve never run a software company before). We’ll expect you to complete them at home, in a sort of quasi-isolation and with minimal interaction with your peers. And, of course, forget about learning anything related to the business side of the house: we’re putting you firmly in the “technology ghetto.”


Not only that, we’re going to use the old “fast-paced startup environment” excuse, and the carrot of a possibility of future equity, to work you hard.


“But wait,” you’re saying. “Maybe it’s a really fascinating and cutting-edge application and I’ll learn a lot about new technologies.”


There’s really only one response to that:

Business Applications in mobile and web-based environments.


It’s probably something like “Our application will enable factory supervisors to check the production line’s status from their PDA.”


The whole advertisement is just depressing. The kindest reading is that some guys with a half-baked idea and little or no stomach for actually taking risks decided to see if they could get some programmers to assume all the risk and work for them for free.


The more likely reading is that some unscrupulous hacks are trying to take advantage of the downturn to build their business on the backs of the unfortunate.

William Grosso


My friend Richard, who runs the Business Intelligence SIG for SDForum, is putting together a slightly larger event; it’s a panel discussion on relational databases.


He’s got a lot of interesting ideas in his head, and they got me to wondering: is the role of the relational database going to change dramatically in the next 10 years?


Right now, relational databases are used in almost every enterprise application. That’s a slight exaggeration, of course, but lots and lots of applications use them. And they’re convenient, and cheap, enough that they’re surreptitiously included in lots of client applications as well.


The key benefits that relational databases bring are persistence, transactions, reliability, and indexing. The price you pay is that your data has to be shoehorned into the relational model. Sometimes the shoehorning is gentle; sometimes it’s an act of violence. And you’ve either got to write the queries yourself, or use some sort of object-persistence tool. And you’ve got all the overhead of the RDBMS itself.


I like using databases enough that, on occasion, I’ve advocated using them everywhere. For example, here’s a snippet from 1997 on the original wiki:


I also think the use of embedded databases is going to skyrocket. Every time I look at them, I start drooling. Not because I have a compelling new application that requires them, but because I think they make a lot of currently-complex tasks a little easier. It’s going to require a mindset-change on the part of a lot of developers (using an RDBMS for persistent data is a lot different from using an embedded database within an application for non-persistent data), and it’s emphatically a step away from OO onto a much more declarative path, but I think the potential is mind-blowing.


What I was doing there was looking at Moore’s law, looking at how relational databases simplify my life in the enterprise application universe, and thinking “oooh. Moore’s law says that I can get me one of them database things in every process. Cool!”


But here’s what Moore’s law really says:


You can get you one of them indexing and persistence things in every process. And, as time goes on, you’ll be able to spend more and more cpu cycles on the indexing and persistence thing.

I.e., Moore’s law gives me permission to use indexing and persistence engines in every process, but it doesn’t insist that I use relational databases. And I’m starting to think I don’t really want to, for five reasons:

  • The fields in structured data are often disjoint sets (Peter Norvig first pointed this out to me). Suppose you have a database table for books, and it has the following columns: PUBLISHER, YEAR_PUBLISHED, AUTHOR, and PRICE. It’s fairly clear that “SELECT * WHERE PRICE = $19.99” is exactly the same query as “SELECT * WHERE [any column] = $19.99.”


    Side-note: does anyone have any references for this? It seems true enough, but have there been empirical studies done?

  • The fields in structured data are often enumerated types. Sometimes they’re numbers, and can take on any value. But they’re often one of a small list of nouns.

  • The world is full of semi-structured data (I count XML in here, but there’s lots more) that’s hard to fit into the relational model.

  • Text indexing systems like Lucene are pretty much available for free in any programming language you want to use. And they do a bang-up job.

  • It sounds an awful lot like the next generation of operating systems are going to offer much better indexing into the file system as a matter of course.
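
The first of those reasons is easy to check on a toy example. The sketch below is only an illustration: the rows are invented placeholder data, and the helper names are hypothetical. It shows that when no value appears in more than one column, asking for a value in a named column and asking for it in any column return the same rows.

```python
# Hypothetical rows using the columns from the books example above.
books = [
    {"PUBLISHER": "Publisher A", "YEAR_PUBLISHED": 2001, "AUTHOR": "Author X", "PRICE": 19.99},
    {"PUBLISHER": "Publisher B", "YEAR_PUBLISHED": 2002, "AUTHOR": "Author Y", "PRICE": 34.95},
    {"PUBLISHER": "Publisher A", "YEAR_PUBLISHED": 2003, "AUTHOR": "Author Z", "PRICE": 19.99},
]


def select_where(rows, column, value):
    """Structured query: match the value in one named column."""
    return [row for row in rows if row[column] == value]


def select_where_any(rows, value):
    """Unstructured query: match the value in any column at all."""
    return [row for row in rows if value in row.values()]


def columns_are_disjoint(rows):
    """True if no value ever shows up in more than one column."""
    column_of = {}
    for row in rows:
        for column, value in row.items():
            # Remember the first column each value was seen in.
            if column_of.setdefault(value, column) != column:
                return False
    return True


print(columns_are_disjoint(books))
print(select_where(books, "PRICE", 19.99) == select_where_any(books, 19.99))
```

Whether real tables are disjoint often enough for this to matter is exactly the empirical question raised in the side-note.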


If you’re reading this and thinking “all he’s saying is that Lucene is useful for indexing XML fragments,” you’re halfway there. And if you’re an XML lunatic who then says “Hey! Wow! And the world, in its entirety, is entirely composed of XML (or possibly RDF) fragments,” then you’ve gone way too far. What I know is that the world is mostly made up of semi-structured data and I know that database schemas often evolve at a ferocious rate because, when we impose more structure, we often get it wrong.


And so now what I’m wondering is if I was completely off base in 1997. That is, I’m wondering if Moore’s law really says that relational databases are going to become vastly less important over time, because for most applications there’s a less-structured (and less efficient) way to do things that’s more convenient for the programmers.


In much the same way that we moved to “higher level” and “scripting” languages, I’m starting to think we’re going to move backwards, towards “more primitive” indexing systems where we just toss all the documents into the indexer and then pull out things based on text search.
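
For a feel of what “toss the documents into the indexer” means, here is a deliberately tiny sketch of an inverted index. It is not Lucene and makes no claim about how Lucene works internally; the class name, document ids, and sample text are all made up for illustration.

```python
import re
from collections import defaultdict


class TinyIndex:
    """A toy inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # No schema: just split the text into lowercase words and record them.
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = TinyIndex()
index.add("minutes-0912", "Meeting minutes: decided to use Lucene for search")
index.add("standards", "Project coding standards and build procedures")
index.add("use-case-7", "Use case: supervisor can search meeting minutes")

print(sorted(index.search("meeting minutes")))
print(sorted(index.search("lucene")))
```

Nothing here knows about tables or columns, which is the trade being described: less structure and less efficiency in exchange for not having to design a schema up front.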


The embarrassing thing about this little essay (it’s too long to call it an entry) is that I think I might have just understood Perl for the first time.

What do you think? What’s the role of the relational database in 2006?
