Related link: http://jroller.com/page/globalblogger/20040727#phoenix_no_fluff_just_stuff

A few weeks ago, I spoke at the No Fluff Just Stuff conference (highly recommended, by the way). During one of the speaker panels, one of my answers got blogged. Since the blogged answers were probably representative of what I said, but not of what I was thinking, I’m now taking the chance to correct the record. In my last post, I explained why programming languages are unlikely to change much in the next 5 years.

Today, I’d like to address the second part of what I said. I said something like “Instead of changes in programming languages, there are going to be changes in libraries. The biggest of which is that we’re all going to be worrying about how to deal with huge masses of semi-structured and slightly incompatible information.”
And then I put a plug in for GATE.

Emad Benjamin blogged this and then wrote “I think he is walking the google line here.”

Which is interesting, but, I think, false. I tend to doubt that Google is walking this particular line. And ownership of the line in question really belongs to Jaron Lanier anyway (I also highly recommend Jaron’s talks. If he’s in your area and speaking, GO).

Let’s talk about the line. Here are some premises:

  • Today, approximately 50 years after the first computer programs were written, the vast majority of systems are still standalone.
  • In fact, most business systems are silos.
  • SQL is all about being able to access data, no matter which RDBMS it’s in.
  • CORBA was all about interoperability.
  • The web succeeded in a phenomenal way for a lot of reasons. But at least one of them was the fact that, finally, people could actually access each other’s information.
  • Suites like Microsoft Office use data compatibility as a powerful sales force. If enough people upgrade, then everyone else has to as well (Office used to do this; other products still do).
  • Data warehousing is really all about data integration.
  • Most XML usage is about being able to access other people’s data.
  • Web services were adopted because of interop.
  • Web services continue to evolve in the direction of increased interoperability (coarse-grained service-oriented architectures are better for interop than SOAP).

In fact, I would claim that most, not all but most, of the major trends in Enterprise Software (which is, for the most part, where the money is) for the past 20 years have been either primarily or largely concerned with interoperability.

Now, if you think about it, there are two possible reasons for this. One is that we, as an industry, know how to do interoperability really well, and we’re sticking to what we’re good at. Just like a shoemaker who is really good at making wingtips, and therefore only makes wingtips, we only do interoperability.

I don’t really buy that one. Do you?

The other is that interoperability is one of the major problems with computer systems today. After a couple dozen industry-wide shared solutions, most applications still can’t share data or work together in any meaningful way.

Now you could say “as soon as we all adopt the same set of XML formats for all our business processes and all use loosely-coupled service-oriented architectures, this problem will go away.” And I will cheerfully smile and nod my head and tell you that, by golly, you have a point there. And I will also make a mental note: do not buy software from this guy.

Instead of the panacea du jour, I will put forth the following deeply skeptical (of current solutions) and highly optimistic (about future technology) proposition:

We will get out of the interoperability mess when a piece of software can run across a file format it has never seen before and, perhaps with an end-user (not a programmer) answering a few simple questions, figure out if the file contains useful data, what that useful data is, and be able to handle files in the same format later without any additional help.

Then, and only then, will we start to make a dent in the interop problem.
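
To make that proposition a little more concrete, here is a toy sketch in Python. It is not how GATE, Lucene, or any real tool works, and every name in it (guess_signature, FormatMemory, scripted_user) is made up for illustration. It also assumes, very generously, that the "never seen before" format is just delimited text. The shape is the point: guess at the structure, ask the end user once what the fields mean, then remember the answer for the next file that looks the same.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: the only formats we can "discover" are simple delimited text.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]

Signature = Tuple[str, int]  # (delimiter, fields per line)


def guess_signature(text: str) -> Optional[Signature]:
    """Return (delimiter, field_count) if every non-empty line splits the same way."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    for delim in CANDIDATE_DELIMITERS:
        counts = {line.count(delim) for line in lines}
        # Accept a delimiter only if it appears the same nonzero number of times per line.
        if len(counts) == 1 and counts != {0}:
            return delim, counts.pop() + 1
    return None


class FormatMemory:
    """Asks the user for field names once per signature, then reuses the answer."""

    def __init__(self, ask_user: Callable[[List[str]], List[str]]) -> None:
        self._ask_user = ask_user
        self._known: Dict[Signature, List[str]] = {}
        self.questions_asked = 0

    def read(self, text: str) -> List[Dict[str, str]]:
        signature = guess_signature(text)
        if signature is None:
            return []  # nothing recognisably tabular: treat as "no useful data"
        delim, _ = signature
        rows = [line.split(delim) for line in text.splitlines() if line.strip()]
        if signature not in self._known:
            # First encounter: show the user a sample row and ask what the fields mean.
            self._known[signature] = self._ask_user(rows[0])
            self.questions_asked += 1
        names = self._known[signature]
        return [dict(zip(names, row)) for row in rows]


def scripted_user(sample_row: List[str]) -> List[str]:
    # Stand-in for a real dialog with an end user (not a programmer).
    return ["name", "city", "year"]


memory = FormatMemory(scripted_user)
first = memory.read("Ada|London|1843\nGrace|New York|1952\n")   # user is asked here
second = memory.read("Linus|Helsinki|1991\n")                   # no question this time
```

The second call gets its field names from memory rather than from the user, which is the "without any additional help" part. Everything hard about the real problem (messy data, nested structure, formats that merely resemble each other) is exactly what this sketch leaves out.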

And, I think that by 2009 we will be starting to approach this level of fluency. Programs will simply be able to handle any data we throw at them, in a reasonable and robust manner. And tools like Lucene and GATE (and jtidy for that matter), along with really fast CPUs and tons of memory, are a very good start.

Was that walking the google line?