February 2008 Archives

M. David Peterson

I’m @ Bungee Labs today, tonight (*ALL NIGHT*) and into tomorrow morning as a judge for the Bungee Connect WideLens Intern DevFest. I plan to keep this post updated with progress from the event: action shot pics and comments from the joy, anger, laughter, frustration, etc. that take place in the world of competitive application development. There are nine CS students from across the planet here in Orem, UT. Four of them will win a spot as an intern here @ Bungee Labs this summer, so the stakes are high.

Pics of the potential interns follow, led by a pic of the VP of Community here @ Bungee Labs, Alex Barnett,

Simon St. Laurent

In 2001, the sky fell for web development. Everything fell as the dot-com bubble broke. In 2008, even though the US economy’s not looking so good, there’s more reason to hope that the web will simply follow the economy’s course rather than shatter and fall below it.

Rick Jelliffe

Prof. Rob Cameron of Simon Fraser University has just announced on the XML-DEV mail list his open source Parabix XML parser, which seems to set new benchmarks for parsing speed, using the SIMD instructions of modern processors.

I am particularly interested in this, because a year ago when Cameron released his UTF-8 converter that trialled his approach, u8u16, I said

I would love to see an XML parser that combines Cameron’s SIMD work with the optimizations from IBM’s XML Screamer, which seem to increase the speed of Java processing by two or three fold.

Time permitting, I’ll look at this in more detail over the next few days. There is not much new work being done in text processing: the 60s and 70s saw most of the basic work and data structures, so this may be quite a startling development. Well done, Rob!

Intel has also been doing work in the area of hardware speed-ups to parsing. Is anyone else doing research in this area?
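The idea behind SIMD parsers of this kind is easier to see in miniature. Below is a toy Python illustration of the parallel bit stream approach, not code from Parabix: Python's unbounded integers stand in for SIMD registers, and the function names are made up for the example. The bytes are first transposed into eight "basis" bit streams, one per bit position; after that, marking every occurrence of a given character across the whole block takes eight AND operations instead of a byte-by-byte loop.

```python
def basis_streams(text: bytes) -> list:
    """Transpose bytes into 8 bit streams: bit `pos` of streams[k] is bit k of text[pos]."""
    streams = [0] * 8
    for pos, byte in enumerate(text):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << pos
    return streams


def char_stream(streams: list, length: int, value: int) -> int:
    """Bit stream marking every position whose byte equals `value`, using only AND/NOT."""
    mask = (1 << length) - 1  # keeps the NOT (~) confined to the block
    result = mask
    for k in range(8):
        s = streams[k]
        result &= s if (value >> k) & 1 else ~s & mask
    return result


def positions(stream: int) -> list:
    """Decode a bit stream back into a list of byte offsets."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


doc = b"<a>x</a>"
lt = char_stream(basis_streams(doc), len(doc), ord("<"))
print(positions(lt))  # [0, 4]
```

A real parser does the transposition and the bitwise logic 128 or more bits at a time in hardware registers; the point of the sketch is only that, once the data is in this form, character classes become whole-block boolean algebra.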

Rick Jelliffe

PRESTO is not something new: its basic ideas are presupposed in a lot of people’s thinking about the web, and many people have given names to various parts, but I don’t know that anyone has given a name to this package. In any case, this combination of ideas, which seems to me to be the sweet spot of practicality for large public document sets, seems to have escaped the way we approach many problems and systems. However, the question I ask is “How else are you going to do it?”

The elevator pitch for PRESTO is this:

“All documents, views and metadata at all significant levels of granularity and composition should be available in the best formats practical from their own permanent hierarchical URIs.”

I would see PRESTO as the kind of methodology that a government could adopt as a whole-of-government approach, in particular for public documents and of these in particular for legislation and regulations. The problem is not “what is the optimal format for our documents?” The question is “How can we link to the important grains of information in a robust, technology-neutral way that only needs today’s COTS tools?” The format wars, in this area, are asking exactly the wrong question: they focus us on the details of format A rather than format B, when we need to be able to name and link to information regardless of its format: supra-notational data addressing.

PRESTO is a combination of three ideas:

  • Permanent URLs
  • REST
  • Object-oriented

Legal documents such as legislation have three characteristics: they are highly structured, they are highly voluminous, but they have highly varying value. So many documents do benefit from the classic SGML treatment, with semantic Full Monty markup, but many others are accessed so rarely there is little benefit in having high-level markup for them. And in fact many documents may be scanned images with no text at all, and full markup entails re-keying.

So what PRESTO does (and people familiar with SGML PUBLIC identifiers will get the drift, even more so people familiar with ISO Topic Maps) is say that it is really important to be able to have permanent names even for resources that don’t have really brilliant representations available.

In fact, the legal documents may not exist physically at all: there may be a base document and an amendment document. So we want a permanent URL for the idea of that document, and we want our system to deliver the best fit it can when we want to get the representation. And we want to allow multiple formats, because often the best representation may be client-dependent.

Some people might understand it better if we say that PRESTO is about naming and structuring the configuration items for document sets, and that it forms a precondition for vendor-neutral implementations and for supporting plurality. What PRESTO does is say that when we drill down into a document, we do not want to drill down using media-dependent or presentation-dependent accidents, but according to the editorial/rhetorical (i.e. “semantic”) substance.

So why do I say “How else are you going to do it?”

The reason is that if you want to build a large information system for these kinds of documents, and you want to be truly vendor neutral (which is not the same as saying that preferences and delivery capabilities will not still play their part), and you want to encourage incremental, decentralized, ad hoc and planned developments, in particular mash-ups, then you need Permanent URLs (to prevent link rot), you need REST (for scale etc.) and you need object-orientation (in the sense of bundling the methods for an object with the object itself, rather than having separate verb-based web services which implement a functional programming approach; OO here also includes introspection, so that when you have a resource you can query it to find the various operations available).

What would a concrete example be? Let’s say we are a government and we have adopted PRESTO, so all our legislation is online with these kinds of permanent URLs, including every numbered thing inside the legislation. Then we want to be able to ask “What other laws reference Part 4 of this Act?” In PRESTO, we say “OK, the object here is Part 4, so we want to extend the URL for Part 4 to add a name which means the list of references.” So we would have a URL like http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced, a new URL hierarchically based on the object it depends on. What we don’t do is http://www.eg.gov/functions/getReferences?to=/laws/ChildProtectionAct1904/1993/Part4 (which is procedural/functional), nor http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4?query=Referenced (some people would think this is OK; I don’t have a particularly strong view at the moment).
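A small Python sketch of that naming step, using only the standard library; `sub_resource` is a hypothetical helper, not part of any PRESTO specification. It names the dependent resource by adding a path segment under the object it depends on, and refuses to build on a URL that already carries a query or fragment.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def sub_resource(base: str, name: str) -> str:
    """Name a view of `base` (e.g. its list of references) by extending its path."""
    parts = urlsplit(base)
    if parts.query or parts.fragment:
        raise ValueError("base should be a plain hierarchical URL with no ? or #")
    # Percent-encode the new segment so a stray '/', '?' or '#' cannot break the hierarchy.
    path = parts.path.rstrip("/") + "/" + quote(name, safe="")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


part4 = "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4"
print(sub_resource(part4, "Referenced"))
# http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced
```

The same helper would name any other view of Part 4 the same way, so every derived resource gets its own permanent, hierarchical name rather than a function call.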

Now what happens when we try to access this resource, using an HTTP GET for example? Well, that depends entirely on what information the back-end has to go on. It might be an HTTP 404 error. It might be an HTML file with a list of links. It might be an XML file of XPaths. It is up to the client to cope with the data that is sent, not for the server to send a standard, universal format. But if we allow introspection, we can then ask the resource for a list of the resources available (and HTTP content negotiation can potentially be used too).
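One way a server might act on this, sketched in Python under stated assumptions: each resource holds whatever representations the back-end happens to have (a hypothetical dict from media type to body), an empty dict yields 404, and a deliberately simplified reading of the HTTP Accept header picks among them. The sketch ignores partial wildcards such as `text/*` and any media-type parameters other than `q`.

```python
from typing import Optional, Tuple


def negotiate(available: dict, accept: str = "*/*") -> Tuple[int, Optional[str], Optional[str]]:
    """Return (status, media_type, body) for a resource's available representations.

    `available` maps media types to bodies, in the server's own order of preference.
    """
    if not available:
        return 404, None, None  # the name exists, but nothing can be delivered yet
    prefs = []
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        q = 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                q = float(f[2:])
        prefs.append((q, fields[0]))
    prefs.sort(key=lambda p: -p[0])  # stable sort: equal q values keep header order
    for q, media in prefs:
        if q <= 0:
            continue
        if media == "*/*":
            first = next(iter(available))
            return 200, first, available[first]
        if media in available:
            return 200, media, available[media]
    return 406, None, None  # nothing acceptable; the client could ask for the list instead


refs = {"text/html": "<ul>...</ul>", "application/xml": "<refs/>"}
print(negotiate(refs, "application/xml;q=0.9, text/html"))
# (200, 'text/html', '<ul>...</ul>')
```

The server never has to produce a universal format: it offers what it has, and the client either copes or asks for something else.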

I guess a rule of thumb for a document system that conformed to this PRESTO approach would be that none of the URLs use # (which indicates that you are groping for information inside a system-dependent level of granularity rather than being system-neutral) or ? (which indicates that you are not treating every object you can think about as a resource in its own right that may itself have metadata and children.)
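That rule of thumb is simple enough to check mechanically. A minimal Python sketch, with a hypothetical function name; the third URL in the usage loop is an invented example of the `#` case, not one from the text above.

```python
def presto_url_problems(url: str) -> list:
    """Return the ways `url` breaks the rule of thumb: no query ('?') and no fragment ('#')."""
    # The first '#' starts the fragment; a '?' only starts a query if it comes before that.
    before_fragment, hash_sign, _ = url.partition("#")
    problems = []
    if "?" in before_fragment:
        problems.append("query")
    if hash_sign:
        problems.append("fragment")
    return problems


for url in [
    "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced",
    "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4?query=Referenced",
    "http://www.eg.gov/laws/ChildProtectionAct1904/1993#Part4",  # hypothetical example
]:
    print(url, presto_url_problems(url) or "ok")
```

Run over a site map, a check like this flags the places where an object is being reached through a query or a system-dependent fragment instead of being named as a resource in its own right.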

Simon St. Laurent

REST offers a great way to build simple applications that Create, Read, Update, and Delete resources. But what if you want to get at part of a resource?

M. David Peterson

As per http://hacking.4lessig.org/,


Draft Lessig *HACKFEST EXTRAVAGANZA*

Who: Anyone who believes in the movement to Draft Lessig into Congress and would like to help bring this to fruition.

What: *HACKFEST EXTRAVAGANZA*

Where: irc://irc.freenode.net#draftlessig

When: Friday, Saturday, Sunday, February 22nd, 23rd, and 24th

Why: You decide why you believe it’s important to participate. That’s what living in a Free Culture is all about — or in other words, the freedom to make your own choices — correct?

Agenda

  • Friday, February 22nd, 2008 @ 8 A.M. Mountain, 11 A.M. Eastern, 7 A.M. Pacific, 3 P.M. GMT
  • Planning what needs to be done and then starting in on that plan.

  • Saturday, February 23rd, 2008 @ same as above
  • Evaluating status and adjusting priorities and focus and then continuing forward with the revised plan.

  • Sunday, February 24th, 2008 @ same as above
  • Evaluating status and adjusting priorities and focus and then continuing forward with the revised plan. Evaluating the end result, and determining next steps.

NOTE: While the official start times each day are specified, this is primarily as a marker as to when the planning and evaluation for each new day will take place. The hackin’, as usual, will be going 24 hours a day. :D

M. David Peterson

Basement Tapes: Open Source Cinema on blip.tv


Rick Jelliffe

The issue of the best rules for naming came up recently. I think the way of the future for real human readability is euphony. We want beautiful Code, so why not beautiful markup? With the current discussions of the XML 1.0 Fifth Edition and its naming rules buzzing around, euphony has surely never been a more critical issue.

So what might the rules be? “Jelliffe” is an old spelling for jolly. My parents traced the family tree back, they think, to Bosham, England a thousand years ago, which is when King Canute was there.

One of the interesting parts of the tree is how often the same names come up: John Jelliffe, William Jelliffe and so on. Common enough names, but relentless, and I have often wondered what the euphonic principles were. I have been looking at various genealogical sites recently, and I think I have cracked the rules.

We start off by discarding wives’ names. I am not sure many people picked their spouse because of their first name, as in The Importance of Being Earnest (though there is an Earnest.) Here are the rules as I see them:

1) Any name that starts with a J sound. But this must be a single-syllable name (or a single syllable plus a diminutive). John. Jane. James/Jimmy, Charles, Jessie

2) Any name that has an internal rhyme or sound in the same position. Emma, Thelma, Elton, Benton, Benson, Benjamin, Frederick, Fred, Terence, Preston, Elsie, William, Mildred, Helen, Ella, Illah (my favourite), Lillian, Ralph?, Pauline, Wellford, Edyth, Edwin, Jesse, Ethelyne, Elinor, Elward, Patrick

3) Any name with the same rhythm making dash dot dash dot: Catherine, Martha, Robert, Clara, Hazel, Joseph, Edith, David, Taylor, Martin, Emma, Roger, Myrtle, Walter, Waltby, Ella, Vincent, Arthur, Sarah, Eva, Richmond, Zada, Jesse, Elward, Rufus, Walter, William, Zalmon, Mahala, Thirza, Stebbins, Fannie

4) Any name with a soft sound that matches the ffs and soft J: Catherine, Martha, Howard, Edith, Nathaniel, Smith, Fern, Phoebe, Arthur, Eva, Virginia, Willis, Owen, Fond, William, Fannie

Of the rest, there is a mob that seems to have surnames as their first name, in the American fashion (Burr). But these rules catch most of them. There are some odd runs in New Jersey: Gustiss, Freylinghysen.

So a name like Estha really fits the euphonic bill: 2, 3 and 4.

And I guess it makes sense of the first name Luzon which crops up: rules 3 and 4.

So for people who are passionate about beautiful documents, I suggest that attribute names and namespace prefixes should be picked to be euphonic with the element names. We needn’t, indeed we mustn’t and shan’t, put up with names picked because of some arbitrary theory of utility or historical accident, someone’s mere whim or assertion. We need the science of euphony to get really beautiful markup and proper human memorability (surely a better goal than just readability?)

Rick Jelliffe

Standards Australia did me the honour of inviting me to join the Australian delegation to the DIS 29500 Ballot Resolution Meeting in Geneva, and this time I accepted. I can resist anything except temptation and I hope I can catch up with some old friends during the week and do some productive work.

As part of this, Standards Australia have asked me to put off blogging on the subject of OOXML and the BRM for the duration. So I too have an OOXML Purdah!

Standards Australia has issued two press releases. SA Statement on OOXML BRM gives an overview of the process. Standards Australia’s Delegation to OOXML Meeting is a response to a New Zealand ComputerWorld article concerning me.

(Reporters, please contact Standards Australia not me.)

M. David Peterson

Update: Subbu Allamaraju has followed up my post with “Idempotency Explained” which is worth a read. I’m not sure I agree 100% with his comments, because, as far as I know, the same request to create/edit/update an entry/attribute on SimpleDB will always yield the same result no matter how many times the request is made. Then again, I could very well be completely off base here. /me is reading through the docs again to ensure I haven’t missed something.

Anyone in the know care to clarify one way or another?

Either way, thanks for the extended overview, Subbu!
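To make the distinction concrete, here is a toy in-memory model in Python. The names are hypothetical and this is not the SimpleDB API, whose exact semantics are the open question above. An operation is idempotent when applying the same request twice leaves the store in the same state as applying it once: adding a value to a set-valued attribute behaves that way, while appending to a list does not.

```python
def put_value(item: dict, name: str, value: str) -> None:
    """Set-style put: adding a value that is already present changes nothing."""
    item.setdefault(name, set()).add(value)


def append_value(item: dict, name: str, value: str) -> None:
    """List-style append: every repeat of the request adds another copy."""
    item.setdefault(name, []).append(value)


def is_idempotent(op, name: str, value: str) -> bool:
    """Compare the state after one application with the state after two."""
    once, twice = {}, {}
    op(once, name, value)
    op(twice, name, value)
    op(twice, name, value)
    return once == twice


print(is_idempotent(put_value, "color", "red"))     # True
print(is_idempotent(append_value, "color", "red"))  # False
```

Whether a given service's "put" behaves like the first function or the second is exactly what the documentation has to settle.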

[Original Post]
So for various reasons I’ve had the opportunity to get to know a lot of the folks who design, develop, deploy, market, and support the various offerings of Amazon Web Services, and it’s because of this I found it funny to hear people criticize Amazon for “setting back web architecture 10 years” with the release of SimpleDB. For example, Dare Obasanjo provided the following commentary,

I’ve talked about APIs that claim to be RESTful but aren’t in the past but Amazon’s takes the cake when it comes to egregious behavior. Again, from the documentation for the PutAttributes method we learn,

<snip/>

Wow. A GET request with a parameter called Action which modifies data? What is this, 2005? I thought we already went through the realization that GET requests that modify data are bad after the Google Web Accelerator scare of 2005?

I’ll admit that at first I was right in line with Dare’s point, or in other words, WTF?

But as I mentioned, I know a lot of these guys personally, and I can assure you not a single one of them could qualify as anything other than the best and brightest this world has to offer as it relates to the field of computer science. So I’ve always held off from criticizing, assuming that eventually it would all make sense.

Apparently eventually =~ February 19th, 2008,

Rick Jelliffe

By default, Schematron uses XPath 1 for setting contexts, testing assertions, and producing dynamic diagnostics. Actually, it is XPath 1 as used and extended in XSLT 1. This has led many people to think it is just a nicer declarative front-end to XSLT, which indeed it usually has been.

However, there have been many requests to allow more powerful languages, and ISO Schematron was designed to allow this. There is an attribute called queryBinding on the top-level schema element, and this lets you declare which query language you are using. The standard even specifies a document called a “Schema Language Binding” and says what information it must provide. It also reserves several names: “xslt1, xslt2, xpath2, exslt” etc.
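A validator has to read that attribute before it can do anything else. Here is a minimal Python sketch of that first step using the standard library's ElementTree; the function name is hypothetical, and it assumes the ISO Schematron namespace and that an absent queryBinding means the default "xslt" binding.

```python
import xml.etree.ElementTree as ET

ISO_SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"


def query_binding(schema_xml: str) -> str:
    """Return the query language a Schematron schema declares on its top-level element."""
    root = ET.fromstring(schema_xml)
    if root.tag != f"{{{ISO_SCHEMATRON_NS}}}schema":
        raise ValueError("top-level element is not an ISO Schematron <schema>")
    # No attribute: fall back to the default binding (XPath 1 as used in XSLT 1).
    return root.get("queryBinding", "xslt")


schema = """<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="book"><assert test="title">A book needs a title.</assert></rule>
  </pattern>
</schema>"""
print(query_binding(schema))  # xslt2
```

An implementation would then dispatch on the returned name, rejecting bindings it does not support rather than silently evaluating, say, XPath 2 expressions with an XPath 1 engine.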

So here is the draft text for new annexes I will be submitting to SC34 (and thence to national vote) for augmenting ISO Schematron. EXSLT was a community effort to define some more powerful functions for XSLT1. XPath2 is the updated version of XPath from W3C, very much changed, in particular with a different and large function library; the xpath2 query language binding allows the minimal, untyped, untyped-data profile. XSLT2 is the reworked XSLT1, and the xslt2 query language binding allows the typed data (PSVI) if you want it (Schematron doesn’t provide any mechanism for making sure that is what you are working with) and also user-defined functions in the XSLT2 namespace.

Most interesting, perhaps, is the STX binding. I am supposed to be contacting the STX editor to see about using this query language binding plus the STX specification as an ISO standard (another part of DSDL). Actually, STX was voted on for this purpose, but without the query language binding some national bodies decided it couldn’t be classed as a schema language; it should be an easy fix, since the hard work has been done and the NBs are onside at last.

The thing about STX is that it works in a streaming fashion, so you can test documents larger than your virtual memory. STX is much less limited than the subset of XPath that XSD uses.
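STX itself isn't in Python's standard library, so as a stand-in for the streaming idea here is a sketch using ElementTree's `iterparse`, a different, event-based technique. It checks one hypothetical assertion ("every item has an id") element by element and clears each element after testing it, so most of the document is discarded as parsing proceeds (in this simple version the emptied elements stay attached to their parent). The rule and function name are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def items_missing_attr(source, tag: str, attr: str) -> list:
    """Stream through `source`; return 1-based ordinals of <tag> elements lacking `attr`."""
    failures = []
    seen = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            seen += 1
            if elem.get(attr) is None:
                failures.append(seen)
            elem.clear()  # drop this element's children and attributes once tested
    return failures


doc = b"<items><item id='a'/><item/><item id='c'><note/></item></items>"
print(items_missing_attr(io.BytesIO(doc), "item", "id"))  # [2]
```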

The draft bindings are here (sorry, in boring custom XML, not typeset to HTML). Comments are very welcome, and thanks to the schematron-love-in mail-list members for comments and prods. There are a few other issues on the table for a revised Schematron upgrade, but they can all proceed independently of these bindings, if time is not my friend.

M. David Peterson


Lessig ‘08 - Change Congress.

From Lessig

This site hosts this video to explain the launch of two exploratory projects — first, a Change Congress movement, and second, my own decision whether to run for Congress in the California 12th.

I have decided I want to give as much energy as I can to the Change Congress movement. I will decide in the next week or so whether it makes sense to advance that movement by running for Congress.

Many friends have weighed in on that decision — both strongly in favor and strongly opposed. Many more have joined draftlessig.org and a Facebook group asking me to consider it.

Watch or listen and you will understand some of my reasoning. Feel free to send your thoughts or advice to lessig@lessig08.org (though please excuse any slowness in my response).

More ways to show your support to follow shortly…

David A. Chappell

I recently did an interview with Rich Seeley for SearchSOA/TechTarget on the relationship between eXtreme Transaction Processing (XTP) and SOA, CEP, BPEL and BAM. Here’s an excerpt -

David A. Chappell

I recently helped Khanderao Kand, one of Oracle’s Fusion Middleware lead architects, co-author an article on the current state of OSGi. Here’s an excerpt -

….The Open Services Gateway Initiative (OSGi) Alliance is working to realize the vision of a “universal middleware” that will address issues such as application packaging, versioning, deployment, publication, and discovery.

In this article we’ll examine the need for the kind of container model provided by the OSGi, outline the capabilities it would provide, and discuss its relationship to complementary technologies such as SOA, SCA, and Spring….[read more]

Dave

M. David Peterson

Pat Eyler works about a block and a half from where I live in downtown SLC, UT, and yesterday we met up for lunch. Among our far-reaching topics of conversation was the proper way to pronounce Rubinius. In case any of you were like me and had no clue how to properly pronounce it, here’s the general idea,

Say “Rubik’s Cube”.

Then replace the “k” in Rubik’s with “n”, the end result sounding like a Reuben sandwich (mmm… my favorite! :D) or Ruben Studdard (hmmm… not so much my favorite, though I’m not an American Idol fan (that’s cuz’ I’m not a TV fan, not because I despise the show itself) so that could be why.)

So now that I think of it you can probably just skip the Rubik’s cube ‘k -> n’ transfusion and move right on through to Reuben/Rubin, but whatever makes you happy; I would just run with that and call it good. ;-)

OK! So, now that we have the first part figured out… Say “I” (as in you, but in reverse); actually, make that EEE, as in “I am an idEEEot” (see probablycorey’s comment and my follow-up below for all the gory details). Then say “us” (as in “you and I”), putting all of them together to form,

Rubin EEE us

… and that’s it! You now know how to properly pronounce the Rubinius project :-) But no need to thank me. Thank Pat!

Thanks, Pat! :D

Keith Fahlgren

As someone who arrived much later to the XML party than most of my peers & mentors, this week’s series of XML @ 10 years posts has been a wonderful history lesson. Today, Norm Walsh posted an even more surprising quote:

I joined O’Reilly on the very first day of an unprecedented two-week period during which the production department, the folks who actually turn finished manuscripts into books, was closed. The department was undergoing a two-week training period during which they would learn SGML and, henceforth, all books would be done in SGML.

Simon St. Laurent

In celebration of XML 1.0’s tenth anniversary, I signed back on to XML-DEV to suggest that it’s time to do to XML - just the core of it, please - what XML did to SGML around SGML’s tenth anniversary.

Rick Jelliffe

Eve Maler and Jeanne el Andaloussi’s out-of-print book Developing SGML DTDs: from Text to Model to Markup has just been put online, I see. (Through the magic of DocBook!)

Even though it looks dated in its SGML examples, it really is about a methodology for analysing and designing schemas (especially for literature, i.e. “documents” rather than “data”) that is just as useful today. We might call SGML XML, and we might use “MIME type” or “data type” instead of “notation”, but the development issues this book addresses never went away. Anyone who wants to be an expert in XML schemas and document analysis needs to be aware of it, IMHO.

A good taster might be Learning to recognize semantic components.

Rick Jelliffe

Patrick Durusau, the co-editor of ISO ODF and OASIS ODF, head of the US delegation at SC34, and a forceful ODF advocate, has just taken the unusual step of issuing an open letter concerning DIS 29500 (OOXML) in preparation for the BRM. Dr Durusau simply does not blog or participate in mass communication if he can help it; he is dedicated to making ODF and the Topic Map standards as good as they can be.

The open letter’s title is “Open XML: a Poster Child for Open Standards Development?”

I urge all NB committee members and BRM delegates (and all of my gentle readers) to read it: it is a call to reassess what progress has been made and for respectful dialog.

Keith Fahlgren

Via Norm, DocBook V5.0 is done.

Woot! Committee Draft! Thanks to all who worked on this.

Formally, too.

M. David Peterson

Did I do the math correctly? No. But the general idea is in place.

Dear Amazon SQS Developers,

We wanted to let you know about some changes we are making to Amazon SQS, based on customer feedback and watching the way customers are using the service. One thing we’ve heard consistently is that customers want to be able to use SQS along with our other services (e.g. Amazon EC2, Amazon S3), but need SQS to be less expensive for this to be more feasible. We looked at our architecture and feature set, and found a way to make a few, targeted changes, by deprecating a few infrequently used requests, which allow us to operate the service much more efficiently. Simultaneously, we are introducing a new pricing structure that replaces the previous per-messages-sent charge ($0.10/1,000 messages) with a new per-request fee ($0.01/10,000 requests, including all Amazon SQS operations). The net result is that the new pricing will result in significantly lower charges for most developers being billed for SQS.

Yeah, I’d say that’s pretty significant. Nice!
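For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch in Python. The prices are the ones quoted in the email above; the assumption that each message costs exactly three requests (one send, one receive, one delete, with no batching and no empty polls) is mine. Real usage will differ, since every SQS operation counts as a request, presumably including a receive that finds nothing.

```python
from decimal import Decimal

OLD_PER_MESSAGE = Decimal("0.10") / 1000   # $0.10 per 1,000 messages sent
NEW_PER_REQUEST = Decimal("0.01") / 10000  # $0.01 per 10,000 requests of any kind


def old_cost(messages: int) -> Decimal:
    return messages * OLD_PER_MESSAGE


def new_cost(messages: int, requests_per_message: int = 3) -> Decimal:
    # Assumption: send + receive + delete for each message, nothing else.
    return messages * requests_per_message * NEW_PER_REQUEST


n = 1_000_000
print(f"${old_cost(n):.2f} -> ${new_cost(n):.2f}")  # $100.00 -> $3.00
```

Under those assumptions a million messages drop from $100 to $3, roughly a 33-fold reduction; heavy polling would eat into that.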

More details @ the AWS/SQS page.

M. David Peterson

“I think [forgiveness] may be the greatest virtue on earth, and certainly the most needed. There is so much of meanness and abuse, of intolerance and hatred. There is so great a need for repentance and forgiveness. It is the great principle emphasized in all of scripture, both ancient and modern.

Somehow forgiveness, with love and tolerance, accomplishes miracles that can happen in no other way.”

–Gordon B. Hinckley, “Forgiveness,” Ensign, Nov. 2005, 81

It is time for us to forgive. Please consider just this very act.

M. David Peterson

Update: Thorsten has followed up with some interesting comments regarding the current state of the virtualization industry, the problems he’s seeing a lot of companies facing, and various obstacles they’re running into along the way. Interesting stuff!



[Original Post]
((AOTD == Advice of the Day) == True)


Amazon Web Services Developer Connection : Instances not responding …

A word of advice, easy to give, hard to follow: design your system so you can relaunch any critical instance!

Amazon has thousands of instances available, just waiting for you to hit the launch button. If a current instance smells bad and your own troubleshooting doesn’t resolve it, launch a new one and bring your service up on it. Actually, if it’s critical, you should have two running so you’d be left with one while you replace the failing one.

All this should be motherhood and apple pie on EC2 or any other hosting facility, or also in your own datacenter for that matter. Systems fail.

Thorsten von Eicken, Posted: Feb 2, 2008 10:53 AM PST
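Thorsten's advice amounts to a reconciliation loop: drop the instance that smells bad, bring up a replacement, and keep a spare so you're never down to zero. A minimal Python sketch follows; `launch_instance` and the health map are hypothetical stand-ins (no real EC2 API is called), and terminating the bad instance is left out.

```python
import itertools

_next_id = itertools.count(1)


def launch_instance() -> str:
    """Hypothetical stand-in for a real launch call; returns a fresh instance id."""
    return f"i-{next(_next_id):04d}"


def reconcile(instances: dict, desired: int = 2) -> dict:
    """Given {instance_id: is_healthy}, return a pool with at least `desired` healthy instances.

    Unhealthy instances are dropped from the pool (a real system would also terminate
    them); replacements are assumed to come up healthy.
    """
    pool = {iid: True for iid, ok in instances.items() if ok}
    while len(pool) < desired:
        pool[launch_instance()] = True
    return pool


print(reconcile({"i-web1": True, "i-web2": False}))
# {'i-web1': True, 'i-0001': True}
```

With `desired=2`, a single failure still leaves one healthy instance serving while the replacement launches, which is the point of running two.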

BTW: Thorsten is one of the smartest individuals I have ever had the fortune of coming to know. *GREAT* guy, and if you need help with Amazon Web Services-related consulting, in particular EC2, I would *HIGHLY* recommend getting in contact with his company, RightScale. They have just the right combination of open source, open minds, and openly giving more than they receive in return, so I believe it’s both fair and in line with the ideals of O’Reilly, and therefore of this blog, to give them a plug.

Kurt Cagle

Merger-mania is in full swing of late, which is rather astonishing given the current credit market problems. Oracle finally managed, after months of trying, to snare business services provider BEA, and Sun’s purchase of MySQL was both hailed as a master-stroke and derided as a last-gasp hope by a fading giant to bolster its database claims. Today the announcement hit the fan that Microsoft had made a $45 billion bid for Yahoo, something that’s made Wall Street happy but that has people in Silicon Valley scratching their heads.

I am not a CEO, nor am I a big investor … and I’m generally not a big fan of mergers and acquisitions (M&As), because, especially when the mergers involve two reasonably large, well-established companies, the results in terms of performance seldom justify the costs involved. This is especially true of tech sector companies, where so many of the assets of the companies are not tied up in physical capital but rather in abstract ideas carried around in smart people’s heads.

The overtures that Microsoft has been making to Yahoo have been obvious for some time. The integration of Yahoo and Microsoft’s IM formats, for instance, hinted that the two were playing footsies under the table, and certainly the announcements that Yahoo would be utilizing Microsoft technology (and Microsoft’s subsequent PR blitz to that effect) at least indicated that the relationship was serious. Thus, the signs have been around for a while, and both the tech and Wall Street press have generally been playing the role of matchmakers. Yet there are more than a few signs that this particular marriage, if consummated, may end up in divorce court nonetheless.

Rick Jelliffe

Bruce Byfield has a nice article A Field Guide to Free Software Supporters. On his typology I’d be in between 4) Softcore advocate and 5) Mainstream advocate.

Reading it, I wondered whether pretty well the same categories could also describe people’s attitudes toward Standards (and Open Standards, Open APIs, Open Systems). Not a bad fit, with different names sometimes. In that category I guess I would be somewhere between 6) Hardcore (see All Interface Technologies by Market Dominators should be QA-ed, ZRAND Standards!) and 3) the participating idealist (because the standards issues I participate in are the ones involved in my day-to-day jobs in the markup/industrial publishing industry).