Technical Archives

M. David Peterson

So as Jeff Barr recently pointed out over on the Amazon Web Services blog,

Amazon Web Services Blog: Redundant Disk Storage Across Multiple EC2

XML Hacker M. David Peterson has put together a really interesting article.

As part of his work at 3rd and Urban, he has implemented redundant, fault-tolerant, read-write disk storage on Amazon EC2 using a number of open source tools and applications including LVM, DRBD, NFS, Heartbeat, and VTUN.

Mark notes that "the primary focus of this paper is to present both a detailed overview
as well as a working code base that will enable you to begin designing,
building, testing, and deploying your EC2-based applications using a
generalized persistent storage foundation, doing so today in both lieu
of and in preparation for release of Amazon Web Services offering in
this same space."

The article provides complete implementation details and links to source code for the scripts that Mark developed.

You can read the article, and you can also follow progress via the discussion group.

– Jeff;

Firstly, and most importantly, as pointed out in the first portion of this article,

Michael C. Daconta

The IBM Information Server has a business glossary manager that I am implementing for several clients. Some of those clients have existing data dictionaries and glossaries that will need to be imported into the product. The IBM information server has an XML format to allow you to import/export business glossaries.

There is a lot to talk about in examining this format. There is the good, the bad and the ugly in this format. Before we begin our dissection there are two contextual topics in need of some discussion. First is examining the goals of the format and second is determining whether those goals could have been achieved using existing formats.

At a high level, the format has three main goals which correspond to its three main elements: represent terms and their definitions (via the term element), categorize terms (via the category element) and add custom attributes to categories or terms (via the attribute element). Except for the metadata extension mechanism (custom attributes), this is a simple way to create and organize a dictionary in XML. When examining the schema or the example of the format it is clear that it is far from a complete standard. For example, the only available data type for custom attributes is String. So, it is clear that this format will evolve. A bigger question is - should it? And should it even have been created in the first place?
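To make the three-element structure described above a little more concrete, here is a minimal Python sketch that builds a glossary with categories, terms and string-valued custom attributes. Only the term, category and attribute element names come from the description above; the glossary root, the definition child, and the name and type attributes are assumptions made for illustration, not the actual IBM Information Server schema.

```python
import xml.etree.ElementTree as ET

def build_glossary(categories):
    """Build a glossary tree from {category: {term: (definition, {attr: value})}}."""
    root = ET.Element("glossary")  # hypothetical root element name
    for cat_name, terms in categories.items():
        cat = ET.SubElement(root, "category", name=cat_name)
        for term_name, (definition, custom) in terms.items():
            term = ET.SubElement(cat, "term", name=term_name)
            # Hypothetical child element holding the term's definition.
            ET.SubElement(term, "definition").text = definition
            for attr_name, value in custom.items():
                # Custom attributes are string-valued only, as noted in the post.
                attribute = ET.SubElement(term, "attribute", name=attr_name, type="String")
                attribute.text = str(value)
    return root

glossary = build_glossary({
    "Finance": {"Ledger": ("A record of financial transactions.", {"steward": "J. Doe"})},
})
print(ET.tostring(glossary, encoding="unicode"))
```

An existing data dictionary could be fed through a function like this, but the real import format would need to be checked against the product's own schema first.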

There are quite a few formats for capturing glossaries, dictionaries and thesauri in XML. A colleague of mine, Ken Sall, examined this for the government a few years back. The W3C has SKOS, IBM has subject classification in DITA (though DITA is much broader than glossaries), and XML topic maps can also serve this purpose.

So, although we will continue to explore the details of this format and even conversion of some of the others mentioned into this format, what are your thoughts on it?

Until next time, see you in the trenches… - Mike

David A. Chappell

I’m presenting a keynote at the next International Association of Software Architects (IASA) IT Architect conference on May 22 - 23 in New York City.

I was looking through the agenda and I came across this -

Interesting Real-world Architectures and the Handbook of Software Architecture presented by Grady Booch via Second Life

I checked with the conference organizers, and sure enough, Grady is going to be at his home base (which is usually Hawaii), broadcasting the presentation via Second Life, and conference goers will be witnessing his avatar giving the presentation on the big screen.

How cool is that!? I want some!

I assume that means he’ll also be giving it in Second Life, and others will be able to join him there.

Dave

M. David Peterson

SIDENOTE: @amike: The run was good while it lasted, eh? ;-)

SIDENOTE.NEXT: After rereading the title, I’m not even sure it makes any sense. But then again, what’s new? ;-) :D

[Post.Body]
DonXml’s All Things Techie : Mixing Object, Functional and Aspect Oriented Programming

Within a DSL it would be cool if you could map its Nouns to Objects (described via OOP), its Verbs to Functions (described via FP), and its Adjectives and Adverbs to Aspects (via AOP).

I have to do some research, but does this fit within the definition of a composable language? I tried to find a definition of a composable language, but didn’t seem to find one.

Oh, the power of XSLT 2.0 (and XPath 2.0), where you can bring together the knowledge-pool and massive underlying code base of OOP, fold in the power of Functional Programming, and weave all of it together with AOP to produce a truly composable language as a result.

For example,

AddThis Social Bookmark Button

The linked data principles formulated by Tim Berners-Lee read as quite straightforward:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs, so that they can discover more things.

However, when it comes down to implementing them, you may start wondering how to interpret them. I guess some of them were intentionally phrased rather generically. Astonishingly, RDF is not mentioned explicitly. Though, the third principle talks about ‘useful information’: I think this can be interpreted to be useful for machines in the first place; we’d expect to find something at the other end of the link that is at least GRDDLable (such as microformats, RDFa, etc.). The fourth principle is also known as follow-your-nose.

One of the key ideas in linked data is to interpret the property in an RDF statement as a typed hyperlink. So, when I come across the RDF statement

<> dc:author <http://sw-app.org/mic.xhtml#i> .

I will assume that the author of this post is indeed http://sw-app.org/mic.xhtml#i. I can learn more about this person when dereferencing this URI, i.e. do an HTTP GET to fetch its content. To learn more about URI design and usage, have a look at Cool URIs for the Semantic Web, a W3C note recently finalised.
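The dereferencing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the naive whitespace split handles nothing beyond the simple statement above, and asking for RDF/XML in the Accept header is my choice for the sketch, not something the principles prescribe. The request is built but deliberately not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request

def parse_triple(line):
    """Split a simple 'subject predicate object .' statement into three terms."""
    subject, predicate, obj = line.rstrip(" .").split()
    return subject, predicate, obj

def lookup_request(uri_term):
    """Build (but do not send) an HTTP GET request for a <URI> term."""
    uri = uri_term.strip("<>")
    # The fragment identifies the thing inside the document; it is not sent to the server.
    document, fragment = urldefrag(uri)
    # Assumption for this sketch: ask the server for an RDF representation.
    return Request(document, headers={"Accept": "application/rdf+xml"}), fragment

s, p, o = parse_triple("<> dc:author <http://sw-app.org/mic.xhtml#i> .")
request, fragment = lookup_request(o)
print(p, "->", request.full_url, "#" + fragment)
```

The predicate plays the role of the link type, and the object URI is what an agent would follow to learn more.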

The above mainly arose from a recent discussion with Richard Cyganiak regarding our riese system, an RDFised and interlinked version of the Eurostat data.

AddThis Social Bookmark Button

Lately I’m becoming more bullish on RDFa. I lose heart when things don’t materialize on my timeline; however, I was recently reminded of a scene in “Under The Tuscan Sun” wherein the actor describes how they built tracks over the Alps before there was a train in existence that could make the trip.

“They built it because they knew someday the train would come…”

That’s similar to what the semantic web community is doing. In my analogy the tracks are the specifications (RDF, OWL, RDFa) and data sources. The train is of course applications that use these specifications and information to make our online lives more convenient. And for me personally, the mountain is grasping how it all works and most important of all, applying it in the real world.

That brings me to what I want to share. If you are particularly interested in RDFa, http://rdfa.info/ is probably the best link to watch. I found a very well written and informative tutorial on linked data that I highly recommend for anybody gearing up to apply this technology. The recommendation to “not define new vocabularies from scratch” struck me as particularly good advice. If you are wondering what the more common, well used and understood RDF vocabularies are, here are my top 5:

AddThis Social Bookmark Button

One valid answer to the question in the title would be: I’m both into linked-data and RDFa. Hey, but that’s not the answer you are interested in, right? We’ll have a look into both and find a better answer by the end of this post. Oh, right, by the way, let me introduce myself shortly. I’m new to xml.com and I try focusing on Semantic Web stuff.

In the beginning, there was the URI. Kingsley recently wrote about it, coming from the plain old untyped @href hyperlink. Then there was RDF, not so well known, and still often confused with one of its serialisations, namely RDF/XML. But there are other ways to deploy RDF as well. In a couple of weeks, presumably, RDFa will be finalised by W3C. RDFa is all about delivering structured metadata in HTML. Much like microformats, RDFa uses attributes to ‘hide’ - or, more technically: embed - metadata in HTML.

Coming back to URIs: The hyperlinks basically were the success factor of the Web as we know it. Typed or semantic links are expected to be the same for the Semantic Web. TimBL wrote up the so called linked-data principles a bit ago (URI for everything, HTTP URI, RDF properties). An example might help understanding both RDFa and linked-data; compare

this page is under <a href="http://creativecommons.org/licenses/by/2.5/">CC 2.5</a> license
this page is under <a rel="cc:license" href="http://creativecommons.org/licenses/by/2.5/">CC 2.5</a> license
 

The key is the rel="cc:license" bit. This is actually a piece of valid RDFa (telling that this content is under a certain license) and equally is a typed link. It overloads the simple @href hyperlink and lets an agent (be it a search bot or a syndication site) interpret and follow it properly. I think you get the point, right? To sum up: RDFa is the way to do linked-data. Coming back to the initial question, I guess the main point is that both are manifestations of the real-world Semantic Web emerging these days. While in the last couple of years most of the people involved in Semantic Web stuff maybe thought ontologies and reasoning are the most important issues to deal with, it’s a bit like building a marvellous roof and finding out one day that there are no walls, and not even a foundation to put it onto.
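How an agent might tell the two links in the example apart can be sketched with Python's standard HTML parser. This is a rough illustration, not a real RDFa processor: it only reads @rel and @href on a elements and ignores prefix resolution and everything else RDFa defines. The LinkCollector name is made up for this sketch.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs for every <a> element; rel is None when untyped."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self.links.append((attributes.get("rel"), attributes.get("href")))

html = (
    'this page is under <a href="http://creativecommons.org/licenses/by/2.5/">CC 2.5</a> license\n'
    'this page is under <a rel="cc:license" href="http://creativecommons.org/licenses/by/2.5/">CC 2.5</a> license'
)
collector = LinkCollector()
collector.feed(html)
for rel, href in collector.links:
    # A link with @rel carries a type an agent can act on; one without is just a link.
    print("typed" if rel else "untyped", rel, href)
```

Both anchors point at the same URI; only the second tells the agent what the relationship is.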

Simon St. Laurent

I spend a fair amount of time providing technical support for friends, family, and the occasional local political campaign. Looking back over the past few years, it seems clear that I’m spending a lot less time helping people with Windows (thank you, Macintosh) but a lot more time helping out with various wireless network problems. Most of those problems seem to be caused by dying routers.

Rick Jelliffe

Many people don’t find abstention easy. Some don’t have the habit, some don’t see the point, some people are irrepressible, some people are used to having their way, and others think it is an attack on their rights and duties. Having hung around a few different standards bodies, it seems to me that one of the distinctives about ISO/IEC JTC1 is the role that voting abstain plays. Other standards bodies have it, but there seems sometimes a stigma or idea that abstaining from a particular vote represents a failure in expertise: a loss of face and an insult to pride. The worry that you need to be on top of everything, perhaps coupled with the paranoia that people are trying to scam you. But, as Clint Eastwood says, a man’s got to know his limitations.

Let’s review the Fast-Track procedure. The JTC1 Directives (which have sway here) allow National Bodies three kinds of reply on a standard: see s 9.8 (bold added by me; DIS means Draft International Standard, DAM means Draft Amendment, NB means National Body):

Approval of the technical content of the DIS as presented (editorial or other comments may be appended);

Disapproval of the DIS (or DAM) for technical reasons to be stated, with proposals for change that would make the document acceptable (acceptance of the proposals shall be referred back to the NB concerned for confirmation that the vote can be changed to approve);

Abstention

Note that the only criterion countenanced under these JTC1 rules for approving or disapproving a fast-track standard is the technical content: it is or isn’t up to scratch. Editorial issues alone are not enough. However, any significant comments, even editorial ones, will trigger a Ballot Resolution Meeting, where these things can get looked at: they don’t disappear into a black hole. Under the JTC1 rules, non-technical and non-editorial issues just don’t seem to be legitimate grounds for acceptance or rejection: the only slot for a National Body wishing to act in good faith to the JTC1 Directives but who have significant non-technical and non-editorial concerns is to abstain.

Now, a National Body that votes disapprove has a duty (JTC Directives s13.7) to participate at a Ballot Resolution Meeting (BRM). A Ballot Resolution Meeting has to be open to representation from all affected interests, convened in a timely manner, keeping in mind the spirit of the fast-track process. (JTC Directives s13.1) “The spirit of the fast-track process” does not seem to be a defined term.

Issues from National Bodies that arise after the deadline for the initial ballot (or after the BRM, or where the BRM did not go far enough in some desired direction or went the wrong way in the NB’s opinion, etc.) get handled by the NB raising defect notices with the Subcommittee looking after the standard (in this case, SC34) after the fast-track gets standardized. As well, NBs (and ECMA or other liaison bodies) can raise an immediate draft amendment, which can itself go through the fast-track procedure! (If an NB thinks the editor’s instructions have not been followed, they can raise the matter with the ITTF (the body responsible for making sure that the BRM’s instructions have been followed) who, as I am sure is expected of them, will respond with a service-oriented attitude of “Whoops! Thanks!”)

A Ballot Resolution Meeting for a fast-tracked draft is unusual because what comes out of the meeting is a set of editor’s instructions. I have read some incompetent reporting on other websites that somehow a BRM’s result is an approval or disapproval of the standard in question. Never let the truth get in the way of a good story, I suppose.

My experience of ISO/IEC JTC1 is only through Subcommittees, Working Groups and a certain recent Ballot Resolution Meeting, on and off since the mid-90s. However I have also participated in multiple groups at W3C and observed OASIS and IETF. The thing that is interesting in JTC1 meetings, from what I have seen, is that there is usually a really strong idea that you do not block the minority interests of another national body, just because you have no interest. (I have seen a committee basically fall apart because one NB dominated and tried to block the legitimate and specific interests of another NB: what happens when NBs attempt this kind of selfish trick can be that the parties who were stymied lose faith and simply go to another standards body.)

An effective delegation at a meeting who have niche requirements will take care to remind other NBs’ delegations that unless they have technical expertise in that area, they should abstain. Or if the niche requirement may be significant for broader concerns, an effective delegation will try to explain in or outside their meeting what the technical issue is. However, it is part of the gentleman’s agreement that you vote on the issues: a delegation with particular issues shouldn’t have to make a specific request for other NBs to abstain on issues that they do not have an actual technical opinion on; any good-faith delegation will attempt to do that anyway (though sometimes they may get lost amid all the other tasks).

I have found that in the ISO meetings I have experienced, the contributions of the individual are really important. In SC34 you think of the contribution of James Clark for example. This was a theme of Martin Bryan’s memorable phrase standardization by corporation (e.g. see farewell report as chairman of SC34 WG1.) The system is geared to having deep experts who are highly sceptical, but who very willingly defer to others in areas outside their expertise. In fact, the ISO Directives (part 1 s 1.11.1, a splendid number) define a Working Group as comprising a restricted number of experts who act in a personal capacity and not as the representative of the…organization…by which they have been appointed however the JTC1 Directives nuance this (s2.6.1.2) WG members shall, where possible, make contributions in tune with their respective NB positions (which does not in any way stifle individual contributions, as long as the status is clear.) There is an interesting example in JTC1 Directives Annex J3.1, concerning the development of standards for APIs, which explicitly mentions that multiple kinds of experts are required. I am not saying that generalists or observers are not important in technical meetings, however, the meetings are technical and need technical people: governments wishing to participate more in standards need to be asking themselves what programs they have in place to develop and encourage the necessary range of deep expertise in order to be effective at this level. (And one of the best ways is to start to send experts to meetings, and getting them to review standards of different sorts, and to expose them to standards practices of different organizations to help them to be critical and functional.)

Technical experts are frequently ratbags, a (nowadays quite fond and) useful Australianism.

Macquarie dictionary (1991):
n. colloq. 1. a rascal; rogue. 2. a person of eccentric or nonconforming ideas or behaviour. 3. a person whose preoccupation with a particular theory or belief is seen as obsessive or discreditable: that Marxist ratbag. -ratbaggery, n. -ratbaggy, adj.’

but the ease of abstinence at ISO tames this tendency. I have read more than once that new people coming to the SC34 meetings are surprised at the level of helpfulness and collegiality that usually can be seen (and I think Ken Holman had a lot to do with achieving this tone.)

JTC1 groups try to act by consensus. Consensus is not unanimity; it is defined in part as a general agreement, characterized by the absence of sustained opposition to substantial issues…. To understand the role that abstention plays in ISO, I think you have to see how it dovetails into this definition of consensus: consensus is not an issue of achieving an absolute positive majority of all parties! In fact, JTC1’s view of consensus demands the ready availability of the option to abstain, otherwise NBs and participants will be forced to make decisions they don’t wish to or are not competent to or are not briefed to.

Voting “abstain” on issues at ISO is not a failure. Indeed, sometimes the briefs for delegations have instructions that require them to abstain. But experts who have to abstain can still be critically valuable to the process. Because of this, and because of the mutual spirit of accommodation and collegiality that usually prevails, abstention is easy and a more frequently used option than people used to other standards systems may feel comfortable with initially. But it is not for no reason.

Rick Jelliffe

Prof. Rob Cameron of Simon Fraser University has just announced on the XML-DEV mail list his open source Parabix XML parser, which seems to set new benchmarks for parsing speed, using the SIMD instructions of modern processors.

I am particularly interested in this, because a year ago when Cameron released his UTF-8 converter that trialled his approach, u8u16, I said

I would love to see an XML parser that combines Cameron’s SIMD work with the optimizations from IBM’s XML Screamer, which seem to increase the speed of Java processing by two or three fold.

I’ll have a look at this over the next few days, time permitting, in more detail. There are not many areas in text processing where there is new work being done: the 60s and 70s saw most of the basic work and data structures, so I think it may be a quite startling development. Well done, Rob!

Intel has also been doing work in the area of hardware speed-ups to parsing. Anyone else doing research in this area?

Simon St. Laurent

REST offers a great way to build simple applications that Create, Read, Update, and Delete resources. But what if you want to get at part of a resource?

M. David Peterson

Update: Subbu Allamaraju has followed up my post with “Idempotency Explained” which is worth a read. I’m not sure if I agree 100% with his comments due to the fact that — as far as I know — the same request to create/edit/update an entry/attribute on SimpleDB will always yield the same result no matter how many times the request was made. Then again, I could very well be completely off base here. /me is reading through the docs again to ensure I haven’t missed something.

Anyone in the know care to clarify one way or another?

Either way, thanks for the extended overview, Subbu!
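For anyone wanting to see the property under discussion rather than argue about it, here is a minimal Python sketch of an idempotent write next to a non-idempotent one. It uses a plain in-memory dictionary and made-up function names; it is not SimpleDB's API and makes no claim about how PutAttributes actually behaves.

```python
def put_attribute(store, item, name, value):
    """Set item[name] to value, replacing any previous value (an idempotent write)."""
    store.setdefault(item, {})[name] = value
    return store

def append_attribute(store, item, name, value):
    """Append value to item[name] (not idempotent: each repeat changes the state)."""
    store.setdefault(item, {}).setdefault(name, []).append(value)
    return store

# Sending the same "put" once or twice leaves the store in the same state.
once = put_attribute({}, "item1", "color", "red")
twice = put_attribute(put_attribute({}, "item1", "color", "red"), "item1", "color", "red")
print(once == twice)

# Sending the same "append" twice does not.
appended_once = append_attribute({}, "item1", "tags", "new")
appended_twice = append_attribute(
    append_attribute({}, "item1", "tags", "new"), "item1", "tags", "new"
)
print(appended_once == appended_twice)
```

Whether a given SimpleDB request falls into the first or the second category is exactly the question the docs need to answer.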

[Original Post]
So for various reasons I’ve had the opportunity to get to know a lot of the folks who design, develop, deploy, market, and support the various offerings of Amazon Web Services, and it’s because of this I found it funny to hear people criticize Amazon for “setting back web architecture 10 years” with the release of SimpleDB. For example, Dare Obasanjo provided the following commentary,

I’ve talked about APIs that claim to be RESTful but aren’t in the past but Amazon’s takes the cake when it comes to egregious behavior. Again, from the documentation for the PutAttributes method we learn,

<snip/>

Wow. A GET request with a parameter called Action which modifies data? What is this, 2005? I thought we already went through the realization that GET requests that modify data are bad after the Google Web Accelerator scare of 2005?

I’ll admit that at first I was right in line with Dare’s point, or in other words, WTF?

But as I mentioned, I know a lot of these guys personally, and I can assure you not a single one of them could qualify as anything other than the best and brightest this world has to offer as it relates to the field of computer science. So I’ve always held off from criticizing, assuming that eventually it would all make sense.

Apparently eventually =~ February 19th, 2008,

Rick Jelliffe

By default, Schematron uses XPath 1 for setting contexts, testing assertions, and producing dynamic diagnostics. Actually, it is XPath 1 as used and extended in XSLT 1. This has led many people to think it is just a nicer declarative front-end to XSLT, which indeed it usually has been.

However there have been many requests to allow more powerful languages, and ISO Schematron was designed to allow this. There is an attribute called queryBinding on the top-level schema element, and this lets you declare which query language you are using. The standard even specifies a document called a “Schema Language Binding” and says what information it must provide. It also reserves several names: “xslt1, xslt2, xpath2, exslt” etc.
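As a rough illustration of where queryBinding sits, here is a Python sketch that generates a tiny schema declaring the xslt2 binding. The book/@isbn rule is invented for the example; the assertion uses the XPath 2.0 matches() function, which is the kind of thing the default XPath 1 binding cannot express.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace

def make_schema(query_binding, context, test, message):
    """Build a one-rule Schematron schema that declares its query language."""
    ET.register_namespace("sch", SCH_NS)
    # queryBinding goes on the top-level schema element.
    schema = ET.Element(f"{{{SCH_NS}}}schema", {"queryBinding": query_binding})
    pattern = ET.SubElement(schema, f"{{{SCH_NS}}}pattern")
    rule = ET.SubElement(pattern, f"{{{SCH_NS}}}rule", {"context": context})
    assertion = ET.SubElement(rule, f"{{{SCH_NS}}}assert", {"test": test})
    assertion.text = message
    return schema

# Hypothetical rule: every book must carry a 13-digit isbn attribute.
schema = make_schema("xslt2", "book", "matches(@isbn, '^[0-9]{13}$')", "ISBN must be 13 digits.")
print(ET.tostring(schema, encoding="unicode"))
```

An implementation that does not support the declared binding can then refuse the schema up front instead of failing on an unknown function halfway through validation.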

So here is the draft text for new annexes I will be submitting to SC34 (and thence to national vote) for augmenting ISO Schematron. EXSLT was a community effort to define some more powerful functions for XSLT1. XPath2 is the updated version of XPath from W3C, very much changed, in particular with a different and large function library; the xpath2 query language binding allows the minimal, untyped-data profile. XSLT2 is the reworked XSLT1, and the xslt2 query language binding allows the typed data (PSVI) if you want it (Schematron doesn’t provide any mechanism for making sure that is what you are working with) and also user-defined functions in the XSLT2 namespace.

Most interesting, perhaps, is the STX binding. I am supposed to be contacting the STX editor to see about using this query language binding plus the STX specification as an ISO standard (another part of DSDL.) Actually, STX was voted on for this purpose, but without the query language binding some national bodies decided it couldn’t be classed as a schema language, but it should be an easy fix, since the hard work has been done and the NBs are onside at last.

The thing about STX is that it works in streaming fashion. So you can test documents larger than your virtual memory. STX is much less limited than the subset of XPath that XSD uses.

The draft bindings are here (sorry in boring custom XML not typeset to HTML.) Comments are very welcome, and thanks to the schematron-love-in mail-list members for comments and prods. There are a few other issues on the table for a revised Schematron upgrade, but they all can proceed independently of these bindings, if time is not my friend.
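To give a feel for what testing in streaming fashion means, here is a small Python sketch that checks a rule while the document is still being read. It is not STX and not Schematron; it only uses the standard library's incremental parser as a stand-in, and the rule (every book needs an isbn attribute) is made up.

```python
import io
import xml.etree.ElementTree as ET

def count_violations(stream, tag, required_attr):
    """Stream through XML, counting <tag> elements that lack required_attr."""
    violations = 0
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == tag and required_attr not in element.attrib:
            violations += 1
        # Discard the element's content once it has been checked.
        element.clear()
    return violations

doc = io.BytesIO(b"<books><book isbn='1'/><book/><book isbn='2'/></books>")
print(count_violations(doc, "book", "isbn"))
```

Each element is inspected as soon as its end tag arrives and then emptied, which is the general idea behind validating documents too large to hold in memory as a full tree.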

M. David Peterson

Pat Eyler works about a block and a half from where I live in downtown SLC, UT, and yesterday we met up for lunch. Our far-reaching topics of conversation included the proper way to pronounce Rubinius. In case any of you were like me and had no clue how to properly pronounce it, here’s the general idea,

Say “Rubik’s Cube”.

Then replace the “k” in Rubik’s with “n”, the end result sounding like a Reuben sandwich (mmm… my favorite! :D) or Rubin Stoddard (hmmm… not so much my favorite, though I’m not an American Idol fan (that’s cuz’ I’m not a TV fan, not because I despise the show itself) so that could be why.)

So now that I think of it you can probably just skip the Rubik’s cube ‘k -> n’ transfusion and move right on through to Reuben/Rubin, but whatever makes you happy; I would just run with that and call it good. ;-)

OK! So, now that we have the first part figured out… Say “EEE”, as in “I am an idEEEot” (I originally wrote “I”, as in you, but in reverse; see ‘probablycorey’’s comment and my follow-up below for all the gory details), and then “us” (as in “you and I”), putting all of them together to form,

Rubin EEE us

… and that’s it! You now know how to properly pronounce the Rubinius project :-) But no need to thank me. Thank Pat!

Thanks, Pat! :D

M. David Peterson

Update: Thorsten has followed-up with some interesting comments regarding the current state of the virtualized industry, the problems he’s seeing a lot of companies facing, and various obstacles they’re running into along the way. Interesting stuff!



[Original Post]
((AOTD == Advice of the Day) == True)


Amazon Web Services Developer Connection : Instances not responding …

A word of advice, easy to give, hard to follow: design your system so you can relaunch any critical instance!

Amazon has thousands of instances available, just waiting for you to hit the launch button. If a current instance smells bad and your own troubleshooting doesn’t resolve it, launch a new one and bring your service up on it. Actually, if it’s critical, you should have two running so you’d be left with one while you replace the failing one.

All this should be motherhood and apple pie on EC2 or any other hosting facility, or also in your own datacenter for that matter. Systems fail.

Thorsten von Eicken, Posted: Feb 2, 2008 10:53 AM PST
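Thorsten's advice boils down to a loop that is easy to sketch. The Python below is purely illustrative: launch_instance is a stand-in that hands out made-up ids rather than calling any real EC2 API, and the health check is whatever predicate you supply.

```python
import itertools

_ids = itertools.count(1)

def launch_instance():
    """Stand-in for a real launch call; returns a fresh hypothetical instance id."""
    return f"instance-{next(_ids)}"

def heal(pool, is_healthy, minimum=2):
    """Drop unhealthy instances and relaunch until the pool has `minimum` members."""
    survivors = [instance for instance in pool if is_healthy(instance)]
    while len(survivors) < minimum:
        survivors.append(launch_instance())
    return survivors

# Keep two critical instances running, so one is left while the other is replaced.
pool = [launch_instance(), launch_instance()]
bad = pool[0]
pool = heal(pool, is_healthy=lambda instance: instance != bad)
print(pool)
```

The hard part, as he says, is not this loop but designing the service so that a freshly launched instance can actually take over.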

BTW: Thorsten is one of the smartest individuals I have ever had the fortune of coming to know. *GREAT* guy, and if you need help with Amazon Web Services-related consulting, in particular EC2, I would *HIGHLY* recommend getting in contact with his company, RightScale. Just the right combination of open source, open minds, and openly giving more than he/they receive in return, so I believe it’s certainly both fair and in line with the ideals of O’Reilly, and therefore this blog, to provide promotion.

M. David Peterson

As sad, desperate and/or pathetic as it may sound, I oftentimes find myself rooting around the Mono Project SVN repository looking for buried treasure; one of the intended side effects of open source software is the freedom and encouragement to experiment, so there’s a tendency for those willing to dig to find things that haven’t made it into an official release but are both useful and usable tools, libraries, applications, etc. nonetheless.

Today, apparently, is my lucky day (though I’m surprised I hadn’t noticed this before given Eno did the initial check in 7 months ago),


 Assembly/	 81031	 7 months	 atsushi	 initial checkin.
  Mono.Xml/	 81031	 7 months	 atsushi	 initial checkin.
  Mono.XsltDebugger/	 81155	 6 months	 atsushi	 2007-07-02 Atsushi Enomoto <atsushi@ximian.com> * XsltDebugger.cs XsltDebugg...
  ChangeLog	 81031	 7 months	 atsushi	 initial checkin.
  Makefile	 81031	 7 months	 atsushi	 initial checkin.
  Mono.XsltDebugger.dll.sources	 81031	 7 months	 atsushi	 initial checkin.

M. David Peterson

Actually, there are plenty of reasons why F# ((F == Functional) == True) *ROCKS*. Here are a few from the previously linked F# site on Microsoft Research,

Combining the efficiency, scripting, strong typing and productivity of ML with the stability, libraries, cross-language working and tools of .NET.

F# is a programming language that provides the much sought-after combination of type safety, performance and scripting, with all the advantages of running on a high-quality, well-supported modern runtime system. F# gives you a combination of

* interactive scripting like Python,

* the foundations for an interactive data visualization environment like MATLAB,

* the strong type inference and safety of ML,

* a cross-compiling compatible core shared with the popular OCaml language,

* a performance profile like that of C#,

* easy access to the entire range of powerful .NET libraries and database tools,

* a foundational simplicity with similar roots to Scheme,

* the option of a top-rate Visual Studio integration,

* the experience of a first-class team of language researchers with a track record of delivering high-quality implementations,

* the speed of native code execution on the concurrent, portable, and distributed .NET Framework.

The only language to provide a combination like this is F# (pronounced FSharp) - a scripted/functional/imperative/object-oriented programming language that is a fantastic basis for many practical scientific, engineering and web-based programming tasks.

F# is a pragmatically-oriented variant of ML that shares a core language with OCaml. F# programs run on top of the .NET Framework. Unlike other scripting languages it executes at or near the speed of C# and C++, making use of the performance that comes through strong typing. Unlike many statically-typed languages it also supports many dynamic language techniques, such as property discovery and reflection where needed. F# includes extensions for working across languages and for object-oriented programming, and it works seamlessly with other .NET programming languages and tools.

For those of you unaware, F# is now a first class MSFT language, or in other words, this is no longer a “Hey, here’s an idea. Let’s research it.”-type project and instead a true-blue MSFT product backed by mean-green MSFT money, led by some of the very best and brightest minds @ MSFT.

If you were to ask me “What’s the future language foundation of the .NET platform?” I would first state “More than likely, XSLT 2.0++.” And then when you stopped laughing and slapped me upside my head to awake me from my dream I’d say, “What the F#!? was that for?” and you’d say “F#??,” followed by “Isn’t that for programming the way God intended for people to program on the .NET platform?”, and then I’d say “Okay, you got me on that.” at which point we’d move on…

So here’s the thing: While there are *TONS* and *TONS* of reasons why F# *ROCKS* (did I mention that F# is distributed as both an MSI and a ZIP, the latter designed to make it easy for folks using Mono to take full advantage of what F# has to offer?), the biggest reason it *ROCKS* is this,

M. David Peterson

I don’t agree with everything Sean McGrath writes in his latest post, as I think there are a lot of really smart people who have developed some really smart ways to handle the variable-width nature of XML w/o turning to malloc() every time the length of an element or attribute name reaches past a preset limit. That said, I can’t help but agree with,

Memory-based caches of “cooked” data structures are your friend.

Absolutely!

For you .NET developers, here’s a pre-written recipe that handles all of the dirty work of determining whether to create a new XmlReader or return the in-memory cached version, based on the generated ETag for the source file (see Extended Overview below for a deeper understanding of how this works). To use this recipe you need to do nothing more than create a new XmlServiceOperationManager when your application starts up, like so,

XmlServiceOperationManager myXmlServiceOperationManager = new XmlServiceOperationManager(new Dictionary<int, XmlReader>());

and then use the GetXmlReader method of the XmlServiceOperationManager, passing in the Uri of the desired XML file to get an XmlReader in return (that’s an actual System.Uri object, not the string value of the URI, though I guess it would be easy enough to create an overload that takes the string value. Another task for another day. ;-)), like so,

XmlReader reader = myXmlServiceOperationManager.GetXmlReader(requestUri);

That’s it! Now you can use your “new” XmlReader however you might need, and the next time that file is requested for processing, if it hasn’t changed, you save all of the time it would normally take to read the source file and convert it into an XmlReader, which is fairly significant.
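To make the idea concrete outside of .NET, here is a minimal Python sketch of the same pattern: keep the parsed ("cooked") XML in memory and re-parse only when the file's ETag changes. This is not the XmlServiceOperationManager source; the `XmlCache` class, the `file_etag` helper, and the choice to derive the ETag from file size and modification time are all assumptions made for illustration.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def file_etag(path):
    """Hypothetical ETag: a hash of the file's size and modification time."""
    info = os.stat(path)
    raw = f"{info.st_size}-{info.st_mtime_ns}".encode("ascii")
    return hashlib.sha1(raw).hexdigest()


class XmlCache:
    """Caches parsed XML per path and re-parses only when the ETag changes."""

    def __init__(self):
        self._entries = {}    # path -> (etag, parsed root element)
        self.parse_count = 0  # counts real parses, for demonstration only

    def get(self, path):
        etag = file_etag(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == etag:
            return cached[1]  # unchanged file: reuse the cached structure
        root = ET.parse(path).getroot()  # changed or unseen file: parse it
        self.parse_count += 1
        self._entries[path] = (etag, root)
        return root
```

Calling `get` twice on an unchanged file should parse it once and hand back the same object both times; touching the file changes the ETag and forces a fresh parse.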

Source code and extended explanation inline below. Enjoy!

Oh, and stay tuned for the next installment of this recipe where we learn how adding,


1 Part memcached
1 Part ETags

and


1 Part GZip encoding

… can turn your lame a$$ performance sucking web application into a lean, mean, kick a$$ performing machine. For a precursor, see Joe Gregorio’s AtomPub presentation slides from this past OSCON. I assure you, it’s worth every second you spend studying this gem of a resource.

David A. Chappell

I just published part 2 of an article exploring the “Next Generation Grid Enabled SOA”. This one is sub-titled “Not Your MOM’s Bus”.

Abstract: In our previous article we discussed how SOA grids can be used to break the convention of stateless-only services for scalability and high availability (HA) by allowing stateful conversations to occur across multiple service requests, whether between disparate service boundaries or load-balanced groups of cloned service instances.

In this article we will challenge traditional applications of message-oriented middleware (MOM) for achieving high levels of quality of service (QoS) when sharing data between services in an enterprise service bus (ESB). We will further compare and contrast a state-based, in-memory storage and notification model, and investigate the intelligent co-location of processing logic with or near its grid data in large payload scenarios. Finally, we will also explain when to substitute an SOA Grid for existing MOM technologies as driven by the following question: “If you have an SOA grid that can reliably hold application state data and the necessary systems can access it, why continue to utilize conventional messaging?”

Read More..

Cheers,
Dave

M. David Peterson

I’ll keep this short: Decentralized Conversations-as-a-(Web)-Service. Interested? Can you write code?

If += yes, here are the rules,

*ALL* code will be released under a Creative Commons Attribution License. In other words, the only requirement is that whoever uses the code you might write gives you attribution. They can close the source, hide it in a dark corner of the Internet, and in other ways never be required to give back in the same way that you gave. In fact, some people might attempt to steal your code and call it their own.

Are you okay with that?

If += yes,

David A. Chappell

- Grid computing will grip the attention of enterprise IT leaders, although given the various concepts of hardware grids, compute grids, and data grids, and different approaches taken by vendors, the definition of grid will be as fuzzy as ESB. This is likely to happen at the end of 2008.

- At least one application in the area of what Gartner calls “eXtreme Transaction Processing” (XTP) will become the poster child for grid computing. (see Gartner Research ID # G00151768 - Massimo Pezzini). This “killer app” for grid computing will most likely be in the financial services industry or the travel industry. Scalable, fault tolerant, grid enabled middle tier caching will be a key component of such applications.

- Event-Driven Architectures (EDA) will finally become a well understood ingredient for achieving realtime insight into business process, business metrics, and business exceptions. New offerings from platform vendors and startups will begin to feverishly compete in this area.

Rick Jelliffe

The vogue quip that “a camel is a horse designed by committee” probably makes more sense to people who don’t live in a desert country. From here in Australia, camels seem to be a very plausible design. It is the speaker, actually, who is wrong: what you need is a camel when you are in the desert, a horse on the plains, a yak in the mountains, perhaps a porpoise in the sea, and an elephant in the jungle.

The ongoing XML Schemas trainwreck shows little sign of improvement; that users have so repeatedly stated their problem and received no satisfaction from the W3C shows how disenfranchised they are. I am thinking about these things again this week for three reasons.

First, I saw (only 2 years too late) the AT&T-originated guidelines on XML Schemas Best Practices which underlie a best-practices checker tool at Java.net. It goes through the capabilities of a particular class of application (while assuming that everyone is interested in the same class of applications. Grrr: “XML” is not just what one set of software uses) and gives a list of what will cause problems or be unportable. Some (like deprecating <appinfo>) are dubious, but most seem well-founded. It is a good document for anyone to read.

The tables in A.2 and A.3 are especially interesting, or horrific in practical terms. None of the software supported derivation of complex types by restriction fully, most not at all. None fully supported ID datatypes. Only one implementation fully supported enumerations. Basically, type derivation of complex types was a complete non-starter.

The second reason I am thinking about it is work. A customer wants to use MS InfoPath with a schema I have been working on. But, predictably, InfoPath has a range of things it doesn’t support. Many of the fixes (replacing “unbounded” for the cardinality of choice groups with some reasonable number) are trivial, but it is the same issue.
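As a rough illustration of how trivial that kind of workaround is, here is a small Python sketch that caps `maxOccurs="unbounded"` on `xs:choice` groups. It is not the actual change made for the customer schema: the function name `cap_unbounded_choices` and the default cap of 100 are hypothetical, and the sketch assumes the schema uses the `xs` prefix throughout (ElementTree does not rewrite prefixes inside attribute values such as `type="xs:string"`).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def cap_unbounded_choices(xsd_text, cap=100):
    """Return the schema text with unbounded xs:choice groups capped at `cap`."""
    # Serialize schema elements with the conventional "xs" prefix
    # (assumption: the input schema also uses "xs").
    ET.register_namespace("xs", XS)
    root = ET.fromstring(xsd_text)
    for choice in root.iter(f"{{{XS}}}choice"):
        # Only choice groups are touched; elements keep their own cardinality.
        if choice.get("maxOccurs") == "unbounded":
            choice.set("maxOccurs", str(cap))
    return ET.tostring(root, encoding="unicode")
```

The edit itself is one attribute. The cost is that the capped schema is now a different schema, which is exactly the subset problem.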

A little over a year ago, Paul Klee had a great summary article on XML.COM, XML Schemas Profile. It mentions the 2005 W3C-organized Workshop on XML Schema 1.0 User Experiences, and the do-nothing Chair’s report (“No-one wants anything, and if they do they don’t agree, and if they agree it cannot be done, and if it could be done other people don’t want it, and if other people do want it they actually want something else, and if they don’t want something else it would be confusing.”). It looks like very strong leadership for inertia, and it cheeses me off that their laziness affects me and my clients at the end of 2007.

One positive thing that has come out has been the W3C Basic XML Schemas Databinding Patterns, which lists various XPaths for schema patterns that databinding tools can be expected to handle. (It mentions how to use these in Schematron, which is good too!) But it doesn’t come up to the level of a profile. (And, to be fair, the W3C Schema WG has also upgraded XSD to reduce some gotchas that have been reported, such as allowing unbounded on all groups.)

Why not? Because, as far as I can make out, the idea is that if we all pretend that XML Schemas is a unified and whole specification, one size that can fit all, then somehow it will magically happen. But fantasy is a really poor substitute for reality. Time and time again I have seen clients happy about XML Schemas and its promises, only to have their hopes dashed as they realize that as soon as they need to start deploying they have to use subsets and there is no support from “standards” to help interoperability.

The third thing? DIS29500 gave XML Schemas that worked in MSXML, but failed in Xerces. This was raised as an issue (by Japan among others) and the schema is being reworked to support Xerces. (The issue is to do with circular imports IIRC: I think the new schemas will be in a single file per namespace and that will help the RELAX NG conversion too.) Again, this is an issue we are dealing with in late 2007.

And that is what you get when you have a large standard that is not sufficiently modular and focussed to support its main applications: guaranteed non-interoperability. This lack of modularity is an issue that has been relentlessly pointed out to the W3C XML Schema Working Group and just as relentlessly ignored, and the result is that it is surprising if we find a schema that works out-of-the-box with the particular tools desired for a job.

Why is it that we are going into 2008 and we still have exactly the same kinds of problems that were clearly expressed as real problems in the 2005 experience workshop, and which were predicted vociferously before then?

M. David Peterson

So in about 6 days all direct-addressed EC2 instances will be shut down. This day comes with *PLENTY* of warning, so decommissioning the 3 direct-addressed EC2 instances that we still have running has been planned for a while. Of course, why do something now if you can just as easily put it off until later? ;-)

Okay, so maybe that’s not the best philosophy in life, but when you’ve designed your server infrastructure around worst case scenario disaster recovery, the thought of “losing” an instance or three doesn’t present the type of anxiety you would normally expect, so in the case of EC2, it actually works pretty well.

That said, as per the following screen scrape, *even if* we hadn’t designed our system with a worst case scenario mentality, we’d probably still be okay,

M. David Peterson

It seems Safari is the only browser that will leave you wondering why on earth it, seemingly randomly, refuses to make even the most token attempt at accessing any particular URI via the document function. That’s because each of the other browsers will automatically URL encode GET requests, whereas Safari will not, and as such will throw an internal (I assume internal to the underlying OS?) error. Of course it won’t tell you it threw an error, which will be the source of significant hair pulling, but none-the-less, an error has been thrown, somewhere. ;-)

I’m not immediately finding anything in the XSLT 1.0 spec that even remotely touches on whether or not it is the job of the transformation engine, the underlying system, or the developer to properly URL encode a request, so I can only assume that regardless of whether or not it’s a pain in the a$$, not URL encoding requests made via the document function is completely within the realm of a standards compliant XSLT processor. Anyone in the know care to clarify?

In the meantime, one of the better resources I’ve found for both quick and easy reference as well as on-the-fly encoding of any given URI is located @ http://www.blooberry.com/indexdot/html/topics/urlencoding.htm. If you find yourself about ready to rip your hair out because Safari refuses to make any attempt at retrieving the document located at any given URI, check the above resource. Chances are pretty good that something as simple as a | character not being properly URL encoded is the culprit.
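If the URIs handed to the document function are generated server-side, the encoding can also be done before the stylesheet ever sees them. The following Python sketch shows one way to do that with the standard library's `urllib.parse.quote`. The helper name `encode_for_document`, the set of characters left untouched, and the example.com address are assumptions for illustration, not anything prescribed by the XSLT spec or by Safari.

```python
from urllib.parse import quote

# Characters that structure a URI (plus "%" so existing escapes survive).
# Everything else outside letters, digits and "_.-~" gets percent-encoded.
URI_SAFE = ":/?&=#%~+,;@"


def encode_for_document(uri):
    """Percent-encode characters such as '|' and spaces in a URI string."""
    return quote(uri, safe=URI_SAFE)


# Hypothetical URI with an unencoded pipe in the query string.
print(encode_for_document("http://example.com/feed?tags=xml|xslt"))
```

Because `%` is treated as safe, running an already-encoded URI through the helper leaves it unchanged rather than double-encoding it.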

M. David Peterson

RFC 2068: Section 3.2.1: General Syntax

Note: Servers should be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.

Okay, so if I’m making a web service call to a particular URI, it’s more than likely going to be inside of my own code base as opposed to inside of someone’s client. And in the cases where it’s not, chances are pretty good that this same client doesn’t support the extended functionality of my shiny new asynchronous Web 2.x+ app. So whether or not a client supports URI lengths over 255 bytes is probably less of a concern given that these same clients couldn’t support my application in the first place.

But let’s set aside the most likely client-side scenarios and assume nothing: RFC 2068 is about a week shy of being 11 years old. Is the 255 byte URI length recommendation still applicable? From the client perspective, possibly not. But what about from the proxy perspective? And are there clients (possibly mobile browsers?) that I’m not taking into consideration that still impose a limitation on the URI length?

NOTE: As of October 27th, 2007 the limit inside of Internet Explorer is 2083 bytes. Is 2083 bytes today’s equivalent of the 255 byte recommendation of 11 years ago? (You would have to assume that MSFT didn’t arbitrarily arrive at this figure, basing the limitation on known limitations of the existing infrastructure of the Internet, correct?)
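While the question stays open, it is cheap to at least measure where a given request URI falls relative to the two figures mentioned here. The Python sketch below does only that. The function name `uri_length_report` and the decision to count UTF-8 bytes are assumptions; the two thresholds are simply the numbers quoted above, not limits taken from any current specification.

```python
RFC_2068_CAUTION_BYTES = 255  # from the RFC 2068 section 3.2.1 note
IE_LIMIT_BYTES = 2083         # Internet Explorer figure quoted in the NOTE


def uri_length_report(uri):
    """Report a URI's size in bytes against the two thresholds above."""
    size = len(uri.encode("utf-8"))
    return {
        "bytes": size,
        "within_rfc_2068_caution": size <= RFC_2068_CAUTION_BYTES,
        "within_ie_limit": size <= IE_LIMIT_BYTES,
    }
```

A check like this could be logged per request to see how often real traffic crosses either line before deciding whether the old recommendation still matters.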

David A. Chappell

In recent articles and presentations I have been postulating that a concept called “next generation Grid Enabled SOA”, a.k.a. “SOA Grid” and “Not your MOM’s Bus”, combines conventional SOA infrastructure technologies such as BPEL and ESB with middle tier data grid technology to provide a new level of predictable scalability and high availability for SOA based applications.

I often get asked - “How much better is it? What’s the ROI?”

Keith Fahlgren

Here are my notes from the last day of XML Conference 2007. David has collected some of the blogging about the conference.

Keith Fahlgren

This is the continuation of blogging from XML Conference 2007. See yesterday’s post for more. There are, of course, a lot of folks blogging about the conference. Here’s my colleague Andy’s take. Elliotte Rusty Harold is providing some wonderful reading as well (and apparently did a smashing job at the XForms talk last night). For a visual sense of the conference, check out David Megginson’s photos on Flickr.

Keith Fahlgren