
November 2004 Archives

Marc Loy


Related link: http://story.news.yahoo.com/news?tmpl=story&cid=599&e=3&u=/nm/20041130/media_nm/…

Well, the next shot has been fired in Yet Another Standards War: HD DVD made a deal with several new studios giving it a bit of a jump over its rival, Blu-ray.

For those of you just catching up on this unfolding drama, both formats use blue lasers to achieve much higher storage capacities on discs the same size as current DVDs (about 50GB in the case of a dual-layer Blu-ray, 30GB in the case of a dual-layer HD DVD). The entertainment industry wants these to start delivering movies in High Definition–something the puny standard DVDs can’t accommodate. (Wow…I know it’s my age, but 5 gigabytes is puny? But let’s not start the “I remember when” tangent.) Anyway, here’s a good summary article you can check out as well.

While it’s definitely too early to slap the VHS label on one and Betamax on the other, the format war is gearing up to be about the same. (I gather some companies are already looking at making players that work with both discs.) And it is a war. Think about how much money you spend on DVDs. Or if you’re a video nut like me, think about how much money you spent upgrading your video library to DVD. You need to buy a new HDTV, a new player and then replace all of your DVDs with the new hi-def successor. Naturally, you’re more than happy to pay more for the better quality, too. Lots of $$$ at stake, so don’t expect either side to back down.

The easy early winners are the HD-capable software apps. Everyone will be falling over themselves to produce kick-ass demo discs. So fire up Final Cut Pro HD and Motion and prepare for the new world. Of course, you’ll also need to look into some serious new gear to support the bandwidth for hi-def, but hey, I already mentioned the $$$ part, right?

So, any bets on the eventual winner?

Tyler Mitchell


Related link: http://tinyurl.com/4n6un

The global web mapping site that is launched from the above URL has several general layers of information. When a newsworthy topic comes up, I like to hunt down how much of it I can see on this map. So, when I heard about the storms hampering U.S. Thanksgiving celebrations, I must admit I got somewhat excited. It was a bit of a test. Can I see the storms or zoom into a city and get an idea of the conditions? Pretty much, was my conclusion. The weather patterns were nice and clear on the map (from my nice warm office in western Canada).


The weather-related layers such as radar-based precipitation and storm tracking are usually the most interesting. These are from remote servers and are accessed using Open GIS Consortium standards for web-based mapping services (WMS in particular). The clouds layer is neat too, taken from another source and updated every few hours (thanks to the authors of the upcoming Mapping Hacks book who shared this hack with me).
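For anyone curious what a WMS request looks like, it is essentially a URL with a handful of standard query parameters. Here is a minimal Python sketch of a GetMap request. The server address (http://example.com/wms), the layer name (radar_precip) and the bounding box are made up for illustration; they are not the services behind the map above.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height):
    """Build a WMS 1.1.1 GetMap URL for the given layers and extent."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                 # empty value asks for default styling
        "SRS": "EPSG:4326",           # plain longitude/latitude
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, covering roughly the continental U.S.
url = build_getmap_url(
    "http://example.com/wms",
    ["radar_precip"],
    (-130.0, 24.0, -60.0, 55.0),
    800,
    400,
)
print(url)
```

Pointing a browser (or MapServer, acting as a WMS client) at a URL like this against a real server should return a rendered map image for the requested extent.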

The software behind the scenes is MapServer (see http://mapserver.gis.umn.edu), coupled with GDAL for robust raster/image data access and GD libraries for some rockin’ fast map image creation.

These products are great, not just because they are free, but because they are powerful. The ability to serve up and access so many different formats of data through numerous types of services allows for some pretty interesting projects.

This software and several related tools and web mapping services are part of a book I am currently writing for O’Reilly - due out Summer 2005.

Do you do mapping? Hope to learn more in the future? What would you put on a web map? Share your thoughts, I’d love to hear them.

Marc Loy


It seemed about time that I got myself involved in this wonderful world of blogging. But I really wasn’t sure what I’d have to talk about. Coming from a tech support training background, I decided to do what I always do starting a new project: figure out what questions get asked the most. Since writing DVD Studio Pro 3 In the Studio, that has become an easy task. I get the occasional question about translucent figures for button templates in DVDSP, but 9-to-1 I get asked about the media I use.

I get two variations on the topic: "what’s the best media to use?" and "where do I find the cheapest media?" Of course, everyone really wants to know where the best and cheapest media can be found. That, I’m afraid, is a micro-scale holy grail, but these days it’s not as absurd a question as it used to be.

On the quality front, Taiyo-Yuden discs are still winning hands down. You’ll see them mentioned in lots of DVD discussions and they certainly have been the best performing discs I’ve used. They get rebranded by a lot of folks–some of whom will tell you about the rebranding, some of whom won’t. Verbatim 4x -R media was one of the earliest sources of TY media, but nowadays you can purchase TY-branded media direct from a variety of online sources. I’ve personally had good luck with the MAM-A branded media as well. (I got hooked on them for archival CD-R…)

On the price front, that’s something akin to the Floating Market in London Below. I usually pick up big bundles from Meritline or Tape and Media. My experiences there have always been good and I’m somewhat of a loyalist. (Both sites have TY media available.) DiscMakers also has the occasional special that catches my eye. If I’m really in the bargain hunting mode, though, I just Google "dvd-r media" and breeze through the sponsored links to see what’s up for grabs.

If anyone has specific experiences (good–or more importantly–bad), I’d love to hear them. I’m also curious about 8x media experiences.

Rick Jelliffe


Mike Champion asked on the XML-DEV mailing list this week:
“To be honest the XBC (XML Binary Characterization) WG has been waiting for some community push-back and input on both the negatives and positives of binary XML … but so far the negatives haven’t been coming and that worries me a bit.”

My guess is that most people think like me: if “Binary XML” is just a form of compression that allows a tightly coupled interface to SAX, then why not? Or if it allows some substantially different characteristic such as random access, then maybe it has a good place as an adjunct. If it does not have enough bang per buck, it will only be a niche thing. Not that there is anything wrong with niches. (And we should expect that companies who don’t do well out of XML will try to spoil it and develop alternatives, while companies doing well from it will try to stifle innovation. That’s show biz!) Does ASN.1 currently hurt XML?

I think that, post XML Schemas, the W3C brand is fairly diminished as far as new specs are concerned. XBC could easily go the way of XPointer, XML 1.1 and XML Fragment Interchange: like a quarrelsome but beautiful neighbour, decorative but to be avoided.

If the XBC discussion takes off, expect all the usual baloney, in particular the extremely fragrant sausage that if we reduce the number of different tags that XML recognizes, it will speed up parsers in some significant way. (Baloney because there are efficient ways of implementing parsers so that tests for rare tags don’t penalize the common cases. For example, an optimized parser could detect that a document has no DOCTYPE declaration, and then switch to an implementation that does not need to do any buffer reallocation to handle entity inclusion. Java even provides jump tables to make simple parsing fast.)

My view from the armchair is that chip manufacturers (Intel, AMD, et al.) need to step up to the plate here: the Unicode character tables and properties, and Unicode transcoders for the most common character sets, should be hardcoded inside CPU chips. I have seen at least one East Asian CPU with character tables built-in, so it is not a far-fetched idea. People do not say “Maths operations take a lot of CPU power, let’s ditch less common math functions”, do they?

Now that XML is ubiquitous and mission critical, of course we should expect all sorts of ingenious ways to speed it up. But the prime area that is being missed, it seems to me, is how to improve XML support inside CPUs.

What kind of form could it take? The simplest form might be to provide an operation that takes an unsigned short (i.e. a UTF-16 character) and returns an int containing bits representing each binary Unicode property and its status as an XML delimiter, just by simple table lookup. (Actually, I would provide two operations: one for UTF-16 BMP which also copes with ASCII and ISO 8859-1 because they are code compatible, and one for UTF-32.) Since XML documents tend to be small, for both SAX processing and XML->DOM processing, I quite expect that not much XML parser machine code would survive in the cache between invocations of the parser or SAX. So providing a built-in table will marginally improve cache behaviour as well as allowing faster parsing without giving up on decent and suspicious parsing: since IO between the CPU and bus is the current bottleneck, this improvement, though certainly limited and sporadic, is in the right kind of area.
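To make the table-lookup idea concrete, here is a software sketch of what such an operation would return. It is an illustration only, not a proposed instruction: the bit names and layout are invented, and Python’s `str.isalpha` and `str.isdigit` stand in for the real Unicode properties and XML name rules, which they only approximate.

```python
# Hypothetical property bits; the names and layout are assumptions for this sketch.
XML_DELIM = 1 << 0    # one of the XML delimiters listed in DELIMITERS
WHITESPACE = 1 << 1   # XML whitespace: space, tab, CR, LF
NAME_START = 1 << 2   # rough stand-in for "may start a name"
DIGIT = 1 << 3        # rough stand-in for "is a digit"

DELIMITERS = "<>&\"'=]"
XML_SPACE = " \t\r\n"


def build_bmp_table():
    """Precompute one byte of property bits for every BMP code point."""
    table = bytearray(0x10000)
    for cp in range(0x10000):
        ch = chr(cp)
        bits = 0
        if ch in DELIMITERS:
            bits |= XML_DELIM
        if ch in XML_SPACE:
            bits |= WHITESPACE
        # str.isalpha/isdigit only approximate the XML name productions.
        if ch.isalpha() or ch in "_:":
            bits |= NAME_START
        if ch.isdigit():
            bits |= DIGIT
        table[cp] = bits
    return table


BMP_TABLE = build_bmp_table()


def char_properties(ch):
    """Return the property bits for one character by plain table lookup."""
    cp = ord(ch)
    # Characters outside the BMP would need the second (UTF-32) operation.
    return BMP_TABLE[cp] if cp < 0x10000 else 0
```

A parser’s inner loop would then test bits, for example `char_properties(c) & XML_DELIM`, instead of running a chain of comparisons. The hardware version would do the same lookup without the table competing for cache space.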

If I were Intel or AMD, and looking for a way to add value to my CPUs, I would look into building the Unicode character tables especially to speed up XML processing. Derek Denny-Brown made a good point on XML-DEV: “Most of the CPU cost of parsing is related to the abstract model of XML, not the text parsing: Duplicate attribute detection, character checking, namespace resolution/checking. Every binary-xml implementation I have researched which improves CPU utilization does so by skipping checks such as these. At that point you are no longer talking about XML.”

Of course, Unicode is evolving. But nowadays only on the fringes, and really only outside the Basic Multilingual Plane (BMP: the first 65,536 code points). XML delimiter-based parsing is quite cheap: at any one time, there are usually only two significant characters to look for (& or < in data content, " or & (or ' or &) in attribute values, ] in CDATA content, whitespace or > in tag names, whitespace or = in attribute names).

It is the characters that indicate malformed XML that add checking cost: finding !@#$%^*()_+={}[];;”‘,<?/ or other non-element character in an element or attribute name. XML pairs its Draconian error handling with trivial inspectability of the data: this is congenial for programmers, in comparison to a binary format which may not have enough hints to allow meaningful reconstruction of the file for inspection. (Add comment about babies and bathwater here.)

Perhaps the rise of East Asian economic power also may have some impact here: when most CPUs drove PCs with ASCII documents, there was little reason to think about hardware support for large-character-set property-tables. Now that everyone has converged on Unicode, notably XML, and that China/Korea/Japan/Taiwan are such big players, this might be a useful feature.

Mark Sigal


For those of you old enough to remember, a really cool gadget that pre-dated PCs was the View-Master, a kind of handheld projector that could view reels, or pre-packaged slide sets, from different content creators, like movie makers, kids’ programs and science content sources. It required no batteries and had a brain-dead interface (just insert a reel, look into the View-Master and point towards a light source). Clicking the one and only button on the View-Master forwarded to the next slide in the reel.

Flash-forward to the present, and a brain-dead simple way that people often share digital photos is via USB pen drives. They fit in the pocket, and for under $30, you can carry around 1,000 average quality images with you. Not bad.

A thought occurred to me the other day; what if you could combine the pen drive with the View-Master? What I am thinking of is a pen drive-sized device that can store photos and other graphic files the same way as any USB based storage device (i.e., stick the drive in a computer’s USB port and drag photos onto the drive). But, similar to the View-Master, this device, which I will euphemistically call the iPocket, would have a lighting source that could “project” images in the drive onto a wall or some other background.

To be clear, unlike the View-Master, which used an analog reel and shined light through it, the iPocket would need to take digital images and project them out of an analog interface. Conceptually similar to digital projectors, which run in the thousands of dollars and have specialized digital signal processors (DSPs) and other sophisticated componentry, but lack local storage and of course consume larger physical footprints, the iPocket would need to leverage ultra cheap components (I personally wouldn’t pay more than $50 or so for such a device) and fit in a pocket-friendly form factor.

Such a device would be brain dead simple to use in that it would only need a forward and backward button, which also automatically turns the device on. Similarly, a more advanced version could feature a “share” option that allows one iPocket owner to share a photo with another iPocket owner simply by clicking share while pointing at the other iPocket. For technical simplicity, version one might be limited to projecting JPEG images, while later versions could support more sophisticated multimedia formats like MPEG movies.

Can it be done? Would you use it?

Robert Kaye


In the past few weeks I’ve been working for Derek hacking on various CD Baby projects. While I was coding up the Lucene web service in Java I started reflecting on a few words of wisdom from Andy Hertzfeld at the ETech conference a couple of years ago:

“When Java came out, I was excited — I could write code twice as fast in Java as I could in C/C++. And with Python I can write code twice as fast as I can in Java.”

That was the last straw — I simply had to find out what this Python hype was all about. After Andy’s presentation about Chandler I walked over to the conference bookstore and bought the Python in a Nutshell book to get started. I have to admit, at first I was a bit skeptical about Andy’s claims. Twice as fast as Java is a tall claim — no doubt.

While I was coding up the Lucene web service I was painfully reminded about the speed at which I can code in Java. Java is the least practiced language in my toolkit of proficiencies and when I first tried my hand at Java I was impressed that I could code much faster than I could in C++. Twice as fast? Perhaps when I got a solid grip on the language — yes.

This time around, hacking on Java was a lot more painful. Having gotten used to Python over the last few months, I found myself cursing Java every time I had to do a format conversion. Why do I have to instantiate a number of objects just to convert from a String to an int? First, I have to convert to an Integer, and then I have to convert to an int — you can’t just cast from an Integer to an int, even though logically they are the same thing. Lame. As I was hacking these little annoyances kept bugging me about Java.
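For comparison, here is roughly what a string-to-integer conversion looks like on the Python side. The helper name `parse_port` and its default value are made up for this sketch; the only real API used is the built-in `int()`.

```python
def parse_port(text, default=8080):
    """Convert a string to an int, falling back to a default on bad input."""
    try:
        # int() accepts the string directly; no wrapper object is involved.
        return int(text)
    except ValueError:
        return default


port = parse_port("9090")
fallback = parse_port("not a number")
print(port, fallback)
```

The conversion itself is the single call to `int()`; everything else in the helper is error handling.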

I hated strongly typed languages when I first had to code in Modula-II in college. I really hated it when I had to write a compiler for this asinine language, and I still don’t like them. Recently, I spent a lot of time looking at Javadocs to find the right conversions between types, and I found that a large portion of the base Java API is made up of functions to convert types — the API is literally cluttered with all these functions. Ick!

The latest Java coding bout gave me a really good perspective on the various languages in my toolkit. I’ve sworn off writing C/C++ unless there is very good reason to do it (read: speed). I don’t think I’m going to seek out writing Java code — after writing Python it’s just not fun anymore.

I really appreciate Guido van Rossum’s no-nonsense approach to language design. The language simply does what you’d expect a language to do, and it doesn’t get in the way of letting you get things done. However, one of the features of Python I like the most is the Python way — the suggested guidelines of how to code in Python. Python’s idea of “One obvious way to do it” as opposed to Perl’s “More than one way to do it” is more conducive to writing clear code that others can read. I appreciate that Perl aims to be a flexible tool that can be bent and twisted into a programmer’s finely honed tool — that is intensely powerful. However, unless you’re coding in a cave by yourself, that is most likely not the best approach to things. In the open source world where many people may look at your code, having the code be clear, readable and expressed in the most obvious fashion is intensely valuable.

The bottom line for me is that Andy was right — I can code twice as fast in Python as I can in Java. And I love that fact — programming is fun again!

How would you compare Java and Python?

Rick Jelliffe


Related link: http://www.sellsbrothers.com/conference/

I didn’t go to the Sells conference, but apparently Daniel Cazzulino gave a good talk, All about Schematron. The conference home page has URLs of blogs discussing the papers, which is an excellent idea.

Thinking Out Loud has a very interesting dialog on which Enterprise Integration Patterns message pattern Schematron fits into: command, event or document.

Good quote at the end: “XML Schema as it exists in .NET today definitely doesn’t cut it. The validation error messages tend to not be useful, and so we have to do some level of home grown validation. The simplest I could get that was to have boolean xpath assertions and associated validation error messages, and the implementation started looking a lot like a trivial schematron, e.g. pattern down, or maybe even just rule down.”
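The home-grown approach described in that quote can be sketched in a few lines. This is an illustration, not the quoted author’s code and not Schematron itself: the rule table, the sample order document and the `validate` helper are all invented. It also leans on the limited XPath subset in Python’s `xml.etree.ElementTree`, so each “boolean assertion” here simply means “this path matches something under the context node”.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, test path, message shown when the test fails).
# The element names are hypothetical.
RULES = [
    (".//order", "customer", "An order must have a customer element."),
    (".//order", "item[@sku]", "An order must have an item with a sku attribute."),
]


def validate(root, rules):
    """Return the message of every assertion that fails under some context node."""
    errors = []
    for context, test, message in rules:
        for node in root.findall(context):
            # The assertion holds when the test path matches under this node.
            if node.find(test) is None:
                errors.append(message)
    return errors


doc = ET.fromstring(
    "<orders>"
    "<order><customer>Ann</customer><item sku='A1'/></order>"
    "<order><item/></order>"
    "</orders>"
)
errors = validate(doc, RULES)
for message in errors:
    print(message)
```

The first order passes both rules; the second fails both, so each message is reported once.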

If we all start to use standard Schematron, rather than home-made languages, we gain explainability, existing implementations, and (this is a very underrated thing) we get the buffer zone of a next layer of features such as patterns, diagnostics, flags and phases which don’t bother us if we don’t use them, but which mean we are not in a fix when we suddenly need more than the simple assertions.

Dropped Packets
was disappointed that Schematron is not a gigantic robot.

(Whew…nothing more to report about conferences!)

Rick Jelliffe


Related link: http://www.open-standards.com/papers

A good little conference, for OASIS/messaging/web services people.
(Conference attendees can view papers with a password mailed to them
here.)
I presented a short introduction to ISO Schematron
and gave a half-day tutorial on XML Schemas.

The standout paper was by Brett Jackson
from Fairfax Digital, detailing how they moved to validated XHTML with CSS.
They deliver 164,000,000 page impressions per month
for the major dailies
The Sydney Morning Herald and The Age. They do day parting: different kinds of stories are
emphasized at different times of day. Later in the day people want gossip,
analysis and entertainment but in the morning they want breaking news and weather.
He reported they save a million dollars a year on bandwidth costs by moving
to validated XHTML with CSS!

Brett said there is a benefit of being first, because you can suck up the available XHTML/CSS aware human resources and so starve the competition. But you must have or make a culture where “designers are coders”: use text editors, not QUASIWYG. Announcing a staged migration away from supporting generation 4 browsers (but still providing a dull text page for them) is essential: “It is ridiculous to support old browsers”. Validation is a big gain for quality control. But there still needs to be a final layer to serve different CSS for different browsers.

The move to CSS, plus design changes, let them move from 38 second delivery of their front page over modems to about 3 seconds. Faster pages translate into more pages being viewed at the same site.

Handsome lawyer Kieren Power gave a good paper, Are we free to use open standards?, which I found infuriating, though I am undoubtedly kicking against the pricks. His points were that open licenses have not been tested in case law, that there is an awful lot of IP around, and that providing an open source license to your IP today may not prevent you from turning around tomorrow, once the technology is established, taking it back and demanding license fees. He was trying to state fairly the status quo, and did mention that despite these dangers patents had not hindered the progress of open standards, in the sense that he did not know of a case of a company pushing its technology as an open standard then, subsequently, turning around and demanding fees. (GIF was not proposed as an open standard.) He could not see that there was any reason not to have software patents if you have hardware patents.

My response would be a couple of points. First, an international
public policy concern: the grab for IPR by rich countries/organizations just robs
the less rich. Second, the criteria for obviousness are way too lax: evidence
of this is that there are hundreds of thousands of patent applications.
A non-obvious idea comes to a few people once or twice in a lifetime;
surely the sheer volume of applications and granted patents shows that IP is being granted to people who just
come to a problem first and come up with the obvious answer.
Why should anyone have any special rights for discovering
something obvious to the next person who comes along?

Also, had a very interesting time with W3C’s accessibility/semantic bod Charles McCathieNevile,
who was arrested in Italy for carrying a broadsword.
He mentioned RDF is developing better capabilities
to name collections of RDF statements, for example
so that you can express statements of trust or
assign an ontology to them.

Rick Jelliffe


This O’Reilly sponsored conference looks like becoming
the European equivalent of the Extreme XML conferences.
Many practical speakers, such as Michael Kay and Uche Ogbuji.

They flew me out from Australia as a keynote speaker, which was nice. (Except, of course, I caught a cold and was bedridden for a week after. Murata Makoto spoke, and he caught a cold coming. Long distance air travel is dangerous: is it because of being cooped up, or tired, or just contact with strangers?).

I presented for the first time my Document Complexity Metric. This is a magic number derivable from DTDs or document sets, which provides an objective (though challengeable) measure of how complex a document is, and therefore how much work is likely to be involved in writing a stylesheet for it. I have tested hundreds of real technical documents, most of them between 60K and 1 meg, using all sorts of proprietary DTDs to see whether the metric provides a better indication of complexity than just a raw count of elements. (It does, comparing the metrics with project managers’ recollections of the jobs.)

This metric is also useful for testing how successful a DTD-trimming tool is (no surprises, Topologi is releasing such a tool). During the questions, Sean McGrath mentioned his company’s alternative strategy for handling the problem of using standard DTDs (i.e., you do not want to program for every element; standard DTDs often have hundreds of elements, while real document sets rarely have more than 50 or so): he applies an 80/20 rule and they just handle the most commonly occurring 80%. (I suppose this leaves the significant elements in the remaining rare elements to be dealt with in a second iteration. Pre-emptive YAGNI.)
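One way to read that 80/20 rule is “handle the element names that together account for 80% of all element occurrences in the document set”. That reading is an assumption on my part, not a description of Sean’s actual tooling, and the function name and sample document below are invented. Under that assumption, a sketch looks like this:

```python
from collections import Counter
import xml.etree.ElementTree as ET


def common_elements(documents, coverage=0.8):
    """Return the most frequent element names covering `coverage` of all occurrences."""
    counts = Counter()
    for xml_text in documents:
        # iter() walks the root element and every descendant.
        for element in ET.fromstring(xml_text).iter():
            counts[element.tag] += 1

    total = sum(counts.values())
    chosen, covered = [], 0
    for tag, n in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(tag)
        covered += n
    return chosen


# A made-up document: 17 paragraphs, 2 notes, 1 root element.
sample = "<doc>" + "<p/>" * 17 + "<note/>" * 2 + "</doc>"
print(common_elements([sample]))
```

Here the `p` element alone accounts for 17 of 20 occurrences, so it is the only name selected. The rare `note` element is exactly the kind of thing that gets left for a second iteration.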

Michael Kay questioned whether the metric was reliable or sufficient. I think, of course, there are many other aspects that come into play, but at a minimum an objective number provides more information than just guess work. In particular, it lets you see whether the documents have the same kind of complexity as others you have done: preventing nasty contract disputes when jobs are less or more complex than expected is an important issue.

Henry Thompson gave a funny kind of paper on his experimental URL scheme which attempts to provide proper names for semantic resources. The idea is to provide some keywords and do a Google search on them; if a Google search on some other keywords returns a significant number of the same pages, then you take it that both searches refer to approximately the same thing. So you cobble together ten or so search results into a great long URI, and, uncle:Bob, you get a kind of proper name. My first response is that this is definitely useful for something, but I am not sure whether it is a proper name or not. For example, wouldn’t “Guy Fawkes” and “Gun Powder Plot” be considered the same thing? I gather the idea is the Semantic Web will be a fuzzy web of federated and disparate RDF databases each with different ontologies, not a single grand unified knowledge base of facts.

The standout paper was Jeni Tennison’s paper on her Datatyping language. We are considering this at ISO for inclusion into ISO Document Schema Description Languages, depending on proof of concept and interest. It solves all the kinds of problems that publishers face that XML Schemas Datatypes sweeps under the carpet: how to have dates in a localized format, how to have multilingual booleans, how to convert between inches and centimeters. In other words, it starts off with the assumption that a document (or database) contains strings expressing their information in the most natural way to the human users, not in a transnational format that requires intervening user interfaces.

It seems that Jeni’s idea can be treated as a layer
underneath XML Schemas datatypes: it might provide a way
to have the power of facets, but not the limitations of
a fixed set of primitive types and fixed lexical spaces.

David Battino


Given that most audio signals eventually pass through the hairlike wires on a microchip, I’ve always wondered how much premium cables and other tweaky doodads truly help. Personally, I’m more interested in chasing evocative textures and music than pristine sound quality, which is why I found these gizmos hilarious:

Keep listening,
David

What are the stupidest (or best) audio-enhancement gadgets you’ve found?