Ecma 376 Office Open XML’s DrawingML uses an odd measure called the EMU, short for English Metric Unit. There are 360,000 EMUs per cm and 914,400 EMUs per inch.

The reason for this may become clearer if I note that, using the Adobe “big point” of 72 points per inch (rather than the traditional 72.27), there are 12,700 EMUs per point. Err, maybe not…
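To make the arithmetic concrete, here is a minimal sketch of how the three figures relate. It takes 914,400 EMUs per inch and 360,000 EMUs per cm as the starting constants; the variable names are mine, purely for illustration.

```python
from fractions import Fraction

# Conversion constants taken as given for this sketch.
EMU_PER_INCH = 914_400
EMU_PER_CM = 360_000

# The "big point": exactly 72 points per inch.
BIG_POINTS_PER_INCH = 72
EMU_PER_POINT, leftover = divmod(EMU_PER_INCH, BIG_POINTS_PER_INCH)  # 12_700, no remainder

# One inch is 2.54 cm; checked in integers to avoid any floating point.
inch_matches_cm = 254 * EMU_PER_CM == 100 * EMU_PER_INCH

# The traditional printer's point (72.27 per inch) does not divide evenly.
emu_per_traditional_point = Fraction(EMU_PER_INCH * 100, 7227)
traditional_point_is_whole = emu_per_traditional_point.denominator == 1

print(EMU_PER_POINT, leftover, inch_matches_cm, traditional_point_is_whole)
```

The last check is the reason the choice of point matters: only the 72-per-inch point lands on a whole number of EMUs.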

Still no idea? Well, representing numbers in computers is fraught with errors every time you need anything that involves fractions, or multiplication or division by numbers that are not 2^n. That can even include multiplying by 0.5. Computer scientists spent a lot of their early time investigating various techniques to overcome these problems, in a branch of mathematics (or is it engineering?) called numerical methods.

These errors are small by themselves, but when you have long sequences of calculations, for example a graphics object where each segment is positioned using the result of the previous segment, the error accumulates. In publishing, misalignment can have a serious effect when there is some kind of multi-color printing: you can get registration errors.
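A small sketch of that accumulation, assuming ten chained segments of 0.1 inch each. The integer unit of 1/1000 inch is a hypothetical stand-in chosen only to keep the numbers readable; it is not the EMU.

```python
# Chain ten segments, each 0.1 inch long, so every start depends on the last end.
SEGMENTS = 10

# Floating point: 0.1 has no exact binary representation, so error accumulates.
float_pos = 0.0
for _ in range(SEGMENTS):
    float_pos += 0.1

# Integer arithmetic in a hypothetical unit of 1/1000 inch: 0.1 inch is 100 units.
UNITS_PER_INCH = 1000
int_pos = 0
for _ in range(SEGMENTS):
    int_pos += UNITS_PER_INCH // 10

float_drift = float_pos - 1.0          # small but non-zero
int_drift = int_pos - UNITS_PER_INCH   # exactly zero

print(float_drift, int_drift)
```

The floating-point end position misses one inch by a tiny amount, while the integer version lands on it exactly. The drift here is far below anything visible, but it grows with the length of the chain.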

One way to circumvent the problem is to move to integer (whole number) arithmetic: you find some convenient small unit such that every measure you care about is a whole multiple of it, so that you don’t need floating-point numbers. When you do divide, you throw away the remainder, because it is below the precision you are supporting; but because the data is frequently aligned to grid positions (1/2 inch, etc.) there will be no loss of precision from data capture (what the user sees) to the internal representation. Now, armed with this perspective, let’s imagine a set of criteria for a typesetting system or vector graphics system:

* use a small unit to allow implementation in integer arithmetic
* this unit should allow exact whole divisions (no remainder) of the common measures of modern English-speaking countries’ typesetting: the cm, the inch, and the point. So a half inch, 10.5 points, or a third of a cm are all exact (within the bounds of the system)
* the unit should be small enough to allow non-“English” measurements with, say, 0.01% precision (or do I mean inaccuracy?): the continental Didot point or the Japanese Q system, for example
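These criteria can be checked mechanically. The sketch below assumes 914,400 EMUs per inch, 360,000 per cm and 12,700 per point; the helper names are hypothetical, and the Didot point length of 0.376065 mm is one commonly quoted value that I am assuming for illustration (definitions vary). The Q is taken as 0.25 mm.

```python
from fractions import Fraction

EMU_PER_INCH = 914_400
EMU_PER_CM = 360_000
EMU_PER_POINT = 12_700

def exact_emu(amount, emu_per_unit):
    """Return the whole-EMU value of `amount`, or None if it is not exact."""
    emu = Fraction(amount) * emu_per_unit
    return int(emu) if emu.denominator == 1 else None

def relative_rounding_error(length_mm):
    """Relative error from rounding a length in mm to the nearest EMU."""
    exact = Fraction(length_mm) * EMU_PER_CM / 10
    return abs(round(exact) - exact) / exact

# The "English" measures from the list above should all be exact.
half_inch = exact_emu(Fraction(1, 2), EMU_PER_INCH)
ten_and_a_half_points = exact_emu("10.5", EMU_PER_POINT)
third_of_a_cm = exact_emu(Fraction(1, 3), EMU_PER_CM)

# Japanese Q: 0.25 mm, i.e. 0.025 cm.
one_q = exact_emu("0.025", EMU_PER_CM)

# Didot point: ASSUMED to be 0.376065 mm for this sketch.
didot_error = relative_rounding_error("0.376065")
didot_within_tolerance = didot_error < Fraction(1, 10_000)  # 0.01%

print(half_inch, ten_and_a_half_points, third_of_a_cm, one_q, didot_within_tolerance)
```

Using `Fraction` rather than floats keeps the check itself free of the rounding problems it is meant to detect.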

If you take these kinds of criteria and work through the numbers, you get something like the EMU. EMUs are used by Ecma 376’s DrawingML for “high precision co-ordinates” in certain places. The rest of the time, people can use locale-dependent measures.

So if the EMU is a reasonable technical approach, is it a reasonable measure to appear in a standard? To my mind, this falls in exactly the same bucket as SpreadsheetML’s use of numeric indexes, though there are accuracy issues as well as performance issues. I think it comes down to the purpose of the standard: when the purpose is to allow high-quality typesetting and graphics and to reflect the triggering application, I think exact numbers such as EMUs may win. However, when the purpose is to allow data interchange and human readability and writability, then using SI and locale-dependent measurements will win.

The EMU issue is also an interesting one from a standardization viewpoint. There is a kind of premise that supporting a standard (obviously the specific application-independent alternative is SVG-in-ODF in this case, but this applies to systems supporting Open XML too) involves adding functionality or adjusting superficial details (names of elements and attributes, use of property elements rather than attributes, and so on). This is, I think, the view that underlies Tim Bray’s comment (from memory): “how many ways do we need to say some text is bold or italic?” However, there are other changes that go to implementation: converting to and from SVG (as it is) presumably entails giving up exact import and export of data in the “high precision coordinate” system. The difference would be minimal, a rare pixel here or there, I’d expect.

As with the data indexes, I don’t particularly know why Open XML couldn’t support both the common notations and the optimized one. Best of both worlds. But EMUs are a rational solution to a particular set of design criteria, it seems to me, and the name English Metric Units, which has caused alarm, seems less alarming when understood as just a descriptive name and not a reference to something external.