
January 2004 Archives

Rick Jelliffe

Related link: http://www-106.ibm.com/developerworks/library/j-jtp01274.html

Brian Goetz reckons that garbage collection in modern Java Virtual Machines is now so good that we Java programmers do best by leaving it alone. I don’t buy it.*

Defensive programming is a strategy where you may put in
“extra” code just in case the world is not perfect.
For example, the defensive programmer puts in assert()s
to check invariants that could not conceivably fail,
and explicit tests for invariants that could conceivably
go wobbly.

Mr. Goetz is down on explicit nulling, as generally pointless.
But, according to Sun, after minor collections the new generation
includes objects that aren’t necessarily alive but can’t be reclaimed, either because they are directly alive, or because they are within or referenced from the old generation.

So when dealing with a large, long-lived, dynamic data structure that is passed around various methods (think of a big DOM or, if it helps, an unwanted haggis brought by a Scottish aunt), you might find that some of its objects are old generation and some are in the new. In that case,
the defensive programmer may
decide to have a cleanup method at the root object that traverses the data structure to some depth and drops references, to reduce the chances that new generation objects have
references from the old. That gives the minor collector
a hand.

The defensive programmer might also decide to null references explicitly as soon as possible, so that the object is not kept alive if there is a full GC before the method ends or before the object holding the reference dies.
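As a minimal sketch of both ideas, assuming a hypothetical Node class standing in for something like a DOM node (the class and method names are illustrative only, not from any real library): cleanup() walks the structure to a chosen depth and drops references, and the caller nulls its own reference once it has finished with the structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node, standing in for a large DOM-like structure.
final class Node {
    private Object payload;
    private final List<Node> children = new ArrayList<>();

    Node(Object payload) { this.payload = payload; }

    Node add(Node child) { children.add(child); return this; }

    Object payload() { return payload; }

    int childCount() { return children.size(); }

    // Drop the references held by this node and by its descendants down to
    // `depth` levels below it, so that long-lived nodes stop pinning
    // younger objects. Nodes deeper than `depth` are left untouched.
    void cleanup(int depth) {
        if (depth > 0) {
            for (Node child : children) {
                child.cleanup(depth - 1);
            }
        }
        children.clear();
        payload = null;
    }
}

final class NullingDemo {
    public static void main(String[] args) {
        Node document = new Node("root").add(new Node("chapter"));
        int count = document.childCount();

        document.cleanup(2); // give the minor collector a hand
        document = null;     // explicit nulling: nothing below needs the tree

        // ... long-running work that no longer needs the document ...
        System.out.println("children seen: " + count);
    }
}
```

Whether either step buys anything depends on the collector and on how long the method keeps running afterwards; the point is only that the code is cheap to write.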

Whether you think the example is a kind of exceptional case
obviously depends on what kind of application you think is
normal. Java has been well designed for servers
and small applications, which fits in well with Sun and IBM’s
web-client/server architecture. Indeed, much advice
on the Web tacitly assumes servers or applets; fair
enough. But those of us making
desktop applications using Java sometimes have
special needs.

We like minor collections and want them to be as effective as possible, and we want to
call System.gc() to encourage major collections to
occur at times when the user is at rest:
for example, when opening the first file and
whenever a large file is closed.
(Contrast this with Mr. Goetz’s suggestion of using -XX:+DisableExplicitGC, though he has
a typically good point that libraries should avoid
unconditional System.gc(): leave it
to your GUI code.)
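Here is a rough sketch of what that GUI-side hook might look like. The class name, the method names and the 8 MB threshold are all my own assumptions for illustration; the only real API used is System.gc(), which is a request the JVM is free to ignore (and which -XX:+DisableExplicitGC turns into a no-op).

```java
// Hypothetical helper for GUI code: ask for a collection only at moments
// when the user is unlikely to notice a pause.
final class QuietMomentGc {
    // Assumed threshold for a "large" file; tune it for your application.
    static final long LARGE_FILE_BYTES = 8L * 1024 * 1024;

    private boolean firstFileOpened = false;

    // Call when a file is opened. Returns true (and requests a collection)
    // only for the first file opened in the session.
    boolean onFileOpened() {
        boolean first = !firstFileOpened;
        firstFileOpened = true;
        if (first) {
            System.gc(); // a request only; the JVM may ignore it
        }
        return first;
    }

    // Call when a file is closed. Returns true (and requests a collection)
    // only when the closed file was large.
    boolean onFileClosed(long fileSizeBytes) {
        boolean large = fileSizeBytes >= LARGE_FILE_BYTES;
        if (large) {
            System.gc();
        }
        return large;
    }
}
```

Keeping the calls in one small class like this also makes it easy to switch them off if profiling ever shows they hurt.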

Now, you should at this stage be thinking “Mmmmaybe, but
surely profiling is the better way to go: only fix problems that you can see actually exist, not phantom ones!”
This is where defensive programming comes in:
the defensive programmer thinks “Well, the trouble
with profiling is the trouble with most empirical
benchmarking
approaches: they may be focussing on behaviour that is
unexpected or wrong rather than behaviour that is suboptimal” because the performance of large, multi-threaded
desktop applications can be quite hard to capture.
(If your problems occur because you are fondling the
nether limits of your memory, your profiler might
be straining also.)

Actually, in many cases you may not be using
a profiler at all, for whatever reason. If you are not
using a profiler, then defensive programming may
reduce the effect of some problems (such as unreleased
listeners) that profiling makes obvious. If you are
using a profiler, defensive programming may be worth
it just to smooth out lumps in performance and
reduce the application developer’s nightmare: when
the user suddenly has a data set and use-pattern that
brings out the worst in your application.

So I suspect that
sometimes explicit nulling of references is defensible,
sometimes cleanups may help the minor collector,
and sometimes the dreaded System.gc()
is appropriate. So what about the other main point in
the article, against object pooling?

This comes from what looks like an
excellent presentation
at JavaOne by the wonderfully named
Dr. Cliff Click. Dr. Click’s page 30 says
pooling “loses for small objects”,
“a wash for medium objects (Hashtables)” and
a “win for large objects”.
But in Mr. Goetz’s article this becomes
“object pooling is a performance loss for all but the most heavyweight objects”, which I think is a
little too enthusiastic, certainly for
single-CPU desktop systems.
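For what it is worth, here is a minimal sketch of the “large objects” case, where pooling might still pay. BufferPool is a hypothetical class of my own, not something from Dr. Click’s slides or Mr. Goetz’s article, and whether it wins on your JVM is exactly the kind of thing to measure.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool for large, expensive-to-allocate byte buffers.
final class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    BufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Reuse a pooled buffer if there is one, otherwise allocate a new one.
    // A reused buffer still holds whatever its previous user wrote into it.
    synchronized byte[] acquire() {
        byte[] buffer = free.pollFirst();
        return (buffer != null) ? buffer : new byte[bufferSize];
    }

    // Hand a buffer back. Buffers of the wrong size, or beyond the cap,
    // are simply dropped and left to the garbage collector.
    synchronized void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize && free.size() < maxPooled) {
            free.addFirst(buffer);
        }
    }

    synchronized int pooled() {
        return free.size();
    }
}
```

The cap matters: an unbounded pool of big buffers is just a memory leak with better manners.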

Brian Goetz has an excellent collection of
other articles at
http://www.quiotix.com/~brian/pubs.html
that is well worth a look.

In the more tolerant ’70s, one of my dear
old Dad’s business partners used to sport a
badge “How dare you presume I’m a heterosexual!”
Maybe this is a good slogan for Java desktop application
developers to adapt for people writing technical
material: “How dare you presume
I’m writing a servlet!”

* Well, at least not for the kinds of desktop
applications I work on.

What are your favourite techniques for defensive programming?

Bruce A. Epstein

Paul O’Neill (the former Treasury Secretary) was recently quoted as calling President Bush “disinterested and unengaged.” In that context, the statement was universally interpreted as meaning that Bush was indifferent rather than unbiased.

Being an editor, I’d prefer the unambiguous use of the word “uninterested” in lieu of “disinterested.”

However, checking Webster’s Collegiate Dictionary, I learned the following:

Definition 1: unbiased by personal interest or advantage; not influenced by selfish motive.

However…

Definition 2: not interested; indifferent.

Webster’s goes on to say:

“‘Disinterested’ was originally used to mean ‘not interested, indifferent’; ‘uninterested’ in its earliest use meant ‘impartial.’”

Whoa! You learn something new every day.

Webster’s continues:

“By various developmental twists, ‘disinterested’ is now used in both senses; ‘uninterested’ [is used] mainly in the sense of ‘not interested; indifferent.’”

So ‘disinterested’ can be used either way? I guess I was wrong.

But wait a minute. Webster’s goes on to say:

“Many object to the use of ‘disinterested’ to mean ‘not interested; indifferent’ and continue to reserve the word strictly for the sense, ‘impartial’: A disinterested observer is the best judge of behavior.”

Aha! Vindication. I’m so happy being right.

Regardless, I propose a new definition:

disinterested - describing a politician who isn’t influenced by “special interests”

And finally, I’ll coin a new term:

“Special Disinterest” - an appalling lack of concern for the American people whom you allegedly serve.

What do you think?

Rick Jelliffe

Objectively, Java is now looking pretty good for use in desktop applications: the speed for tasks that don’t involve much object creation is excellent, Swing is stable though in need of scintillation with some fresh JComponents, SWT is available for people who want something else, Linux’s new threads may improve responsiveness there, JRE 1.4.2 is available pretty much everywhere, and we have a zillion open source libraries; my current favourite is SwiXML.

But buzzing amongst our sunbeams are a couple of flies:
too-slow loading times,
the pathological space-behaviour of some large text classes, and—the topic of this blog—memory allocation geared to suit servers or applets but rather constipated for desktop applications.

Sun’s JRE for Windows has a single command-line parameter for setting the maximum heap size of Java applications.
Make it too small and you risk perpetual garbage collection, and out-of-memory errors if you open a large file. Make it too large and you risk bumping other applications out to swap space (frustrating with slow disks) and, worse, if your memory setting exceeds the physical RAM, you risk the disk thrashing that can occur from the unfortunate interaction of garbage collection and virtual memory.
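If you want to see what ceiling your application actually ended up with, the standard java.lang.Runtime methods will tell you. This is only a small diagnostic sketch; the HeapReport class name is my own.

```java
// Print the heap ceiling and current usage of the running JVM.
final class HeapReport {
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // the most the JVM will try to use
        long committed = rt.totalMemory(); // heap currently claimed by the JVM
        long used = committed - rt.freeMemory();

        System.out.println("max heap:       " + toMegabytes(max) + " MB");
        System.out.println("committed heap: " + toMegabytes(committed) + " MB");
        System.out.println("used heap:      " + toMegabytes(used) + " MB");
    }
}
```

Running it with different maximum heap settings is a quick way to confirm that an installer or launcher script is passing the value you think it is.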

I don’t know what the best answer is. For desktop applications we want to use as much RAM as the user has available, but still co-exist with other applications. But here is one approach that may be some use for people in the same boat.

Javamem

Javamem is an MVC++ program, a little command-line utility for Windows for estimating the optimal maximum heap size for a Java application. You use the result in the -Xmx parameter for the JVM. Javamem came out of frustration with trying to tailor memory settings with some common Windows installer programs. (The source code is in javamem.zip.)

Basically, Javamem allocates the amount of heapspace you specify, range-checked by a minimum and maximum threshold, but if there is more available memory, not used by any other program, it will allocate that too. So if your user has a large amount of RAM, your application can make more use of it.

There are a couple of whistles: Javamem reserves some memory for the operating system (and your application’s non-heap use) and it can detect if you are running XP and reserve even more; the intent being to allow you to go up to maximum without taking you further. If there is less memory available than the preferred size, it allocates as much as is available.

The command line is

javamem min preferred max? reserved?

where min is the minimum -Xmx in
megabytes,
preferred is the heap size you want,
max is the opportunistic maximum, and
reserved allows your own bias for
how much memory the OS and so on require.
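To make the rule concrete, here is a sketch of the sizing logic in Java. It is not the javamem source (which is C++), and it rests on two assumptions of mine: that the reserved amount is simply subtracted from the free physical memory, and that the minimum acts as a floor even when less than that is free. The caller has to supply the free-memory figure from the operating system.

```java
// A sketch of the heap-sizing rule described above; not the javamem source.
final class HeapChooser {
    // All sizes are in megabytes. freeMb is the physical memory not used by
    // any other program, as reported by the operating system.
    static long chooseHeapMb(long minMb, long preferredMb, long maxMb,
                             long reservedMb, long freeMb) {
        long usable = freeMb - reservedMb;
        if (usable >= preferredMb) {
            // Plenty of room: take what is there, up to the opportunistic maximum.
            return Math.min(usable, maxMb);
        }
        // Less than preferred: take what is available, but never go below min.
        return Math.max(usable, minMb);
    }

    // Format the result as a JVM option, for example -Xmx96m.
    static String toJvmFlag(long heapMb) {
        return "-Xmx" + heapMb + "m";
    }
}
```

With min 32, preferred 64, max 96 and 128 MB reserved, a machine with 512 MB free gets the full 96, while one with 180 MB free gets 52.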

Of course, virtual memory behaviour is very complex: this is just a quick hack for a particular problem I had. My feel is that the problem is not so important under Linux and Mac OS X, I presume because of their different virtual memory algorithms.

Anyway, enjoy! Here are a couple of example settings.

  • A medium program with small documents which you always want to have a 64 meg heap if it is available, but 96 if possible:
    javamem 32 64 96
  • A large program which you want to allow up to a gig of heap memory, if available, but at least 256 megabytes:
    javamem 64 256 1024

Note: this is not code I will be supporting or maintaining. Please feel free to use or improve it.

Any better ideas?

Rick Jelliffe

Related link: http://www.hackcraft.net/xmlUnicode/

John Hanna’s introduction to Unicode and XML, like most
good introductions (http://tbray.org/ongoing/When/200x/2003/04/06/Unicode),
skirts the dirty secret:
Unicode, or more specifically its encodings
UTF-8 and UTF-16, gives us exciting new opportunities for corrupting data. Text is broken.

At the moment, the web uses an ad hoc mix of
defaults (ASCII for pre-90s standards, ISO 8859-1 for early
90s standards, UTF-8 for recent standards), out-of-data headers (such as MIME headers), voluntary in-data signals (such as HTML’s meta tag), magic numbers (such as XML’s encoding header), browser and server settings, hidden attributes (such as on HTML forms) and guesswork (such as browsers often use). This is rubber-banded together in a nebulous hierarchy in which some labels, defaults, etc.
can be trusted in preference to other labels, defaults, etc.

All that is difficult enough, but often we programmers
do not even know which encoding our programs read or write text files as: the default in Java, for example, is to read and write text in the default encoding of the particular platform in its current locale. What is the default encoding
on your current computer? What encoding is used by your DBMS? What encoding does data sent from an HTML form use?
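In Java, at least, the first question is easy to answer, and the cure is to name the encoding every time rather than lean on the default. A small sketch (the EncodingDemo class and its misread method are my own names; the charset calls are standard java.nio.charset API):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // Encode with one charset and decode with another, which is what happens
    // when text is written on one system and read with another's default.
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // What this JVM uses when you do not say otherwise.
        System.out.println("platform default: " + Charset.defaultCharset());

        String word = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Same charset both ways: the text survives.
        System.out.println(misread(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Written as UTF-8, read as ISO 8859-1: the accented letter turns
        // into two wrong characters, and nothing complains.
        System.out.println(misread(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The same discipline applies to readers and writers: pass a Charset to InputStreamReader and OutputStreamWriter instead of using the constructors that pick up the default.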

Without Unicode encodings, this adhocery hangs together enough so that people using the dominant platform in the dominant regions usually do not notice much of a problem.

Yet even without the addition of UTF-8 and UTF-16,
many people will have experienced the common problem
where you make a webpage with an em-dash or
quotes (http://www.oreillynet.com/pub/wlg/4139)
on your Windows or older Mac
system (http://www.hclrss.demon.co.uk/demos/charsetdiffs.html), only to find the
dash or quotes gone awry when read on a different system.
The cause? Documents not correctly labelled with their encodings coupled with systems that
cannot figure out that the documents are wrongly labelled.

How could it be fixed? One way is for everyone to use UTF-8 and UTF-16: that might be a good goal for 2010.
Why not?
Another way would be for all file systems to allow metadata so that there is an unbroken metadata channel from API to data storage to server to HTTP to web agent: Microsoft seems to be taking a step forward in this regard with WinFS just as Apple has taken a step back by adopting UNIX files.

But there is another way: follow XML’s emerging example:

  • Send text with the MIME content type of application/* rather than text/*,
    so that application-specific defaults are not used:
    no mystery;
  • Put the encoding of the file in some header
    at the top of the file:
    explicit labelling;
  • Character set transcoding libraries should
    barf loudly when an erroneous code sequence is
    found, and not just swallow the codes or replace them
    with “?”: expose corruption;
  • APIs should take care of this for grassroots
    programmers: don’t burden folk with complications; and
  • With all this potential for data corruption,
    text formats need to make use of code redundancy,
    to catch certain mislabelling problems that cannot
    be detected by a vanilla transcoder:
    critical systems require robustness.
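The “barf loudly” point from the list above can be sketched in Java with a CharsetDecoder told to report errors instead of replacing them. The StrictDecoder class and its method names are mine; CharsetDecoder, CodingErrorAction and CharacterCodingException are the standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class StrictDecoder {
    // Decode, and throw on the first malformed or unmappable sequence.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // For contrast: the String constructor quietly substitutes the
    // replacement character U+FFFD and carries on.
    static String decodeLenient(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

Given a lone 0x97 byte labelled as UTF-8, decodeStrict throws while decodeLenient hands back text with a replacement character in it, which is exactly the silent corruption being complained about.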

This last point has only recently emerged as
being important, and underlies the recent draft XML 1.1.

There are some critical code points which let us detect when our text file is not in the encoding we thought it was in.
In engineering jargon, by creating code redundancy we allow error detection.

In particular the C1 block of control characters from
U+0080 to U+009F needs to be sacrificed,
disallowed from use by text formats, so that single-byte based encodings which use the bytes 0x80 to 0x9F will be discovered when mislabelling has occurred. They are critical. So a system that expects ISO 8859-1 will spit the dummy if presented with Cp1252 (“ANSI”) text, for example.
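A sketch of the check itself, in Java. The C1Check class is my own invention for illustration; the idea is just that once text has been decoded under its label, any character in U+0080 to U+009F is treated as evidence that the label was wrong.

```java
import java.nio.charset.StandardCharsets;

final class C1Check {
    // Index of the first C1 control character (U+0080 to U+009F) in the
    // text, or -1 if there is none.
    static int firstC1(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Byte 0x97 is an em-dash in Cp1252, but decodes to the control
        // character U+0097 when the file is labelled ISO 8859-1.
        byte[] mislabelled = { 'a', (byte) 0x97, 'b' };
        String decoded = new String(mislabelled, StandardCharsets.ISO_8859_1);

        int at = firstC1(decoded);
        if (at >= 0) {
            System.out.println("C1 control at index " + at + ": label is probably wrong");
        }
    }
}
```

A vanilla transcoder cannot catch this case, because every byte is a legal ISO 8859-1 character; it takes the text format to declare the C1 range off limits.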

Not everyone likes XML’s approach.
Amelia Lewis recently gave the argument on
XML-DEV (http://lists.xml.org/archives/xml-dev/200401/msg00470.html)
that in-band
signals of encoding, such as magic numbers of various kinds, are a hack because they require a pre-parse of the data stream, at least in Java. I expect this will go away,
not only because of hackers getting bored with rolling their own XML parser, but also because the Java NIO architecture (http://www.oreilly.com/catalog/javanio/) allows autodetecting transcoders: with such a transcoder one could open up an XML file with the encoding “XML-autodetect”, for example, and not need to pre-parse data. That being the case,
we need some generalization of the XML header that can be applied to other text formats readily: my
xtext is one idea for that, if anyone wants to get on board.
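A full NIO charset provider is beyond a short example, but the detection step such a transcoder would need might look roughly like this. It is a simplified sketch under my own assumptions: the XmlEncodingSniffer class is hypothetical, it handles only byte-order marks and ASCII-compatible declarations, and it ignores UTF-32, EBCDIC and UTF-16 without a byte-order mark.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlEncodingSniffer {
    // Matches the encoding name inside an XML declaration at the very start.
    private static final Pattern DECLARATION = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Guess the encoding from the first bytes of a document.
    static String sniff(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // Read the declaration as single bytes; it should be plain ASCII.
        int length = Math.min(head.length, 200);
        String text = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARATION.matcher(text);

        // With no byte-order mark and no declaration, XML defaults to UTF-8.
        return m.find() ? m.group(1) : "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Wrapped inside a decoder, this lets the caller ask for one pseudo-encoding and never see the pre-parse at all.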

After five years, XML is still state-of-the-art on this;
other text formats would do well to adopt its approach.
Allowing UTF-8 and UTF-16 in addition to existing
encodings can make a confusing situation worse, unless
we also adopt simple harm-reduction measures like the ones
suggested above.

Can text be fixed?

Robert Kaye

Related link: http://conferences.oreillynet.com/etech/

Ahhh. O’Reilly’s Emerging Tech Conference (ETech) is just around the corner. Getting excited about the conference is almost as much fun as getting excited for Christmas when I was a kid.

My talk slides are submitted. My travel is arranged. My accommodations are set. Meetings are getting scheduled.

I’m scanning the conference sessions trying to make up my mind early on what sessions I want to attend. As usual there are too many cool sessions to absorb them all. There is no way to attend all the sessions or to participate in all the lively discussions: too many events are happening at the same time.

And the idea of having Participant Sessions where the attendees can suggest/host new evening sessions is quite intriguing. It’s quite clear that I will be utterly sleep deprived, over-caffeinated and over-stimulated for the whole conference. Ahh, just the way I like it.

ETech is a great place to pour the latest trends and memes into your brain and to network with the movers and shakers. Many important people (think Lessig, Kapor, Bezos…) frequent ETech, so it is easy to make the right connections and to meet people that are normally out of your reach. For me, ETech is the best networking opportunity of the year.

If you’re headed to ETech, check out my talk on Next Generation File Sharing With Social Software. If not, you can catch some of the happenings here — the O’Reilly bloggers also show up en masse at ETech.

Are you going to ETech? What are you excited about?