Over the last few years, refactoring -- the process of gradually improving a code base by renaming methods and classes, extracting common functionality into new methods and classes, and generally cleaning up the mess inherent in most 1.0 systems -- has gained a lot of adherents. Integrated Development Environments (IDEs) like Eclipse and IDEA can now automatically refactor code.
But what if it's not just your code that needs refactoring? What if the language itself has inconsistencies, inefficiencies, and just plain idiocies that need to be corrected? When you get right down to it, the entirety of Java is really just like any other large code base. It has some brilliant parts, some functional parts, and some parts that make just about everyone scratch their heads and ask, "What the hell were they thinking?"
It's now a little more than 11 years since James Gosling began working on Oak, the language that would eventually become Java, and seven years since Sun posted the first public release of Java. The language, class library, and virtual machine collectively known as "Java" are all showing their age. There are many parts of Java that everyone agrees should be fixed but can't be, for reasons of backwards compatibility. Until now, revisions of Java have attempted to maintain "upwards compatibility"; that is, all earlier code should continue to run unchanged in later versions of Java. This has limited the changes that can be made to Java, and prevented Sun from fixing many obvious problems.
This article imagines a "Java 3" that jettisons the baggage of the last decade, and proposes numerous changes to the core language, virtual machine, and class libraries. The focus here is on those changes that many people (including the Java developers at Sun) would really like to make, but can't -- primarily for reasons of backwards compatibility.
I am specifically not focusing on new features that could be added to Java 2 today, useful as they might be. These can be addressed through the Java Community Process. Instead, I want to look at how we could do the same things Java does today, only better. For instance, while I'd love to see a complex number data type as a standard part of the Java language, this could be added to Java 1.5 without breaking existing code. On the other hand, changing the existing char type to use four bytes rather than two would be radically incompatible with most existing code.
Similarly, I am only looking at changes that will leave Java as the same language we know and love today. I want to talk about refactoring the language, not reinventing it. I am not interested in purely syntactic changes, such as eliminating the semicolons at the ends of lines or making indentation significant. These sorts of changes could readily be implemented as byte code compilers for other languages like Python and F. Indeed, such compilers already exist. The changes I want to address are much more fundamental, and often lie across the boundaries between language, library, and virtual machine. With that in mind, let's look at my top 10 list of possible refactorings for Java 3. (See Gosling's "Design Principles" slide for a justification for simplicity and lack of redundancy.)
This one's a no-brainer. Java 1.4.0 ships with 22 deprecated classes, 8 deprecated interfaces, 50 deprecated fields, and over 300 deprecated methods and constructors. Some, like Date.parseDate(), are deprecated because there are now equivalent or better methods to do the same thing. Others, like Thread.resume(), are deprecated because they were a bad idea in the first place and could be actively dangerous. Whatever the reason a method has been deprecated, the fact is, we're not supposed to be using it.
Sun's official line is, "It is recommended that programs be modified to eliminate the use of deprecated methods and classes, though there are no current plans to remove such methods and classes entirely from the system." It's time to cut the umbilical cord. Ditch them all now. This can only make Java simpler, cleaner, and safer.
One of Java's contributions to code readability has been consistent naming conventions, even though they aren't enforced by the compiler. Class names are nouns that begin with capital letters. Fields, variables, and methods begin with lowercase letters. All use camel case. Named constants are written in all caps with underscores separating the words. I can pick up the code of any experienced Java programmer on the planet and expect that their naming conventions will match mine.
When Java 1.0 was being written, however, not all the programmers had internalized Java's naming conventions yet. There are numerous minor but annoying inconsistencies throughout the API. For instance, the color constants are Color.green, etc., instead of Color.GREEN, etc. Java 1.4 finally added the capitalized versions, but still retains the incorrect lowercase versions, doubling the number of fields in this class. These inconsistencies should be cataloged and corrected.
Another beneficial coding convention Java thrust upon an occasionally resistant world was using full names with no abbreviations. However, some of the most basic Java methods are abbreviated. Why, for instance, do we type System.gc() instead of System.collectGarbage()? It's not as if this method is called so frequently that the time saved typing twelve fewer letters is important. Similarly, the InetAddress class should really have an unabbreviated name.
Along the way, let's move JDBC into the javax packages. JDBC is important, but it's hardly a core language feature. The only reason it isn't already in javax is that the javax naming convention for standard extensions hadn't been invented when JDBC was first added to the JDK back in Java 1.1. Programmers working with JDBC can still use it. The rest of us can safely ignore it.
This will undoubtedly be my most controversial proposal, but bear with me. I am not talking about removing char and the other primitive types completely. I simply want to make them full objects with classes, methods, inheritance, and so forth. This would make Java's type system much cleaner. We'd no longer need to use type-wrapper classes to add primitives to lists and hash tables. We could write methods that operated on all variables and data. All types would be classes and all classes would be types. Every variable, field, and argument would be an instance of Object. Java would finally become a pure object-oriented language.
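As a rough illustration of the wrapper-class dance described above, here is a minimal sketch in which every value is wrapped and unwrapped by hand. The class name WrapperDemo and the sum() helper are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapperDemo {
    // Sums a list of wrapped integers, unwrapping each element by hand.
    static int sum(List<Integer> values) {
        int total = 0;
        for (Integer boxed : values) {
            total += boxed.intValue(); // explicit unwrap back to the primitive
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(Integer.valueOf(23)); // the primitive 23 must be wrapped before it can be stored
        values.add(Integer.valueOf(19));
        System.out.println(sum(values)); // prints 42
    }
}
```

If int were a real class, the list could hold the values directly and neither the wrapping nor the unwrapping step would exist.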
The reason Java used primitive data types in the first place was speed. The claim was that pure object-oriented languages like Smalltalk were too slow for production code. But after seven years of Moore's law, computers are a lot faster and have a lot more memory than they used to. Even more importantly, compiler technology has advanced to the point where it's really not so hard to replace object-based source code with primitive-based byte code where appropriate. Modern Eiffel, C#, and Smalltalk compilers already do this. In essence, a good compiler should be able to figure out when to use ints and when to use BigIntegers and transparently swap between the two.
Classes such as int and char would still have the literal forms they have today. Just as the statement String s = "Hello" creates a new String object, so too would int i = 23 create a new int object. Similarly, the compiler would recognize all of the customary operators, like *, and map them to the appropriate methods in the classes. This is no more complicated than the compiler's native understanding of the plus sign for string concatenation today. Most existing arithmetic code would work exactly as it works today. The int/char/double/float/boolean objects would be immutable, so these objects would be thread-safe and could be interned to save memory. The classes would probably be final for reasons of both safety and performance.
I'd also like to consider whether Java's arithmetic rules are correct. The floating point operations are defined by IEEE 754 and, for compatibility with other languages and hardware, it's important to keep that. The integer types offer real room for improvement, however. It is mathematically incorrect for two billion plus two billion to equal -294,967,296, yet it does in Java today.
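The overflow is easy to reproduce. The following sketch contrasts the wrapped int result with the mathematically correct one from java.math.BigInteger; the class and method names are hypothetical and exist only for this illustration.

```java
import java.math.BigInteger;

public class OverflowDemo {
    // 32-bit int addition silently wraps around on overflow.
    static int wrappedSum(int a, int b) {
        return a + b;
    }

    // BigInteger is unbounded, so the same sum comes out mathematically correct.
    static BigInteger exactSum(int a, int b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(wrappedSum(2000000000, 2000000000)); // prints -294967296
        System.out.println(exactSum(2000000000, 2000000000));   // prints 4000000000
    }
}
```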
There should be at least one integer type that is not bounded in size, and perhaps it should be the default type. If so, it could easily subsume the existing fixed-size types such as long. The byte type still seems necessary for I/O, and it could also remain for those rare cases like image filters where bitwise manipulation is really necessary; however, using bitwise operators like & on integers confuses implementation with interface and thus violates a fundamental principle of object orientation. The various bitwise constants, such as SelectionKey.OP_ACCEPT, used throughout the Java API should be replaced with type-safe enums and/or getter and setter methods.
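To make the type-safe alternative concrete, here is a minimal sketch that assumes a language-level enum and uses java.util.EnumSet in place of an int bit mask. The Operation enum and the isInterestedIn() helper are hypothetical; they are not part of any existing API.

```java
import java.util.EnumSet;
import java.util.Set;

public class InterestOpsDemo {
    // Hypothetical enum standing in for int bit flags such as SelectionKey.OP_ACCEPT.
    enum Operation { ACCEPT, CONNECT, READ, WRITE }

    // Hypothetical helper: instead of testing (ops & FLAG) != 0, ask the set directly.
    static boolean isInterestedIn(Set<Operation> ops, Operation op) {
        return ops.contains(op);
    }

    public static void main(String[] args) {
        Set<Operation> ops = EnumSet.of(Operation.ACCEPT, Operation.READ);
        System.out.println(isInterestedIn(ops, Operation.READ));  // prints true
        System.out.println(isInterestedIn(ops, Operation.WRITE)); // prints false
    }
}
```

With this shape the compiler rejects an arbitrary integer passed where an operation is expected, which a bit mask cannot do.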
The basic story would be that integers are for arithmetic and bytes are for memory manipulation. Thus, in reverse, we might choose to ban arithmetic operations like addition and subtraction on bytes. Even today, adding two bytes automatically promotes them to ints because the virtual machine doesn't support these operations on any type narrower than an int.
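The promotion of byte operands is visible from ordinary code. In this small sketch (class and method names are hypothetical), the sum of two bytes has to be cast to fit back into a byte, and boxing the uncast sum yields an Integer.

```java
public class BytePromotionDemo {
    // a + b has type int, so an explicit cast is required to get a byte back.
    static byte addBytes(byte a, byte b) {
        return (byte) (a + b);
    }

    // Boxing the uncast sum shows that its type is int, not byte.
    static Object boxedSum(byte a, byte b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(boxedSum((byte) 100, (byte) 100).getClass().getName()); // prints java.lang.Integer
        System.out.println(boxedSum((byte) 100, (byte) 100)); // prints 200
        System.out.println(addBytes((byte) 100, (byte) 100)); // prints -56 after narrowing
    }
}
```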
There's substantial evidence from other pure OO languages that this scheme can be implemented efficiently. Nonetheless, I anticipate resistance to these ideas from the performance-at-any-cost crowd. Naive implementations will require more memory than existing Java code (which is already not particularly stingy with the megabytes). This is likely to be a special problem in J2ME and smaller environments. J2ME might choose to take a different path than J2SE and J2EE.
J2ME can continue development based on Java 2 with its dichotomy between primitive and object types, its 2+2=-1 arithmetic, and all of the problems that entails. In this environment, the benefits of moving may not outweigh the cost. But Java is no longer a language just for cheap set-top boxes (and really it never was). The needs of the desktop and the server are not the same as the needs of the cell phone and the digital watch. Programmers in each environment need a language tailored for them. One size does not fit all.
Whether the char type is primitive or an object, the truth is that Unicode is not a two-byte character set. This was perhaps not so important in the last millennium when Unicode characters outside the basic multilingual plane were just a theoretical possibility. As of version 3.2, however, Unicode has about 30,000 more characters than can be squeezed into two bytes. Four-byte characters include many mathematical and most musical symbols. In the future it's also likely to encompass fictional scripts like Tolkien's Tengwar and dead languages like Linear B. Currently, Java tries to work around the problem by using surrogate pairs, but the acrobatics required to properly handle these are truly ugly, and are already causing major problems for systems like XML parsers that need to deal with this ugliness.
Whether Java promotes the char type to an object or not, it needs to adopt a model in which characters are a full four bytes. If Java does go to fully object-oriented types, it could still use UTF-16 or UTF-8 internally for strings to save space. Externally, all characters should be created equal. Using one char to represent most characters but two chars to represent some is too confusing. You shouldn't have to be a Unicode expert just to include a little music or math in your strings.
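A single musical symbol is enough to show the mismatch. The sketch below stores MUSICAL SYMBOL G CLEF (U+1D11E) as its surrogate pair and compares the number of chars with the number of actual characters. The class and method names are hypothetical; String.length() and String.codePointCount() are standard methods.

```java
public class SurrogateDemo {
    // MUSICAL SYMBOL G CLEF, U+1D11E, written as a UTF-16 surrogate pair.
    static final String G_CLEF = "\uD834\uDD1E";

    // Number of 16-bit chars in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int characterCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(G_CLEF));      // prints 2
        System.out.println(characterCount(G_CLEF)); // prints 1
        System.out.println(Integer.toHexString(G_CLEF.codePointAt(0))); // prints 1d11e
    }
}
```

Any code that indexes, truncates, or reverses this string one char at a time risks splitting the pair.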
Java was the first major language to integrate multithreading as a fundamental feature rather than a special purpose add-on library. Thus, it's not surprising that its designers made a few mistakes and missteps in this area. All of these need to be fixed:
As Sun's Joshua Bloch wrote, "Thread groups are best viewed as an unsuccessful experiment, and you may simply ignore their existence." (Effective Java, Addison-Wesley, 2001) They don't provide the security they were intended to provide, and the minor functionality they do provide can easily be moved into the Thread class itself.
Thread methods such as resume() are rightly deprecated because they have the potential to leave objects in inconsistent states. They should be removed from the Thread class completely.
The destroy() method isn't implemented. It just clutters the API. Get rid of it.
It's become widely known that the Java memory model is broken with respect to "the semantics of threads, locks, volatile variables, and data races." Indeed, an expert group has been formed within the JCP to fix this, but not a lot has been heard from it since it was constituted a year ago. Without doubt, this is a hard problem; but maybe removing concern for upwards compatibility can help fix it.
The non-atomic nature of 64-bit values such as longs is a sop thrown to architectures that can't efficiently do 64-bit operations. That's not nearly as much an issue today as it used to be, however, and few VMs ever took advantage of it. If these types aren't made into objects, then they need to be as atomic as the other primitive types.
Finally, we should seriously consider the possibility that monitors can be decoupled from objects so that an object can have multiple monitors for different operations. I'm not a thread expert (and I generally embarrass myself whenever I pretend to be one), but I've heard a lot of arguments from both sides on this point, most of which have gone right over my head. If we're redesigning Java threads, maybe we can move this discussion from boozy barroom chats to a serious discussion and figure out if there's some way to reconcile the two sides.
These changes are going to be tricky, and they're going to require changes at all three levels -- the API, the language specification, and the virtual machine. But they are important, if Java is to remain efficient and reliable on the multiprocessor systems of tomorrow.
The Java community is already using XML for latter-day file formats like Servlet config files and Ant build files. XML is clean, easy to edit, easy to parse, and easy to debug. It is rightly the first choice of most programmers when designing new file formats. Of course, XML wasn't invented until a couple of years after Java was released. Thus, Java has a number of non-XML file formats that should be ported to XML. Among others, these include JAR manifest files, properties files, and serialized objects. All of these can and should be replaced with well-formed XML.
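For properties files, one possible shape of an XML replacement can be sketched with the storeToXML() and loadFromXML() methods that later releases added to java.util.Properties. The class name, helper methods, key, and comment string below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class XmlPropertiesDemo {
    // Writes the properties as a well-formed XML document held in memory.
    static byte[] toXml(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, "example settings", "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads the properties back from the XML document.
    static Properties fromXml(byte[] xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("greeting", "Hello");
        Properties copy = fromXml(toXml(props));
        System.out.println(copy.getProperty("greeting")); // prints Hello
    }
}
```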
Serializing objects with XML is perhaps the most surprising suggestion, since serialized objects are binary data and XML is text; however, most data inside objects are just text and numbers at the lowest level, and all of this is well-supported by XML. The limited true binary data inside Java objects can easily be Base-64 encoded. Perhaps most surprisingly, the resulting format should be both smaller and faster than today's binary serialization. Numerous developers have already invented custom XML formats for object serialization, and pretty much all of them have proved more efficient than Java's binary format. The fact is, contrary to popular belief, binary formats are not necessarily smaller or faster than the corresponding text formats, and serialized Java objects are a particularly poorly-optimized binary format. Sun has already implemented an XML-based serialization format for JavaBeans in Java 1.4 in classes such as java.beans.XMLDecoder. Now it just needs to go a step further to cover all serializable objects.
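A minimal sketch of the existing JavaBeans mechanism, assuming the java.beans.XMLEncoder and java.beans.XMLDecoder classes and a plain list of strings as the object to persist. The class name XmlBeanDemo and its two helpers are hypothetical.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class XmlBeanDemo {
    // Serializes the object graph as an XML document held in memory.
    static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(out);
        encoder.writeObject(value);
        encoder.close(); // flushes the output and finishes the XML document
        return out.toByteArray();
    }

    // Rebuilds the object graph from the XML document.
    static Object decode(byte[] xml) {
        XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml));
        try {
            return decoder.readObject();
        } finally {
            decoder.close();
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("alpha");
        names.add("beta");
        System.out.println(decode(encode(names))); // prints [alpha, beta]
    }
}
```

The encoded form is ordinary text, so it can be inspected and edited by hand, which is part of the appeal.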
Two GUI APIs is one too many. Most Java developers have chosen to standardize their work on Swing. I agree with them. It's time to merge each Swing class with its AWT counterpart: the JComponent classes, the JFrame classes, the JMenu classes, and so forth. In some cases, the classes would come from Swing (JTable, for example). In others, from the AWT (Color, etc.). Still others (such as JFrame and its AWT counterpart) would be merged, typically pulling in most of the code from Swing but retaining the more obvious AWT name. Overall, this would be a huge simplification for GUI development in Java and noticeably cut down on Java's bulk.
As long as we're at it, it's time to get rid of the legacies of the Java 1.0 event model. There's no reason for every component to have a series of confusing methods like action(). If they're still being used behind the scenes as part of the infrastructure, at least make them non-public; but I suspect they can be eliminated completely without too much effort.
Java's current collections API is a hodgepodge of different designs implemented at different times. Some classes are thread-safe (Vector). Some aren't (HashMap). Some collections return null when a missing element is requested. Others throw an exception. Let's settle on some standard idioms and metaphors, and design all the classes to fit them, rather than the other way around. Probably the easiest way to do this would be to eliminate Vector and Hashtable completely. An ArrayList can do anything a Vector can do, and a HashMap can replace a Hashtable.
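A few of these inconsistencies can be demonstrated in a handful of lines. This is only a sketch with hypothetical class and method names; each helper returns true when the behavior named in its comment is observed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

public class CollectionQuirksDemo {
    // A map reports a missing key by returning null.
    static boolean missingKeyReturnsNull() {
        Map<String, String> map = new HashMap<String, String>();
        return map.get("absent") == null;
    }

    // A list reports a missing index by throwing an exception.
    static boolean missingIndexThrows() {
        List<String> list = new ArrayList<String>();
        try {
            list.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Hashtable rejects a null key that HashMap would accept.
    static boolean hashtableRejectsNullKey() {
        Map<String, String> legacy = new Hashtable<String, String>();
        try {
            legacy.put(null, "value");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(missingKeyReturnsNull());   // prints true
        System.out.println(missingIndexThrows());      // prints true
        System.out.println(hashtableRejectsNullKey()); // prints true
    }
}
```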
The original Java developers were Unix programmers, Windows users, and Mac dilettantes. The I/O APIs they invented were more than a little Unix-centric in both obvious and not-so-obvious ways, and really didn't port very well. For instance, initially they assumed that the file system had a single root. This is true on Unix, but false on Windows and the Mac. Both the new and old I/O APIs still assume that the complete contents of a file can be accessed as a stream (true on Windows and Unix but false on the Mac).
Some of the problems, especially with regard to internationalization, were fixed in Java 1.1, with the introduction of the Writer classes and their subclasses. Java 1.2 fixed some of the more glaring inadequacies in the file system API. Still more were fixed in Java 1.4 with the new I/O APIs.
The job isn't done yet. For instance, even in Java 1.4 there still isn't a reliable means to copy or move a file -- pretty basic operations, I think you'll agree. To date, attempts to design a better file-system API have foundered on the need to be upwards-compatible with the atrocious Java 1.0 I/O classes. The time has come to reconsider everything in java.io. Some of the more urgently needed changes are:
The File class needs to represent a real file on the file system rather than a file name. It should provide full access to the file's metadata, support various naming conventions, allow for operations on the file itself, such as copying, moving, and deleting, and in general, recognize that a file is more than just a bucket of bytes.
The PrintStream class is a disaster. It should be removed. Streams such as System.err can be PrintWriters instead. (Sun originally planned to make this change in Java 1.1, but decided it would break too much existing code.)
Methods such as writeUTF() in DataOutputStream don't actually support UTF-8. What they support is 90% real UTF, 10% meat-byproduct. There's nothing actually wrong with the formats they support, except that this causes problems for inexperienced users who use them to read and write UTF-8, and then wonder why their code breaks when exchanging data with conformant software from other languages. These methods should be renamed.
The Writer classes desperately need a getCharacterSet() method that can help determine which characters the writer can safely output.
Encodings should be identified with IANA-registered names like ISO-8859-1 and UTF-16 instead of Java class names like 8859_1 and UTF16.
Buffering I/O is one of the most important performance optimizations a program can make. It should be turned on by default. The base Writer classes should have their own internal buffers, rather than requiring them to be chained to a BufferedWriter. Filters can check whether the stream they're chained to is buffered before deciding whether or not to use their own buffers.
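The writeUTF() item in the list above can be checked directly. The following sketch compares the bytes produced by DataOutputStream.writeUTF() with real UTF-8 for a two-character string containing a null character. The class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {
    // Bytes produced by DataOutputStream.writeUTF(): a two-byte length, then modified UTF-8.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            DataOutputStream data = new DataOutputStream(out);
            data.writeUTF(s);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Bytes produced by a standard UTF-8 encoder.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "A\u0000"; // the letter A followed by the null character
        System.out.println(viaWriteUTF(text).length); // prints 5: 00 03 41 C0 80
        System.out.println(viaUtf8(text).length);     // prints 2: 41 00
    }
}
```

A conformant UTF-8 reader in another language would choke on both the length prefix and the two-byte encoding of the null character.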
No single topic is as confusing to new users as the class path. I get almost daily e-mail from novice readers asking me to explain the "Exception in thread 'main' java.lang.NoClassDefFoundError: HelloWorld" error messages they keep seeing. I've been writing Java for seven years and I'm still occasionally baffled by class loader issues. (Pop quiz: When is class A that implements interface B not an instance of interface B? When A and B are loaded by two different class loaders. I lost half a day to that one just last week, and after I mentioned my problem on a mailing list, one talented programmer friend told me he lost two weeks to the exact same bug.)
I'll freely admit that I don't know how the class loader should be fixed. It's clearly one of the trickier areas of Java. I do know that the current system is far too difficult. There has to be a better way.
This top-ten list is just a beginning. There are lots of other areas where Java could be improved, if we allow ourselves to throw off the straitjacket of upwards compatibility: replacing integer constants with type-safe enums, removing confusing and unnecessary classes, making Cloneable a true mixin interface or perhaps eliminating it completely, renaming the Math.log() method to Math.ln(), adding support for true design by contract, eliminating checked exceptions (as Bruce Eckel has advocated), limiting objects to a single thread as in Eiffel, and much more.
We can argue about exactly which changes are necessary, and which ones may cause more harm than good. But one thing's for sure: if Java fails to change, if it refuses to correct its well-known problems, there are other languages waiting in the wings written by some very sharp programmers who have learned from Java's mistakes and are eager for the opportunity to replace Java in the same way Java replaced earlier flawed languages. Java must not be forever handicapped by mistakes made seven years ago in Java 1.0. There comes a point where we need to throw off the chains of backwards compatibility and move boldly into the future.
Elliotte Rusty Harold is a noted writer and programmer, both on and off the Internet. His previous books include "Java Network Programming", Third Edition, "XML in a Nutshell", Third Edition, and "Java I/O", all from O'Reilly.
Copyright © 2009 O'Reilly Media, Inc.