Anders Hejlsberg joined Microsoft in 1996, and did his initial work as an architect of Visual J++ and the Windows Foundation Classes (WFC). Hejlsberg is currently a Microsoft Distinguished Engineer and chief architect of the C# language, and he played a key role in the creation of the Microsoft .NET framework. Today, he leads the ongoing development of the C# programming language. John Osborn, executive editor with O'Reilly Media, Inc., responsible for the company's developer books on .NET and other Microsoft initiatives, recently sat down with Hejlsberg for this wide-ranging interview.
Osborn: I thought I'd start off by jumping back to the year 2000, which is the last time we talked. It was interesting [for me to reread] the interview. At the time, we [seem to have been] obsessed with Java, and there were comparisons being made between C# and Java.
Osborn: From the perspective of five years--looking back on that time, and looking at where C# is now--what is your sense of where the language is [today]? Is [C#] a Microsoft-only tool? [Or] is it something that has a broader place in the community?
Hejlsberg: Right. A lot has happened in those few years, and the world looked very different back then … certainly from a competitive standpoint. We didn't have .NET, and Java was a much newer, much more [in-]vogue thing at the time. So obviously there were a lot of comparisons being made. I think in those five years, C# has really grown up. Today we're talking about version 3.0, and we're just about to ship 2.0, and we shipped [versions] 1.0 and 1.1 [some time ago].
[When] we look at tracking studies, .NET appears to be at least neck and neck with Java now in terms of usage. And [there's] lots of C# [usage] there. So [C# is] a grown-up language now, as opposed to the newcomer. Obviously, I'm very happy with the fact that it's taken this position. There are obviously some big differences in the two platforms. I mean, .NET is first and foremost a Windows development platform. And that causes you to have [a] different business strategy. Now, that said, you know that we participated actively in [the] standardization of C# and the core pieces of the .NET framework. There's now actually a version 3.0 standard of C# [Ed.: ECMA-334 and ISO/IEC 23270].
It's a little confusing because what they call 3.0 is what we're calling 2.0, but ...
Osborn: I was going to ask you about that later.
Hejlsberg: That's a bit unfortunate numbering there, something we ought to get fixed somehow. But it's out there, and there are third-party independent implementations of C#. Mono certainly comes to mind. And so [ours is] not just a close-ended proprietary strategy.
I also think that Microsoft in the last five years has gone through a big transformation in terms of transparency and community involvement, openness and so forth. The kinds of dialogues we engage in with customers now are very, very different from what they were five years ago, and night and day from what they were ten years ago. You know, the whole industry, through blogging and open source and what have you, has very much switched around, and sort of the center of gravity lies much more with the individual developer and the individual person than it used to.
Osborn: [When you introduced the C# language in 2000], your number one bullet was that this was the first language that was really component-oriented. Has [your] perspective changed in five years?
Hejlsberg: Well, it's become more than that, for sure. But [C#] is still what you could characterize as a component-oriented language. And that takes us back to properties, methods, events, and the core things that you deal with on a day-to-day basis in your programming. Back then, and even more so now, programming very much is aided by tools, and those tools very much tend to have a programming model where you have some sort of design surface where you put components, and then you modify the components, put code behind them, set their properties.
I always felt that [properties and events and methods are] such important concepts that they deserve first-class treatment in the language, and that's what we did with C#. And quite honestly, I have heard no complaints about the fact that we have properties. It is a complete non-issue, you know what I mean? It's just, "Yep, yep," people just think that's completely natural.
And in a sense, this idea of giving first-class treatment to things that we commonly do is something that is very dear to my heart. In many ways, it's the same [thing] we're doing with LINQ and language-integrated query, right? It's giving first-class treatment to the notion of query in a programming language, because we all like queries. It's a declarative way of expressing something at a higher level than if you write a bunch of for loops and if statements and sort of do it the manual way.
Osborn: I want to come back to LINQ, but going back to the relative positioning of languages, again, one of the things that [Microsoft Visual Studio .NET product manager] Tony Goodhew said in that interview was that Microsoft studies were showing that people tended to use two or more languages to do their programming. And there was a sense at the time that [languages were just] syntactic sugar. You choose the language that you're most comfortable with.
Do you think that's changed? We don't say that anymore.
Hejlsberg: Well, we don't, but it's all syntax in the end, right? I mean otherwise, we'd just be handing over an XML document that describes the abstract syntax tree of what you want done, and that could be the syntax too, but it's obviously not usable by programmers. So I think programming languages occupy a special position in people's minds in the sense that just as your spoken language is the way you express yourself, so is [your] programming language; it's how you express yourself.
And the syntax is the manifestation of the programming language, and it actually, in many ways, affects how you think about your program and so forth. So syntax does matter, and it matters a lot, I think.
Osborn: What's special about C# in that regard? Can you characterize it?
Hejlsberg: Well, I think the component-oriented stuff that we talked about is tremendously important. We try to make sure that there are not multiple ways of doing things. We try to always find synergies between syntactic elements--it's hard to describe precisely what I mean by this. But take the language-integrated query [LINQ] stuff [we've introduced at the PDC, for example]. The extensibility model of that is that we turn it all into method calls. When you write a query with a select clause, we turn that into calls to methods called Select on the collection you are querying. And we pass the expressions you wrote in the query as lambda expression arguments to those methods.
So queries just turn into method calls that are strung together, but the query syntax makes it easier to read, even though it is simply a syntactic veneer. It immediately translates into method calls, just like the foreach loop translates into "get an enumerator and a while loop" and so forth. But it helps you think about it at a higher level. Do you know what I'm saying?
Hejlsberg: So in that sense, syntax deeply affects how you think about the problem, even though semantically it has absolutely no bearing on what's going on.
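[Ed.: A minimal sketch of the translation described above, not the compiler's actual output. It assumes the LINQ operators as they later shipped in System.Linq; the class and method names are hypothetical and chosen only for illustration.]

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; only Where, Select, and GetEnumerator are real APIs.
public static class QueryTranslationSketch
{
    // Query syntax: the declarative form a programmer writes.
    public static List<int> WithQuerySyntax(int[] numbers)
    {
        var query = from n in numbers
                    where n % 2 == 0
                    select n * n;
        return query.ToList();
    }

    // Roughly the method calls the query above turns into, with the
    // query's expressions passed along as lambda arguments.
    public static List<int> WithMethodCalls(int[] numbers)
    {
        return numbers.Where(n => n % 2 == 0)
                      .Select(n => n * n)
                      .ToList();
    }

    // A foreach loop written out by hand: get an enumerator, then loop
    // with while until MoveNext reports there are no more elements.
    public static int SumByHand(IEnumerable<int> items)
    {
        int sum = 0;
        using (IEnumerator<int> e = items.GetEnumerator())
        {
            while (e.MoveNext())
            {
                sum += e.Current;
            }
        }
        return sum;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", WithQuerySyntax(numbers)));  // 4,16
        Console.WriteLine(string.Join(",", WithMethodCalls(numbers)));  // 4,16
        Console.WriteLine(SumByHand(numbers));                          // 10
    }
}
```

[Ed.: Both query forms produce the same result; the difference is only in how the code reads.]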
Osborn: Yes. From the perspective of a book publisher, and [by looking at] our own tracking data, we see that C++, oddly enough, is holding its own; it's actually grown a little bit in terms of book sales, whereas VB has declined probably 20 to 25 percent in the last year. C# has been very steady [Ed.: But flat].
Osborn: So clearly, from what we're seeing, there seems to be a migration from VB to C# [Ed.: Or perhaps elsewhere]. But C++ seems to have its niche.
Hejlsberg: Right. VB and C# very much appeal to the same crowd of programmers. C++ does play in the managed space, but C++ at [its] core is really about writing unmanaged code, and a lower level of programming. I know I'm generalizing here, and yes, you can [do] template-based [programming] and [use] STL [Ed.: The standard template library], and I'm not meaning to belittle anything. I'm just saying it's sort of broad position and the broad clientele of C++ tends to write a different type of application than you write with [C# and VB].
Where C# and VB very much appeal to the same segment.
Hejlsberg: So I'm not actually surprised that C++ is--
Osborn: You wouldn't choose C++ to write managed code.
Hejlsberg: Personally, no. I would not choose it to write managed code. But if I had to go write a compiler that wasn't going to be managed code or whatever, [I would]. But I think as a general rule, every year that passes, I think the reasons for writing managed code are stronger and stronger. Simply because the hardware is more capable at this point, and the tradeoff, arguably not very large, but the tradeoff that we make for "let's sacrifice a bit of CPU power and a bit of memory for dramatically increased productivity" is a great deal. I think it is a very worthwhile value proposition. And I think that's only getting more true. Plus, the world of managed code is getting richer every day. It clearly is where all the innovation is happening, and where the vast bulk of enterprise apps are being written today.
Osborn: Maybe we should talk a little bit about version 2.0. Certainly, [C# programmers have been] looking forward to generics for a long time.
Osborn: What's different about the generics in C# as opposed to other languages?
Hejlsberg: I think that begs the question of what's different between C# and Java, obviously.
Hejlsberg: First of all, I'm very pleased that we got generics into 2.0. Generics in many ways is the deep enabler of all the stuff that you're now seeing us do in C# 3.0. It is really profound how generics adds a new dimension of richness to the type system that opens up all sorts of possibilities, like language-integrated query, which we could've never done without generics. So in that sense, it's a deep enabler for interesting stuff. It also is a very pragmatic real-world problem solver.
Hey, more typing is good, because that means you find more errors sooner, and you can do better code generation because you have to have fewer dynamic checks at runtime to verify the solidity of the types in your program.
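[Ed.: A small sketch of the point about finding errors sooner, assuming the non-generic ArrayList as the pre-generics baseline. The class and method names are hypothetical.]

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo class contrasting an untyped and a typed collection.
public static class TypedCollectionsSketch
{
    // Pre-generics style: elements come back as object, so every read
    // needs a cast that is checked at runtime and can fail there.
    public static bool TryReadInt(ArrayList list, int index, out int value)
    {
        try
        {
            value = (int)list[index];
            return true;
        }
        catch (InvalidCastException)
        {
            value = 0;
            return false;
        }
    }

    // Generic style: the element type is part of the list's type, so a
    // wrong Add is rejected by the compiler and reads need no cast.
    public static int TypedSum()
    {
        List<int> typed = new List<int>();
        typed.Add(40);
        typed.Add(2);
        // typed.Add("forty-two");  // would not compile
        int sum = 0;
        foreach (int n in typed)
        {
            sum += n;
        }
        return sum;
    }

    public static void Main()
    {
        ArrayList untyped = new ArrayList();
        untyped.Add(42);
        untyped.Add("forty-two");   // compiles: the list accepts any object

        int value;
        Console.WriteLine(TryReadInt(untyped, 0, out value));  // True
        Console.WriteLine(TryReadInt(untyped, 1, out value));  // False
        Console.WriteLine(TypedSum());                         // 42
    }
}
```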
Now, with respect to Java and C#, syntactically, the two implementations of generics look very similar. They [both] look kind of like C++ templates; you can see the heritage there.
But once you scratch the surface, underneath they're actually very, very different. I think the biggest difference is that in .NET, generics are not just a language feature. They are deeply understood by the CLR [common language runtime] and by the type system of .NET itself. So [generics] have representation at runtime.
Java chose a different strategy for implementing generics, where in a nutshell, they're only a compile time feature. And the [Java] compiler removes all of the genericity from the code, and just emits objects; it effectively substitutes object for every type parameter. With Java, at runtime, there are no generics. This is interesting in the sense that it allows you to run on an unmodified VM [virtual machine], but it also brings about a host of very surprising limitations and rules that you have to abide by. And it does not give you some of the performance gains that we see from [our own implementation of] generics, because [with Java] at runtime, there are no generics and you have to still do all of the dynamic runtime checks and downcasts and whatever while you take things out of a
But I think the subtler point here is that because there is no runtime representation of generics [in Java], you lose some information going from your compiled code to the code that you run at runtime. So at compile time, you might be operating on a list of customers. If at runtime, someone hands you a list of customers typed as object, they just give it to you as an object, and [if] you want to find out what this list is a list of, you can't, because reflection doesn't know about generics because that got erased.
And so you have these strange holes in the system, and in a world where we rely increasingly on dynamic code generation and inspection--of running code and dynamic behaviors and whatever--that is actually, to me, probably the biggest issue that I have with Java's implementation of generics; that is, this lack of true representation of the program being run.
Osborn: So you're saying that the [.NET implementation of] generics allows you to hang onto the--
Hejlsberg: Oh, certainly. If I give you a List<T> typed as object, I can ask it, "What are you?" And it will say, "I am a list of customers." It'll say, "I'm a Customer." Then I can say, "Well, why don't you give me just the List<T>? And in fact, why don't you bind Order?" And now I can make myself a List<Order>, and then I can create instances on that. Anything I can do at compile time, I can do at runtime too, through reflection, and that's tremendously powerful.
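[Ed.: A minimal sketch of that conversation with a list, using the .NET reflection calls GetGenericArguments, GetGenericTypeDefinition, and MakeGenericType. The Customer and Order classes and the helper names are hypothetical, and the helpers assume they are handed an instance of a generic type.]

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element types, standing in for the ones named in the interview.
public class Customer { }
public class Order { }

public static class RuntimeGenericsSketch
{
    // "What are you a list of?" Assumes mystery is an instance of a generic type.
    public static Type ElementTypeOf(object mystery)
    {
        return mystery.GetType().GetGenericArguments()[0];
    }

    // "Give me just the List<T>, and bind it to another type."
    public static object MakeSameKindOfListOf(object mystery, Type newElementType)
    {
        Type open = mystery.GetType().GetGenericTypeDefinition();  // List<>
        Type closed = open.MakeGenericType(newElementType);        // e.g. List<Order>
        return Activator.CreateInstance(closed);
    }

    public static void Main()
    {
        object mystery = new List<Customer>();   // handed over typed as object

        Console.WriteLine(ElementTypeOf(mystery).Name);             // Customer

        object orders = MakeSameKindOfListOf(mystery, typeof(Order));
        Console.WriteLine(orders is List<Order>);                   // True
    }
}
```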
Osborn: What about the addition of anonymous methods? I remember when that [feature] was announced, and I was talking to authors, saying you have to add anonymous methods to [your] text, but not really understanding why [I] was asking them to do that. I'm sure there are use cases for anonymous methods that [people reading this will want to know about. But also, it's interesting to see how anonymous methods, like generics, are an enabler for new features in C# 3.0].
Hejlsberg: Oh, absolutely. And, you know, honestly, first of all, let's give credit where credit is due. I am not inventing anything completely new here. It's all based on this thing called lambda expressions or lambda calculus or whatever, which has existed in the functional programming space for decades. But somehow, that has never really seen the light of day in a mainstream programming language.
And C# is fortunate enough to be among the first to do that. We're very serious about evolving that, and that's what you're seeing in C# 3.0, where we evolved anonymous methods even further into these things we call lambda expressions now, where we have surrounded them with rich type inference, for instance, so you don't have to say a lot of the stuff that you would have to say manually before.
Hejlsberg: In terms of why they're important, let me illustrate it by an example. In C# 3.0 we're introducing this notion of language-integrated query [LINQ]. And really, what we're doing is we're making it possible to build a query language as an API. You know, out of methods called GroupBy and whatever. You can see that if a collection has a Where and a Select and an OrderBy, and a GroupBy method, then you can sort of string them together, and you can build a whole query language out of [them].
But if you were to do that in a language that doesn't support anonymous methods or lambda expressions, then if you think about how you would implement a Where method, well, it wants to take a predicate as an argument, right? A test to apply to each element, you know what I'm saying? So I want to say list.Where(blah), and the blah I want to pass in is a test.
But it's not like a parameter in the normal sense, because I'm not passing in just a bool argument, because that obviously would [require that the test] get evaluated up front and then passed in. And I don't want to see false, I want to be passed the test itself. You know what I'm saying?
Osborn: Yes, you want to pass a procedure to be executed?
Hejlsberg: Yes. And really, what I want is, I want a reference to some code that I can execute, right? I want a function reference or a method reference passed to the Where operator, such that the Where operator can run this code for each element that it's trying to test, and then, you know, return to me all of the elements for which this test is true. And if you look at how the Select operator (i.e., projection) works, it's the same. It's like, give me an element, and give me a function that can make an element from one thing to another. That's a projection. OrderBy, it's like, give me something that can compare two elements. Again, it's a piece of code. So really, the thing expressively that has been missing in programming languages is the ability to pass code as parameters.
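[Ed.: A hand-rolled sketch of what "passing code as parameters" can look like, assuming the Func delegate types. MyWhere and MySelect are hypothetical stand-ins written for illustration; they are not the LINQ implementation.]

```csharp
using System;
using System.Collections.Generic;

// Hypothetical operators: each one receives a piece of code as a parameter.
public static class CodeAsParameterSketch
{
    // The test is passed in as code and run once per element.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> test)
    {
        foreach (T item in source)
        {
            if (test(item))
            {
                yield return item;
            }
        }
    }

    // The projection is passed in as code that maps one element to another.
    public static IEnumerable<R> MySelect<T, R>(this IEnumerable<T> source, Func<T, R> projection)
    {
        foreach (T item in source)
        {
            yield return projection(item);
        }
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4, 5 };

        // The lambdas are not evaluated up front; the operators call them.
        var result = list.MyWhere(n => n > 2).MySelect(n => n * 10);

        Console.WriteLine(string.Join(",", result));   // 30,40,50
    }
}
```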
Osborn: That's the significance--
Hejlsberg: And that is in a nutshell what lambda expressions and anonymous methods allow you to do. And by the way, lambda expressions and anonymous methods are really just two words for the same thing. The only thing that differs is, what does the syntax look like? And the lambda expressions are a further evolution of the syntax. But underneath, they do the same thing. They generate methods. You know, they're in-line methods.
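[Ed.: A short sketch of the two syntaxes side by side, assuming the built-in Predicate delegate type. The class and field names are hypothetical.]

```csharp
using System;

// Hypothetical demo class: the same test written in both syntaxes.
public static class SameTestTwoSyntaxes
{
    // C# 2.0 anonymous method: parameter type and return are spelled out.
    public static readonly Predicate<int> OldStyle = delegate(int n) { return n > 2; };

    // C# 3.0 lambda expression: the parameter type is inferred.
    public static readonly Predicate<int> NewStyle = n => n > 2;

    public static void Main()
    {
        Console.WriteLine(OldStyle(3));   // True
        Console.WriteLine(NewStyle(3));   // True
        Console.WriteLine(OldStyle(1));   // False
        Console.WriteLine(NewStyle(1));   // False
    }
}
```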
Osborn: What else besides generics and anonymous methods should people be paying attention to in 2.0?
Hejlsberg: Nullable types, I would say, is a pretty important advance, too. Because it's one of the steps on the way to bringing parity between the database world and the general-purpose programming world. You know, it's very hard to talk about meaningful mappings between the two worlds when one world, the database world, is entirely based on nullable types, and the other world doesn't have them at all.
Hejlsberg: And of course, you can fake your way out of it--
Osborn: Which you have to do--
Hejlsberg: --in the general programming world, which, you know, people often do by boxing, for example, or by allocating an object in which they store the value, and then using null if not. And that's effectively how Java does it. But it ends up being very expensive to do it that way. Because in order to represent an integer value that can possibly be null, typically the way it's done in Java is that you use the Integer wrapping class, and you allocate instances for each int value, and then you just use null as the null.
Hejlsberg: But you quadruple the memory consumption for each int, and you incur an indirection. So there's a lot of cost associated with that, where with nullable types in C#, we effectively give you the ability to make value types null, but we don't do it by allocating objects in the heap. We do it through a generic type that couples a T and a bool. It's called Nullable<T>, and it has two fields inside of it, a T and a bool. Nullable<T> is itself a value type. So it actually gets stack allocated or in-line allocated and ... it's much more efficient from a memory standpoint than other solutions that are out there. And over and above that, we have language syntax to support nullability. We have this question mark (?) type modifier. int is an integer, int? is a nullable integer, and there are implicit conversions from int to nullable int, and explicit conversion the other way that will throw a null exception if the thing is ... like all the things that you would naturally expect of a language that deeply supports nullable types as a proper concept. So again, it's this thing about giving first-class treatment to a thing that people use every day.
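[Ed.: A brief sketch of the nullable syntax described above. The class and method names are hypothetical. In the shipped .NET libraries, the exception thrown when converting an empty nullable back to its value type is InvalidOperationException.]

```csharp
using System;

// Hypothetical demo class for the int? syntax and its conversions.
public static class NullableSketch
{
    public static string Describe(int? maybe)
    {
        try
        {
            int value = (int)maybe;   // explicit conversion back to int
            return "value " + value;
        }
        catch (InvalidOperationException)
        {
            return "no value";        // thrown when maybe holds null
        }
    }

    public static void Main()
    {
        int plain = 5;
        int? maybe = plain;           // implicit conversion from int to int?
        int? nothing = null;

        Console.WriteLine(maybe.HasValue);                         // True
        Console.WriteLine(nothing.HasValue);                       // False
        Console.WriteLine(Describe(maybe));                        // value 5
        Console.WriteLine(Describe(nothing));                      // no value

        // int? is shorthand for Nullable<int>, which is a value type.
        Console.WriteLine(typeof(int?) == typeof(Nullable<int>));  // True
        Console.WriteLine(typeof(int?).IsValueType);               // True
    }
}
```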
And again, for me, as a language designer, I look at, "What are people doing out there? And what is it that we need to think about giving first-class treatment to?"
Osborn: So is C# one language to rule them all?
Hejlsberg: [Laughs] No, no. I think not, I don't really think it is. There are lots of things for which other languages are more suited. C#, at its core, is a strongly typed language. And for certain things, you know, dynamically typed languages are more appropriate if you're just going to write a few lines of code, and you don't want to first have to do a bunch of declaring upfront. You just want to sort of try it out.
But within the family of languages that it's in, sure, I aim to take it as far as I can.
Osborn: In relation to 2.0, are the changes that we're seeing in the Microsoft implementation also being proposed as standards?
Hejlsberg: In 2.0?
Osborn: In 2.0.
Hejlsberg: Yes. The standard will be called "third edition", but it's actually the one that we're calling 2.0.
Osborn: So there's no aspect to it that's Microsoft only in terms of the changes.
Hejlsberg: No, every language feature in C# 2.0 has been submitted to ECMA and is in the standardization process. And we expect fairly shortly for the community to vote, and it's really a matter of procedure at this point [Ed.: This has already happened. See "C# and CLI Become More Powerful."]
Osborn: So probably by the end of the year?
This is part one of a two-part interview.
John Osborn is a senior editor with O'Reilly Media, Inc., responsible for Windows and .NET developer books, PDFs and other content.
Copyright © 2009 O'Reilly Media, Inc.