 Published on ONDotNet.com (http://www.ondotnet.com/)


Copying, Cloning, and Marshalling in .NET

by Shawn Van Ness
11/25/2002

I hate to admit it, but as a veteran C++ programmer and COM aficionado, I've spent an embarrassingly large part of the last decade thinking about how the objects in my code will be copied, duplicated, and marshalled from one place to the next. In other words, I spent a lot of time pushing bits around. These days, I'm spending more time with C# and the .NET environment, which offer a wide spectrum of language features and runtime services that make the art of programming vastly simpler, in almost every conceivable fashion -- but still, I find myself wondering about the precise semantics of all the various copying, cloning, and marshalling mechanisms at play in my code.

Even after spending the last few years with the C# language, I recently found it worthwhile to step back and analyze what happens in some very simple scenarios, such as copying a value from one variable to another, or passing those variables as arguments to a method call. And that is the focus of this article. Boring? Hardly -- consider the following questions:


Throughout this article, we'll contemplate zen koans like these, and more. We'll start slowly, with a review of value-types and reference-types in .NET. Then we'll move on to more advanced terrain, deconstructing the System.ICloneable interface, and even scratching the surface of .NET's remoting architecture to explore the concept of marshalling.

Copying, Cloning, and Marshalling: Then and Now

C++ programmers saw the concept of the "well-behaved class" evolve to describe things like copy constructors, assignment operators, virtual destructors, and how these things should be applied to classes. Failure to conform to these guidelines of well-behavedness might produce compile-time errors or (far worse) run-time leaks that are terribly difficult to debug.

Thankfully, .NET languages like C# and VB.NET free us from all of this complexity. Or do they? A quick search of Google.com for "C# and 'copy constructor'" turns up quite a few developers who are a little uncertain! And rightly so -- the .NET runtime environment has its own set of rules and regulations that control an object's copying, cloning, and marshalling behavior. Some of these are outlined in Table 1. When you add the concept of boxing into the mix, .NET has (arguably) the most complex cloning and marshalling semantics of any language or runtime environment ever conceived.

Table 1: The cast of characters

Type or member                               Role
System.ValueType                             Base class for value-types, which have pass-by-value semantics
System.ICloneable                            Interface by which objects support creating clones of themselves
Object.MemberwiseClone                       Protected method that represents the ability of all objects to duplicate themselves
System.SerializableAttribute                 Attribute by which objects declare their support for serialization
System.Runtime.Serialization.ISerializable   Interface by which objects control their own serialization (and marshalling!)
System.MarshalByRefObject                    Base class for objects that are accessed from remote app domains via proxy

But before we look at all of these mechanisms in action, let us start at the beginning, with a quick review of .NET's distinction between value-types and reference-types.

Value-types vs. reference-types

I still remember being introduced to Java for the first time. I was attending an informal brown-bag presentation, wherein one fellow was trying to convince his coworkers (C++ programmers, the lot of us) that Java was the way of the future. "Just imagine -- no more pointers!" he said. I'd only just seen a glimpse of the language, but it sure looked a lot to me like everything was a pointer. And, sure enough, that turned out to be pretty much the case. Java brought fantastic productivity gains, but at the cost of terrible performance for a lot of applications (mainly due to its excessive use of the heap and incessant dereferencing of pointers).

The architects of .NET attempted to learn from Java's mistakes in this regard by creating a framework-wide distinction between reference-types and value-types. Put simply, value-types are those that derive from System.ValueType (either directly or indirectly) and reference-types are those that do not. In C#, value-types are declared using the struct or enum keywords, and reference-types are declared with the class keyword. But neither of those distinctions is very helpful. The real difference in most programmers' minds is that value-types have pass-by-value semantics, and reference-types have pass-by-reference semantics.

The easiest way to see the difference is to write a few lines of code: make two copies of a variable, and try to modify them independently.

Listing 1: Simple copying of value- and reference-types in C#


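using System;

struct MyStruct                  // value-type!
{
  public int x;
  public MyStruct(int x) { this.x = x; }
}

class MyClass                    // reference-type!
{
  public int x;
  public MyClass(int x) { this.x = x; }
}

class Listing1
{
  static void Main()
  {
    MyStruct s1 = new MyStruct(37);
    MyStruct s2 = s1;            // copies the whole value
    s2.x = 73;

    MyClass c1 = new MyClass(37);
    MyClass c2 = c1;             // copies only the reference
    c2.x = 73;

    Console.WriteLine("s1:{0}, s2:{1}, c1:{2}, c2:{3}",
                      s1.x, s2.x, c1.x, c2.x);
  }
}

//output: s1:37, s2:73, c1:73, c2:73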

Stepping through this simple code in a debugger, you can see that the value-type variable (MyStruct s1) is copied by-value into s2, while the reference-type variable (MyClass c1) is copied by-reference into c2. So, modifying the value of c2.x also modifies the value of c1.x (because they're really the same value).

But what about boxed value-type objects? Somewhat surprisingly, the topic of boxing does not have any real relevance here -- this is because the very act of boxing a value-type variable involves making a memberwise copy of the variable, from the stack onto the heap (and unboxing, vice versa). So value-type objects are passed by value, even to destinations typed as System.Object.
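To see the copy that boxing makes, consider the following minimal sketch. The Counter struct and the BoxingDemo class are hypothetical, written only for illustration. Modifying the original variable after boxing leaves the box untouched, and unboxing hands back yet another independent copy.

```csharp
using System;

// Hypothetical value-type used only for illustration
struct Counter
{
    public int Value;
}

static class BoxingDemo
{
    // Boxes a Counter, then modifies the original variable.
    // Returns the value still held inside the box.
    public static int ValueInBoxAfterMutatingOriginal()
    {
        Counter c = new Counter();
        c.Value = 37;
        object boxed = c;      // boxing: a memberwise copy of c goes onto the heap
        c.Value = 73;          // changes only the local variable, not the box
        return ((Counter)boxed).Value;
    }

    // Unboxes into a local, modifies the local, then looks inside the box again.
    public static int ValueInBoxAfterMutatingUnboxedCopy()
    {
        object boxed = new Counter();     // the box holds Value == 0
        Counter local = (Counter)boxed;   // unboxing: another memberwise copy
        local.Value = 73;                 // changes only the local copy
        return ((Counter)boxed).Value;
    }
}
```

Both methods report the box unchanged: 37 for the first, 0 for the second.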

(For more background on boxing, see the References section for links to Eric Gunnerson's articles on the topic.)

Passing a Variable as a Method Parameter: Value-types vs. Reference-types (Again)

For ordinary method calls (no ref or out parameters, and no marshalling -- all of which will be discussed later), passing a variable as an argument to a method (or property) is logically equivalent to declaring another variable of the same type, and assigning its value to the newly-declared variable.

No surprise there. For both value- and reference-types, a shallow copy of the variable is made. For value-types, this means a memberwise copy is created. For reference-types, only the reference is copied (resulting in two references to the same object, as we saw earlier).
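Here is a short sketch of those two cases side by side. PointStruct, PointClass and ByValueDemo are hypothetical names used for illustration. A method can change the caller's object through a copied reference, but it cannot change which object the caller's variable refers to, and it cannot touch a caller's struct at all.

```csharp
using System;

// Hypothetical types for illustration
struct PointStruct { public int X; }
class PointClass { public int X; }

static class ByValueDemo
{
    // The struct parameter is a full copy; the increment stays inside the method
    static void Bump(PointStruct p) { p.X++; }

    // The class parameter is a copy of the reference: mutation through it
    // is visible to the caller...
    static void Bump(PointClass p) { p.X++; }

    // ...but reassigning the parameter only changes the method's own copy
    static void Replace(PointClass p) { p = new PointClass(); p.X = -1; }

    // Returns { s.X, c.X } after the calls above
    public static int[] Run()
    {
        PointStruct s = new PointStruct { X = 1 };
        Bump(s);                 // s.X is still 1

        PointClass c = new PointClass { X = 1 };
        Bump(c);                 // c.X is now 2
        Replace(c);              // c still refers to the same object

        return new[] { s.X, c.X };
    }
}
```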

However, the situation is somewhat altered if the method parameter is decorated with either the ref or out keyword. In those cases, for value-type parameters, a pointer to the caller's variable is passed to the method (thus allowing the method body to alter the value of the original variable). This technique is known as passing a parameter "by reference." This should be fairly intuitive (at least to former C++ programmers, who will see it's just like passing a pointer-type parameter, or using the [out] attribute in COM). Of course, many programming languages make this distinction in one way or another, not just those whose names begin with the letter "C."

But what does it mean to pass a reference-type "by reference"? Isn't the parameter already being passed by-reference, simply by virtue of not deriving from System.ValueType? Should we perhaps expect a compiler warning, or an error? No -- put simply, the ref keyword means the same thing for reference-types as it does for value-types: a reference to the variable is passed to the method, rather than a shallow copy. For classes (reference-types), this means a reference to a reference. This allows the method to discard and reallocate the caller's variable (or even set it to null). Again, the analogy in the COM and C++ world is passing a pointer to a pointer.
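The following sketch exercises ref on both kinds of type. Node and ByRefDemo are hypothetical names. With a value-type, the method writes straight into the caller's variable; with a reference-type, it receives a reference to the caller's reference, so it can swap in a brand-new object or set the caller's variable to null.

```csharp
using System;

// Hypothetical reference-type for illustration
class Node
{
    public int Id;
    public Node(int id) { Id = id; }
}

static class ByRefDemo
{
    // ref on a value-type: the method modifies the caller's variable directly
    static void Increment(ref int n) { n++; }

    // ref on a reference-type: the method can replace the caller's object...
    static void Reallocate(ref Node node) { node = new Node(node.Id + 100); }

    // ...or discard it entirely
    static void Discard(ref Node node) { node = null; }

    public static (int counter, int replacedId, bool discardedIsNull) Run()
    {
        int counter = 1;
        Increment(ref counter);  // counter == 2

        Node a = new Node(1);
        Reallocate(ref a);       // a now refers to a new Node with Id 101

        Node b = new Node(2);
        Discard(ref b);          // b is now null

        return (counter, a.Id, b == null);
    }
}
```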

The out keyword has very nearly the same semantics as ref. However, unlike ref, the caller need not initialize the variable before the call, and the method implementation is obligated to assign it a value before returning. Effectively, this gives out parameters the same semantics as a property or method's return value.
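A minimal sketch of that contract follows; TryDivide is a hypothetical helper. The caller passes uninitialized variables, and the compiler rejects the method unless every path assigns both out parameters before returning.

```csharp
using System;

static class OutDemo
{
    // Hypothetical helper: every return path must assign both out parameters,
    // or the method will not compile
    static bool TryDivide(int a, int b, out int quotient, out int remainder)
    {
        if (b == 0)
        {
            quotient = 0;
            remainder = 0;
            return false;
        }
        quotient = a / b;
        remainder = a % b;
        return true;
    }

    // Returns { 1 if the division succeeded, quotient, remainder }
    public static int[] Run()
    {
        int q, r;                // no initialization needed for out arguments
        bool ok = TryDivide(17, 5, out q, out r);
        return new[] { ok ? 1 : 0, q, r };
    }
}
```

For 17 divided by 5, this yields a quotient of 3 and a remainder of 2.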

The Diminished Role of "Copy Constructors" in C#

Now that we've seen how the default variable-copying semantics in .NET work, you're all probably wondering how to override this behavior to create full, rich, deep copies of your objects (rather than squeak by with the dull, shallow copies provided by the runtime).

For example, imagine a class that represents a node in a doubly-linked list. Each node object contains a reference to the previous node, and the next node (or perhaps a null pointer, if the node is the first or last in the list). Clearly, a memberwise copy of any single node would not be desirable! Figure 1 illustrates the tragedy that would ensue, if a shallow copy of the head node were inadvertently made:

Figure 1: Deep vs. Shallow Copying
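The tragedy in Figure 1 is easy to reproduce. In this sketch, ListNode and ShallowListDemo are hypothetical; ShallowCopy simply performs a memberwise copy of the head node. The "copy" ends up pointing into the original list, while the original list still points back at the original head, not at the copy.

```csharp
using System;

// Hypothetical doubly-linked list node, for illustration only
class ListNode
{
    public int Value;
    public ListNode Prev;
    public ListNode Next;

    public ListNode(int value) { Value = value; }

    // Shallow copy: the clone receives the very same Prev/Next references
    public ListNode ShallowCopy() { return (ListNode)MemberwiseClone(); }
}

static class ShallowListDemo
{
    public static (bool sharesNext, bool nextPointsBackToCopy) Run()
    {
        // Build a two-node list: head <-> second
        ListNode head = new ListNode(1);
        ListNode second = new ListNode(2);
        head.Next = second;
        second.Prev = head;

        ListNode copy = head.ShallowCopy();

        // The copy is wired into the original list, but the list does not
        // know about it: second.Prev still points at head
        return (ReferenceEquals(copy.Next, second),
                ReferenceEquals(copy.Next.Prev, copy));
    }
}
```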

Back in the days of C++, this is where so-called "copy constructors" and "assignment operator overloading" came into play. Now, it's true that in C# you can define a constructor that looks and feels very much like a C++-style copy constructor -- and several classes in the FCL do this -- but the truth is they probably shouldn't bother, because they can't overload the assignment operator.
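The limitation is easy to demonstrate. In the sketch below, Money is a hypothetical class with a C++-style copy constructor. That constructor runs only when it is called explicitly; plain assignment still copies just the reference, so it never gets a say.

```csharp
using System;

// Hypothetical class with a C++-style copy constructor
class Money
{
    public decimal Amount;
    public Money(decimal amount) { Amount = amount; }

    // "Copy constructor": only runs when called explicitly
    public Money(Money other) { Amount = other.Amount; }
}

static class CopyCtorDemo
{
    public static (decimal viaAssignment, decimal viaCopyCtor) Run()
    {
        Money original = new Money(10m);
        Money alias = original;             // assignment: copies the reference only
        Money copy = new Money(original);   // explicit copy-constructor call

        original.Amount = 99m;
        return (alias.Amount, copy.Amount); // (99, 10)
    }
}
```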

Rather, a system-defined interface exists for classes and structs to declare to the outside world that they support "deep copy" semantics. This interface is System.ICloneable, and it has a single method: Clone. It doesn't get any simpler.

Listing 2: Introducing the System.ICloneable interface


namespace System
{
  interface ICloneable
  {
    object Clone();
  }
}

But there are two problems with the ICloneable interface. First, it's weakly-typed: Clone is specified to return an object, which could be darn well anything. The lack of generics (templates) in the current version of .NET makes this an unavoidable necessity. This forces clients of ICloneable to downcast the clone back to the type in question, which can sometimes result in cumbersome and error-prone (or at least ugly) code.
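Arrays are a convenient, real-world example, since System.Array implements ICloneable. The sketch below (DowncastDemo is a hypothetical name) shows the cast every caller must write, and the fact that a wrong guess compiles cleanly but fails at run time with an InvalidCastException.

```csharp
using System;

static class DowncastDemo
{
    public static int[] CopyArray(int[] original)
    {
        // Clone is typed to return object, so the caller must cast back to int[]
        object clone = original.Clone();
        return (int[])clone;
    }

    // The cast is only checked at run time: this compiles, but throws
    public static bool WrongCastThrows(int[] original)
    {
        try
        {
            string[] oops = (string[])original.Clone();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```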

It seems to me that the best analog of a "copy constructor/assignment operator" pair, in C#, would be an implementation of ICloneable that delegates to a public, type-safe alternative Clone method:

Listing 3: A well-designed, cloneable class


using System;

class MyCloneableClass : ICloneable
{
  // A member that needs deep-copying (an array, for illustration)
  private int[] somethingDeep = new int[4];

  // Explicit interface method impl -- available for 
  // clients of ICloneable, but invisible to casual 
  // clients of MyCloneableClass
  object ICloneable.Clone()
  {
    // simply delegate to our type-safe cousin
    return this.Clone(); 
  }

  // Friendly, type-safe clone method
  public virtual MyCloneableClass Clone()
  {
    // Start with a flat, memberwise copy
    MyCloneableClass x = (MyCloneableClass)this.MemberwiseClone();

    // Then deep-copy everything that needs the 
    // special attention
    x.somethingDeep = (int[])this.somethingDeep.Clone();

    //...

    return x;
  }
}

ICloneable.Clone vs. Object.MemberwiseClone


In the previous section, we made use of an interesting member function, present on all .NET objects: MemberwiseClone. This method is a source of great confusion in the developer community. Don't be fooled by its name -- it's certainly not any kind of alternative to ICloneable.Clone, because it's a protected method. Furthermore, it's not even overrideable by derived types, because it's not a virtual method. Its only purpose in life seems to be to assist us in our implementations of Clone methods, by performing the default .NET shallow-copy in just one line of code.
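A short sketch makes both points concrete. Settings and MemberwiseCloneDemo are hypothetical names. Because MemberwiseClone is protected, outside callers need a wrapper method. The copy it produces gets its own value-type fields, but shares every reference-type field with the original.

```csharp
using System;

// Hypothetical class for illustration
class Settings
{
    public int Version;          // value-type field: copied
    public int[] Limits;         // reference-type field: only the reference is copied

    // MemberwiseClone is protected, so outside callers need a method like this
    public Settings ShallowCopy() { return (Settings)MemberwiseClone(); }
}

static class MemberwiseCloneDemo
{
    public static (int originalVersion, int originalFirstLimit) Run()
    {
        Settings original = new Settings { Version = 1, Limits = new[] { 10, 20 } };
        Settings copy = original.ShallowCopy();

        copy.Version = 2;        // independent of the original
        copy.Limits[0] = 99;     // same array as original.Limits!

        return (original.Version, original.Limits[0]);   // (1, 99)
    }
}
```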

Now, this raises the following question: why is there no corresponding "DeepClone" method on System.Object? Shouldn't it be possible for the framework to provide a method that queries each member for the ICloneable interface, and either calls ICloneable.Clone on that member or performs a bitwise (shallow) copy of the member, as appropriate? This would allow a great many Clone methods to be implemented with just one trivial line of code:

Listing 4: Wishful thinking


public virtual MyCloneableClass Clone()
{
  // let .NET do the heavy lifting
  return this.DeepClone(); 
}

The only exceptions, of course, would be types that contain one or more references to objects that neglect to implement ICloneable as expected, or object graphs that contain circular references (like the doubly-linked list example in Figure 1). These objects would have to be copied "by hand," in the current manner.
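Although the framework offers no such method, the idea can be approximated with reflection. The following is a minimal sketch of a hypothetical CloneHelper.DeepClone helper (not part of .NET). It starts from a memberwise copy, then replaces each reference-type field whose value implements ICloneable with that value's Clone(). It inherits the caveats above: circular references are not detected, and non-cloneable members stay shared with the original.

```csharp
using System;
using System.Reflection;

static class CloneHelper
{
    // Hypothetical helper sketching the "DeepClone" idea described above
    public static object DeepClone(object source)
    {
        if (source == null) return null;

        // Object.MemberwiseClone is protected, so reach it via reflection
        MethodInfo memberwiseClone = typeof(object).GetMethod(
            "MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);
        object copy = memberwiseClone.Invoke(source, null);

        // Walk the type hierarchy so private fields of base classes are included
        for (Type t = source.GetType(); t != null; t = t.BaseType)
        {
            FieldInfo[] fields = t.GetFields(BindingFlags.Instance | BindingFlags.Public |
                                             BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
            foreach (FieldInfo field in fields)
            {
                if (field.FieldType.IsValueType) continue;  // already copied by value

                // Members that are not ICloneable (or are null) stay shared
                ICloneable member = field.GetValue(source) as ICloneable;
                if (member != null)
                    field.SetValue(copy, member.Clone());
            }
        }
        return copy;
    }
}

// Hypothetical sample type used to exercise the helper
class Sample
{
    public int Count = 3;
    public int[] Data = { 1, 2, 3 };
}
```

Note that each member's own Clone decides how deep the copy goes: Array.Clone, for instance, copies just one level, which is enough for an int[] but not for an array of mutable objects.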

Anyway, this brings us to the second problem with ICloneable: although it's a well-known interface, defined by the system, the .NET runtime doesn't seem to make use of it (at least not in any context that I've yet encountered). This is in contrast to most of the other system-defined "IXXXXable" interfaces (e.g.: ISerializable, IComparable, IEnumerable, IDisposable, etc.) each of which is either called by the .NET runtime in some situation, or else serves to support some language feature (e.g.: C#'s foreach and using constructs are supported by IEnumerable and IDisposable, respectively).

It seems that ICloneable is purely a convention -- no better or worse than recommending that we all implement a public Clone method to accomplish the same thing. The fact that ICloneable is an interface does, however, make it easy for callers to query an object of unknown origin for its copy-semantics, without resorting to reflection (although in practice, the need to do this does not come up very often).

Listing 5: Do our best to make a copy of object x, deep or shallow


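using System;
using System.Runtime.CompilerServices;

static class CopyUtil
{
  public static object MakeCopyOf(object x)
  {
    if (x == null)
    {
      return null;
    }
    else if (x is ICloneable)
    {
      // Return a copy made by the object itself (deep, if the
      // type follows the convention described above)
      return ((ICloneable)x).Clone();
    }
    else if (x is ValueType)
    {
      // Return a shallow copy of the value: returning x itself
      // would hand back the same box, whereas GetObjectValue
      // returns a fresh boxed copy of a (non-primitive) value-type
      return RuntimeHelpers.GetObjectValue(x);
    }
    else
    {
      // Without resorting to reflection or serialization, 
      // all we can do is fall back to default copy semantics, 
      // which will return a ref to the same physical object 
      // (not what we want!)
      throw new NotSupportedException("object not cloneable");
    }
  }
}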

What's So Special About System.String?

An interesting case study in the field of object-cloning is that of System.String. We all know how easy it is to give pass-by-reference semantics to a value-type: just box it, and copy the object around. But how do you give pass-by-value semantics to a reference-type? Surely it must be possible, because System.String gets away with it. Or is System.String special in some way?

To be sure, strings are given a lot of special treatment in .NET. They even have an Intermediate Language (IL) opcode of their very own (ldstr). However, there is no magic at play here: System.String accomplishes its pass-by-value trick by virtue of being immutable, by design. In other words, strings in .NET are passed by-reference just like instances of any other class. But you can't easily test that hypothesis without modifying the string's contents, and there simply aren't any methods that modify a string without returning a newly-created string instance.

Every method that might appear, at first glance, to modify a string in fact returns a modified copy of the string. Unlike C++ and some other languages, .NET does not offer the concept of "const" methods -- methods that do not modify any of the object's member variables. If it did, however, then every instance method on the System.String class would surely be marked "const".
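The sketch below shows both halves of the trick (StringDemo is a hypothetical name). Assignment copies only the reference, so both variables point at the same instance. Yet every apparent modification produces a new string, leaving the shared instance untouched.

```csharp
using System;

static class StringDemo
{
    public static (bool sameInstance, string a, string upper, string b) Run()
    {
        string a = "hello";
        string b = a;                        // copies the reference, not the characters
        bool sameInstance = ReferenceEquals(a, b);

        string upper = b.ToUpperInvariant(); // returns a new string; b is untouched
        b += " world";                       // b now refers to a new string; a is untouched

        return (sameInstance, a, upper, b);
    }
}
```

Running it yields true, "hello", "HELLO" and "hello world".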

This clever design is similar to the "copy-on-write" semantics (pass by reference, but copy before modifying) used by C++ string classes such as MFC's CString and some STL implementations. It is an efficient design, because strings can be expensive to copy, so copies should be made only when necessary.

Note that it is probably not worthwhile to imitate this technique in your own classes. Only strings are used so heavily as to justify the excessive amount of code involved (viz. a whole separate class, System.Text.StringBuilder, is needed to avoid spurious copying in some other common usage scenarios). But it's always good to understand how the magic works.
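For example, building a string piece by piece with += allocates a new string (and copies all the characters so far) on every step. StringBuilder appends into a single growable buffer instead. Here is a minimal sketch; JoinDigits is a hypothetical helper.

```csharp
using System;
using System.Text;

static class StringBuilderDemo
{
    // Appends the last digit of 0..count-1 into one growable buffer,
    // producing a single string at the end instead of one per step
    public static string JoinDigits(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i % 10);
        return sb.ToString();
    }
}
```

JoinDigits(12), for instance, produces "012345678901".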

Now let's leave the topic of object-cloning behind, for a while, and expand our horizons by analyzing what happens when we pass parameters to objects that live in remote app domains!

Marshalling Arguments Across AppDomain Boundaries

The topic of remoting in .NET (whereby objects communicate across AppDomain boundaries) is long and complex -- far beyond the scope of this article. However, a central premise of all remoting architectures is marshalling, and marshalling is very closely related to the subject matter of this article (namely, the passing of objects as arguments to method calls), so it's worth taking a look at how marshalling works in .NET remoting. (See the References section for some interesting links to learn more about remoting in .NET, in general.)

For our purposes, marshalling can be defined as the mechanism by which arguments to/from a method call are transported, across some communication channel, to a remote recipient. Often the process of marshalling involves serialization -- persisting the object's state into a stream, and reconstituting the object "over there." Other times, it involves the creation of a proxy object, which will in turn marshal arguments to subsequent method calls back and forth across the wall.

The former case is known as "marshal-by-value" (or MBV), and it's very much like the pass-by-value semantics exhibited by value-types. The latter case is known as "marshal-by-reference" (or MBR), and, no surprise, it's very much like the pass-by-reference semantics that we've already seen. But the most important thing to understand about marshalling in .NET is that the default semantics are different: by default, all objects in .NET (both value- and reference-types) are marshalled by value when sent across the "wire" to a remote AppDomain.

But how can a reference-type be passed by-value? Didn't we learn, back in Listing 5, that this was impossible (without resorting to reflection or serialization)? We did. And indeed it's true: if you attempt to pass an MBV object that is not serializable (that is, not marked with the [Serializable] attribute), you will get a SerializationException at run time. Listing 6 demonstrates this phenomenon.

Listing 6: Playing around with inter-appdomain marshalling


using System;

// [Serializable]   // uncomment this attribute to make the call succeed
struct SimpleValueType
{
  public int a;
  public int b;
  public int c;
}

class MainModule
{
  static void Main()
  {
    // Create a "remote" appdomain (AppDomain.CreateDomain and .NET
    // remoting are .NET Framework features; .NET Core and .NET 5+
    // do not support creating additional appdomains)
    AppDomain testDomain = AppDomain.CreateDomain("testDomain");

    // Assumes this program is built as an assembly named "test"
    // that also contains MyRemoteableClass (Listing 7)
    MyRemoteableClass remoteObject = 
     (MyRemoteableClass)testDomain.CreateInstanceAndUnwrap(
                                        "test", 
                                        "MyRemoteableClass");

    // Initialize a SimpleValueType
    SimpleValueType x1 = new SimpleValueType();
    x1.a = x1.b = x1.c = 7;

    // Try to send it across the wire 
    // (will fail unless [Serializable]!)
    remoteObject.DoSomethingWithSimpleValueType(x1);

    // Access the property X remotely
    remoteObject.X = remoteObject.X + 1;

    Console.WriteLine("{0}", remoteObject.X);
  }
}

The resulting exception looks something like this (long, boring stack trace omitted for brevity). But mark SimpleValueType with the [Serializable] attribute, and the program will run successfully.


Unhandled Exception: 
  System.Runtime.Serialization.SerializationException: 
  The type SimpleValueType in Assembly test, 
  Version=0.0.0.0, Culture=neutral, 
  PublicKeyToken=null is not marked as serializable.

To override this default MBV behavior, one can simply derive one's class from System.MarshalByRefObject -- and this is exactly what MyRemoteableClass does, as seen in Listing 7. (This is the same class referenced in Listing 6. We were skipping ahead a bit, but without at least one MBR object in the picture, there would be nothing to do the marshalling!)

Note that MarshalByRefObject is a base class, not an attribute, nor even an interface. This has a number of implications, the most obvious of which is that value-types cannot be made to marshal by reference (value-types in .NET cannot explicitly inherit from any base class other than System.ValueType). Perhaps the next most obvious implication is that you can't just revisit any old class to make it MBR -- .NET does not allow multiple inheritance of base classes, so you'll likely have to plan for that capability, from the ground up (or else end up writing a bunch of "MarshalByRefWrapper" classes, which isn't very fun).

Listing 7: A simple MBR-capable class (as seen in Listing 6)


class MyRemoteableClass : System.MarshalByRefObject
{
  private int x;

  public void DoSomethingWithSimpleValueType(SimpleValueType vt)
  { this.x = vt.a + vt.b + vt.c; }

  public int X
  {
    get { return this.x; }
    set { this.x = value; }
  }
}

Why is MarshalByRefObject a base class, rather than an interface or attribute? The easiest explanation is that MBR objects need quite a bit of boilerplate functionality (with regard to activation policies, lifetime services, etc.) that is best encapsulated in a base class, and re-used with implementation-inheritance. No other solution would allow us to create an MBR class with just a single line of code (let alone a single keyword).
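The default rules described in this section boil down to a small decision procedure. The sketch below is a hypothetical MarshalRules.Classify helper, not the remoting infrastructure's actual code. It checks whether a type derives from MarshalByRefObject, and otherwise whether the type carries the serializable flag that the C# compiler records in its metadata when [Serializable] is applied.

```csharp
using System;
using System.Reflection;

static class MarshalRules
{
    // A sketch of the default rules for an argument crossing an AppDomain boundary
    public static string Classify(Type t)
    {
        // MBR types travel as a proxy, whether or not they are serializable
        if (typeof(MarshalByRefObject).IsAssignableFrom(t))
            return "by reference (proxy)";

        // Everything else is marshalled by value, which requires serialization
        if ((t.Attributes & TypeAttributes.Serializable) != 0)
            return "by value (serialized copy)";

        return "fails with SerializationException";
    }
}

// Hypothetical sample types mirroring Listings 6 and 7
struct PlainStruct { public int a; }

[Serializable]
struct SerializableStruct { public int a; }

class RemoteThing : MarshalByRefObject { }
```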

More Fun With Marshalling: ref and out Parameters (Again)

Earlier, we saw how the C# ref and out keywords affect the semantics of parameters passed to method calls. How do these keywords affect parameters marshalled between appdomains? In the intra-AppDomain case, we saw that using either ref or out was tantamount to passing a pointer to the argument (regardless of whether the argument was a reference-type or not).

But one cannot pass a pointer from one process to another, let alone from one machine to another! Instead, one must think of marshalling as a form of messaging (which it is). Either way (pointer or message), the effect is largely the same. The C# code may look like an ordinary function call, but under the hood, the method's arguments are being packaged into an envelope and mailed away. The ref keyword is an instruction to wait for a response, because the argument will travel both to and from the remote destination, along the remoting channel. Parameters with the out keyword (as well as return values) are only sent back, from callee to caller, along the channel. (Arguably, the IDL attribute keywords used to describe the same concepts in DCOM and RPC were more intuitive. These are listed in Table 2.) Just like DCOM (and RPC before it), it's the proxy objects on either side of the remoting channel that do all the dirty work.

Table 2: Analogous marshalling keywords in DCOM and .NET

RPC/DCOM keyword   C# remoting equivalent
[in]               (default)
[in,out]           ref
[out]              out

Conclusion

Hopefully this article has shed some light on some basic aspects of .NET programming that should be quite straightforward: moving copies of objects around from one place to another. But of course, nothing in modern computing is straightforward anymore (especially if it involves remoting in any way). The .NET framework is a vast and complicated space to work in, but at the same time its design is far more intuitive for programmers than anything that came before.

We've examined the differences between value-types and reference-types, the semantics of return-values, C#'s out and ref keywords, the ICloneable interface, the MemberwiseClone method, the special semantics of the framework's string class, and even a little bit of remoting and marshalling. Whew! Who knew that pushing bits around would be so hard?

References

"Passing Parameters," C# Programmer's Reference, MSDN Library.

"Argument Passing ByVal and ByRef," Visual Basic Language Concepts, MSDN Library.

Eric Gunnerson, "Nice Box. What's in It?" MSDN Library.

Eric Gunnerson, "Open the Box! Quick!" MSDN Library.

Dino Esposito, ".NET Remoting: Design and Develop Seamless Distributed Applications for the Common Language Runtime," MSDN Magazine.



Copyright © 2009 O'Reilly Media, Inc.