Copying, Cloning, and Marshalling in .NET

by Shawn Van Ness

I hate to admit it, but as a veteran C++ programmer and COM aficionado, I've spent an embarrassingly large part of the last decade thinking about how the objects in my code will be copied, duplicated, and marshalled from one place to the next. In other words, I spent a lot of time pushing bits around. These days, I'm spending more time with C# and the .NET environment, which offer a wide spectrum of language features and runtime services that make the art of programming vastly simpler, in almost every conceivable fashion -- but still, I find myself wondering about the precise semantics of all the various copying, cloning, and marshalling mechanisms at play in my code.

Even after spending the last few years with the C# language, I recently found it worthwhile to step back and analyze what happens in some very simple scenarios, such as copying a value from one variable to another, or passing those variables as arguments to a method call. And that is the focus of this article. Boring? Hardly -- consider the following questions:

  • Why do so few system-defined types provide "copy constructors?"
  • What does it mean to pass a reference-type parameter with C#'s ref keyword?
  • Why is System.MarshalByRefObject a base class, rather than an attribute (or interface)?
  • Does a value-type object still have pass-by-value semantics when it's boxed?
  • What's the deal with System.String? It's a reference-type, but it seems to have pass-by-value semantics.

Throughout this article, we'll contemplate zen koans like these, and more. We'll start slowly, with a review of value-types and reference-types in .NET. Then we'll move on to more advanced terrain, deconstructing the System.ICloneable interface, and even scratching the surface of .NET's remoting architecture to explore the concept of marshalling.

Copying, Cloning, and Marshalling: Then and Now

C++ programmers saw the concept of the "well-behaved class" evolve to describe things like copy constructors, assignment operators, virtual destructors, and how these things should be applied to classes. Failure to conform to these guidelines of well-behavedness might produce compile-time errors or (far worse) run-time leaks that are terribly difficult to debug.

Thankfully, .NET languages like C# and VB.NET free us from all of this complexity. Or do they? A quick search for "C# and 'copy constructor'" turns up quite a few developers who are a little uncertain! And rightly so -- the .NET runtime environment has its own set of rules and regulations that control an object's copying, cloning, and marshalling behavior. Some of these are outlined in Table 1. When you add the concept of boxing into the mix, .NET has (arguably) the most complex cloning and marshalling semantics of any language or runtime environment ever conceived.

Table 1: The cast of characters

System.ValueType                             Base class for value-types, which have pass-by-value semantics
System.ICloneable                            Interface by which objects support creating clones of themselves
Object.MemberwiseClone                       Protected method that represents the ability of all objects to duplicate themselves
System.SerializableAttribute                 Attribute by which objects declare their support for serialization
System.Runtime.Serialization.ISerializable   Interface by which objects control their own serialization (and marshalling!)
System.MarshalByRefObject                    Base class for objects that are accessed from remote app domains via proxy

But before we look at all of these mechanisms in action, let us start at the beginning, with a quick review of .NET's distinction between value-types and reference-types.

Value-types vs. reference-types

I still remember being introduced to Java for the first time. I was attending an informal brown-bag presentation, wherein one fellow was trying to convince his coworkers (C++ programmers, the lot of us) that Java was the way of the future. "Just imagine -- no more pointers!" he said. I'd only just seen a glimpse of the language, but it sure looked a lot to me like everything was a pointer. And, sure enough, that turned out to be pretty much the case. Java brought fantastic productivity gains, but at the cost of terrible performance for a lot of applications (mainly due to its excessive use of the heap and incessant dereferencing of pointers).

The architects of .NET attempted to learn from Java's mistakes in this regard by creating a framework-wide distinction between reference-types and value-types. Put simply, value-types are those that derive from System.ValueType (either directly or indirectly) and reference-types are those that do not. In C#, value-types are declared using the struct or enum keywords, and reference-types are declared with the class keyword. But neither of those distinctions is very helpful. The real difference in most programmers' minds is that value-types have pass-by-value semantics, and reference-types have pass-by-reference semantics.

The easiest way to see the difference is to write a few lines of code: make two copies of a variable, and try to modify them independently.

Listing 1: Simple copying of value- and reference-types in C#

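struct MyStruct { public int x; public MyStruct(int x) { this.x = x; } } // value-type!
class MyClass { public int x; public MyClass(int x) { this.x = x; } } // reference-type!

MyStruct s1 = new MyStruct(37);
MyStruct s2 = s1; // memberwise copy: s2 is an independent value
s2.x = 73;

MyClass c1 = new MyClass(37);
MyClass c2 = c1; // only the reference is copied: c1 and c2 name one object
c2.x = 73;

Console.WriteLine("s1:{0}, s2:{1}, c1:{2}, c2:{3}",
    s1.x, s2.x, c1.x, c2.x);

//output: s1:37, s2:73, c1:73, c2:73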

Stepping through this simple code in a debugger, you can see that the value-type variable (MyStruct s1) is copied by-value into s2, while the reference-type variable (MyClass c1) is copied by-reference into c2. So, modifying the value of c2.x also modifies the value of c1.x (because they're really the same value).

But what about boxed value-type objects? Somewhat surprisingly, the topic of boxing does not have any real relevance here -- this is because the very act of boxing a value-type variable involves making a memberwise copy of the variable, from the stack onto the heap (and unboxing, vice versa). So value-type objects are passed by value, even to destinations typed as System.Object.

(For more background on boxing, see the References section for links to Eric Gunnerson's articles on the topic.)
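
The copy made by boxing can be observed directly. The following is a minimal sketch, not taken from the original listings; the Point struct and the BoxingDemo class are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical value-type, for illustration only.
struct Point
{
    public int X;
    public Point(int x) { X = x; }
}

static class BoxingDemo
{
    public static string Run()
    {
        Point p = new Point(37);
        object boxed = p;              // boxing: p's fields are copied into a new heap object
        p.X = 73;                      // changes the original variable only
        Point unboxed = (Point)boxed;  // unboxing: the fields are copied back out of the box
        unboxed.X = 99;                // changes the local copy, not the box

        return string.Format("p:{0}, boxed:{1}, unboxed:{2}",
            p.X, ((Point)boxed).X, unboxed.X);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: p:73, boxed:37, unboxed:99
```

All three values evolve independently, which is the by-value behavior described above.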

Passing a Variable as a Method Parameter: Value-types vs. Reference-types (Again)

For ordinary method calls (no ref or out parameters, and no marshalling -- all of which will be discussed later), passing a variable as an argument to a method (or property) is logically equivalent to declaring another variable of the same type, and assigning its value to the newly-declared variable.

No surprise, there. For both value- and reference-types, a shallow copy of the variable is made. For value-types, this means a member-wise copy is created. For reference-types, only the reference is copied (resulting in two references to the same object, as we saw earlier).
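
As a rough illustration of that shallow copy, consider the sketch below. Counter, CounterBox, and the helper methods are hypothetical names, assumed here only to make the point concrete.

```csharp
using System;

struct Counter { public int Value; }     // hypothetical value-type
class CounterBox { public int Value; }   // hypothetical reference-type

static class ParameterDemo
{
    // Receives a memberwise copy; the caller's struct is untouched.
    static void Bump(Counter c) { c.Value++; }

    // Receives a copy of the reference; both references name the same object.
    static void Bump(CounterBox c) { c.Value++; }

    // Rebinds only the method's own copy of the reference.
    static void Replace(CounterBox c)
    {
        c = new CounterBox();
        c.Value = -1;
    }

    public static string Run()
    {
        Counter s = new Counter();
        CounterBox b = new CounterBox();

        Bump(s);
        Bump(b);
        Replace(b);

        return string.Format("s:{0}, b:{1}", s.Value, b.Value);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: s:0, b:1
```

Note that Replace has no visible effect on the caller: the method can change the object its parameter refers to, but it cannot make the caller's variable refer to a different object.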

However, the situation is somewhat altered if the method parameter is decorated with either the ref or out keyword. In those cases, for value-type parameters, a pointer to the object is passed to the method (thus allowing the method body to alter the value of the original object). This technique is known as passing a parameter "by reference." This should be fairly intuitive (at least to former C++ programmers, who will see it's just like passing a pointer-type parameter; or using the [out] attribute in COM). Of course, many programming languages make this distinction in one way or another, not just those whose names begin with the letter "C."
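
A small sketch of the difference for a value-type parameter follows. The method names are hypothetical; only the ref keyword itself comes from the language.

```csharp
using System;

static class RefValueDemo
{
    // Works on a copy of the caller's int.
    static void DoubleByValue(int n) { n *= 2; }

    // Works on the caller's variable itself.
    static void DoubleByRef(ref int n) { n *= 2; }

    public static string Run()
    {
        int a = 21;
        int b = 21;

        DoubleByValue(a);
        DoubleByRef(ref b);   // the call site must also say "ref"

        return string.Format("a:{0}, b:{1}", a, b);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: a:21, b:42
```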

But what does it mean to pass a reference-type "by reference"? Isn't the parameter already being passed by-reference, simply by virtue of not deriving from System.ValueType? Should we perhaps expect a compiler warning, or an error? No -- put simply, the ref keyword means the same thing for reference-types as it does for value-types: a reference to the variable is passed to the method, rather than a shallow copy. For classes (reference-types), this means a reference to a reference. This allows the method to discard and reallocate the caller's variable (or even set it to null). Again, the analogy in the COM and C++ world is passing a pointer to a pointer.
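
The "reference to a reference" idea may be easier to see in code. This is an illustrative sketch only; Widget and the three helper methods are hypothetical.

```csharp
using System;

// Hypothetical reference-type, for illustration only.
class Widget
{
    public string Name;
    public Widget(string name) { Name = name; }
}

static class RefReferenceDemo
{
    // Reassigns the method's private copy of the reference.
    static void ReplaceByValue(Widget w) { w = new Widget("replaced"); }

    // Reassigns the caller's variable.
    static void ReplaceByRef(ref Widget w) { w = new Widget("replaced"); }

    // Sets the caller's variable to null.
    static void ClearByRef(ref Widget w) { w = null; }

    public static string Run()
    {
        Widget a = new Widget("original");
        Widget b = new Widget("original");
        Widget c = new Widget("original");

        ReplaceByValue(a);
        ReplaceByRef(ref b);
        ClearByRef(ref c);

        return string.Format("a:{0}, b:{1}, c is null:{2}",
            a.Name, b.Name, c == null);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: a:original, b:replaced, c is null:True
```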

The out keyword has very nearly the same semantics as ref. However, unlike ref, the caller need not initialize the variable beforehand, and the method implementation is obligated to assign it a value before returning. Effectively, this gives out parameters the same semantics as a property or method's return value.
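
Here is a minimal sketch of an out parameter acting as a second return value. TryHalve is a hypothetical helper invented for this example.

```csharp
using System;

static class OutDemo
{
    // Hypothetical helper: reports whether n is even, and hands back n / 2 through "half".
    static bool TryHalve(int n, out int half)
    {
        half = n / 2;        // the compiler requires an assignment before the method returns
        return n % 2 == 0;
    }

    public static string Run()
    {
        int result;          // no initial value is needed before passing it as out
        bool even = TryHalve(10, out result);

        return string.Format("even:{0}, result:{1}", even, result);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: even:True, result:5
```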

The Diminished Role of "Copy Constructors" in C#

Now that we've seen how the default variable-copying semantics in .NET work, you're all probably wondering how to override this behavior to create full, rich, deep copies of your objects (rather than squeak by with the dull, shallow copies provided by the runtime).

For example, imagine a class that represents a node in a doubly-linked list. Each node object contains a reference to the previous node, and the next node (or perhaps a null pointer, if the node is the first or last in the list). Clearly, a memberwise copy of any single node would not be desirable! Figure 1 illustrates the tragedy that would ensue, if a shallow copy of the head node were inadvertently made:

Figure 1: Deep vs. Shallow Copying
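
The same problem can be reproduced in a few lines. The sketch below assumes a hypothetical Node class that exposes the protected Object.MemberwiseClone method through a public ShallowCopy method; neither name comes from the original article.

```csharp
using System;

// Hypothetical doubly-linked list node, for illustration only.
class Node
{
    public string Name;
    public Node Prev;
    public Node Next;

    public Node(string name) { Name = name; }

    // Object.MemberwiseClone is protected, so it is wrapped here.
    public Node ShallowCopy() { return (Node)MemberwiseClone(); }
}

static class ShallowCopyDemo
{
    public static string Run()
    {
        Node head = new Node("head");
        Node tail = new Node("tail");
        head.Next = tail;
        tail.Prev = head;

        Node copy = head.ShallowCopy();

        // The copy points into the ORIGINAL list...
        bool sharesTail = object.ReferenceEquals(copy.Next, tail);
        // ...but the original list knows nothing about the copy.
        bool tailPointsBackToCopy = object.ReferenceEquals(tail.Prev, copy);

        return string.Format("sharesTail:{0}, tailPointsBackToCopy:{1}",
            sharesTail, tailPointsBackToCopy);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: sharesTail:True, tailPointsBackToCopy:False
```

The result is two head nodes sharing one tail, with the list's back-pointers consistent for only one of them.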

Back in the days of C++, this is where so-called "copy constructors" and "assignment operator overloading" came into play. Now, it's true that in C# you can define a constructor that looks and feels very much like a C++-style copy constructor -- and several classes in the FCL do this -- but the truth is they probably shouldn't bother, because C# does not allow the assignment operator to be overloaded, so such a constructor is never invoked by a plain assignment.
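
To make the limitation concrete, here is a sketch of a copy-constructor-style class. Account is a hypothetical type; the point is only that assignment never calls the constructor.

```csharp
using System;

// Hypothetical class with a C++-style "copy constructor".
class Account
{
    public int Balance;

    public Account(int balance) { Balance = balance; }

    // Looks like a copy constructor, but is just an ordinary constructor to C#.
    public Account(Account other) { Balance = other.Balance; }
}

static class CopyCtorDemo
{
    public static string Run()
    {
        Account a = new Account(37);
        Account alias = a;              // plain assignment: no constructor runs
        Account copy = new Account(a);  // the copy must be requested explicitly

        a.Balance = 73;

        return string.Format("a:{0}, alias:{1}, copy:{2}",
            a.Balance, alias.Balance, copy.Balance);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: a:73, alias:73, copy:37
```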

Rather, a system-defined interface exists for classes and structs to declare to the outside world that they support "deep copy" semantics. This interface is System.ICloneable, and it has a single method: Clone. It doesn't get any simpler.
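
As a first taste, a class might implement the interface along these lines. This is a sketch under assumptions: Polyline is a hypothetical class, and copying its array inside Clone is one possible choice made here, not something the interface enforces.

```csharp
using System;

// Hypothetical class, for illustration only.
class Polyline : ICloneable
{
    public int[] Points;

    public Polyline(int[] points) { Points = points; }

    // ICloneable.Clone returns object, so callers must cast the result.
    public object Clone()
    {
        // Duplicate the array rather than share it with the clone.
        return new Polyline((int[])Points.Clone());
    }
}

static class CloneDemo
{
    public static string Run()
    {
        Polyline original = new Polyline(new int[] { 1, 2, 3 });
        Polyline clone = (Polyline)original.Clone();

        clone.Points[0] = 99;   // does not disturb the original's array

        return string.Format("original:{0}, clone:{1}",
            original.Points[0], clone.Points[0]);
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}

//output: original:1, clone:99
```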
