Deterministic Destruction in C++/CLI

Looks at how deterministic destruction is possible with C++/CLI

Introduction

Many C++ programmers were unhappy with the non-deterministic finalization imposed on them by the .NET garbage collector. They were so used to the RAII (Resource Acquisition Is Initialization) idiom, where a destructor gets called when an object goes out of scope or when delete is explicitly called on it, that a non-deterministic destructor simply didn't fit their expectations or requirements. Microsoft offered the Dispose pattern as an alternative: classes implement IDisposable, and callers invoke Dispose on their objects when they are done with them. The basic issue was that this required the programmer to call Dispose manually and consistently whenever an object needed to be cleaned up. It became worse when the object had managed member objects that themselves needed Dispose called on them, which meant they too had to implement IDisposable. Tiresome sounding, isn't it?

Guess what? In C++/CLI, the Microsoft VC++ team is giving us a destructor that internally gets compiled to the Dispose method, while the old finalizer gets an alternate syntax. So we now have finalizers and destructors as two separate entities that behave differently, as they should have in the previous version. The designers of C# made the unfortunate initial mistake of calling their finalizer a destructor, and I presume there must be tens of thousands of C# coders out there who have no inkling that they have confused a basic concept in object lifetime management with something else entirely.

Note

It's easy to wrongly refer to automatic objects as stack objects in C++/CLI, but it should be remembered that these seemingly stack-based objects actually reside on the CLR heap, as they are still normal garbage-collected ref objects. It's a C++ compiler trick that allows us to treat these variables just as we used to treat stack-based objects in unmanaged C++ during the good old days.

The new syntax

In C++/CLI, destructors follow the same syntax used in the pre-managed times, where ~classname would be the method name for the destructor. It also brings out a new naming syntax, !classname which is the method name for the finalizer. Here is what a typical class would look like :-

ref class R1
{
public:
    R1()
    {
        Show("R1::ctor");
    }
    ~R1()
    {
        Show("R1::dtor");
    }
protected:
    !R1()
    {
        Show("R1::fnzr");
    }    
};

The destructor (~R1) gets compiled into a Dispose method in the generated IL.

.method public newslot virtual 
        final instance void 
        Dispose() cil managed
{
  .override [mscorlib]System.IDisposable::Dispose
  // Code size       17 (0x11)
  .maxstack  1
  IL_0000:  ldstr      "R1::dtor"
  IL_0005:  call       void [mscorlib]
        System.Console::WriteLine(string)
  IL_000a:  ldarg.0
  IL_000b:  call       void [mscorlib]
        System.GC::SuppressFinalize(object)
  IL_0010:  ret
} // end of method R1::Dispose

The C# equivalent of the above would be :-

public void Dispose()//IDisposable::Dispose
{
      Console.WriteLine("R1::dtor");
      GC.SuppressFinalize(this);
}

There is a call made to GC::SuppressFinalize in the generated Dispose method. This ensures that the finalizer does not get called during the garbage collection cycle that claims this object's memory. If that sounds confusing, remember that we are still restricted by the environment we are targeting, which happens to be the CLR. In the CLR, reference objects are allocated on the CLR heap and their memory is reclaimed by the Garbage Collector once they are no longer in use; there is no way the programmer can free up the memory on his/her own. So, even if our destructor gets called, the memory will be released only during a later GC cycle, and at that point we don't want the GC trying to call Finalize on our object. GC::SuppressFinalize basically removes the object from the finalization queue.
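To make the interplay between Dispose and the finalizer a little more concrete, here is a toy model written in portable native C++ rather than C++/CLI. It is only a sketch under simplifying assumptions: ToyGC, ToyObject and every member function below are hypothetical names invented for illustration, not CLR APIs, and the real garbage collector is far more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these names are CLR APIs.
struct ToyObject {
    std::string name;
};

struct ToyGC {
    std::vector<ToyObject*> finalization_queue;  // objects whose finalizer is still pending
    std::vector<std::string> log;                // records what ran, in order

    // Objects that have a finalizer are registered when they are created.
    void register_for_finalization(ToyObject* obj) {
        finalization_queue.push_back(obj);
    }

    // Stand-in for GC::SuppressFinalize: the finalizer is no longer wanted.
    void suppress_finalize(ToyObject* obj) {
        finalization_queue.erase(
            std::remove(finalization_queue.begin(), finalization_queue.end(), obj),
            finalization_queue.end());
    }

    // Stand-in for the compiler-generated Dispose: run cleanup, then suppress.
    void dispose(ToyObject* obj) {
        assert(obj != nullptr);
        log.push_back(obj->name + "::dtor");
        suppress_finalize(obj);
    }

    // Stand-in for a collection cycle: finalize whatever is still queued.
    void collect() {
        for (ToyObject* obj : finalization_queue) {
            log.push_back(obj->name + "::fnzr");
        }
        finalization_queue.clear();
    }
};

// Demo: 'a' is disposed deterministically, 'b' is left to the collector.
std::vector<std::string> run_finalization_demo() {
    ToyObject a{"A"};
    ToyObject b{"B"};
    ToyGC gc;
    gc.register_for_finalization(&a);
    gc.register_for_finalization(&b);
    gc.dispose(&a);  // logs "A::dtor" and takes 'a' out of the queue
    gc.collect();    // only 'b' is still queued, so only "B::fnzr" is logged
    return gc.log;
}
```

Calling run_finalization_demo() from a main function of your own yields a log with "A::dtor" followed by "B::fnzr": the disposed object never sees its finalizer, while the object that was not disposed does.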

How it's implemented

void _tmain()
{   
    R1 r;
}

I've declared r as an automatic variable. Now let's see the IL that gets generated for this :-

.method public static int32  
        main() cil managed
{
  .vtentry 1 : 1
  // Code size       16 (0x10)
  .maxstack  1
  .locals (class R1 V_0)
  IL_0000:  ldnull
  IL_0001:  stloc.0
  IL_0002:  newobj     instance void R1::.ctor()
  IL_0007:  stloc.0
  IL_0008:  ldloc.0
  IL_0009:  call       instance void R1::Dispose()
  IL_000e:  ldc.i4.0
  IL_000f:  ret
} // end of method 'Global Functions'::main

The C# equivalent for that would be :-

public static int main()
{
      R1 r = null;
      r = new R1();
      r.Dispose();
      return 0;
}

Pretty straightforward stuff, as you can see, with Dispose being called when the object goes out of scope. You might be a little surprised that there is no try block in there, but that's because our code fragment was too simple. The compiler emits a try block only when one is required, and in the above case it is not. Let's see the following code snippet :-

void _tmain()
{   
    R1 r;
    int y=100;
}

The IL generated :-

.method public static int32 
        main() cil managed
{
  .vtentry 1 : 1
  // Code size       28 (0x1c)
  .maxstack  1
  .locals (class R1 V_0,
           int32 V_1)
  IL_0000:  ldnull
  IL_0001:  stloc.0
  IL_0002:  newobj     instance void R1::.ctor()
  IL_0007:  stloc.0
  .try
  {
    IL_0008:  ldc.i4.s   100
    IL_000a:  stloc.1
    IL_000b:  leave.s    IL_0014
  }  // end .try
  fault
  {
    IL_000d:  ldloc.0
    IL_000e:  call       instance void R1::Dispose()
    IL_0013:  endfinally
  }  // end handler
  IL_0014:  ldloc.0
  IL_0015:  call       instance void R1::Dispose()
  IL_001a:  ldc.i4.0
  IL_001b:  ret
} // end of method 'Global Functions'::main

The moment the compiler realizes that there is a probable contingency where control might not reach the line that calls Dispose, it implements a try block and in case of any exception, calls Dispose within the fault handler. The C# equivalent would be :-

public static int main()
{
      R1 r = null;      
      int y;
      r = new R1();
      try
      {
            y = 100;
      }
      catch
      {
            r.Dispose();
            throw; // a fault handler lets the exception keep propagating
      }
      r.Dispose();
      return 0;
}
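For readers more comfortable with native C++, the same control flow can be sketched without any managed code. This is only an analogy under stated assumptions: Resource, use_resource and run_fault_demo are hypothetical names made up for this illustration, and a catch-all that rethrows is used to approximate the IL fault handler.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a disposable object; it records calls in a log.
struct Resource {
    std::vector<std::string>* log;
    void dispose() { log->push_back("dispose"); }
};

// Mirrors the shape of the generated IL: dispose on the exceptional path and
// let the exception continue, or dispose on the normal path otherwise.
void use_resource(Resource& r, bool fail) {
    assert(r.log != nullptr);
    try {
        if (fail) {
            throw std::runtime_error("work failed");
        }
        r.log->push_back("work");
    } catch (...) {
        r.dispose();  // fault path
        throw;        // the exception is not swallowed
    }
    r.dispose();      // normal path
}

// Runs both paths and returns the combined log.
std::vector<std::string> run_fault_demo() {
    std::vector<std::string> log;
    Resource r{&log};
    use_resource(r, false);  // normal path: "work", then "dispose"
    try {
        use_resource(r, true);  // fault path: "dispose", then the exception escapes
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

The log returned by run_fault_demo() reads "work", "dispose", "dispose", "caught". Either way Dispose runs exactly once per call, and on the failing call the caller still gets to see the exception.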

You could also declare the object as a handle and then manually call delete on it, which equates to calling Dispose on your object.

void _tmain()
{      
   R1^ r = gcnew R1();   
   delete r;   
}

The generated IL is a little more complex for this case (I am not fully sure why an unnecessary int variable is introduced for instance.)

.method public static int32  
        main() cil managed
{
  .vtentry 1 : 1
  // Code size       27 (0x1b)
  .maxstack  1
  .locals (class [mscorlib]System.IDisposable V_0,
           class R1 V_1,
           int32 V_2)
  IL_0000:  ldnull
  IL_0001:  stloc.1
  IL_0002:  newobj     instance void R1::.ctor()
  IL_0007:  stloc.1
  IL_0008:  ldloc.1
  IL_0009:  stloc.0
  IL_000a:  ldloc.0
  IL_000b:  brfalse.s  IL_0017
  IL_000d:  ldloc.0
  IL_000e:  callvirt   
    instance void [mscorlib]System.IDisposable::Dispose()
  IL_0013:  ldnull
  IL_0014:  stloc.2
  IL_0015:  br.s       IL_0019
  IL_0017:  ldnull
  IL_0018:  stloc.2
  IL_0019:  ldc.i4.0
  IL_001a:  ret
} // end of method 'Global Functions'::main

As I mentioned, I am truly puzzled by the V_2 int32 variable. Here is the C# equivalent for those of you who don't like looking at IL.

public static int main()
{
      int v2;
      R1 r = null;
      r = new R1();
      IDisposable d = r;
      if (d != null)
      {
            d.Dispose();
            v2  = 0;
      }
      else
      {
            v2 = 0;
      }
      return 0;
}

My best guess is that this is to help the CLR Execution Engine do run-time optimizations; in the above case, the entire null check might possibly be skipped when r is known not to be null.

How member objects are handled

See the following code snippet :-

#define Show(x) Console::WriteLine(x)

ref class R1
{
public:
   R1()
   {
      Show("R1::ctor");
   }
   ~R1()
   {
      Show("R1::dtor");
   }
protected:
   !R1()
   {
      Show("R1::fnzr");
   }   
};

ref class R
{
public:
   R()
   {
      Show("R::ctor");
   }
   ~R()
   {
      Show("R::dtor");
   }
   R1 r;
protected:
   !R()
   {
      Show("R::fnzr");
   }   
};

Let's take a look at R's constructor in the generated IL :-

.method public specialname rtspecialname 
        instance void  .ctor() cil managed
{
  // Code size       28 (0x1c)
  .maxstack  2
  IL_0000:  ldarg.0
  IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
  IL_0006:  ldarg.0
  IL_0007:  newobj     instance void R1::.ctor()
  IL_000c:  stfld      class R1 modopt(
      [Microsoft.VisualC]Microsoft.VisualC.IsByValueModifier) R::r
  IL_0011:  ldstr      "R::ctor"
  IL_0016:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_001b:  ret
} // end of method R::.ctor

Equivalent C# code would be :-

public R()
{
      this.r = ((R1 modopt(Microsoft.VisualC.IsByValueModifier)) new R1());
      Console.WriteLine("R::ctor");
}

The compiler inserts a custom modopt modifier into the instantiation of the R1 object which would give the JIT compiler some idea of how to treat it. In this case, it has marked it with Microsoft.VisualC.IsByValueModifier which presumably means that this object is to be treated as a pass-by-value object. Anyway, that's beyond the scope of this article and what I wanted to put forth here is that the R object's constructor also instantiates and constructs the R1 member object.

Now let's see the R class destructor :-

.method public newslot virtual final instance void 
        Dispose() cil managed
{
  .override [mscorlib]System.IDisposable::Dispose
  // Code size       42 (0x2a)
  .maxstack  1
  .try
  {
    IL_0000:  ldstr      "R::dtor"
    IL_0005:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000a:  leave.s    IL_0018
  }  // end .try
  fault
  {
    IL_000c:  ldarg.0
    IL_000d:  ldfld      class R1 modopt(
       [Microsoft.VisualC]Microsoft.VisualC.IsByValueModifier) R::r
    IL_0012:  call       instance void R1::Dispose()
    IL_0017:  endfinally
  }  // end handler
  IL_0018:  ldarg.0
  IL_0019:  ldfld      class R1 modopt(
       [Microsoft.VisualC]Microsoft.VisualC.IsByValueModifier) R::r
  IL_001e:  call       instance void R1::Dispose()
  IL_0023:  ldarg.0
  IL_0024:  call       void [mscorlib]System.GC::SuppressFinalize(object)
  IL_0029:  ret
} // end of method R::Dispose

Equivalent C# code is :-

public void Dispose()
{
      try
      {
            Console.WriteLine("R::dtor");
      }
      catch
      {
            this.r.Dispose();
            throw; // a fault handler lets the exception keep propagating
      }
      this.r.Dispose();
      GC.SuppressFinalize(this);
}

As you can see, Dispose is called on the member object as well. The compiler sure does generate a lot of code for us, eh?
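The ordering that the generated code produces is the same one a native C++ programmer already expects from member objects. Here is a small native C++ sketch of that ordering; the R and R1 structs below are plain unmanaged stand-ins for the ref classes, and g_log and run_member_demo are hypothetical helpers added only so the order can be inspected.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Shared log so the order of calls can be inspected afterwards.
static std::vector<std::string> g_log;

struct R1 {
    R1()  { g_log.push_back("R1::ctor"); }
    ~R1() { g_log.push_back("R1::dtor"); }
};

struct R {
    R1 r;  // member object, constructed before R's constructor body runs
    R()  { g_log.push_back("R::ctor"); }
    ~R() { g_log.push_back("R::dtor"); }  // the member is destroyed after this body
};

// Creates an R in an inner scope and returns the recorded call order.
std::vector<std::string> run_member_demo() {
    g_log.clear();
    {
        R outer;  // destroyed at the closing brace
    }
    assert(g_log.size() == 4);
    return g_log;
}
```

The log comes back as "R1::ctor", "R::ctor", "R::dtor", "R1::dtor", which matches the IL shown above: the member is constructed first, and its Dispose runs after the body of the containing object's destructor.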

In the case discussed above, the member object was also an automatic variable. But what if we had a handle variable as a member? In that case, we should manually delete the member in our destructor; otherwise we lose much of the benefit of deterministic destruction, because the member object would have to wait for an unpredictable GC cycle before it gets cleaned up. So, this is what we need to do for such cases :-

ref class R
{
public:
    R()
    {
        r = gcnew R1();
        Show("R::ctor");
    }
    ~R()
    {
        delete r;
        Show("R::dtor");
    }
    R1^ r;
protected:
    !R()
    {
        Show("R::fnzr");
    }    
};
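The closest native C++ analogy to a handle member is a raw owning pointer, which likewise gets no automatic cleanup. The sketch below is unmanaged C++ and only illustrates the ordering; g_log and run_handle_member_demo are hypothetical helpers, and the comparison between R1^ and R1* is a loose one, since nothing here is garbage collected.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Shared log so the order of calls can be inspected afterwards.
static std::vector<std::string> g_log;

struct R1 {
    R1()  { g_log.push_back("R1::ctor"); }
    ~R1() { g_log.push_back("R1::dtor"); }
};

struct R {
    R1* r;  // raw owning pointer, standing in for the R1^ handle member

    R() : r(new R1()) { g_log.push_back("R::ctor"); }

    ~R() {
        delete r;  // without this line, R1's destructor would not run here
        r = nullptr;
        g_log.push_back("R::dtor");
    }

    // Copying would lead to a double delete, so it is disabled in this sketch.
    R(const R&) = delete;
    R& operator=(const R&) = delete;
};

// Creates an R in an inner scope and returns the recorded call order.
std::vector<std::string> run_handle_member_demo() {
    g_log.clear();
    {
        R outer;
        assert(outer.r != nullptr);
    }
    return g_log;
}
```

Here the log reads "R1::ctor", "R::ctor", "R1::dtor", "R::dtor". Because the delete is the first statement in the destructor, the member is cleaned up before the "R::dtor" message, just as in the C++/CLI class above.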

Warning

Do not delete member objects manually from your finalizer, because there is every chance that by the time the finalizer is called on your object, its member objects might already have been finalized.

Performance boost

By using destructors whenever possible instead of finalizers, you would see a small-to-medium performance boost in your code. The problem with finalizers is that an object that needs finalization survives the collection that finds it unreachable and gets promoted to an older generation; the finalizer thread then has to run the Finalize method on it, and the GC can reclaim the memory only in a future cycle.

Points to remember when using destructors

You cannot have a method named Dispose in your class, for obvious reasons
~classname is the destructor and !classname is the finalizer
Destructors get called when the object goes out of scope, but the memory won't be freed up until the next GC cycle
Once the destructor has run, the finalizer won't get called for the same object
For automatic member variables you don't need to do anything special
For handle member variables, make sure to delete them manually in the destructor

Conclusion

Essentially, the C++/CLI deterministic destructor implementation is internally a syntactically pleasant form of the Dispose pattern, and the compiler generates just about all the code that we require. I believe C# 2.0 has a slightly inferior form where they use the using keyword. The big plus about the C++/CLI destructor syntax is that it fits in naturally with what a native C++ programmer expects his/her destructor to do, and he/she needn't even be aware of the Dispose pattern that's being used internally. Thanks to Herb Sutter and his team :-)