Sunday, January 19, 2014

Devirtualization in C++, part 1

Devirtualization is an optimization turning polymorphic calls into direct calls. In these series I would like to describe of calls in C++ programs.  This area of compiler optimization is not really well covered by literature (it seems like all research is done on Java or interpreted languages). I will explain in detail how devirtualization works in GCC. Later I will also show some statistics collected from bigger applications, like Firerox or Libreoffice and it is quite clear that C++ programmers tends to abuse the use of virtual keyword in C++ and have various (non)-realistic assumptions on the resulting code.


How polymorphic calls are implemented

To make the series understandable to non-compiler developers, lets show small example:
struct A
{
  virtual int foo (void) {return 42;}
};
int test(void)
{
  struct A a;
  return a.foo();
}

Call of method foo in test is polymorphic.  In low level it test translates into the following code (I will use completely random C-like language based on GCC's GIMPLE):
void _ZN1AC2Ev (struct A * const this)
{
  this->_vptr.A = &_ZTV1A + 16;
}
int _ZN1A3fooEv (struct A * const this)
{
  return 42;
}
char * _ZTS1A = "1A";
struct typeinfo _ZTI1A = {&_ZTVN10__cxxabiv117__class_type_infoE + 16, &_ZTS1A}
struct vtable _ZTV1A = {0, &_ZTI1A, _ZN1A3fooEv}
int test() ()
{
  int (*__vtbl_ptr_type) () *tmp1;
  int (*__vtbl_ptr_type) () tmp2;
  struct A a;
 
  _ZN1AC2Ev (&a);
  tmp1 = a._vptr.A;
  tmp2 = *tmp1;
  return = tmp2 (&a);
}

This may seem horrible at first. The exact layout is specified by Itanium C++ Application Binary Interface (yes, it is used on non-Itanium architectures, too). There is a little utility c++filt that helps to translate the mangled names of inidividual things autogenerated by the compiler:
  1. _ZN1AC2Ev is the constructor (A::A) that stores virtual table pointer to every instance of structure A.
  2. _ZN1A3fooEv is the method A::foo itself.
  3. _ZTV1A is a virtual table of structure A.  It consists of offset-to-top pointer (in case A is within bigger class), RTTI pointer and an array of pointers to individual virtual methods, in this case there is one pointer to A::foo.
  4. _ZTI1A is the RTTI (Runtime type-information) of structure A. Generation of this "junk" can be supressed by -fno-rtti
Function test winds up into a call of constructor (A::A) and then standard polymorphic call sequence that looks up the method's address via the virtual table.

In fact I just noticed that GCC frontend also produce a dummy try...finally region to execute the empty destructor in case A::foo throws an exception.  I sent a patch to fix this. UPDATE: seems that the situation is not so simple, the try...finaly region is needed for optimization of dead stores, see the discussion following the patch.

Most users will anticipate an optimizing compiler to produce:
int test(void)
{
  return 42;
}
And it is also what every decent C++ compiler will do. Lets see how it is done.

Devirtualization in the front-end

Every modern compiler is split into a language-specific part (front-end), a part responsible for target-independent optimizations (middle-end), and, a part responsible for code generation (back-end). Devirtualization is by its nature language specific optimization and thus for lazy middle-end developer, like me, it is the natural to expect it to be implemented in the front-end.

In fact it is what C++ frontend in GCC (or clang in LLVM) does and I had to disable this transformation to get the testcase above in a full monstrosity.

The implementation is simple-minded. The front-end looks for calls of virtual methods where it immediately knows that derived types are not allowed and produce a direct call. This optimization will happen even with -O0 in all of the following cases:

int test(void)
{
  struct A a;
  return a.foo();
}
struct A a;
int test(void)
{

  return a.foo();
}
int test(struct A a)
{
  return a.foo();
}
Front-end also special case calls within the constructors and destructors since it is the situation where type of this pointer is known. Therefore even the following call gets optimized:
struct A
{
   virtual int foo (void) {return 42;}
   A(void)
     {
        foo ();
     }
};

For those brave enough to dive into GCC sources, cansee the exact rules  in resolves_to_fixed_type_p and fixed_type_or_null in cp/class.c.

Because it is not front-end's job to try hard to optimize across statement boundaries, whenever you however use pointer or reference, the call will stay polymorphic. So the following testcase will not get optimized:
int test(void)
{
  struct A a, *b=&a;
  return b->foo();
}
One way to aid devirtualization is to use final keyword (added by C++11) such as in this testcase otherwise equivalent to the previous one:
struct A final
{
  virtual int foo (void) {return 42;}
};
int test(void)
{
  struct A a, *b=&a;
  return b->foo();
}

Missed front-end devirtualizations

There are at least three cases where I think front-end (both clang and GCC) is just lazy. Apparently they both implement only the cases that are required by the C++ standard to happen. 

In the following testcase one can devirtualize because type is in anonymous namespace and the frontend can thus easily use the fact that there are no derivations:
namespace {
  struct A 
  {
    virtual int foo (void) {return 42;}
  };
}
int test(void)
{
  struct A a, *b=&a;
  return b->foo();
}
And here one can use the fact that foo is self recursive and compile it into infinite loop:
struct A 
{
   virtual int foo (void) {return foo();}
};

This transformation seem especially useful, because self-recursive virtual functions are surprisingly frequent.

UPDATE: I was too fast here. In the following testcase:
#include <stdio.h>
struct A
{
   virtual int foo (void) {return foo()+1;}
};
struct B: public A
{
   virtual int foo (void) {return 1;}
};
main()
{
  struct B b;
  printf("%i\n",b.A::foo());
}
One can not make the assumption about self-recursion. The invocation of A::foo() "recurses" to B::foo(). It may be rare enough case so it may be worth to track such explicit calls to virtual functions and treat them with extra care and only for types we know that are local to a given unit.

The last seemingly important case is when class is a subfield of other structure or contained in array.  Here again derived types are not allowed:
struct A 
{
   virtual int foo (void) {return foo();}
};
struct B {
   struct A a;
};
struct A a[7];
int test(void)
{
  return a[3].foo();
}
int test2(struct B *b)
{
  return b->a.foo();
}
Well, these cases are now tracked in PR59883. Lets see if C++ frotnend developers agree.

This is about it; all you never wanted to know about the implementation of devirtualization within front-end and its limitations. Next time I will write about basic middle-end techniques.


Author: , Google +

11 comments:

  1. Hi Honza, even though I do not understand anything here, I decided to support you by a comment:)

    ReplyDelete
  2. Do not worry, Yangjing, it will only get more technical ;)

    ReplyDelete
  3. Very interesting article, could you please add an example why are infinite loops quite common?

    ReplyDelete
    Replies
    1. Not infinite loops, but self recusive virtual functions. These exists, for example, in libstdc++'s ballanced tree implementations (dunno why)

      Delete
  4. resolves_to_fixed_type_p and fixed_type_or_null are in cp/class.c

    ReplyDelete
  5. To quote from your article:

    int test(void)
    {
    struct A a;
    return a.foo();
    }

    Call of method foo in test is polymorphic.

    How is the call to foo polymorphic since a is neither a pointer nor a reference?

    ReplyDelete
    Replies
    1. It is bit oversimplified. I disabled devirtualization in C++ frontend for the early examples (it is mentioned in the post)

      Delete
  6. Are you working on a 64 bit machine? The offset 16 makes me confusing at beginning.

    ReplyDelete