Thursday, September 18, 2014

Devirtualization in C++, part 7 (Enforcing One Definition Rule)

This time I am going to write about a feature that I implemented as a side effect of the devirtualization machinery: the verification of the One Definition Rule on types.

The One Definition Rule (ODR) is one of the more interesting cases where C and C++ differ in a very subtle way. Wikipedia describes it as follows:
In short, the ODR states that:
  1. In any translation unit, a template, type, function, or object can have no more than one definition. Some of these can have any number of declarations. A definition provides an instance.
  2. In the entire program, an object or non-inline function cannot have more than one definition; if an object or function is used, it must have exactly one definition. You can declare an object or function that is never used, in which case you don't have to provide a definition. In no event can there be more than one definition.
  3. Some things, like types, templates, and extern inline functions, can be defined in more than one translation unit. For a given entity, each definition must be the same. Non-extern objects and functions in different translation units are different entities, even if their names and types are the same.
Some violations of the ODR must be diagnosed by the compiler. Other violations, particularly those that span translation units, are not required to be diagnosed.[1]
This means that C++ type names are global and should not be re-used in different ways in different source files. (In C, doing so is perfectly valid and common; types across units are considered equivalent if their structures are.)

One of the motivations for the ODR is to make name mangling possible. Type names are used, for example, as program-wide identifiers to distinguish functions and methods of the same name that take different types of arguments (function overloading):
struct A {int a;};
int getval (struct A a) {return a.a;}
struct B {char a;};
int getval (struct B a) { return a.a;}
This compilation unit will result in exporting two function symbols: _Z6getval1A and _Z6getval1B. The A and B come from the name of the argument's type.

Now, obviously, if another unit defines a completely different structure A and a completely different getval taking it as an argument, that function will also be called _Z6getval1A and things will fail miserably. General violations of the ODR cannot be diagnosed by a compiler working on a single translation unit in isolation. If one is lucky, the ODR violation turns into a linker error about a duplicated symbol name. In the presence of inline functions, however, this will not happen and one of the two functions will silently be eliminated from the program. This may result in a function being invoked on the wrong data, crashing the program in random ways.
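For illustration, here is a minimal sketch of such a violation, split across two hypothetical translation units (the file names and function bodies are mine, invented for the example):

// unit1.cc (hypothetical file)
struct A {int a;};
inline int getval (struct A a) {return a.a;}
int use1 (struct A &a) {return getval (a);}

// unit2.cc (hypothetical file) - violates the ODR: a different A and a
// different getval still share the mangled name _Z6getval1A
struct A {char a;};
inline int getval (struct A a) {return a.a + 1;}
int use2 (struct A &a) {return getval (a);}

Both units emit their copy of getval as a weak symbol named _Z6getval1A, so the linker silently keeps just one of them and the other unit ends up calling code compiled for the wrong layout of A - exactly the failure mode described above and the kind of mismatch the ODR verification discussed in this post is meant to catch.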

Tuesday, September 9, 2014

Linktime optimization in GCC, part 3 - LibreOffice

After Firefox, I decided to look into LibreOffice performance with link-time optimization (LTO) and profile feedback (FDO). This post has been in the works since April, so my setup is not exactly bleeding edge - a GCC 4.9.1 prerelease, LLVM 3.5 and a LibreOffice checkout from April. Some real-world issues (like a wedding) interrupted my work; however, the data presented should still be relevant and interesting.

Wednesday, August 20, 2014

Devirtualization in C++, part 6 - asking user for help

In the previous posts of the series I described most of the new GCC devirtualization machinery. To cut the story short, GCC implements two kinds of devirtualization: full and speculative.

Full devirtualization replaces a given polymorphic call by a direct call (that can get inlined later). Speculative devirtualization is a weaker form that turns a polymorphic call into a conditional: it tests whether the target is the predicted one, takes the direct call path if so, and does the indirect call otherwise.

While performing speculative devirtualization the compiler can play somewhat unsafe and make assumptions that are not valid in 100% of cases. The main assumption made is that no new derived types are introduced by code not visible to the compiler. With link-time optimization this is usually true, but not necessarily so - new derived types can be introduced by libraries or plugins. While the compiler can generally attempt to track which derived types cannot be introduced by external code, this analysis is difficult and it is a question whether it can be implemented with reasonable precision within a statically optimizing compiler.

The following example shows speculative devirtualization:
struct A {virtual void foo() {}};
void test (struct A *a)
{
  a->foo();
}
 Here GCC 4.9 produces:
test(A*):
        movq    (%rdi), %rax
        movq    (%rax), %rax
        cmpq    $A::foo(), %rax
        jne     .L5
        ret
.L5:
        jmp     *%rax
Instead of:
test(A*):
        .cfi_startproc
        movq    (%rdi), %rax
        movq    (%rax), %rax
        jmp     *%rax
If the target of the polymorphic call happens to be A::foo(), the speculatively devirtualized code is significantly faster, saving the call overhead (and enabling further optimizations).
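In source terms, the speculative form corresponds roughly to the hand-written sketch below (my own approximation: GCC does the comparison on the vtable slot in its intermediate representation, and that compare also succeeds for derived types that do not override foo, which the typeid test here does not capture):

#include <typeinfo>

struct A {virtual void foo() {}};

void test_speculative (struct A *a)
{
  if (typeid (*a) == typeid (A))  // stands in for "vtable slot == &A::foo"
    a->A::foo ();                 // direct, non-virtual call; can be inlined
  else
    a->foo ();                    // fall back to the indirect call
}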

Monday, April 21, 2014

Linktime optimization in GCC, part 2 - Firefox

This time I will write about building Firefox with LTO. Firefox is one of the largest binaries on my hard drive (beaten only by Chromium), so it is an excellent stress test. Taras Glek and I started to work on getting Firefox to build well in 2009 (see the original paper from 2010); however, for all those years I was mostly chasing correctness and compile time/memory usage issues. Markus Trippelsdorf was instrumental in keeping the project going. In parallel, Rafael Espindola made the same effort to get Firefox working with LLVM and LTO.

Linktime optimization in GCC, part 1 - brief history

Link-time optimization (LTO) is one of the most actively developed features of GCC.
Since 2010 I have been working on getting LTO to work well with real-world applications, and with the GCC 4.9 release candidate out, perhaps it is a good time to share my experiences.

I plan to write about getting Firefox (Update: part 2 is here), LibreOffice, Chromium and other large applications to work with LTO. This will hopefully give a broader idea of what kinds of setup & portability issues one can hit. I will also show some benchmarks. Today, however, I will start with a short review of the history and current status of the implementation of link-time optimization in GCC.

Friday, April 11, 2014

Devirtualization in C++, part 5 (feedback driven devirtualization)

After a break, I finally have time to return to the devirtualization series. Last time I was speaking about speculative devirtualization performed by static analysis of the type inheritance graph. This time I plan to speak about simple (but very effective) feedback-directed devirtualization. While GCC has implemented basic feedback-directed devirtualization since 2003, GCC 4.9 will newly make it work cross-module. I also finally show some statistics on how effective the described optimizations are.

Monday, February 17, 2014

Devirtualization in C++, part 4 (analyzing the type inheritance graph for fun and profit)

Last time I described the construction of the type inheritance graph, the main data structure used by special-purpose devirtualization algorithms. Now it is time for some real optimization fun! In this post I will review the most elementary transformations and show some real-world data.

Sunday, February 9, 2014

Devirtualization in C++, part 3 (building the type inheritance graph)

Having covered the basic optimizations done by the C++ frontend and the way generic middle-end optimizations may devirtualize, it is time to dive into the wonderful world of special-purpose middle-end devirtualization techniques.

The topic of optimizing polymorphic calls is seemingly well covered by the literature, with active research since the advent of object-oriented programming (see, for example, DF Bacon, PF Sweeney: Fast static analysis of C++ virtual function calls and the numerous papers citing this one). Pretty much all of the techniques are based on constructing and analyzing the type inheritance graph.

Unfortunately, most of the research papers make assumptions that are not met in modern C++ programs. Here the situation is complicated by many C++ features, including the interesting implementation details of multiple and virtual inheritance partly described in this post and the ability to make dynamic type changes. Also important is the fact that the compiler hardly ever sees the whole program at once. Even with link-time optimization one has to take into account the presence of shared libraries. Because shared libraries are supposed to be upgradeable without recompiling the programs that use them, the compiler cannot make any analysis beyond what it is given in the header files, and as explained last time, sometimes not even that. Moreover, shared libraries are also often dynamically loaded as plugins. Contemporary C++ programs are really a lot more dynamic than most of the research assumes.

Today I thus cover the seemingly simple topic of building and walking the type inheritance graph and show the details one has to think about. It may seem exhausting, but so was my way to gaining all this wisdom. :) I promise that next time I will finally get to some real optimization fun.

Sunday, January 26, 2014

Devirtualization in C++, part 2 (low-level middle-end devirtualization by forwarding stores to loads)

In the first part of the series we saw the low-level implementation of a polymorphic call. It is a common belief that devirtualization can be done well by generic middle-end optimization techniques - a combination of constant propagation, store-to-load forwarding and indirect call removal. (See, for example, this post on llvm-dev; GCC developers tend to share similar sentiments.) This time I show how this is implemented and why one needs special-purpose devirtualization code in the middle-end to really optimize well.
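As a teaser, here is a minimal case (my own example, not taken from the post) where that generic machinery alone is enough to devirtualize:

struct A {virtual int foo () {return 42;}};

int test ()
{
  A a;               // the constructor stores A's vtable pointer into a
  A *ap = &a;
  return ap->foo (); // store-to-load forwarding recovers the vptr value,
                     // constant propagation resolves the vtable slot to
                     // A::foo, and the now-direct call can be inlined to 42
}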

Sunday, January 19, 2014

Devirtualization in C++, part 1

Devirtualization is an optimization turning polymorphic calls into direct calls. In this series I would like to describe the devirtualization of calls in C++ programs. This area of compiler optimization is not really well covered by the literature (it seems like all the research is done on Java or interpreted languages). I will explain in detail how devirtualization works in GCC. Later I will also show some statistics collected from bigger applications, like Firefox or LibreOffice, and it is quite clear that C++ programmers tend to overuse the virtual keyword and have various (un)realistic assumptions about the resulting code.
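As a trivial first example (mine, not from the post) of what the optimization does:

struct Base {virtual const char *name () {return "Base";}};
struct Derived : Base {virtual const char *name () {return "Derived";}};

const char *test ()
{
  Derived d;
  Base *b = &d;
  // Written as a polymorphic (indirect) call, but the compiler can prove
  // that *b is a Derived and replace it with a direct call to
  // Derived::name, which can then be inlined.
  return b->name ();
}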

Saturday, January 18, 2014

Hello World!

This is the freshly created blog of Honza Hubička. I plan to write here about compilers, math, the history of photography and perhaps more.

Why am I creating a blog?

This week I moved to Calgary for a post-doc in mathematics. In addition to that, I am also co-running the Šechtl & Voseček Museum of Photography and developing GCC. These activities usually keep me busy. For some reason I believe that after my move I will have plenty of time to sit down and write something interesting here.

The last time I did something similar was in the '90s, when the word "blog" did not exist. Here is what remains of my activity back then. Let's see if I find enough persistence to keep this going.