LTO on OS X

After getting an LTO build of Firefox working on ELF, I decided to try OS X again. There are some differences in how the interface with the linker works and in which features LLVM needs to provide for it to work. Fortunately, I got it working just in time for LLVM 2.9 branching :-)

Linker plugins vs. libraries

Gold and, more recently, the traditional BFD-based GNU linker use plugins to support LTO. The Apple linker uses a library (libLTO) instead. From the user's perspective, it should not be all that different. Unfortunately, while clang passes -plugin to gold, it does not set DYLD_LIBRARY_PATH when running the OS X linker, so that linker ends up loading the system copy of LLVM.

To use the local build of LLVM, just set DYLD_LIBRARY_PATH. I used

DYLD_LIBRARY_PATH=... make -f client.mk

but it might be possible to set it somewhere in mozconfig.

The -dead_strip option

The OS X linker can strip dead code with -dead_strip. This should work with LTO, but unfortunately I was getting linker crashes when using both. To test LTO, I changed configure to disable -dead_strip when LTO is enabled (see bug 638149).

It looks like the linker in Xcode 4 got better at this (at least my reduced test now works), but I was still unable to use it for Firefox.

Inline assembly

The Apple linker is a lot stricter than gold about the LTO library getting the symbol information right. For example, in

asm(".text\n  .globl foo\n foo:\n jmp bar");

The library must report that foo is defined and bar is undefined. The library also has to be careful about local definitions for cases like

asm(".text\n foo:\n jmp bar");
int foo(void);
int bar(void) {
  return 42;
}
int zed(void) {
  return foo();
}

In this example, reporting an undefined reference to foo would cause the link to fail.

I don’t know why the OS X linker is so strict, but fixing this also helps with ELF a bit. For example, ar and nm get more accurate symbol information.

Previously, libLTO just did a simple textual search for .globl. This gets some cases wrong, such as a .globl inside an .if block that is not taken. It also made no attempt to report undefined references.
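A textual search of this kind is easy to write and easy to fool. The sketch below is a hypothetical `naive_is_global` helper written for illustration, not libLTO's actual code: it reports `foo` as global whenever a `.globl foo` line appears, even inside an `.if 0` block the assembler would skip, and it has no way to say that `bar` is an undefined reference.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of a "grep for .globl" symbol scan. It looks only for
   ".globl <name>" and ignores everything else in the asm text, including
   conditional assembly and instruction operands. */
bool naive_is_global(const char *asm_text, const char *sym)
{
    size_t n = strlen(sym);
    const char *p = asm_text;
    while ((p = strstr(p, ".globl")) != NULL) {
        p += strlen(".globl");
        while (*p == ' ' || *p == '\t')
            p++;
        /* Match the whole symbol name, not just a prefix of it. */
        if (strncmp(p, sym, n) == 0 &&
            !(isalnum((unsigned char)p[n]) || p[n] == '_'))
            return true;
    }
    return false;
}
```

Called on `".if 0\n .globl foo\n.endif\nfoo:\n ret"` it still returns true for `foo`, although the assembler never sees that directive.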

To get all the cases right, you need a real asm parser. Luckily the MC project added just that to LLVM.

In MC there is an interface called MCStreamer. It has roughly one method for each kind of line in an assembly file: one for instructions, one for labels, one for .long, etc.

To report the symbols defined and used, all that was needed was to create a new streamer that just records uses, labels, and .globl directives.
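The idea of a recording streamer can be sketched in plain C. Everything here is a hypothetical toy: a three-callback `struct streamer` standing in for MCStreamer, and a line-based `parse_asm` that only understands labels, `.globl`, a non-nested `.if 0 ... .endif` and one symbolic operand per instruction. The real work is done by MC's full assembly parser; the point is only that a streamer which records names, and emits nothing, is enough to answer the linker's questions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 32
#define NAME_LEN 64

/* Toy stand-in for an MC streamer: one callback per kind of line this
   sketch cares about. The real MCStreamer interface is far larger. */
struct streamer {
    void (*label)(struct streamer *s, const char *name);
    void (*globl)(struct streamer *s, const char *name);
    void (*use)(struct streamer *s, const char *name);
};

struct symset { char names[MAX_SYMS][NAME_LEN]; int n; };
/* A "recording" streamer: it emits no code, it only remembers names. */
struct record_streamer {
    struct streamer base; /* first member, so a streamer * can be cast back */
    struct symset defined, global, used;
};

static bool contains(const struct symset *set, const char *name)
{
    for (int i = 0; i < set->n; i++)
        if (strcmp(set->names[i], name) == 0)
            return true;
    return false;
}

static void add(struct symset *set, const char *name)
{
    if (set->n < MAX_SYMS && !contains(set, name))
        snprintf(set->names[set->n++], NAME_LEN, "%s", name);
}

static void rec_label(struct streamer *s, const char *name)
{ add(&((struct record_streamer *)s)->defined, name); }
static void rec_globl(struct streamer *s, const char *name)
{ add(&((struct record_streamer *)s)->global, name); }
static void rec_use(struct streamer *s, const char *name)
{ add(&((struct record_streamer *)s)->used, name); }

void record_streamer_init(struct record_streamer *r)
{
    *r = (struct record_streamer){ .base = { rec_label, rec_globl, rec_use } };
}

/* Toy parser driving a streamer line by line. It only knows labels, .globl,
   a non-nested ".if 0 ... .endif" and one symbolic operand per instruction. */
void parse_asm(const char *text, struct streamer *s)
{
    bool skipping = false;
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t full = nl ? (size_t)(nl - text) : strlen(text);
        char line[128], op[NAME_LEN], arg[NAME_LEN];
        size_t len = full < sizeof line ? full : sizeof line - 1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += full + (nl ? 1 : 0);
        char *p = line, *end = line + len;    /* trim surrounding blanks */
        while (isspace((unsigned char)*p))
            p++;
        while (end > p && isspace((unsigned char)end[-1]))
            *--end = '\0';
        int nf = sscanf(p, "%63s %63s", op, arg);
        if (nf < 1)
            continue;                         /* blank line */
        if (strcmp(op, ".if") == 0)
            skipping = nf == 2 && strcmp(arg, "0") == 0;
        else if (strcmp(op, ".endif") == 0)
            skipping = false;
        else if (skipping)
            continue;                         /* inside a false .if */
        else if (end[-1] == ':') {
            end[-1] = '\0';
            s->label(s, p);                   /* "foo:" */
        } else if (strcmp(op, ".globl") == 0 && nf == 2)
            s->globl(s, arg);                 /* ".globl foo" */
        else if (op[0] != '.' && nf == 2 &&
                 (isalpha((unsigned char)arg[0]) || arg[0] == '_'))
            s->use(s, arg);                   /* "jmp bar" */
    }
}

/* What the linker needs: globally defined names and undefined references. */
bool defines_global(const struct record_streamer *r, const char *name)
{
    return contains(&r->defined, name) && contains(&r->global, name);
}
bool is_undefined_ref(const struct record_streamer *r, const char *name)
{
    return contains(&r->used, name) && !contains(&r->defined, name);
}
```

Fed the first asm string from the examples above, this reports `foo` as a global definition and `bar` as an undefined reference. Fed the second, it keeps `foo` local and reports only `bar`, and a `.globl` hidden inside `.if 0` is correctly ignored.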

There was still the issue of LLVM removing a function that was used only by inline assembly. To solve that, libLTO now adds an llvm.compiler.used variable. This is similar to __attribute__((used)) in C. In fact, one could try to use the same idea in clang so that the attribute is not required for static functions used from inline assembly (see bug 9364).
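For reference, this is the manual C idiom that the attribute provides: a minimal sketch, assuming GCC or clang on x86 or AArch64, of a static function that only top-level asm refers to. The names `helper`, `asm_demo_helper`, `asm_demo_entry` and `asm_entry` are hypothetical. The asm labels pin the symbol names so the asm can refer to them the same way on ELF and on Mach-O, where C names normally get a leading underscore.

```c
#if defined(__x86_64__) || defined(__i386__)
#define ASM_JUMP "jmp"
#elif defined(__aarch64__)
#define ASM_JUMP "b"
#else
#error "this sketch only covers x86 and AArch64"
#endif

/* No C code calls helper(); only the asm below does. Without `used`, the
   optimizer may delete it and the asm would then reference a missing
   symbol. A declaration carries the asm label, since GCC does not accept
   one on the definition itself. */
static int helper(void) __asm__("asm_demo_helper") __attribute__((used));
static int helper(void)
{
    return 42;
}

/* Hypothetical entry point defined only in assembly: it tail-jumps to the
   static helper. */
__asm__(".text\n"
        ".p2align 4\n"
        ".globl asm_demo_entry\n"
        "asm_demo_entry:\n"
        "  " ASM_JUMP " asm_demo_helper\n");

/* C-visible declaration of the asm-defined symbol. */
int asm_entry(void) __asm__("asm_demo_entry");
```

Calling `asm_entry()` returns 42. libLTO's llvm.compiler.used variable plays the same "keep this alive" role at the IR level, without requiring the annotation in the source.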

Results

Debug info is disabled in all cases. All builds except the LTO one have -dead_strip enabled.

Build Time

      Clang LTO     Clang        GCC
Real  106m19.830s   34m58.584s   55m46.200s
User  137m14.200s   74m0.843s    131m12.997s
Sys   8m59.300s     8m33.727s    13m44.164s

I was surprised by such a large increase in wall time. The OS X linker uses the same library as gold (the gold plugin links with it), so it must be doing something inefficient with it. One thing that helped a lot with gold was changing the plugin to merge modules as they are loaded; I can only guess that it would help here too.
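Putting the timings side by side makes the shape of the slowdown clearer. The helpers below (hypothetical names, a sketch that assumes durations in the `XmY.Zs` format printed by `time`) convert the table entries to seconds and compute ratios. The ratio of user to real time is a rough measure of how many cores were busy on average.

```c
#include <stdio.h>

/* Convert a `time`-style duration such as "106m19.830s" to seconds.
   Returns a negative value if the string does not parse. */
double to_seconds(const char *t)
{
    int minutes;
    double seconds;
    if (sscanf(t, "%dm%lf", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Ratio of two durations, e.g. LTO build time over plain build time. */
double time_ratio(const char *a, const char *b)
{
    return to_seconds(a) / to_seconds(b);
}

/* user/real: roughly how many cores were busy on average. */
double parallelism(const char *user, const char *real)
{
    return to_seconds(user) / to_seconds(real);
}
```

With the numbers above, the LTO build takes about 3.04 times the wall time of the plain clang build but only about 1.85 times the user time, and user/real drops from about 2.1 to about 1.3. That pattern suggests much of the extra time goes into a step that does not run in parallel, plausibly the final link.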

Binary Sizes

Size (bytes)  Clang LTO   Clang      GCC
.dmg          28745376    27300927   28619705
32-bit XUL    34700596    31359356   33446792
64-bit XUL    33807320    34292160   35557336

This one was even more surprising. Adding -emit-llvm to CFLAGS and CXXFLAGS causes almost all the files to be compiled to IL, so LTO sees nearly the whole program and should be able to drop unused code by itself. Why would the lack of -dead_strip cause such a big difference?
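Expressed as relative changes against the non-LTO clang build, the sizes are easier to compare. The sketch below copies the table's numbers into a hypothetical `sizes` array and computes the percentage change; it adds no data beyond the table.

```c
/* Sizes in bytes, copied from the table above. */
struct size_row {
    const char *name;
    long long clang_lto, clang, gcc;
};

static const struct size_row sizes[] = {
    { ".dmg",       28745376, 27300927, 28619705 },
    { "32-bit XUL", 34700596, 31359356, 33446792 },
    { "64-bit XUL", 33807320, 34292160, 35557336 },
};

/* Relative size change of `value` against `base`, in percent. */
double percent_change(long long value, long long base)
{
    return 100.0 * (double)(value - base) / (double)base;
}
```

This gives about +5.3% for the .dmg, +10.7% for 32-bit XUL and -1.4% for 64-bit XUL, so the growth is concentrated in the 32-bit library.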

I decided to investigate a bit with a simple two-file program. The first file defines a function f that returns 42, and the second file has a main that just returns f().

Linking it with LTO and no -dead_strip produces a binary where main just returns 42 directly, so LTO is working. The strange thing for me was that the function f was still in the file! Adding -dead_strip removes it.

Building LLVM in debug mode and running gdb on the linker showed that, without -dead_strip, the OS X linker calls lto_codegen_add_must_preserve_symbol on both _main and _f. Adding -dead_strip doesn’t change that; it just makes the linker remove f afterwards, once it is dead.
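A toy model of the two steps may help: code generation emits every must-preserve symbol, and -dead_strip then keeps only what is reachable from the entry point. The sketch below uses hypothetical `struct atom` and `link_atoms` names and a simple mark phase from a single root; it is not ld64's code. It models the two-file program after LTO has inlined f into main, so _main no longer references _f.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ATOMS 8

/* Toy model of what the linker sees after LTO code generation: each atom
   is a function and lists the atoms it references (NULL-terminated). */
struct atom {
    const char *name;
    const char *refs[MAX_ATOMS];
    bool live;
};

static struct atom *find(struct atom *atoms, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(atoms[i].name, name) == 0)
            return &atoms[i];
    return NULL;
}

/* Mark phase of dead stripping: everything reachable from `a` is live. */
static void mark(struct atom *atoms, int n, struct atom *a)
{
    if (a == NULL || a->live)
        return;
    a->live = true;
    for (int i = 0; i < MAX_ATOMS && a->refs[i] != NULL; i++)
        mark(atoms, n, find(atoms, n, a->refs[i]));
}

/* Returns how many atoms end up in the output. Without dead stripping,
   every atom that code generation emitted is kept. */
int link_atoms(struct atom *atoms, int n, bool dead_strip, const char *entry)
{
    int kept = 0;
    for (int i = 0; i < n; i++)
        atoms[i].live = !dead_strip;
    if (dead_strip)
        mark(atoms, n, find(atoms, n, entry));
    for (int i = 0; i < n; i++)
        kept += atoms[i].live;
    return kept;
}
```

In this model, _f survives a link without -dead_strip only because it was on the must-preserve list, and -dead_strip drops it because nothing reachable from _main refers to it any more.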

Dromaeo

Using the 64-bit version.


10 Responses to LTO on OS X

  1. Jan Hubicka says:

    As for parsing asm statements, this is not quite a good idea from GCC’s POV, and it is not always possible, as asm statements might do funny things with the surrounding asm context they are expected to be placed in (e.g. playing with visibilities and doing other ugly tricks that really should be done with attributes and other infrastructure instead).

    I think we should go with extending the toplevel asm syntax to add a list of symbols defined and objects used. I believe the Sun compiler already does that (at least I saw some of this in the Mozilla sources).

  2. Jan Hubicka says:

    Also, gold uses a plugin to avoid two-stage linking; perhaps that explains the difference in link time between Apple’s linker and gold.
    I would, however, be surprised if it made that much difference, given that the whole library is in IL.

  3. respindola says:

    I have no idea how the apple linker works internally, but I don’t think it is a good idea to extend the asm syntax.

    LTO should really be a drop-in feature. One way to do it is what the gold/bfd linkers do: check which symbols are defined in the final .o. The OS X linker puts the burden on the library.

    Extending the syntax puts the burden on the programmer. I have found inline asm in Mozilla that uses the C preprocessor to produce assembly that in turn uses the asm preprocessor. I don’t want to have to annotate that :-)

    With MC the results should be reliable. That is the code path when producing the .o that will be sent back to the linker anyway :-)

  4. Jan Hubicka says:

    With gold, things work by accident: you are linking a shared library, so gold is happy about undefined symbols, assuming that they will be supplied by the dynamic linker. Later, when the plugin provides them, gold does the right thing. libxul won’t link, however, if it is linked statically into the main binary. We have GCC PRs on that.

    The MC way won’t fly with GCC in general, given the idea that asm statements should be transparent to the compiler. I have definitely seen asm statements passing things from one statement to another and doing similar crazy stuff. All those hacks can’t be supported reliably, so IMO it is better to add extra syntax to declare these things clearly.

    Well, asm statements producing functions should be separate .s files in the first place, but anyway…

  5. Jan Hubicka says:

    Another problem I expect to solve by extending the asm syntax is references to static functions. Let’s say that we have a compilation unit c1.c that defines a static function foo with the used attribute, which is referenced from a toplevel asm statement.
    Now we want to LTO it with a unit c2.c that does pretty much the same. One foo or the other needs to be renamed, so you also need to update the bodies of the asm statements to compensate for the renaming.
    With extended syntax and references like in the GNU asm statement syntax, this problem goes away. Plus we avoid the need to annotate vars as used when they are used from an asm statement – it seems cleaner to annotate them at the point where they are actually used.

  6. respindola says:

    I actually don’t think it is by accident; we pass -z defs. If I understand it correctly, gold delays the check for any remaining undefined symbols until the plugin gives back the final object.

    In any case, the gold plugin uses libLTO, so even if gold has this limitation it should work now. What is the PR number? How are you linking XUL into the main binary? That should be an interesting thing to try!

    I agree that it would be better to have that ASM in a .s file. In fact, I wrote a patch for that first (just the linux x86-64 bits), but then decided that LTO should not impose extra restrictions on what language the compiler should accept.

    It is a pity an MC-like approach will not fly in GCC. MC has other big advantages, and without it the burden of handling inline asm goes to the programmer (which is bad) or to the linker (which is bad on architectures where we don’t control the linker).

  7. respindola says:

    I proposed handling the ‘used from asm’ bit by using MC too in llvm.org/Pr9364. The case of renaming a static function is interesting, and I think we could handle that too by passing the renaming down to MC.

    The case you are describing is
    c1.c:
    static void foo(void) {…}
    asm("…call foo\n");
    c2.c:
    static void foo(void) {…}
    asm("…call foo\n");

    Correct?

  8. Jan Hubicka says:

    Yes, that is what I had in mind.
    Whether you can handle all cases by parsing the (partial) assembly code in the statement obviously depends on how complex the asm constructs you want to support within asm statements are, and on whether you want an integrated assembler for all the LTO-enabled targets…

  9. Jan Hubicka says:

    I never actually tried to make a static Firefox build (I think it would make a lot of sense given the cost of -fPIC). We ran into the problem when building the Linux kernel with LTO.

    I believe both GNU ld and gold will fail to link a static binary if they think there is a symbol used but not defined before the plugin is actually run.

    The PR I had in mind is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46820

  10. Jan Hubicka says:

    By "before the plugin is actually run" I really meant before the plugin provides the final .o files, i.e. if the LTO symbol tables look wrong, the linker reports an error.