Even More Experiments with Includes

By Noel Llopis
25 January 2005

One aspect of the scientific process is publishing detailed experiment descriptions and results so that they can be independently verified by other scientists. That's exactly what I decided to do after reading Kyle Wilson's surprising results in his article “Experiments with Includes.”

Kyle found that using internal include guards was significantly slower than using #pragma once with the C++ compiler in Microsoft Visual Studio 2003 and 2005 beta. That just flies in the face of the measurements I had done before and what other people had reported. How was it possible?

What struck me first when looking at Kyle's results was that he used an extremely pathological case: 200 header files, each of them including all 200 headers. Still, the point is to measure header include performance, so maybe that was fine. More on that later.

The first thing I did was to write a quick script that generates a set of header files and a main file from a few parameters. I took the chance to write it in Python, which displaced Perl as my favorite scripting language a few months ago. The script generates a given number of header files using either no guards, internal guards, external guards, or pragma statements. To be able to compile the no-guards case at all (and to keep things a bit more realistic), I had each header file include only the header files with numbers higher than its own. Feel free to download the script and play with it if you want to run the measurements yourself.
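To make the four variants concrete, here is a minimal sketch of how such a generator could build the text of one header. It is not the downloadable script itself; the function names, the `headerN.h` file names, and the `HEADER_N_H` macro names are all made up for illustration.

```python
def guard_macro(index):
    # Hypothetical naming scheme for the guard macros.
    return f"HEADER_{index}_H"


def include_line(index, style):
    """Return the text that pulls in header `index` under the given style."""
    line = f'#include "header{index}.h"'
    if style == "external":
        # External guards: the includer checks the macro before
        # even asking the preprocessor to open the file.
        macro = guard_macro(index)
        return f"#ifndef {macro}\n{line}\n#endif"
    return line


def header_text(index, count, style):
    """Build header `index` out of `count`; it includes every higher-numbered header.

    style is one of "none", "internal", "external", "pragma".
    """
    macro = guard_macro(index)
    lines = []
    if style in ("internal", "external"):
        # External guards still need the header to define its own macro.
        lines += [f"#ifndef {macro}", f"#define {macro}"]
    elif style == "pragma":
        lines.append("#pragma once")
    for other in range(index + 1, count + 1):
        lines.append(include_line(other, style))
    if style in ("internal", "external"):
        lines.append("#endif")
    return "\n".join(lines) + "\n"
```

Calling `header_text(2, 3, "external")`, for example, yields a header that guards its own body and also wraps its single include of `header3.h` in an `#ifndef HEADER_3_H` check.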

Since I'm running Linux at home, I first ran the experiment with gcc 3.4.1. I started with no guards, but I had to keep it to a set of 20 header files to avoid taking forever with an astronomical number of includes (they really add up combinatorially!). As expected, even a 20x20 half-matrix of includes took a while to compile: 2+ minutes. The same set of headers with internal include guards was just a blip at 0.15s. So far, so good. That's what I expected.
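To see how quickly the unguarded case blows up, here is a small sketch that counts how many times a header gets opened. It assumes the main file includes every header once (the real script may set up the main file differently, which would change the total by roughly a factor of two). The function name is made up for illustration.

```python
def unguarded_opens(count):
    """Count header opens with no guards in the half-matrix layout.

    Header i includes every header j > i, so opening header i costs
    1 + (cost of all higher headers). Assumes main includes all headers.
    """
    total_of_higher = 0  # running sum of the costs of headers above the current one
    for index in range(count, 0, -1):
        cost = 1 + total_of_higher
        total_of_higher += cost
    return total_of_higher


if __name__ == "__main__":
    print(unguarded_opens(20))  # 2**20 - 1 = 1048575
```

Each header costs twice as much as the one after it, so the total doubles with every header added. With any kind of working guard, the body of each header is processed only once.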

Now I cranked things up to 200 headers like Kyle had done. Here comes the first surprise of the day: gcc blows up trying to parse that many includes. I get a “#include nested too deeply” error. Digging through the gcc documentation I wasn't able to find any way to increase that depth. That made me realize that having include chains of 200+ headers is completely unrealistic. I knew that it was artificially large, but it's probably so by an order of magnitude at least.
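The nesting depth follows directly from the layout: the first header includes the second, which includes the third, and so on down the chain. A rough simulation of a preprocessor honoring include guards shows that the deepest chain grows linearly with the number of headers. This is only a sketch with a made-up function name, and it counts the first header as level 1 (the main file adds one more level on top).

```python
def max_include_depth(count):
    """Deepest include nesting for the half-matrix layout with working guards."""
    seen = set()
    deepest = 0

    def process(index, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        if index in seen:
            return  # guard hit: the body is skipped, nothing nests further
        seen.add(index)
        for other in range(index + 1, count + 1):
            process(other, depth + 1)

    process(1, 1)
    return deepest
```

For 195 headers this gives a depth of 195, and for 30 headers a depth of 30, so the number of headers in this test is effectively the nesting depth the compiler has to survive.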

Still, just for the test, I decided to go ahead. It turns out it was choking very near the end, so doing a run with 195 headers worked just fine. The results were what I would have expected. A bit better actually, and certainly much better than the results Kyle saw in Visual Studio: all the runs (internal guards, external guards, and pragma) took about the same time (0.07 seconds), and each of them included exactly 195 headers. No more, no less. That's exactly how I would expect the compiler to behave.

gcc 3.4.1 (Linux with 2.6 kernel). 195 headers.

  • Internal guards: 0.07s
  • External guards: 0.09s
  • Pragma directive: 0.07s
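For anyone repeating the measurements, one simple way to time a compile is to run the compiler as a subprocess and keep the best of a few runs. This is a sketch of one possible approach, not necessarily how the numbers above were taken; the helper name and the file names in the usage note are hypothetical.

```python
import subprocess
import time


def time_command(args, repeats=3):
    """Run a command `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is not timed as a success.
        subprocess.run(args, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best
```

A call such as `time_command(["g++", "-c", "main.cpp", "-o", "main.o"])` would then time one compile of a generated project. Taking the minimum rather than the average reduces the influence of disk caching and other noise on such short runs.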

At work, I'm not as lucky, and I have to use Windows and Microsoft Visual Studio, so I ran the second set of tests there. The results confirmed what Kyle reported: the internal guards were very slow (14+ seconds), the external guards were blazingly fast (0.37 seconds), and the pragma directive was somewhere in between (9.6 seconds).

Microsoft Visual Studio 2003. 195 headers.

  • Internal guards: 14.7s
  • External guards: 0.37s
  • Pragma directive: 9.57s

I was pretty amazed to see that. It seems like a major flaw in the Visual C++ compiler, doesn't it?

To round out the tests, since I had the Metrowerks PS2 compiler handy, I decided to run the same set of tests with it. It completely blew up on the 195-header project, complaining that the includes were nested too deeply. To my surprise, I had to lower the number of headers to 30 to be able to compile at all. Anything over 30 would cause it to blow up.

That made me think again about the worst cases I would see in a typical project, and I realized that include chains more than 30 levels deep are probably very, very rare, even if you're using STL or Boost, which make heavy use of header files.

Just for the sake of completeness, I decided to run the tests with just 30 includes and see if I could discern any patterns from the results. It turns out that Metrowerks is pretty slow overall, but at least the differences between the three approaches are minimal.

Metrowerks PS2 compiler v3.0. 30 headers.

  • Internal guards: 0.60s
  • External guards: 0.46s
  • Pragma directive: 0.49s

I ran the same 30-header tests with Visual Studio, and external guards were the clear winner, with internal guards and pragmas almost the same. I wonder if Visual C++ uses some dynamic algorithm that avoids having a fixed hard limit on include depth at the cost of runtime performance. Personally, I'll take a fixed depth and flawless behavior like gcc's any day.

Microsoft Visual Studio 2003. 30 headers.

  • Internal guards: 0.48s
  • External guards: 0.14s
  • Pragma directive: 0.41s

gcc was so fast with such a tiny project that I couldn't reliably measure it. But if the 195-header project took 0.07 seconds, you can guess how fast it churned through just 30 header files.


It does seem that Microsoft Visual Studio has some major problems with includes in pathological situations. On the other hand, it also seems that those situations are completely unrealistic and will never happen in a real code base. For a more realistic situation (30x30 half matrix), internal guards and pragmas are the same, and external guards have somewhat of an edge.

gcc behaved like a real champion by being super fast and efficient no matter what technique you threw at it. Way to go! The Microsoft compiler certainly could stand some improvement in that area.

Metrowerks was the slowest of the bunch, but it dealt with all the different techniques just fine for a relatively small set of tests.

Overall, there should be no real difference between using internal guards and pragma directives (which happily confirms some of the measurements we had done in real code bases). So stick with internal guards, which are standard and work on any compiler. If it would make you sleep better, add a #pragma once directive in addition to the internal guards, surrounded by conditionals because it's not standard and not supported by every compiler. External guards might have an edge with Visual Studio, but the pain and potential trouble of using them clearly outweigh any speed benefits.
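As an illustration of that recommendation, here is a sketch of a helper that wraps existing header contents in both protections. The helper name is made up, and testing `_MSC_VER` is just one possible choice of conditional (it limits the pragma to Microsoft's compiler); which compilers to enable it for is a per-project decision.

```python
def wrap_header(body, macro):
    """Wrap header contents in a conditional #pragma once plus internal guards."""
    return (
        "#if defined(_MSC_VER)\n"   # assumption: only enable the pragma for Visual C++
        "#pragma once\n"
        "#endif\n"
        f"#ifndef {macro}\n"        # the standard internal guard does the real work
        f"#define {macro}\n"
        f"{body.rstrip()}\n"
        f"#endif  // {macro}\n"
    )


if __name__ == "__main__":
    print(wrap_header("struct Foo {};\n", "FOO_H"))
```

The internal guard keeps the header correct everywhere, and the pragma is purely an optional hint for compilers that understand it.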

generate_includes.py
