Acovea

Using Natural Selection to Investigate Software Complexities



 


Abstract

ACOVEA (Analysis of Compiler Options via Evolutionary Algorithm) implements a genetic algorithm to find the "best" options for compiling programs with the GNU Compiler Collection (GCC) C and C++ compilers. "Best", in this context, is defined as those options that produce the fastest executable program from a given source code. Acovea is a C++ framework that can be extended to test other programming languages and non-GCC compilers.

I envision Acovea as an optimization tool, similar in purpose to profiling. Traditional function-level profiling identifies the algorithms most influential in a program's performance; Acovea is then applied to those algorithms to find the compiler flags and options that generate the fastest code. Acovea is also useful for testing combinations of flags for pessimistic interactions, and for testing the reliability of the compiler.

This article describes the application of Acovea to the analysis of the GNU C compiler optimization flags; details about the underlying genetic algorithm can be found in the companion article, A Description of the Evolutionary Algorithm.

 

Modern software is difficult to understand and verify by traditional means. Millions of lines of code produce applications containing intricate interactions, defying simple description or brute-force investigation. A guided, deterministic approach to testing relies on human testers to envision every possible combination of actions -- an unrealistic proposition given software complexity. Yet, despite that complexity, we need answers to important questions about modern, large-scale software.

What sort of important questions? Consider the GNU Compiler Collection. I write articles that benchmark code generation, a task fraught with difficulties due to the myriad options provided by different compilers. For my benchmarks to have any meaning, I need to know which combination of options produces the fastest code for a given application.

Finding the "best" set of options sounds like a simple task, given the extent of GCC documentation and the conventional wisdom of the GCC developer community. Ah, if it were only so easy! The GCC documentation, while extensive, is also honestly imprecise. I appreciate this style of documentation; unlike many commercial vendors, who make absolute statements about the "quality" of their products, GCC's documenters admit uncertainties in how various options alter code generation. Indeed, code generation is entirely dependent on the type of application being compiled and the target platform. An option that produces fast executable code for one program may be detrimental to the performance of another.

"Conventional wisdom" arrives in my inbox whenever I publish a new article. Ranging from the polite to the insistent to the rude, these e-mails contain contradictory suggestions for producing fast code. In the vast majority of cases, such anecdotal assertions lack any formal proof of their validity, and, more often than not, the suggested "improvement" is ineffective or detrimental. It has become increasingly obvious that no one --myself included -- knows precisely how all these GCC options work together in generating program code.

I seek the Holy Grail of Optimization -- but exactly what is optimization? Understanding the problem is the first step in finding a solution.

What is Optimization?

Optimization attempts to produce the "best" machine code from source code. "Best" means different things to different applications; a database shovels chunks of information, while a scientific application is concerned with fast and accurate results; for an embedded system, the first concern may be code size. And it is quite possible for small code to be fast, or for fast code to be accurate. Optimization is far from being an exact science, given the diversity of hardware and software configurations.

An optimization algorithm may be as simple as removing a loop invariant, or as complex as examining an entire program to eliminate global common sub-expressions. Many optimizations change what the programmer wrote into a more efficient form, producing the same result while altering underlying details for efficiency; other "optimizations" produce code that uses specific characteristics of the underlying hardware, such as special instruction sets.
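
As a hand-written illustration of the idea (my own example, not GCC's output): the first function below recomputes a loop-invariant value on every iteration; hoisting it out, as in the second, is exactly the transformation a loop-invariant removal pass aims to perform when it can prove the value does not change:

    #include <stddef.h>
    #include <string.h>
    #include <ctype.h>

    /* Before optimization: strlen(s) is recomputed on every pass
       through the loop, even though s never changes inside it. */
    void to_upper_naive(char *s)
    {
        for (size_t i = 0; i < strlen(s); ++i)
            s[i] = (char)toupper((unsigned char)s[i]);
    }

    /* After loop-invariant removal: the invariant computation is
       hoisted out of the loop and evaluated only once. */
    void to_upper_hoisted(char *s)
    {
        size_t len = strlen(s);   /* loop invariant, computed once */

        for (size_t i = 0; i < len; ++i)
            s[i] = (char)toupper((unsigned char)s[i]);
    }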

Memory architectures, pipelines, on- and off-chip caches -- all affect code performance in ways that are not obvious to programmers using a high-level language. An optimization that may seem to produce faster code may, in fact, create large code that causes more cache misses, thus degrading performance.

Even the best hand-tuned C code contains areas of interpretation; there is no absolute, one-to-one correspondence between C statements and machine instructions. Almost any sequence of source code can be compiled into different -- but functionally equivalent -- machine instruction streams with different sizes and performance characteristics. Inlining functions is a classic example of this phenomenon: replacing a call to a function with the function code itself may produce a faster program, but may also increase program size. Increased program size may, in turn, prevent an algorithm from fitting inside high-speed cache memory, thus slowing a program due to cache misses. Notice my use of the weasel word "may" -- inlining small functions sometimes allows other optimization algorithms a chance to further improve code for local conditions, producing faster and smaller code.
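
A small, hand-written example of the trade-off (mine, not from Acovea's benchmarks): marking square() as inline invites GCC to substitute the function body at each call site, trading a call/return for potentially larger code:

    #include <stddef.h>

    /* 'static inline' invites the compiler to substitute the function
       body at each call site; GCC may also inline small functions on
       its own at higher optimization levels (e.g., -finline-functions). */
    static inline double square(double x) { return x * x; }

    double sum_of_squares(const double *v, size_t n)
    {
        double sum = 0.0;

        for (size_t i = 0; i < n; ++i)
            sum += square(v[i]);   /* likely compiled as v[i] * v[i] */

        return sum;
    }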

Optimization is not simple or obvious, and combinations of algorithms can lead to unexpected results. Which brings me back to the question: For any given application, what are the most effective optimization options?

And the answer is... inscrutable

The GNU C compiler supports hundreds of options, dozens of which influence code generation on any given architecture. While the GCC documentation gives a short description of each option, it says little (if anything) about how option -mAAA affects option -mBBB, nor are all side effects listed.

Are the settings for -O1, -O2, and -O3 reasonable, or should they be changed? The only way to find an empirical answer is to run another set of tests with various combinations of individual optimization algorithms. Even if we assume that -O3 is the "optimal" set of options, which of the experimental and target-specific options will be most effective? Again, the only way to find out is trial and error, a tedious process for humans. It seems to me that most people guess -- since they're using a Pentium 4, they assume that -O3 -mfpmath=sse will generate faster code than will a plain -O3. I've made those sorts of guesses in the past, but they haven't always been correct. Acovea turns guessing into an empirical test.
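
Testing even a single guess of that kind by hand means building and timing each variant separately; for example (commands shown with the huffbench.c benchmark used later in this article):

    gcc -O3 -o huffbench huffbench.c
    ./huffbench

    gcc -O3 -march=pentium4 -mfpmath=sse -o huffbench-sse huffbench.c
    ./huffbench-sse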

GCC supports different switch sets for different languages and processors; in the case of the Pentium 4, for example, more than 60 different switches affect code optimization and generation. Testing every possible combination of those switches is simply impractical: 60 on-off switches alone yield 2^60 -- more than a quintillion -- combinations. Even at the unattainable theoretical speed of one test per second, running the complete set of possibilities would take over 36 billion years!

My computer, GCC (and I!) will be quite obsolete before such an experiment could be finished. Clearly, a brute-force approach is out of the question. And it isn't possible to test options in isolation, either; optimization algorithms act on each other, requiring examination of different combinations. I might be able to make some educated guesses about specific options, but it is highly unlikely that I can predict the consequences of complex option interactions. 

This complexity is inherent in the problem domain. At times like these, I consider using a genetic algorithm.

A What?

I've been using genetic algorithms (GA) for over a decade in various ways, and even wrote a book about them way back in 1996. You'll find GAs being applied across the computing spectrum, from network optimization to the design of aircraft wings. For the most part, applications of GAs tend to be rather specialized; they are only rarely seen in more pedestrian, general-purpose applications.

A particular strength of genetic algorithms is their ability to search large solution spaces. Clearly, a quintillion possibilities can be considered "large", and a GA looked to be a natural tool for satiating my curiosity. I've recently begun using GAs in experiments that test software reliability and performance; the GCC "option" question provided an excellent opportunity to put my theories to work on a practical problem.

I'm not going to introduce genetic algorithms in this article; details about the underlying genetic algorithm can be found in the companion article, Acovea: Analysis of Compiler Options via Genetic Algorithm. In a nutshell, a genetic algorithm is an iterative process based loosely on biological evolution through natural selection. A population of "organisms" is tested against a given problem and assigned a "fitness" value based on a level of success; a new generation of "organisms" is then generated, with higher-fitness "organisms" having the best chance of "breeding." Mutations may be introduced into members of the new population. Then the process repeats, until a specific number of generations are run or a desired outcome is reached. If you have difficulty understanding the mechanism of an evolutionary algorithm, read Melanie Mitchell's excellent book, An Introduction to Genetic Algorithms (MIT, 1996).

The C++ code for this article is based on the latest version of my Evocosm class library.

The Acovea GA itself is straightforward: the initial population is built from random sets of compiler options (organisms), as provided by a compiler object. As of Acovea 4.0.0, a compiler is defined by an XML file; the "mechanics" section below contains more detail on how this works.

In each generation, gccacovea compiles an executable image for each set of options. When run, the benchmarks time their critical loop internally using clock_gettime, piping the result back to gccacovea as their "fitness". Smaller fitnesses (faster run times) are "more fit", and more likely to reproduce; mutation and migration between populations introduce variation to prevent populations from stagnating. In essence, finding the best options for GCC is a classic minimization problem.
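
The following is a heavily simplified sketch of that minimization loop -- an illustration, not Acovea's actual implementation. It represents each organism as a hypothetical 64-bit mask of on/off option flags, uses a stand-in run_benchmark() for Acovea's compile-and-run step, and omits crossover and migration for brevity:

    #include <stdint.h>
    #include <stdlib.h>

    #define POP_SIZE      40
    #define GENERATIONS   20
    #define MUTATION_RATE 0.01

    /* stand-in for Acovea's compile-and-run step: build the benchmark
       with the options encoded in option_mask, run it, return its time */
    extern double run_benchmark(uint64_t option_mask);

    /* pick two organisms at random; the faster (smaller fitness) wins */
    static uint64_t tournament(const uint64_t *pop, const double *fit)
    {
        int a = rand() % POP_SIZE;
        int b = rand() % POP_SIZE;
        return (fit[a] < fit[b]) ? pop[a] : pop[b];
    }

    uint64_t evolve(void)
    {
        uint64_t pop[POP_SIZE], next[POP_SIZE];
        double   fit[POP_SIZE];
        uint64_t best = 0;
        double   best_fit = 1e300;

        /* initial population: random option sets */
        for (int i = 0; i < POP_SIZE; ++i)
            pop[i] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();

        for (int g = 0; g < GENERATIONS; ++g)
        {
            /* evaluate: a smaller run time means a fitter organism */
            for (int i = 0; i < POP_SIZE; ++i)
            {
                fit[i] = run_benchmark(pop[i]);

                if (fit[i] < best_fit)
                {
                    best_fit = fit[i];
                    best     = pop[i];
                }
            }

            /* breed the next generation by tournament selection,
               flipping the occasional option bit as a mutation */
            for (int i = 0; i < POP_SIZE; ++i)
            {
                uint64_t child = tournament(pop, fit);

                for (int bit = 0; bit < 64; ++bit)
                    if ((double)rand() / RAND_MAX < MUTATION_RATE)
                        child ^= (uint64_t)1 << bit;

                next[i] = child;
            }

            for (int i = 0; i < POP_SIZE; ++i)
                pop[i] = next[i];
        }

        return best;   /* the best option set found */
    }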

Building Acovea

Configuration and building proceed along the usual lines:

    ./configure
    make
    make install

To build Acovea, you will also need the Coyotl and Evocosm libraries.

The installation creates a command-line program, runacovea, installed by default to /usr/local/bin. Running this program without arguments will display a command summary.

If you're more inclined toward GUI applications, try Acovea-GTK.

For parsing the compiler configuration files, which are XML, Acovea uses the expat library. Your distribution likely includes expat; if not, its home page is http://expat.sourceforge.net/.

The installation directories can be changed using the --prefix option when invoking configure.

To run Acovea, use a command of the form:

    runacovea -config gcc34_opteron.acovea -input huffbench.c

XML Compiler Configurations

The configuration file is an XML document that sets the default and evolvable options and commands for a given compiler and platform. For example, the gcc34_opteron.acovea configuration is for GCC version 3.4 running on a 64-bit AMD Opteron system.

A sample set of GCC C compiler configuration files is located (by default) in /usr/local/share/libacovea/config. If the configuration file cannot be found under the explicit name given, Acovea will attempt to locate it in the installation directory above.

I provide gcc and g++ configurations for Pentium 3, Pentium 4, and Opteron processors using GCC 3.4. As I develop more configuration files (or receive them from other users), I'll post additional .acovea files on the web site. Note that Acovea 5.0 uses a new configuration format.

Acovea-style Benchmarks

The benchmark file is a program in a language appropriate for the chosen configuration file; it must write its fitness value to standard output, where it will be read via pipe by the Acovea framework.
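
A minimal skeleton of such a benchmark might look like this (an illustration of the pattern, not one of the distributed benchmarks; on older glibc versions, link with -lrt for clock_gettime):

    #include <stdio.h>
    #include <time.h>

    static void critical_loop(void)
    {
        /* the algorithm under test goes here; 'volatile' keeps the
           compiler from optimizing this placeholder loop away */
        volatile double x = 0.0;

        for (long i = 0; i < 100000000L; ++i)
            x += (double)i * 0.5;
    }

    int main(void)
    {
        struct timespec start, stop;

        clock_gettime(CLOCK_MONOTONIC, &start);
        critical_loop();
        clock_gettime(CLOCK_MONOTONIC, &stop);

        double elapsed = (stop.tv_sec - start.tv_sec)
                       + (stop.tv_nsec - start.tv_nsec) / 1e9;

        /* Acovea reads this value via pipe as the organism's fitness;
           smaller (faster) is better */
        printf("%f\n", elapsed);
        return 0;
    }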

A sample set of C benchmark programs is located (by default) in /usr/local/share/acovea/benchmarks.

Acovea was designed with "spikes" in mind -- short programs that implement a limited algorithm that runs quickly (2-10 seconds per run). Using the default evolutionary settings, Acovea will perform 4,000 compiles and runs of the benchmark; even with a very small and fast benchmark, this can take several hours. If your program takes a full minute to compile, and another minute to run, Acovea will require 8,000 minutes, or more than 5.5 days to evolve optimal option sets!

Results from running Acovea against a few algorithms should NOT be applied across a broad spectrum of applications. For my Gentoo-based systems, I don't set the value of make.conf's CFLAGS based on Acovea results; I build specific, time-critical applications using algorithm-specific options.

However, due to popular demand, Acovea 5.0 now supports Makefiles.

About Analysis

As the genetic algorithm cycles through the generations, it refines the best set of options through natural selection; options that produce fast code will occur more often, while adverse options will tend to be winnowed away. Options that have no influence on code speed will occur, due to randomness, about 50% of the time. We can't simply say that the "best" flags are those enabled by the best organism, because its option set might include "null" options that don't affect performance.

So, how do we tell which options are truly important amid the noise of "null" options?

As it runs, Acovea counts the number of times an option is enabled by the best chromosome in each generation of each population. By default, Acovea evolves five populations of forty individuals for twenty generations; if an option is enabled by the best chromosome in every generation of every population, it will accumulate a count of 100 (five populations times twenty generations). The higher the count, the more often the option was enabled, and the more important it is for producing fast code on the given benchmark. Conversely, an option that is detrimental will appear very few times (or not at all), while neutral options (those that have no effect on code speed) should occur about half the time (because it doesn't matter whether they are enabled or not).

When Acovea finishes a test run, it displays optimistic and pessimistic options, a set of suggested solutions, and a small graph comparing the performance of those suggestions with "baseline" tests. For example, when benchmarking a compiler, Acovea compares its solutions against compiles performed with -O1, -O2, and -O3. A cute little text graph is generated showing relative performance. A future version of Acovea-GTK will provide visual eye-candy.

To determine "optimistic" and "pessimistic" options, I take the final totals and calculate a "z score" for each option. The z score measures the distance of a value from a population's average, in units of standard deviation. If an option's z score is greater than 1, I assume it may be beneficial, while a z score of -1 or less indicates a detrimental option. In general, larger runs (more, bigger populations run over longer periods of time) will accentuate the differences between "good" and "bad" options, with some attaining z scores of 3 or more.
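
In code, the statistic is simple to compute (a sketch of the calculation described above, not Acovea's source):

    #include <math.h>

    /* Given the final enable-counts for n options, compute each option's
       z score: its distance from the mean count, in units of standard
       deviation.  Assumes a nonzero variance across the counts. */
    void z_scores(const double *counts, double *z, int n)
    {
        double mean = 0.0;
        double var  = 0.0;

        for (int i = 0; i < n; ++i)
            mean += counts[i];
        mean /= n;

        for (int i = 0; i < n; ++i)
            var += (counts[i] - mean) * (counts[i] - mean);

        double stddev = sqrt(var / n);

        for (int i = 0; i < n; ++i)
            z[i] = (counts[i] - mean) / stddev;   /* z > 1: likely beneficial;
                                                     z < -1: likely detrimental */
    }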

When analyzing Acovea runs, I look at three aspects:

  • Which optimization level produces the best code? For example, does -O3 provide any real benefit over -O1 and -O2?
  • Can Acovea find a set of options that is better than the existing "predefined" settings of -O1, -O2, and -O3?
  • Which options does Acovea identify as important in producing the fastest code for a given algorithm?



