GCC Resource Center
Department of Computer Science & Engg., IIT Bombay

Welcome to the GCC Resource Center at I.I.T. Bombay.

GCC is an acronym for GNU Compiler Collection. It is the de facto standard compiler generation framework for all GNU/Linux distributions and many other Unix variants on a wide variety of machines, and it is one of the most dominant pieces of software in the free software community. Although it follows an open, collaborative development methodology and its source code is available to all for inspection and modification, not much effort has gone into bridging the gap between the standard conceptual structure of a compiler and the GCC implementation.

This web site is an attempt to give you an insight into the ideas and concepts behind a practical, industrial-strength compiler. Visit the GCC Internals Documents page for more information about the internal structure and operation of GCC that we have uncovered. We also plan to have other activities at the center.

Interesting Aspects of GCC

Historically, GCC was one of the first projects of the Free Software Foundation (FSF), started to provide a free compiler for its GNU Operating System. It began as a C compiler, and the name stood for "GNU C Compiler" in the early days. Over the years, it has been continuously extended to support a number of back end machines. Similarly, on the front end side, it has grown to support languages such as C++, Objective-C, Java, and Fortran, to name a few. As a consequence, it has been renamed "GNU Compiler Collection". The current release can be downloaded from the official GCC website. Although it follows an open development model in which its source is available to all for inspection and modification, the GCC Steering Committee now guides the development of GCC.

Technically, GCC is a compiler generation framework which generates production-quality optimizing compilers from descriptions of target platforms. It supports a wide variety of source languages and target machines (including operating-system-specific variants) in a ready-to-deploy form. In addition, new machines can be added by describing their instruction set architectures and some other information (e.g. calling conventions).

On the one hand, the availability of the source code has encouraged numerous people to contribute to GCC and to produce stable and reliable compilers for a wide variety of machines. On the other hand, GCC has become a "hacker's paradise" as a consequence of multifarious influences. We have been working on GCC internals for some time, and we share our insights and make our expertise available via this site.

Finally, as a piece of free software of significant size and complexity, GCC is a challenge. For example, as of 2008, GCC supports six front end languages and over 30 back end machines. This results in a very large code base, and GCC has earned a reputation as one of the largest and most complex free software/open source projects. A current version, 4.2.3, is 365 MB in size! A rough line count of all the C source files (including header files) of just the compiler code (i.e. only the "gcc" directory), with the major block comments removed, is about 1,440,336 lines. That's over a million lines of pure code! And that does not include the code that describes the supported back end machines, or the code for other purposes such as the build system and libraries. Understanding such software requires substantial documentation of the internals, first at a sufficiently high level and then in detail.
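A rough line count of this kind can be approximated with a short script. The sketch below is only an illustration and is not the method used to obtain the figure quoted above: it assumes that "block comments" means C-style /* ... */ comments, that blank lines are not counted, and that only .c and .h files are considered. The function names count_code_lines and count_tree are hypothetical.

```python
import re
from pathlib import Path

# Naive pattern for C block comments; it does not understand string
# literals, so a "/*" inside a string would be treated as a comment start.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def count_code_lines(text: str) -> int:
    """Count non-blank lines left after removing /* ... */ comments."""
    stripped = BLOCK_COMMENT.sub("", text)
    return sum(1 for line in stripped.splitlines() if line.strip())


def count_tree(root: str) -> int:
    """Sum count_code_lines over every .c and .h file below root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".c", ".h"}:
            # errors="replace" keeps the scan going on odd encodings.
            total += count_code_lines(path.read_text(errors="replace"))
    return total
```

Calling count_tree on the "gcc" directory of an unpacked source tree would give a comparable estimate. The result depends on the counting rules chosen, so it should not be expected to reproduce the number above exactly.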

Last updated: July 27, 2008

Email: uday[AT]cse[AT]iitb[DOT]ac[DOT]in | © 2008, GCC Resource Center