First Annual Google Programming Contest

In celebration of more than three years of delivering the best search experience on the Internet, Google is sponsoring the first annual Google Programming Contest.

Grand Prize

The Challenge

Google is providing a selection of about 900,000 web pages in pre-parsed and raw format, together with a "ripper" program that provides a framework for processing the pre-parsed data. Your mission is to write a program (most likely by adding code to the ripper) that does something interesting with the data, in such a way that it would scale to a web-sized collection of documents. Part of your job is to convince us of why your program is interesting and why it will scale; other than that, you're free to implement whatever strikes your fancy.

We suggest you fit your entry in one of two different tracks: Systems or Applications.

1. Systems

Entries in the Systems track generally pertain to infrastructure for handling the data, where typical goals are systems-related (i.e., speed/space properties). Some examples of possible projects include:

2. Applications

Entries in the Applications track generally deal with the semantics of the data. Some examples include:

The supplied repository is several orders of magnitude smaller than the ultimate target repository for the code, because of the limitations of the distribution media and the likely resource constraints of many entrants. Keep this in mind when designing your implementation. You should assume that your code will ultimately run on a collection of networked machines with a reasonable amount of memory (~2-4 gigabytes each), where the data is divided among them. You will probably need to combine partial results from each machine to form a single final result.
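As an illustration of the "divide the data, then combine partial results" pattern described above, here is a minimal sketch in Python. It is not part of the contest materials: the function names, the use of a URL hash to assign pages to machines, and the choice of term counting as the per-machine task are all assumptions made for the example, and the "machines" are simulated as in-memory lists.

```python
from collections import Counter
import zlib


def shard_for(url, num_machines):
    """Pick a machine for a page using a stable hash of its URL (hypothetical scheme)."""
    return zlib.crc32(url.encode("utf-8")) % num_machines


def partial_counts(pages):
    """Per-machine step: count term occurrences in the locally held pages only."""
    counts = Counter()
    for _url, text in pages:
        counts.update(text.lower().split())
    return counts


def merge(partials):
    """Combine step: sum the per-machine partial results into one final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(pages, num_machines):
    """Simulate the whole pipeline: shard the pages, process each shard, merge."""
    shards = [[] for _ in range(num_machines)]
    for url, text in pages:
        shards[shard_for(url, num_machines)].append((url, text))
    return merge(partial_counts(shard) for shard in shards)
```

Because each per-machine result is a set of counts that can simply be added together, the final answer does not depend on how many machines the pages were divided among; a task whose partial results cannot be merged this cheaply would need a different combine step.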

The limited size of the repository being distributed and the selection of documents may preclude certain interesting kinds of document processing. This repository includes a selection of HTML Web pages from 100 different sites in the "edu" domain.

How to Enter

Read the Contest Rules located at the bottom of this page. By participating in this contest, you agree to be bound by these Contest Rules.

To get started, download the following files from our web site:

If you have the disk space and bandwidth available, you can alternatively download a somewhat larger test data set:

The README file contains links to the site where you can download the full set of 900,000 web pages on which to run your final program.

When you need the full data set but are unable to download it, you may request that we mail you the data on a set of five CDs. Note that CDs will not be mailed out until late February, so we strongly encourage you to start your development by downloading the code and sample data provided. E-mail your request for CDs, including a postal address, to

We provide source code in C++. You may also choose to write your code in Java or Python, in which case you are responsible for implementing any necessary interface code. Your submission must include a Makefile and README, and must compile on Linux 2.2 or 2.4 using g++ (for C++ code), standard Sun tools (for Java code), or Python version 2.2. If your code depends on third-party packages, you must include a complete list of all packages, including exact version information and download URLs. Sorry, we cannot accept entries that require commercial software or other software that is not provided as open source or under GPL.

  • Entries from outside the U.S. are permitted. All entries must include an English-language explanation of the design. Entries must also include an argument that the program will scale to 2 billion pages with reasonable runtime, as well as source code for the implementation. We strongly encourage you to include all data needed to support your claims, such as sample output from your program. Clear instructions and an easy-to-use demo program that allows experimentation with your system will also help.
  • Your entries must also include the names, e-mail addresses, and brief resumes (including postal addresses and telephone numbers) of everyone who contributed to the project.
  • Entries must be submitted in machine-readable format (gzipped tar file) via e-mail to
  • Entries will be accepted through midnight (PST) April 30, 2002.

You may submit multiple entries. Keep copies for your records. Google assumes no responsibility for lost, misdirected, illegible or late entries or for failed computer transmissions or technical failures.

Discussion Group

If you want to discuss ideas and problems related to the programming contest with other participants, visit the Google Groups programming contest newsgroup: google.public.programming-contest.


Winners will be selected by a panel of Google staff scientists. The judges will grade entries using the following criteria:

The judges shall have the sole authority and discretion to select the award recipient(s).

Contest Rules

To participate in the Google Programming Contest (the "Contest"), you must be at least 18 years old. The Contest is open to individuals or teams of up to 3 people, but not to corporate entries. Employees and contractors of Google, Inc. ("Google") and members of their immediate families are not eligible to enter. Void where prohibited.

With regard to the software and repository that you obtain for the Contest, you agree to the license terms as stated in files you download or receive. With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry.

If you are selected as a contest winner, you agree that Google may publicize your name, likeness, and the description of work you did to win the contest. Apart from the prizes associated with being selected as a winner, Google shall not be obligated to compensate you in any way for such publicity.

One $10,000 cash prize will be awarded to the winning entry. If the winning entry is submitted by more than one individual, the $10,000 cash prize will be divided equally among the participants who submit the winning entry. In addition, Google shall provide each member of the winning team a round-trip ticket for a commercial carrier flight to the San Francisco Bay Area, and will reimburse each member of the winning team for up to 3 nights' stay at a hotel to be designated by Google, Inc.

Each entrant shall indemnify, defend, and hold Google harmless from any third party claims arising from or related to that entrant's participation in the Contest. In no event shall Google be liable to an entrant for acts or omissions arising out of or related to the Contest or that entrant's participation in the Contest.

Odds of winning depend on the number and quality of entries received. All taxes, including income taxes, are the sole responsibility of winners. No prize substitution is permitted. Winner(s) may be required to verify their entry.


The winning entry will be announced on the site by Google Inc. on May 31, 2002. Following the announcement, individual winners will be notified by e-mail. Winners have 14 days from notification to claim the prize. Prize may be claimed by return e-mail. Unclaimed prizes will not be awarded.


Contact Google Inc. at

©2002 Google