The Official Google Blog - Insights from Googlers into our products, technology and the Google culture

Goodbye to Randy Pausch, a great teacher

7/26/2008 10:25:00 AM
Randy Pausch, a professor of computer science at Carnegie Mellon University and a good friend of Google, passed away last night. In addition to being recognized as a pioneer in virtual reality research, he became widely known as a gifted teacher and a mentor to many. Millions of people saw his inspiring "Last Lecture" on YouTube. Read more about Randy and his contributions on our Research Blog.

Ragogmakan (Google) goes to the Amazon

7/25/2008 05:22:00 PM
Last month, a group of Googlers traveled to Brazil, to conduct our first-ever project in the Amazon. Organized by our Google Earth Outreach team, we went at the special invitation of Amazon Chief Almir Naramayoga Surui, who'd invited us down to train his people on using Google Earth, YouTube, blogs and other Internet tools in order to preserve their history and culture, protect their rainforest, and create a sustainable future for their tribe.

This was an unusual request, especially because until recently, the Surui Indians used stone tools and hunted and fished with bows and arrows. But as we considered this request, we realized that it was very much within the mission of Google Earth Outreach, which helps people around the world learn how to use Google Earth and Maps for public benefit. We had previously collaborated with the U.S. Holocaust Memorial Museum to map destroyed villages in Darfur, with UNHCR to show "A Refugee's Life", with Appalachian Voices to illustrate mountaintop removal coal-mining, and with the Jane Goodall Institute to follow chimpanzees in Tanzania. Maybe, we thought, it was time to go to the Amazon.

"New Technologies and Indigenous Peoples" - the logo
created by the Surui for our partnership


We learned from Chief Almir that just as the Amazon rainforest is disappearing at an alarming rate, so too are the indigenous peoples who live there. This loss of biological and cultural diversity, of natural resources, habitats and human beings, has profound consequences both locally and globally. Al Gore has called the Amazon rainforest "the lungs of the planet" for the vital role it plays in consuming carbon dioxide and producing oxygen for all of us to breathe. Chief Almir explained that his tribe had already begun replanting thousands of hectares of their forest which had been illegally logged by outsiders. He hopes that through this project, they will be able to participate in the emerging carbon offset marketplace. And he wants to use Google Earth, YouTube and blogs to give the world a virtual tour of these projects, to raise awareness, and educate other tribes in how to do the same thing.

So we spent several months preparing special training materials. We partnered closely with the Amazon Conservation Team, who had previously taught the Surui how to GPS-locate the significant sites that they now wanted to map in full 3D in Google Earth. Along the way, we found that many people asked us these questions: "So why is Google going to the Amazon?" "Why are you trying to train Indians?" "Won't technology harm their culture?" "Are Amazon Indians even capable of learning to use the Internet?"

Without giving away too much of the story, the answer to the last question is YES. During the trainings, we were moved to see how committed the young Surui students were to learning everything they possibly could. Their first two web searches were "Povos Indigenas do Brasil" (Indigenous peoples of Brazil) and "Desmatamento Amazonia" (Deforestation of the Amazon). They succeeded in importing their cultural map into Google Earth (see image), as the starting point for their virtual tour. They showed their warrior spirit in their very first YouTube video. They began building a Google Site. All of these are now works in progress, and when they are ready to release to the world, we expect that they will be unlike anything anyone has seen before.

The Surui call Google "ragogmakan", or "messenger", because they are using our tools to get their message out. Although we traveled to the Amazon rainforest expecting to be the teachers, there are lessons for all of us in the story of the Surui. As they engage with the modern world, they are making choices about what to adopt, adapt or reject. If we pay attention, we may have as much to learn from them as they from us.

Read more on the Lat Long blog, and experience the story of our trip on the Google Earth Outreach site.

Surui cultural map

We knew the web was big...

7/25/2008 10:12:00 AM
We've known it for a long time: the web is big. The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. Over the last eight years, we've seen a lot of big numbers about how much content is really out there. Recently, even our search engineers stopped in awe about just how big the web is these days -- when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!

How do we find all those pages? We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links. In fact, we found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
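
The process described above is essentially a breadth-first traversal of the link graph followed by removal of exact duplicates. Here is a minimal sketch of that idea, assuming a tiny in-memory "web". The `TOY_WEB` dictionary, its example URLs, and the `crawl` function are hypothetical illustrations, not Google's crawler, and hashing page content is just one simple way to detect exact duplicates:

```python
from collections import deque
import hashlib

# Hypothetical toy web: URL -> (page content, outgoing links).
TOY_WEB = {
    "http://a.example/": ("home", ["http://a.example/x", "http://a.example/x?ref=1"]),
    "http://a.example/x": ("article", ["http://b.example/"]),
    # Same content as the page above, reachable under a different URL.
    "http://a.example/x?ref=1": ("article", ["http://b.example/"]),
    "http://b.example/": ("other site", ["http://a.example/"]),
}


def crawl(seeds, web):
    """Breadth-first crawl from the seed URLs.

    Returns (all URLs discovered, URLs of pages with distinct content).
    """
    seen_urls = set(seeds)
    seen_content = set()
    unique_pages = []
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # link points outside the toy web
        content, links = web[url]
        # Exact-duplicate detection: identical content gives an identical digest.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen_content:
            seen_content.add(digest)
            unique_pages.append(url)
        # Follow links to pages we have not queued before.
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return seen_urls, unique_pages


urls, pages = crawl(["http://a.example/"], TOY_WEB)
print(len(urls), "URLs found,", len(pages), "unique pages")
```

In this toy example the crawl discovers four URLs but only three distinct pages, because two URLs serve exactly the same content.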

So how many unique pages does the web really contain? We don't know; we don't have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite -- for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page. We're not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer.

We don't index every one of those trillion pages -- many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers. But we're proud to have the most comprehensive index of any search engine, and our goal has always been to index all the world's data.

To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.
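
For readers curious what "computing PageRank" on a link graph involves, here is a minimal sketch of the textbook power-iteration formulation. This post does not describe the actual computation, so treat everything here as an assumption for illustration: the damping factor of 0.85, the iteration count, the `pagerank` function, and the four-page `GRAPH` are all hypothetical, and this is not Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to; every link
    target must itself be a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page link graph.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(GRAPH)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects links from every other page and ends up with the highest score, while page D, which nothing links to, ends up with the lowest.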

As you can see, our distributed infrastructure allows applications to efficiently traverse a link graph with many trillions of connections, or quickly sort petabytes of data, just to prepare to answer the most important question: your next Google search.

Knol is open to everyone

7/23/2008 10:31:00 AM
A few months ago we announced that we were testing a new product called Knol. Knols are authoritative articles about specific topics, written by people who know about those subjects. Today, we're making Knol available to everyone.

The web contains vast amounts of information, but not everything worth knowing is on the web. An enormous amount of information resides in people's heads: millions of people know useful things and billions more could benefit from that knowledge. Knol will encourage these people to contribute their knowledge online and make it accessible to everyone.

The key principle behind Knol is authorship. Every knol will have an author (or group of authors) who put their name behind their content. It's their knol, their voice, their opinion. We expect that there will be multiple knols on the same subject, and we think that is good.

With Knol, we are introducing a new method for authors to work together that we call "moderated collaboration." With this feature, any reader can make suggested edits to a knol which the author may then choose to accept, reject, or modify before these contributions become visible to the public. This allows authors to accept suggestions from everyone in the world while remaining in control of their content. After all, their name is associated with it!
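
The moderated collaboration workflow can be pictured as a small state machine: suggestions wait in a pending state, and only the author's decision changes what the public sees. Below is a minimal sketch under that reading. The `Knol` and `Suggestion` classes, their fields, and the method names are hypothetical and do not reflect Knol's actual implementation or API:

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    reader: str
    proposed_text: str
    status: str = "pending"  # becomes "accepted" or "rejected" after review


@dataclass
class Knol:
    author: str
    public_text: str
    suggestions: list = field(default_factory=list)

    def suggest(self, reader, proposed_text):
        """Any reader may propose an edit; it is not yet publicly visible."""
        suggestion = Suggestion(reader, proposed_text)
        self.suggestions.append(suggestion)
        return suggestion

    def review(self, reviewer, suggestion, decision, modified_text=None):
        """Only the author decides; accepting may use a modified version."""
        if reviewer != self.author:
            raise PermissionError("only the author can review suggestions")
        if decision == "accept":
            if modified_text is not None:
                self.public_text = modified_text
            else:
                self.public_text = suggestion.proposed_text
            suggestion.status = "accepted"
        elif decision == "reject":
            suggestion.status = "rejected"
        else:
            raise ValueError("decision must be 'accept' or 'reject'")


knol = Knol(author="alice", public_text="Original text")
first = knol.suggest("bob", "Bob's version")
text_while_pending = knol.public_text  # still the author's original text
knol.review("alice", first, "accept", modified_text="Bob's version, edited by Alice")
second = knol.suggest("carol", "Carol's version")
knol.review("alice", second, "reject")
print(knol.public_text)
```

The point of the design is that `public_text` changes only inside `review`, so a suggestion stays invisible to the public until the author acts on it.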

Knols include strong community tools which allow for many modes of interaction between readers and authors. People can submit comments, rate, or write a review of a knol. At the discretion of the author, a knol may include ads from our AdSense program. If an author chooses to include ads, Google will provide the author with a revenue share from the proceeds of those ad placements.

We are happy to announce an agreement with the New Yorker magazine which allows any author to add one cartoon per knol from the New Yorker's extensive cartoon repository. Cartoons are an effective (and fun) way to make your point, even on the most serious topics.

Everyone knows something. See what people are writing about, then tell the world what you know: knol.google.com

Hitting 40 languages

7/18/2008 07:01:00 AM
One of our goals is to give everyone using Google the information they want, wherever they are, in whatever language they speak, and through whatever device they're using. A huge part of that goal is making our services available in as many languages as possible. And as I’m sure you can imagine, that isn't as simple as translating a few lines of text.

Take Hebrew or Arabic, which are written from right to left. An Arabic speaker may search for [world cup football 2008] [كأس العالم 2008 لكرة القدم]. Part of the query will be written from right to left in Arabic, while the numbers will be written left to right. Sometimes the right-to-left difference can mean having to change the entire layout of a page, as with Gmail.
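
To see the mixed directions in the example query, one can look at the Unicode bidirectional class of each character, which Python exposes through the standard `unicodedata.bidirectional` function. The sketch below is a deliberate simplification and not the full Unicode Bidirectional Algorithm: the `direction_runs` helper and its grouping of classes into "RTL" and "LTR" are assumptions made for illustration.

```python
import unicodedata

# Simplified grouping of Unicode bidirectional classes (an assumption for
# this sketch; the real Unicode Bidirectional Algorithm has many more rules).
RTL_CLASSES = {"R", "AL"}        # right-to-left letters, Arabic letters
LTR_CLASSES = {"L", "EN", "AN"}  # left-to-right letters and digits


def direction_runs(text):
    """Split text into consecutive runs labelled "RTL" or "LTR"."""
    runs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            direction = "RTL"
        elif bidi in LTR_CLASSES:
            direction = "LTR"
        else:
            direction = None  # spaces, punctuation and marks are neutral here
        if direction is None:
            if runs:
                runs[-1][1] += ch  # keep neutral characters with the current run
            continue
        if runs and runs[-1][0] == direction:
            runs[-1][1] += ch
        else:
            runs.append([direction, ch])
    return [(d, s.strip()) for d, s in runs]


query = "كأس العالم 2008 لكرة القدم"
runs = direction_runs(query)
for direction, run_text in runs:
    print(direction, repr(run_text))
```

For this query the helper reports a right-to-left run, then the left-to-right run 2008, then another right-to-left run, which is the mix a search box and results page have to lay out correctly.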

Or take Russian, where words change depending on their placement and role in a sentence. In Russian, for example, [pizza in Moscow] is [пицца в Москве] but [pizza near Moscow] is [пицца рядом с Москвой].

Then there's the whole challenge of ensuring that results are locally relevant. While many Australians searching for [freedom] are looking for the Australian furniture chain, UK and US users are often looking for the definition of the word itself. Our search results, then, have to take into account these local differences.

Our efforts to make Google products available in as many languages as possible date to 2001, when we started Google in Your Language, which lets volunteers translate and edit translations of Google products in their native languages.

As more and more users, advertisers, and partners interact with Google across the world, the need for local products has become even more obvious. In 2007, we undertook a company-wide initiative to increase the availability of our products in multiple languages. We picked the 40 languages read by over 98% of Internet users and got going, relying heavily on open source libraries such as ICU and other internationalization technologies to design products. Do you need web search in Chinese or AdWords online support in Spanish? Perhaps Google News in Hindi or Google Scholar in Korean? Not a problem.

Here's a taste of how far we've come.

Growth in local language versions.
  • 30 in 30: Today we have more than 30 products in more than 30 languages, up from 5 products in 30 languages just a year ago.
  • In 2004, we had 150 local-language versions of various products (e.g. a product local to the UK, not just the English-speaking world); today we're at more than 1500.
  • From January to March of 2008, we launched 256 local-language versions of various products, compared to 55 in the same period of 2007.
  • We've upgraded to Unicode 5.1 to make sure that we can handle any characters people read or write in.
The web is only useful - or utile, 便利, pożyteczny, or nyttig, depending on what language you speak - to the degree it can be accessible in your language. That's why we're so excited about how far we've come - and why we know there's still a lot of work to be done.

Celebrating young computer scientists

7/17/2008 01:30:00 PM
Last week, we hosted the ten grand prize winners for the first Google Highly Open Participation Contest, our initiative to get pre-university students involved in open source development. We were very excited to welcome these burgeoning computer scientists and their families to Silicon Valley in a celebration of their many accomplishments.


Our grand prize winners and the Open Source team

Chosen from more than 350 students worldwide, our winners created software, documentation and marketing materials for ten different open source projects, getting all this work accomplished in just over two months. For more details, including interviews with the winners and their mentors, check out the Google Open Source Blog.

Introducing our European 2008 Anita Borg Scholars

7/17/2008 10:25:00 AM


A few months ago we had the great pleasure of announcing the fifth class of Anita Borg Scholars in the U.S. and our first class of Scholars in Canada. Now it's the Europeans' turn.

This scholarship program, originally established in the U.S. to honor the work of Anita Borg and to recognize outstanding young women scholars in computer science and related fields, expanded to Europe most recently. Nearly 300 undergraduate and graduate students from more than 31 countries applied for the award. Sixty-three finalists were selected; 20 women received a €5,000 scholarship for the 2008-2009 academic year. The remaining 43 finalists received a €1,000 award.

Each of the finalists visited our Engineering Centre in Zurich for our annual Scholars' Retreat, which included tech talks, career panels and social fun. All of it was a way for the young women to share experiences and come together as leaders in the computer science field.

Visit the Google Europe Anita Borg Scholarship page for more on the program. Hearty congratulations to these winners!

The 2008 Europe Anita Borg Scholars
  • Cynthia Liem - Delft University of Technology, The Netherlands
  • Despina Michael - University of Cyprus, Cyprus
  • Dina Petri - University of Reading, UK; Aristotle University, Greece; Universidad Carlos III de Madrid, Spain
  • Inbal Talgam - Weizmann Institute of Science, Israel
  • Katy Howland - University of Sussex, UK
  • Kerstin Wendt - Universitat Autònoma de Barcelona, Spain
  • Ksenia Rogova - Petrozavodsk State University, Russia
  • Mirela Ben-Chen - Technion - Israel Institute of Technology, Israel
  • Nadezhda Baldina - Moscow Institute of Electronic Technology, Russia
  • Olga Boronenko - University of Reading, UK; Aristotle University, Greece; Universidad Carlos III de Madrid, Spain
  • Patricia Moore - Dublin City University, Ireland
  • Rebecca Stewart - Queen Mary, University of London, UK
  • Sara Elisabeth Adams - University of Oxford, UK
  • Seda Gürses - Katholieke Universiteit Leuven, Belgium
  • Silvia Breu - University of Cambridge, UK
  • Siska Fitrianie - Delft University of Technology, The Netherlands
  • Stefanie Jegelka - Max Planck Institute for Biological Cybernetics, Tuebingen, Germany
  • Svetlana Obraztsova - Steklov Institute of Mathematics, Russia
  • Sylvia Rueda - University of Nottingham, UK
  • Ulyana Tikhonova - Saint Petersburg State Polytechnical University, Russia

Update: Added photo.

Templates bring Docs to life

7/16/2008 05:37:00 PM
What do wedding planners, gas mileage calculators and photo albums have in common? They're all examples of templates available in the Google Docs Template Gallery that Sarah Beth Eisinger (Docs Templates engineer), Grant Dasher (intern), and I built and (happily!) released today.

When researching how people use templates, we saw that lots of you create documents for all aspects of your lives. You need resumes and cover letters to look for jobs and fax cover letters and invoices to run your businesses. And of course you want to use documents in fun ways with family and friends, such as unique designs and layouts for invitation cards and calendars. Finally, everyone wants to be able to have tools that "just work": print mailing labels, track portfolio values, and manage projects without having to painstakingly create documents from scratch.

These needs inspired our new templates and gallery. We developed these in conjunction with Avery Dennison, Vertex42.com, TemplateZone, and Visa Business.

Many templates leverage the collaborative aspect of Google Docs so that several people can work on a single document online without having to email attachments back and forth. To hear the story behind two templates, watch these videos:





To get started, go directly to the template gallery or access it from the "New" menu in your document list. Templates are currently available only in English, but other languages are coming soon. They're also available to Google Apps users.




(Cross-posted on the Google Docs Blog.)

Technologies behind Google ranking

7/16/2008 10:53:00 AM
In my previous post, I introduced the philosophies behind Google ranking. As part of our effort to discuss search quality, I want to tell you more about the technologies behind our ranking. The core technology in our ranking system comes from the academic field of Information Retrieval (IR). The IR community has studied search for almost 50 years. It uses statistical signals of word salience, like word frequency, to rank pages. (See "Modern Information Retrieval: A Brief Overview" for a quick overview of IR technology.) IR gave us a solid foundation, and we have built a tremendous system on top using links, page structure, and many other such innovations.

Search in the last decade has moved from "give me what I said" to "give me what I want". User expectations from search have rightly increased. We work hard to fulfill the expectations of each and every user, and to do that we need to better understand the pages, the queries, and our users. Over the last decade we have pushed the technologies for understanding these three components (of the search process) to completely new dimensions.

When we talk about queries at Google, we use square brackets [ ] to mark the beginning and end of queries (see "How to write queries" by Matt Cutts), a notation I will use throughout this post. (Pages and search results change frequently, so in time, some examples used here may not behave as explained.)
  • Understanding pages: Over the years we have invested heavily in our crawl and indexing system. As a result we have a very large and very fresh index. In addition to size and freshness, we have improved our index in other ways. One of the key technologies we have developed to understand pages is associating important concepts to a page even when they are not obvious on the page. We find the official homepage for Sprovieri Gallery in London for the Italian query [galleria sprovieri londra], even though the official page does not have either London or Londra on it. In the U.S., a user searching for [cool tech pc vancouver, wa] finds the homepage www.cooltechpc.com even though the page does not mention anywhere that they are in Vancouver, WA. Other technologies we have developed include distinctions between important and less important words in the page and the freshness of the information on the page.
  • Understanding queries: It is critical that we understand what our users are looking for (beyond just the few words in their query). We have made several notable advances in this area including a best-in-class spelling suggestion system, an advanced synonyms system, and a very strong concept analysis system.
Most users have used our spelling suggestion system at one time or another. It knows that someone searching for [kofee annan] is really searching for Mr. Kofi Annan, and prompts them with Did you mean: kofi annan; whereas someone searching for [kofee beans] is actually looking for coffee beans. Doing this internationally with very high accuracy is hard, and we do it well.

Synonyms are the foundation of our query understanding work. This is one of the hardest problems we are solving at Google. Though sometimes obvious to humans, it is an unsolved problem in automatic language processing. As a user, I don't want to think too much about what words I should use in my queries. Often I don't even know what the right words are. This is where our synonyms system comes into action. Our synonyms system can do sophisticated query modifications, e.g., it knows that the word 'Dr' in the query [Dr Zhivago] stands for Doctor whereas in [Rodeo Dr] it means Drive. A user looking for [back bumper repair] gets results about rear bumper repair. For [Ramstein ab], we automatically look for Ramstein Air Base; for the query [b&b ab] we search for Bed and Breakfasts in Alberta, Canada. We have developed this level of query understanding for almost one hundred different languages, which is something I am truly proud of.

Another technology we use in our ranking system is concept identification. Identifying critical concepts in the query allows us to return much more relevant results. For example, our algorithms understand that in the query [new york times square church] the user is looking for the well-known church in Times Square and not for articles from the New York Times. We don't just stop at identifying concepts; we further enhance the query with the right concepts when, for instance, someone looking for [PC and its impact on people] is in fact looking for impact of computers on society, or someone who searches for [rainforest instructional activities for vocabulary] is really looking for rain forest lesson plans. Our query analysis algorithms have many such state-of-the-art techniques built into them, and once again, we do this internationally in almost every language we serve.
  • Understanding users: Our work on interpreting user intent is aimed at returning results people really want, not just what they said in their query. This work starts with a world class localization system, and adds to it our advanced personalization technology, and several other great strides we have made in interpreting user intent, e.g. Universal Search.
Our clear focus on "best locally relevant results served globally" is reflected in our work on localization. The same query typed in multiple countries may deserve completely different results. A user looking for [bank] in the US should get American banks, whereas a user in the UK is either looking for the Bank Fashion line or for British financial institutions. The results for this query should return local financial institutions in other English speaking countries like Australia, Canada, New Zealand, South Africa. The fun really starts when this query is typed in non-English-speaking countries like Egypt, Israel, Japan, Russia, Saudi Arabia, Switzerland. Likewise the query [football] refers to entirely different sports in Australia, the UK, and the US. These examples mostly show how we get the localized version of the same concept correctly (financial institution, sport, etc.). However, the same query can mean entirely different things in different countries. For example, [Côte d'Or] is a geographic region in France - but it is a large chocolate manufacturer in neighboring French-speaking Belgium; and yes, we get that right too :-).

Personalization is another strong feature in our search system which tailors search results to individual users. Users who are logged in while searching and have signed up for Web History get results that are more relevant for them than the general Google results. For example, someone who does a lot of football-related searches might get more football-related results for [giants], while other users might get results related to the baseball team. Similarly, if you tend to prefer results from a particular shopping site, you will be more likely to get results from that site when you search for products. Our evaluation shows that users who get personalized results find them to be more relevant than non-personalized results.

Another case of user intent can be observed for the query [chevrolet magnum]. Magnum is actually made by Dodge and not Chevrolet. So we present the results for Dodge Magnum with the prompt See results for: dodge magnum in our result set.

Our work on Universal Search is another example of how we interpret user intent to give them what they (sometimes) really want. Someone searching for [bangalore] not only gets the important web pages, they also get a map, a video showing street life, traffic, etc. in Bangalore -- watching this video I almost feel I am there :-) -- and at the time of writing there is relevant news and relevant blogs about Bangalore.
Finally, let me briefly mention the latest advance we have made in search: Cross Language Information Retrieval (CLIR). CLIR allows users to first discover information that is not in their language, and then using Google's translation technology, we make this information accessible. I call this advance: give me what I want in any language. A user looking for Tony Blair's biography in Russia who types the query in Russian [Тони Блэр биография] is prompted at the bottom of our results to search the English web.
Similarly a user searching for Disney movie songs in Egypt with the query [أغاني أفلام ديزني] is prompted to search the English web. We are very excited about CLIR as it truly brings us closer to our mission to organize the world's information and make it universally accessible and useful.

I could go on and on showing examples of state-of-the-art technology that we have developed to make our ranking system as good as it is, but the fact is that search is nowhere close to being a solved problem. Many queries still don't get satisfactory results from Google, and each such query is an opportunity to improve our ranking system. I am confident that with numerous techniques under development in our group, we will make large improvements to our ranking algorithms in the near future.
I hope my two posts about Google ranking have made it clear that we live and breathe search, and we are more passionate than ever about it. Our fervor for serving all our users worldwide is unprecedented. We pride ourselves in running a very good ranking system, and are working incredibly hard every day to make it even better.

Students surf their way to success

7/16/2008 12:02:00 AM
Early on in my career at Google I was approached by a former professor of mine, Jamie Murphy, who was eager to give his students hands-on exposure to online marketing. Apart from delivering a great learning experience, Jamie wanted to make sure that his students would leave university with skills they could take directly into the workforce.

Together, Jamie and I recruited a panel of professors from all over the globe and came up with the Google Online Marketing Challenge. Student teams had to identify a local business with a website, but no experience of online marketing, and then were given free Google AdWords vouchers worth the equivalent of US$200. The student teams worked with the local business to set up an AdWords account and structure an online marketing campaign which would increase web traffic and sales for the local business. The teams had three weeks to run the campaign and had to submit their campaign report to a panel of international academic judges.

Today we're announcing the winners: an innovative team from the University of Western Australia who worked with an indoor rock-climbing school that scaled the heights and scooped the global prize. The winners will be whisked off to Mountain View, California for a tour of the Googleplex and a meeting with the creators of AdWords and other executives. To help them in their ongoing studies, each team member will also receive an Apple MacBook Pro.


L to R: Dr. Fang Liu, Glen Linthorne (from the partner business),
Victor Tsen (hanging), Amy Smith, Aaron Balm, and Lauren Bobridge. Absent: Anna Usikov.

There were also three regional winners, including students from Pennsylvania State University, who won in the fierce Americas field, while a team from the Universität Bern in Switzerland beat some impressive competition to win for Europe, the Middle East and Africa. The Asia Pacific winners came from the Australian Graduate School of Management with a skillful campaign for a small specialty cake business (actually based in California).

These four teams were clearly deserving winners, but the enthusiasm all the students and professors showed for the challenge was inspiring. We initially expected slightly more than 1,000 students to take part, and were thrilled when c. 8,500 students from 47+ countries put their marketing skills to the test.

The success of the challenge and the positive feedback we've had from both professors and students were more than we had hoped for. As Dr. Fang Liu, who taught the winning team, notes:

"The Online Marketing Challenge offers a great opportunity for students to develop their skills and experience in online marketing. Local businesses also benefit as the AdWords campaigns have helped promote their business to a wider community. I feel absolutely thrilled that one of my student teams is the global winner."
We're delighted to have worked with professors to find a fun and innovative way to introduce online marketing into the university curriculum. And we're happy to say the Challenge will carry on next year, and we hope it will go from strength to strength. Here are more details on the Challenge and our winners.