The Official Google Blog - Insights from Googlers into our products, technology and the Google culture
Showing posts with label search quality. Show all posts

Giving you fresher, more recent search results

11/03/2011 08:19:00 AM
Search results, like warm cookies right out of the oven or cool refreshing fruit on a hot summer’s day, are best when they’re fresh. Even if you don’t specify it in your search, you probably want search results that are relevant and recent.

If I search for [olympics], I probably want information about next summer’s upcoming Olympics, not the 1900 Summer Olympics (the only time my favorite sport, cricket, was played). Google Search uses a freshness algorithm, designed to give you the most up-to-date results, so even when I just type [olympics] without specifying 2012, I still find what I’m looking for.

Given the incredibly fast pace at which information moves in today’s world, the most recent information can be from the last week, day or even minute. Depending on the search terms, the algorithm needs to figure out whether a result from a week ago about a TV show is recent, or whether a result from a week ago about breaking news is too old.

We completed our Caffeine web indexing system last year, which allows us to crawl and index the web for fresh content quickly on an enormous scale. Building upon the momentum from Caffeine, today we’re making a significant improvement to our ranking algorithm that impacts roughly 35 percent of searches and better determines when to give you more up-to-date relevant results for these varying degrees of freshness.
  • Recent events or hot topics. For recent events or hot topics that begin trending on the web, you want to find the latest information immediately. Now when you search for current events like [occupy oakland protest], or for the latest news about the [nba lockout], you’ll see more high-quality pages that might only be minutes old. 
  • Regularly recurring events. Some events take place on a regularly recurring basis, such as annual conferences like [ICALP] or an event like the [presidential election]. Without specifying with your keywords, it’s implied that you expect to see the most recent event, and not one from 50 years ago. There are also things that recur more frequently, so now when you’re searching for the latest [NFL scores], [dancing with the stars] results or [exxon earnings], you’ll see the latest information. 
  • Frequent updates. There are also searches for information that changes often, but isn’t really a hot topic or a recurring event. For example, if you’re researching the [best slr cameras], or you’re in the market for a new car and want [subaru impreza reviews], you probably want the most up-to-date information.
There are plenty of cases where results that are a few years old might still be useful for you. [fast tomato sauce recipe] certainly saved me after a call from my wife reminded me I had volunteered to make dinner! On the other hand, when I search for the [49ers score], a result that is a week old might be too old.

Different searches have different freshness needs. This algorithmic improvement is designed to better understand how to differentiate between these kinds of searches and the level of freshness you need, and make sure you get the most up-to-the-minute answers.
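To make "different freshness needs" a little more concrete, here is a minimal sketch of how a result's age could be weighted differently per kind of search. It is purely illustrative and not the ranking algorithm described in this post: the query classes, the half-life values and the function names are all assumptions chosen for the example.

```python
# Hypothetical half-lives (in hours) for each kind of search.
# These numbers are illustrative assumptions, not figures from the post.
HALF_LIFE_HOURS = {
    "breaking_news": 6.0,              # e.g. [49ers score]
    "recurring_event": 24.0 * 7,       # e.g. [dancing with the stars]
    "frequent_updates": 24.0 * 90,     # e.g. [best slr cameras]
    "evergreen": 24.0 * 365 * 5,       # e.g. [fast tomato sauce recipe]
}


def freshness_weight(age_hours, query_class):
    """Exponential decay: the weight halves every half-life for the class."""
    half_life = HALF_LIFE_HOURS[query_class]
    return 0.5 ** (age_hours / half_life)


def score(relevance, age_hours, query_class):
    """Combine a relevance score with the age-dependent freshness weight."""
    return relevance * freshness_weight(age_hours, query_class)
```

With these assumed values, a week-old page loses almost all of its weight for a breaking-news search but keeps nearly all of it for an evergreen search such as a recipe.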

Update 11/7/11: To clarify, when we say this algorithm impacted 35% of searches, we mean at least one result on the page was affected, as opposed to when we've said noticeably impacted in the past, which means changes that are significant enough that an average user would notice. Using that same scale, this change noticeably impacts 6 - 10% of searches, depending on the language and domain you're searching on.



(Cross-posted on the Inside Search blog)

Another look under the hood of search

8/25/2011 10:04:00 AM
(Cross-posted on the Inside Search blog and the Public Policy blog)

Over the past few years, we’ve released a series of blog posts to share the methodology and process behind our search ranking, evaluation and algorithmic changes. Just last month, Ben Gomes, Matt Cutts and I participated in a Churchill Club event where we discussed how search works and where we believe it’s headed in the future.

Beyond our talk and various blog posts, we wanted to give people an even deeper look inside search, so we put together a short video that gives you a sense of the work that goes into the changes and improvements we make to Google almost every day. While an improvement to the algorithm may start with a creative idea, it always goes through a process of rigorous scientific testing. Simply put: if the data from our experiments doesn’t show that we’re helping users, we won’t launch the change.



In the world of search, we’re always striving to deliver the answers you’re looking for. After all, we know you have a choice of a search engine every time you open a browser. As the Internet becomes bigger, richer and more interactive, we have to work that much harder to ensure we’re unearthing and displaying the best results for you.

Inside Google's search office

8/05/2011 02:11:00 PM
(Cross-posted on the Inside Search Blog)

I’ve been working with Matt Cutts and Ben Gomes in the same office for over 10 years. We work on search every day, and earlier this week, we took our office talk to the stage at an event hosted by the Churchill Club. Search Engine Land’s Danny Sullivan moderated our in-depth discussion on search, how it works, and what’s ahead for us in the future. We also reminisced about first joining Google, the time my car ran out of gas as Ben and I discussed a change to the algorithm, and other great memories over the years.

Come sit inside our office for a chat about Google Search:


  • To hear more about the principles that drive changes to the algorithm and how these changes are tested and implemented, go to 15:40
  • To hear the discussion on why we don’t hand-pick results, start watching at 41:04
  • For more on my vision for the future of search, jump to 1:12:28
  • Guess who Danny thinks is the brains, looks, and brawn of this operation at 1:08 (hint: I’m the brains).

Google Commerce Search 3.0: You won’t believe it’s online shopping

3/29/2011 08:00:00 AM
When we first introduced Google Commerce Search—our search solution for e-commerce websites—our focus was on improving search quality and speed to help online shoppers find what they’re looking for. Retailers such as Woodcraft Supply, BabyAge.com and HealthWarehouse.com implemented Google Commerce Search on their respective websites; Woodcraft increased search revenues 34 percent, BabyAge increased site searches 64 percent and HealthWarehouse saw online conversions increase 19 percent—and all have reported an increase in customer satisfaction.

Today we’re building on the capabilities that have proved useful to our retail partners with the third-generation Google Commerce Search (GCS). With this new version, we hope to help create an even more interactive and engaging experience for shoppers and retailers.



Here are some of the cool new features in GCS 3.0:
  • Search as You Type provides instant gratification to shoppers, returning product results with every keystroke, right from the search bar
  • Local Product Availability helps retailers bridge online and offline sales by showing shoppers when a product is also available in a store nearby—in-line with the search results
  • Enhanced Merchandising tools allow retailers to create product promotions that display in banners alongside related search queries, and to easily set query-based landing pages (for example, when a visitor types [shoes], they’re directed to a “shoe” page)
  • Product Recommendations (Labs) helps shoppers make purchase decisions by showing them what others viewed and ultimately bought

Search As You Type on www.babyage.com

With this release we're also welcoming three new retail partners: Forever21, General Nutrition Company (GNC) and L’Occitane. GNC implemented Google Commerce Search in less than a week on their mobile website, while Forever 21 and L’Occitane are currently working to implement various new features of GCS, such as Search as You Type and Local Product Availability. Here’s what Christine Burke, VP of International E-Commerce at cosmetics staple L’Occitane had to say about GCS 3.0:
L’Occitane is unique in that our beauty products center around ingredients—such as lavender, shea butter and verbena. As our customers visit our re-designed website to shop and research our products, we’re excited about the speed and accuracy of on-site search results that will be provided to us through Google Commerce Search. We’re also very excited about the possibility of the new local inventory feature, which can help us connect our customers with their favorite products in one of our 170 U.S. boutiques.
For more information, visit google.com/commercesearch.

Hide sites to find more of what you want

3/10/2011 11:00:00 AM
Over the years we’ve experimented with a number of ways to help you personalize the results you find on Google, from SearchWiki to stars in search to location settings. Now there’s yet another way to find more of what you want on Google by blocking the sites you don’t want to see.

You’ve probably had the experience where you’ve clicked a result and it wasn’t quite what you were looking for. Many times you’ll head right back to Google. Perhaps the result just wasn’t quite right, but sometimes you may dislike the site in general, whether it’s offensive, pornographic or of generally low quality. For times like these, you’ll start seeing a new option to block particular domains from your future search results. Now when you click a result and then return to Google, you’ll find a new link next to “Cached” that reads “Block all example.com results.”


As always, Matt’s been gracious enough to let us use him as an example. His site is awesome, though, and we doubt many people will want to block it!

Once you click the link to “Block all example.com results” you’ll get a confirmation message, as well as the option to undo your choice. You’ll see the link whether or not you’re signed in, but the domains you block are connected with your Google Account, so you’ll need to sign in before you can confirm a block.


Once you’ve blocked a domain, you won’t see it in your future search results. (Side note: Sometimes you may have to search on a new term, rather than simply refreshing your browser, before you'll notice the domain has been successfully removed.) The next time you’re searching and a blocked page would have appeared, you’ll see a message telling you results have been blocked, making it easy to manage your personal list of blocked sites. This message will appear at the top or bottom of the results page depending on the relevance of the blocked pages.


You can see a list of your blocked sites in a new settings page, which you can access by visiting your Search Settings or clicking on the “Manage blocked sites” link that appears when you block a domain. On the settings page you can find details about the sites you’ve blocked, block new sites, or unblock sites if you’ve changed your mind.


We’re adding this feature because we believe giving you control over the results you find will provide an even more personalized and enjoyable experience on Google. In addition, while we’re not currently using the domains people block as a signal in ranking, we’ll look at the data and see whether it would be useful as we continue to evaluate and improve our search results in the future. The new feature is rolling out today and tomorrow on google.com in English for people using Chrome 9+, IE8+ and Firefox 3.5+, and we’ll be expanding to new regions, languages and browsers soon. We hope you find it useful, and we’ll be listening closely to your suggestions.

Finding more high-quality sites in search

2/24/2011 06:50:00 PM

Our goal is simple: to give people the most relevant answers to their queries as quickly as possible. This requires constant tuning of our algorithms, as new content—both good and bad—comes online all the time.

Many of the changes we make are so subtle that very few people notice them. But in the last day or so we launched a pretty big algorithmic improvement to our ranking, a change that noticeably impacts 11.8% of our queries, and we wanted to let people know what’s going on. This update is designed to reduce rankings for low-quality sites: sites that add little value for users, copy content from other websites or are just not very useful. At the same time, it will provide better rankings for high-quality sites: sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.

We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.

It’s worth noting that this update does not rely on the feedback we’ve received from the Personal Blocklist Chrome extension, which we launched last week. However, we did compare the Blocklist data we gathered with the sites identified by our algorithm, and we were very pleased that the preferences our users expressed by using the extension are well represented. If you take the top several dozen most-blocked domains from the Chrome extension, this algorithmic change addresses 84% of them, which is strong independent confirmation of the user benefits.

So, we’re very excited about this new ranking improvement because we believe it’s a big step in the right direction of helping people find ever higher quality in our results. We’ve been tackling these issues for more than a year, and working on this specific change for the past few months. And we’re working on many more updates that we believe will substantially improve the quality of the pages in our results.

To start with, we’re launching this change in the U.S. only; we plan to roll it out elsewhere over time. We’ll keep you posted as we roll this and other changes out, and as always please keep giving us feedback about the quality of our results because it really helps us to improve Google Search.

Update April 11: We’ve rolled out this algorithmic change globally to all English-language Google users and incorporated new signals as we iterate and improve. We’ll continue testing and refining the change before expanding to additional languages. You can learn more on our Webmaster Central Blog.

New Chrome extension: block sites from Google’s web search results

2/14/2011 12:00:00 PM
(Cross-posted on the Google Chrome Blog)

We’ve been exploring different algorithms to detect content farms, which are sites with shallow or low-quality content. One of the signals we're exploring is explicit feedback from users. To that end, today we’re launching an early, experimental Chrome extension so people can block sites from their web search results. If installed, the extension also sends blocked site information to Google, and we will study the resulting feedback and explore using it as a potential ranking signal for our search results.

You can download the extension and start blocking sites now. It looks like this:


When you block a site with the extension, you won’t see results from that domain again in your Google search results. You can always revoke a blocked site at the bottom of the search results, so it's easy to undo blocks:


You can also edit your list of blocked sites by clicking on the extension's icon in the top right of the Chrome window.


This is an early test, but the extension is available in English, French, German, Italian, Portuguese, Russian, Spanish and Turkish. We hope this extension improves your search experience, and thanks in advance for participating in this experiment. If you’re a tech-savvy Chrome user, please download and try the Personal Blocklist extension today.

Microsoft’s Bing uses Google search results—and denies it

2/01/2011 02:56:00 PM
By now, you may have read Danny Sullivan’s recent post: “Google: Bing is Cheating, Copying Our Search Results” and heard Microsoft’s response, “We do not copy Google's results.” However you define copying, the bottom line is, these Bing results came directly from Google.

I’d like to give you some background and details of our experiments that led us to understand just how Bing is using Google web search results.

It all started with tarsorrhaphy. Really. As it happens, tarsorrhaphy is a rare surgical procedure on eyelids. And in the summer of 2010, we were looking at the search results for an unusual misspelled query [torsorophy]. Google returned the correct spelling—tarsorrhaphy—along with results for the corrected query. At that time, Bing had no results for the misspelling. Later in the summer, Bing started returning our first result to their users without offering the spell correction (see screenshots below). This was very strange. How could they return our first result to their users without the correct spelling? Had they known the correct spelling, they could have returned several more relevant results for the corrected query.



This example opened our eyes, and over the next few months we noticed that URLs from Google search results would later appear in Bing with increasing frequency for all kinds of queries: popular queries, rare or unusual queries and misspelled queries. Even search results that we would consider mistakes of our algorithms started showing up on Bing.

We couldn’t shake the feeling that something was going on, and our suspicions became much stronger in late October 2010 when we noticed a significant increase in how often Google’s top search result appeared at the top of Bing’s ranking for a variety of queries. This statistical pattern was too striking to ignore. To test our hypothesis, we needed an experiment to determine whether Microsoft was really using Google’s search results in Bing’s ranking.

We created about 100 “synthetic queries”—queries that you would never expect a user to type, such as [hiybbprqag]. As a one-time experiment, for each synthetic query we inserted as Google’s top result a unique (real) webpage which had nothing to do with the query. Below is an example:


To be clear, the synthetic query had no relationship with the inserted result we chose—the query didn’t appear on the webpage, and there were no links to the webpage with that query phrase. In other words, there was absolutely no reason for any search engine to return that webpage for that synthetic query. You can think of the synthetic queries with inserted results as the search engine equivalent of marked bills in a bank.
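For readers who want to picture the "marked bills" setup, the following is a rough sketch of the bookkeeping involved: generate nonsense queries, pair each with one unrelated page, and later check whether that page turns up for the same query somewhere else. The function names, the query length and the example URL are hypothetical, and this is not the tooling used in the experiment.

```python
import random
import string


def make_synthetic_query(rng, length=10):
    """Build a nonsense query that no user would be expected to type."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def plant_results(queries, pages):
    """Pair each synthetic query with one unrelated page (the 'marked bill')."""
    return dict(zip(queries, pages))


def leaked_queries(planted, observed):
    """Return the queries whose planted page appears in another engine's results.

    `observed` maps a query to the list of URLs the other engine returned.
    """
    return [q for q, url in planted.items() if url in observed.get(q, [])]
```

Because the planted page has no other connection to the query, any query returned by `leaked_queries` points to the original results page as the source of the association.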

We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.

We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted. We were surprised that within a couple weeks of starting this experiment, our inserted results started appearing in Bing. Below is an example: a search for [hiybbprqag] on Bing returned a page about seating at a theater in Los Angeles. As far as we know, the only connection between the query and result is Google’s result page (shown above).


We saw this happen for multiple queries. For the query [delhipublicschool40 chdjob] we inserted a search result for a credit union:


The same credit union soon showed up on Bing for that query:


For the query [juegosdeben1ogrande] we inserted a page of hip hop bling jewelry:


And the same hip hop bling page showed up in Bing:


As we see it, this experiment confirms our suspicion that Bing is using some combination of:
  • the Suggested Sites feature of Internet Explorer 8,
  • the Bing Toolbar,
or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results: a cheap imitation.

At Google we strongly believe in innovation and are proud of our search quality. We’ve invested thousands of person-years into developing our search algorithms because we want our users to get the right answer every time they search, and that’s not easy. We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor. So to all the users out there looking for the most authentic, relevant search results, we encourage you to come directly to Google. And to those who have asked what we want out of all this, the answer is simple: we'd like for this practice to stop.

Google search and search engine spam

1/21/2011 09:00:00 AM
January brought a spate of stories about Google’s search quality. Reading through some of these recent articles, you might ask whether our search quality has gotten worse. The short answer is that according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness. Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English. However, we have seen a slight uptick of spam in recent months, and while we’ve already made progress, we have new efforts underway to continue to improve our search quality.

Just as a reminder, webspam is junk you see in search results when websites try to cheat their way into higher positions in search results or otherwise violate search engine quality guidelines. A decade ago, the spam situation was so bad that search engines would regularly return off-topic webspam for many different searches. For the most part, Google has successfully beaten back that type of “pure webspam”—even while some spammers resort to sneakier or even illegal tactics such as hacking websites.

As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites.

As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.

One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
  • Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
  • Displaying Google ads does not help a site’s rankings in Google; and
  • Buying Google ads does not increase a site’s rankings in Google’s search results.
These principles have always applied, but it’s important to affirm they still hold true.

People care enough about Google to tell us—sometimes passionately—what they want to see improved. We deeply appreciate this feedback. Combined with our own scientific evaluations, user feedback allows us to explore every opportunity for possible improvements. Please tell us how we can do a better job, and we’ll continue to work towards a better Google.

A recent improvement for Arabic searches

2/02/2010 12:28:00 PM
This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.

We've learned that when performing a search on Google, people sometimes forget to separate words with spaces. Moreover, people often mistakenly repeat a letter within a single word. For instance, when writing the query [amazingly beautiful poem], you might write it as [amazingly beautiifullpoem].

These types of errors are much more common in languages like Arabic, where most of the letters are cursive. That means that the shapes of the letters change, based on the position of the letter in the word (initial, middle, final or isolated). Moreover, some Arabic letters are considered word breaks, meaning that the following letter must be in an "initial" shape. In other words, if the last letter of one word is a word break, the following word may not be separated with a space.

For example, the queries [وزارةالتعليم] and [وزارة التعليم] have an identical meaning (Ministry of Education) and they're both written in a common form for Arabic documents. But they have different, albeit correct, formats — the first query is written as a single word, while the second is written as two. Google needs to understand that while they're written differently, they mean the same thing and should yield the exact same search results. In this example, both queries were written correctly, just in different formats. But sometimes people just make errors — like repeating the same letter twice. For example, you might write [راائعة الجماال], repeating the letter "ا" twice in both query words. In this case the correct spelling should be [رائعة الجمال]. It's important that Google search recognizes your query — despite spelling errors.

To address issues like this, we recently developed a search ranking improvement that targets certain Arabic queries. Our algorithm employs rules of Arabic spelling and grammar along with signals from historical search data to decide when to leave out spaces between words or when to remove unnecessarily repeated letters. Now, when you type a query leaving out spaces or repeating a letter, we'll return better results based not only on what you typed, but also on what our algorithm understands is the "correct" query. For example, here's what happens when you type [قصيدة راائعةالجماال] ([amazingly beautiful poem] in Arabic) with repeated letters and dropped spaces between words.


As you can see, the Google results contain the corrected query, the terms قصيدة رائعة الجمال, in bold.
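To make the idea concrete, here is a minimal sketch of this kind of correction. It uses the English stand-in query from earlier in this post and a tiny hand-written vocabulary in place of the Arabic spelling rules and historical search data that the real improvement relies on, so the vocabulary, function names and matching strategy are all assumptions for illustration only.

```python
import re
from functools import lru_cache

# Hypothetical vocabulary; a real system would draw on far richer signals.
VOCAB = ("amazingly", "beautiful", "poem")


def _loose_pattern(word):
    """Allow every letter of the word to be repeated: 'poem' -> p+o+e+m+."""
    return re.compile("".join(re.escape(ch) + "+" for ch in word))


PATTERNS = [(word, _loose_pattern(word)) for word in VOCAB]


def segment(chunk):
    """Split a space-free chunk into vocabulary words, tolerating repeated
    letters. Returns a list of corrected words, or None if no split works."""

    @lru_cache(maxsize=None)
    def solve(i):
        if i == len(chunk):
            return ()
        # Try longer pieces first, then fall back to shorter ones.
        for j in range(len(chunk), i, -1):
            for word, pattern in PATTERNS:
                if pattern.fullmatch(chunk, i, j):
                    rest = solve(j)
                    if rest is not None:
                        return (word,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None


def normalize_query(query):
    """Correct each whitespace-separated chunk; keep chunks we cannot fix."""
    words = []
    for chunk in query.split():
        fixed = segment(chunk)
        words.extend(fixed if fixed is not None else [chunk])
    return " ".join(words)
```

Calling `normalize_query("amazingly beautiifullpoem")` restores both the missing space and the doubled letters, while a chunk that matches nothing in the vocabulary is passed through unchanged.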

For most people, this might seem like a small enhancement. But for us, it’s a big change. Our tests show we've improved search for 10% of Arabic language queries. Which, when you think about it, is a lot of people.

Understanding the web to make search more relevant

1/22/2010 10:30:00 AM
Last year at our second Searchology event, we announced Google Squared and Rich Snippets, two approaches to improve search by better understanding the web. Today, we're kicking off the new year with two improvements based on those technologies. First, we're applying the research behind Google Squared to add a new "answer-highlighting" feature to search, and second we're expanding Rich Snippets to include events.

Answer highlighting in search results

Most information on the web is unstructured. For example, blogs integrate paragraphs of text, videos and images in ways that don't follow simple rules. Product review sites each have their own formats, rating scales and categories. Unstructured data is difficult for a computer to interpret, which means that we humans still have to do a fair amount of work to synthesize and understand information on the web.

Google Squared is one of our early efforts to automatically identify and extract structured data from across the Internet. We've been making progress, and today the research behind Google Squared is, for the first time, making search better for everyone with a new feature called "answer highlighting."

Answer highlighting helps you get to information more quickly by seeking out and bolding the likely answer to your question right in search results. The feature is meant for searches with factual answers, such as [meet john doe director], [john lennon died], or [what was the political party of president ford]. If the pages returned for these queries contain a simple answer, the search snippet will more often include the relevant text and bold it for easy reference.

Consider the example, [empire state height]. The first search result used to look like this:

With today's improvements, the answer (1250 ft, or 381 m) is highlighted right in the search result:

This kind of quick answer only makes sense for certain kinds of searches. For example, the answer to [history of france] can't readily fit in a search snippet. However, for the kinds of information you can easily put in a table, we've been able to take what we've learned from Google Squared to make search better for a wide range of queries. Answer highlighting is rolling out during the next couple days on google.com in English.
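As a toy illustration of the "find the likely answer and bold it" step, the sketch below bolds simple length measurements in a snippet. It is only a stand-in: the real feature builds on the Google Squared research, whereas this example uses one hand-written regular expression, and the pattern, the unit list and the `<b>` markup are assumptions made for the example.

```python
import re

# Hypothetical pattern for simple length measurements such as "1250 ft".
MEASUREMENT = re.compile(r"\b\d[\d,.]*\s?(?:ft|feet|m|meters)\b")


def highlight_measurements(snippet):
    """Wrap each measurement found in the snippet in <b>...</b> tags."""
    return MEASUREMENT.sub(lambda m: "<b>" + m.group(0) + "</b>", snippet)
```

For a snippet like "The Empire State Building is 1250 ft (381 m) tall.", both measurements are wrapped in bold tags and the rest of the text is left untouched.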

Rich Snippets for events

Sometimes the easiest way to understand somebody is by having a conversation. The web is similar. As much as we're happy with the progress we're making with Google Squared, we also appreciate that a great way to understand web pages is to simply ask webmasters to teach us (and other search engines) about their content. To that end, we continue to make improvements to our search results with Rich Snippets, enabling webmasters to annotate pages with structured data in a standard format.

So far we've launched improved search result snippets for reviews and people. When your search results contain web pages with review information, you might see the number of user reviews on the page and the average rating in the search result. When your search contains a public profile page about a person from a social networking site, you may see the person's location and occupation, or a list of her friends.

Today, we're announcing support for a new Rich Snippets format for events. The new format improves search results by including links to specific event names, dates and locations. Here's an example of a new event result from livenation.com if you search for [irving plaza]:


The new result format provides a fast and convenient way to identify pages with events and click directly to the ones you find interesting. If you're into Hip Hop Karaoke, you can quickly find out when and where the next show is in Irving Plaza, and click for more info. We've been working with a few sites to ramp them up for our initial launch, but it will take time for other webmasters to start implementing the new markup. Check out our blog post on Webmaster Central for more details.

Helping computers understand language

1/19/2010 11:51:00 AM
This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.

An irony of computer science is that tasks humans struggle with can be performed easily by computer programs, but tasks humans can perform effortlessly remain difficult for computers. We can write a computer program to beat the very best human chess players, but we can't write a program to identify objects in a photo or understand a sentence with anywhere near the precision of even a child.

Enabling computers to understand language remains one of the hardest problems in artificial intelligence. The goal of a search engine is to return the best results for your search, and understanding language is crucial to returning the best results. A key part of this is our system for understanding synonyms.

What is a synonym? An obvious example is that "pictures" and "photos" mean the same thing in most circumstances. If you search for [pictures developed with coffee] to see how to develop photographs using coffee grinds as a developing agent, Google must understand that even if a page says "photos" and not "pictures," it's still relevant to the search. While even a small child can identify synonyms like pictures/photos, getting a computer program to understand synonyms is enormously difficult, and we're very proud of the system we've developed at Google.

Our synonyms system is the result of more than five years of research within our web search ranking team. We constantly monitor the quality of the system, but recently we made a special effort to analyze the impact and quality of synonyms. Most of the time, you probably don't notice when your search involves synonyms, because it happens behind the scenes. However, our measurements show that synonyms affect 70 percent of user searches across the more than 100 languages Google supports. We took a set of these queries and analyzed how precise the synonyms were, and were happy with the results: For every 50 queries where synonyms significantly improved the search results, we had only one truly bad synonym.

An example of a bad synonym from this analysis is in the search [dell system speaker driver precision 360], where Google thinks "pc" is a synonym for precision. Note that you can still see that on Google today, because while we know it's a bad synonym, we don't typically fix bad synonyms by hand. Instead, we try to discover general improvements to our algorithms to fix the problems. We hope it will be fixed automatically in some future changes.

We also recently made a change to how our synonyms are displayed. In our search result snippets, we bold the terms of your search. Historically, we have bolded synonyms such as stemming variants — like the word "picture" for a search with the word "pictures." Now, we've extended this to words that our algorithms very confidently think mean the same thing, even if they are spelled nothing like the original term. This helps you to understand why that result is shown, especially if it doesn't contain your original search term. In our [pictures developed with coffee] example, you can see that the first result has the word "photos" bolded in the title:


(Note that because our synonyms depend on the other words in your search and use many signals, you won't necessarily always see the word "photos" bolded for "pictures", only when our algorithms think it is useful and important to bold.)
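To make the bolding behavior concrete, here is a minimal sketch. It is an illustration only, not Google's implementation: the function name `bold_terms`, the hand-written synonym table and the use of `<b>` tags are all assumptions, and a real system would decide per query whether a synonym is confident enough to bold.

```python
import re

def bold_terms(snippet, query_terms, synonyms):
    """Wrap query terms and their listed synonyms in <b> tags.

    `synonyms` is a hypothetical, hand-written table mapping a query
    term to words treated as meaning the same thing.
    """
    targets = set(query_terms)
    for term in query_terms:
        targets.update(synonyms.get(term, ()))
    if not targets:
        return snippet
    # Try longer alternatives first so a word is never cut short by a prefix.
    alternatives = sorted(targets, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<b>\1</b>", snippet)

SYNONYMS = {"pictures": ["photos", "picture"]}
query = ["pictures", "developed", "with", "coffee"]
print(bold_terms("Developing photos with coffee", query, SYNONYMS))
```

In this sketch "photos" is bolded even though the query said "pictures", while "Developing" is left alone because the table lists no variant for "developed".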

We use many techniques to extract synonyms, some of which we've blogged about before. Our systems analyze petabytes of web documents and historical search data to build an intricate understanding of what words can mean in different contexts. In the above example "photos" was an obvious synonym for "pictures," but it's not always a good synonym. For example, it's important for us to recognize that in a search like [history of motion pictures], "motion pictures" means something special (movies), and "motion photos" doesn't make any sense. Another example is the term "GM." Most people know the most prominent meaning: "General Motors." For the search [gm cars], you can see that Google bolds the phrase "General Motors" in the search results. This is an indication that for that search we thought "General Motors" meant the same thing as "GM." Are there any other meanings? Many people can think of the second meaning, "genetically modified," which is bolded when GM is used in queries about crops and food, like in the search results for [gm wheat]. It turns out that there are more than 20 other possible meanings of the term "GM" that our synonyms system knows something about. GM can mean George Mason in [gm university], gamemaster in [gm screen star wars], Gangadhar Meher in [gm college], general manager in [nba gm] and even gunners mate in [navy gm].
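The GM examples can be pictured as a lookup keyed on the other words in the query. The sketch below is a toy under stated assumptions: the table `GM_SENSES` and the function `expand_gm` are hypothetical, the entries are copied from the queries above, and the real system learns such associations from data instead of a hand-written dictionary.

```python
# Hypothetical table: a context word seen next to "gm" -> the meaning it suggests.
GM_SENSES = {
    "cars": "general motors",
    "wheat": "genetically modified",
    "university": "george mason",
    "screen": "gamemaster",
    "college": "gangadhar meher",
    "nba": "general manager",
    "navy": "gunners mate",
}

def expand_gm(query):
    """Return the meaning of "gm" suggested by the rest of the query, or None."""
    words = query.lower().split()
    if "gm" not in words:
        return None
    for word in words:
        if word in GM_SENSES:
            return GM_SENSES[word]
    # No known context word: leave the term unexpanded.
    return None

for q in ["gm cars", "gm wheat", "navy gm", "gm"]:
    print(q, "->", expand_gm(q))
```

The point of the sketch is only that the same two letters map to different expansions depending on their neighbors, and to no expansion at all when the context gives no signal.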

Here are screenshots of those disambiguations of GM in action:


As a nomenclatural note, even obvious term variants like "pictures" (plural) and "picture" (singular) would be treated as different search terms by a dumb computer, so we also include these types of relationships within our umbrella of synonyms. Pictures/picture are typically called stemming variants, which refers to the fact that they share the same word stem, or root. The same systems that need to understand that "pictures" and "photos" mean the same thing also need to understand that "pictures" and "picture" mean the same thing. This is something that is even more obvious to a human but is also still a difficult task for a computer. An example of how this is difficult is the pair of words "animal" and "animation," which share the same stem and etymology, but don't mean the same thing in standard use. Another tricky case that is very dependent on the other words in the query is "arm" vs. "arms." Arms might seem like the plural of arm, but consider how it might be used in a search: [arm reduction] vs. [arms reduction]. Google search is smart enough to know that the former is about removing fat from one's arm, and the latter is about reducing stockpiles of weaponry, and that arm/arms are dangerous synonyms in that case because they would change the meaning. These subtle differences between words that seem related are what make synonymy very hard to get right.
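A deliberately naive stemmer shows why stem matching alone is not enough. The sketch below is an assumption-laden toy (the function `naive_stem` and its three-suffix list are made up for illustration and are not how Google handles stemming): it strips a suffix with no regard for meaning or for the other words in the query.

```python
def naive_stem(word):
    """Strip one known suffix, ignoring meaning and query context."""
    for suffix in ("ation", "al", "s"):
        # Keep at least three characters so short words are left intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

pairs = [("pictures", "picture"), ("animal", "animation"), ("arms", "arm")]
for a, b in pairs:
    print(a, b, naive_stem(a) == naive_stem(b))
```

All three pairs collapse to a shared stem here. That is the desired result for pictures/picture, but it wrongly equates "animal" with "animation" and would treat [arm reduction] and [arms reduction] as the same search.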

Here are some other examples of synonyms we thought were interesting:

[song words], "lyrics" is bolded for "words".
[what state has the highest murder rate], "homicide" is bolded for "murder".
[himalayan kitten breeder], Google knows that "cat breeder" is the same as "kitten breeder".
[dura ace track bb axle njs], Google knows that "bb" here means "bottom bracket".
[software update on bb color id], "blackberry" is bolded for "bb".
[bb cream dark], Google knows here that bb means "blemish balm".
[southeastern usa bb fitness & figure], "bodybuilding" is bolded for "bb."

Lastly, language is used with as much variety and subtlety as is present in human culture, and our algorithms still make mistakes. We flinch when we find such mistakes; we're always working to fix them. One of the best ways for us to discover these problems is to get feedback from real users, which we then use to inspire improvements to our computer programs. If you have specific complaints about our synonyms system, you can post a question at the web search help center forum or you can tweet them with the hash tag #googlesyns. You can also turn off a synonym for a specific term by adding a "+" before it or by putting the words in quotation marks.

Surfacing forum posts in search results

9/30/2009 07:17:00 PM
Today, we introduced a new search feature that makes it easier for you to find forum posts or discussions related to what you're searching for. This new addition to Google search results applies to sites that tend to have a large number of posts on a specific topic. When several different discussions on a site are relevant to your query, we indent them under the primary result and include the date of each post.

So for instance, if you search for [getting from rome to florence] you will see, below the third result, a list of relevant discussions on various ways to get between these cities.


It's always nice to know what others are saying about the best ways to get around (by boat or train) and how recent those comments are — especially if it's your first time traveling to Italy.

We hope this feature gives you a deeper view into the relevant content available on sites throughout the web — even when that content spans multiple pages or discussions.
At the same time, the main search results are diverse as always — so if you can't pinpoint a useful comment there's a list of relevant sites there to help.

Jump to the information you want right from the search snippets

9/25/2009 12:29:00 PM
For most search results, Google shows you a few lines of text to give you an idea of what the page is about — we call this a "search snippet." Recently, we've enhanced the search snippet with two new features that make it easier to find information buried deep within a page.

Normally, a search snippet shows how a page, as a whole, relates to your query by excerpting content that appears near and around where your query terms show on the page. But what if only one section of the page is relevant to your search?

That's where these new features can help, by providing links within the snippet to relevant sections of the page, making it faster and easier to find what you're looking for. Imagine, for example, that we're researching trans fats and cholesterol, and their effects on the body. If we start with a generic query like [trans fats], Google returns several results with lots of information about trans fats in general, including this result from Wikipedia:

Now, included with the snippet are links to specific sections within the page, covering different subtopics of trans fats. Since we're particularly interested in what's healthy and what's not, "Nutritional guidelines" is probably where the most relevant information is. Clicking this link will take you directly to that section, midway down the page.

Now imagine we're particularly interested in learning about good cholesterol and what levels of it are healthy, so we try a more specific query, [good cholesterol level]. The top result is from the American Heart Association and has tons of information about cholesterol levels. The specific information about good (HDL) cholesterol, however, is contained in one section titled "Your HDL (good) cholesterol level"‎. Since the query was more specific, the snippet for this result now provides the option to "jump to" just this section of the website.


Clicking on "Jump to Your HDL (good) cholesterol level" takes you directly to the most relevant information on the page:


Clicking on the title of the snippet ("What Your Cholesterol Levels Mean") still takes you to the top of the page, as always.

If you're a webmaster and would like to have these links appear for your webpages, take a look at the Google Webmaster Central Blog for info on some of the things you can do. And in the meantime, we hope these enhancements help you find the information you're looking for faster.

Two new improvements to Google results pages

3/24/2009 07:18:00 AM
Today we're rolling out two new improvements to Google search. The first offers an expanded list of useful related searches and the second is the addition of longer search result descriptions -- both of which help guide users more effectively to the information they need.

More and better search refinements

Starting today, we're deploying a new technology that can better understand associations and concepts related to your search, and one of its first applications lets us offer you even more useful related searches (the terms found at the bottom, and sometimes at the top, of the search results page).

For example, if you search for [principles of physics], our algorithms understand that "angular momentum," "special relativity," "big bang" and "quantum mechanic" are related terms that could help you find what you need. Here's an example (click on the images in the post to view them larger):

Let's look at a couple of examples in other languages. In Russian, for the query [гадание на картах] (fortune-telling with cards), the algorithms find the related terms "таро" (tarot), "ленорман" (lenormand) and "тибетское гадание мо" (tibetan divination mo). In Italian, if you search for [surf alle canarie] (surf at the canary islands), we now offer suggestions based on the three most famous Canary Islands: "lanzarote," "gran canaria," and "fuerteventura":

We are now able to target more queries, more languages, and make our suggestions more relevant to what you actually need to know. Additionally, we're now offering refinements for longer queries — something that's usually a challenging task. You'll be able to see our new related searches starting today in 37 languages all around the world.

And speaking of long queries, that leads us to our next improvement...

Longer snippets

When you do a search on Google, each result we give you starts with a dark blue title and is followed by a few lines of text (what we call a "snippet"), which together give you an idea of what each page is about. To give more context, the snippet shows how the words of your query appear on the page by highlighting them in bold.

When you enter a longer query, with more than three words, regular-length snippets may not give you enough information and context. In these situations, we now increase the number of lines in the snippet to provide more information and show more of the words you typed in the context of the page. Below are a couple of examples.

Suppose you were looking for information about Earth's rotation around the sun, and specifically wanted to know about its tilt and distance from the sun. So you type all of that into Google: [earth's rotation axis tilt and distance from sun]. A normal-length snippet wouldn't be able to show you the context for all of those words, but with longer snippets you can be sure that the first result covers all those topics. In addition, the extra line of snippets for the third result shows the word "sun" in context, suggesting that the page doesn't talk about Earth's distance from the sun:

Similarly, if you're looking for a restaurant review that covers all the parts of the meal, longer snippets can help:

But don't just take our word for it — try it out yourself with your favorite long, detailed query.

These are just two recent examples of improvements we've made. We are constantly looking for ways to get you to the web page you want as quickly as possible. Even if you don't notice all of our changes, rest assured we're hard at work making sure you have the highest quality search experience possible.

Eye-tracking studies: more than meets the eye

2/06/2009 09:45:00 AM
Imagine that you need a refresher on how to tie a tie. So, you decide to type [how to tie a tie] into the Google search box. Which of these results would you choose?


Where did your eyes go first when you saw the results page? Did they go directly to the title of the first result? Did you first check the terms in boldface to see if the results really talk about tying a tie? Or maybe the images captured your attention and drew your eyes to them?

You might find it difficult to answer these questions. You probably did not pay attention to where you were looking on the page and you most likely only used a few seconds to visually scan the results. Our User Experience Research team has found that people evaluate the search results page so quickly that they make most of their decisions unconsciously. To help us get some insight into this split-second decision-making process, we use eye-tracking equipment in our usability labs. This lets us see how our study participants scan the search results page, and is the next best thing to actually being able to read their minds. Of course, eye-tracking does not really tell us what they are thinking, but it gives us a good idea of which parts of the page they are thinking about.

To see what the eye-tracking data we collect looks like, let's go back to the results page we got for the query [how to tie a tie]. The following video clip shows in real time how a participant in our study scanned the page. And yes, seriously, the video is in real time! That's how fast the eyes move when scanning a page. The larger the dot gets, the longer the user's eye pauses at that specific location.



Based on eye-tracking studies, we know that people tend to scan the search results in order. They start from the first result and continue down the list until they find a result they consider helpful and click it — or until they decide to refine their query. The heatmap below shows the activity of 34 usability study participants scanning a typical Google results page. The darker the pattern, the more time they spent looking at that part of the page. This pattern suggests that the order in which Google returned the results was successful; most users found what they were looking for among the first two results and they never needed to go further down the page.


When designing the user interface for Universal Search, the team wanted to incorporate thumbnail images to better represent certain kinds of results. For example, in the [how to tie a tie] example above, we have added thumbnails for Image and Video results. However, we were concerned that the thumbnail images might be distracting and disrupt the well-established order of result evaluation.

We ran a series of eye-tracking studies where we compared how users scan the search results pages with and without thumbnail images. Our studies showed that the thumbnails did not strongly affect the order of scanning the results and seemed to make it easier for the participants to find the result they wanted.

The thumbnail image seemed to make results with thumbnails easy to notice when the users wanted them (see screenshots below — page with the thumbnail image on the right)...

Click the images to view them larger.

...and the thumbnails also seemed to make it easy for people to skip over the results with thumbnails when those results were not relevant to their search (page with the thumbnail on the right).


For the Universal Search team, this was a successful outcome. It showed that we had managed to design a subtle user interface that gives people helpful information without getting in the way of their primary task: finding relevant information.

In addition to search research, we also use eye-tracking to study the usability of other products, such as Google News and Image Search. For these products, eye-tracking helps us answer questions, such as "Is the 'Top Stories' link discoverable on the left of the Google News page?" or "How do the users typically scan the image results — in rows, in columns or in some other way?"

Eye-tracking gives us valuable information about our users' focus of attention — information that would be very hard to come by any other way and that we can use to improve the design of our products. However, in our ongoing quest to make our products more useful, usable, and enjoyable, we always complement our eye-tracking studies with other methods, such as interviews, field studies and live experiments.

Our international approach to search

11/21/2008 11:46:00 AM
In previous posts in this series, you have read about the challenges of building a world-class search engine. Our goal is to make Google’s search relevant to all people, regardless of their language or country. As my colleague Amit Singhal described, we use statistical data as the basis for making sweeping algorithmic changes. Many of these changes can be rolled out across all languages we support, but in some cases the unique characteristics of each language require some algorithmic considerations and tuning. And to make things really interesting, there are cases where the same language is different across countries. Obvious examples are "color" in the U.S. vs. "colour" in the U.K., or "camião" in Portugal vs. "caminhão" in Brazil.

My name is Daphne Dembo, and my focus is improving Google's international search. This is a tough challenge, since Google search is used in many countries and languages where our engineers have little personal knowledge. Initially, the international search improvements were done by Search Quality engineers who were passionate about their languages and countries: Lina from Sweden improved our parsing of compound words in German and Swedish; Dimitra from Greece introduced diacritical support; Ishai from Israel worked on transliteration corrections for Hebrew and Arabic; Trystan from Australia created methods for identifying local search results and ranking them together with foreign ones from the same language; Alex, a bilingual Ukrainian and Russian, introduced morphological understanding of these languages. As the importance of our international search grew, we solicited help from Googlers in all our offices. Finally, we are leveraging an international network of search specialists who help us understand search within the unique combination of their language and country.

Our first step in providing search support for a language is to train our language model on a large collection of documents in that language. This ensures that our language model is more precise and comprehensive — for example, it incorporates names, idioms, colloquial usage, and newly coined words not often found in static dictionaries. For instance, we recently started identifying Swahili, and used pages such as this one for the Parliament of Tanzania to train our system with the language's nuances. Having a trained language model helps to categorize documents during crawling and indexing of the web and to parse the user's query. Once this stage was complete, we launched Swahili search in countries such as Tanzania and Kenya, enabling local searches for the "Dar es Salaam stock exchange" [Soko la hisa dar es salaam], and "cure for Malaria" [Tiba ya malaria]. (As always, we are using square brackets to denote a search query. For example, you can search for "soccer" in Hamburg, Germany by clicking on [fußball in hamburg]).

We learn some things from our users, so as people start using our search engine, we can improve the way we rank in that language. Here are a few examples:
  • Spell corrections: We recently launched spell corrections in Estonian. If your Estonian is rusty, and you don't remember how to spell "smoke detector," we can suggest a spell correction for [suitsuantur], leading to better search results.
  • Diacritical marks: Many languages have diacritical marks, which alter pronunciation. Our algorithms are built to support them, and even help users who mis-type or completely ignore them. For example, if you're a resident of Quebec, Canada and would like to know the weather forecast in Quebec City, we'll serve good results whether you type with diacritical signs [Météo à Québec] or without [meteo quebec]. Czech users can read the same excellent results for a popular kids' cartoon by searching for [krtecek] and [krteček]. On the other hand, sometimes diacriticals change the meaning of the word and we have to use them correctly. For example, in Thai, [ข้าว] is "rice," with completely different results than [ข่าว], which is "news"; or in Slovakia, results for "child" [dieťa] are different than results for "diet" [diéta].
  • Synonyms: A general case of diacritical support is the handling of synonyms in different languages. Korean searches showed that "samsung" can be viewed as a synonym of "삼성", so that when users search for [samsung], they find results which have the company's name in Korean.
  • Compounding: Some languages allow compounding, which is the formation of new words by combining together existing words. You can see a nice example in Swedish, where we return documents about a Swedish credit card for both compounded [Visakort] and non-compounded [visa kort] queries.
  • Stemming: Google has developed morphological models that can receive compound words as queries, and return pages which contain their stem, possibly as part of a different compound. For example, when searching for cars in Saudi Arabia, you can search for [سيارة] and [سيارات] because both are variants of the same stem, and both return many common results. A Polish user can search for "movie" [film], and get back results that contain other variants of the stem, such as "filmów," "filmu," "filmie," "filmy." A user from Belarus will find results for all word forms of the capital, Minsk [Мінск]: "Мінску," "Мінска," "Мінскага."
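The diacritical-marks bullet can be illustrated with a short sketch. It assumes a simple strategy that is not confirmed as Google's approach: decompose each character with Unicode NFD normalization and drop the combining marks. The function name `fold_diacritics` is hypothetical.

```python
import unicodedata

def fold_diacritics(text):
    """Lowercase the text and drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Helpful: typed and untyped diacritics match.
print(fold_diacritics("krteček") == "krtecek")
print(fold_diacritics("Québec") == "quebec")

# Harmful: distinct words collapse into one.
print(fold_diacritics("dieťa") == fold_diacritics("diéta"))
thai_rice = "\u0e02\u0e49\u0e32\u0e27"  # "rice"
thai_news = "\u0e02\u0e48\u0e32\u0e27"  # "news"
print(fold_diacritics(thai_rice) == fold_diacritics(thai_news))
```

All four comparisons come out equal, which shows both sides of the bullet: blind folding helps users who skip accents, but it also erases the difference between "child" and "diet" or between "rice" and "news", so folding has to be applied selectively.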
In addition to these semantic factors, Google does even more to parse documents and queries. Understanding the details of language usage in a country is important. Notation of acronyms is different across languages: In Hebrew it is double quotes before the last (left-most) character, as in "prime minister" [רה"מ]; in Thai — a dot at the end of the word, as in police station [สน. ]; while in the U.S. — dots after each character, as in [I.B.M.]. Chinese users quote works of art with a "《", as in: [《手机》剧情], and denote dates with a "日", as in: [2006年1月13日].

Beyond the linguistic elements of a language, we consider how people enter a query. For example, some languages that do not have Latin scripts require keyboards with dual alphanumeric keys. The user can switch between language input modes by typing special keystrokes. In case the user forgets to type this sequence, the queries end up being gibberish. You can see correct handling of these mistakes in Arabic ([hgsuv] corrected to [السعر]) and ([حقثسهيثىفهشم ثممثؤفهخىس ] corrected to [presidential elections]), Hebrew ([vdrk, kuyu] corrected to [הגרלת לוטו]), and Cyrillic ([rehc ljkffhf] corrected to [курс доллара]).
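As a rough illustration of the keyboard-mode case, the sketch below remaps Latin keystrokes to the Cyrillic letters on the same keys. It assumes the standard Russian ЙЦУКЕН layout over a US QWERTY keyboard; the names `LATIN_KEYS`, `CYRILLIC_KEYS` and `fix_layout` are hypothetical. It only performs the character remapping and does not attempt the harder problem of deciding whether a query was mistyped in the first place.

```python
# Keys in the same physical order on both layouts (letter rows only).
LATIN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
CYRILLIC_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

# Character-for-character translation table; other characters pass through.
LATIN_TO_CYRILLIC = str.maketrans(LATIN_KEYS, CYRILLIC_KEYS)

def fix_layout(query):
    """Reinterpret Latin keystrokes as if the Russian layout had been active."""
    return query.lower().translate(LATIN_TO_CYRILLIC)

print(fix_layout("rehc"))  # курс
```

A real system would presumably apply such a remapping only when the original query looks like gibberish and the remapped one looks like real words, and would still run spelling correction on the result.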

Another way of avoiding the inconvenience of switching keyboard modes is by typing the phonetic sounds of the query in Latin characters. Recreating the correct query in the target language isn't trivial, since there might be many possibilities. We suggest the same query in the intended language in several such cases: Russian ([biskvitnyi rulet] to [бисквитный рулет]), "movies" in Chinese ([dianying] to [电影]), and "Bank of Attica" in Greek, where [trapeza attikhs] returns good results for "Τράπεζα Αττικής". Users of 8 Indic languages (such as Hindi, Gujarati, Telugu) can type the phonetic sound of the query, and choose the words in Hindi script:


Ease of typing and reading is also influenced by the language used. Since every Chinese word requires several keystrokes on a standard keyboard, we provide category browsing by Images and related searches so that people don't need to type as much. Similarly, we are now launching Google Suggest, or real-time completion of queries, in many languages.

So far I've described how we improve the quality of search in a language. However, there is a strong effect of the location of the user, even if it is only approximated to the country, since in many cases local content is more relevant than global information. For example, searching for Spanish Yellow Pages [Páginas Amarillas] will result in several documents of global interest and several local results in Peru, Mexico, and Spain. Similarly, searching for [Côte d'Or] in France will return results for that region, whereas searches in Belgium will return results about the chocolate maker.

Note that the display of information should conform to the standards in that country, so we display "," as a decimal notation for Croatian users who want to know how many millimeters are in an inch [inč u milimetrima], or for Italian users who are interested in currency exchange rates [50 euro in dollari]. Similarly, temperatures in Norway [Været i Oslo] will be displayed in Celsius, while in the U.S. — in Fahrenheit [weather Boston].
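A small sketch of country-specific display conventions follows. The table `DISPLAY_CONVENTIONS` and both functions are hypothetical; the entries cover only the countries mentioned above, with the decimal comma for Norway added as an assumption based on common Norwegian usage.

```python
# Hypothetical per-country settings: decimal separator and temperature unit.
DISPLAY_CONVENTIONS = {
    "HR": {"decimal": ",", "temp_unit": "C"},  # Croatia
    "IT": {"decimal": ",", "temp_unit": "C"},  # Italy
    "NO": {"decimal": ",", "temp_unit": "C"},  # Norway (separator assumed)
    "US": {"decimal": ".", "temp_unit": "F"},  # United States
}

def format_number(value, country):
    """Format with one decimal place using the country's decimal separator."""
    text = f"{value:.1f}"
    return text.replace(".", DISPLAY_CONVENTIONS[country]["decimal"])

def format_temperature(celsius, country):
    """Convert a Celsius reading to the country's unit and format it."""
    unit = DISPLAY_CONVENTIONS[country]["temp_unit"]
    value = celsius * 9 / 5 + 32 if unit == "F" else celsius
    return f"{format_number(value, country)} °{unit}"

print(format_number(25.4, "HR"))        # millimeters in an inch, Croatian style
print(format_temperature(20.0, "NO"))
print(format_temperature(20.0, "US"))
```

Keeping the conventions in one table means the same underlying answer (25.4 millimeters, 20 degrees Celsius) is computed once and only its presentation changes by country.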

If everything else fails, we provide cross-language translations based upon Google's translation technology described in this blog post. We will translate your query to English, search English documents on the web, and translate the returned results from English back into the original query language. For example, Japanese users who are interested in viewing Halloween illustrations (Halloween is a holiday which originated in Ireland) can search for [ハロウィン イラスト]. You can then request a Japanese translation of the English pages (at the bottom of the page), which will bring up the translation page in the screenshot below. Similarly, Korean users can search for the latest on Harry Potter [해리 포터], and Arabic readers can search for the opening of the Sydney Opera house [افتتاح دار الاوبرا في سيدني]. (Click on the image to see a larger version.)



All in all, Google Search is being actively developed for more than 100 languages, in 150+ countries, with dozens of improvements launched each month. So far I've covered the basics of how international search works, but this is just the surface of all the international work we do. There are many other interesting topics that impact international markets like usability, homepage and results page layout, and connectivity. An understanding of real cultural and human factors is essential to creating a search engine that resonates with the people who use it. (Click on the image to see a larger version.)



(Update: Replaced example in the 4th bullet point.)

The art of the field study

11/06/2008 02:06:00 PM
I'm Dan Russell, a member of the Search Quality team doing user experience research. This post is part of our ongoing series to talk about the Search Quality team at Google, showing a bit of what we do in the day-to-day course of improving the quality of the user experience.

The role of "user experience" research is to try and get the inside story on what people do when they search. We're constantly asking: What's the user's experience of search? What works and doesn't work for them? What are they looking for? What DO they want?

To understand the full richness and variety of what people do when they are using Google, we spend many hours in the field, watching people search and listening to what they say as they do this. We hear it when they're happy, and when they're terribly frustrated. And perhaps most importantly, we also pay attention to the things they don't say -- the inexpressible "gotchas" that slow users down or get in the way of their search.

It turns out that people are masters of saying one thing and doing another, particularly when it comes to nearly automatic behavior. We find that searchers often turn so quickly to Google that they don't really think too much about what they're actually searching for. It's surprising, but often we'll see people trying to find out something about a topic, but then never actually mention the topic itself. That is, there's often a big discrepancy between what they'll tell me (the human observer) they're trying to do, and the search terms they enter into Google. One person I shadowed for the day spent ten minutes trying to find the schedule of the ferry that runs between San Francisco and Larkspur, but somehow only thought of adding the word "ferry" much later in their search.

We also study eye tracking. The eye makes a complex scan path over the search results, building up a composite picture of what is presented on the page. It's clear that what actually happens is a very rapid scan and assessment of each result as they are seen. In those milliseconds between the eye landing on the first fixation and seeing a few results, all kinds of decisions and choices are made--nearly all of them subconsciously.

In this short video, you can see three different searchers all looking for the same thing (in this case, a child's backpack). The red dot is the searcher's gaze moving around on the search results page. Notice how methodically the gaze moves from result title to title, occasionally inspecting the snippet text to gain more detail about the result.


(Video courtesy of Kerry Rodden)


So the job of figuring out what people actually do when they search isn't as simple as asking someone what they search for during the day. It's basically impossible to give an accurate account of what you saw (or didn't see) on the results page while actively searching for high-quality results.

Memories of your own behavior are also notoriously unreliable. People's search behavior in the lab is often different from their behavior at home or at work. This is a natural (and expected) side effect of lab studies: people will work especially hard to please a researcher. If we ask them to search for a pair of brown shoes they'd like to buy for themselves, in the lab they'll find the first pair that seems reasonable and then stop, satisfied. If the task were real, they would go on and spend more time searching. We still do lab studies, but we know what to watch for, and what to ignore.

Data from field studies gives us insight into how people respond to the Google experience in ways that we can't otherwise measure.

For instance, in several field studies we discovered that many of the people who went to the previous version of the Advanced Search page had a strong, almost visceral negative reaction when the page appeared. The text of the original page had language that many people saw as intimidating--words like "Domain," "Usage Rights" and "Safe Search" can be a bit much if you're not sure what they mean.

The old Advanced Search page was a little off-putting (click on the image to see a larger version):


Based on our field studies, we dug more deeply into how people were actually using our Advanced Search page, and quickly discovered that, indeed, a large number of users were going to the page, and then leaving it without ever filling in any of the slots.
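As a rough illustration of the measurement described above, here is a minimal sketch of how a "left without filling in anything" rate could be computed. The `PageVisit` record, its field names, and the sample values are all hypothetical; they are not taken from our logs or from any real analysis pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageVisit:
    """One visit to a form page (hypothetical record, for illustration only)."""
    visitor_id: str
    fields_filled: int  # how many form fields the visitor filled in before leaving


def bounce_rate(visits: List[PageVisit]) -> float:
    """Return the fraction of visits that ended with no field filled in."""
    if not visits:
        return 0.0
    bounced = sum(1 for v in visits if v.fields_filled == 0)
    return bounced / len(visits)


# Made-up sample data: two of the four visitors left without typing anything.
sample = [
    PageVisit("a", 0),
    PageVisit("b", 2),
    PageVisit("c", 0),
    PageVisit("d", 1),
]
print(bounce_rate(sample))  # 0.5
```

Comparing this number before and after a redesign is one simple way to check whether a page has become less intimidating.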

Armed with this insight from field studies, we redesigned the page. We simplified it by removing terms that were unclear to the average user (the word "occurrences," for example, just didn't mean anything to many of the Advanced Search page users) and by moving rarely used features (numeric range searches, date searches, etc.) into a part of the page that expands with a single click. That kept those features easy to reach for people who knew they wanted to search with those restrictions, but unobtrusive for everyone else.

One of the other things we noted in the field study was that people often didn't understand how the Advanced Search page worked. So we added a "visible query builder" region at the top of the page. As you fill in the blanks, the box at the top of the page fills in with the query that you could type into Google. It was our way of making visible the effects of advanced search operators.
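To make the idea concrete, here is a minimal sketch of what a query builder of this kind does: it turns filled-in form fields into the operator syntax you could type into the search box yourself. The function and parameter names are hypothetical and this is not the code behind the Advanced Search page; it only relies on familiar operators (quotes for an exact phrase, `OR`, a leading `-` to exclude a word, and `site:`).

```python
def build_query(all_words: str = "", exact_phrase: str = "",
                any_words: str = "", none_words: str = "",
                site: str = "") -> str:
    """Combine form fields into a single query string (illustrative sketch)."""
    parts = []

    # Words that should all appear are passed through unchanged.
    if all_words.strip():
        parts.append(" ".join(all_words.split()))

    # An exact phrase is wrapped in double quotes.
    if exact_phrase.strip():
        parts.append('"' + " ".join(exact_phrase.split()) + '"')

    # Alternatives are joined with OR.
    alternatives = any_words.split()
    if alternatives:
        parts.append(" OR ".join(alternatives))

    # Each unwanted word gets a leading minus sign.
    parts.extend("-" + word for word in none_words.split())

    # Restrict results to one site or domain.
    if site.strip():
        parts.append("site:" + site.strip())

    return " ".join(parts)


print(build_query(all_words="ferry schedule",
                  exact_phrase="san francisco",
                  none_words="cruise",
                  site="example.com"))
# ferry schedule "san francisco" -cruise site:example.com
```

Showing the generated string as each blank is filled in is what lets people see the connection between the form and the operators, which they can later type directly.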

The Advanced Search page post-redesign (click on the image to see a larger version):


The positive effect of these changes quickly became clear. The number of users who bounced out of the Advanced Search page dropped significantly. Interestingly, the total number of Advanced Search page users didn't increase significantly... at least not yet. By improving the UI on the page, we hope to attract even more searchers to the large range of search options available on Google.

In the end, this example shows the kind of insights that field studies can bring. As with the eye-tracking example, asking someone about their emotional response to a web page just isn't a useful way to get that data. But watching them in situ, as they actually use Google to go about their daily search lives, can reveal all kinds of remarkable, otherwise undiscoverable, and actionable insights into searcher behavior.