Search Engine Queries May be Used to Identify Entity Attributes


How Search Engine Queries May Be Used to Identify Entity Attributes

What are query stream ontologies, and how might they change search?


Search engines trained us to use keywords when we searched: to guess which words or phrases might be the best ones for finding something we are interested in, something we might have a situational or informational need to learn more about. Keywords were an essential part of SEO, which meant trying to get pages to rank highly in search results for certain keywords found in the queries that people searched with. SEOs still optimize pages for keywords, hoping to use a combination of information retrieval relevance scores and link-based PageRank scores to get pages to rank highly in search results.

With Google moving towards a knowledge-based attempt to find “things” rather than “strings,” we see patents that focus upon returning results that provide answers to questions in response to search engine queries. One of those, granted in January 2018, describes how query stream ontologies might be created from search engine queries and used to identify entity attributes, which can then help answer fact-based questions about those entities.

There is a white paper from Google, co-authored by the same people who are the inventors of this patent and published around the time the patent was filed in 2014, that is worth spending time reading through. The paper is titled Biperpedia: An Ontology for Search Applications.

The entity attributes patent and the paper both focus on the importance of structured data. The summary for the patent tells us this:

Search engines often are designed to recognize queries that structured data can answer. As such, they may invest heavily in creating and maintaining high-precision databases. While conventional databases in this context typically have relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small.

The patent is:

Identifying entity attributes
Inventors: Alon Yitzchak Halevy, Fei Wu, Steven Euijong Whang, and Rahul Gupta
Assignee: Google Inc.
US Patent: 9,864,795
Granted: January 9, 2018
Filed: October 28, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, generate an ontology of entity attributes. One of the methods includes extracting a plurality of attributes based upon a plurality of queries and constructing an ontology based upon the plurality of attributes and a plurality of entity classes.

The paper echoes sentiments in the patent, with statements such as this one:

For the first time in the history of the Web, structured data is a first-class citizen among search results. The main search engines make significant efforts to recognize when a user’s query can be answered using structured data.

To cut right to the heart of what this patent covers, it’s worth pulling out the first claim, which shows how much of an impact the patent may have on uncovering entity attributes as part of a knowledge-based approach to collecting data and indexing information on the Web. Like most patent language, it’s a long passage that tends to run on, but it is very detailed about the process that this patent covers:

1. A method comprising: generating an ontology of class-attribute pairs, wherein each class that occurs in the class-attribute pairs of the ontology is a class of entities and each attribute occurring in the class-attribute pairs of the ontology is an attribute of the respective entities in the class of the class-attribute pair in which the attribute occurs, wherein each attribute in the class-attribute pairs has one or more domains of instances to which the attribute applies and a range that is either a class of entities or a type of data, and wherein generating the ontology comprises: obtaining class-entity data representing a set of classes and, for each class, entities belonging to the class as instances of the class; obtaining a plurality of entity-attribute pairs, wherein each entity-attribute pair identifies an entity that is represented in the class-entity data and a candidate attribute for the entity; determining a plurality of attribute extraction patterns from occurrences of the entities identified by the entity-attribute pairs with the candidate attributes identified by the entity-attribute pairs in text of documents in a collection of documents, wherein determining the plurality of attribute extraction patterns comprises: identifying an occurrence of the entity and the candidate attribute identified by a first entity-attribute pair in a first sentence from a first document in the collection of documents; generating a candidate lexical attribute extraction pattern from the first sentence; generating a candidate parse attribute extraction pattern from the first sentence; and selecting the candidate lexical attribute extraction pattern and the candidate parse attribute extraction pattern as attribute extraction patterns if the candidate lexical attribute pattern and the candidate parse attribute extraction patterns were generated using at least a predetermined number of unique entity-attribute pairs; and applying the plurality of attribute extraction patterns to the documents in the collection of documents to determine entity-attribute pairs, and from the entity-attribute pairs and the class-entity data, for each of one or more entity classes represented in the class-entity data, attributes possessed by entities belonging to the entity class.
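
To make that claim a little more concrete, here is a minimal Python sketch of the lexical half of the process: turn sentences that mention a known entity-attribute pair into patterns, keep only the patterns supported by enough unique pairs, and then apply them to new text. This is an illustration and not code from the patent. The function names, the sample sentences, the {E} and {A} placeholders, and the min_support threshold are all assumptions, the matcher only handles single-word entities and attributes, and the parse-based patterns the claim also mentions are left out.

```python
import re
from collections import defaultdict


def lexical_pattern(sentence, entity, attribute):
    """Turn the text spanning both mentions into a pattern such as '{A} of {E}'."""
    e, a = sentence.find(entity), sentence.find(attribute)
    if e < 0 or a < 0:
        return None  # this sentence does not mention both halves of the pair
    start = min(e, a)
    end = max(e + len(entity), a + len(attribute))
    span = sentence[start:end]
    return span.replace(entity, "{E}").replace(attribute, "{A}")


def select_patterns(sentences, seed_pairs, min_support=2):
    """Keep patterns that were generated by at least min_support unique pairs."""
    support = defaultdict(set)
    for sentence in sentences:
        for entity, attribute in seed_pairs:
            pattern = lexical_pattern(sentence, entity, attribute)
            if pattern is not None:
                support[pattern].add((entity, attribute))
    return {p for p, pairs in support.items() if len(pairs) >= min_support}


def apply_pattern(pattern, sentence):
    """Return (entity, attribute) if the pattern occurs in the sentence, else None."""
    parts = re.split(r"(\{E\}|\{A\})", pattern)
    regex = "".join(
        r"(?P<E>\w+)" if part == "{E}"
        else r"(?P<A>\w+)" if part == "{A}"
        else re.escape(part)
        for part in parts
    )
    match = re.search(regex, sentence)
    return (match.group("E"), match.group("A")) if match else None


# Hypothetical seed pairs and sentences, for illustration only.
seeds = [("Italy", "capital"), ("Japan", "population"), ("Brazil", "coffee production")]
sentences = [
    "The capital of Italy is Rome.",
    "The population of Japan fell last year.",
    "Brazil's coffee production rose.",
]

patterns = select_patterns(sentences, seeds, min_support=2)
print(patterns)  # {'{A} of {E}'}
print(apply_pattern("{A} of {E}", "The anthem of France is well known."))  # ('France', 'anthem')
```

The pattern "{E}'s {A}" is also generated here, but only one seed pair supports it, so it is dropped. That is the role the "predetermined number of unique entity-attribute pairs" plays in the claim.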

Rather than making this post just the claims of this patent (which are worth going through if you can tolerate the legalese), I’m going to pull out some information from the description, which covers some of the implications of the process behind the patent. This first passage tells us about the benefit of crowdsourcing an ontology by building it from the search engine queries of many searchers, and how that may mean that matching keywords in queries with keywords in documents becomes less important than responding to queries with answers to questions:

Extending the number of attributes known to a search engine may enable the search engine to answer more precisely queries that lie outside a “long tail” of statistical query arrangements, extract a broader range of facts from the Web, and/or retrieve information related to semantic information of tables present on the Web.

This patent provides a lot of information about how such an ontology, based on search engine queries, might be used to assist search:

The present disclosure provides systems and techniques for creating an ontology of, for example, millions of (class, attribute) pairs, including 100,000 or more distinct attribute names, which is up to several orders of magnitude larger than available conventional ontologies. Extending the number of attributes “known” to a search engine may provide several benefits. First, additional attributes may enable the search engine to answer “long-tail” queries more precisely, e.g., Brazil coffee production. Second, additional attributes may allow for the extraction of facts from Web text using open information extraction techniques. As another example, a broad repository of attributes may enable recovery of the semantics of tables on the Web because it may be easier to recognize attribute names in column headers and the surrounding text.

Answering Search Queries with Entity Attributes

I wrote about this topic in How Knowledge Base Entities could be Used in Searches, which describes how Google might search a data store of entity attributes, such as one about movies, to return results for queries that ask about facts related to a movie, such as “What is the movie where Robert Duvall loves the smell of Napalm in the morning?” Building up a detailed ontology that includes many facts can mean a search engine can answer many questions quickly. This may be how queries that trigger featured snippets are answered in the future, but the patent that describes that approach returns SERPs filled with links to web documents rather than answers to questions.
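
As a rough illustration of the difference between answering a question and matching keywords, the Python sketch below looks for a known entity and a known attribute in a query and returns the stored value when both are present. It is a toy under stated assumptions and not how Google does it: the FACTS store, the answer function, and the single-word matching are all hypothetical.

```python
# Hypothetical fact store keyed by (entity, attribute); illustration only.
FACTS = {
    ("france", "capital"): "Paris",
    ("japan", "capital"): "Tokyo",
}


def answer(query, facts):
    """Return a stored value if the query names a known entity and attribute."""
    words = query.lower().replace("?", "").split()
    for (entity, attribute), value in facts.items():
        if entity in words and attribute in words:
            return value
    return None  # no fact found; a search engine would fall back to ranked links


print(answer("What is the capital of France?", FACTS))  # Paris
print(answer("anthem of France", FACTS))                # None
```

The second query shows why a larger set of attributes matters: the entity is recognized, but without an "anthem" attribute in the store there is nothing to answer with.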

Open Information Extraction

That mention of open information extraction methods from the patent reminded me of an acquisition from a few years ago, when Google acquired Wavii in April of 2013. Wavii did research about open extraction as described in these papers:

A video that might be helpful to learn about how Open Information Extraction works is this one:

Open Information Extraction at Web Scale

An ontology created from a query stream of search engine queries can lead to this kind of open information extraction.

Semantics from Tables on the Web

Google has been running a Webtables project for a few years and has released a follow-up that describes how the project has been going. Semantics from tables are mentioned in this patent, so it’s worth including some papers about the Webtables project to give you more information if you haven’t come across them before:

Ontologies based on Search Engine Queries


The process in the patent involves extracting information from search engine queries to identify entity attributes and build an ontology. I enjoyed the statements in this patent about what an ontology is and how one works to help search. I recommend clicking through and reading the description in the patent along with the Biperpedia paper. This transformation takes search beyond keywords, toward a better understanding of entities, and it changes how search works. This appears to be the real future of search:

Systems and techniques disclosed herein may extract attributes from a query stream and then use extractions to seed attribute extraction from other text. For every attribute, a set of synonyms and text patterns in which it appears is saved, thereby enabling the ontology to recognize the attribute in more contexts. An attribute in an ontology as disclosed herein includes a relationship between a pair of entities (e.g., CAPITAL of countries), between an entity and a value (e.g., COFFEE PRODUCTION), or between an entity and a narrative (e.g., CULTURE). An ontology as disclosed herein may be described as a “best-effort” ontology in the sense that not all the attributes it contains are equally meaningful. Such an ontology may capture attributes that people consider relevant to classes of entities. For example, people may primarily express interest in attributes by querying a search engine for the attribute of a particular entity or using the attribute in the written text on the Web. In contrast to a conventional ontology or database schema, a best-effort ontology may not attach a precise definition to each attribute. However, it has been found that such an ontology still may have relatively high precision (e.g., 0.91 for the top 100 attributes and 0.52 for the top 5000 attributes).
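
The first step in that passage, extracting attributes from a query stream, can be sketched in a few lines of Python. This is a simplified illustration and not the patent’s method: the two query shapes it recognizes (“A of E” and “E A”), the CLASS_ENTITIES data, the sample queries, and the min_entities threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical class-entity data; a real system would draw on a knowledge base.
CLASS_ENTITIES = {"country": {"france", "brazil", "japan"}}


def candidate_attribute(query, entity):
    """Return the attribute part of an 'A of E' or 'E A' query, else None."""
    q = query.lower().strip()
    suffix = " of " + entity
    if q.endswith(suffix):
        return q[: -len(suffix)]
    if q.startswith(entity + " "):
        return q[len(entity) + 1:]
    return None


def class_attributes(queries, class_entities, min_entities=2):
    """Keep (class, attribute) pairs that were queried for enough distinct entities."""
    seen = defaultdict(set)  # (class, attribute) -> entities it was asked about
    for query in queries:
        for cls, entities in class_entities.items():
            for entity in entities:
                attr = candidate_attribute(query, entity)
                if attr:
                    seen[(cls, attr)].add(entity)
    return {pair: len(ents) for pair, ents in seen.items() if len(ents) >= min_entities}


# Hypothetical query stream.
queries = [
    "capital of france",
    "japan capital",
    "brazil coffee production",
    "coffee production of japan",
    "france weather tomorrow",
]

print(class_attributes(queries, CLASS_ENTITIES))
```

With these sample queries, “capital” and “coffee production” are each asked about two different countries and are kept as attributes of the country class, while “weather tomorrow” is seen for only one entity and is dropped. Requiring an attribute to show up for several entities is one simple way to get the “best-effort” filtering the patent describes.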

The ontologies that are created from search engine queries expressly to assist search applications are different from more conventional manually generated ontologies in several ways:

Ontologies as disclosed herein may be particularly well-suited for use in search applications. In particular, tasks such as parsing a user query, recovering the semantics of columns of Web tables, and recognizing when sentences in the text refer to entities’ attributes may be performed efficiently. In contrast, conventional ontologies tend to be relatively inflexible or brittle because they rely on a single way of modeling the world, including a single name for any class, entity, or attribute. Hence, supporting search applications with a conventional ontology may be difficult because mapping a query or a text snippet to the ontology can be arbitrarily hard. An ontology as disclosed herein may include one or more constructs that facilitate query and text understanding, such as attaching to every attribute a set of common misspellings of the attribute, exact and/or approximate synonyms, other related attributes (even if the specific relationship is not known), and common text phrases that mention the attribute.
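
The constructs listed at the end of that passage suggest a simple data structure: each attribute carries its alternative surface forms, so a phrase from a query can be mapped back to a single canonical attribute. The Python sketch below shows one way that could look. The Attribute class, its field names, and the sample synonyms and misspellings are hypothetical, and the exact-match lookup stands in for whatever matching a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One attribute of an entity class, with the surface forms it may appear under."""
    name: str
    entity_class: str
    synonyms: set = field(default_factory=set)
    misspellings: set = field(default_factory=set)

    def surface_forms(self):
        return {self.name} | self.synonyms | self.misspellings


def build_index(attributes):
    """Map every known surface form (lowercased) to its attribute."""
    index = {}
    for attr in attributes:
        for form in attr.surface_forms():
            index[form.lower()] = attr
    return index


def resolve(phrase, index):
    """Return (class, canonical attribute name) for a phrase, or None if unknown."""
    attr = index.get(phrase.lower().strip())
    return (attr.entity_class, attr.name) if attr else None


# Hypothetical sample data.
population = Attribute(
    name="population",
    entity_class="country",
    synonyms={"number of inhabitants"},
    misspellings={"populaton"},
)
index = build_index([population])

print(resolve("Populaton", index))              # ('country', 'population')
print(resolve("number of inhabitants", index))  # ('country', 'population')
print(resolve("anthem", index))                 # None
```

A conventional ontology with a single name per attribute would only match the first of those three spellings if it were typed exactly, which is the brittleness the patent is contrasting itself with.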

The patent also includes more about ontologies, schema, data sources, and query patterns.

This is the direction that search is traveling in, and if you want to understand or do SEO, it’s worth learning about. SEO is changing, just as it has many times in the past.

I’ve also written a follow-up to this post on the Go Fish Digital blog at: SEO Moves From Keywords to Ontologies and Query Patterns

A related older post on this topic is Google Adds Entity Attributes to its Knowledge Base from Queries

Last Updated July 11, 2019


18 thoughts on “Search Engine Queries May be Used to Identify Entity Attributes”

  1. This is a brilliant explanation of how search algorithms evolve with user intent, search queries, and the different technologies we adopt. However, the focus has always been on user intent and on providing the best result. What I am looking forward to, and would find interesting, is SERP results based on natural language and voice-based queries. For the big G, structured data, schema markup, and multimedia data/object recognition will evolve due to more advancement in AI.

  2. Hi Ace,

    I think some of the differences between typed queries and voice-based queries include: (1) accounting for accents to possibly personalize results, (2) understanding when words within a query might be stressed vocally to give them additional meaning, and (3) references to entities in a previous query by use of a pronoun, as a continuation of a conversational interaction with a search engine. A focus on natural language, rather than matching keywords, can lead towards an evolution of search. We are seeing things such as Google Lens appear as an option in Google Photos, which uses object recognition and schema to show us how Google is advancing search. It will be interesting to see even more.

  3. You said things and not strings would form the crux of the Google search engine algorithm. But I feel these things are still in theory, and it will take a while for them to be adopted in practice in Google’s actual search algorithm. I still feel Google has a long way to go to realize this cherished ideal.

  4. Hi Ozment Media,

    We have knowledge panels in search results. Google is using entity IDs in Google Trends and reverse image search. When Paul Haahr spoke at SMX East 2016, he told us that Google now regularly looks at queries to see if they contain entities in them before they do any other processing, and he has been a senior search engineer at Google since the early 2000s. Google has shown off other patents that use Facts and attributes to answer search queries, such as the one I wrote about in:

    How Knowledge Base Entities can be Used in Searches
    https://www.seobythesea.com/2014/07/knowledge-base-entities-used-in-searches/

    Google has told us that Structured Data is important and Google uses it to learn more precise data about things being searched for:

    Google: Schema & Structured Data Is Here For The Long Run
    https://www.seroundtable.com/google-schema-structured-data-here-to-stay-25293.html

    Google introduced the knowledge graph five years ago in 2012, telling us that they would be searching for things and not strings then.

    They have shown us how important it is, and given us examples of it being used in SERPs and with knowledge panels and rich results.

    It does appear to have moved beyond the theory stage. The old saying is that the best time to plant a tree was 20 years ago, and if that isn’t possible, the next best time is now. The same could be said about starting to explore a “things and not strings” approach to search. It is not the only thing Google is doing, and Google is still using keywords and matching of keywords in queries to keywords in documents, but it is clear that they are moving to something new, and they have already shown it to us. It is more than a hopeful dream on Google’s part.

  5. This is a brilliant explanation, Bill. I’m lucky I got to your blog today, and I want to ask you something unrelated to the topic of the article: lately, Google indexes my latest published article only after a day, even if I use “Fetch as Google.” Why do you think this is happening? Before, indexing was almost instantaneous. Thank you in advance for an answer.

  6. Hey Bill, great article. You explained very well how search changes. Glad I came across your blog.

  7. This is indeed an excellent explanation. It is so hard to get your head around algorithms, even if you are mathematically minded (which I must admit I am not).

  8. Hi Bill,

    One year ago, indexing was almost instantaneous after “fetch in webmaster tools” or submitting a sitemap.
    Now with the new Google algorithm, strange things are happening … Google allows us to fetch only a certain number of pages per day. I think this is happening because of the battle between Google PPC and Organic Search.

    Glad I came across your blog and thanks for sharing with us.

  9. Hey Bill!

    First of all – as a general note, I just randomly came across this website, and now I think you’ve got a new fan… I like your ‘philosophical’ approach to SEO and Google. “Learning SEO directly from the search engines” is a neat way of putting it because you do an amazing job at digging into how Google alters (and plans to alter) its algorithm to be more laser-focused on specific user intentions as well as constantly adapt to new web browsing and mobile use trends. Maybe it’s just me, but I think that you putting forth your studies of Google patents alerts SEO experts to forthcoming change well before it happens – and we all know it happens on a frequent basis.

    Now, on to the article itself – the idea of answering queries with structured data instead of keyword matches is fascinating. (I personally find structured data and Schema.org markup to be one of my favourite aspects of SEO.) It reminds me: numerous SEO experts stress in 2018 that the roadmap to ranking is writing high-quality niche-specific content with low keyword density and that attracts natural ‘earned’ links. It all ties back into Google’s intention of ‘humanizing’ the search engine experience as much as possible – when one considers that in their optimization strategies, they’ll be able to rank much more sustainably because Google recognizes there’s a clear win-win between the readers/customers and the website owner.

    Keep up the good research Bill!

    Very best,
    Alexander V. – Co-Founder, Triple Agent Digital Media, Inc.
    “Your Secrets Are Safe With Us!”

  11. Hi Bill. As always, great content. Keep up the good work! This is indeed an excellent explanation. It is so hard to get your head around algorithms, even if you are mathematically minded.

  12. Thank you Bill for this. And you are right when you said in the end, “SEO is changing”
