Boolean Searching in WPopac

WPopac takes advantage of MySQL’s indexing and relevance-ranked searching (go ahead, try it), including boolean searching (on MySQL 4.0.1 and later). Here are some details and examples taken wholesale from the MySQL manual:

  • +
    A leading plus sign indicates that this word must be present in each result returned.
     
  • -
    A leading minus sign indicates that this word must not be present in any of the results that are returned.
     
  • > <
    These two operators are used to change a word’s contribution to the relevance value that is assigned to a result. The > operator increases the contribution and the < operator decreases it.
     
  • ( )
    Parentheses group words into subexpressions. Parenthesized groups can be nested.
     
  • ~
    A leading tilde acts as a negation operator, causing the word’s contribution to the result’s relevance to be negative. This is useful for marking “noise” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.
     
  • *
    The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.
     
  • "
    A phrase that is enclosed within double quote ('"') characters matches only results that contain the phrase literally, as it was typed.
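
To show where these operators end up, here is a minimal sketch of how a search string could be handed to MySQL in boolean mode. It is not WPopac's actual code: the table name `catalog` and the columns `title` and `subject` are hypothetical, and Python is used for illustration only. The `MATCH ... AGAINST (... IN BOOLEAN MODE)` syntax is MySQL's own; the `%s` placeholders assume a Python driver that uses that parameter style.

```python
def boolean_search_sql(table, columns):
    """Build a parameterized boolean-mode full-text query.

    The table and column names are placeholders for illustration;
    the columns must be covered by a FULLTEXT index.
    """
    # Identifiers cannot be bound as parameters, so check them here.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cols = ", ".join(columns)
    match = f"MATCH ({cols}) AGAINST (%s IN BOOLEAN MODE)"
    # Boolean-mode results are not sorted by relevance automatically,
    # so the score is selected and ordered on explicitly.
    return (
        f"SELECT *, {match} AS score FROM {table} "
        f"WHERE {match} ORDER BY score DESC"
    )


sql = boolean_search_sql("catalog", ["title", "subject"])
user_input = "+apple -macintosh"
params = (user_input, user_input)  # one value per %s placeholder
```

The user's search string, operators and all, is passed through unchanged as a bound parameter; MySQL does the parsing.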
     

In short, it supports the quotes and plus/minus operators that people are familiar with in Google and others. The following examples demonstrate some search strings that use boolean operators:

  • apple banana
    Find records that contain at least one of the two words.
     
  • +apple +juice
    Find records that contain both words.
     
  • +apple macintosh
    Find records that contain the word “apple”, but rank records higher if they also contain “macintosh”.
     
  • +apple -macintosh
    Find records that contain the word “apple” but not “macintosh”.
     
  • +apple ~macintosh
    Find records that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if the row does not. This is “softer” than a search for ‘+apple -macintosh’, for which the presence of “macintosh” causes the row not to be returned at all.
     
  • +apple +(>turnover <strudel)
    Find records that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”.
     
  • apple*
    Find records that contain words such as “apple”, “apples”, “applesauce”, or “applet”.
     
  • "some words"
    Find records that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”).
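
The include/exclude logic in these examples can be mimicked in a few lines. The sketch below is a simplified stand-in, not MySQL's implementation: it handles +, -, trailing * and double-quoted phrases, treats ~, > and < as plain optional terms (they only affect ranking, which is not modelled), and ignores parentheses, stopwords and minimum word length. The function names are made up for this example.

```python
import re

# An optional operator followed by either a "quoted phrase" or a word
# with an optional trailing * for truncation.
TOKEN = re.compile(r'([+\-~<>]?)(?:"([^"]+)"|(\w+)(\*?))')


def parse(query):
    """Split a search string into (operator, kind, text) tuples."""
    terms = []
    for op, phrase, word, star in TOKEN.findall(query.lower()):
        if phrase:
            terms.append((op, "phrase", phrase))
        elif star:
            terms.append((op, "prefix", word))
        else:
            terms.append((op, "word", word))
    return terms


def matches(query, text):
    """Return True if text would be included for this query (no ranking)."""
    words = re.findall(r"\w+", text.lower())

    def hit(kind, value):
        if kind == "phrase":
            target = re.findall(r"\w+", value)
            n = len(target)
            return bool(target) and any(
                words[i:i + n] == target for i in range(len(words) - n + 1)
            )
        if kind == "prefix":
            return any(w.startswith(value) for w in words)
        return value in words

    terms = parse(query)
    required = [t for t in terms if t[0] == "+"]
    excluded = [t for t in terms if t[0] == "-"]
    optional = [t for t in terms if t[0] not in ("+", "-")]

    if any(hit(kind, value) for _, kind, value in excluded):
        return False
    if required:
        return all(hit(kind, value) for _, kind, value in required)
    # With no required terms, at least one optional term must be present.
    return any(hit(kind, value) for _, kind, value in optional)
```

For instance, `matches("+apple -macintosh", "Apple Macintosh manual")` is false while `matches("+apple ~macintosh", "Apple Macintosh manual")` is true, which is the “softer” behaviour described above.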
     

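One thing to be aware of, however, is MySQL’s over-reaching stopword list: words on it are silently ignored in full-text searches, as are words shorter than the minimum word length. I’ve yet to experiment with configuring a custom “ft_stopword_file” or changing the value of “ft_min_word_len” in my.cnf.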
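
To get a feel for what those two settings do to a search, here is a rough sketch that flags which words in a search string would likely be ignored. It assumes MySQL's documented default minimum word length of 4; the stopword set is a tiny illustrative sample, not MySQL's real list, and `ignored_terms` is a made-up helper name.

```python
import re

# Tiny illustrative sample only; MySQL's built-in list is far longer.
SAMPLE_STOPWORDS = {"the", "and", "about", "with"}


def ignored_terms(query, min_len=4, stopwords=SAMPLE_STOPWORDS):
    """List (word, reason) pairs for words full-text search would skip.

    min_len stands in for ft_min_word_len; stopwords stands in for
    the contents of the stopword file.
    """
    ignored = []
    for word in re.findall(r"\w+", query.lower()):
        if len(word) < min_len:
            ignored.append((word, "too short"))
        elif word in stopwords:
            ignored.append((word, "stopword"))
    return ignored


skipped = ignored_terms("+art +about +cats")
```

Here “art” is flagged as too short and “about” as a stopword, leaving only “cats” to search on. Passing `min_len=3` approximates the effect of lowering “ft_min_word_len”; on a real server the full-text indexes would also need rebuilding after such a change.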



