MySQL's full-text search capability has few user-tunable parameters. You can exert more control over full-text searching behavior if you have a MySQL source distribution because some changes require source code modifications. See Section 2.17, “Installing MySQL from Source”.
Note that full-text search is carefully tuned for the most effectiveness. Modifying the default behavior in most cases can actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing.
Most full-text variables described in this section must be set at server startup time. A server restart is required to change them; they cannot be modified while the server is running.
Some variable changes require that you rebuild the
FULLTEXT
indexes in your tables. Instructions
for doing so are given later in this section.
The minimum and maximum lengths of words to be indexed are
defined by the
ft_min_word_len
and
ft_max_word_len
system
variables. (See Section 5.1.4, “Server System Variables”.)
The default minimum value is four characters; the default
maximum is version dependent. If you change either value,
you must rebuild your FULLTEXT
indexes.
For example, if you want three-character words to be
searchable, you can set the
ft_min_word_len
variable by
putting the following lines in an option file:
[mysqld] ft_min_word_len=3
Then restart the server and rebuild your
FULLTEXT
indexes. Note particularly the
remarks regarding myisamchk in the
instructions following this list.
To override the default stopword list, set the
ft_stopword_file
system
variable. (See Section 5.1.4, “Server System Variables”.)
The variable value should be the path name of the file
containing the stopword list, or the empty string to disable
stopword filtering. The server looks for the file in the
data directory unless an absolute path name is given to
specify a different directory. After changing the value of
this variable or the contents of the stopword file, restart
the server and rebuild your FULLTEXT
indexes.
The stopword list is free-form. That is, you may use any
nonalphanumeric character such as newline, space, or comma
to separate stopwords. Exceptions are the underscore
character (“_
”) and a single
apostrophe (“'
”) which are
treated as part of a word. The character set of the stopword
list is the server's default character set; see
Section 10.1.3.1, “Server Character Set and Collation”.
The 50% threshold for natural language searches is
determined by the particular weighting scheme chosen. To
disable it, look for the following line in
myisam/ftdefs.h
:
#define GWS_IN_USE GWS_PROB
Change that line to this:
#define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the indexes in this case.
By making this change, you severely
decrease MySQL's ability to provide adequate relevance
values for the MATCH()
function. If you really need to search for such common
words, it would be better to search using IN
BOOLEAN MODE
instead, which does not observe the
50% threshold.
To change the operators used for boolean full-text searches,
set the ft_boolean_syntax
system variable. This variable can be changed while the
server is running, but you must have the
SUPER
privilege to do so. No
rebuilding of indexes is necessary in this case. See
Section 5.1.4, “Server System Variables”, which describes
the rules governing how to set this variable.
If you want to change the set of characters that are
considered word characters, you can do so in several ways,
as described in the following list. After making the
modification, you must rebuild the indexes for each table
that contains any FULLTEXT
indexes.
Suppose that you want to treat the hyphen character ('-') as
a word character. Use one of these methods:
Modify the MySQL source: In
myisam/ftdefs.h
, see the
true_word_char()
and
misc_word_char()
macros. Add
'-'
to one of those macros and
recompile MySQL.
Modify a character set file: This requires no
recompilation. The true_word_char()
macro uses a “character type” table to
distinguish letters and numbers from other characters. .
You can edit the contents of the
<ctype><map>
array in one
of the character set XML files to specify that
'-'
is a “letter.” Then
use the given character set for your
FULLTEXT
indexes. For information
about the <ctype><map>
array format, see Section 10.3.1, “Character Definition Arrays”.
Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation. For general information about adding collations, see Section 10.4, “Adding a Collation to a Character Set”. For an example specific to full-text indexing, see Section 12.9.7, “Adding a Collation for Full-Text Indexing”.
If you modify full-text variables that affect indexing
(ft_min_word_len
,
ft_max_word_len
, or
ft_stopword_file
), or if you
change the stopword file itself, you must rebuild your
FULLTEXT
indexes after making the changes and
restarting the server. To rebuild the indexes in this case, it
is sufficient to do a QUICK
repair operation:
mysql> REPAIR TABLE tbl_name
QUICK;
Alternatively, use ALTER TABLE
with the DROP INDEX
and ADD
INDEX
options to drop and re-create each
FULLTEXT
index. In some cases, this may be
faster than a repair operation.
Each table that contains any FULLTEXT
index
must be repaired as just shown. Otherwise, queries for the table
may yield incorrect results, and modifications to the table will
cause the server to see the table as corrupt and in need of
repair.
Note that if you use myisamchk to perform an
operation that modifies table indexes (such as repair or
analyze), the FULLTEXT
indexes are rebuilt
using the default full-text parameter
values for minimum word length, maximum word length, and
stopword file unless you specify otherwise. This can result in
queries failing.
The problem occurs because these parameters are known only by
the server. They are not stored in MyISAM
index files. To avoid the problem if you have modified the
minimum or maximum word length or stopword file values used by
the server, specify the same
ft_min_word_len
,
ft_max_word_len
, and
ft_stopword_file
values for
myisamchk that you use for
mysqld. For example, if you have set the
minimum word length to 3, you can repair a table with
myisamchk like this:
shell> myisamchk --recover --ft_min_word_len=3 tbl_name
.MYI
To ensure that myisamchk and the server use
the same values for full-text parameters, place each one in both
the [mysqld]
and
[myisamchk]
sections of an option file:
[mysqld] ft_min_word_len=3 [myisamchk] ft_min_word_len=3
An alternative to using myisamchk for index
modification is to use the REPAIR
TABLE
, ANALYZE TABLE
,
OPTIMIZE TABLE
, or
ALTER TABLE
statements. These
statements are performed by the server, which knows the proper
full-text parameter values to use.