
Using Full-Text Searches

From the document MySQL Cookbook (pages 197-200)


LOCATE(), LIKE, and REGEXP use the collation of their arguments to determine whether the search is case sensitive. Recipes 5.5 and 5.7 discuss changing the argument comparison properties if you want to change the search behavior.

5.12. Using Full-Text Searches

Problem

You want to search a lot of text.

Solution

Use a FULLTEXT index.

Discussion

Pattern matches enable you to look through any number of rows, but as the amount of text goes up, the match operation can become quite slow. It’s also a common task to search for the same text in several string columns, but with pattern matching, that results in unwieldy queries:

SELECT * FROM tbl_name
WHERE col1 LIKE 'pat' OR col2 LIKE 'pat' OR col3 LIKE 'pat' ...

A useful alternative is full-text searching, which is designed for looking through large amounts of text and can search multiple columns simultaneously. To use this capability, add a FULLTEXT index to your table, and then use the MATCH operator to look for strings in the indexed column or columns. FULLTEXT indexing can be used with MyISAM tables (or, as of MySQL 5.6, InnoDB tables) for nonbinary string data types (CHAR, VARCHAR, or TEXT).
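To make the contrast concrete, here is a minimal Python sketch that builds both kinds of WHERE clause for a list of columns. The helper names like_where and match_where are hypothetical, the %s placeholders assume a DB-API driver that uses that parameter style, and the multi-column MATCH() form assumes the table has a single FULLTEXT index covering exactly those columns.

```python
def like_where(columns):
    """Build a WHERE clause that tests each column with its own LIKE."""
    # Column names are interpolated directly, so they must be trusted
    # identifiers, never user input.
    return " OR ".join(f"{col} LIKE %s" for col in columns)


def match_where(columns):
    """Build a WHERE clause that tests all columns with one MATCH()."""
    return f"MATCH({', '.join(columns)}) AGAINST(%s)"


cols = ["col1", "col2", "col3"]
print(like_where(cols))   # col1 LIKE %s OR col2 LIKE %s OR col3 LIKE %s
print(match_where(cols))  # MATCH(col1, col2, col3) AGAINST(%s)
```

The LIKE form needs one bound value per column and grows with every column added; the MATCH() form needs a single bound value however many columns the index covers.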

Full-text searching is best illustrated with a reasonably good-sized body of text. If you don’t have a sample dataset, you can find several repositories of freely available electronic text on the Internet. For the examples here, the one I’ve chosen is the complete text of the King James Version of the Bible (KJV), which is both relatively large and nicely structured by book, chapter, and verse. Because of its size, this dataset is not included with the recipes distribution, but is available separately as the mcb-kjv distribution at the MySQL Cookbook website (see the Preface). The mcb-kjv distribution includes a file named kjv.txt that contains the verse records. Some sample records look like this:

O Genesis 1 1 1 In the beginning God created the heaven and the earth.

O Exodus 2 20 13 Thou shalt not kill.

N Luke 42 17 32 Remember Lot's wife.


Each record contains the following fields:

• Book section (O or N, signifying Old or New Testament)

• Book name and corresponding book number, from 1 to 66

• Chapter and verse numbers

• Text of the verse
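The field layout can be sketched in Python as follows. This is an illustration only: it assumes the records are tab-delimited with one record per line (the default format that LOAD DATA expects when no FIELDS clause is given), and the names Verse and parse_record are hypothetical.

```python
from typing import NamedTuple


class Verse(NamedTuple):
    bsect: str  # book section: 'O' or 'N'
    bname: str  # book name
    bnum: int   # book number, 1 to 66
    cnum: int   # chapter number
    vnum: int   # verse number
    vtext: str  # text of the verse


def parse_record(line: str) -> Verse:
    """Split one tab-delimited record into typed fields."""
    # maxsplit=5 keeps the verse text whole even if it contained a tab.
    bsect, bname, bnum, cnum, vnum, vtext = line.rstrip("\n").split("\t", 5)
    return Verse(bsect, bname, int(bnum), int(cnum), int(vnum), vtext)


if __name__ == "__main__":
    sample = "N\tLuke\t42\t17\t32\tRemember Lot's wife."
    print(parse_record(sample))
```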

To import the records into MySQL, create a table named kjv that looks like this:

CREATE TABLE kjv
(
  bsect ENUM('O','N') NOT NULL,      # book section (testament)
  bname VARCHAR(20) NOT NULL,        # book name
  bnum  TINYINT UNSIGNED NOT NULL,   # book number
  cnum  TINYINT UNSIGNED NOT NULL,   # chapter number
  vnum  TINYINT UNSIGNED NOT NULL,   # verse number
  vtext TEXT NOT NULL,               # text of verse
  FULLTEXT (vtext)                   # full-text index
) ENGINE = MyISAM;                   # can be InnoDB for MySQL 5.6+

The table has a FULLTEXT index to enable its use in full-text searching. It also uses the MyISAM storage engine. If you have MySQL 5.6 or higher and want to use InnoDB instead, modify the ENGINE clause to ENGINE=InnoDB.

After creating the kjv table, load the kjv.txt file into it using this statement:

mysql> LOAD DATA LOCAL INFILE 'kjv.txt' INTO TABLE kjv;

You’ll notice that the kjv table contains columns both for book names (Genesis, Exodus, ...) and for book numbers (1, 2, ...). The names and numbers have a fixed correspondence, and one can be derived from the other, a redundancy that means the table is not in normal form. It’s possible to eliminate the redundancy by storing just the book numbers (which take less space than the names), and then producing the names when necessary in query results by joining the numbers to a mapping table that associates each book number with the corresponding name. But I want to avoid using joins at this point. Thus, the table includes book names so search results can be interpreted more easily, and numbers so the results can be sorted easily into book order.

To perform a search using the FULLTEXT index, use MATCH() to name the indexed column and AGAINST() to specify what text to look for. For example, you might wonder, “How many times does the name Hadoram occur?” To answer that question, search the vtext column using this statement:

mysql> SELECT COUNT(*) FROM kjv WHERE MATCH(vtext) AGAINST('Hadoram');
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+


To find out what those verses are, select the columns you want to see (the example here truncates the vtext column and uses \G so the results better fit the page):

mysql> SELECT bname, cnum, vnum, LEFT(vtext,65) AS vtext
    -> FROM kjv WHERE MATCH(vtext) AGAINST('Hadoram')\G
*************************** 1. row ***************************
bname: Genesis
 cnum: 10
 vnum: 27
vtext: And Hadoram, and Uzal, and Diklah,
*************************** 2. row ***************************
bname: 1 Chronicles
 cnum: 1
 vnum: 21
vtext: Hadoram also, and Uzal, and Diklah,
*************************** 3. row ***************************
bname: 1 Chronicles
 cnum: 18
 vnum: 10
vtext: He sent Hadoram his son to king David, to inquire of his welfare,
*************************** 4. row ***************************
bname: 2 Chronicles
 cnum: 10
 vnum: 18
vtext: Then king Rehoboam sent Hadoram that was over the tribute; and th

The results may come out in book, chapter, and verse number order, but that’s just coincidence. By default, full-text searches compute a relevance ranking and use it for sorting. To make sure a search result is sorted the way you want, add an explicit ORDER BY clause:

SELECT bname, cnum, vnum, vtext
FROM kjv WHERE MATCH(vtext) AGAINST('search string')
ORDER BY bnum, cnum, vnum;

To see the relevance ranking, repeat the MATCH() … AGAINST() expression in the output column list.
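As a rough sketch of what repeating the expression looks like, the following Python helper assembles such a statement for the kjv table. The function name ranked_search_sql and the column alias score are hypothetical, and the %s placeholders assume a DB-API driver that uses that parameter style.

```python
def ranked_search_sql(table="kjv", column="vtext"):
    """Return a full-text query that also displays the relevance value."""
    # The same MATCH() ... AGAINST() expression appears twice: once in the
    # output column list (to show the ranking) and once in the WHERE clause
    # (to select the rows). Table and column names must be trusted values.
    expr = f"MATCH({column}) AGAINST(%s)"
    return (
        f"SELECT bname, cnum, vnum, {expr} AS score "
        f"FROM {table} WHERE {expr} "
        "ORDER BY score DESC"
    )


sql = ranked_search_sql()
params = ("Hadoram", "Hadoram")  # one value per %s placeholder
print(sql)
```

Because the expression occurs twice, the search string has to be bound twice. The ORDER BY score DESC clause only makes the relevance ordering explicit; replace it with ORDER BY bnum, cnum, vnum to get book order instead.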

To narrow the search further, include additional criteria. The following queries perform progressively more specific searches to determine how often the name Abraham occurs in the entire KJV, the New Testament, the Book of Hebrews, and Chapter 11 of Hebrews:

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham');

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham')
    -> AND bsect = 'N';

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham')
    -> AND bname = 'Hebrews';

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham')
    -> AND bname = 'Hebrews' AND cnum = 11;

If you expect to use search criteria frequently that include other non-FULLTEXT columns, add regular indexes to those columns so that queries perform better. For example, to index the book, chapter, and verse columns, do this:

mysql> ALTER TABLE kjv ADD INDEX (bnum), ADD INDEX (cnum), ADD INDEX (vnum);

Search strings in full-text queries can include more than one word, and you might suppose that adding words would make a search more specific. But in fact that widens it because a full-text search returns rows that contain any of the words. In effect, the query performs an OR search for any of the words. The following queries illustrate this; they identify successively larger numbers of verses as additional search words are added:

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham');

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham Sarah');

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('Abraham Sarah Ishmael Isaac');
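The widening effect of any-word matching can be mimicked outside MySQL with a toy model. The Python sketch below is not how MySQL implements full-text search (it ignores stopwords, minimum word length, and relevance weighting); it only counts verses that share at least one word with the search string, using a few of the verses shown earlier in this recipe. The names VERSES, words, and count_matches are hypothetical.

```python
import re

# A handful of verse texts taken from the examples earlier in this recipe.
VERSES = [
    "And Hadoram, and Uzal, and Diklah,",
    "Hadoram also, and Uzal, and Diklah,",
    "Thou shalt not kill.",
    "Remember Lot's wife.",
    "In the beginning God created the heaven and the earth.",
]


def words(text):
    """Return the set of lowercase words in a string."""
    return set(re.findall(r"[a-z']+", text.lower()))


def count_matches(search_string, verses=VERSES):
    """Count verses that contain ANY word of the search string (OR search)."""
    wanted = words(search_string)
    return sum(1 for verse in verses if wanted & words(verse))


for s in ("Hadoram", "Hadoram wife", "Hadoram wife heaven"):
    print(s, "->", count_matches(s))
# Hadoram -> 2
# Hadoram wife -> 3
# Hadoram wife heaven -> 4
```

Each added word can only add matching verses, never remove them, which is why the counts grow as the search string gets longer.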
