I have a MySQL 5.5.4 table with >200M rows which has a FULLTEXT index on two columns (Title,Body).
When I do a simple FULLTEXT query in the default NATURAL LANGUAGE mode for some popular results (they'd return 2M+ rows), I'm getting zero rows back:
SELECT COUNT(*) FROM itemsearch WHERE MATCH (Title, Body) AGAINST ('fubar');
But when I do a FULLTEXT query in BOOLEAN mode, I can see the rows in question do exist (I get 2M+ back, depending):
SELECT COUNT(*) FROM itemsearch WHERE MATCH (Title, Body) AGAINST ('+fubar' IN BOOLEAN MODE);
I have some queries which return ~500K rows which are working fine in either mode, so if it's result size related, it seems to crop up somewhere between 500K and a little north of 2M.
I've tried playing with the various buffer size variables, to no avail. It's clearly not the 50% threshold, since we're not getting 100M rows back for any result.
Any ideas?
The behavior you see is by design. When using the natural language query (NLQ) if the total number of documents containing the term is greater than 2 million, then the weight will be 0.
This is done on the following lines of /storage/myisam/ft_nlq_search.c:
According to revision 1346.322.1 this was a bugfix to prevent "tree overflow".
It looks like you're going to have to switch to Sphinx or a Lucene solution.
Its likely that these columns are not being indexed. The relevant section of the manual:
This would explain why you're seeing different behavior in Boolean mode.