We have a site thats using MySQL fulltext searching but has an issue that a number of strings that need to be indexed are 3 characters long. I know that the default for MySQL is not to index strings of this length but I also know that it is possible to override this in my.cnf. I assume the reason for the limit is primarily performance based so i'm not sure if this is something we should consider changing? What options do we have here? There are a number of sites on this server so any performance degredation could cause issues to a number of sites.
mysql version: 5
The reason is performance based, yes. Lowering the default limit will increase the space required to store the indexes, and the increased index size takes longer to search. Impact will depend upon usage (the type of queries performed) and size of current data set. The default minimum is 4, you can lower it like so:
When you rebuild your indexes (as you must) be sure not to repair, but drop and rebuild the indexes. This is considerably quicker than repairing them.
Probably your best solution is to monitor the size of the indexes on a dev server before and after the index length change.
The best option (that I have deferred to) is to ignore MySQL fulltext matching (which has serious limitations including MyISAM only, inability to match prefixed wildcards, a prohibitive default stop words list) and implement a third party solution. The best options available are:
I would personally opt for Lucene, although it requires a local java instance. If this is not possible, Sphinx is very easy to configure for PHP (walkthrough here) and many other languages.
Here are some good benchmarks and sombody else's first impressions on the matter.
As an alternative to Andy's answer, instead of changing the global setting you can add a dummy character to the end of a short word to artificially increase its length.
For example, for the word "at" you can replace it with "at--" (which has length 4) and for the word "see" you can replace it with "see-" (which also has length 4).
You could derive a column like this and index on that but return the original data.