I read here and there that using the utf8_unicode_ci
collation ensures a better treatment of unicode text (for example, it knowns how to expand characters such as 'œ' into 'oe' for searching and ordering) compared to the default utf8_general_ci
which basically just strips diacritics. Unfortunately, both sources indicate that utf8_unicode_ci
is slightly slower than utf8_general_ci
.
So my question is: what does "slightly slower" mean? Has anyone run benchmarks? Are we talking about a -0.01% performance impact or rather something like -25%?
Thanks for your help.
Well, I did not find any benchmarks in the Internet, so I decided to made benchmarks myself.
I created a very simple table with 500000 rows:
Then I filled it with random data by running this stored procedure:
Then I created the following stored procedures to benchmark simple SELECT, SELECT with LIKE, and sorting (SELECT with ORDER BY):
In the stored procedures above utf8_general_ci collation is used, but of course during the tests I used both utf8_general_ci and utf8_unicode_ci.
I called each stored procedure 5 times for each collation (5 times for utf8_general_ci and 5 times for utf8_unicode_ci) and then calculated the average values.
Here are the results:
benchmark_simple_select() with utf8_general_ci: 9957 ms
benchmark_simple_select() with utf8_unicode_ci: 10271 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 3.2%.
benchmark_select_like() with utf8_general_ci: 11441 ms
benchmark_select_like() with utf8_unicode_ci: 12811 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 12%.
benchmark_order_by() with utf8_general_ci: 11944 ms
benchmark_order_by() with utf8_unicode_ci: 12887 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 7.9%.
I didn't see any benchmark, but you can run your own using the BENCHMARK function:
BENCHMARK(count,expr)
As advised by Matthew you may run a parallel installation of MYSQL, but consider that there could be a huge difference between different architecture (sparc, intel, 32bit, 64bit, ...).