Our database servers (mainly based on the Debian stable packages, currently Wheezy) seem to have about 4 times more load for the same workload on kernel 3.2.0-4-amd64 than on its predecessor, the 2.6.32-5-amd64 kernel. With all packages the same and only booting into the other kernel, we can clearly see the difference, and I'm at a loss as to why. The problem is that I don't see much difference in IO or CPU load.
Setting kernel.sched_min_granularity_ns and kernel.sched_latency_ns back to their 2.6.32 default values helps a little (three times the load instead of four), but not to the level we'd like. As a lot of kernel settings changed, we can hardly just blindly set the new kernel to the old default values of the 2.6 one.
Has anybody else had experience with this? If so, what caused this (and ideally: how could it be solved)?
Since this is deeply kernel-related, the differences in sysctl values might be of interest: here is a diff of the two (pastebinned to keep the question short).
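For anyone wanting to reproduce that kind of comparison, here is a minimal Python sketch. It assumes each kernel's settings were saved as `sysctl -a` style text ("key = value" per line); the function names and the sample values are made up for illustration and are not the real defaults of either kernel.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no "=" at all
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(old, new):
    """Return {key: (old_value, new_value)} for keys that differ.

    A key present on only one side is reported with None on the other.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


# Placeholder dumps: the values and kernel.example_new_key are hypothetical.
old_dump = "kernel.sched_latency_ns = 111\nvm.swappiness = 60\n"
new_dump = (
    "kernel.sched_latency_ns = 222\n"
    "vm.swappiness = 60\n"
    "kernel.example_new_key = 1\n"
)

for key, (before, after) in diff_sysctl(
    parse_sysctl(old_dump), parse_sysctl(new_dump)
).items():
    print(f"{key}: {before} -> {after}")
```

Keys that exist in only one kernel show up with `None` on the other side, which makes it easy to tell renamed or newly added tunables apart from changed defaults.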
Edit: we're currently investigating this SO answer to see whether it applies.
Linux kernels 3.0 - 3.8 should be avoided or upgraded to address IO performance degradation
The Linux kernel IO performance degradation was demonstrated by Josh Berkus using a private benchmark workload running against PostgreSQL 9.3 on Ubuntu 12.04 with kernel 3.2.0.
"...you really need to avoid every kernel between 3.0 and 3.8. While RHEL has been sticking to the 2.6 kernels (which have their own issues, but not as bad as this), Ubuntu has released various 3.X kernels for 12.04...upgraded...to kernel 3.13.0, and ran the same exact workload...an 80% reduction in IO. We can thank the smart folks in the Linux FS/MM group for hammering down a whole slew of performance issues."
Please see http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
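To check a fleet of machines against that advice, something like the following Python sketch could help. It only encodes the 3.0 to 3.8 range named in the quote; the function names are my own, and the release strings are the ones mentioned in this thread. On a live box you would pass in the output of `uname -r`.

```python
import re


def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '3.2.0-4-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_flagged_range(release):
    """True if the kernel falls in the 3.0 - 3.8 range the quote says to avoid."""
    return (3, 0) <= kernel_major_minor(release) <= (3, 8)


for release in ("2.6.32-5-amd64", "3.2.0-4-amd64", "3.13.0"):
    print(release, in_flagged_range(release))
```

Comparing `(major, minor)` tuples avoids the classic string-comparison trap where "3.13" sorts before "3.8".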
I addressed an issue on DBA StackExchange about the kernel and journaling. I learned from Percona back in May that a certain flush behavior is actually simulated.
Maybe the reported load is simply not correct, like in this bug report: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=693942
Can you see that anything is actually slower? Does vmstat look like the server is really doing more work? Otherwise I'd assume you've just hit that reported bug. The same happened to me some time ago: the performance of the server was no different, only the reported load average was higher.
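One rough way to test this is to compare the load average with what the CPUs actually did over the same interval. The Python sketch below assumes the standard Linux layouts of the aggregate `cpu` line in /proc/stat and of /proc/loadavg; the function names are my own, the sample numbers are invented, and counting iowait as idle time is a simplification I chose.

```python
def cpu_busy_fraction(stat_before, stat_after):
    """Fraction of CPU time spent non-idle between two 'cpu ...' lines.

    Each argument is the aggregate 'cpu' line from /proc/stat, sampled a
    few seconds apart. Only the first eight counters are summed (user,
    nice, system, idle, iowait, irq, softirq, steal), because guest time
    is already included in user time.
    """
    def totals(line):
        fields = [int(x) for x in line.split()[1:9]]
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
        return sum(fields), idle

    total0, idle0 = totals(stat_before)
    total1, idle1 = totals(stat_after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("samples must be taken in order and some time apart")
    return 1.0 - (idle1 - idle0) / elapsed


def load_per_core(loadavg_text, cpu_count):
    """1-minute load average from /proc/loadavg content, divided by core count."""
    return float(loadavg_text.split()[0]) / cpu_count


# Invented samples: high load per core, yet the CPUs were only 10% busy.
before = "cpu 100 0 100 800 0 0 0 0"
after = "cpu 150 0 150 1700 0 0 0 0"
print(round(cpu_busy_fraction(before, after), 3))
print(load_per_core("4.00 3.50 3.00 1/200 1234", 4))
```

If the load per core is several times higher on 3.2 while the busy fraction is about the same on both kernels, that points toward the accounting bug rather than a real slowdown.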
I don't have the reputation to make this a comment, but when you upgraded the kernel, did you also upgrade MySQL? Can you list which MySQL 5.5.X you are running?
Ironically, bugs in some of the newer versions of MySQL have actually made performance noticeably worse. They've since been fixed, of course, but they created a significant red herring for me while I was making changes in my app.
"InnoDB: The fix for Bug#17699331 caused a high rate of read/write lock creation and destruction which resulted in a performance regression. (Bug #18345645, Bug #71708)"
http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-19.html
"InnoDB: A regression introduced by Bug #14329288 would result in a performance degradation when a compressed table does not fit into memory. (Bug #18124788, Bug #71436)"
http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-17.html
...etc.
It's just the same for 5.5:
"InnoDB: A regression introduced by Bug #14329288 would result in a performance degradation when a compressed table does not fit into memory. (Bug #18124788, Bug #71436)"
http://dev.mysql.com/doc/relnotes/mysql/5.5/en/news-5-5-37.html
Does upgrading to a newer MySQL bring it back to reasonable performance?
MySQL does have some kernel-specific code in there too:
"asynchronous I/O is not supported on tmpfs in some Linux kernel versions. The workaround was to turn off the innodb_use_native_aio setting or use a different temporary directory. The fix causes InnoDB to turn off the innodb_use_native_aio setting automatically if it detects that the temporary file directory does not support asynchronous I/O. (Bug #13593888, Bug #11765450, Bug #58421)"
http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-5.html
So I'd ensure you're running the latest build.
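As a quick sanity check, here is a small Python sketch that compares a server's version string with the releases whose notes are quoted above (5.5.37 for the 5.5 series, 5.6.19 as the later of the two 5.6 releases). The function names are my own and the Debian-style version string in the example is hypothetical; on a real server you would feed in the result of `SELECT VERSION()`.

```python
import re

# Per release series, the release whose notes (quoted above) list the fixes.
FIX_RELEASES = {(5, 5): (5, 5, 37), (5, 6): (5, 6, 19)}


def parse_version(text):
    """Extract (major, minor, patch) from a string like '5.5.31-0+wheezy1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_quoted_fixes(version_text):
    """True/False for the 5.5 and 5.6 series, None for any other series."""
    version = parse_version(version_text)
    fix = FIX_RELEASES.get(version[:2])
    if fix is None:
        return None  # the quoted release notes say nothing about this series
    return version >= fix


# Hypothetical version strings, for illustration only.
for text in ("5.5.31-0+wheezy1", "5.6.19", "5.1.73"):
    print(text, has_quoted_fixes(text))
```

This only covers the regressions quoted in this answer; a version that passes the check can still contain other bugs.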
As an aside, consider MySQL 5.6.X (which is now officially stable and has been for some time): "For Linux, MySQL 5.6 shows up to a 150% improvement in TPS throughput over MySQL 5.5" http://dev.mysql.com/tech-resources/articles/mysql-5.6-rc.html
I had huge MySQL performance problems moving from Debian with kernel 2.6 and MySQL 5.1 to Debian with kernel 3.2 and MySQL 5.5 (Wheezy).
What solved the problem for MySQL was setting barrier=0 in /etc/fstab. Check out https://wiki.archlinux.org/index.php/Ext4
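To see which ext4 filesystems currently carry a barrier option, a read-only Python sketch along these lines might be useful. It parses text in the /etc/fstab or /proc/mounts column layout (device, mount point, type, options); the function name, device names and mount points in the sample are hypothetical. It changes nothing on disk. Keep in mind that barriers exist to protect the journal on power loss, so it is worth knowing exactly where they are off.

```python
def ext4_barrier_settings(mounts_text):
    """Map each ext4 mount point to its barrier option, or None if none is listed.

    Accepts text in /etc/fstab or /proc/mounts layout. When no barrier
    option is listed, the filesystem uses the kernel's default behaviour.
    """
    result = {}
    for line in mounts_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # fstab comment
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        barrier = [
            opt for opt in options
            if opt in ("barrier", "nobarrier") or opt.startswith("barrier=")
        ]
        # If several barrier options are listed, report the last one.
        result[fields[1]] = barrier[-1] if barrier else None
    return result


# Hypothetical mount table, for illustration only.
sample = """\
# <device> <mount point> <type> <options> <dump> <pass>
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 1
/dev/sdb1 /var/lib/mysql ext4 rw,noatime,barrier=0 0 2
tmpfs /tmp tmpfs rw 0 0
"""

for mount_point, option in ext4_barrier_settings(sample).items():
    print(mount_point, option)
```

On a live system you could pass in the contents of /proc/mounts to see what is actually in effect rather than what /etc/fstab requests.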