Strange problem. We have 6 nodes behind a load balancer. They are high-spec VPSes running Ubuntu. On a separate node we run Redis. Further nodes run MySQL. The whole LAMP setup hosts Magento.
While moving from a file-based cache to a central Redis cache, we switched the Magento nodes over one by one to use Redis through Cm_Cache_Backend_Redis. With two servers using Redis, everything ran fine, so we decided to switch the remaining 4 servers too. At that point performance tanked. The regression is as much as 300%, as confirmed by New Relic: app response time goes from a reasonable 900-1200 ms to 3,000+ ms. Page load time gets horrible, jumping by at least 2 seconds and often more. Under heavy-ish peak load (200 users spread across 6 servers), the regression is even more pronounced.
In the traces, we start seeing that all is not well with Redis.
Category    Slowest components                                   Count   Duration     %
Custom      Varien_Simplexml_Element::asNiceXml                  578     19,200 ms    33%
Custom      Varien_Simplexml_Element::extendChild                673     10,200 ms    18%
Custom      Cm_RedisSession_Model_Session::read                  1       5,070 ms     9%
Custom      Varien_Simplexml_Element::extend                     76      4,380 ms     8%
Custom      Varien_Simplexml_Element::hasChildren                69      2,690 ms     5%
Custom      Mage_Core_Model_Config::loadModulesConfiguration     1       2,270 ms     4%
Remainder   Remainder                                            1       13,700 ms    24%
Total time                                                               57,500 ms    100%
XML handling and core config loading become dead slow, and Redis sessions, which are normally fast, suddenly become slow as well. The whole lot grinds down to a crawl.
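To put rough numbers on that, here is a small Python sketch that regroups the trace rows by class. The grouping by class prefix is my own and not something New Relic reports; the durations are copied from the trace, and the percentages are taken against the sum of the rows (57,510 ms) rather than the rounded 57,500 ms total.

```python
# Durations in ms, copied from the New Relic trace in this post.
trace_ms = {
    "Varien_Simplexml_Element::asNiceXml": 19200,
    "Varien_Simplexml_Element::extendChild": 10200,
    "Cm_RedisSession_Model_Session::read": 5070,
    "Varien_Simplexml_Element::extend": 4380,
    "Varien_Simplexml_Element::hasChildren": 2690,
    "Mage_Core_Model_Config::loadModulesConfiguration": 2270,
    "Remainder": 13700,
}


def share_by_class(rows):
    """Sum durations per class (the part before '::') and
    return each class's share of the summed total, in percent."""
    total = sum(rows.values())
    grouped = {}
    for name, ms in rows.items():
        cls = name.split("::")[0]
        grouped[cls] = grouped.get(cls, 0) + ms
    return {cls: round(100 * ms / total, 1) for cls, ms in grouped.items()}


shares = share_by_class(trace_ms)

# Print the classes from largest to smallest share.
for cls, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{cls:32s} {pct:5.1f}%")
```

Grouped this way, Varien_Simplexml_Element accounts for roughly 63% of the traced time, against about 9% for the Redis session read. To me that looks as if the config XML is being rebuilt on the request rather than served from cache, but I can't confirm that from this trace alone.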
The Redis server is a default Ubuntu install that we don't directly control right now. We do control the client side on the 6 nodes. Right now it uses the built-in Credis standalone client, which we intend to swap out for the phpredis PECL client; that should give somewhat of a performance boost.
Everything else is default as per https://github.com/colinmollenhour/Cm_Cache_Backend_Redis
Hopefully the client swap will make all the difference, but in the meantime we're keen to hear further suggestions. Why would 2 nodes work fine and fast, while 6 nodes make it choke? Does this sound like client-side or server-side trouble to you?
Your thoughts are very welcome.