We have a MySQL-based application that is very response time sensitive. A MySQL stored procedure needs to respond on average in less than 10 milliseconds and the maximum acceptable response time is 50 milliseconds. We are hitting the average response time with flying colors (more like 3 milliseconds) but over a five hour load test we will get some (~50) responses greater than 50 milliseconds with a maximum response time sometimes as high as 265 milliseconds.
There is a lot of information out there about tuning MySQL and Linux for optimal performance, and I believe we've applied most of the best practices, but I can find very little about how to reduce the incidence of outliers and reduce maximum response time.
How can I diagnose the cause of these outlier response times and what Linux and MySQL tunings can I apply to reduce them?
Here are some tips for mysql performance:
1- Create indexes on the fields used for select quries to improve speed.
2- Keep small number of records if applicable in the same table.
3- Use Myisam engine as it is usually faster than Innodb if you don't care about transactions and forigen keys.
4- Optimize your queries.
5- Run mysql on a dedicated server if applicable and turn off unneeded services to save the server resources.
What is the level of deviation here? If you are meeting your average SLA then there will always be some outliers that are three and four and five standard deviations out there. Something you will need to come to grips with on the SLA front is what level of exception is allowable against your maximum?
Several years ago there was a great commercial from motorola speaking about six sigma manufacturing principals and using a golfer as an analogy, noting that their level of defects was similar to missing one put in 100 years. The key here is that the one exception in 100 years was an allowable exception level. The cost to move from 1:100,000 to 1:1,000,000 can be extraordinary for little payoff.
You note 50 exceptions over the 50 millisecond level. This is out of a total sample size of ??? Is this rate of exception allowable/desirable from a business perspective taking into account the cost required to cut the rate?