I have a project coming up where one of the requirements will be to cache what's essentially read-only data in order to lessen the strain on the db. I'm just a bit confused about how caching works with a db.
How does caching work? I've seen something called query caching, but does that cache query results? Or just the queries? And isn't a view a form of caching anyway?
I'm just looking for a nudge in the right direction. The project will use MySQL 5.1 for the store, so any links that can clear up my confusion would be a big help. A general Google search has only provided me with query caching, and with my lack of knowledge in this area, I'm not sure if that's the direction I should go.
The MySQL query cache does indeed cache query results: as long as a subsequent query is byte-for-byte identical to a cached one, the stored result is returned instead of executing the query again. MySQL also skips parsing the statement and calculating the execution plan.
To use it, you can either turn it on in your server's my.cnf or use the SQL_CACHE hint in front of your queries. If you have it turned on in my.cnf, you can tell MySQL not to cache a result by using SQL_NO_CACHE in front of your queries. Any write (INSERT, UPDATE, DELETE, etc.) to a table will invalidate all of the entries in the query cache that came from that table.
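A minimal sketch of the two modes, assuming these settings in my.cnf (the table name and size here are made up for illustration; query_cache_type and query_cache_size are the real MySQL 5.1 variables):

```sql
-- my.cnf:  query_cache_type = 1 (ON)  or  2 (DEMAND), query_cache_size = 64M
-- With query_cache_type = 1, everything cacheable is cached; opt a query out:
SELECT SQL_NO_CACHE * FROM prices WHERE region = 'EU';
-- With query_cache_type = 2 (DEMAND), only explicitly hinted queries are cached:
SELECT SQL_CACHE * FROM prices WHERE region = 'EU';
```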
One performance anomaly with the query cache: it uses an inefficient (linear) algorithm for finding and invalidating entries within the cache, so making the cache larger can actually result in worse performance. You should experiment with the size against your own data and query profiles; the last time I ran this experiment, around 256MB was the sweet spot, and performance got worse both above and below that. The manual suggests "tens of megabytes".
You can also implement caching outside of MySQL using something like memcached. This is not transparent to the application, so you will have to add extra code in your application to handle doing the lookup to memcached and then, if you get a miss, doing the lookup in the database and storing the result in memcached.
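The memcached pattern described above is usually called cache-aside. Here is a runnable sketch of it, with a plain dict standing in for the memcached client so the example is self-contained; all names are invented, and with a real client (e.g. the `memcache` module) you'd replace the dict with `memcache.Client(['127.0.0.1:11211'])` and use its `get`/`set` calls.

```python
cache = {}          # stand-in for the memcached client
calls = {"db": 0}   # counts real database hits, just to show the effect

def query_db(sql):
    # Placeholder for the real, expensive database query.
    calls["db"] += 1
    return "rows for: " + sql

def cached_query(sql, ttl=300):
    # Derive a cache key from the query text. In practice you'd hash the
    # SQL (e.g. md5), since memcached keys can't contain spaces.
    key = "q:" + sql
    hit = cache.get(key)        # 1. try the cache first
    if hit is not None:
        return hit              # cache hit: no database work
    result = query_db(sql)      # 2. miss: fall through to the database
    cache[key] = result         # 3. store it (real client: mc.set(key, result, time=ttl))
    return result

first = cached_query("SELECT * FROM articles")   # miss -> hits the database
second = cached_query("SELECT * FROM articles")  # hit  -> served from cache
```

Note the extra application logic the answer mentions: every read site has to do the lookup, the miss handling, and the store itself, and you also need a story for invalidating or expiring entries when the underlying data changes.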
You could go about this in several ways; there are likely others, but these two spring to mind instantly.
You could create another table to hold your cached results and, on a schedule, run the expensive queries (which might join multiple tables with lots of records in them) and dump the results into this table. Doing a SELECT * from a single table should be far less expensive than doing a SELECT which joins 12 different tables, each with millions of records in them. While this doesn't exactly remove the work from the DB server, it should reduce the computational workload: the expensive queries only run on a schedule, and your regular clients fetch data from the cache table.

Alternatively, for a cache which is totally separate from your DB, you could implement something like Redis. This will store data in memory and should be super-quick, but it will need some extra logic in your application to use it as a data source instead of the real database. It scales well too - the Stack Exchange network uses it quite effectively across their sites.
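A runnable sketch of the cache-table idea, using Python's built-in sqlite3 purely so the example is self-contained; against MySQL 5.1 you'd run equivalent statements through your usual client, with the refresh triggered by cron or a MySQL event. All table and column names here are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders    (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    -- The cache table: flat, pre-joined, pre-aggregated results.
    CREATE TABLE cache_customer_totals (customer_id INTEGER, name TEXT, total REAL);
""")

def refresh_cache():
    # The expensive join/aggregation runs only here, on a schedule.
    db.executescript("""
        DELETE FROM cache_customer_totals;
        INSERT INTO cache_customer_totals
        SELECT c.id, c.name, SUM(o.total)
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name;
    """)

refresh_cache()
# Regular clients do a cheap flat read instead of the expensive join:
rows = db.execute(
    "SELECT name, total FROM cache_customer_totals ORDER BY name").fetchall()
```

The trade-off is staleness: clients see data as of the last refresh, which is usually fine for the read-only data described in the question.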