Our memcache install recently started removing keys, and we're not sure why. Large groups of keys vanish at the same time.
Memcache reports that evictions are low to non-existent, and our app has no way to clear memcache wholesale (it can only delete specific keys). Even keys the app has no knowledge of are disappearing, so we're pretty convinced they're being expired rather than deleted by us. However, our memcache configuration hasn't been touched in some time.
Has anyone debugged an issue like this before, and if so, are there any steps you'd recommend we take? How flexible is memcache's expiration policy - is it possible that we're suddenly running into a criterion based on (say) write frequency to a key?
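For anyone who wants to reproduce the check: this is a minimal sketch of how those eviction counters can be read straight off the memcached text protocol. The host and port are placeholders for our setup, not anything special.

```python
import socket

def memcache_stats(host="127.0.0.1", port=11211):
    """Send a raw `stats` command and return the counters as a dict."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    stats = {}
    for line in data.decode().splitlines():
        if line.startswith("STAT "):
            _, name, value = line.split(" ", 2)
            stats[name] = value
    return stats

stats = memcache_stats()
# 'evictions' stays near zero for us even while keys disappear, which is
# why we don't think this is simple LRU pressure.
for name in ("evictions", "curr_items", "total_items", "bytes", "limit_maxbytes"):
    print(name, stats.get(name))
```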
Even if you have set the keys never to expire, Memcached will delete records on a least-recently-used basis if it fills up. The standard slab-based storage mechanism stores records in fixed-size chunks that are allocated from 1 MB slab pages. Although this is fast, it also means that Memcached can end up wasting quite a lot of memory: once a slab has been assigned to hold chunks of a particular size, I do not think the chunks can be resized. If a mix of large and small objects is cached and the composition changes over time, it is possible that Memcached ends up storing small objects in much larger chunks, because those are the only ones available.
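If you want to see whether that is what's biting you, the per-slab-class counters are the place to look. Here is a rough sketch (Python, speaking the text protocol directly; the host and port are placeholders) that prints chunk size, usage, and eviction count for each slab class:

```python
import socket
from collections import defaultdict

def raw_stats(command, host="127.0.0.1", port=11211):
    """Issue one `stats ...` command over the text protocol; return name -> value."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((command + "\r\n").encode())
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return dict(line.split(" ", 2)[1:]
                for line in data.decode().splitlines()
                if line.startswith("STAT "))

slabs = raw_stats("stats slabs")   # keys like "1:chunk_size", "1:used_chunks"
items = raw_stats("stats items")   # keys like "items:1:number", "items:1:evicted"

classes = defaultdict(dict)
for name, value in slabs.items():
    cls, _, field = name.partition(":")
    if cls.isdigit():
        classes[int(cls)][field] = value
for name, value in items.items():
    parts = name.split(":")        # "items:<class>:<field>"
    if len(parts) == 3 and parts[1].isdigit():
        classes[int(parts[1])]["items_" + parts[2]] = value

# A class whose chunk_size is much larger than the objects it holds is where
# the wasted memory (and the unexpected evictions) will show up.
for cls in sorted(classes):
    c = classes[cls]
    print(f"class {cls}: chunk_size={c.get('chunk_size', '?')} "
          f"used_chunks={c.get('used_chunks', '?')} "
          f"evicted={c.get('items_evicted', '0')}")
```

Running this before and after a batch of keys vanishes, and diffing the evicted counts per class, will tell you whether a particular chunk size is under pressure.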
This is one of the issues that companies such as Gear6 (www.gear6.com) and NorthScale (www.northscale.com) have addressed in their Memcached distributions.
This turned out to be due to some debug code that had been left in the application when it was deployed. The debug code manually called Memcache::flush, which invalidates every item on the server at once, which is why whole groups of keys vanished simultaneously.
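For anyone who lands here with the same symptoms: memcached keeps a cmd_flush counter in its stats output that increments each time a flush is issued, so something like the sketch below would have pointed at the culprit much sooner. The host, port, and polling interval are placeholders.

```python
import socket
import time

def flush_count(host="127.0.0.1", port=11211):
    """Return memcached's cmd_flush counter (number of flush_all commands seen)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    for line in data.decode().splitlines():
        if line.startswith("STAT cmd_flush "):
            return int(line.split()[-1])
    return 0

# Poll the counter; any increase means *something* issued a flush.
last = flush_count()
while True:
    time.sleep(60)
    current = flush_count()
    if current > last:
        print(f"flush detected: cmd_flush went from {last} to {current}")
    last = current
```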
I think the moral of this story is: "Any time you say, 'it can't be the application', you're probably wrong." You'd think I would have known that by the time I asked this question, but apparently, I could do with a reminder from time to time.