Ping a Specific Port

Question

Sparsh Gupta

Asked: 2011-10-27 22:04:06 +0800 CST2011-10-27 22:04:06 +0800 CST 2011-10-27 22:04:06 +0800 CST

memcached crashed without notification

772

We rely heavily on memcache and are serving a few billion requests per month. We have 5 memcache servers. Last night, we saw an 25% increase in our traffic. The graphs show that requests and data transfered by each memcache increased and made them crash. It started a chain reaction and each memcache server crashed one after another (Load per server increased).

We found no logs in syslog, messages, memcache log file (Verbose settings was off).

I have two questions:

How can I find out why exactly this happened. If load is an issue for memcache, is there any documentation on how much a normal memcache (running on decent config) can handle. How can I increase this value.
How can I ensure they never go down again. It eventually impacted our mysql servers and replication and impacted a lot of other related services. Do I need more memcache servers?

I started my memcache using this init.d script: http://pastebin.com/wfMnB4ta where ENABLE_MEMCACHE is YES in /etc/default/memcached

/usr/share/memcached/scripts/start-memcached: http://pastebin.com/LaUugXye

Thanks

2 Answers

Voted

dormando · Answer 1 · 2011-10-27T22:31:00+08:00

Best Answer

dormando

2011-10-27T22:31:00+08:002011-10-27T22:31:00+08:00

I'm going to guess that you run version 1.4.5 or older.

Since you mention an increase in traffic, then a sudden exit:

You may have hit the max connections limit (see http://memcached.org/timeouts for some discussion on this).
If you hammer the connection limit for a long time, there was a bug which would cause memcached to exit.
This was partially repaired in 1.4.6, further repaired in 1.4.7, and refined through 1.4.9.

If you ever experience a crash, the first thing to do is make sure you're on the latest stable release. If you still experience crashes, the best thing to do is to contact the actual mailing list or file a bug report with information, rather than get lucky with a maintainer seeing this via a twitter search.

Doing periodic upgrades to match the latest stable can help you avoid having your whole cluster crash in the future.

6

David Schwartz · Answer 2 · 2011-10-27T22:49:49+08:00

David Schwartz

2011-10-27T22:49:49+08:002011-10-27T22:49:49+08:00

You should also work out some kind of structural solution to deal with similar problems. For example, if you notice that the response time on requests is increasing, reduce the number of requests. You can do this various ways, including disabling non-essential services.

This particular failure would likely not have been avoidable though. There's not much you can about a failure causing increasing load.

0

memcached crashed without notification

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Resolve host name from IP address

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?