Over a cluster of over 12 centos 5.8 servers, I deployed logstash using the native logstash shipper, which sends /var/log/*/*.log
back to a central logstash server.
We tried using rsyslogd as the shipper, but due to a bug in rsyslogd's ImFile module, if the remote end didn't reply, the logs would pile up in memory.
We're currently using Redis as the transport mechanism, so logstash01 has redis running locally, bound to the IP for the VLAN for these logs.
So logstash-shipper sends to redis on logstash01. logstash01 sends to Elasticsearch running in a separate process.
Here's what we're seeing. Elasticsearch has 141 blocked threads. Stracing the elasticsearch parent shows:
futex(0x7f4ccd1939d0, FUTEX_WAIT, 26374, NULL
Here's the jstack from elasticsearch
Here's the jstack from logstash
So.. Last night, some of the webservers (whose logs are tailed by logstash) went nuts, with load averages over 500.
On logstash01, there's this
Dec 19 00:44:45 logstash01 kernel: [736965.925863] Killed process 23429 (redis-server) total-vm:5493112kB, anon-rss:4248840kB, file-rss:108kB
So OOM-killer killed redis-server, which then meant logs piled up in memory on the servers which were shipping stuff.. Which somehow means that apache gets its knickers in a twist. (Frankly, I'm not sure how, I just assume it's tailing the log)..
This is my theory of how events unfolded:
- We had a traffic spike.
- An immense amount of logs were generated.
- These piled up in Redis, as logstash/elasticsearch only seems to be able to handle 300-400 new events / second.
- Redis had filled up entirely to the point where OOM-killer slaughtered it senselessly.
- Redis stops accepting new items.
- Items now start to pile up on the remote hosts side.
- Everything goes nuts. Apache stops accepting requests. (Why?).
Questions are these:
Why does apache go nuts if there's just something tailing its log. Is it that the thing tailing it blocks apache from writing?
Is there a sane way to make elasticsearch faster/better/resilient?
Is there a sane way to make redis resilient and not die because of being OOM'd
Is there a fundamental flaw in the way I've set it all up, or does everyone have this problem?
-- EDIT --
Some specs for @lusis.
admin@log01:/etc/init$ free -m
total used free shared buffers cached
Mem: 7986 6041 1944 0 743 1157
-/+ buffers/cache: 4140 3845
Swap: 3813 3628 185
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 19G 5.3G 13G 31% /
udev 3.9G 4.0K 3.9G 1% /dev
tmpfs 1.6G 240K 1.6G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 3.9G 0 3.9G 0% /run/shm
/dev/sda1 90M 72M 14M 85% /boot
/dev/mapper/data-disk 471G 1.2G 469G 1% /data
/dev/sda2 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/sda1 on /boot type ext2 (rw)
/dev/mapper/data-disk on /data type ext3 (rw)
/data/elasticsearch on /var/lib/elasticsearch type none (rw,bind)
log01:/etc/init$ top
top - 14:12:20 up 18 days, 21:59, 2 users, load average: 0.20, 0.35, 0.40
Tasks: 103 total, 1 running, 102 sleeping, 0 stopped, 0 zombie
Cpu0 : 3.0%us, 1.0%sy, 0.0%ni, 95.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 12.0%us, 1.0%sy, 0.0%ni, 86.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 4.7%us, 0.3%sy, 0.0%ni, 94.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 5.6%us, 1.3%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 5.3%us, 1.3%sy, 0.0%ni, 93.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 6.4%us, 1.0%sy, 0.0%ni, 92.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 8178120k total, 6159036k used, 2019084k free, 761780k buffers
Your post doesn't describe much in the way of specs (memory on the LS indexer, log volume or much else) but I'll try and answer your questions best I can first. Disclaimer: I'm one of the logstash devs -
Apache going nuts was likely a side effect of the logstash process acting up. I'd put that aside for now.
The sane way to make ES f/b/s is add more ES nodes. It's seriously that easy. They even autodiscover each other depending on network topology. After 17 years in this industry I've never seen anything scale horizontally as easy as ElasticSearch.
To f/b/s Redis, don't use any redis clustering. Newer versions of Logstash can do Redis loadbalancing internally. The Redis output supports a list of Redis hosts in the plugin config and support is about to be added to the input side as well to match that. In the interim you can use multiple Redis input definitions on the indexer/consumer side.
I can't answer this beyond saying that it sounds like you're trying to do to much with a single (possibly underpowered host).
Any good scaling process starts with breaking collocated components into distinct systems. I don't see your configs gist'd anywhere but the places where logstash 'bottlenecks' are in filters. Depending on how many transformations you're doing it can have an impact on the memory usage of Logstash processes.
Logstash works a lot like legos. You can either use a 2x4 brick or you can use two 2x2 bricks to accomplish the same task. Except in the case of logstash, it's actually sturdier to use smaller bricks than a single big brick.
Some general advice we normally give is:
ship logs as quickly as possible from the edge If you can use pure network transport instead of writing to disk, that's nice but not required. Logstash is JVM-based and that has good and bad implications. Use an alternate shipper. I wrote a python based one ( https://github.com/lusis/logstash-shipper ) but I would suggest that folks use Beaver instead ( https://github.com/josegonzalez/beaver ).
generate your logs in a format that requires as little filtering as possible (json or optimally json-event format) This isn't always possible. I wrote a log4j appender to do this ( https://github.com/lusis/zmq-appender ) and eventually broke out the pattern layout into its own repo ( https://github.com/lusis/log4j-jsonevent-layout ). This means I don't have to do ANY filtering in logstash for those logs. I just set the type on input to 'json-event'
For apache, you can try this approach: http://cookbook.logstash.net/recipes/apache-json-logs/
Also note that the logstash mailing list is VERY active so you should always start there. Sanitize and gist your configs as that's the best place to start.
There are companies (like Sonian) scaling ElasticSearch to petabyte levels and companies (like Mailchimp and Dreamhost) scaling Logstash to insane levels as well. It can be done.