I am using Amazon Spot Instances to crawl a lot of data. Most of them run until Amazon terminates them, which happens when the current spot price exceeds our max bid.
I need to monitor and, more importantly, archive the logs generated on those spot instances. These logs are very important for debugging and analytics. We have application logs and system logs such as syslog and the secure log. Below are the options I could think of:
- Use Chukwa/Flume. I am not listing Facebook's Scribe here because I think the project is dead. There is a small chance of losing a few logs with this approach.
- Attach an EBS volume to each spot instance. But then managing those volumes after the spot instances are terminated will be a pain.
- Mount an NFS volume and write the logs to it. Performance is sometimes really bad with this approach.
Also, I think the ability to run Linux commands such as grep and awk on the archived files is important. What are people using in this situation?
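To make the grep/awk requirement concrete, here is a minimal Python sketch of the kind of access I would like to keep on the archive. It assumes the archive ends up as an ordinary directory tree of plain-text and gzip-compressed log files, which is only an assumption about the final layout. The function name `grep_logs` and the path `/archive/logs` are hypothetical, made up for illustration.

```python
import gzip
import os
import re
from typing import Iterator, Tuple


def grep_logs(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each archived log line matching pattern.

    Assumption: the archive is a plain directory tree; files ending in
    ".gz" are gzip-compressed text, everything else is plain text.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Pick a text-mode opener based on the file extension.
            opener = gzip.open if name.endswith(".gz") else open
            with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # "/archive/logs" is a hypothetical mount point for the archived logs.
    for path, lineno, line in grep_logs("/archive/logs", r"ERROR"):
        print(f"{path}:{lineno}:{line}")
```

In other words, whatever archives the logs should leave them as files that plain tools like this (or grep/zgrep itself) can read directly.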
P.S. We are already using Splunk, but I will not archive logs in Splunk.