I am using Amazon Spot Instances to crawl a lot of data. Most instances run until Amazon terminates them when the spot price exceeds our max bid.
I need to monitor and, mainly, archive the logs generated on those spot instances. These logs are very important for debugging and analytics. We have application logs and system logs such as syslog and the secure log. Below are the options I could think of:
- Use Chukwa/Flume. (I'm not listing Facebook's Scribe here because I think the project is dead.) There is a small chance of losing a few logs with this approach.
- Attach an EBS volume to each spot instance. But then managing those volumes after the spot instances are terminated will be a pain.
- Mount an NFS volume and write the logs there. The performance is sometimes really bad with this approach.
Also, the ability to run Linux commands such as grep and awk on those archived files is important. What are people using in this situation?
P.S. We are already using Splunk, but I don't want to archive logs in Splunk.
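To make the grep/awk requirement concrete: if the archive ends up as gzipped files in S3, plain Unix tools still work by streaming each object through a pipe. A sketch, with hypothetical bucket and key names; the same pipeline is demonstrated on a local gzipped log:

```shell
# In production you would stream straight from S3 (requires a CLI S3 client,
# e.g. the AWS CLI):
#   aws s3 cp s3://my-log-archive/i-0123abcd/syslog.gz - | zcat | grep ERROR

# The same pipeline, demonstrated on a local gzipped log:
printf 'Dec  1 10:00:01 host app: INFO started\nDec  1 10:00:02 host app: ERROR timeout\n' \
  | gzip > /tmp/syslog.gz

# grep works on the decompressed stream:
zcat /tmp/syslog.gz | grep ERROR
# -> Dec  1 10:00:02 host app: ERROR timeout

# awk works the same way, e.g. printing the log-level field:
zcat /tmp/syslog.gz | awk '{print $6}'
```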
Two approaches that I've used:
Ship your logs to another location using syslog. With AWS we use a VPC with a VPN connection to a private rack in a local datacenter. All of our instances run syslog-ng and send their logs to a server in our datacenter, where the data is stored in MongoDB.
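The client side of that setup is small. A sketch of the syslog-ng config on each spot instance, with placeholder hostname, port, and application-log path:

```
# Collect local system messages plus an application log file
source s_local {
    system();            # /dev/log and kernel messages
    internal();          # syslog-ng's own messages
    file("/var/log/myapp.log" follow-freq(1));  # example app log path
};

# Forward everything to the central server over the VPN
destination d_central {
    tcp("logs.example.internal" port(514));
};

log { source(s_local); destination(d_central); };
```

Because the instances push logs out continuously, a spot termination costs you at most whatever was buffered locally at that moment.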
Use logrotate to archive your logs to S3. It's not as real-time as syslog, but it's simpler to set up and maintain, especially if you're generating a lot of data. AWS's newly announced Data Pipeline could also be a good addition to this solution, because you could use it to automatically process your logs with Elastic MapReduce jobs.
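A sketch of such a logrotate rule, assuming a CLI S3 client such as s3cmd is installed and configured on the instance (bucket name and log paths are placeholders):

```
/var/log/myapp/*.log {
    daily
    rotate 4
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        # Upload the freshly rotated, compressed logs, keyed by hostname
        # so each instance's logs land in their own prefix.
        s3cmd put /var/log/myapp/*.log.1.gz s3://my-log-archive/$(hostname)/ || true
    endscript
}
```

Rotating frequently narrows the window of logs you can lose when a spot instance is terminated, at the cost of more, smaller objects in S3.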
As you have already written, Chukwa/Flume is an option.
I believe that's an efficient way to process and store logs, but I would suggest Logstash for the same job.
Logstash is pretty efficient and supports a lot of input formats. It also provides a front-end where you can search with regular expressions and check the results.
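A minimal Logstash pipeline sketch to give an idea of the shape: tail the application log, accept syslog from other instances, parse with a grok pattern, and index the events for searching from the front-end. Paths, the port, the grok pattern, and the Elasticsearch host are placeholders, and exact option names vary between Logstash versions:

```
input {
  file {
    path => "/var/log/myapp/*.log"
    type => "app"
  }
  syslog {
    port => 5514
    type => "system"
  }
}

filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" ]
  }
}

output {
  elasticsearch {
    host => "es.example.internal"
  }
}
```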
For the front-end, though, I would suggest Graylog2, which has a lot more functionality than the Logstash front-end.
That said, if you already have Splunk, I don't understand why you don't want to store the data there. Could it be the licensing fee? I'm not sure about their fee structure, but I know it's a lot :)