We use Logstash to store and search logs from our mail servers. I noticed today that we didn't have any indices from this year (2015). A quick investigation showed that current logs were being stored as 2014.01.05 (i.e. the same day but last year), and those indices were then being deleted by a cron job that looks for old indices.
Restarting Logstash fixed things, so I assume that Logstash is filling in the year based on the time it started.
We're running Logstash 1.4.1 with Elasticsearch 1.2.4 - so not the latest versions, but I don't see anything relevant in the Logstash 1.4.2 changelog.
Log entries are sent to Logstash using syslog - config below, along with an example input line and the parsed output.
Is there a better fix for this than just remembering to restart Logstash on New Year's Day?
Example of an input line:
Jan 5 15:03:35 cheviot22 exim[15034]: 1Y89Bv-0003uU-DD <= [email protected] H=adudeviis.ncl.ac.uk (campus) [10.8.232.56] P=esmtp S=2548 [email protected]
{
  "_index": "logstash-2014.01.05",
  "_type": "mails",
  "_id": "HO0TQs66SA-1QkQBYd9Jag",
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2014-01-05T15:03:35.000Z",
    "type": "mails",
    "priority": 22,
    "timestamp": "Jan 5 15:03:35",
    "logsource": "cheviot22",
    "program": "exim",
    "pid": "15034",
    "severity": 6,
    "facility": 2,
    "facility_label": "mail",
    "severity_label": "Informational",
    "msg": "1Y89Bv-0003uU-DD <= [email protected] H=adudeviis.ncl.ac.uk (campus) [10.8.232.56] P=esmtp S=2548 [email protected]",
    "tags": [
      "grokked",
      "exim_grokked",
      "dated"
    ],
    "xid": "1Y89Bv-0003uU",
    "exim_rcpt_kv": "[email protected] H=adudeviis.ncl.ac.uk (campus) [10.8.232.56] P=esmtp S=2548 [email protected]",
    "H": "adudeviis.ncl.ac.uk",
    "P": "esmtp",
    "S": "2548",
    "id": "[email protected]"
  },
  "sort": [
    1388934215000,
    1388934215000
  ]
}
Logstash config (with irrelevant bits removed) ...
input {
  syslog {
    codec => "plain"
    debug => false
    port => 514
    type => "mails"
  }
}

filter {
  mutate {
    remove_field => [ "path", "host" ]
  }

  if [type] == "mails" {
    grok {
      patterns_dir => [ "/etc/logstash/patterns" ]
      match => [ "message", "(?<msg>.*)" ]
      add_tag => [ "grokked" ]
      break_on_match => true
      remove_field => [ "message" ]
    }
  }

  date {
    match => [ "timestamp", "ISO8601", "MMM dd HH:mm:ss", "MMM d HH:mm:ss" ]
    add_tag => [ "dated" ]
  }
}

output {
  elasticsearch {
    cluster => "logstash"
    host => "iss-logstash01"
    flush_size => 1000
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
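The best workaround I've come up with so far is to stop the date filter having to guess the year at all: prepend the current wall-clock year to the syslog timestamp and parse with an explicit year pattern. This is only a sketch (the timestamp_y field name is mine, and it assumes the bad year comes from this explicit date filter rather than from inside the syslog input) - it would replace the date block above:

filter {
  # Prepend the current year to the year-less syslog timestamp,
  # e.g. "Jan 5 15:03:35" becomes "2015 Jan 5 15:03:35"
  ruby {
    code => "event['timestamp_y'] = Time.now.year.to_s + ' ' + event['timestamp'].to_s"
  }
  # Parse with an explicit year so the date filter never guesses
  date {
    match => [ "timestamp_y", "YYYY MMM dd HH:mm:ss", "YYYY MMM d HH:mm:ss" ]
    add_tag => [ "dated" ]
    remove_field => [ "timestamp_y" ]
  }
}

This still mis-dates any message that crosses midnight on New Year's Eve while in flight, so it narrows the window rather than closing it - hence the question.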
Found a pointer to the answer in the logstash-users Google group (which had slipped my mind). A recent discussion pointed to https://logstash.jira.com/browse/LOGSTASH-1744, which (a) confirms that other people are seeing the same thing as me, and (b) offers a couple of possible solutions.
Option 1 is a patch to Logstash (not in the standard distribution) which updates Logstash's idea of the current year.
Option 2 is to not parse the timestamp from the syslog line, and instead just rely on the time the message arrived at Logstash. This is probably an acceptable solution for us, since the ordering of lines matters more to us than the exact time (as long as it's close).
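In config terms, option 2 just means deleting the date block from the filter section above. A sketch of the resulting filter section (assuming, as above, that the explicit date filter is where the bad year comes from - the syslog input does some header parsing of its own that may need checking too):

filter {
  mutate {
    remove_field => [ "path", "host" ]
  }
  if [type] == "mails" {
    grok {
      patterns_dir => [ "/etc/logstash/patterns" ]
      match => [ "message", "(?<msg>.*)" ]
      add_tag => [ "grokked" ]
      break_on_match => true
      remove_field => [ "message" ]
    }
  }
  # No date filter: @timestamp keeps the arrival time assigned by the
  # input, so index => "logstash-%{+YYYY.MM.dd}" always resolves to a
  # current index
}

Events are then stamped in arrival order, which preserves the line ordering we care about as long as delivery stays close to real time.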