We are running an Elastic Stack (7.8) deployed with ECK on EKS. We noticed that our Filebeat daemonset's AWS module had stopped processing logs from S3 and that our SQS queues were backing up. Looking at the logs of the Filebeat containers, we saw the following error repeating in each of the filebeat pods:
ERROR [s3] s3/input.go:206 SQS ReceiveMessageRequest failed:
EC2RoleRequestError: no EC2 instance role found caused by:
EC2MetadataError: failed to make Client request
We have found:
- Restarting the pods did not resolve the issue.
- Removing and reapplying the daemonset did not resolve the issue.
- Restarting the nodes did not resolve the issue.
- Terminating the nodes and allowing them to be replaced DID resolve the issue; the filebeat pods began processing data again.
This issue has recurred many times, at irregular intervals: the filebeat pods will process data successfully for weeks at a time, or sometimes only a few days, before the error returns.
Although we have a workaround (replacing the nodes), we still wanted to understand why this happens. Working with AWS, we finally tracked down what we believe is a hint at the root cause.
When the pods start up successfully, we see a message like the following:
{"log":"2021-02-17T17:08:47.378Z\u0009INFO\u0009[add_cloud_metadata]
\u0009add_cloud_metadata/add_cloud_metadata.go:93\u0009add_cloud_metadata: hosting provider
type detected as aws, metadata={\"account\":{\"id\":\"##########\"},\"availability_zone
\":\"#########\",\"image\":{\"id\":\"#############\"},\"instance\":{\"id\":\"i-#########
\"},\"machine\":{\"type\":\"#######\"},\"provider\":\"aws\",\"region\":\"######
\"}\n","stream":"stderr","time":"2021-02-17T17:08:47.379170444Z"}
If we restart a pod on a "failed" node and monitor it, we see this log message instead:
{"log":"2021-02-17T17:08:47.439Z\u0009INFO\u0009[add_cloud_metadata]
\u0009add_cloud_metadata/add_cloud_metadata.go:89\u0009add_cloud_metadata: hosting provider
type not detected.\n","stream":"stderr","time":"2021-02-17T17:08:47.439427267Z"}
We have also verified that the pods themselves CAN hit the EC2 metadata endpoint successfully via curl.
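For clarity, that check amounts to something like the sketch below, run from inside a filebeat pod (Python here purely for illustration; we actually just use curl). The address 169.254.169.254 is the standard EC2 instance metadata endpoint; the timeout value and the choice of the instance-identity document are arbitrary:

# Minimal sketch of the manual check we do from inside a filebeat pod:
# fetch the instance identity document from the EC2 instance metadata
# service (IMDS). If the instance enforces IMDSv2, a session token would
# have to be requested first (PUT /latest/api/token) and passed in the
# X-aws-ec2-metadata-token header.
import requests

IMDS = "http://169.254.169.254"

def check_imds(timeout=3):
    resp = requests.get(
        f"{IMDS}/latest/dynamic/instance-identity/document",
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()  # contains accountId, region, instanceId, ...

if __name__ == "__main__":
    doc = check_imds()
    print(f"IMDS reachable: instance {doc['instanceId']} in {doc['region']}")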
Any information on how to resolve this recurring issue would be appreciated.
Filebeat Config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
data:
  filebeat.yml: |-
    filebeat.config:
      modules:
        path: ${path.config}/modules.d/*.yml
        reload.enabled: true
    processors:
      - add_cloud_metadata: ~
      - add_docker_metadata: ~
    output.elasticsearch:
      workers: 5
      bulk_max_size: 5000
      hosts: ["${ES_HOSTS}"]
      username: "${ES_USER}"
      password: "${ES_PASSWORD}"
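For completeness, below is a rough diagnostic sketch (not part of Filebeat or ECK tooling) of how one could sweep the daemonset for "failed" nodes: run the same curl probe via kubectl exec in every filebeat pod and print the node each pod is scheduled on. The namespace (elastic-system) and label selector (k8s-app=filebeat) are placeholders for our environment:

# Run the manual IMDS curl probe in every filebeat pod and report which
# node each pod runs on, to spot "failed" nodes without waiting for the
# SQS errors to show up.
import json
import subprocess

NAMESPACE = "elastic-system"   # placeholder for our namespace
SELECTOR = "k8s-app=filebeat"  # placeholder for our daemonset's labels

def kubectl(*args):
    return subprocess.run(
        ["kubectl", "-n", NAMESPACE, *args],
        capture_output=True, text=True, check=True,
    ).stdout

pods = json.loads(kubectl("get", "pods", "-l", SELECTOR, "-o", "json"))["items"]
for pod in pods:
    name = pod["metadata"]["name"]
    node = pod["spec"]["nodeName"]
    try:
        # Same probe as the manual curl check from inside the pod.
        kubectl("exec", name, "--", "curl", "-sf", "-m", "3",
                "http://169.254.169.254/latest/meta-data/instance-id")
        print(f"{node}\t{name}\tIMDS reachable")
    except subprocess.CalledProcessError:
        print(f"{node}\t{name}\tIMDS NOT reachable")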