Since my last reboot, I am seeing the following every 1-2 minutes:
Aug 02 13:53:00 monitor systemd[1]: influxdb.service: start operation timed out. Terminating.
Aug 02 13:53:00 monitor systemd[1]: influxdb.service: Failed with result 'timeout'.
Aug 02 13:53:00 monitor systemd[1]: Failed to start InfluxDB is an open-source, distributed, time series database.
Aug 02 13:53:00 monitor systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 4.
Aug 02 13:53:00 monitor systemd[1]: Stopped InfluxDB is an open-source, distributed, time series database.
Aug 02 13:53:00 monitor systemd[1]: Starting InfluxDB is an open-source, distributed, time series database...
Aug 02 13:53:00 monitor influxd-systemd-start.sh[3539]: Merging with configuration at: /etc/influxdb/influxdb.conf
On 29/07/2021, InfluxDB was updated from 1.8.6-1 to 1.8.7-1. The OS is Ubuntu 20.04 Server.
The first reboot after this is when the issues started.
Initially there was a permissions issue with /usr/lib/influxdb/scripts/influxd-systemd-start.sh, which prevented it from starting. I changed the permissions to 0755 and it started, but it keeps restarting. It seems to accept connections and data between restarts, as Telegraf is still populating the database and Grafana can display the stats, as long as the query doesn't coincide with a restart.
I am also seeing this message:
influxd-systemd-start.sh[12171]: [tcp] 2021/08/02 14:21:40 tcp.Mux: Listener at 127.0.0.1:8088 failed failed to accept a connection, closing all listeners
It is listening on those ports:
root@monitor$ ss -ilpn | grep influx
tcp LISTEN 0 4096 127.0.0.1:8088 0.0.0.0:* users:(("influxd",pid=15115,fd=3))
tcp LISTEN 0 4096 *:8086 *:* users:(("influxd",pid=15115,fd=32))
As far as I am aware, there have been no config changes, and no firewall rules are active.
Anybody have any idea why it started misbehaving?
It looks like /usr/lib/influxdb/scripts/influxd-systemd-start.sh is trying to do a health check, and this is failing. Judging by the file date, the start wrapper was only created on 21 July, so it looks like the start check is new.
If I manually try I get:
It fails for several reasons.
To resolve it, I edited the /lib/systemd/system/influxdb.service file so that ExecStart calls influxd directly rather than the wrapper script:
ExecStart=/usr/bin/influxd -config /etc/influxdb/influxdb.conf --pidfile /var/lib/influxdb/influxd.pid $INFLUXD_OPTS
This is a bug introduced in InfluxDB v1.8.7 (GitHub issue).
There are a variety of ways of fixing this, your solution being one of them. In our case, InfluxDB took a bit longer to start up than the 10-second window the startup script allows, so I simply changed the line
sleep 1
in the file /usr/lib/influxdb/scripts/influxd-systemd-start.sh to
sleep 2
to give InfluxDB more time to start up.
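For anyone wondering why a one-line sleep change helps: a bounded startup check usually retries a probe a fixed number of times and sleeps between tries, so the total window is roughly attempts times delay. The sketch below illustrates that pattern only; it is not the packaged script. The function names wait_until_healthy and http_ok are hypothetical, and the URL in the usage comment is an assumption based on the default HTTP port shown in the question.

```shell
#!/bin/bash
# Sketch of a bounded startup health check (NOT the packaged InfluxDB script).
#
# wait_until_healthy MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, 1 once MAX_ATTEMPTS tries have all failed.
wait_until_healthy() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "health check failed after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical probe: succeeds only if the URL answers without an HTTP error.
http_ok() {
  curl --silent --fail --output /dev/null --max-time 2 "$1"
}

# Example (assumed URL): 10 tries, 2 seconds apart, so roughly 20 seconds.
# wait_until_healthy 10 2 http_ok "http://localhost:8086/health"
```

With 10 tries, going from a 1-second to a 2-second delay roughly doubles the time the service has to come up before the check gives up, which matches the effect described above.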