I have a hosted Ubuntu server that has become unresponsive to everything a couple of times, and it stays that way until a hard reboot is done. I have pulled the logs, but I need some help working out what they mean, whether they are actually related, or whether you think it could be a hardware issue:
syslog
Mar 1 15:11:01 xxxxxxxx CRON[24473]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:12:01 xxxxxxxx CRON[24530]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:13:01 xxxxxxxx CRON[24585]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:14:01 xxxxxxxx CRON[24654]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:15:01 xxxxxxxx CRON[24713]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:16:01 xxxxxxxx CRON[24770]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:17:01 xxxxxxxx CRON[24827]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:17:01 xxxxxxxx CRON[24828]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 1 15:17:05 xxxxxxxx postfix/pickup[23311]: 3CFEE2E3CF: uid=0 from=<root>
Mar 1 15:17:05 xxxxxxxx postfix/cleanup[24880]: 3CFEE2E3CF: message-id=<[email protected]>
Mar 1 15:17:05 xxxxxxxx postfix/qmgr[3886]: 3CFEE2E3CF: from=<[email protected]>, size=2080, nrcpt=1 (queue active)
Mar 1 15:17:05 xxxxxxxx postfix/smtp[24882]: 3CFEE2E3CF: to=<[email protected]>, relay=xxxxxxxxxxxxxx.dyndns.org[xxx.xxx.xxx.xxx]:25, delay=0.56, delays=0.08/0/0.21/0.26, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery)
Mar 1 15:17:05 xxxxxxxx postfix/qmgr[3886]: 3CFEE2E3CF: removed
Mar 1 15:18:01 xxxxxxxx CRON[24897]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:19:01 xxxxxxxx CRON[24944]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:20:01 xxxxxxxx CRON[24999]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:21:01 xxxxxxxx CRON[25046]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 16:02:40 xxxxxxxx kernel: imklog 4.6.4, log source = /proc/kmsg started.
Mar 1 16:02:40 xxxxxxxx rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="3425" x-info="http://www.rsyslog.com"] (re)start
Mar 1 16:02:40 xxxxxxxx rsyslogd: rsyslogd's groupid changed to 103
Mar 1 16:02:40 xxxxxxxx rsyslogd: rsyslogd's userid changed to 101
Mar 1 16:02:40 xxxxxxxx rsyslogd-2039: Could no open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Mar 1 16:02:40 xxxxxxxx kernel: Initializing cgroup subsys cpuset
Mar 1 16:02:40 xxxxxxxx kernel: Linux version 3.2.13-grsec-xxxx-grs-ipv6-64 ([email protected]) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Mar 29 09:48:59 UTC 2012
Mar 1 16:02:40 xxxxxxxx kernel: Command line: root=/dev/sda1 console=tty0 BOOT_IMAGE=bzImage-2.6-xxxx-grs-ipv6-64
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-provided physical RAM map:
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 0000000000100000 - 00000000df790000 (usable)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df790000 - 00000000df79e000 (ACPI data)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df79e000 - 00000000df7d0000 (ACPI NVS)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df7d0000 - 00000000df7e0000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df7ec000 - 00000000f0000000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 0000000100000000 - 0000000620000000 (usable)
Mar 1 16:02:40 xxxxxxxx kernel: NX (Execute Disable) protection: active
Mar 1 16:02:40 xxxxxxxx kernel: DMI present.
As you can see, the server stops right after a cron job has run. There are no complicated jobs running on this server.
Are there any pointers you can give me on diagnosing the issue?
Thanks
The rsyslogd-2039 error in your log (Could no open output pipe '/dev/xconsole') would be a confirmed bug in your version of Ubuntu, so I'd hope that's what's causing the issue, and I'd try to resolve it first.
You could upgrade to get around it, or try what's suggested here, which would be editing your /etc/rsyslog.d/50-default.conf file (or running apt-get upgrade).
Failing that, stop running the cron job that occurs just before the server hangs and take a look at it to see what it might be doing that could cause your server to hang. If nothing else, fixing the rsyslog bug might allow you to capture some useful logging info that could point you in the right direction.
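If the hangs keep happening, it may help to find exactly where the log goes silent before each reboot, so you can compare what ran just before every hang. Below is a rough Python sketch of that idea, not a definitive tool. The function name find_gaps, the 5-minute threshold, and the placeholder year are my own assumptions (classic syslog timestamps carry no year), and it assumes the "Mar 1 15:21:01 host ..." line format shown in your excerpt.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=5), year=2000):
    """Return (line_before, line_after, gap) for each silence longer than threshold.

    Assumes classic syslog lines such as "Mar 1 15:21:01 host prog: msg".
    The year is only a placeholder because the timestamp has none; a leap
    year is used so that "Feb 29" still parses. Logs spanning New Year
    are not handled.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        parts = line.split(None, 3)  # month, day, time, rest of line
        if len(parts) < 4:
            continue  # not a syslog-style line
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # timestamp did not parse, skip the line
        if prev_time is not None and stamp - prev_time > threshold:
            gaps.append((prev_line.rstrip(), line.rstrip(), stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


def report(path="/var/log/syslog"):
    """Print every gap found in the given log file (path is an assumption)."""
    with open(path) as handle:
        for before, after, gap in find_gaps(handle):
            print(f"silent for {gap}")
            print(f"  last entry:  {before}")
            print(f"  first entry: {after}")
```

Calling report() on the server would list each silent stretch along with the last entry written before it. On your excerpt, the only gap is between the 15:21:01 cron entry and the 16:02:40 rsyslogd restart. With a one-minute cron job running, any silence longer than a few minutes is a reasonable marker for a hang, which is why the threshold defaults to 5 minutes.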