I recently upgraded clamav from I'm not sure which version, but whatever was current on EPEL on February 13th, to 0.102.3. Under the previous version, I had to set TimeoutStartSec
in the systemd conf file to 5 minutes to get it to start without timing out. Under the current version, AFAICT, it ~ never starts. I'm trying to figure out both how to get it to actually start, and how to get it to a success or failed state in some time less than hours.
$sudo systemctl -l status clamd@amavisd
● [email protected] - clamd scanner (amavisd) daemon
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: activating (start) since Wed 2020-07-15 17:33:53 UTC; 4h 18min ago
I was working my way up. I'd set the timeout to 15 minutes, then 20 minutes, then 40 minutes, then 1.5 hours, and finally 4 hours. Amongst the many curious things, the above was still "activating" long after the 4 hours was up, but ultimately it did fail:
Jul 15 17:23:29 ip-10-0-200-85 systemd: Starting clamd scanner (amavisd) daemon.
...
Jul 15 22:47:08 ip-10-0-200-85 systemd: [email protected] failed.
Here's the systemd configuration:
$cat /usr/lib/systemd/system/[email protected]
[Unit]
Description = clamd scanner (%i) daemon
Documentation=man:clamd(8) man:clamd.conf(5) https://www.clamav.net/documents/
After = syslog.target nss-lookup.target network.target
[Service]
Type = forking
ExecStart = /usr/sbin/clamd -c /etc/clamd.d/%i.conf
# Reload the database
ExecReload=/bin/kill -USR2 $MAINPID
Restart = on-failure
TimeoutStartSec=240min
[Install]
WantedBy = multi-user.target
The last log entries before it ultimately failed were
...
Jul 15 20:06:18 ip-10-0-200-85 clamd: LibClamAV debug: bytecode: registered ctx variable at 0x55ceb28157b0 (+744) id 7
Jul 15 20:06:18 ip-10-0-200-85 clamd: LibClamAV debug: bytecode debug: startup: bytecode execution in auto mode
Jul 15 20:06:18 ip-10-0-200-85 clamd: LibClamAV debug: interpreter bytecode run finished in 50106us, after executing 96 opcodes
Jul 15 20:06:18 ip-10-0-200-85 clamd: LibClamAV debug: Bytecode: disable status is 0
Jul 15 20:06:18 ip-10-0-200-85 clamd: LibClamAV debug: bytecode: JIT disabled
Jul 15 20:06:18 ip-10-0-200-85 clamd: LibClamAV debug: Cannot prepare for JIT, LLVM is not compiled or not linked
Jul 15 20:06:29 ip-10-0-200-85 clamd: LibClamAV debug: Bytecode: 0 bytecode prepared with JIT, 95 prepared with interpreter, 95 total
...
I wasn't sure what it meant that the start process was well past the 4-hour timeout and it hadn't yet killed itself. At previous, short intervals, it just restarted the startup process. For example, here was the timer set to 20 minutes:
Jul 15 16:12:32 ip-10-0-200-85 systemd: Starting clamd scanner (amavisd) daemon.
...
Jul 15 16:33:13 ip-10-0-200-85 systemd: [email protected] start operation timed out. Terminating.
Jul 15 16:33:13 ip-10-0-200-85 systemd: Failed to start clamd scanner (amavisd) daemon.
Jul 15 16:33:13 ip-10-0-200-85 systemd: Unit [email protected] entered failed state.
Jul 15 16:33:13 ip-10-0-200-85 systemd: [email protected] failed.
Jul 15 16:33:14 ip-10-0-200-85 systemd: [email protected] holdoff time over, scheduling restart.
Jul 15 16:33:14 ip-10-0-200-85 systemd: Starting clamd scanner (amavisd) daemon.
Given that, 20 minutes after the 4-hour timer had expired, it hadn't hit the Failed to start clamd
point, I figured I'd give it a try and run some mail through it. Mail actually showed up in my inbox with the header X-Virus-Scanned: amavisd-new at example.com
, and it appeared to be doing something (though I confess I'm not sure what these log files are supposed to say):
Jul 15 21:57:07 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: Breaking command loop, mode is no longer MODE_COMMAND
Jul 15 21:57:07 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: Consumed entire command
Jul 15 21:57:07 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: Number of file descriptors polled: 1 fds
Jul 15 21:57:07 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: fds_poll_recv: timeout after 600 seconds
Jul 15 21:57:07 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: THRMGR: queue (single) crossed low threshold -> signaling
Jul 15 21:57:07 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: THRMGR: queue (bulk) crossed low threshold -> signaling
Jul 15 21:57:10 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: Finished scanthread
Jul 15 21:57:10 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: Scanthread: connection shut down (FD 10)
Jul 15 21:57:10 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: THRMGR: queue (single) crossed low threshold -> signaling
Jul 15 21:57:10 ip-10-0-200-85.eu-central-1.compute.internal clamd[16152]: THRMGR: queue (bulk) crossed low threshold -> signaling
I just let it run, continuing in its "activating" state, but it did ultimately fail, after 5 hours and 15 minutes (again, against a 4 hour timer ¯_(ツ)_/¯ ):
Jul 15 17:23:29 ip-10-0-200-85 systemd: Starting clamd scanner (amavisd) daemon...
...
Jul 15 22:47:06 ip-10-0-200-85 systemd: [email protected] start operation timed out. Terminating.
Jul 15 22:47:08 ip-10-0-200-85 systemd: Failed to start clamd scanner (amavisd) daemon.
Jul 15 22:47:08 ip-10-0-200-85 systemd: Unit [email protected] entered failed state.
Jul 15 22:47:08 ip-10-0-200-85 systemd: [email protected] failed.
Jul 15 22:47:08 ip-10-0-200-85 systemd: [email protected] holdoff time over, scheduling restart.
So, like I said at the beginning, I'm trying to figure out how to get this to actually start — and whilst working on that, trying to figure out how to get it to fail in sometime less than hours. Obviously, continually upping the timer is a choice, but I don't want to wait 5, 6, 7 hours between tests if there's a better way to do this, or something obvious I'm missing.
Also, I looked at My Notes On Getting ClamAV Working on Centos 7 [not actually my notes, that's just what it's called]. There's some interesting stuff in there are setting nice level, and CPU and memory limits, but those seem like they'd slow things down even more.
TIA for any help/hints/suggestions.
Ultimately, this appears to have been a memory issue. The service was running on an AWS EC2 t2.nano (i.e., 500MB RAM), with 2GB swap. That had worked in the past, but apparently at some point in the last six months ClamAV just ... needed more. I switched the instance type to t2.micro (1GB RAM) and it Just Worked. Probably for some people it would still be nice to figure out how to make clamd use more swap / less real memory, but in most cases, just ensuring you have at least 1GB RAM seems to be sufficient.