On Ubuntu 16.04.1 running:
rsyslogd 8.16.0, compiled with:
PLATFORM: x86_64-pc-linux-gnu
PLATFORM (lsb_release -d):
FEATURE_REGEXP: Yes
GSSAPI Kerberos 5 support: Yes
FEATURE_DEBUG (debug build, slow code): No
32bit Atomic operations supported: Yes
64bit Atomic operations supported: Yes
memory allocator: system default
Runtime Instrumentation (slow code): No
uuid support: Yes
Number of Bits in RainerScript integers: 64
with:
#rsyslogd -N1
rsyslogd: version 8.16.0, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
Hangs upon service stop if the below config is in effect:
$ModLoad imfile
##########
# MySQL slow query log (standard log is autosent to syslog):
$InputFileName /var/log/mysql/mysql-slow.log
$InputFileTag mysqld_slow:
$InputFileStateFile state-mysqld_slow
$InputFileSeverity info
$InputRunFileMonitor
##########
# Apache access:
###
$InputFileName /var/log/virtualmin/mysite.com_access_log
$InputFileTag www-access-mysite.com:
$InputFileStateFile state-www-access-mysite.com
$InputFileSeverity info
$InputRunFileMonitor
##########
# Apache Error:
###
$InputFileName /var/log/virtualmin/mysite.com_error_log
$InputFileTag www-errors-mysite.org:
$InputFileStateFile state-www-errors-mysite.com
$InputFileSeverity error
$InputRunFileMonitor
### Main Error Log
$InputFileName /var/log/apache2/error.log
$InputFileTag www-errors-main:
$InputFileStateFile state-www-errors
$InputFileSeverity error
$InputRunFileMonitor
# Setup disk assisted queues
$ActionQueueFileName fwdRule1 # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
$ActionQueueType LinkedList # run asynchronously
$ActionResumeRetryCount -1 # infinite retries if host is down
#RsyslogGnuTLS
$DefaultNetstreamDriverCAFile /etc/rsyslog.d/keys/ca.d/certs.crt
###
###First BLOCK
###
template(name="access" type="string" string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% Host1 %app-name% %procid% %msgid% [id@40000 tag=\"access\"] %msg%\n")
template(name="errors" type="string" string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% Host1 %app-name% %procid% %msgid% [id@40000 tag=\"errors\"] %msg%\n")
# Send messages over TCP using the template.
:syslogtag, startswith, "www-access" action(type="omfwd" protocol="tcp" target="logs.com" port="6000" template="access" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.logs.com")
:syslogtag, startswith, "www-error" action(type="omfwd" protocol="tcp" target="logs.com" port="6000" template="errors" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.logs.com")
###
###Second BLOCK
###
template(name="standard" type="string" string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% Host1 %app-name% %procid% %msgid% %msg%\n")
# Send messages over TCP using the template.
:syslogtag, startswith, "www-access" action(type="omfwd" protocol="tcp" target="logs.other.com" port="38549" template="standard" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.other.com")
:syslogtag, startswith, "www-errors" action(type="omfwd" protocol="tcp" target="logs.other.com" port="38549" template="standard" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.other.com")
:syslogtag, startswith, "mysqld" action(type="omfwd" protocol="tcp" target="logs.other.com" port="38549" template="standard" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.other.com")
And if ran via rsyslogd -dn
you can see:
9744.300605710:main thread : main Q: EnqueueMsg advised worker start
9744.300724026:main thread : Terminating main queue... 9744.300734161:main thread : main Q: initiating worker thread shutdown sequence
9744.300739666:main thread : main Q: trying shutdown of regular workers
9744.300766918:main thread : sent SIGTTIN to worker thread 0x9b57a700
9744.300776037:main Q:Reg/w0 : wti 0x16756a0: worker awoke from idle processing
9744.300782296:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects and enqueued 0 objects
9744.300786960:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
9744.300791987:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 sz phys 0
9744.300797344:main Q:Reg/w0 : wti 0x16756a0: terminating worker terminateRet=5, bInactivityTOOccured=0
9744.300803075:main Q:Reg/w0 : DDDD: wti 0x16756a0: worker cleanup action instances
9744.300807699:main Q:Reg/w0 : wti 0x16756a0, action 0, ptr 0x7f1394004210
9744.300825960:main Q:Reg/w0 : wti 0x16756a0, action 1, ptr 0x7f1394038f80
9744.300835440:main Q:Reg/w0 : wti 0x16756a0, action 2, ptr 0x7f13940022a0
9744.300875784:main Q:Reg/w0 : wti 0x16756a0, action 3, ptr 0x7f1394004470
9744.300885088:main Q:Reg/w0 : wti 0x16756a0, action 4, ptr (nil)
9744.300890048:main Q:Reg/w0 : wti 0x16756a0, action 5, ptr (nil)
9744.300894505:main Q:Reg/w0 : wti 0x16756a0, action 6, ptr (nil)
9744.300898883:main Q:Reg/w0 : wti 0x16756a0, action 7, ptr (nil)
9744.300903665:main Q:Reg/w0 : wti 0x16756a0, action 8, ptr (nil)
9744.300908515:main Q:Reg/w0 : wti 0x16756a0, action 9, ptr (nil)
9744.300912910:main Q:Reg/w0 : wti 0x16756a0, action 10, ptr 0x7f13940023f0
9744.300918720:main Q:Reg/w0 : wti 0x16756a0, action 11, ptr 0x7f13940068f0
9744.300941837:main Q:Reg/w0 : wti 0x16756a0, action 12, ptr 0x7f13940504b0
9744.300951981:main Q:Reg/w0 : wti 0x16756a0, action 13, ptr 0x7f1394462e10
9744.301055305:main thread : main Q:Reg: waiting 1499ms on worker thread termination, 1 still running
9745.800919600:main thread : main Q:Reg: timeout waiting on worker thread termination
9745.800976996:main thread : sent SIGTTIN to worker thread 0x9b57a700
9745.801024045:main thread : main Q: regular shutdown timed out on primary queue (this is OK)
9745.801030518:main thread : main Q: checking to see if we need to cancel any worker threads of the primary queue
9745.801035688:main thread : sent SIGTTIN to worker thread 0x9b57a700, giving it a chance to terminate
9745.811204331:main thread : cooperative worker termination failed, using cancellation...
9745.811246502:main thread : wti 0x16756a0: canceling worker thread
I already tried starting/stopping with the firewall off, according to the suggestion in this thread in the support forum (where I got no help)
The tail end of strace shows
6700 14:13:02 write(6, "Oct 6 14:13:02 hostname rsyslogd: [origin software=\"rsyslogd\" swVersion=\"8.16.0\" x-pid=\"6696\" x-info=\"http://www.rsyslog.com\"] exiting on signal 15.\n", 146) = 146
6700 14:13:02 futex(0x1a3516c, FUTEX_WAIT_PRIVATE, 99, NULL <unfinished ...>
6696 14:13:02 <... futex resumed> ) = 1
6696 14:13:02 futex(0x1a3516c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1a35168, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
6700 14:13:02 <... futex resumed> ) = 0
6700 14:13:02 futex(0x1a34f80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
6696 14:13:02 <... futex resumed> ) = 1
6696 14:13:02 tgkill(6696, 6700, SIGTTIN <unfinished ...>
6700 14:13:02 <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
6700 14:13:02 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_TKILL, si_pid=6696, si_uid=104} ---
6700 14:13:02 rt_sigreturn({mask=~[KILL STOP TTIN RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
6700 14:13:02 futex(0x1a34f80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
6696 14:13:02 <... tgkill resumed> ) = 0
6696 14:13:02 futex(0x1a34f80, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
6700 14:13:02 <... futex resumed> ) = 0
6700 14:13:02 futex(0x1a34f80, FUTEX_WAKE_PRIVATE, 1) = 0
6700 14:13:02 writev(14, [{"\25\3\3\0\32\0\0\0\0\0\0\1qv\373\273}\275\276\252\207\\~\7\343p\237[\343'\363", 31}], 1) = 31
6700 14:13:02 recvfrom(14, <unfinished ...>
6696 14:13:02 <... futex resumed> ) = 1
6696 14:13:02 futex(0x1a35014, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1475763183, 729652779}, ffffffff <unfinished ...>
6700 14:13:02 <... recvfrom resumed> "\25\3\3\0\32", 5, 0, NULL, NULL) = 5
6700 14:13:02 mprotect(0x7fd3145d8000, 16384, PROT_READ|PROT_WRITE) = 0
6700 14:13:02 recvfrom(14, "w_\345'\310\3473\3\\q\32 \37\276\237A\270a\260\200\24\34p\262G\277", 26, 0, NULL, NULL) = 26
6700 14:13:02 close(14) = 0
6700 14:13:02 writev(25, [{"\25\3\3\0\32\0\0\0\0\0\0\1\20\3535\fh\5\324bZ\211U\203AX.\37\205`m", 31}], 1) = 31
6700 14:13:02 recvfrom(25, "\25\3\3\0\32", 5, 0, NULL, NULL) = 5
6700 14:13:02 recvfrom(25, "\307y\343\27z#\31\373\232\333\371\263\275R\203\31\332?)>\201\245\301z\335;", 26, 0, NULL, NULL) = 26
6700 14:13:02 close(25) = 0
6700 14:13:02 writev(24, [{"\25\3\3\0000\26f\310\217\23\247d?\4\3\301\252\277+\207\250q[9L\306\274\2414}\306\237\253X\221\35\27:\376\206\203\36PP\177\363\362I\321\223\372\203W", 53}], 1) = 53
6700 14:13:02 recvfrom(24, <unfinished ...>
6696 14:13:03 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
6696 14:13:03 tgkill(6696, 6700, SIGTTIN) = 0
6700 14:13:03 <... recvfrom resumed> 0x7fd31449c17b, 5, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
6700 14:13:03 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_TKILL, si_pid=6696, si_uid=104} ---
6700 14:13:03 rt_sigreturn({mask=~[KILL STOP TTIN RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
6700 14:13:03 recvfrom(24, <unfinished ...>
6696 14:13:03 futex(0x1a34fe8, FUTEX_WAKE_PRIVATE, 1) = 0
6696 14:13:03 tgkill(6696, 6700, SIGTTIN) = 0
6700 14:13:03 <... recvfrom resumed> 0x7fd31449c17b, 5, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
6700 14:13:03 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_TKILL, si_pid=6696, si_uid=104} ---
6700 14:13:03 rt_sigreturn({mask=~[KILL STOP TTIN RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
6700 14:13:03 recvfrom(24, <unfinished ...>
6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.)
6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.)
6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.)
6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.)
.
.
.
and it repeats until
6696 14:14:32 select(0, NULL, NULL, NULL, 0x7ffe5e77b200 <ptrace(SYSCALL):No such process>
6700 14:14:32 +++ killed by SIGKILL +++
6696 14:14:32 +++ killed by SIGKILL +++
Turns out that one of the logging services had an issue with it's certificate. Interestingly enough it didn't complain about the certificate at all during start or operation, only upon exit.
The exact issue was one or more of these:
So I am not sure if I can have multiple certs within the same file, despite what is said in this answer.
All in all, if you are shipping to multiple services, I'd recommend using multiple configuration files.