Ping a Specific Port

Question

hegemon

Asked: 2012-06-07 00:08:06 +0800 CST2012-06-07 00:08:06 +0800 CST 2012-06-07 00:08:06 +0800 CST

Postfix logs missing information on delivery status (postfix/smtp message)

772

I noticed a problem with postfix logs, that information on some of emails delivery status is missing. The issue affects about 1% of emails.

"Healthy" log:

<server># grep 8EB992EFBB44 postfix_log/mail04.log
Jun  5 03:09:29 mail04 postfix/smtpd[8537]: 8EB992EFBB44: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  5 03:09:29 mail04 postfix/cleanup[34349]: 8EB992EFBB44: message-id=<[email protected]>
Jun  5 03:12:02 mail04 postfix/qmgr[76377]: 8EB992EFBB44: from=<[email protected]>, size=48845, nrcpt=1 (queue active)
Jun  5 03:15:12 mail04 postfix/smtp[35058]: 8EB992EFBB44: to=<[email protected]>, relay=mx.baz.com[xxx.xxx.xxx.xxx]:25, conn_use=70, delay=343, delays=153/190/0/0.24, dsn=2.0.0, status=sent (250 ok)
Jun  5 03:15:12 mail04 postfix/qmgr[76377]: 8EB992EFBB44: removed

"Broken" log:

<server># grep F3C362EF37CA postfix_log/mail04.log
Jun  5 04:03:27 mail04 postfix/smtpd[39666]: F3C362EF37CA: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  5 04:03:27 mail04 postfix/cleanup[41287]: F3C362EF37CA: message-id=<[email protected]>
Jun  5 04:03:28 mail04 postfix/qmgr[76377]: F3C362EF37CA: from=<[email protected]>, size=48892, nrcpt=1 (queue active)
** here should be a log line from postfix/smtp but there is none **
Jun  5 04:03:29 mail04 postfix/qmgr[76377]: F3C362EF37CA: removed

Background information:

system: FreeBSD xxx.xxx.xxx 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64

Postfix is installed inside jail. Logs are on the same machine, log dir is mounted thru nullfs. The site has spikes of heavy load, causing disks (local) to run at 100%.

Update

The log is rotated daily, current size is ~ 500MB.

I made a test by queuing 99000 messages to same destination (in order to rule out dns/network/mx issues). 5715 messages don't have any DSN record. Failed messages queue time is spread evenly over time, i don't see any time-bound issues.

Some undelivered emails:

  envelopeid  |       processed_time
--------------+----------------------------
 8D7652EF3BAE | 2012-06-06 13:19:11.072715
 DD53A2EF3C5C | 2012-06-06 13:33:24.374783
 8C52F2EF4E3F | 2012-06-06 13:39:15.810616
 BBC572EF525C | 2012-06-06 13:44:22.762812
 E95822EF54D1 | 2012-06-06 13:52:01.134533
 839DD2EF4FBB | 2012-06-06 14:13:48.511236
 017EE2EF6234 | 2012-06-06 15:04:48.618963

Those are a few picks, such records of undelivered email occur almost every second.

<server># egrep '(8D7652EF3BAE|BBC572EF525C|017EE2EF6234)' mail04.log
Jun  6 13:19:10 mail04 postfix/smtpd[20350]: 8D7652EF3BAE: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  6 13:19:10 mail04 postfix/cleanup[21024]: 8D7652EF3BAE: message-id=<[email protected]>
Jun  6 13:19:10 mail04 postfix/qmgr[7939]: 8D7652EF3BAE: from=<[email protected]>, size=63718, nrcpt=1 (queue active)
Jun  6 13:19:11 mail04 postfix/qmgr[7939]: 8D7652EF3BAE: removed
Jun  6 13:44:22 mail04 postfix/smtpd[20346]: BBC572EF525C: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  6 13:44:22 mail04 postfix/cleanup[24811]: BBC572EF525C: message-id=<[email protected]>
Jun  6 13:44:22 mail04 postfix/qmgr[7939]: BBC572EF525C: from=<[email protected]>, size=63758, nrcpt=1 (queue active)
Jun  6 15:04:49 mail04 postfix/smtpd[20344]: 017EE2EF6234: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  6 15:04:49 mail04 postfix/cleanup[35585]: 017EE2EF6234: message-id=<[email protected]>
Jun  6 15:04:49 mail04 postfix/qmgr[7939]: 017EE2EF6234: from=<[email protected]>, size=63706, nrcpt=1 (queue active)
<server>#
<server># find /var/spool/postfix/active/ -type f -print | wc -l
       1
<server>#

IMPORTANT: As you can see above some of emails doesn't event have the removed line.

2 Answers

Voted

adaptr · Answer 1 · 2012-06-07T04:57:55+08:00

Best Answer

adaptr

2012-06-07T04:57:55+08:002012-06-07T04:57:55+08:00

WhIch postfix version is this ?

Have you tried tuning syslog ?
If the disk is at or near its I/O capacity, syslogs may be delayed longer than you expect.

Also, what is "mail04.log" ? How big is it ? Are you certain the entirety of that message was logged in that file ?

Send the logs offsite using either UDP or (better) TCP remote syslog and check that the message is missing from all logs.

1

Sergey Vlasov · Answer 2 · 2012-06-07T05:23:12+08:00

Sergey Vlasov

2012-06-07T05:23:12+08:002012-06-07T05:23:12+08:00

If any Postfix processes are configured in $config_directory/master.cf to run in the chroot jail, you need to configure syslogd to listen on $queue_directory/dev/log in addition to the default /dev/log socket (so that chrooted processes will have access to /dev/log inside chroot). If you don't do this, initially things seem to work, but logging from chrooted processes breaks when syslogd is restarted for some reason (which may happen during log rotation).

0

Postfix logs missing information on delivery status (postfix/smtp message)

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?