Our Plesk mail server (64-bit Ubuntu Linux) has extremely high load and we don't know how to isolate the cause. The load was okay until two weeks ago, but in the last two weeks it has seriously deteriorated.
The mail server has been running for years and we have had sporadic performance issues. Normally we reduce the load by turning off all SPAM checks until the problem is sorted (which sometimes resolves itself).
Currently we have turned off real-time block lists and SPF checking, and we have attempted to turn off SpamAssassin.
No matter what we do, the SpamAssassin check box stays ticked in the GUI. Out of desperation we have run /etc/init.d/psa-spamassassin stop. For years we haven't been able to use SpamAssassin because it kills the server. We would like to use it, but performance is more important for now.
We cannot turn off Greylisting. The moment we turn off Greylisting our help desk is inundated with calls. Out of desperation we investigated truncating the Greylisting database, which is now 2.5 GB, but we abandoned this after noticing that turning off Greylisting doesn't improve the performance at all.
We have no anti-virus. It's just more load and Dr. Web never really worked that well for us. But we'll try that if it will make a difference.
We implemented Postfix Anvil. This seemed to make the situation worse, so we disabled it, although we're not sure Anvil was really the cause.
Our current mail server is configured to forward all SMTP to a relay server. We did so to reduce the load. This helped a lot because outgoing queues are generally empty.
We are running in an Expand configuration. The mail server has about 12 000 accounts of which maybe half are active.
We have read through this document: http://www.postfix.org/STRESS_README.html but there are too many settings and we don’t know which ones to choose.
Please assist urgently. We need advice on how to fix this problem before all our clients abandon us.
The only clue we have is that there are 100s of these processes:
30 13205 1 0 13:18 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10027 before-queue
30 13207 1 0 11:38 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10027 before-queue
30 13208 1 0 13:18 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10026 before-remote
30 13209 1 0 11:38 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10026 before-remote
30 13213 1 0 13:18 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10027 before-queue
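To put a number on "100s of these processes", here is a minimal Python sketch that tallies postfix-queue processes by port and stage. It assumes `ps -ef` style output like the listing above, where the port and the stage (before-queue or before-remote) are the second and third arguments after the command path. The function names `count_postfix_queue` and `snapshot` are made up for illustration and are not part of Plesk or Postfix.

```python
import subprocess
from collections import Counter


def count_postfix_queue(ps_output: str) -> Counter:
    """Tally postfix-queue processes by (port, stage) from `ps -ef` style text."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        for i, field in enumerate(fields):
            # Locate the command; its arguments are: address, port, stage.
            if field.endswith("/postfix-queue") and len(fields) >= i + 4:
                port, stage = fields[i + 2], fields[i + 3]
                counts[(port, stage)] += 1
                break
    return counts


def snapshot() -> Counter:
    """Run `ps -ef` on the local machine and tally the result (assumes Linux ps)."""
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    return count_postfix_queue(out)
```

Calling `snapshot()` every few minutes and logging the result would show whether the before-queue or the before-remote handlers are the ones piling up, and whether the count tracks the load.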
The problem might have been caused by lag that developed over a couple of days. At first, for about 9 days, the Perl backup process ran but didn't always complete during office hours when the load was heavy. It produces a backup file of roughly 60 GB.
We did
on the Perl processes, but the bad performance persisted. Eventually we rebooted the server and added the Spamhaus RBL in addition to SpamCop. After the reboot the server returned to normal load.
We paid Parallels $75 for a support enquiry and they advised that the disk is underperforming. The next step in the isolation process will be to try SAS or another high-performance drive.
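One way to check the disk diagnosis before buying hardware is to see how much CPU time is spent waiting on I/O. The sketch below is only an illustration under stated assumptions: it relies on the standard Linux /proc/stat layout, where the aggregate "cpu" line lists user, nice, system, idle and iowait as its first five counters. The helper names `iowait_percent` and `read_cpu_line` are hypothetical.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percent of CPU time spent in iowait between two /proc/stat 'cpu' lines."""
    def parse(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(x) for x in parts[1:]]

    first, second = parse(before), parse(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Sum user..steal only; guest time is already included in user.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()
```

Taking `read_cpu_line()` twice, some seconds apart during office hours, and passing both samples to `iowait_percent` gives a rough figure. A consistently high value would be in line with the advice that the disk is the bottleneck, while a low value would suggest looking elsewhere.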