We deployed our new Linux/Exim/Spamassassin mail server on Friday (always a good idea to deploy the day before a long weekend when no admins are around). The load has been hovering around 1.3 on the 15-minute average.
The machine is responsive, and mails are delivered in reasonable time. Can we assume that this is acceptable?
How is a certain amount of load deemed acceptable or not acceptable? What metrics are used?
The load average is a value that gives an idea of how many processors the kernel would need in order to run every task as soon as it wants to run, without waiting.
In your case, if you have 2 or more CPUs/cores, there is no problem. If you have only 1 CPU with 1 core, it means there is 'too much' time between the moment your application wants to run and the moment the kernel actually runs it. A load greater than the number of CPUs/cores will not be a problem for a mail system until it stays too high for too long.
Of course there is no hard rule or magic value here; as long as your mail arrives in a short time, it's OK. But you should probably start looking closely at your server when the load is higher than 2 × the number of CPUs/cores too often over a 'long' period (~1 hour).
Again, for a mail server this will not be a big problem, but it will start to mean that your server is a bit overloaded.
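If you want to put the two numbers side by side, a minimal sketch like this (using /proc/loadavg and nproc, both standard on Linux) prints the load averages next to the core count:

    cores=$(nproc)
    read one five fifteen rest < /proc/loadavg
    echo "load averages: $one (1m) $five (5m) $fifteen (15m) on $cores core(s)"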
Basic rule of thumb: if the system is responsive, if it's working in a timely fashion, then you're fine.
Loads below two aren't much of a worry. I've had systems hit four or five and still work fine, although that would be an indicator that there are a lot of queuing issues with the network or drives (I/O issues can cause high loads even though the system is very responsive).
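If you suspect I/O queuing is behind a high load, iostat gives per-device wait and utilization figures (this assumes the sysstat package is installed; column names vary a little between versions):

    iostat -x 5    # extended per-device stats every 5 seconds; watch await and %util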
Check your mail queue lengths periodically, and check the logs for undeliverable messages and problems of that nature. If the delivery queue stays relatively low, that's fine.
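Since this box runs Exim, the queue check boils down to a couple of standard Exim commands (a sketch; depending on the distro the binary may be installed as exim4):

    exim -bpc                    # number of messages currently in the queue
    exim -bp | grep -c frozen    # how many of those are frozen (stuck/undeliverable)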
You can muck around with getting disk averages and network I/O information, but if you're not seeing delivery issues ("I sent the message fifteen minutes ago and it hasn't arrived yet!") and you can work on the system via console (or SSH) without a lot of latency, you should be fine.
As always with tuning related questions, there are no yes/no answers, it all depends :-)
Having said that, a load of 1.3 doesn't sound high, especially if you have a multi-core CPU configuration. If the load number is the same as the number of cores, then all the cores always have a process ready to run.
Ultimately, if, as you say, the messages are being delivered in a timely fashion then the performance is fine :-)
will give you basic metrics in near enough real time.
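One tool that fits that description is vmstat, which ships with procps on most Linux systems:

    vmstat 5    # CPU, memory, swap and block-I/O counters, refreshed every 5 seconds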
A load average less than the number of CPUs you have means there are CPUs sitting around with nothing to do. Equal means they're all working at the moment. Greater means there are processes that could be running, but are stuck in line waiting.
For super time-sensitive stuff like a VoIP server or memcache, you want your load average to be well under the number of cores. For asynchronous stuff that can live with the occasional backlog (like email), you could easily run at 4x the number of cores.
The biggest caveat to remember is that processes that are waiting for disk or network I/O but are otherwise runnable still show up in the load average. So if you've got an Apache server spoon-feeding JPEGs to 56k users, you can run a much higher load average than if it's firing back PHP (or whatever) script responses to a proxy/load balancer over a gigabit LAN. In your case, an SMTP connection to some slow mail server that's taking forever to transfer an attachment will show up as one process on the run queue, but your server could still squeeze in twenty quick one-liner emails to Gmail in the meantime without issue.
When push comes to shove, load average is like the Dow: it doesn't actually measure the "economy" in any real sense; people just use it as a very loosely correlated metric because it's easy to talk about. Focus on monitoring metrics you actually care about, like delivery queue depth and messages per second.
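With Exim you can pull those kinds of numbers straight out of the main log; a rough sketch (the log path below is Debian's default, so adjust it for your distro):

    # delivery ('=>') lines per hour of day, counted across the whole log
    grep ' => ' /var/log/exim4/mainlog | awk '{ print substr($2, 1, 2) ":00" }' | sort | uniq -c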
How many cores do you have? cat /proc/cpuinfo | grep processor | wc -l
(caveat: hyperthreading looks like more cores, but it isn't)
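If you want to see how much of that count is hyperthreading rather than physical cores, lscpu (part of util-linux on most modern distros) breaks it down:

    lscpu | grep -E '^CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'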
If your load level is under your processor count, then you're generally OK.
Also take a look at top and hit '1' and you can watch each CPU's individual load.
Yes, that's pretty acceptable, and generally something to be expected with a mail filter.
Our setup is a bit different. We have a separate server for SpamAssassin, while our POP server runs ClamAV to scan for viruses. The POP server generally runs under a load of 2, but occasionally spikes to 10 or more. Our SpamAssassin server, on the other hand, used to run at around 2 until we also installed the Openprotect.com filters, which doubled the CPU usage; it now runs under about 5, with spikes above 15. This is still acceptable because we don't have any delays in mail that result in a growing mail queue (we use qmail for incoming SMTP), and there's still CPU and memory to spare.
Incidentally, I highly recommend Munin for monitoring your servers. It does a great job of visualizing historical data and showing you what resources you have to spare. Monitoring in real time with top(1) doesn't help you much. :)
Oh, and by the way, deploying on the Friday before the long weekend is a great way to work through the whole weekend. Especially for critical systems like a mail server.
How's the memory consumption? Is it stable or growing?
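A quick way to keep an eye on that (free and watch are standard on practically every distro):

    watch -n 60 free -m    # memory and swap usage in MB, refreshed every minute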
The load doesn't seem out of the norm. If the mail server is responsive and the mail is going through, I'd say the only measure of failure beyond memory consumption would be whether the wrong emails are getting through (spam).
Mind you, today would be your first real test. I'd probably monitor it lightly today; if something is going to go wrong, now would be the time.