Is there any kind of best practice when it comes to verifying the hardware of a new server, before putting it into production?
For instance, running it through the manufacturer's hardware test, or running memtest on it for x amount of hours?
-Josh
I like to run through the quick memtest tests, but it doesn't actually generate a lot of load, so it's more of a verification that nothing is horribly wrong than it is a system burn-in.
Then I install and run kcbench -a -r -n -n 50. This runs a kernel compile in a loop (using all CPUs), which approximates a lot of our real load, and kcbench is available in Fedora and EPEL, so it's within easy reach. As a bonus, I get a simple benchmark number that gives me an idea of the new hardware's performance. Afterward, I check dmesg for errors.
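A minimal sketch of that burn-in pass, assuming a Fedora/EPEL-style box (the package manager command and dmesg options may differ on your distro):

sudo dnf install kcbench        # packaged in Fedora and EPEL
kcbench -a -r -n -n 50          # kernel-compile loop using all CPUs
dmesg --level=err,warn          # afterward, look for hardware complaints

The nice thing about using a compile loop rather than a synthetic stress tool is that the load pattern (CPU, memory, and disk I/O together) is closer to what the box will actually do in service.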
The phrase you're looking for is "burn in". I typically boot the UBCD (Ultimate Boot CD), run memtest for a day, and perform the extended drive test from whichever manufacturer made the hard drives. I have not had enough problems with new processors to convince me to test them as well.
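If the vendor's bootable diagnostic isn't handy, a SMART extended self-test via smartmontools is a reasonable stand-in; a sketch assuming the drive shows up as /dev/sda (repeat for each drive):

sudo smartctl -t long /dev/sda   # start the extended (long) self-test
sudo smartctl -c /dev/sda        # shows the estimated completion time
# ...wait for the test to finish, then:
sudo smartctl -a /dev/sda        # review the self-test log and SMART attributes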
For a lot more information, check out this community wiki: Testing a server before installing an OS
Personally, I've never done any of this on a production box. If I get a however-many-thousand-dollar box from HP or Sun or whoever, I expect them to ship me a working unit. All of the early-life failures I've had have been in the first day or so of operation, so why waste time doing burn-in when you can just spot the problem while you install the OS or configure the machine?
Then again, all of our machines are automatically configured by Puppet, so if something dies right before it goes into production, we just rack up another machine and press the "go" button again...