In the IT world, I just won the lottery twice...
Today we had a hard drive fail in a RAID array. A few hours later, another drive failed on a different server. We immediately started checking all the environmental logs and systems: humidity is 40%, temperature is 75°, and there is no dust or other particulates flying around. We checked the UPS logs and no spikes were reported. About 3 hours later, another hard drive failed on a third system.
To recap: three HP DL380 G7s, all with sequential serial numbers. The drives are not from the same lot, though I bet the array controllers and boards are. HP will be out in the morning. In the meantime, we are hoping this does not become a habit. We have had 1 drive fail in this entire server rack in 2.5 years. Today, 3 within 12 hours!
What else should we be looking for? Has anyone else had a similar problem?
Any help is greatly appreciated. This incident has consumed our spares. If another drive fails, we will be looking to HP to swap it.
Update: These are 146 GB 10k rpm SAS drives and one 300 GB 10k rpm SAS drive. HP original equipment.
These things happen... You'd be surprised what I've seen with the same equipment at scale.
You did right by checking your environment for ESD, temperature and power issues.
Since these are ProLiant DL380 G7 units, the array controllers are embedded on the system board. Lot numbers aren't controlled too tightly there. I don't think this is anything beyond coincidence. However, this may be a good time for some firmware updates, as false drive failures are sometimes symptomatic of bad firmware revisions.
Since you have support, let HP deal with the parts/replacement and move on :)
BTW, it would be helpful to detail the capacities and types of the drives involved (SAS, SATA, Nearline SAS).