We are working on a project that involves different hardware all hosted in a single rack. The machines are mainly IBM servers: 2 x206 (SCSI), 1 x226 (SCSI), 2 x3400 (SATA) and another assembled machine with SATA controllers. We are using several RAID controllers. Some machines have a single ServeRAID controller; others have one or more controllers, not always Adaptec ones. All firmware and BIOS versions are up to date. All the servers and connected devices are on a UPS.
Over the last four months we have experienced several strange behaviours in our hardware. Suddenly and randomly we lose 2 or 3 drives and the RAID volumes stop working. It can happen about once a week, but never at the same time of day or on the same day of the week.
Most of the time a rebuild fixes the problem; sometimes we lose the data. Very often we just need to unplug the RAID controllers, restart the server, and the problem is fixed.
At first we thought it was due to firmware bugs, but we have carefully updated every machine and RAID controller, and there is nothing more we can do on the hardware side. We really have no clue what is causing all this trouble.
We are starting to think it is an environmental problem, but we don't know what could be interfering with our hardware. Have you ever heard of anything like this? Any ideas on how to investigate the problem?
This can easily be due to firmware bugs, not on the controllers but on the drives themselves. I've seen that far too often to count.
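One quick way to rule that in or out is to check whether all your drives are even on the same firmware level. Here is a minimal sketch that pulls model and firmware revision from SMART data; it assumes Linux with smartmontools installed and drives visible as /dev/sd? (drives hidden behind a hardware RAID controller often are not, and need a vendor-specific `-d` option or the controller's own CLI instead):

```python
#!/usr/bin/env python3
# Sketch: list model and firmware revision for every visible drive.
# Assumes Linux + smartmontools; drives behind a hardware RAID controller
# may not appear as /dev/sd? and need the controller's own tools instead.
import glob
import re
import subprocess

for dev in sorted(glob.glob("/dev/sd?")):
    out = subprocess.run(["smartctl", "-i", dev],
                         capture_output=True, text=True).stdout
    # "Device Model"/"Firmware Version" are ATA fields; "Product"/"Revision"
    # are the SCSI equivalents.
    model = re.search(r"(?:Device Model|Product):\s*(.+)", out)
    fw = re.search(r"(?:Firmware Version|Revision):\s*(.+)", out)
    print(dev,
          model.group(1).strip() if model else "unknown model",
          fw.group(1).strip() if fw else "unknown firmware")
```

If the same drive model shows up with several different firmware revisions across the machines that drop out, that is worth chasing before anything environmental.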
If I had drives from different vendors on RAID controllers from different vendors in servers from different vendors failing at an abnormal rate, I'd start looking at high temperatures and poor airflow in the server room as a potential cause of the problem.
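If you want data rather than a guess, log drive temperatures over a week or two and see whether the dropouts line up with heat spikes. A rough sketch below, assuming Linux, smartmontools, and /dev/sd? device names; the log path and interval are placeholders, and drives behind hardware RAID may again need a `-d` option or the controller CLI:

```python
#!/usr/bin/env python3
# Sketch: append a timestamped drive-temperature sample to a log file
# every 5 minutes. Handles ATA-style SMART output and the SCSI
# "Current Drive Temperature" line; anything it can't parse shows as "?".
import glob
import re
import subprocess
import time

LOGFILE = "/var/log/drive-temps.log"   # placeholder path, pick your own
INTERVAL = 300                          # seconds between samples

def temperature(dev):
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    # ATA: raw value of attribute 194 (Temperature_Celsius)
    m = re.search(r"Temperature_Celsius(?:\s+\S+){7}\s+(\d+)", out)
    if not m:
        # SCSI drives report it differently
        m = re.search(r"Current Drive Temperature:\s*(\d+)", out)
    return m.group(1) if m else "?"

while True:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    readings = [f"{dev}={temperature(dev)}C"
                for dev in sorted(glob.glob("/dev/sd?"))]
    with open(LOGFILE, "a") as log:
        log.write(f"{stamp} {' '.join(readings)}\n")
    time.sleep(INTERVAL)
```

Cross-reference the log against the times the arrays drop drives; if the failures cluster around the hottest readings, the server room environment is your prime suspect.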