I have a drive, part of a RAID 1 mirror, that has two bad blocks. Adaptec Storage Manager e-mailed me when it detected the blocks. It shows 4 medium errors for that drive, but the state is still “optimal”.
This is my first time using Adaptec RAID controllers. I don’t know if an occasional bad block is normal, or if I should immediately replace that drive.
Update: The drive failed later the same day!
The disk subsystem is:
- Adaptec 6405 with ZMM
- (2) Seagate near-line SAS drives (ST31000424SS)
The other drive hasn’t reported any bad blocks yet. I am running a consistency check.
When drives are used in an array, the controller will set Time Limited Error Recovery (TLER). This causes the disks to report medium errors if they can't read the data immediately. It doesn't mean they won't recover from the read error, or that the sector is completely unreadable.
(Cheap SATA drives do not support TLER; instead, the read operation hangs while the drive tries to recover the data. That's just one of many reasons cheaper SATA drives shouldn't be used in arrays, though it doesn't apply to this particular question.)
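If you want to check whether a given SATA drive supports TLER (the ATA spec calls it SCT Error Recovery Control), smartctl can query it. A minimal sketch, assuming Linux with smartmontools installed; the device name /dev/sda is a placeholder for your drive:

```python
# Minimal sketch: query SCT Error Recovery Control (the ATA name for TLER).
# Assumes Linux with smartmontools installed; /dev/sda is a placeholder.
import subprocess

result = subprocess.run(
    ["smartctl", "-l", "scterc", "/dev/sda"],
    capture_output=True, text=True,
)
print(result.stdout)
# A TLER-capable drive prints its read/write recovery timeouts;
# a drive without support reports that the SCT Error Recovery Control
# command is not supported.
```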
If the disk determines that a sector is unreadable, it will remap the sector. The original bad sector is not reported up the chain, so software running on the OS has no way of knowing. The only thing you can do is look up the SMART report and see whether, and how many, sectors have been remapped. Many remapped sectors are a good indication of bad things to come. SMART may also report how many times the disk has experienced a soft error vs. a hard error.
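As an illustration, here is a minimal sketch of pulling those counters out of smartctl output. It assumes Linux with smartmontools, and /dev/sdb is a placeholder; SATA drives expose attribute 5 (Reallocated_Sector_Ct), while SAS drives such as the ST31000424SS report "Elements in grown defect list" instead. (For disks behind a RAID controller, smartctl's -d pass-through options may be needed to reach the physical drives.)

```python
# Minimal sketch: extract remapped-sector counts from smartctl output.
# Assumes Linux with smartmontools; /dev/sdb is a placeholder device name.
import subprocess

out = subprocess.run(
    ["smartctl", "-a", "/dev/sdb"],
    capture_output=True, text=True,
).stdout

for line in out.splitlines():
    if "Reallocated_Sector_Ct" in line:            # SATA attribute table
        print("Remapped sectors:", line.split()[-1])
    elif "Elements in grown defect list" in line:  # SAS log page
        print("Grown defects:", line.split(":")[-1].strip())
```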
In any case, SMART pre-failure prediction has been less than helpful; Google's large-scale study of disk failures backs that up.
Large drives have lots of spare space for remapping bad sectors. I've seen hundreds of sectors remapped over the course of two weeks, and the drive then kept going for another month (it was RAID 6, so we didn't rush).
If it keeps alerting you each day with a few more remapped sectors, I'd replace the drive before it fails. One burst of bad sectors when you first use the drive isn't scary at all, but a continuing condition usually means particulates in the enclosure or a damaged read/write head.
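Since the continuing condition is what matters, a trivial daily comparison of the remap count against the previous day's is enough to catch it. This is a sketch only; the state-file path and alert wording are arbitrary choices, and the count would come from the smartctl parsing sketched earlier:

```python
# Minimal sketch: warn when the remap count keeps growing day over day.
# The state-file location is an arbitrary choice for illustration.
import pathlib

STATE = pathlib.Path("/var/tmp/remap_count")

def check(current: int) -> None:
    previous = int(STATE.read_text()) if STATE.exists() else current
    STATE.write_text(str(current))
    if current > previous:
        # New remaps since the last run: the "continuing condition" above.
        print(f"WARNING: remapped sectors grew from {previous} to {current}; "
              "replace the drive before it fails.")

check(current=42)  # wire this up to the smartctl output parsed earlier
```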
I have not used SAS drives, but I have had regular SCSI and IDE drives develop a few bad blocks and then work for years without any other problems. The S.M.A.R.T. status should tell you when a drive is declining and at risk of failure.
Also, as long as you are using a RAID level other than RAID 0, you are protected against a single drive failure.
I don’t usually answer my own question, but in this case I have a definitive answer: replace the drive ASAP. The drive in question failed later the same day.
In the early AM hours I had received three e-mails that looked like the following. That's how I knew the drive had bad blocks; those e-mails were the only warning:
By the end of the day, it had failed.