Question

Soviero

Asked: 2012-07-12 21:07:29 +0800 CST

What do these disk errors in syslog mean?

772

I just rebooted my monitoring server for the first time in a while, and the following started filling the screen:

Jul 11 23:52:30 monit kernel: [   25.255908] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 11 23:52:30 monit kernel: [   25.256170] ata1.00: BMDMA stat 0x24
Jul 11 23:52:30 monit kernel: [   25.256278] ata1.00: failed command: READ DMA
Jul 11 23:52:30 monit kernel: [   25.256410] ata1.00: cmd c8/00:c0:20:68:35/00:00:00:00:00/e0 tag 0 dma 98304 in
Jul 11 23:52:30 monit kernel: [   25.256416]          res 51/40:9f:41:68:35/00:00:00:00:00/e0 Emask 0x9 (media error)
Jul 11 23:52:30 monit kernel: [   25.256809] ata1.00: status: { DRDY ERR }
Jul 11 23:52:30 monit kernel: [   25.256933] ata1.00: error: { UNC }
Jul 11 23:52:30 monit kernel: [   25.304388] ata1.00: configured for UDMA/66
Jul 11 23:52:30 monit kernel: [   25.304430] ata1: EH complete

. . . 

Jul 11 23:52:30 monit kernel: [   25.552451] sd 0:0:0:0: [sda] Unhandled sense code
Jul 11 23:52:30 monit kernel: [   25.552462] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 11 23:52:30 monit kernel: [   25.552475] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Jul 11 23:52:30 monit kernel: [   25.552490] Descriptor sense data with sense descriptors (in hex):
Jul 11 23:52:30 monit kernel: [   25.552498]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Jul 11 23:52:30 monit kernel: [   25.552529]         00 35 68 41 
Jul 11 23:52:30 monit kernel: [   25.552543] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Jul 11 23:52:30 monit kernel: [   25.552559] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 35 68 20 00 00 c0 00
Jul 11 23:52:30 monit kernel: [   25.552587] end_request: I/O error, dev sda, sector 3500097
Jul 11 23:52:30 monit kernel: [   25.556607] ata1: EH complete

I already know I need to replace the HDD (Cost of Data > Cost of HDD), but I want to know for my own knowledge what's actually wrong with it.

Yes, our monitoring server has no RAID, just one HDD... Don't look at me...

5 Answers

mgorven · Answer 1 · 2012-07-12T21:12:18+08:00

Best Answer

sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed

Looks like the drive has bad sectors and is unable to reallocate these (possibly because it's run out of spare sectors). The output of smartctl -a /dev/sda would give you more information on the state of the drive.
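The hex dumps in the question's log can be cross-checked against the human-readable lines. The sketch below decodes the descriptor-format sense data and the Read(10) CDB copied from that log. It assumes the first sense descriptor is an information descriptor (type 0x00), which is what the logged bytes show; the function names are made up for illustration.

```python
# Bytes copied verbatim from the syslog excerpt in the question.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 00 35 68 41"
CDB_HEX = "28 00 00 35 68 20 00 00 c0 00"


def decode_descriptor_sense(hex_bytes):
    """Decode descriptor-format SCSI sense data (response code 0x72)."""
    b = bytes.fromhex(hex_bytes)
    if b[0] & 0x7F != 0x72:
        raise ValueError("not current descriptor-format sense data")
    if b[8] != 0x00:
        # Assumption: only the information descriptor is handled here.
        raise ValueError("first descriptor is not an information descriptor")
    return {
        "sense_key": b[1] & 0x0F,                  # 0x3 = Medium Error
        "asc": b[2],                               # additional sense code
        "ascq": b[3],                              # additional sense code qualifier
        "lba": int.from_bytes(b[12:20], "big"),    # 8-byte information field
    }


def decode_read10(hex_bytes):
    """Decode the start LBA and block count of a Read(10) CDB."""
    b = bytes.fromhex(hex_bytes)
    if b[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    return {
        "lba": int.from_bytes(b[2:6], "big"),
        "blocks": int.from_bytes(b[7:9], "big"),
    }


sense = decode_descriptor_sense(SENSE_HEX)
read = decode_read10(CDB_HEX)
print(sense)
print(read)
```

The decoded sense LBA is 3500097, the same sector the kernel reports in "end_request: I/O error, dev sda, sector 3500097". The request started at LBA 3500064 and covered 192 blocks, which at 512 bytes per block is the 98304 bytes shown on the "dma 98304 in" line. ASC/ASCQ 0x11/0x04 is the code behind the "Unrecovered read error - auto reallocate failed" text.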

17

womble · Answer 2 · 2012-07-12T21:15:18+08:00

Lassie's saying "arf! arf arf! arf!". Which is dumb, because this has nothing to do with Timmy or wells. This is why you don't take sysadmin advice from dogs.

The drive is giving you an "Unrecovered read error - auto reallocate failed", which basically means "I tried to read, I failed, I tried to recover (read the sector a few more times, apply some ECC, and move the data to a sector that isn't broken), and it didn't work". This probably means (as mgorven says) that the disk is chock full of reallocated sectors already, because the disk's been dying for a while, but I also think it can mean that it wasn't able to recover the sector at all (repeated reads + ECC failed to get a good-looking data block).
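The same failure shows up one layer down, in the "res 51/40:9f:41:68:35/00:00:00:00:00/e0" line of the question's log. As a rough sketch, the result taskfile can be decoded like this. It assumes a 28-bit command (READ DMA, as logged), so the middle group of bytes is ignored, and it names only the status and error bits that matter here; the function name is made up for illustration.

```python
# Subset of ATA status and error register bits (not exhaustive).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_res(res):
    """Decode a libata 'res' string: status/error:nsect:lbal:lbam:lbah/.../device."""
    status_hex, low, _hob, device_hex = res.split("/")
    error, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    status = int(status_hex, 16)
    device = int(device_hex, 16)
    # 28-bit LBA: low nibble of the device register holds bits 24-27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "status": [name for bit, name in STATUS_BITS.items() if status & bit],
        "error": [name for bit, name in ERROR_BITS.items() if error & bit],
        "lba": lba,
    }


# String copied verbatim from the syslog excerpt in the question.
result = decode_res("51/40:9f:41:68:35/00:00:00:00:00/e0")
print(result)
```

This matches what the kernel printed: status { DRDY ERR }, error { UNC } (uncorrectable data), at the same sector 3500097 that the SCSI layer reports.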

Either way, yeah, the drive's very, very cactus. Your data isn't looking real healthy, either.

12

Wolfgang Noichl · Answer 3 · 2014-04-11T11:08:11+08:00

I know this is old, but just in case someone is still reading this post: "DD will also try to read the broken sector(s)" - gddrescue is useful here. It doesn't (okay, it does, but only once).
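To illustrate the general idea of an error-tolerant copy (this is not how gddrescue is implemented, just a minimal sketch), the loop below tries each chunk once, zero-fills anything unreadable, and records where the gaps are. `tolerant_copy` and `read_at` are hypothetical names, and `read_at(offset, length)` is assumed to either return exactly `length` bytes or raise OSError.

```python
def tolerant_copy(read_at, size, out, chunk=64 * 1024):
    """Copy `size` bytes using read_at(offset, length), skipping bad chunks.

    Each chunk is attempted exactly once. Unreadable chunks are replaced
    with zeros so offsets in the image stay aligned with the source.
    Returns a list of (offset, length) ranges that could not be read.
    """
    bad_ranges = []
    offset = 0
    while offset < size:
        length = min(chunk, size - offset)
        try:
            data = read_at(offset, length)
        except OSError:
            # Media error: note the gap and keep going instead of aborting.
            bad_ranges.append((offset, length))
            data = b"\x00" * length
        out.write(data)
        offset += length
    return bad_ranges


# Possible use against a real device (paths and size are placeholders):
#   fd = os.open("/dev/sda", os.O_RDONLY)
#   with open("sda.img", "wb") as img:
#       gaps = tolerant_copy(lambda off, n: os.pread(fd, n, off), size, img)
```

A real rescue tool does considerably more, such as keeping a map of progress between runs and retrying damaged areas with smaller reads, so use one for actual recovery.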

3

rackandboneman · Answer 4 · 2012-07-12T23:25:48+08:00

Make a dd image or rsync copy of that disk now++, unless you have a full backup allowing a convenient restore of that box. And start looking for a compatible and working replacement disk.

BTW, UDMA/66: is that a ten-year-old PATA disk?

1

Pierz · Answer 5 · 2017-10-26T11:44:16+08:00

As already mentioned, it likely means your drive is nearing its end of life, but not necessarily immediately. You should run an fsck on the disk and try to repair the errors (see the smartmontools wiki for advice on fixing bad blocks); the disk may be OK for a while longer.

You should also start running smartd (which comes as part of the smartmontools package) and keep an eye on its reports and/or set up email notifications. You can add custom notifications of your own by creating scripts (in /etc/smartmontools/run.d/) that are called by the smartd-runner.

0