I have a 500G SATA hard drive in my machine that all of a sudden started giving me I/O errors, until Linux simply disconnects the drive. After a reboot it works for a random period before failing again.
The drive is within warranty, but I've had bad experiences with shops that are unable to reproduce a problem because the drive doesn't fail all the time. Then they simply send me a bill and the drive back.
What is my best course of action to make sure they can reproduce the problem?
Update: To those of you who have recommended the diagnostic tools: it's a good, valid answer, except that, as stated in my question, I'm running Linux and these tools do not exist for it. As for 'gaming' the store, it's not about that. The drive is well on its way to being completely unusable without any help from me. I'm just talking about speeding up the process.
Update 2: I don't really know why I decided to ask this here. I was hoping for suggestions like 'do a bad sector test' or 'try to stress the drive by copying random data to it with dd'. I will say this again, so stop suggesting it or telling me not to: I will not in any way void my warranty by messing with the hardware itself. That includes bulk erasers, huge magnets, too much power, or anything else that will show up when the drive is eventually sent back to the manufacturer.
Does the drive manufacturer have a utility to check the drive?
Often they will provide a utility that you can boot with that will run some diagnostics - this should probably be your first step. Check their website and download it if available.
I think the best thing to do is to call them and discuss the situation - any form of 'gaming' them is likely to be pointless and very possibly counterproductive. These people are used to dealing with a range of customers' problems and I would imagine they'll be happy to help you if you ask.
I would recommend using SpinRite as well as the manufacturer's tools. I have previously used it to recover data on a dead drive. The great thing about SpinRite is that it can detect the rate of errors (errs per MB).
Usually when RMA'ing a drive, they make you include a status code of some kind from their diagnostic tools.
I'm not known for my huge amount of patience, so I'll just answer this myself. Maybe this will help someone later.
Badblock check
Write stress test
Read stress test
SATA disconnects
I have a USB-to-(S)ATA adapter that is capable of resetting the USB device if the disk stops responding at any point. This serves as a workaround when Linux disconnects the drive after too many I/O errors.
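For anyone who wants something scriptable, here is a rough sketch of a combined write/read stress pass. It is not the exact set of commands from the steps above, just one possible way to do it under some assumptions: the function name `stress_file` and its parameters are made up for illustration, and it works on a scratch file on the suspect drive's mounted filesystem, so it does not overwrite existing data. Note that the read pass may be served from the page cache unless the file is larger than RAM, so use a large `total_bytes` if you want the drive itself to do the work.

```python
import hashlib
import os
import time


def stress_file(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write random data to `path`, read it back, and log any I/O errors.

    `stress_file` is a hypothetical helper, not part of any existing tool.
    Returns a list of timestamped log lines.
    """
    log = []
    written_hash = hashlib.sha256()
    written = 0

    # Write pass: random data, followed by fsync so the data is pushed
    # to the device rather than left sitting in the page cache.
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                chunk = os.urandom(min(chunk_bytes, total_bytes - written))
                f.write(chunk)
                written_hash.update(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())
    except OSError as e:
        log.append(f"{time.ctime()} write error near offset {written}: {e}")
        return log

    # Read pass: read everything back and compare checksums.
    # (Assumption: the file is big enough that reads actually hit the disk.)
    read_hash = hashlib.sha256()
    offset = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_bytes)
                if not chunk:
                    break
                read_hash.update(chunk)
                offset += len(chunk)
    except OSError as e:
        log.append(f"{time.ctime()} read error near offset {offset}: {e}")
        return log

    if offset != written or read_hash.digest() != written_hash.digest():
        log.append(f"{time.ctime()} data mismatch after {offset} bytes read")
    else:
        log.append(f"{time.ctime()} ok: {written} bytes written and verified")
    return log
```

Calling it in a loop against a file on the failing drive and appending the returned lines to a text file gives you a timestamped record of when the errors start, which you can send along with the drive.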
I would also recommend ensuring the controller on the bottom side of the drive isn't getting too hot. This sounds like a heat issue to me.
If you're able to eliminate heat as a cause, then I would call the manufacturer. I've never had a problem when talking with the manufacturer and getting an RMA first. When sending it in, I would also recommend including a detailed description of exactly what you've seen.
Give it a good workout: http://www.textuality.com/bonnie/
Few days of that should show if it really is about to kark it.
Bonnie is in most distros' repositories, IIRC.
The best way to finish off a dying hard drive?
If you've got a rubber mallet, whack it with that - it'll break something internally, but not leave any marks.
Time tested solution - but only if it's under warranty!
Drive manufacturers usually provide diagnostic utilities that you can run before sending in the drive. Once you get I/O errors out of their utility, you can include the log and they'll be less likely to contest your problem.
I strongly recommend NOT trying to fool them with the "tricks" one may have heard about (high voltage, microwave oven, bulk tape eraser). They deal with such things much more often than you or I do.
Possibly they aren't really testing the drive thoroughly.
Give them your documentation of the problem. If that doesn't satisfy the terms of your agreement with them, you have a more fundamental problem.
Based on your description, your problem could be an interaction between the controller and the drive. For example, your controller could be bad at handling a marginal drive. Or you could have a bad controller.
Ideally your agreement with the vendor would specify whether it is expected/guaranteed to work with your controller - or it would involve them taking responsibility for the controller (and driver) as well.
I have seen a lot of SATA drives that misbehave in the manner you describe - sometimes as part of the normal course of business, sometimes while in the process of failing. Sometimes it is admitted to be a firmware bug. 500GB drives were especially bad in my experience.
You will help your case significantly by repeating the problem with a different controller, since odds are there is no promise for the drive to work with any particular controller, or you would not be having this problem.