I have had a pair of WD VelociRaptors (5-year warranty) hardware-striped in RAID 0 on an Intel ICH8R motherboard controller for about 1.5 years.
The other day, the volume failed at random, during no specific activity, and the RAID BIOS indicated that one of the drives had failed.
I did extensive diagnostics with Spinrite and WD Diag on each drive and they picked up NO surface issues, no sector errors, and no SMART warnings.
I then recreated the volume with the same drives, restored from backup, and have been up and running fine for 2 weeks now with no issues.
What happened?
Are my drives okay? Can there be something unhealthy with one of my drives that the diags are not picking up?
You ran into the worst problem with stripe-only arrays. RAID 0 is completely unforgiving of any I/O interruption. If any drive hiccups, you will need to rebuild the array from scratch. This is why I almost always use RAID level 1 or higher.
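To illustrate why a single drive dropping out takes the whole volume with it, here is a toy sketch of two-drive striping in Python. It is only a model of the idea, not how the ICH8R firmware works; the 4-byte stripe size and the function names are made up for the example (real arrays use stripes in the tens of kilobytes or more).

```python
# Toy model of RAID 0 striping; names and sizes are illustrative assumptions.
STRIPE_SIZE = 4  # bytes per stripe, deliberately tiny for readability


def stripe(data, n_drives=2, stripe_size=STRIPE_SIZE):
    """Split data into stripes and deal them out round-robin to the drives."""
    drives = [bytearray() for _ in range(n_drives)]
    for start in range(0, len(data), stripe_size):
        chunk_index = start // stripe_size
        drives[chunk_index % n_drives] += data[start:start + stripe_size]
    return drives


def read_back(drives, length, stripe_size=STRIPE_SIZE):
    """Reassemble the data; a missing drive (None) makes the read fail."""
    n_drives = len(drives)
    n_chunks = (length + stripe_size - 1) // stripe_size
    out = bytearray()
    for chunk_index in range(n_chunks):
        drive = drives[chunk_index % n_drives]
        if drive is None:
            raise IOError(f"chunk {chunk_index} lives on a missing drive")
        offset = (chunk_index // n_drives) * stripe_size
        out += drive[offset:offset + stripe_size]
    return bytes(out[:length])


data = b"ABCDEFGHIJKLMNOP"
drives = stripe(data)
restored = read_back(drives, len(data))  # works while both drives answer
```

Every other stripe lives on the second drive, so if the controller marks that drive as failed (set `drives[1] = None` in the sketch), any read longer than one stripe fails. There is no second copy or parity to fall back on.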
Many things can cause a drive to have temporary I/O issues: power fluctuations, heat, vibration, and dirty connections are just a few. Dust in the system can build up, restricting airflow and trapping heat. Dust can also work its way into connections.
You may want to clean the inside of your machine to remove the dust and gunk that builds up, and re-seat all of the drive connections. Measure the internal temperature, not just on the system board but near or between the drives. Add airflow if the temperature seems too warm. This should rule out heat and dirty connections as causes.
Power problems are a different beast altogether. If you have adequate power and filtering, it shouldn't be a problem. If you are hanging the machine off mains power without any sort of line conditioning or UPS, you are just asking for trouble.
Occasionally I have seen RAID controllers dump otherwise healthy drives simply because they did not respond to the controller's request in a reasonable amount of time.
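As a rough sketch of that failure mode: the controller only sees whether a drive answered within its deadline, not whether the drive is actually healthy. The timeout and response times below are made-up illustrative numbers, not figures from Intel or WD, and the function names are hypothetical.

```python
# Toy model: a controller drops any drive that misses its response deadline.
# All numbers here are illustrative assumptions, not vendor figures.

def dropped_drives(response_times_s, timeout_s):
    """Return the indices of drives that took longer than timeout_s to answer."""
    return [i for i, t in enumerate(response_times_s) if t > timeout_s]


def raid0_online(response_times_s, timeout_s):
    """A RAID 0 array stays online only if no member drive was dropped."""
    return not dropped_drives(response_times_s, timeout_s)


TIMEOUT_S = 8.0          # hypothetical controller deadline
normal = [0.02, 0.03]    # both drives answer quickly
stalled = [0.02, 12.0]   # second drive stalls once, e.g. during a retry

print(raid0_online(normal, TIMEOUT_S))
print(raid0_online(stalled, TIMEOUT_S), dropped_drives(stalled, TIMEOUT_S))
```

In this model a single slow response is enough to take the array offline, even though the same drive would pass a surface scan afterwards.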
Are your SATA cables seated tightly and not obstructed in any way? Reseat them and check the cables and connectors for anything bent, damaged, or crimped.
Are you running the latest BIOS?
Are you running the latest drivers (in Windows)?
I believe older versions of the drivers on that specific chipset had some issue related to RAID, though I can't find the specifics.
You may also want to try using ports 3-5 (see Intel's documentation). If all else fails, consider a 3ware RAID controller.
That's a pretty impressive amount of troubleshooting, I have to admit. After all that, I'd be astonished if anything were wrong with the hard drives. But after reading your post a little further, I think I found the problem.
Now, since you're going for speed rather than data redundancy, I can see why using the on-board controller seems appealing, but in reality nearly all on-board RAID controllers (especially on consumer-grade motherboards) are crap. Highpoint, Intel, nVidia... all crap.
Rik's point about power is actually a good one. Power fluctuations can have an adverse effect on computers in general, and on hard drives in particular. It might be easier and cheaper to put your computer on a UPS (uninterruptible power supply) to deal with the power issue.
Since you run RAID 0, I'd say there's always a risk of something going wrong. Good thing you have a backup image elsewhere. I'd have to say, though, that I doubt anything is wrong with your drives. Running Spinrite and WD Diag and checking the SMART info is pretty thorough. In all likelihood, I'd blame the on-board controller. I've run software RAID, on-board controller RAID (both years ago) and now hardware RAID, and I can say without a doubt that software and on-board RAID were ultimately a complete waste of my time. I can't speak specifically to RAID 0, but if I had to guess what the issue was, I'd look to the controller.
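To put a rough number on that RAID 0 risk, here is a back-of-the-envelope sketch. It assumes drives fail independently and uses a hypothetical 3% annual failure rate per drive purely for illustration; it is not a measured figure for VelociRaptors, and it ignores controller-side dropouts entirely.

```python
# Back-of-the-envelope model: a RAID 0 array is lost if ANY member drive fails.
# Assumes independent failures; the 3% rate below is a hypothetical example.

def raid0_annual_failure(p_drive, n_drives):
    """Probability that at least one of n_drives fails within the year."""
    return 1.0 - (1.0 - p_drive) ** n_drives


single = raid0_annual_failure(0.03, 1)  # one drive on its own
pair = raid0_annual_failure(0.03, 2)    # two-drive stripe

print(f"single drive: {single:.2%}, two-drive RAID 0: {pair:.2%}")
```

Under those assumptions the two-drive stripe is lost in about 5.9% of years versus 3% for a single drive, which is why the backup matters more than the diagnostics.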
If money is not an issue, I'd say get a hardware RAID controller in addition to a UPS. Two-port RAID controllers aren't too expensive. Ironically enough, I never run RAID 0, so I can't attest to how a better RAID controller (from 3ware, Areca, LSI, Adaptec, etc.) would do, but I'm fairly confident that a PCIe RAID controller from one of the manufacturers I listed would be less likely to randomly corrupt your striped array.