We have an LTO-3 tape drive in a Dell media library that we use for our backups. The Wikipedia article on LTO states:
LTO uses an automatic verify-after-write technology to immediately check the data as it is being written, but some backup systems explicitly perform a completely separate tape reading operation to verify the tape was written correctly. This separate verify operation doubles the number of end-to-end passes for each scheduled backup, and reduces the tape life by half.
What I would like to know is, do I need my backup software (Backup Exec in this case) to perform a verify on these tapes or is the verify-after-write technology inherent in LTO drives sufficient?
I would also like to know whether Backup Exec understands the verify-after-write technology well enough to alert me when the drive cannot verify the data. If it just ignores the failure, the feature is useless to me, since I would never know that the drive had detected a problem.
Great question!
Yes, you should test them. Testing the tapes and drives themselves is important, but what is much more vital is testing the end-to-end restoration process.
I can't recommend regular full system restorations and service testing enough; it's the only way to know for sure that the entire system is doing what you bought it for. You don't have to look far on this site to see people who struggle to restore their service even though they thought they'd covered all the steps individually.
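As a rough illustration of what one automated check in a test restore could look like, here is a minimal Python sketch that compares checksums of the original files against a restored copy. It is not part of Backup Exec or any backup product; the function names (`file_digest`, `tree_digests`, `compare_restore`) are made up for this example, and it assumes you have restored to a scratch directory and that the source data has not changed since the backup (a snapshot or a static test data set works best).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source_root, restored_root):
    """Return (missing, extra, mismatched) relative paths after a test restore.

    missing    = in the source but not in the restore
    extra      = in the restore but not in the source
    mismatched = present in both, but with different contents
    """
    src = tree_digests(source_root)
    dst = tree_digests(restored_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    mismatched = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, extra, mismatched
```

Three empty lists from `compare_restore` only tell you the files came back intact. It does not replace actually starting the restored service and checking that it works.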
Hope this helps.
First of all, this automatic verification is no substitute for end-to-end verification. I have seen drives shipped with a firmware bug that caused reads during a restore to be less reliable than the reads done during verification.
The outcome was that you could write the tapes without any errors being reported, but when you tried to restore, reads would return errors or drop in speed by several orders of magnitude.
Most customers never noticed this firmware bug. According to the vendor, that was because the customers didn't actually perform test restores. This particular bug got fixed, but I'm sure we haven't seen the last firmware bug, and some firmware bugs will only be discovered if you actually test real reads.
When the verification fails, the firmware automatically writes a second copy. During a restore, the firmware returns only one of the two copies, and this is transparent to the host. This means that available capacity varies depending on drive health and media quality.
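To get a feel for how much capacity those rewrites can cost, here is a back-of-the-envelope Python sketch. It assumes a simplified model that is mine, not the drive's documented behaviour: a fixed fraction of blocks fails verification and each of those is rewritten exactly once. The function name `effective_capacity_gb` is hypothetical; 400 GB is the native (uncompressed) capacity of an LTO-3 cartridge.

```python
def effective_capacity_gb(native_gb, rewrite_fraction):
    """Estimate usable capacity when a fraction of blocks is written twice.

    Simplifying assumption: each block that fails verification is rewritten
    exactly once, so every unit of data costs (1 + rewrite_fraction) units
    of tape on average.
    """
    if not 0 <= rewrite_fraction <= 1:
        raise ValueError("rewrite_fraction must be between 0 and 1")
    return native_gb / (1 + rewrite_fraction)


# Example: LTO-3 native capacity with 5% of blocks rewritten once.
print(round(effective_capacity_gb(400, 0.05), 1))  # prints 381.0
```

Under this model, a tape that fills up noticeably earlier than expected is a hint that the drive or the media is struggling, even though no error was ever reported to the host.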
If too many write attempts fail the verification read, an error is reported back at the SCSI level. One would think an error reported this way is hard to miss at the software layer, but bugs in code paths that are only triggered by flaky hardware are notoriously difficult to test for.