A: Because the whole purpose of RAID is to make sure that nothing in the world can interrupt that accidental rm -rf / (or DELTREE /X C:\), not even yanking the power cord in a panic.
Q: But what's the difference between redundancy and a backup?
A: If you accidentally overwrite your PhD thesis with garbage, redundancy ensures that you have multiple copies of the garbage, in case one goes bad. A backup ensures that you can restore your PhD thesis.
(And an archive ensures that you can retrieve multiple older versions of your thesis, and a version control system also tells you why you made a new version in the first place.)
Redundancy protects you against your hardware failing. It does not protect against user error, nor against malicious activity (e.g., crackers getting into your system).
The number one reason you want a backup is not because the physical media died (this is rare), but because of some error that caused the data to be lost or corrupted.
RAID doesn't protect you against a file being deleted.
RAID doesn't protect you against a file being overwritten.
RAID doesn't protect you from your system being compromised and all of your data being overwritten, deleted, or corrupted.
RAID doesn't protect you from your ops team accidentally paving a machine with important data on it.
RAID doesn't protect you from a foolish DBA running a drop command on the production server (mistaking it for a test environment).
RAID doesn't protect you if the building burns down.
P.S. http://ma.gnolia.com/. This is what can happen if you don't have good backups. Your site is snuffed out of existence (note: this tends to be bad for business).
Redundancy is great if one of your disks fails. It's not so great if your computer gets a virus, or you mistakenly delete a file, or you need to restore the disk to a previous version for some other reason. That's when you need a backup.
RAID helps you recover from failures, but backups let you go back in time.
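To make the distinction concrete, here is a toy model in Python. The `Mirror` class and its methods are hypothetical illustrations, not any real RAID API: writes go to every surviving disk at once, while a backup is a separate point-in-time copy.

```python
class Mirror:
    """Toy two-disk mirror (RAID 1 style); hypothetical, for illustration only."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        # Every write, good or bad, lands on all surviving disks.
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def read(self, name):
        # Any surviving disk can serve the read.
        for disk in self.disks:
            if disk is not None:
                return disk[name]
        raise IOError("all disks have failed")

    def fail_disk(self, index):
        self.disks[index] = None

    def snapshot(self):
        # A point-in-time copy, meant to be stored somewhere else.
        for disk in self.disks:
            if disk is not None:
                return dict(disk)
        raise IOError("all disks have failed")


array = Mirror()
array.write("thesis.tex", "four years of work")
backup = array.snapshot()

array.fail_disk(0)                             # hardware failure
after_disk_failure = array.read("thesis.tex")  # the mirror covers this

array.write("thesis.tex", "garbage")           # user error, mirrored faithfully
after_overwrite = array.read("thesis.tex")     # the mirror cannot undo this
restored = backup["thesis.tex"]                # only the backup can
```

The mirror survives the failed disk, but after the bad write the only good copy left is the one in `backup`.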
It should also be mentioned that a hardware fault in the RAID controller can easily corrupt the data on all attached disks. So while you reduce the danger from disk failures, you add the danger of RAID controller failures.
Even if a backup copies corrupt or bad data, the point of a backup is that you can and should have multiple copies. For instance, last hour, yesterday, last week, etc. You can get a similar effect from using rotating snapshots on your storage device.
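A minimal sketch of such a rotation scheme, in Python. The function name and the default counts (24 hourly, 7 daily, 4 weekly) are assumptions chosen for illustration, not a recommendation from this thread: it keeps the newest backup in each recent hour, day, and week, and everything else can be pruned.

```python
from datetime import datetime, timedelta


def select_backups_to_keep(timestamps, now, hourly=24, daily=7, weekly=4):
    """Return the backup timestamps to retain; anything else may be pruned."""
    keep = set()
    tiers = (
        (hourly, timedelta(hours=1)),
        (daily, timedelta(days=1)),
        (weekly, timedelta(weeks=1)),
    )
    for count, step in tiers:
        for i in range(count):
            window_end = now - i * step
            window_start = window_end - step
            in_window = [t for t in timestamps if window_start < t <= window_end]
            if in_window:
                # Keep only the newest backup that falls in this window.
                keep.add(max(in_window))
    return sorted(keep)


# Example: one backup per hour for the last 30 days.
now = datetime(2024, 1, 31, 12, 0)
all_backups = [now - timedelta(hours=h) for h in range(24 * 30)]
kept = select_backups_to_keep(all_backups, now)
```

With hourly backups over 30 days, this keeps a few dozen copies out of 720 while still reaching back three weeks.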
But the other reason for backups is geographic redundancy. You should certainly keep copies of critical data in two different geographic locations. How separate those locations are depends on how critical the data is; keeping copies in two different buildings in the same city protects against fire or theft. Keeping copies in two different countries protects against bigger problems.
RAID can be a great way to mitigate risks due to hardware failures, but RAID won't help you when your users delete (accidentally or otherwise) their data. To recover data you need some archival facilities, either through local snapshots or online/offline backups.
In a RAID 5 array consisting of disks over 400 GB, if you lose a disk there's something like a 75% chance of hitting an unrecoverable read error while the array is being rebuilt. Think about that for a second and it becomes pretty obvious why someone will always remind you that "RAID is not a backup".
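The exact figure depends on how much data the rebuild has to read and on the drives' error rate. As a rough sketch, assuming the commonly quoted consumer-drive rating of one unrecoverable read error per 10^14 bits read and assuming errors are independent (both are simplifying assumptions, and the function name is made up for this example), the chance of at least one error during a rebuild can be estimated like this:

```python
import math


def rebuild_ure_probability(surviving_disks, disk_bytes, ure_per_bit=1e-14):
    """Estimate P(at least one unrecoverable read error) during a RAID 5 rebuild.

    A rebuild must read every surviving disk in full. Errors are treated as
    independent per bit, which is an assumption, not a measured property.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Example: five 500 GB disks, one has failed, four must be read in full.
p_small = rebuild_ure_probability(4, 500e9)

# Example: a larger array where 12 TB must be read to rebuild.
p_large = rebuild_ure_probability(6, 2e12)
```

Under these assumptions the risk climbs quickly with array size: reaching 75% takes roughly 17 TB of reads, so bigger disks and wider arrays make a clean rebuild steadily less likely.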
RAID gives you higher reliability and performance, but it's not infallible.
RAID guards against one kind of hardware failure. There are lots of failure modes that it doesn't guard against.
See: Why Mirroring is Not a Backup Solution for a hard-earned lesson.
Fire, theft, RAID controller fault, human error: the list goes on.