I am wondering what the purpose of a BBU (battery backup unit) is. My first understanding was that it enables the cache to write its data to the disk during a power failure. But some specifications say that a BBU can hold its data for up to 72 hours. I'd expect the data to be written to the disk within milliseconds (given that the disk still has power, too).
So shouldn't a BBU protect not just the cache, but also power the whole disk for a few seconds? Wouldn't that be even more secure, because the cache data would be written to the disk instead of sitting in the cache and waiting for power to return? After a second or so, the disk could be shut down.
It doesn't power the disks, it just keeps the data in the cache for (in this case) up to 72 hours until you bring the machine back on line. When you power the machine back up it will write the contents of the cache back out to the disks.
All it does is protect against a power failure. If (for some reason) the machine loses power without cleanly flushing the data out to disk the battery keeps the cache contents alive until you can restart the machine.
It is not a UPS for disks, as the disks could be in an external disk array, or even on a different power circuit. Even a UPS could fail.
It works like this:
Most operating systems have a system call that allows a so-called "synchronous write". This means that once the write call has returned, the data is guaranteed to have been committed to disk.
A synchronous write is therefore non-cached. It blocks the application until it has completed. This kind of operation is obviously slower than a cached write, which keeps data in OS memory until the disk is idle enough and then writes the data.
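To make the idea concrete, here is a minimal sketch of a synchronous write from the application's side, assuming a POSIX-like system and Python's standard os module. The helper name durable_write is made up for this example. Note that os.fsync() only returns once the OS believes the data is on the device; a caching RAID controller can still acknowledge early, which is exactly the situation discussed in this thread.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and block until the OS reports it is on the device.

    durable_write is a hypothetical helper name used only for illustration.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush its own cache for this file to the device.
        # This call blocks, which is why synchronous writes are slow.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A database would typically call something like this for its transaction log before reporting a commit as successful.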
Some critical software, such as database software, performs synchronous writes for critical data, because a half-written update after a power loss can destroy the integrity of the database.
RAID controllers are notoriously slow with RAID-5 writes so this becomes a problem if your application software uses a lot of synchronous writes. For this reason, RAID-5 controllers are equipped with their own caches.
What the RAID controller does is it writes the data to its cache instead and LIES to the OS, telling it that it committed the data to disk whereas the data is actually still in RAID cache.
But what if power was lost while the data was still in the RAID controller's buffer? You'd have half-written and probably inconsistent data on your disks.
You may say that this behaviour defeats the purpose of a synchronous write... if it was ok to have a cached write then the app software wouldn't ask for a sync write in the first place.
The compromise is this: the RAID controller still lies to the OS that it committed the data to disk, but to protect this critical data in case of a power failure, the controller has a battery that keeps the cache alive for some time until power can be restored.
So after the power comes back and the disks spin up and initialize, the controller still has that data in its cache thanks to the battery and can finish writing your transaction to disk.
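The sequence above can be sketched as a toy model. This is not how any real controller firmware works; the class and method names (Disk, Controller, sync_write, power_loss, power_restored) are invented purely for illustration.

```python
class Disk:
    """Toy disk: a dict of block number -> data, unusable without power."""

    def __init__(self):
        self.blocks = {}
        self.powered = True

    def write(self, lba, data):
        if not self.powered:
            raise OSError("disk has no power")
        self.blocks[lba] = data


class Controller:
    """Toy RAID controller with a write-back cache (hypothetical model)."""

    def __init__(self, disk, has_battery):
        self.disk = disk
        self.has_battery = has_battery
        self.cache = {}  # acknowledged writes that are not on disk yet

    def sync_write(self, lba, data):
        # The "lie": acknowledge the write once it is in the cache.
        self.cache[lba] = data
        return True

    def flush(self):
        for lba in list(self.cache):
            self.disk.write(lba, self.cache[lba])
            del self.cache[lba]

    def power_loss(self):
        # The battery never powers the disks, only the cache memory.
        self.disk.powered = False
        if not self.has_battery:
            self.cache.clear()  # volatile cache contents are gone

    def power_restored(self):
        self.disk.powered = True
        self.flush()  # finish the writes that were already acknowledged


for has_battery in (True, False):
    disk = Disk()
    ctrl = Controller(disk, has_battery)
    ctrl.sync_write(10, "txn-1")
    ctrl.power_loss()
    ctrl.power_restored()
    print(has_battery, disk.blocks)
```

With the battery, the acknowledged block reaches the disk after power returns; without it, the disk never sees the write even though the OS was told it had completed.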
Everyone's happy.
This is why RAID controllers usually won't let you enable write cache unless you have a functional and charged battery unit.
It's worth mentioning that some newer disk controllers now come with a high-speed flash cache that retains the data for far longer than the typical 72 hours. It is often quite a lot larger too (~1 GB). If you need part details let me know.
Think of that BBU cache as adding a level of protection similar to that afforded by a journaled file system. It's there to allow transactions, simple writes in this case, to be completed if they are interrupted by a power failure. Once power drops, the controller cannot continue to write, as that would produce completely unpredictable results. Instead, it holds the data as long as it can and will finish writing it if/when power resumes. What it does not do is act like a UPS for the drives.
Getting that $100 battery is a must, especially on a DB server, even though power failures are rare. Even if you have transactions enabled, if your server loses power before those changes have left the cache and been committed to disk, you can be left with an incomplete query or corrupted data.
If your server crashes, hangs, or someone pulls the power cable, a BBU will protect you from corrupted or lost data when you are using the write cache. A UPS only protects you from a power failure.
If you don't want to use the write cache, you don't need a BBU.
A RAID card can have 1 GB of cache; even though it will not usually be all used for a write cache, you can assume it will store quite a long queue of unwritten data.
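A rough back-of-the-envelope calculation shows why "a long queue" matters. The numbers below (512 MiB of dirty cache, 4 KiB random writes, 200 writes per second for the array) are illustrative assumptions, not measurements of any particular controller or disk, and flush_seconds is a made-up helper.

```python
def flush_seconds(dirty_bytes: int, write_size_bytes: int, iops: float) -> float:
    """Estimate how long draining the cache takes at a given random-write rate."""
    writes = -(-dirty_bytes // write_size_bytes)  # ceiling division
    return writes / iops


# Assumed figures, chosen only for illustration.
dirty = 512 * 1024 * 1024   # half of a 1 GB cache holding unwritten data
write_size = 4 * 1024       # small random writes
array_iops = 200            # random writes per second the disks can absorb

print(f"{flush_seconds(dirty, write_size, array_iops):.0f} seconds to drain")
```

Under these assumptions the cache would take on the order of ten minutes to drain, not milliseconds, so keeping the disks spinning "for a second or so" would not be enough in the worst case.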
Filesystems and databases assume that their synchronous writes are not re-ordered even in the case of a power failure. Normally a synchronous write will only return after the data is on disk, but this is relatively slow. RAID cards improve performance by grouping smaller writes together and re-ordering them to be less random.
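As a rough illustration of grouping and re-ordering, the sketch below takes writes in arrival order, keeps only the latest data per block, sorts by block address, and merges adjacent blocks into sequential runs. Real controllers use their own, undocumented algorithms; the function name coalesce and the (block, data) representation are assumptions made for this example.

```python
def coalesce(writes):
    """Turn (block, data) writes in arrival order into sorted sequential runs.

    Returns a list of (start_block, [data, ...]) tuples. Hypothetical helper,
    not a real controller algorithm.
    """
    latest = {}
    for block, data in writes:
        latest[block] = data  # a later write to the same block wins

    runs = []
    for block in sorted(latest):
        if runs and runs[-1][0] + len(runs[-1][1]) == block:
            runs[-1][1].append(latest[block])  # extends the current run
        else:
            runs.append((block, [latest[block]]))  # starts a new run
    return runs


arrival_order = [(7, "a"), (3, "b"), (8, "c"), (4, "d"), (3, "e")]
print(coalesce(arrival_order))
# [(3, ['e', 'd']), (7, ['a', 'c'])]
```

Five random writes become two sequential ones, which is where the speed-up comes from. It also shows the danger: on disk, the writes no longer land in the order the filesystem issued them, so losing the cache halfway through breaks the ordering guarantee.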
If there were no BBU, a power failure under load would have disastrous results: writes that the RAID card had promised were on disk would be lost. In the case of a filesystem, for example, you may have references to a new file or directory but lack the file or directory itself, even though the filesystem deliberately created the new file before writing any references to it in order to avoid exactly this. You would have to restore from backups or just hope your data is not too badly messed up. Even worse, if someone deleted a secret file and someone else created a world-readable file, some of the contents of the secret file may end up in the world-readable file. Once you break the assumptions the filesystem is built on, anything is possible.
Assuming a UPS guarantees uninterrupted power is naive; what if the machine crashes and you need to pull the power cord, or someone trips on it?
Consumer SATA disks (and SSDs) sometimes cache synchronous writes, too, but their caches are much smaller and consumer use is less demanding so they can usually get away with it.
Modern RAID controllers also have flash, to which they copy the contents of the write cache in the event of a power failure, so the battery does not need to last for more than a few seconds.