I have a Windows Server 2008 x64 Standard virtual machine that runs on a machine with a hardware RAID controller, a Perc 6/i, which has a battery on-board.
Doing everything I can for additional performance, I think I should disable this. Is this very dangerous though?
My understanding is that battery-backed write caching gives a performance boost by telling the host OS that a write is complete while the data is still sitting in the cache waiting to be written.
However, I can't see how it would be detrimental to performance, but is there a gain (even if marginal) to enabling it / disabling it?
P.S. The machine has backup power.
Here is a screen shot for clarification:
Whenever a Windows application is writing data to disk, this data is written to the host memory first.
A "normal" write request returns immediately after the data is in memory and a queue entry indicating that this data needs flushing to persistent storage is created. A mechanism called the Lazy Writer ensures that this queue is being processed periodically (1/8th of the queue is flushed by the lazy writer every second by default). This is the mechanism you are disabling by unchecking "enable write caching on the disk" - every write request would need to wait until it has been acknowledged as "written" by the storage device before it returns.
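As a rough illustration of the lazy-writer behaviour described above, here is a toy Python model. It is not how the Windows cache manager is actually implemented: the `LazyWriteQueue` class and its method names are invented for this sketch, and the 1/8-per-second figure is simply the one quoted above.

```python
import math
from collections import deque


class LazyWriteQueue:
    """Toy model of a write-back cache drained by a periodic lazy writer."""

    def __init__(self, fraction=1 / 8):
        self.fraction = fraction      # share of the queue flushed per tick
        self.dirty = deque()          # writes acknowledged but not yet on disk
        self.on_disk = []             # writes that reached persistent storage

    def write(self, data):
        # A "normal" write returns as soon as the data is queued in memory.
        self.dirty.append(data)

    def tick(self):
        # One lazy-writer pass (one second in the description above).
        count = math.ceil(len(self.dirty) * self.fraction)
        for _ in range(count):
            self.on_disk.append(self.dirty.popleft())
        return count


queue = LazyWriteQueue()
for i in range(16):
    queue.write(f"block-{i}")

flushed = queue.tick()
print(flushed, len(queue.dirty))  # 2 flushed, 14 still only in memory
```

Everything still sitting in `dirty` at the moment of a crash or power loss is what a write-back cache puts at risk.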
Applications with specific requirements for data integrity (databases, filesystem drivers) do have options for a more intelligent approach to caching. For writes which need immediate persistence (NTFS journal, database transaction logs), FILE_FLAG_WRITE_THROUGH can be specified with the write. In this case, the write call does not return before the data is actually committed to persistent storage, unless you activate the "Enable advanced performance" checkbox. That setting causes the cache manager to ignore FILE_FLAG_WRITE_THROUGH, return the call immediately and pass the write to the lazy writer like every other "normal" write.
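To make the difference between a cached write and a write that waits for persistence more concrete, here is a minimal Python sketch. FILE_FLAG_WRITE_THROUGH itself is a Win32 `CreateFile` flag that Python's standard library does not expose directly, so this sketch swaps in the closest portable approach: writing and then calling `os.fsync`, which asks the OS to flush the file to the device before returning. The file name and the two helper functions are made up for the example.

```python
import os
import tempfile


def write_cached(path, data):
    """Ordinary write: returns once the data is in the OS cache."""
    with open(path, "wb") as f:
        f.write(data)


def write_durable(path, data):
    """Write, then block until the OS reports the data as flushed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's own userspace buffer
        os.fsync(f.fileno())   # ask the OS to flush its cache for this file


with tempfile.TemporaryDirectory() as tmp:
    scratch_path = os.path.join(tmp, "scratch.dat")      # hypothetical file name
    log_path = os.path.join(tmp, "transaction.log")      # hypothetical file name
    write_cached(scratch_path, b"can be regenerated\n")  # fine to lose on a crash
    write_durable(log_path, b"COMMIT 42\n")              # must survive a crash
    with open(log_path, "rb") as f:
        contents = f.read()
```

Note that this is a flush after the write rather than a true write-through open, and whether the flush really reaches stable storage still depends on every layer below the OS honoring it.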
As you have two additional layers of caching [1] (number one is your host operating system running the KVM hypervisor, number two is your storage controller with a BBWC/FBWC), things get more complicated. Each of these layers provides you with similar choices, and as every write request has to pass through all of them, the weakest link in the chain determines your data's integrity.
Application developers at large do know and understand the effects of caching and write-through calls. So the really critical parts of the data are written with FILE_FLAG_WRITE_THROUGH, while everything not written with this flag can be considered safe to cache in volatile memory. The trouble starts when FILE_FLAG_WRITE_THROUGH is ignored at any layer and the data is actually lost in the case of a power outage or a software failure. Such conditions usually result in corruption of filesystems and transaction logs, leading to unpredictable results and maybe even requiring you to restore from a backup, so this obviously should be avoided. If your storage controller's cache is "battery-backed" or "flash-backed", it can be considered "non-volatile" to a certain degree, so it is generally considered safe to use its write-back cache even for write-through requests [2].
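The "weakest link" reasoning can be sketched as a small Python check. This is only a model of the argument above, not a tool that inspects a real system: the `CacheLayer` type, its fields and the layer names are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLayer:
    name: str
    honors_write_through: bool   # passes write-through down before acknowledging
    non_volatile: bool = False   # battery/flash backed, survives a power loss


def write_through_is_safe(layers):
    """Return (safe, culprit) for a write-through request crossing all layers."""
    for layer in layers:
        # A layer is only a problem if it acknowledges early AND can lose data.
        if not layer.honors_write_through and not layer.non_volatile:
            return False, layer.name
    return True, None


safe_stack = [
    CacheLayer("guest Windows cache manager", honors_write_through=True),
    CacheLayer("hypervisor host cache", honors_write_through=True),
    CacheLayer("RAID controller BBWC", honors_write_through=False, non_volatile=True),
    CacheLayer("disk cache", honors_write_through=True),
]

# Same stack, but with "Enable advanced performance" ticked in the guest.
risky_stack = [
    CacheLayer("guest Windows cache manager", honors_write_through=False),
] + safe_stack[1:]

safe_result = write_through_is_safe(safe_stack)
risky_result = write_through_is_safe(risky_stack)
print(safe_result)    # (True, None)
print(risky_result)   # (False, 'guest Windows cache manager')
```

A single volatile layer that acknowledges early is enough to make the whole chain unsafe, no matter how careful the other layers are.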
The bottom line: it generally is safe to "Enable write caching on disk" unless you are dealing with broken applications which do not use FILE_FLAG_WRITE_THROUGH but need every write to be persistent. Disabling it would not hurt too much in your case, as most calls should be handled by the storage controller's write cache and return nearly immediately, although you would likely incur some additional overhead and the cache size would be limited by the controller's DRAM. You should never "Enable advanced performance" or "Turn off Windows write-cache buffer flushing on the device" on a system where you value uptime or require data integrity.
Further reading:
MSDN Library - Windows File Caching
Smallvoid blog - description of the hard disk cache
[1] Actually, there is even another layer of caching at the hard disk itself, but in most cases the write-through request is honored no matter what the drive's cache settings are. Some flash drives are notable (read: broken) exceptions to this rule though: with flash SSDs, writes are usually cached and reported as written immediately but only committed to volatile cache, not just for performance reasons but also to coalesce writes and prolong the lifetime of the flash cells. The "enterprise" versions of flash SSDs usually have capacitors which ensure the drive has enough power to flush the cache to the flash cells; the "consumer" versions often don't, so beware of those.
[2] It obviously is not safe under all circumstances: if the battery is defective and this goes undetected, if there is a bug in the controller's logic handling the power failure case, if the power outage lasts longer than the battery is able to provide power, or if the supercaps or the flash cells of the FBWC go bust, data is going to be lost. But these occurrences are usually rare enough to take the risk.
You can safely disable buffer flushing (advanced write caching), which is WRITE BACK CACHE, IF and ONLY IF you have THREE things:
1. The computer/server itself has a UPS backup. This prevents a sudden loss of power from leaving unwritten or partially written data. Granted, this situation could happen with NO write caching at all if the power went out JUST AS DATA WAS BEING WRITTEN, but the chances are slim. The UPS keeps the computer up.
However, that does NOT protect you from hard locks, BSODs that cannot fully dump the cache to disk (only about 50% of BSODs can dump the cache), sudden reboots (something goes wrong with the OS or hardware and the system just instantly reboots), and finally the INSTANT SHUTDOWN. This is rare, but it is usually caused by CPU or chipset overheating, where the BIOS will just SHUT off the power to prevent damage.
2. You can help to combat this IF the hard drives THEMSELVES have their own power backup. Any cached data in Windows will be lost, but the cache on the drive ITSELF will be written to disk.
With modern versions of Windows (Vista, 7, 8.x, etc.) you have the option of READ ONLY, WRITE THROUGH, WRITE BACK (advanced). If you have a RAID controller, you can usually set the cache of the controller and the HD itself to OFF, READ ONLY, WRITE BACK or WRITE THROUGH. However, the controller/HD write-back cache will NOT be enabled unless WINDOWS has the same settings enabled.
3. BACKUP your computer. Do weekly or monthly FULL backups with NIGHTLY incrementals, and then a special, quick incremental every 4 hrs backing up critical data (this backup should run no longer than 10 mins so as not to affect performance). If you do 1, 2, and 3, I would say ENABLE WRITE BACK, ESPECIALLY ON A HOME PC.
AS for a business, I'd stick to WRITE THROUGH. It won't give nearly the benefits of write back (write back speeds up writes by up to 5x, as it not only tells the OS/programs that the data has already been written, but the cache also writes the data to the disk in the most OPTIMAL/FAST fashion to prevent read/write head thrashing, etc.). So write back can do AMAZINGLY fast things, but for a business your best bet is a RAID array, either SIMPLE or complex (nested).
You could go RAID 0 for ABSOLUTE speed, but then if there's an issue you have downtime and must restore from backup. You can go RAID 5, and even if 1 drive OUTRIGHT FAILS, everything will keep running normally (but slower) until you replace the drive (most servers are hot-swappable, and many home RAID systems let you have a spare drive that, if you choose that option, can be added in instantly in case of failure).
RAID 5 is VERY fast on reads, but takes a performance hit on writes because it has to compute and write the parity data for each stripe.
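The extra data RAID 5 writes is parity, commonly computed as the XOR of the data blocks in a stripe. A minimal Python sketch of the idea, with made-up block contents and a helper name (`xor_blocks`) chosen for the example; a real controller also rotates the parity block across the drives, which is not modelled here:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data drives (contents are made up for the example).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)            # the extra write RAID 5 has to perform

# Drive 1 fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])            # True
```

The write hit comes from keeping `parity` current: a small write typically means reading the old data and old parity, then writing the new data and new parity.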
Other options are NESTED raids..
For example.. you can take 3 drives and make them RAID 0... then take 2 more sets of 3 drives and make them raid 0... then you take the 3 raid 0 volumes and combine them into 1 raid 5 volume. This will allow ONE raid 0 array to fail, and it all keeps working.. you could have all 3 drives in one r0 array fail and you are ok... BUT if 1 drive in array A fails at the same time a drive in array B.... it's ALL gone.. AND it's restore from backup time...
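The failure cases for that layout can be checked with a short Python sketch. It assumes exactly the arrangement described above (three RAID 0 sets, here named A, B and C, joined into one RAID 5 volume); the function name and the tuple format are invented for the example.

```python
def array_survives(failed_drives, sets=("A", "B", "C")):
    """failed_drives: iterable of (set_name, drive_index) tuples."""
    broken_sets = {set_name for set_name, _ in failed_drives}
    unknown = broken_sets - set(sets)
    if unknown:
        raise ValueError(f"unknown RAID 0 set(s): {sorted(unknown)}")
    # A RAID 0 set is lost as soon as any one member fails;
    # the RAID 5 layer on top tolerates the loss of exactly one member set.
    return len(broken_sets) <= 1


whole_set_lost = array_survives([("A", 0), ("A", 1), ("A", 2)])
two_sets_hit = array_survives([("A", 0), ("B", 2)])
print(whole_set_lost)  # True
print(two_sets_hit)    # False
```

So the number of failed drives matters less than how many different RAID 0 sets they are spread across.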
You can also combine MIRRORING and RAID 0, where you RAID 0 2-6 drives for example, then make another RAID 0 array to match, then MIRROR them. This will still take a small write hit for mirroring, but mirroring actually speeds up reads, as it grabs a chunk from drive/array A and grabs the next chunk from drive/array B at the SAME time.
My best advice to ANYONE @ HOME is this:
MIRROR your C: (Windows drive).. assuming windows is on C... even if it's an SSD.. MIRROR IT! This won't help corruption.. backups are for that.. but it WILL save you a LOT of downtime on Hardware failure
THEN make yourself a D drive... make THAT your HUGE, FAST powerhouse array... and do it how YOU want it... my D: drive is a Raid 0 array of FIVE 4TB HDs... and yes if ANY fails.. ALL Data is lost...but I full backup once a month and incremental backup Daily and a special one every 4 hrs to a usb 3.0 raid box that can hold up to 6 4TB drives (and that's what I have in it and it's RAID 5).. I DO NOT RECOMMEND D being SSDS! HYBRID HDDSD are ok.. but 1) you won't find 4TB SSDs.. and if you do they cost a FORTUNE. Use RAIDING @ SATA 3 level MINIMAL.. and you can get HDDs faster than an SSD.
Then make yourself an E: drive... make it a VERY SMALL SSD... no more than 100-200GB.. or whatever the smallest is you can find
Now why all those drive letters... I'll explain:
C: is mirrored, slow to write, FAST to read. Use it for WINDOWS only. Don't install ANYTHING on this drive unless the program REQUIRES IT. Even if you install a program or game to D:, it often puts stuff on C: even if you don't want it to. The point is you don't want C: overloaded with loading Windows, then loading all the services, PLUS preloading all the stuff your programs require, PLUS PRE-CACHING it all from C:. You could EASILY end up with a system that makes you wait 5-20 MINUTES for it to stop thrashing before you can use it. C: SHOULD be an SSD of around 300GB in size; this will help things boot faster. With everything installed on D:, as Windows fires up, the things programs need to load will come from D: instead of C: and take the load off C:. Also, SuperFetch kicks in about 15 seconds after the login screen comes on, and you DON'T want it all thrashing C: even if it IS an SSD. You want C: ONLY loading Windows stuff, and let it thrash D:, because D: should be AT LEAST 3 drives in a RAID 0 or 5 config. This way you can log into Windows right away and Windows will be snappy.
If you have a LOT of RAM (16-32GB) AND you have largediskcache on in the registry, Windows can EASILY take up to an HOUR to pre-cache things, BUT it will drop that to background priority if you start using the PC.
D: drive... put everything here. It's your main repository. Have Windows relocate the Documents, Pictures, Music, Contacts, etc. folders to folders on D:. With all your stuff on D:, your programs will load FAST and read and write FAST.
E: is basically for a pagefile only.. but you can also configure it , if you have windows 8 for the drive for file history backup... this way you have your regular backups and windows file history that can instantly restore a corrupt, accidentally deleted file, etc
Swap files... you want one on EVERY drive. Why? Windows uses multiple swap drives sort of like a RAID array. It will use them to read/write in parallel, AND if a drive is BUSY, it will EXCLUDE that drive from paging activities until usage goes down.
For the C: drive, I recommend ONLY the minimum. This varies depending on RAM (and I am assuming a 64-bit OS, as everyone should be on 64-bit Windows nowadays): for 8GB systems the min is 400MB, for 16GB it is 800MB, for 32GB it is 1.6GB. The reason for this is that the min size is REQUIRED for a small/mini dump in the event of a BSOD and lets Windows write a file that can help explain what went wrong. You CAN opt to NOT use a pagefile at ALL on C: to improve the performance of the system drive. This is perfectly OK as long as you don't care about BSOD reports, especially if you never or RARELY BSOD.
D: will be a BIG drive and FAST.. and it can handle paging activities even with you using it. However once drive activity hits 100%, it will scale back using the paging file... but this page file should be more like 8GB
E: make it 8GB
This will give you 16GB of pagefile space and it can use D&E in parallel. if D is busy, then it will just use E: and E: will be FAST since it's an SSD... even if E is an HDD or HDDSD, I still recommend this set up.
One last thing to consider.. since most of your activities will be using D:.. you can probably put a good size page file on C: as that won't slow you down if you are using an INTENSE program on D:
8,8,8 is a little on the overkill side no matter if you have 4, 8, 12, 16, 32 or 64GB of RAM. I'd go 4,4,4 or 5,5,5 for a 12 or 15GB pagefile total.
Many people will say if you have 16+ GB of ram to NOT use a pagefile - WRONG!
You can set windows that way, but it WILL IGNORE you and set up temporary page files on all HDs, without you knowing it. Many programs are HARD CODED to swap idle code to the pagefile... no pagefile breaks this code. Windows is DESIGNED to put idle code into the paging system to leave more room for the HDD cache. So there really is NO WAY to FORCE all code to ram by turning off the paging files.
By actually USING page files... you are speeding up your system
The other option is ReadyBoost (RB). Many people argue against it, but I'm a huge supporter. It will ONLY work with HDDs or HDDSDs. It will NOT work with SSD drives; there is no point, as an SSD is outright MUCH faster than the thumb drive.
But a GOOD/high-quality USB 3.0 thumb drive that's 32-64GB can READ/WRITE at 150-200MBytes/sec. It usually takes MANY SATA 3 HDDSDs in RAID 0 to achieve this speed level.
I use it... and it is only used for my D drive.. it's 64GB, you must NTFS format it, otherwise, it's limited to 4GB.. and 64GB is the largest a RB drive can be... Once I activate this, my system spends 2-3 HOURS loading it up... with slower devices, Windows only puts small, quick to read files... with a FAST RB drive, Windows 8.x will store ENTIRE files on there.. as in 1,2,3,5, etc GB files. My raid array can READ at 175MB/s and write at 150MB sec.. my RB drive is 200/200... so Windows LOADS it with 64GB of data.
I have 32GB of RAM, so Windows puts the most demanding files in the RAM cache, then puts files that aren't such high priority (or files that keep getting kicked out of the RAM cache) on the RB drive. In cases like this, Windows may put a high-priority file in BOTH locations.
People say that RB is a WASTE on systems like mine. WRONG. When I change areas in Dragon Age: Inquisition WITHOUT the RB drive, it takes about 60s to load the new area (when the data is NOT cached in RAM), but if it's already on the RB drive, but not in RAM, it loads in 30s. Unlike previous versions of Windows, Windows 8.x makes the data on an RB drive PERSISTENT, and your cache will persist beyond shutdowns, reboots, hibernates, and sleep. However, if Windows thinks the drive was removed for any reason, it wipes it and starts over.
In general this is all correct. Essentially the data to be written is stored in memory somewhere near the physical disk, be it on the disk controller, the RAID controller or the storage device controllers. Heck it could even be on a caching card before being written to the actual physical disk.
The default is usually an acceptable solution, as a power failure is unlikely to affect too much unless the server is a database server or another service with high disk traffic.
There are usually two things to consider:
I have only ever used the second checkbox in an iSCSI setup with a dedicated SAN controller that had two on-board controllers as well as redundant power supplies all the way to the breaker. We were writing DB and VM data across the SAN so any loss of power was never a good thing.
To make things more clear, there can be several levels of cache, from top to bottom:
The cache is always volatile memory, usually RAM, which is much faster than the disks. The problem with using it for writes is that if the system goes down for any reason, ranging from power loss to hardware failure, or even a software crash in the case of OS caching, the data in the cache will be lost. And losing data is always serious. The best case is corrupting a document you are working on, but it can be tragic: for example, corrupting an important OS file, a DB file, or the disk partition table. Take into account that the problem is usually serious because you usually don't just "lose the last changes", but end up with a corrupted file.
The screenshot in the question looks like the configuration for the OS (Windows) cache / disk cache of a removable disk. So you should disable it, unless you don't mind this kind of trouble. For example, if you're writing some kind of sequential stream, like video data, you'll only lose the last part, and that can be acceptable. In most cases, it isn't.
The BBWC is always set up in the RAID configuration utility. For example in HP SSA or ACU, where you can:
Take into account that if you're using a RAID controller, the physical disk configuration, including the disk write cache, is handled by the controller. The OS knows nothing about the physical disks, because it only sees the logical drive (LUN) and has no control over or knowledge of what is behind it. In this case, if any kind of caching is offered, it is OS caching, and it is not battery-backed.
NOTE: the BBWC works in this way:
This even allows the data to be recovered in the event of a hardware failure, but you need to solve the problem before the batteries run empty, usually within 2 or 3 days at most.
It's interesting to mention the case of HP FBWC, which quickly moves the cached data from the volatile RAM to a non-volatile memory, so that there is no problem if the battery gets empty. In this case, it doesn't matter how long it takes to solve the problem, the data will not be lost. Theoretically, you could even move the disks and controller to a different server, and keep your data safe.