Does anybody have experience with both Linux and Windows failover (redundancy) clusters, and if so, which do you prefer for a file server and/or web server?
For a little background: we set up and administered a Microsoft cluster for several years under Windows 2000. The cluster was a pair of web servers with a fairly large (for the time) RAID array for multimedia storage served to the web: hundreds of thousands of MP3 commercials delivered to radio stations. There were a number of things we did not like about this setup. First, the Microsoft cluster used a shared storage array. Even though it was a hot-swap RAID, all it took was slight corruption of the NTFS file system on the array and suddenly you were down for several hours while chkdsk ran.
So on the next build we bought into a product called NeverFail (http://www.neverfailgroup.com/). This product automatically replicates data between the primary and secondary servers, keeping them synchronized at the block-write level. This eliminated the problems we had with shared storage, but it introduced its own issues. Any restart requires a data resync in which the system analyzes everything for synchronization. The system stays up and available during the sync, but on a server with less than a terabyte of MP3 files it still takes several hours. A typical Microsoft patch session requires a couple of these resyncs, so it often takes us upwards of two days to patch the two machines. As a result, we find ourselves putting off patching and doing it less often than we should, which is not ideal. The process is also touchy and has to be followed exactly.
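As a rough illustration of why a resync that "analyzes everything" takes hours even when little has changed, here is a hypothetical full-scan resync in Python. It is not how NeverFail works internally (that isn't described here); it simply hashes every fixed-size block on both copies and lists the ones that differ, so its cost grows with the total volume size. The 4 KiB block size and the 100 MB/s read rate used in the estimate are assumptions, and the function names are made up for this sketch.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def blocks(data: bytes, size: int = BLOCK_SIZE):
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def full_scan_resync(primary: bytes, secondary: bytes, size: int = BLOCK_SIZE):
    """Hypothetical full-scan resync: hash every block on both sides.

    Returns (blocks_scanned, indices_of_blocks_to_copy). Every block of the
    primary is read and hashed, even if only one block actually differs.
    """
    p_blocks = blocks(primary, size)
    s_blocks = blocks(secondary, size)
    to_copy = []
    for i, pb in enumerate(p_blocks):
        sb = s_blocks[i] if i < len(s_blocks) else b""
        if hashlib.sha256(pb).digest() != hashlib.sha256(sb).digest():
            to_copy.append(i)
    return len(p_blocks), to_copy


def scan_hours(volume_bytes: int, read_mb_per_s: float) -> float:
    """Time to read a whole volume once at an assumed sustained read rate."""
    return volume_bytes / (read_mb_per_s * 1_000_000) / 3600


if __name__ == "__main__":
    primary = bytes(1_000_000)      # 1 MB of zeros
    secondary = bytearray(primary)
    secondary[5000] = 1             # a single changed byte, inside block 1
    scanned, to_copy = full_scan_resync(primary, bytes(secondary))
    print(scanned, to_copy)
    # Reading 1 TB once at an assumed 100 MB/s, before any data is copied:
    print(round(scan_hours(10**12, 100), 2), "hours")
```

Only one block needs copying, but all 245 were read and hashed. Scaled up to roughly a terabyte, a single full pass at 100 MB/s is already close to three hours, which is in line with the "several hours" per restart described above.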
So we are considering moving the main site with all of this content to a pair of LAMP boxes with Linux HA and DRBD.
So I am curious whether anybody who has administered both Linux and Windows clusters can share what they experienced. Specifically, we are wondering about resync time on restarts and the overall experience of administering such a Linux system.
While we have traditionally been a Windows shop, we now have someone in-house who knows Linux, and I am learning it as well and have added a number of Linux boxes to our environment, so we are open to it from an administration point of view.
I love the Linux HA stuff, and DRBD is now reaching a very high level of awesome. The Windows equivalents have never come anywhere near the same level of stability and configurability in the situations where I've run into them.
A few observations first; I'm sure others will have more data. First, DRBD has been around longer than the native Windows directory sync tools, so it may be more robust. Second, the DFS Replication technology in Windows 2008 has been rewritten to perform better. It hasn't been around as long as DRBD, but it promises to replicate large directories between multiple servers. DFS Replication doesn't do this at the block level like DRBD does, only at the file/directory level. Full resyncs with DFS Replication are online rather than offline, so you won't have the same service outage you had with NeverFail.
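To make "block level" more concrete, here is a toy Python model of a block-level mirror. It sits below any file system: it sees only writes at byte offsets, never files or directories. While the peer is disconnected (say, rebooting for patches), it records which blocks were touched, so reconnecting resends only those blocks. This is a simplified sketch of the general idea DRBD relies on, not DRBD's actual code, protocol, or on-disk format; the class and method names and the 4 KiB block size are assumptions, and both copies are kept in memory.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class MirroredDevice:
    """Toy block-level mirror: each write goes to the local and peer copies.

    While the peer is disconnected, touched block numbers are kept in a set
    (a stand-in for an out-of-sync bitmap), so a reconnect copies only those
    blocks instead of rescanning the whole device.
    """

    def __init__(self, size: int):
        self.local = bytearray(size)
        self.peer = bytearray(size)
        self.peer_connected = True
        self.dirty = set()

    def disconnect(self) -> None:
        """Simulate the secondary going away, e.g. for a reboot."""
        self.peer_connected = False

    def write(self, offset: int, data: bytes) -> None:
        """Apply an in-range write locally; mirror it or mark its blocks dirty."""
        self.local[offset:offset + len(data)] = data
        if self.peer_connected:
            self.peer[offset:offset + len(data)] = data
        elif data:
            first = offset // BLOCK_SIZE
            last = (offset + len(data) - 1) // BLOCK_SIZE
            self.dirty.update(range(first, last + 1))

    def reconnect(self) -> int:
        """Copy only the dirty blocks to the peer; return how many were copied."""
        for block in sorted(self.dirty):
            start = block * BLOCK_SIZE
            self.peer[start:start + BLOCK_SIZE] = self.local[start:start + BLOCK_SIZE]
        copied = len(self.dirty)
        self.dirty.clear()
        self.peer_connected = True
        return copied


if __name__ == "__main__":
    dev = MirroredDevice(1_000_000)
    dev.write(0, b"hello")          # mirrored immediately
    dev.disconnect()                # secondary goes down for patching
    dev.write(8192, b"x" * 10)      # block 2 marked dirty
    dev.write(9000, b"y")           # also in block 2
    print(dev.reconnect(), dev.peer == dev.local)
```

In this model the resync after a planned restart scales with how much was written while the peer was away, not with the size of the volume. A file/directory-level replicator such as DFS Replication instead works in terms of files, so it never deals with raw device offsets like this.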