I'm planning to move two separate VMware ESXi 4.0 hosts into a cluster to use the vMotion and HA capabilities of ESXi (Essentials Plus). Right now we have a dev VM host and a production VM host, and we want to be able to use the dev host as an HA failover host if the production host fails. We use almost exclusively Dell servers in our shop.
I'm considering an EqualLogic PS4000X SAN with 9.6 TB of 10K drives, or a PowerVault MD3000i or MD3200i series array. I have a few questions. Which is a better solution for VMware hosts and for adding storage? I'm getting mixed feedback from Dell that we would need something more than a switch between the SAN and the ESXi hosts to be able to use the vMotion and HA capabilities. Is that true?
Which SAN option would allow us to grow the datastores and SAN storage most easily in the future? We currently have a file server with 2 TB (and growing) of data that we would like to P2V. Are the Dell solutions a bad choice?
Should we go with NetApp hardware instead? Our budget is fairly flexible - not crazy - but the ability to easily and cost-effectively add storage in the future is important.
As far as comparing the two SANs is concerned, they are both aimed at similarly sized environments, but there are differences you should bear in mind, some of which I'll discuss below. For your environment both will do what you want without a problem and will scale up to supporting 3-4 hosts and 10-15 or so average VMs without much trouble. You might get away without any switches for your SAN if you choose the MD3xxx models, but I wouldn't recommend it.
Nothing in what you describe requires NAS functionality rather than an iSCSI SAN. An entry level NetApp isn't especially more suitable than one of these entry level Dell SANs in general, but they are very capable shared storage solutions that would meet your needs and, depending on specifics (e.g. if you wanted single instance storage support), might be a better fit.
Support for VMware Cluster Services
Both of these arrays support all VMware cluster functions that require "shared storage" (vMotion, HA, Fault Tolerance, DRS\DPM). From a storage perspective what you need is for all VMware hosts to have [redundant] connectivity to the SAN, and both of these can provide that without any issue. Depending on the model and what you are trying to do you may not need any switches for your SAN, but you probably will.
Note that VMware HA\FT\vMotion\DRS etc. all require separate cluster networking interfaces in addition to your iSCSI SAN connections. Ideally those should be resilient (two or more interfaces per function, connected to separate physical switches). Those requirements are entirely separate from your SAN infrastructure, which should be isolated from everything else as much as possible.
The PS4000 provides much better native integration with vSphere's vStorage APIs than the MD3000i or MD3200i. This means you will be able to offload things like snapshotting and cloning to the array hardware rather than relying on software for these storage functions.
SAN Switches
For single array environments you don't need any switches with an MD3000i\MD3200i. You can directly attach it, redundantly, with one iSCSI GigE port from each SAN controller going to each of two host servers, but you have then maxed out all connections and have zero expansion\scalability as far as adding more servers in the future goes. I've only seen this done once, but it works fine if you can live with those limitations, and it does remove a layer of complexity and a possible point of failure.
You cannot do this with an Equallogic array if you have more than one server - your servers have to be able to see all EQL interfaces (active and passive) for the architecture to work safely.
If you are opting for a proper switch-based SAN you don't have to have two switches, but I wouldn't touch a SAN that didn't have redundancy at the network fabric level. What happens when you have to move something or carry out a firmware update on a switch? I'd regard a SAN as a liability if I couldn't confidently walk up to one of its switches and power it off.
As a general rule you should not mix iSCSI SAN traffic and normal traffic on the same switch. If you have no choice then make sure you use VLANs to keep it separated at layer 2. If you fail to do this performance will suffer significantly and there are some nasty security problems that might bite you badly.
Avoid cheap switches - or at the very least avoid really cheap switches.
Equallogic
PROs
The Equallogic architecture is designed to scale out - EQL SANs increase in capacity and performance as you add arrays. The PS4000s are limited entry level models and only scale up to two arrays when you only have PS4000s, but if you buy any PS6000s you can mix and match and then scale out just as if all the arrays were PS6000s. This scale-out works very well from a performance perspective on Windows, is pretty good on vSphere 4, and is getting better with 4.1.
The Equallogic architecture is very simple to set up, manage and monitor. Adding capacity is trivial ["just plug in another array"] and, because of the way the architecture works, performance scales at the same time. The SAN HQ monitoring console is very useful, free and easy to install.
The Active\Passive controller solution works very well in my experience - remember that each controller manages 16 disks at most and can happily cope with the IOPS levels needed to saturate those disks (2000+). The big performance challenge with the PS4000s is their limited number (2) of iSCSI ports, which keeps throughput below around 200 megabytes/sec for a single array. Add another array though and that number doubles - add in a PS6000 and you would get triple the aggregate bandwidth. I've pushed a mixed 10K SAS / 7.2K SATA 4-node PS6000 group to over 7000 IOPS and 1.6 Gbytes/sec under test conditions; in the real world your mileage may vary, but they certainly do scale out.
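To put rough numbers on that scaling, here's a back-of-the-envelope sketch. The per-port figure and port counts are my own rule-of-thumb assumptions (roughly 100MB/sec of usable iSCSI payload per active GigE port, 2 active ports on a PS4000, 4 on a PS6000), not Dell spec-sheet numbers:

```python
# Rough sketch of how aggregate group throughput scales as arrays are added.
# Assumptions (rules of thumb, not Dell specs): ~100MB/sec of usable iSCSI
# payload per active GigE port, 2 active ports on a PS4000, 4 on a PS6000.
MB_PER_GIGE_PORT = 100
ACTIVE_PORTS = {"PS4000": 2, "PS6000": 4}

def group_throughput_mb(arrays):
    """Approximate aggregate throughput (MB/sec) for a group of arrays."""
    return sum(ACTIVE_PORTS[model] * MB_PER_GIGE_PORT for model in arrays)

print(group_throughput_mb(["PS4000"]))            # ~200 MB/sec, single array
print(group_throughput_mb(["PS4000", "PS4000"]))  # ~400 MB/sec - doubles
print(group_throughput_mb(["PS4000", "PS6000"]))  # ~600 MB/sec - triples
```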
Equallogic arrays handle failed disks very well, and one of the benefits of their conservative defaults is that you need a lot of failures, or multiple failures in rapid succession, to cause issues. I've never seen an Equallogic array fail in production, by the way - and in testing it usually requires pulling 4 or more disks to force an array to go offline.
CONs
The Equallogic architecture is not well understood - there are some inflexibilities in the design that can be problematic if you don't factor them in up front. The limitation of one disk type and RAID type per array is one; the requirement for lots of bandwidth between arrays in a multi-array setup is another.
Equallogic solutions use a lot of disk capacity. All SAN solutions do to some degree, but with EQL defaults and recommended reserves for snapshots and replication it can be a bit of a shock. Take your 9.6TB PS4000 - at a guess it has been quoted to you at that capacity because it comes with 16x600GB drives, but you won't get to use anything like 9.6TB of that raw capacity. For starters you get about 520GB of usable storage from each 600GB disk in an EQL environment; then, with Equallogic's default hot sparing policy (2 spares for RAID10 or RAID50) and selecting RAID-50 (two more disks' worth of space used for parity), that translates into 6.2TB of basic user capacity. If you want to use their (very powerful) hardware snapshotting effectively you need to limit the capacity you plan to present to servers to around half of that - so usable capacity drops to 3.1TB. If you choose to go for best performance on this array and select RAID-10, you get a basic usable capacity of 3.6TB, or 1.8TB usable if you plan to use snapshots extensively. That gets even worse in larger environments where you use hardware replication between arrays - your usable capacity can drop as low as 1.2TB from an initial 9.6TB of capacity.
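If it helps, here's the same arithmetic as a small Python sketch. The ~520GB usable-per-disk figure and the 2-disk hot spare default are the numbers from the paragraph above; treat the output as approximate:

```python
# Worked version of the capacity arithmetic above for a 16 x 600GB PS4000.
DISKS = 16
USABLE_PER_DISK_TB = 0.52   # ~520GB usable out of each 600GB drive
HOT_SPARES = 2              # EQL default hot sparing for RAID10/RAID50

def usable_tb(raid, snapshot_reserve=0.0):
    data_disks = DISKS - HOT_SPARES
    if raid == "RAID-50":
        data_disks -= 2      # two disks' worth of space lost to parity
        capacity = data_disks * USABLE_PER_DISK_TB
    elif raid == "RAID-10":
        capacity = (data_disks / 2) * USABLE_PER_DISK_TB
    else:
        raise ValueError(raid)
    return capacity * (1 - snapshot_reserve)

print(round(usable_tb("RAID-50"), 1))                        # ~6.2 TB
print(round(usable_tb("RAID-50", snapshot_reserve=0.5), 1))  # ~3.1 TB
print(round(usable_tb("RAID-10"), 1))                        # ~3.6 TB
print(round(usable_tb("RAID-10", snapshot_reserve=0.5), 1))  # ~1.8 TB
```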
You are limited to a single RAID type and disk size per array and you cannot isolate logical volumes from each other if you only have a single array. Even when you have multiple arrays that can be hard to do well. So presenting storage to a database server where you might want to isolate DB, tempdb and log volume IO from each other isn't possible. With EQL you have to trust the array(s) to manage the IO isolation for you.
You can only buy the arrays with either 8 or 16 disks installed and I think Dell are discontinuing the 8 disk option. If you want to buy 5 disks now and add a couple more later you are out of luck.
In a multi-array environment if you lose one array all volumes in the pool that array belongs to will go offline unless you have made some very specific [performance limiting] design choices.
The real strengths of the EQL design only kick in when you have about 4 arrays or more in a Group.
MD3000i\MD3200i
PROs
These are more traditional monolithic storage arrays. Conceptually they are nice and simple and work just like most hardware RAID controllers, just scaled up - buy the quantity and size\speed of disks you want, carve the disks up into RAID groups, then carve logical disks from those. Add more disks as you grow; extending by adding MD1000\MD1200 disk enclosures is a simple process.
You get to choose how you want to mix and match RAID types within a single array if you like. On a fully stocked 15-drive MD3000i you could have a 5-drive RAID 5 pack of 600GB disks, a separate 3-drive RAID 5 pack of 600GB disks, and a 6-drive RAID 10 pack of 15K disks for some dedicated high performance volumes. That's a bit harder to manage than the EQL approach, but it allows you to make specific design choices that you can't with EQL. If you are working at a scale where you need fewer than about 30 disks in total, the MD3000i is much more flexible as a result.
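Purely as an illustration of that carve-it-up-yourself model, this is the raw capacity you'd get from that example layout. The 300GB size for the 15K disks is an assumption I've made for the example (the size wasn't specified), and real formatted capacity will come in a bit lower:

```python
# Raw capacity per RAID group for the example MD3000i layout above.
# The 15th drive would typically be a hot spare.
def raid_group_tb(disks, disk_tb, raid):
    if raid == "RAID5":
        return (disks - 1) * disk_tb     # one disk's worth of parity
    if raid == "RAID10":
        return (disks // 2) * disk_tb    # half the disks hold mirrors
    raise ValueError(raid)

layout = [
    ("5 x 600GB RAID5",  raid_group_tb(5, 0.6, "RAID5")),   # 2.4 TB
    ("3 x 600GB RAID5",  raid_group_tb(3, 0.6, "RAID5")),   # 1.2 TB
    ("6 x 300GB RAID10", raid_group_tb(6, 0.3, "RAID10")),  # 0.9 TB
]
for name, tb in layout:
    print(f"{name}: {tb:.1f} TB")
```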
The MD3200i has double the controller bandwidth of the MD3000i - unless the MD3000i is dirt cheap I would not choose it over an MD3200i for that reason alone.
Adding relatively large amounts of cheap (by SAN standards) storage is viable provided capacity, not performance, is your primary concern. I've put in a couple of fairly large MD3000i installs for archive-type storage - with RAID 6 and 2TB drives a single MD3000i, fully expanded with MD1000 trays, can deliver ~72TB of usable storage. With that many SATA disks it's perfectly fine for archive\backup-to-disk type uses, but you wouldn't want to use it as primary storage for lots of virtual machines.
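One layout that gets you to roughly that figure - illustrative only, not necessarily how those installs were built - assumes a fully expanded unit (15 drives in the head plus two MD1000 trays, 45 x 2TB drives in total) carved into four 11-disk RAID 6 groups with a single hot spare:

```python
# Illustrative layout behind the ~72TB figure above (assumed, not exact).
DRIVE_TB = 2
GROUPS = 4
DISKS_PER_GROUP = 11     # 9 data + 2 parity per RAID 6 group
HOT_SPARES = 1

data_disks = GROUPS * (DISKS_PER_GROUP - 2)
print(GROUPS * DISKS_PER_GROUP + HOT_SPARES, "drives used")    # 45
print(data_disks * DRIVE_TB, "TB of usable capacity")          # 72 TB
```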
CONs
The MD3000i is pretty long in the tooth now - it's very noisy and power hungry. The MD3200i is a lot better in that regard - and almost certainly cheaper to run over its lifetime because of that.
Management capabilities are a bit basic - automated alerting is not handled by the array but by the management application, which you must keep running somewhere. Performance monitoring isn't as slick as that provided by EQL's SAN HQ.
As you scale up past a single array, MD3000i\MD3200i expansion is via SAS-connected MD1000\MD1200 disk trays - that doesn't scale well from a performance perspective once you start heading past 30 disks, and maybe even earlier.
Scaling up performance is much harder with an MD3000i\MD3200i array - if you find you need to deliver (say) 1000 more IOPS to an existing volume, you'll have to buy a lot of extra disks, move a lot of stuff around, build new RAID packs, present new volumes and migrate data, and hopefully get to where you want to be before you run into limitations on the SAS bus. With Equallogic you would just add an array to the pool containing the volume in question and it would pretty much deliver that automatically.
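To put rough numbers on why that takes a lot of disks (the per-disk IOPS figures and the 70/30 read/write mix below are generic spinning-disk rules of thumb, not measurements from either array):

```python
# Rough estimate of spindles needed to add IOPS on a traditional array.
import math

DISK_IOPS = {"10K SAS": 140, "15K SAS": 180}     # rule-of-thumb per-disk IOPS
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}

def disks_needed(target_iops, disk, raid, read_ratio=0.7):
    write_ratio = 1 - read_ratio
    # Back-end IOPS = reads + (writes x RAID write penalty)
    backend = target_iops * (read_ratio + write_ratio * WRITE_PENALTY[raid])
    return math.ceil(backend / DISK_IOPS[disk])

print(disks_needed(1000, "15K SAS", "RAID-10"))  # ~8 extra disks
print(disks_needed(1000, "10K SAS", "RAID-5"))   # ~14 extra disks
```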
I've recently implemented an EqualLogic PS4000XV, connected via 2 x PowerConnect 6224 switches to a pair of R710 ESXi hosts running off SD cards.
It performs very well for HA, vMotion etc. You can get MPIO working between the SAN and the hosts fairly easily (just a bit of ESXi remote CLI futzing).
The only drawback of the PS4000XV that I've seen so far is the storage processor setup - you have a pair of SPs in the unit but only one is active at a time. The other one sits there with its ports offline. So you're losing some possible performance there, as you're talking via a maximum of 2 or 3 interfaces. If you fail over to the other SP there's a time lag while it negotiates its ports and spins up. I'm assuming the higher spec PS units don't have this limitation and can provide an active/active SP configuration.
Dell's recommendation for the SAN is to configure it as a single RAID 50 array, so you end up with 6.2TB usable.
Your expansion option with this model is basically 'buy another unit'. However, you just plug it into the same storage fabric (switches), and it can all be managed through a single 'group' IP and console... so not much of a management overhead.
Regarding HA in VMware - HA and Fault Tolerance both operate on a single VM; you don't tie two different VMs together. Fault Tolerance runs the VM in lock-step across 2 hosts at once, with memory synced between the hosts, so that if one host fails the other instantly picks up the slack - but it's still one VM. Plain HA simply restarts the VMs from a failed host on the surviving host. So neither method fits 'use the dev VM as DR with HA'.
If you're planning to literally use your dev VM as a DR failover for the production VM, please think very carefully about the practicalities of that. Your DR configuration needs to be able to spin up as a production service with as little time lag as possible (that's the whole point of DR), so how close to production is the dev box going to be at any given point in time?
EDIT: To enable vMotion on ESX, you must ensure the CPUs in your host servers are near-identical. If you run incompatible CPUs, you won't be able to vMotion/cluster/HA between the hosts. VMware and the hardware vendors provide compatibility matrices to verify this.
The MD3000i uses the older 3Gbps SAS bus. I'd recommend at least the 3200 series to get 6Gbps if you're making a new investment.
If you have a bigger budget, I'd go with EMC. Otherwise, the Dell, IBM and HP devices are perfectly acceptable. EqualLogic makes some very nice stuff and they fall right in the middle with cost.
If you want the most cutting edge, consider 10Gbps iSCSI.
Something more than a switch - like what? I can't imagine what that would be, unless we're talking Fibre Channel, and then it would just be a host bus adapter in addition to the Fibre Channel switch.
I have both of these kinds of solution: an MD3000i and a LeftHand.
You only need a gigabit switch between the MD3000i and the ESXi servers. I think the better long-term solution is the LeftHand because it can do either synchronous or asynchronous replication to another LeftHand. While you may not need this now, you may want it a year or two down the road. I also feel the LH is a bit more flexible when it comes to configuring LUNs.
Both units have great performance and are easy to get up and running with an ESXi cluster. Neither solution is a bad choice.
The MD3000i has built-in hardware redundancy: power supplies, RAID controllers, etc. The LH solution gains redundancy by having two non-redundant systems mirror each other.
HA & vMotion work great on both systems.