We're currently looking at moving from a traditional server environment to a SAN/VMware environment.
I've been asked to gather performance statistics for our main servers - DCs, file servers, Exchange - to see whether the move is feasible for our environment or whether we're going to run into SAN performance issues.
I've been running some scheduled baselines over 8 hours, including lots of counters, but the resulting logs are too big to be useful - it takes perfmon about three minutes to open them or to switch between different counters.
I generally know which counters are useful for gauging performance, but what would be an adequate list to monitor as a starting point?
I'm thinking
- CPU Performance
- Disk/File
- Network Usage
- Active Directory (GPOs, logons, etc.)
But which counters would be the most useful, and are there any areas we should specifically target as a concern?
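For reference, this is roughly how the baselines are being scheduled at the moment - the collector name, counter file, and output path below are made up for the example:

    rem counters.txt holds one counter path per line, e.g. \PhysicalDisk(*)\Disk Transfers/sec
    logman create counter SAN_Baseline -cf counters.txt -si 00:00:15 -f bin -o C:\PerfLogs\SAN_Baseline
    logman start SAN_Baseline
    rem ...then after the 8-hour window:
    logman stop SAN_Baseline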
The big one that's likely to kill you is disk IO. Collecting both the disk transfers per second and the bytes read/written per second will give you a start on determining what you'll need on the SAN. Keep an eye on memory and pagefile usage too - that can do bad things to your disk IO stats, and provisioning your VMs with some extra memory is simple.
Network is probably the next most important one, but that's pretty simple -- aggregate transfer and packets per second, make sure it's not too ridiculous.
CPU is the least likely bottleneck on a modern system, in my experience. I'd be inclined not to worry about it unless you've got multiple machines that are pegging their CPU consistently. Provisioning an extra VM server if you're running out of CPU is straightforward.
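As a quick sanity check before committing to a long collection, you can sample those counters from the command line with typeperf; a sketch (the interval and sample count are arbitrary):

    typeperf "\PhysicalDisk(_Total)\Disk Transfers/sec" ^
             "\PhysicalDisk(_Total)\Disk Read Bytes/sec" ^
             "\PhysicalDisk(_Total)\Disk Write Bytes/sec" ^
             "\Memory\Available MBytes" "\Memory\Pages/sec" ^
             "\Network Interface(*)\Bytes Total/sec" ^
             "\Network Interface(*)\Packets/sec" ^
             "\Processor(_Total)\% Processor Time" ^
             -si 5 -sc 60

That prints 60 samples at 5-second intervals to the console; add -o and -f CSV to write them to a file instead.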
After a bit more research, I think this is a good generic list of counters:
Logical Disk
Memory
Network
Physical Disk
Process
Processor
System
For disk-bound workloads I like to monitor '\PhysicalDisk( ... )\Current Disk Queue Length' for each physical disk.
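A quick way to sample that across all spindles (the wildcard instance is just for illustration - substitute your actual disk instances):

    typeperf "\PhysicalDisk(*)\Current Disk Queue Length" -si 5 -sc 12

A sustained queue length much above 2 per spindle is a reasonable rule of thumb for a disk that's struggling.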
For your problem viewing things with perfmon: although this might be out of the scope of what you are doing, I monitor Windows counters with Nagios using the check_nt plugin, with nsclient++ installed on the client. I can then graph everything using n2rrd, and I can also use rrdtool to create custom graphs.
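More directly for your oversized logs: relog (which ships with Windows) can pull a subset of counters, or every nth sample, out of an existing .blg so perfmon opens it quickly. The file names here are made up:

    rem list the counters captured in the big log
    relog C:\PerfLogs\big_baseline.blg -q

    rem extract just the disk counters, keeping every 4th sample
    relog C:\PerfLogs\big_baseline.blg -c "\PhysicalDisk(*)\*" -t 4 -o C:\PerfLogs\disk_only.blg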
All the stuff you listed is often run in a VMware/SAN environment. It's really just a question of how powerful the SAN and the virtual servers will need to be, and of getting the architecture right. If you're willing to spend the cash on an expensive SAN, the vendors should be able to tell you what you need.
Depending on your usage, disk IO and network are likely to be the biggest causes for concern in moving to a VMware-type infrastructure, especially if your VMs are stored on the SAN, so you should definitely be assessing network usage and disk IO for every machine you would migrate. Most servers for VMware-type usage come with a good number of NICs, but it's still worth bearing in mind how many you'll actually be able to use, as well as the speed of the disks on the SAN. VMware ESX also supports disk modes that don't write all changes back to the virtual disk immediately, so you can save some performance that way.
For measuring performance, we used RRDtool to assess it, as Kyle said - it's really useful.
Virtual machines are not like typical servers, in that you run into problems in different areas. Most of the time CPU isn't the bottleneck; RAM is. The things to really know before you go in:
Determining whether you can use file-backed disks or whether you require directly-presented LUNs can take a bit of knowing. Directly-presented LUNs are where your storage array presents LUNs directly to the VMs, which is made easier by using NPIV. You can do it without NPIV, but it may be too perilous for your blood; all brand-new Fibre Channel hardware should support it, and ESX 3.5 certainly does. Direct presentation removes a layer of abstraction between the storage array and the virtual machine pounding I/O, and in that sense it can provide better performance. However, it's trickier to set up and has a higher start-up cost in the "wrap your head around it" stage.
File-backed disks are just plain easier. Plus, they can be moved between storage arrays pretty simply (for certain values of "simple", where copying multi-GB files is concerned), something that direct presentation usually requires (very expensive) array-level replication software to accomplish. Low-I/O workloads work just peachy on file-backed disks, and even some higher-I/O things do as well. We're running a full Exchange 2007 install for over 3000 users on file-backed disks. The backups could be faster, but during the day the users don't notice any slowdowns.
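As an illustration of that portability: on ESX, a file-backed disk can be cloned from one datastore to another in the service console with vmkfstools - the paths below are hypothetical, and the target directory needs to exist first:

    vmkfstools -i /vmfs/volumes/old_san/exch01/exch01.vmdk /vmfs/volumes/new_san/exch01/exch01.vmdk

The VM has to be powered off (or the disk otherwise quiesced) while you copy, which is part of why "simple" still means a maintenance window for multi-GB disks.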