Where I work we have numerous "big iron" servers used for hosting many virtual machines under the Xen hypervisor. These are typically configured with 32GB RAM, dual quad-core processors and fast disks with gobs of I/O capacity.
We're at the point where the existing hardware configuration is getting a bit long in the tooth and it's time to go out and source bigger, faster and shinier new hardware.
As mentioned above, the existing kit has been deployed with 32GB RAM and that has effectively limited the number of VMs that we can deploy to a host.
Investigating newer hardware, though, it's evident that you can get more and more RAM in a single chassis - 64, 72 or even 96GB. That will obviously let us pack more VMs onto a given host, which is always a win. Analysis done so far suggests the limiting factor will then shift to the disk subsystem.
The problem now is trying to get some idea of where we stand. We know from our usage that we're not limited by I/O bandwidth; rather, it's the number of random I/O operations that can be completed. We know anecdotally that once we hit that point, iowait is going to skyrocket and the whole machine's performance will go to the dogs.
Now this is the crux of my question: is anyone aware of a way to accurately track/trend existing I/O performance, specifically the number of random I/O operations being completed?
What I am really trying to get a metric on is "this configuration can successfully handle X number of random I/O requests, and we're currently (on average) doing Y ops with a peak of Z ops".
Thanks in advance!
sar does the job nicely here; it'll collect the number of transactions as well as sectors read/written per second, which can then be used to replay your I/O workload with reasonably decent accuracy (in terms of read/write ratios, as well as transaction size, which is the determining factor in how "random" your I/O is). It's not perfect, but in my experience it does a good enough job for the sort of estimation you're looking at.
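For example (a sketch - the interval and count are arbitrary, and the daily-file path varies by distro):

    # sample per-disk activity every 5 seconds, 12 times, with readable device names;
    # tps plus rd_sec/s / wr_sec/s are the columns of interest
    sar -d -p 5 12

    # or pull the same figures back out of the daily sysstat log, if sadc collection is enabled
    sar -d -f /var/log/sa/saDD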
So, this looks like a monitoring and capacity-reporting issue. If you're going to start measuring trending stats, I'd go across the board so you can compare, correlate, etc.
In terms of tools, you have Ganglia, Zenoss, Nagios, etc. in the open-source world, and numerous other vendor products.
You can configure them to track, measure, and store the KPIs you're interested in, and then report on them periodically.
Given your queries on RAM usage, it would make sense to include memory stats, swap usage and CPU too, so you can compare them across the board for the same time period and see which resource is the limiting factor.
Once you're capturing data you can store it all in a nice big DB for reporting, possibly thinning the historical data as it ages, e.g. store every 5-second sample for 6 months, then per minute, then every 5 minutes, then per hour as you go further back. That sort of thing can be scripted and run through cron, autosys, etc.
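If you end up rolling your own storage for this, RRD-style consolidation does exactly that kind of thinning for you. A rough sketch (the data-source name and retention lengths are purely illustrative) for a 5-second feed of transactions per second:

    # raw 5s samples for ~6 months, 1-minute averages for ~1 year,
    # 5-minute averages for ~2 years, hourly averages for ~5 years;
    # the final MAX archive keeps per-minute peaks rather than averages
    rrdtool create disk-iops.rrd --step 5 \
      DS:tps:GAUGE:15:0:U \
      RRA:AVERAGE:0.5:1:3110400 \
      RRA:AVERAGE:0.5:12:525600 \
      RRA:AVERAGE:0.5:60:210240 \
      RRA:AVERAGE:0.5:720:43800 \
      RRA:MAX:0.5:12:525600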
Those reports will give you what management wants - i.e. something with pretty graphs.
And for day-to-day management you can look at real-time charts/figures through the console to see how you're performing at any given moment.
We use collectl as we can pull all the necessary information into a single file and replay the statistics as needed. It lets you see the number of IOPS per recording interval, context switches and memory statistics, and you can break this down per disk or just look at the system as a whole. collectl also supports Lustre.
It's a great tool for getting an overview of total system performance. Good luck - from observation, SATA disks typically top out at 200-300 IOPS for random access.
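Roughly how that looks in practice (intervals and paths are only examples, and the flags may vary slightly between collectl versions):

    # record cpu, disk and memory stats (plus per-disk detail) once a second to a raw file
    collectl -scdmD -i 1 -f /var/log/collectl

    # later, replay the recording and look at the per-disk figures
    collectl -sD --verbose -p /var/log/collectl-*.raw.gz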
We record and graph disk I/O in the same way that we do all other metrics:
The data is pulled from hosts using SNMP. Our NAS/SAN boxes provide this natively; on all Linux hosts we use net-snmp, which exposes this information via UCD-DISKIO-MIB.
The data is stored (in RRD format) and graphed using Cacti. Some Disk IO templates give us a transaction count and size, displayed in the usual current, average and peak format.
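If you want to sanity-check what net-snmp is exposing before pointing Cacti at it, something along these lines works (hostname and community string are placeholders, and the MIB files need to be installed for the symbolic names to resolve):

    # per-device counters of completed read/write operations since boot;
    # Cacti turns the counter deltas into per-second rates
    snmpwalk -v2c -c public somehost UCD-DISKIO-MIB::diskIOReads
    snmpwalk -v2c -c public somehost UCD-DISKIO-MIB::diskIOWrites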
These metrics aren't necessarily as fine-grained as running iostat/dstat/sar on a host, but it's fire-and-forget: it's set up automatically when a new machine is commissioned, stored centrally and remains available for future reference. We use this data to alert us to unusual trends operationally, and we always look back at it when doing capacity planning.
There are a couple of problems with this:
It's pretty difficult to separate and quantify random I/O from sequential I/O, since the fundamental difference between the two is the physical location of the blocks on the disk platter. You can make an educated guess from the size of transactions, on the basis that lots of small transactions probably relate to small files dotted about the disk, but there's no guarantee - it might be reading small quantities of data sequentially from a single file or from adjoining blocks on the disk.
Recording the metrics will give you a very good picture of what your commitments are today, how they have changed over time and thus how they will change in the future. What it won't tell you is where the ceiling is - at least not before it's too late. To determine that you need to do some maths (from your hardware specs) and some benchmarking (I'm fond of bonnie++ myself; see the sketch below), and it helps to have some logistical idea of what those domUs are doing or being used for.
Depending on your storage backend (IBM SVC/DS8000), you may be able to pull statistics relating to random IOPS from it directly.
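For the benchmarking part, a minimal bonnie++ run might look like this (path, size and user are placeholders; the -s value wants to be at least twice the host's RAM so the page cache doesn't flatter the numbers):

    # throughput and seek test on the target filesystem, skipping the small-file creation tests
    bonnie++ -d /mnt/testvol -s 64g -n 0 -u nobody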
For pulling stats from the server itself, you can use nmon. It's free (as in beer); originally developed by IBM for AIX, it also runs on Linux.
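A quick sketch of using it (the interval and sample count are arbitrary):

    # interactive view
    nmon

    # or capture to a .nmon file: one snapshot every 10 seconds, 360 snapshots (one hour)
    nmon -f -s 10 -c 360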
If people use sar, I at least hope you're sampling your data every few seconds. When I use collectl I sample once a second. As far as measuring how well you're doing at random I/O, use a tool like Robin Miller's dt (google it): you can easily generate a LOT of random I/Os and then just measure with collectl to see how many you can do per second. A typical disk usually tops out at 200-300 I/Os/sec, governed pretty much by rotational latency. Block size has minimal effect, as waiting half a revolution for the disk to reach the right location overwhelms everything else.
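If dt isn't to hand, fio is another common way to generate a comparable random workload; a rough sketch (file size, block size and runtime are placeholders, with direct I/O to keep the page cache out of the picture), with collectl watching from another terminal:

    # hammer a test file with 4k random reads for 60 seconds
    fio --name=randread --rw=randread --bs=4k --size=4g --direct=1 \
        --ioengine=libaio --iodepth=16 --runtime=60 --time_based

    # meanwhile, count per-disk I/Os per second at 1-second samples
    collectl -sD -i 1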
btw - iowait is one of the most misunderstood measurements. It has NOTHING to do with CPU load; it just means the CPU wasn't doing anything else while I/O was outstanding. In fact, if you're at 100% iowait it essentially means you're about 100% idle!
-mark