I have a fairly large number of RAID arrays (server controllers as well as midrange SAN storage) that all suffer from the same problem: barely enough spindles to keep up with peak I/O demand, and tons of unused disk space.
I guess it's a universal issue: the smallest drives vendors offer now are 300 GB, but random I/O performance per spindle hasn't really grown since the days when the smallest drives were 36 GB.
One example is a database that occupies 300 GB but needs 3200 random IOPS, so it gets 16 disks (4800 GB minus 300 GB leaves 4.5 TB of wasted space).
Another common example is the redo logs of an OLTP database that is sensitive to response time. The redo logs get their own 300 GB mirror but occupy only 30 GB: 270 GB wasted.
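The arithmetic behind both examples is that the spindle count is driven entirely by IOPS, not by capacity. A minimal sketch, assuming roughly 200 random IOPS per 15k spindle (a common rule of thumb; the figure and the helper name are my own, and RAID write penalties are ignored):

```python
import math

def spindles_needed(iops_required, capacity_gb_required,
                    iops_per_disk=200, disk_gb=300):
    """Disks needed to satisfy both the IOPS and the capacity targets.

    iops_per_disk=200 is a rough figure for a 15k RPM drive;
    adjust for your hardware.
    """
    by_iops = math.ceil(iops_required / iops_per_disk)
    by_capacity = math.ceil(capacity_gb_required / disk_gb)
    return max(by_iops, by_capacity)

# The 300 GB / 3200 IOPS database from the question:
disks = spindles_needed(3200, 300)   # 16 disks, dictated by IOPS alone
wasted_gb = disks * 300 - 300        # 4500 GB of capacity nobody asked for
```

Whenever `by_iops` exceeds `by_capacity`, the difference between the two is exactly the "wasted" space the question describes.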
What I would like to see is a systematic approach for both Linux and Windows environments. How do you set up the space so that the sysadmin team is reminded of the risk of hindering the performance of the main DB/app? Or, even better, is protected from that risk? The typical situation that comes to my mind is: "oh, I have this very large zip file, where do I uncompress it? Hmm, let's see `df -h`... and we figure something out in no time." The emphasis is not on strict security (sysadmins are trusted to act in good faith) but on the overall simplicity of the approach.
For Linux, it would be great to have a filesystem customized to cap the I/O rate at a very low level - is this possible?
I would look into moving the high-IOPS, low-capacity databases to SSD-based arrays - those disks are small and provide excellent random I/O performance. This is about as simple as the approach gets.
LUNs, partitions and selective presentation of resources...
Just because a SAN is backed by 16 disk spindles doesn't mean that the server consuming that data needs to see the full capacity of the array. Same thing with direct-attached storage. There are times where I may have a large number of disks in an array, but I'll still right-size the LUN/partition/device presented to the operating system.
The example below from an HP ProLiant SmartArray controller shows 4 x 480GB SSDs in a RAID 1+0 array. That's 960GB usable. I carved a 400GB LUN out of that 960GB. The operating system only sees 400GB. And even that 400GB is partitioned into logical chunks that make sense for the application. The point is that you can control what the consumers of the storage space see.
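On plain Linux, the same right-sizing can be sketched with LVM instead of controller tooling. A hypothetical sketch, not the HP SmartArray commands from above - the device name `/dev/sdb`, the volume group `vg_ssd`, and the LV name `db_data` are all placeholders; these commands require root and a real block device:

```shell
# Give LVM the whole 960 GB array...
pvcreate /dev/sdb
vgcreate vg_ssd /dev/sdb

# ...but hand the database only a 400 GB logical volume.
lvcreate --size 400G --name db_data vg_ssd
mkfs.xfs /dev/vg_ssd/db_data

# The remaining ~560 GB stays unallocated in the volume group
# (visible via `vgs`) until someone deliberately decides to use it.
```

The unallocated space never shows up in `df -h`, so the "where do I uncompress this zip?" reflex can't land on it by accident.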
But in the end, if the performance meets your needs and the organization can afford the current configuration, why do you deem unused space as "wasted"?
BTW - it is possible to throttle I/O at the block-device level in Linux using cgroups. But if there's a risk to your main applications or databases, why not separate the workloads?
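With cgroup v2, the throttle lives in the `io.max` file. A sketch, assuming a unified hierarchy mounted at `/sys/fs/cgroup`, a throttled device with major:minor `8:16` (check yours with `lsblk`), and a hypothetical zip file; requires root:

```shell
# Create a cgroup for scratch work and cap it on device 8:16
# to 10 MB/s read/write and 100 read/write IOPS.
mkdir /sys/fs/cgroup/scratch
echo "8:16 rbps=10485760 wbps=10485760 riops=100 wiops=100" \
    > /sys/fs/cgroup/scratch/io.max

# Move the current shell into the cgroup; everything it spawns
# is now throttled on that device.
echo $$ > /sys/fs/cgroup/scratch/cgroup.procs
unzip /tmp/very-large-file.zip -d /scratch
```

That gives you the "protected from the risk" behavior the question asks for: the scratch workload can use the leftover space, but cannot steal the spindles' IOPS from the main database.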