First of all, the back story:
All of a sudden (literally overnight) an instance started throwing CPU utilization alerts. This is a rather lowly VM (1 vCPU, 2GB RAM), but all it does is very light NFS serving plus Cacti polling and serving for a handful of systems. The VM is hosted at an IaaS provider on vSphere 4.x and sits on enterprise kit (HP/NetApp SAN, etc.).
The last time I changed anything on this system was nearly four weeks ago. Looking over the metrics, I saw that one of the provider's agents, a McAfee process (cma), consumed WAY more RAM than usual until a cron job of mine restarted the service the weekend prior (the cron job is there because I'm convinced this agent has a memory leak; the entry is shown after the list below). Anyway, the problem is that I can no longer run Cacti (httpd/mysql/php, with a cron job that runs poller.php) on this system: the load climbs over 10 and iowait is really high (~90%). I've tried the following:
- ran Cacti with the McAfee service stopped
- systematically updated php*, httpd/mod_ssl, and mysql-server, trying to run Cacti after each update
- ran yum update to bring everything to the latest packages; the system is now RHEL 5.8 (x86_64)
The full yum update alone pushed the system over a load of 6 and took hours.
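For reference, here's the cron entry (as an /etc/cron.d file; the schedule and path are my own choices, and "cma" is the init script name the agent installed on this box, so adjust if yours differs):

# /etc/cron.d/restart-cma -- weekly bounce of the McAfee agent, as a workaround for the suspected leak
30 4 * * 0 root /sbin/service cma restart >/dev/null 2>&1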
I asked the hosting provider whether anything was wrong with the storage layer, and they said there wasn't. But this just doesn't compute. It got me wondering whether there could be a problem with partition misalignment, since I've read that it can cause exactly the kind of symptoms I seem to be experiencing. The provider would have created these VMFS partitions in the vSphere/vCenter client, which I understand ensures alignment. But can things get out of alignment over time? If so, is there any way to detect this from inside the VM/guest? NetApp's mbrscan utility looks like it can detect it, but it has to be run from the host's ESX console.
Thanks!
Edit: sfdisk output (with the -uS flags) added:
[root@nfs1 ~]# sfdisk -luS /dev/sda
Disk /dev/sda: 13054 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0
   Device Boot      Start        End   #sectors  Id  System
/dev/sda1   *          63     208844     208782  83  Linux
/dev/sda2          208845  164055779  163846935  83  Linux
/dev/sda3       164055780  209712509   45656730  8e  Linux LVM
/dev/sda4               0          -          0   0  Empty
Update:
A reboot of this instance completely solved the performance problems. Further analysis by the hosting provider did indicate that there is some misalignment, but in their opinion it would not cause the symptoms experienced; they say, for example, that the misalignment in Windows VMs is typically greater. At this point we're going to wait and see whether it happens again and, if so, change the sector offset.
The only way to see alignment issues from inside the guest is to examine the partition table in the master boot record. If you can read the partition start sectors from your VM, you can see whether you're misaligned.
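If you want to read the MBR directly, something like this works from inside any Linux guest (a sketch using standard coreutils): the partition table lives at byte 446 of sector 0, as four 16-byte entries, and bytes 8-11 of each entry are the little-endian start LBA.

$ dd if=/dev/sda bs=512 count=1 2>/dev/null | od -A d -t x1 -j 446 -N 64

A first partition starting at sector 63 shows up as "3f 00 00 00" at offset 8 of the first entry.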
That said, alignment problems magnify the number of IOs you send to the storage; the pain only appears when something in the stack penalizes that extra IO. NetApp is particularly hard hit because it starts limiting performance as soon as the number of "partial writes" that need extra attention from its back end hits a certain level. Other systems just treat each IO the same way as the last one, so they don't get the massive spike in storage latency that NetApp does.
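To make that concrete, a back-of-the-envelope check (assuming 4KB back-end blocks, as on WAFL): a 4KB guest write into a partition that starts at sector 63 lands at byte offset 63 x 512 = 32256, which is not a multiple of 4096, so it straddles two back-end blocks and becomes a read-modify-write on both:

$ start=$((63 * 512)); end=$((start + 4096 - 1))
$ echo "first 4K block: $((start / 4096)), last 4K block: $((end / 4096))"
first 4K block: 7, last 4K block: 8

Shift the partition start to sector 64 (or any multiple of 8) and the same write touches exactly one back-end block.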
You should be able to find out the guest alignment with sfdisk on Linux. Just look at the start sectors of your partitions. But that will only tell you half the story, since your provider can/should account for the default OS alignment at the storage layer.
So even if the guest looks misaligned (a start sector of 63, say), the storage may apply an offset into the LUN or datastore that corrects it to an aligned boundary. But at least you can take your new knowledge to your provider and have them confirm.
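A quick way to check all partitions at once from inside the guest (a sketch, assuming a 4KB block, i.e. 8-sector, boundary; the awk handles the optional boot-flag column):

$ sfdisk -luS /dev/sda | awk '$1 ~ /^\/dev\// { s = ($2 == "*") ? $3 : $2; printf "%s start=%s mod 8=%d\n", $1, s, s % 8 }'

A non-zero remainder means that partition does not start on a 4KB boundary.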
Update (for the new sfdisk results): None of your partitions start on a 4KB or 8KB block boundary, so it's quite likely that you are experiencing some misalignment pain. You need to ask your provider what block size the storage uses (e.g. 4KB) and what alignment correction they apply, if any. If they don't apply any correction, you want all of your partitions to start at a sector number evenly divisible by 8 (4KB) or 16 (8KB). While you are at it, an even 1MB start offset (a sector number evenly divisible by 2048) allows for any underlying storage block size changes in the future.
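Applied to your actual start sectors, none divide evenly by 8 (4KB), let alone by 2048 (1MB):

$ for s in 63 208845 164055780; do echo "$s: mod 8 = $((s % 8)), mod 2048 = $((s % 2048))"; done
63: mod 8 = 7, mod 2048 = 63
208845: mod 8 = 5, mod 2048 = 1997
164055780: mod 8 = 4, mod 2048 = 740

So either a repartition or a storage-side offset from your provider would be needed to line these up.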