I have a server with ESXi 5 and iSCSI-attached network storage (4x1TB RAID-Z on FreeNAS). The two machines are connected to each other over Gigabit Ethernet, with a ProCurve switch in between.
After a while, if I have many (4-5 or more) VMs running, they start to become unresponsive (long delays before anything happens). We are trying to find the reason behind this.
Today we looked at esxtop and found that the DAVG of that iSCSI LUN stays at 70-80. I read that anything above 30 is critical!
What could be causing those high response times?
As you probably already know, DAVG refers to disk latency, and yes, anything greater than 30 ms is usually going to give you a noticeable drop in performance and responsiveness. Latency can be caused by many different issues, but first and foremost your disks must be able to handle the I/O load you are throwing at them.
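If you want to keep an eye on that counter over time rather than just watching it live, esxtop can also log to a file in batch mode. The interval and iteration values below are just examples:

```shell
# Interactive: run esxtop, then press `d` (disk adapter) or `u` (disk device)
# and watch the DAVG/cmd column for your iSCSI LUN.
esxtop

# Batch mode: sample every 5 seconds for 60 iterations (~5 minutes),
# then inspect the CSV offline (e.g. in perfmon or a spreadsheet).
esxtop -b -d 5 -n 60 > esxtop-results.csv
```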
I/O load refers not only to the number of I/Os per second (IOPS), but also to the pattern. Random I/O is pretty much what you should expect from virtualized servers, so your disk configuration needs to do well from a random-I/O perspective. Unfortunately, RAID-Z doesn't fit the bill: according to Oracle, a RAID-Z set can handle about the same number of random IOPS as a single disk in the set. A single 7.2k disk can do about 80 IOPS (and that may be a generous number, depending on who you ask), which means your entire RAID-Z array can only do about 80 random IOPS. Running 5-7 servers on that few IOPS is a recipe for terrible performance.
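To put numbers on that, here is the back-of-the-envelope math. The 80-IOPS-per-disk figure is the rough estimate from above, and the VM count is illustrative, not measured:

```python
# Rough random-IOPS budget for a RAID-Z vdev (illustrative numbers).
DISK_RANDOM_IOPS = 80   # typical 7.2k disk, per the estimate above
VMS = 6                 # somewhere in the 5-7 range from the question

# A RAID-Z set delivers roughly the random IOPS of a single member disk,
# no matter how many disks are in the set.
vdev_iops = DISK_RANDOM_IOPS

per_vm_iops = vdev_iops / VMS
print(f"Whole array: ~{vdev_iops} random IOPS")
print(f"Per VM:      ~{per_vm_iops:.0f} random IOPS")
```

A dozen-ish random IOPS per VM is nowhere near enough for a responsive guest OS, which lines up with the symptoms you're seeing.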
You would see far better performance if you configured your four drives as a RAID-10 set. If you need more than the 2 TB of raw capacity you'd get from RAID-10, do RAID-5 instead. Either will give you better random-I/O performance than RAID-Z in this case.
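To make that trade-off concrete, here is a rule-of-thumb comparison of the same four 1 TB disks in each layout. The per-disk IOPS figure and the standard write-penalty factors (2x for mirroring, 4x for single parity) are rough textbook assumptions, not measurements:

```python
# Back-of-the-envelope comparison of 4 x 1 TB disks in different RAID levels.
DISK_IOPS = 80   # assumed random IOPS of one 7.2k disk
DISKS = 4

layouts = {
    # usable capacity in TB, approximate random read / write IOPS
    "RAID-10": {"usable_tb": 2, "read_iops": DISKS * DISK_IOPS,
                "write_iops": DISKS * DISK_IOPS // 2},  # mirror write penalty
    "RAID-5":  {"usable_tb": 3, "read_iops": DISKS * DISK_IOPS,
                "write_iops": DISKS * DISK_IOPS // 4},  # parity write penalty
    "RAID-Z":  {"usable_tb": 3, "read_iops": DISK_IOPS,
                "write_iops": DISK_IOPS},               # ~one disk's IOPS total
}

for name, v in layouts.items():
    print(f"{name}: {v['usable_tb']} TB usable, "
          f"~{v['read_iops']} read / ~{v['write_iops']} write random IOPS")
```

Even with the write penalty, RAID-10 comes out roughly 2-4x ahead of RAID-Z on random I/O with the same spindles, at the cost of 1 TB of capacity.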