Background
We had an incident where a Windows failover cluster suffered an interruption. A post-mortem showed that the node was "removed" as described in this article.
We've only recently migrated this cluster fully into our VMware environment, and it appears that the event described above may have been the cause of the outage.
The associated VMware KB article about this talks about increasing the Small Rx Buffers and Rx Ring #1 settings, but cautions that increasing these too much could drastically increase memory overhead on the host.
After an audit of the Network Interface\Packets Received Discarded performance counters for our ~150 Windows VMs, 22 vNICs across 16 guests had some discarded packets.
The discard counts are small enough that I'm not worried about taxing the hosts with additional memory usage, but I want to understand how memory is used for these settings and where the memory comes from.
Questions
- What is the relationship between number of buffers and ring size?
- How does one calculate the amount of memory used for given values of these settings?
- Because these settings are on the NIC itself within the guest OS, I assume they are driver settings. This makes me think that the RAM used might be paged or non-paged pool.
  - Is this correct?
  - If so, should I be worried about that?
- Are there concerns I'm not taking into account here?
We're trying to determine whether there is a drawback to setting these to their maximums on affected VMs, other than VMware host memory usage. If we're increasing the risk of depleting pool memory in the guest, for example, we're more inclined to start small.
Some (perhaps all) of these questions may not be specific to VMware or virtualization.
They're related, but independent. The rx "ring" refers to a set of buffers in memory that are used as a queue to pass incoming network packets from the host (hypervisor) to the guest (Windows VM). The memory gets reserved in the guest by the network driver, and it gets mapped into host memory.
As new network packets come in on the host, they get put on the next available buffer in the ring. Then, the host triggers an IRQ in the guest, to which the guest driver responds by taking the packet off the ring and dispatching it to the network stack of the guest OS, which presumably sends it to the guest application intending to receive it. Assuming the packets are coming in slowly enough, and the guest driver is processing them fast enough, there should always be a free slot in the ring. However, if packets are coming in too fast, or the guest is processing them too slowly, the ring can become full and packets may be dropped (as you've seen in your situation).
Increasing the ring size can help mitigate this issue. If you increase it, more slots will be available in the ring at a time. This segues into the second setting, "Small Rx Buffers", which is the total number of buffers available that can be used to fill the slots in the ring. There need to be at least as many buffers as slots in the ring; typically you want more. When the guest takes a buffer off the ring to give to the guest network stack, it may not always be returned to the driver immediately. If that happens, having spare buffers to fill the ring means you can go longer without dropping packets.
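To make the relationship between the two settings concrete, here is a toy model in Python. It is purely illustrative: the sizes and names are made up, and this is not how the actual vmxnet3 driver is written. The ring size caps how many packets can be queued at once, while the buffer pool caps how many buffers exist in total to fill those slots.

```python
# Toy model of an rx ring backed by a buffer pool -- purely illustrative,
# not how the real vmxnet3 driver is implemented.
from collections import deque

RING_SIZE = 4          # "Rx Ring #1": slots in the ring (made-up small number)
NUM_BUFFERS = 8        # "Small Rx Buffers": total buffers available (made-up)

free_buffers = deque(range(NUM_BUFFERS))   # buffers not currently holding a packet
ring = deque()                              # ring slots currently holding packets
dropped = 0

def host_receives_packet(packet_id):
    """Host side: place an incoming packet into the next free ring slot."""
    global dropped
    if len(ring) >= RING_SIZE or not free_buffers:
        dropped += 1                 # ring full or no buffer free: packet is lost
        return
    buf = free_buffers.popleft()
    ring.append((buf, packet_id))    # this buffer now sits in the ring

def guest_processes_packet():
    """Guest side: take a packet off the ring and hand it to the network stack."""
    if ring:
        buf, _packet_id = ring.popleft()
        # ... packet goes up the guest network stack here ...
        free_buffers.append(buf)     # returned right away here; a real stack may hold it longer

# A burst of 10 packets with no guest processing in between: the ring fills and
# the remaining packets are dropped, just like the discarded-packet counter.
for i in range(10):
    host_receives_packet(i)
print(f"queued: {len(ring)}, dropped: {dropped}")   # queued: 4, dropped: 6
```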
The Rx Ring #1 / Small Rx Buffers are used for non-jumbo frames. If you have a default NIC configuration, that's the only ring that will be used.
Assuming you're talking about non-jumbo frames, each buffer needs to be big enough to store an entire network packet, roughly 1.5 KB. So if you have 8192 buffers available, that would use about 12 MB. A larger ring will also use more memory, but the descriptors are small (bytes), so it's really the buffers you have to worry about.
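A rough back-of-the-envelope version of that calculation, with the caveat that the per-buffer size (~1.5 KB) and the per-descriptor size are approximations rather than exact driver values, and the maximum setting values are assumed:

```python
# Rough worst-case memory estimate for the receive path. The per-buffer size
# (~1.5 KB, one standard non-jumbo frame) and the per-descriptor size are
# approximations, and the maximum setting values are assumed, not verified.
BUFFER_SIZE_BYTES = 1536        # ~1.5 KB per buffer
DESCRIPTOR_SIZE_BYTES = 16      # assumed; descriptors are only a handful of bytes

small_rx_buffers = 8192         # assumed maximum for "Small Rx Buffers"
rx_ring_size = 4096             # assumed maximum for "Rx Ring #1"

buffer_mem_bytes = small_rx_buffers * BUFFER_SIZE_BYTES
ring_mem_bytes = rx_ring_size * DESCRIPTOR_SIZE_BYTES

print(f"buffers:          {buffer_mem_bytes / 2**20:.1f} MiB")   # 12.0 MiB
print(f"ring descriptors: {ring_mem_bytes / 2**10:.1f} KiB")     # 64.0 KiB
```

Even at those assumed maximums, the buffer memory dominates and stays in the low tens of megabytes per vNIC.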
Yes, it's non-paged pool. If the ring buffers were paged out, it would likely result in dropped packets while the buffers were being paged back in.
I'm not sure this is relevant to your situation, but it might be worth noting that a larger ring also increases the cache footprint of the network rx path. In microbenchmarks, you will see that a larger ring usually hurts performance. That said, in real-life applications, a dropped packet is usually a bigger deal than a small performance gain in short bursts.
Source: I worked at VMware.
I don't have an answer for points 1-3, but you can check with your virtualization engineer about the VMware host configuration. If they are a VCP, they will understand the stuff :)
You really have to check your host, because Windows problems could come from the host rather than the guest.
There are many hardware features that could explain your problem: DirectPath I/O, RSS, vCPU, power management scheme...
I can give you some links that may help your virtualization team, or you :)
This link is about tuning the host: http://buildvirtual.net/tuning-esxi-host-networking-configuration/
And this large PDF: http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
And this one is about RSS: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2008925
I am not in a position to search thoroughly and point you to the right pages, so I am asking you to look for the details yourself... (sorry)
In Failover Clustering there are four settings that can be tweaked; they will not affect buffers or paged/non-paged pool. Instead, they change the way the failover cluster decides to consider a node "removed". These settings are:
SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold
They may not solve your problem, but tweaking them may get you out of trouble for the moment...
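As a rough illustration of how the delay and threshold values interact (the numbers below are examples only; actual defaults differ between Windows Server versions): the delay is the interval between heartbeats and the threshold is how many consecutive heartbeats can be missed, so a node is treated as unreachable after roughly delay × threshold milliseconds.

```python
# Illustrative only: roughly how long lost heartbeats are tolerated before a
# node is considered "removed". The values are examples; real defaults vary
# by Windows Server version.
same_subnet_delay_ms = 1000    # SameSubnetDelay: interval between heartbeats
same_subnet_threshold = 5      # SameSubnetThreshold: missed heartbeats tolerated

tolerated_outage_ms = same_subnet_delay_ms * same_subnet_threshold
print(f"Node removed after ~{tolerated_outage_ms / 1000:.0f} s of missed heartbeats")
# With these example values, ~5 s; raising either setting widens the window.
```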
When I'm back on Monday, I will check back on this post in case you have further questions.
HTH, Edwin.