We're seeing some intriguing performance numbers in our new FCoE environment, and I was hoping people could let me know whether what we're seeing is expected behaviour.
Our switching environment consists of 2 x Cisco Nexus 5672 switches, each with a Nexus 2348UPQ FEX hanging off it, single-homed. On the FEXes we have our ESX hosts (Dell R630) with Emulex CNAs in them. Our SAN is an EMC VNX 5300 with a 10 Gb FCoE card in it.
Regardless of whether the SAN is connected to the 2348 or the 5672, write performance from VMs on the ESX hosts remains constant. Read performance, however, changes dramatically. When the SAN is connected to the 5672s, our average response time for 4k reads is around 0.25 ms; connecting the SAN to the 2348 instead causes read response times to jump to ~2.5 ms. Looking at the stats in esxtop, all of the additional response time is showing up in QAVG (queue latency).
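In case it helps anyone compare, something along these lines is how the latency breakdown can be pulled out of an esxtop batch export (esxtop -b). It's only a rough sketch, and the counter names are from memory, so treat them as placeholders to be matched against whatever headers your build actually emits.

```python
# Rough sketch (not a polished script): summarise the per-column latency
# counters from an esxtop batch export, e.g. "esxtop -b -d 5 -n 60 > stats.csv".
# The counter names below are approximate and may differ between ESX builds,
# so they're matched as substrings rather than exact header names.
import csv
from collections import defaultdict

LATENCY_KEYS = (
    "Average Driver MilliSec/Command",   # DAVG - device/array latency
    "Average Kernel MilliSec/Command",   # KAVG - vmkernel latency
    "Average Queue MilliSec/Command",    # QAVG - time spent queued
    "Average Guest MilliSec/Command",    # GAVG - latency seen by the VM
)

def summarize(path):
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Keep only the columns whose headers look like latency counters.
        wanted = {i: name for i, name in enumerate(header)
                  if any(key in name for key in LATENCY_KEYS)}
        for row in reader:
            for i, name in wanted.items():
                try:
                    sums[name] += float(row[i])
                    counts[name] += 1
                except (ValueError, IndexError):
                    pass
    for name in sorted(sums):
        print(f"{sums[name] / counts[name]:8.3f} ms  {name}")

summarize("stats.csv")
```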
I understand that the FEX doesn't do local switching, so all packets have to flow through the 5672s, but this additional ~2 ms of latency seems exceptionally high (especially considering I can ping between servers in < 0.02 ms). All the reference architectures I've seen have the SAN / SAN switching fabric connected directly to the "core" Nexus switches, but nowhere have I read a reason why. I'm not opposed to connecting the SAN that way; I just want to understand why.
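For what it's worth, here's the back-of-envelope math that makes ~2 ms feel so far out of line. It's only a rough sketch: the 10% framing overhead and the store-and-forward assumption are mine, and it ignores switch/FEX port-to-port latency entirely.

```python
# Back-of-envelope check: what should the extra FEX hop actually cost?
# Assumptions (mine, not measured): store-and-forward of a full 4 KiB read
# payload on a 10 Gb/s link, ~10% framing overhead for headers/gaps, and
# switch/FEX port-to-port latency ignored (it's single-digit microseconds).
LINK_BPS = 10e9          # 10 Gb/s FCoE link
READ_BYTES = 4096        # 4k read
FRAMING_OVERHEAD = 1.10  # rough fudge factor for FCoE/Ethernet framing

per_link_s = READ_BYTES * 8 * FRAMING_OVERHEAD / LINK_BPS
extra_link_traversals = 2   # up to the 5672 and back down, worst case

print(f"per-link serialization: {per_link_s * 1e6:.1f} us")
print(f"extra path cost:        {per_link_s * extra_link_traversals * 1e6:.1f} us")
# => single-digit microseconds total, orders of magnitude less than the
#    ~2 ms we're actually seeing added to QAVG.
```

Even being generous with those assumptions, the longer path should be costing microseconds, not an extra ~2 ms.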
Long story short: Is this performance gap normal?