How does iSCSI work, and how should I configure my network for it?
This is a Canonical Question about iSCSI we can use as a reference.
iSCSI is a protocol that carries SCSI commands as the payload of TCP packets. As such, it is subject to a different set of problems than, say, Fibre Channel. For example, if a link gets congested and the switch's buffers are full, Ethernet will, by default, drop frames instead of telling the host to slow down. This leads to retransmissions, which in turn means high latency for a very small portion of the storage traffic.
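One quick way to see whether those retransmissions are actually happening on an initiator (on Linux, at least) is to watch the kernel's TCP retransmission counters while the storage workload is running. A minimal sketch, assuming a Linux host; note it counts all TCP traffic on the box, not just the iSCSI sessions:

```python
#!/usr/bin/env python3
"""Rough check for TCP retransmissions on a Linux iSCSI initiator.

Reads the system-wide TCP counters from /proc/net/snmp twice and prints
how many segments were retransmitted in between. This covers *all* TCP
traffic on the host, not just the iSCSI sessions.
"""
import time

def tcp_counters():
    """Parse the 'Tcp:' header/value line pair from /proc/net/snmp."""
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = lines[0][1:], lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))

INTERVAL = 10  # seconds; run this while the storage workload is active

before = tcp_counters()
time.sleep(INTERVAL)
after = tcp_counters()

retrans = after["RetransSegs"] - before["RetransSegs"]
sent = after["OutSegs"] - before["OutSegs"]
print(f"{retrans} of {sent} TCP segments retransmitted over the last {INTERVAL}s")
```

A steadily climbing retransmission count during heavy iSCSI I/O is a hint that the congestion/drop behaviour described above is in play on that host.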
There are solutions for this problem, depending on the client operating system, including modifying network settings. For the following list of OSs, what would an optimal iSCSI client configuration look like? Would it involve changing settings on the switches? What about the storage?
- VMware 4 and 5
- Windows Hyper-V 2008 & 2008 R2
- Windows 2003 and 2008 on bare metal
- Linux on bare metal (a quick host-side MTU check for this case is sketched after the list)
- AIX VIO
- Any other OS you happen to think would be relevant
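For the Linux-on-bare-metal case, a first sanity check before tuning anything is whether the NICs carrying iSCSI traffic are even set to a jumbo MTU. A minimal sketch, assuming Linux sysfs and the common (but not universal) convention of a 9000-byte MTU:

```python
#!/usr/bin/env python3
"""List the MTU of every network interface on a Linux host.

A quick check that the interfaces carrying iSCSI traffic are actually
set to a jumbo MTU (commonly 9000) before blaming the switch. Lists
every interface, including loopback and virtual ones.
"""
import os

SYS_NET = "/sys/class/net"
JUMBO = 9000  # common convention; your environment may differ

for iface in sorted(os.listdir(SYS_NET)):
    with open(os.path.join(SYS_NET, iface, "mtu")) as f:
        mtu = int(f.read().strip())
    note = "" if mtu >= JUMBO else "  <-- not jumbo"
    print(f"{iface:12s} mtu {mtu}{note}")
```

Whatever MTU you settle on has to match end to end: the initiator NICs, every switch port in the path, and the array's interfaces.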
Is there a command to test whether jumbo frames are actually working end to end? That is, some sort of "ping" that reports whether or not the packet was fragmented along the way.
I have an ESXi host with an Ubuntu VM that mounts a Dell MD3000i via iSCSI. I suspect jumbo frames are not enabled on the switch, but I can't easily get admin access to it to check. I have the option of connecting the disk array directly to the ESXi host, but I would like some way of confirming that jumbo frames are actually the problem first.
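A common way to test exactly this is a ping with the don't-fragment flag set and a payload just under the jumbo MTU: 9000 bytes minus 28 bytes of IP and ICMP headers is 8972. On ESXi that is `vmkping -d -s 8972 <target>`, on Windows `ping -f -l 8972 <target>`, and on Linux `ping -M do -s 8972 <target>`; if any device in the path only handles 1500-byte frames, the oversized ping fails instead of being silently fragmented. Below is a small wrapper around the Linux form that could be run from the Ubuntu VM; the target address is a placeholder for the array's iSCSI portal:

```python
#!/usr/bin/env python3
"""Test whether jumbo frames survive the path to a storage target.

Sends pings with the don't-fragment flag and an 8972-byte payload
(9000-byte MTU minus 28 bytes of IP + ICMP headers). If any hop only
passes 1500-byte frames, the ping fails instead of being fragmented.

Equivalents elsewhere: `vmkping -d -s 8972 <target>` on ESXi,
`ping -f -l 8972 <target>` on Windows.
"""
import subprocess
import sys

TARGET = "192.168.130.101"   # placeholder: iSCSI portal address of the array
PAYLOAD = 9000 - 28          # jumbo MTU minus IP + ICMP header overhead

result = subprocess.run(
    ["ping", "-M", "do", "-s", str(PAYLOAD), "-c", "3", TARGET],
    capture_output=True, text=True,
)
print(result.stdout)
if result.returncode == 0:
    print("Jumbo-sized, unfragmented pings got through.")
else:
    print("Jumbo frames did NOT make it end to end:")
    print(result.stderr or result.stdout)
    sys.exit(1)
```

If the direct connection to the ESXi host passes this test and the path through the switch does not, that points squarely at the switch configuration.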
Once upon a time, I built my own SQL servers, and had control over drive configuration, RAID levels, etc. The traditional advice of separation of data, logs, tempdb, backups, (depending on budget!) was always a pretty important part of the SQL server design process.
Now with an enterprise-level SAN, I just request a specific amount of drive space for a new SQL server, divided into logical drives for data, backups, and fileshares. Certainly makes my job easier, but there is a part of me that doesn't feel completely comfortable that I can't really peek "behind the curtain" to see what is really going on back there.
My understanding is that the SAN team doesn't configure different "types" of drives any differently (optimizing data drives for random access vs log drives for streaming writes). Some of this may depend on the SAN product itself (we have an HP XP12000 and an HP XP24000), but I've been assured that the HP software does all sorts of dynamic performance configuration (watching for IO hotspots and reconfiguring on the fly to optimize those LUNs), so that the app teams and DBAs don't need to worry about any of that stuff. Something about "spreading the load of all the servers over a huge number of spindles" or something like that.
My questions/discussion:
- Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio? (A rough latency-check sketch follows this list.)
- If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (Assuming the SAN software might "dynamically configure" things differently at different points in time.)
- Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (Assuming they aren't giving dedicated disks to each server, which I've been told they are not.)
- Would requesting separate logical drives for different functions (data vs. log vs. tempdb) help here? Would the SAN see the different IO activity on these and configure them differently to suit?
- We're in a bit of a space crunch right now, with application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions about how they configure internal storage (RAID levels, etc.) that could impact my servers' performance?
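On the benchmarking question above: a purpose-built tool (sqlio, or its newer replacement diskspd) is what you would want for numbers you intend to show the SAN team, but even a small script can give a quick read on latency from a given LUN. A rough sketch, assuming a large pre-created test file on the volume under test; without O_DIRECT some reads may come from the host's page cache, so treat the output as indicative only:

```python
#!/usr/bin/env python3
"""Rough random-read latency sampler for a file on a SAN-backed volume.

Performs a few thousand random 8 KB reads against a large existing file
and prints latency percentiles. Reads may still be served from the OS
page cache (no O_DIRECT here), so treat the numbers as a rough
indication only; use sqlio/diskspd for anything you need to defend.
"""
import random
import statistics
import time

TEST_FILE = "testfile.dat"   # placeholder: a large file on the LUN under test
BLOCK = 8 * 1024             # 8 KB, roughly a SQL Server page
SAMPLES = 2000

latencies_ms = []
with open(TEST_FILE, "rb", buffering=0) as f:
    f.seek(0, 2)                      # seek to end to find the file size
    size = f.tell()
    for _ in range(SAMPLES):
        offset = random.randrange(0, max(1, size - BLOCK))
        start = time.perf_counter()
        f.seek(offset)
        f.read(BLOCK)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

latencies_ms.sort()
for pct in (50, 90, 99):
    idx = min(len(latencies_ms) - 1, int(len(latencies_ms) * pct / 100))
    print(f"p{pct}: {latencies_ms[idx]:.2f} ms")
print(f"avg: {statistics.mean(latencies_ms):.2f} ms over {SAMPLES} reads")
```

Running it at different times of day is also informative: if the percentiles swing wildly, that is itself useful evidence when raising the shared-spindle question with the SAN team.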
Thanks for your thoughts (similar topic briefly discussed in this SF question)
It's standard practice to separate log and data files onto separate disks away from the OS (and tempdb, backups, and the swap file as well). Does this logic still make sense when your drives are all SAN-based and your LUNs are not carved out of specific disk or RAID sets, but are simply part of the overall pool of drives on the SAN, with the LUN being little more than a space allocation?