How does an ethernet switch work on a port-by-port basis? Specifications are given for the speed of the switch "fabric", but is that universal bandwidth through a central processor or are there optimal places to plug things in where the traffic won't have to hit the main bus?
Example: If I have two ports that will be talking to one another a lot, will it be better to put them next to one another or will it be better to put them on different blocks of ports?
As Chopper3 says, it depends on the switch; it also depends on the features in use (VLANs, ACLs, multicast, etc.).
On a single-card switch like a Cisco 2950 or 2960, even though the hardware groups ports four by four (from what I remember), putting two servers in the same group of four ports rather than in two different groups might reduce latency slightly, but the difference would probably be imperceptible.
On a stacked switch like a Cisco 3750, putting both servers on the same stack unit is better: you get lower latency and you don't consume bandwidth on the stack ring.
On a chassis switch like a Cisco 6500, you definitely want to put both servers on the same line card.
In any case, putting both servers on the same group of ports can only help performance. Keep in mind that what is better for performance is worse for availability: if you have a hardware problem on that group of ports, that stack unit, that line card, and so on, you lose both servers. So for HA or anything like it, it's not a good idea.
The best approach is still to test with your own switch, although measuring latencies this small really requires a dedicated Ethernet tester.
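If you just want a rough sanity check before reaching for dedicated test gear, a minimal sketch like the one below compares round-trip times between the two placements. The peer address, port, and the assumption of a UDP echo service running on the other server are placeholders, and host-side scheduling and NIC jitter will usually dwarf any fabric-level difference, which is exactly why a real comparison needs proper test equipment.

```python
# Rough round-trip latency probe between two servers on the switch.
# Only a sanity check: OS and NIC jitter (tens of microseconds) will
# usually swamp any difference caused by port placement.
# The peer address/port are placeholders; the peer must run a UDP echo.
import socket
import statistics
import time

PEER = ("192.0.2.10", 9000)   # placeholder: the other server
SAMPLES = 1000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)

rtts = []
for _ in range(SAMPLES):
    t0 = time.perf_counter()
    sock.sendto(b"ping", PEER)
    sock.recvfrom(64)                              # wait for the echo
    rtts.append((time.perf_counter() - t0) * 1e6)  # microseconds

print(f"median RTT: {statistics.median(rtts):.1f} us")
print(f"p99 RTT:    {sorted(rtts)[int(0.99 * len(rtts))]:.1f} us")
```

Run it once with the servers in the same port group (or on the same stack unit / line card) and once with them split up, and compare the medians rather than single samples.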
It depends a little on the switch, but generally it makes sense for them to be on the same card if possible.
It all depends on the chip hardware. There is a lot of research on the design of switching fabrics, because these designs trade die area and cost against 'switchability'. For example, a fully non-blocking fabric is a crossbar, but its cost grows roughly with the square of the port count, so each additional port costs more than the last (see the back-of-envelope sketch below).
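As a quick illustration of that scaling argument, here is a back-of-envelope calculation of crosspoint counts for a full crossbar; the port counts are arbitrary examples.

```python
# An N-port crossbar needs on the order of N*N crosspoints,
# so the cost per port keeps rising as ports are added.
for ports in (8, 16, 24, 48):
    crosspoints = ports * ports
    print(f"{ports:2d} ports -> {crosspoints:4d} crosspoints "
          f"({crosspoints // ports} per port)")
```

This is why real switch ASICs group ports behind shared fabric blocks instead of wiring every port directly to every other port.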
However, I feel that if your application requires optimisation down to that level, you may be better served by looking at other avenues. Perhaps compress your data so that it consumes less bandwidth, or consider a different approach entirely.
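As a rough sketch of what the compression route can buy, the snippet below measures how much a generic compressor shrinks a sample payload; zlib and the repetitive test data are just stand-ins for whatever your application actually sends.

```python
# Quick check of how much a compression pass might shrink the data
# before it crosses the switch. Real savings depend entirely on how
# compressible your actual traffic is.
import zlib

payload = b"some repetitive application payload " * 256
compressed = zlib.compress(payload, level=6)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes "
      f"({100 * len(compressed) / len(payload):.0f}% of original)")
```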