I have a cluster of 32 machines. The first 25 machines are on one rack and the remaining 7 are on a second rack. Each rack has a 1Gbps Ethernet switch. Network communication between the two racks will certainly carry some performance penalty (I don't know exactly how much).
I used a network performance benchmark tool ('iperf') to measure the network speed between the machines. There was no problem: every point-to-point connection between the 32 machines can use the full bandwidth.
However, in my application (which is latency-sensitive, with a request/response network communication architecture), the inter-rack network speed is 4~5 times slower than the intra-rack network speed.
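For reference, what I mean by request/response latency is a round trip like the minimal sketch below (the host name and port are just placeholders, not my real setup):

```python
# Minimal TCP request/response round-trip probe (sketch; host/port are placeholders).
# Run "python probe.py server" on one machine and
# "python probe.py client <server-host>" on another, then compare
# intra-rack vs. inter-rack numbers.
import socket
import sys
import time

PORT = 5001          # placeholder port
MSG = b"x" * 64      # small request: latency-bound, not bandwidth-bound
ROUNDS = 1000

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while data := conn.recv(len(MSG)):
                conn.sendall(data)          # echo the request straight back

def client(host):
    with socket.create_connection((host, PORT)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(ROUNDS):
            sock.sendall(MSG)
            received = 0
            # read back exactly one full echo before sending the next request
            while received < len(MSG):
                chunk = sock.recv(len(MSG) - received)
                if not chunk:
                    raise ConnectionError("connection closed")
                received += len(chunk)
        elapsed = time.perf_counter() - start
        print(f"avg round trip: {elapsed / ROUNDS * 1e6:.1f} us")

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```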
Is there anything I can do here? Any well-known strategy to apply?
Well, I think you've identified your problem: link contention between the two switches.
Look, each of your switches has a multi-gigabit backplane, meaning that (depending on switch capabilities) it can sustain multiple full-duplex gigabit transfers concurrently. However, the link between the two switches is a single full-duplex gigabit link. When several machines on one rack talk to machines on the other at the same time, all of those flows compete for that one link, it saturates, and things slow down, which lines up with the 4~5x difference you're seeing.
To confirm this is what's happening, add monitoring to your switches and inspect the stats for your uplink ports during your speed testing.
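If your switches speak SNMP, something along these lines can give you a rough read on uplink utilisation. It's only a sketch, and the switch hostname, community string, and uplink ifIndex are placeholders you'd have to substitute for your own:

```python
# Rough uplink-utilisation check via SNMP (sketch; hostname, community string
# and the uplink port's ifIndex are assumptions -- substitute your own).
# Polls the 64-bit octet counters twice and prints the rate in Mbit/s.
# Requires: pip install pysnmp
import time
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

SWITCH = "switch-rack1"      # assumed management hostname
COMMUNITY = "public"         # assumed read-only community
UPLINK_IFINDEX = 49          # assumed ifIndex of the inter-switch uplink port
# IF-MIB::ifHCInOctets / ifHCOutOctets (64-bit counters) for that port
OIDS = ["1.3.6.1.2.1.31.1.1.1.6." + str(UPLINK_IFINDEX),
        "1.3.6.1.2.1.31.1.1.1.10." + str(UPLINK_IFINDEX)]

def poll():
    """Return (in_octets, out_octets) for the uplink port."""
    counters = []
    for oid in OIDS:
        err_ind, err_stat, _, var_binds = next(getCmd(
            SnmpEngine(), CommunityData(COMMUNITY),
            UdpTransportTarget((SWITCH, 161)), ContextData(),
            ObjectType(ObjectIdentity(oid))))
        if err_ind or err_stat:
            raise RuntimeError(err_ind or err_stat.prettyPrint())
        counters.append(int(var_binds[0][1]))
    return tuple(counters)

if __name__ == "__main__":
    interval = 10                      # seconds between samples
    first = poll()
    time.sleep(interval)
    second = poll()
    for name, a, b in zip(("in", "out"), first, second):
        mbps = (b - a) * 8 / interval / 1e6
        print(f"uplink {name}: {mbps:.0f} Mbit/s")
```

If that number sits near 1000 Mbit/s while your application is slow, the uplink is your bottleneck.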
Once you've confirmed it, you have a couple of options. First, consider using an 802.3ad LAG uplink between the switches. This will not allow any one flow to exceed 1Gbit; however, you'll be able to support multiple concurrent 1Gbit streams, the number of which depends on how many LAG member ports you use.
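To see why a LAG helps aggregate throughput but not a single flow: the switch hashes each flow onto one member port, so one flow is still capped at a single link's bandwidth while many flows spread out across the members. The sketch below is purely conceptual; real switches use their own (often configurable) hash of L2/L3/L4 fields, not this one:

```python
# Conceptual illustration of 802.3ad flow hashing: every packet of a given
# flow lands on the same member link, so one flow never exceeds 1Gbit, while
# many distinct flows spread across the bundle.  Illustrative only.
import hashlib

LAG_MEMBERS = 4  # number of 1Gbit ports bundled into the LAG (assumed)

def member_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Pick the LAG member link a flow is pinned to."""
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % LAG_MEMBERS

# One flow: both packets hash to the same member link -> capped at 1Gbit.
print(member_link("10.0.1.5", "10.0.2.9", 40000, 5001))
print(member_link("10.0.1.5", "10.0.2.9", 40000, 5001))

# Many flows from different machines spread across the members, so the
# aggregate inter-rack throughput can exceed a single gigabit.
flows = [("10.0.1.%d" % i, "10.0.2.9", 40000 + i, 5001) for i in range(1, 8)]
print([member_link(*f) for f in flows])
```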
Another option is to upgrade to switches that can support 10Gb uplinks.