Many of the links that connect NY and AMS are often saturated, which means that running a transfer over them (e.g., moving 300 GB at 1 MB/s) takes an age compared with what our connections can otherwise offer.
I ran into this problem about 3 years ago, when I was a real newbie to coding and Linux, and the conclusion I came to is the script I'll post at the bottom. It's dirty, though, and I don't like it. The script doesn't work as-is, since it was written for a very specific environment, but it gives you the idea.
My question is: do you know of any better alternatives for transferring files across the ocean quickly?
#!/bin/bash
# Usage: <script> <remote-host-prefix> <local-file> <remote-path>
# (bash, not sh: the script uses arrays)
upto="$1"
filepath="$2"
remotepath="$3"

# Nothing to do if the source file does not exist
if [ ! -f "${filepath}" ]; then
    exit 1
fi

# Random token used to name the chunks (site-specific helper script)
password=$(/all/script/password 10)

# File size in bytes
filesize=$(stat -c %s "${filepath}")

# Pick the number of chunks based on the file size
if [ "$filesize" -gt 5368709120 ]; then       # > 5 GiB
    parts=80
elif [ "$filesize" -gt 2147483648 ]; then     # > 2 GiB
    parts=50
elif [ "$filesize" -gt 1310720 ]; then        # > 1.25 MiB
    parts=20
else
    parts=2
fi

# Round up so split never produces one extra tiny chunk
splitsize=$(( (filesize + parts - 1) / parts ))
split -b "$splitsize" -a 2 "${filepath}" "/all/tmp/cup/${password}_"

# UPLOAD: one scp per chunk, all running in parallel
declare -a pwait
for tmpfile in /all/tmp/cup/"${password}"_*; do
    scp "${tmpfile}" "root@${upto}.domain.com:/all/tmp/cup/" &
    pwait+=($!)
done

# WAIT for every transfer to finish
for prid in "${pwait[@]}"; do
    wait "$prid"
done

# MERGE THE REMOTE CHUNKS into the destination file, then remove them
ssh "root@${upto}.domain.com" "cat /all/tmp/cup/${password}_* > ${remotepath} && rm -f /all/tmp/cup/${password}_*"

# REMOVE the local chunks
rm -f /all/tmp/cup/"${password}"_*
exit 0
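For the record, a hypothetical invocation might look like this (the script name and paths are made-up; the first argument is the hostname prefix the script appends .domain.com to):

# Copy /data/dump.tar to the host ams1.domain.com in parallel chunks
./parallel-scp.sh ams1 /data/dump.tar /data/dump.tar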
Assuming that your network links are not saturated (contrary to what you state in the question), you should be tuning your link to deal with the (comparatively) high bandwidth-delay product, as Andrew mentioned. (The articles referenced at that link include some info on what to tweak, when, and why.)
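As a minimal sketch of that kind of tuning on Linux (the buffer sizes are assumptions; derive your own from bandwidth x RTT):

# Raise the TCP buffer ceilings so windows can grow to the path's BDP
# (the 32 MB shown here is illustrative, not a recommendation)
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
# Window scaling is on by default on any modern kernel, but verify it
sysctl -w net.ipv4.tcp_window_scaling=1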
If in fact your network links ARE saturated (moving the maximum amount of data they can), the only solution is to add more bandwidth: more fiber trunks between the two sites, paying another carrier for transit to offload some of the peak-period traffic, or, if you're using "dedicated" links, paying for a higher CIR or adding more circuits to the loop.
How can you tell the difference?
Well, if starting more streams gets you more speed, you haven't saturated your link. You're probably just getting hit by the relatively long round-trip time from the US to Europe (compared to the round-trip time on a local network).
(There's a point of diminishing returns here as the overhead for more TCP connections will eventually cause other bottlenecks to show up.)
If adding more streams provides no net increase in speed (e.g., two streams each run at half the speed of a single stream), your link is saturated, and you need to add bandwidth to improve performance.
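A rough way to measure this, assuming you can run iperf3 on both ends (the hostname is a placeholder):

# Remote end:
iperf3 -s
# Local end: one stream for 30 s, then four parallel streams (-P 4)
iperf3 -c remote.example.com -t 30
iperf3 -c remote.example.com -t 30 -P 4
# If the -P 4 run totals significantly more throughput, the link is
# not saturated and per-stream TCP behaviour is your bottleneck.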
Other stuff to consider
You should seek to minimize the data being pushed over the pipe, using rsync or similar protocols if appropriate (rsync works best with small-ish change sets to large-ish collections of data). And never underestimate the bandwidth of a FedEx overnight package with a couple of hard disks in it, especially for initial syncs.
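For example, a typical delta transfer over ssh might look like this (host and paths are placeholders):

# Transfer only the changed data, compress on the wire, and keep
# partial files so an interrupted transatlantic run can resume
rsync -az --partial --progress /data/dir/ user@remote.example.com:/data/dir/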
I would check the TCP/IP tuning options, for example window scaling, retransmission behavior, the routing table, and ICMP. If these are all working correctly, and the OS's networking stack is not Windows XP, CentOS 5, or anything older than Vista, multi-stream transfers shouldn't be required. Or rather, they would gain you no more than about 20%, while splitting the file into pieces just fragments the filesystem and slows things down even more.
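On Linux you can check both the tuning knob and what a live connection actually negotiated, for example:

# Is window scaling enabled at all?
sysctl net.ipv4.tcp_window_scaling
# Per-connection details (wscale, cwnd, rtt) for established sockets
ss -ti state established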
https://en.wikipedia.org/wiki/Bandwidth-delay_product
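In short: a single TCP stream can have at most one receive window of unacknowledged data in flight, so throughput is capped at window/RTT. A worked example with assumed numbers (1 Gbit/s link, 80 ms transatlantic RTT):

# BDP = bandwidth * RTT = 10^9 bit/s * 0.080 s = 10 MB of in-flight data
echo $(( 1000000000 / 8 * 80 / 1000 ))   # prints 10000000 (bytes)
# With a classic unscaled 64 KB window instead, one stream tops out at
# roughly 65535 / 0.080 = ~800 KB/s, no matter how fat the pipe is.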
That's the basic theory, but there are additional factors. Depending on your OS and TCP tuning options you might have large windows in play (large windows make it go faster). Then again, some ISPs use "TCP window manipulation" as a shaping and congestion-control tool: a box in the middle knows some link is congested and attempts to quench the TCP source by editing the ACK packets, convincing the source that the receiver's window size is small. So your large windows might not really be in play, even when you think you have them switched on.
There's one more thing that could be happening: as packet queues pile up in a congested router, it can start randomly dropping packets out of the queue (see Cisco's Weighted Random Early Detection, WRED for short). The guy using only one TCP stream tends to back off more rapidly than the guy using a bunch of parallel TCP streams, so by using multiple parallel streams you can get a bigger "share" of the bandwidth on that congested queue (at the expense of others who abstain from this technique).
There's a fun tool called "tcptrace" which gives you visibility into what's going on, presuming you can capture packets at either end. Unfortunately you need to work with "xplot", which is a bit of a horrible program, but you can live with it.
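A minimal session might look like this (the interface and host are assumptions; check the flags against your tcptrace version):

# Capture the transfer at one end
tcpdump -i eth0 -w transfer.pcap host remote.example.com
# Generate the time-sequence graphs as .xpl files
tcptrace -G transfer.pcap
# View them with xplot
xplot *.xpl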