Many of the links that connect NY and AMS are often saturated, which means that running a transfer over them (e.g., moving 300 GB at 1 MB/s) takes an age compared with what our connections can otherwise offer.
I ran into this problem about 3 years ago, when I was a real newbie to coding and Linux, and the conclusion I came to is the script I'll post at the bottom. It's dirty, though, and I don't like it. The script doesn't work as-is, since it was written for a very specific environment, but it gives you the idea.
My question is: do you know of any better alternatives for transferring files across the ocean quickly?
#!/bin/bash
# Usage: <script> <remote-host-prefix> <local-file> <remote-path>
# (bash, not sh: the script uses arrays)
upto="$1"
filepath="$2"
remotepath="$3"

# Nothing to do if the source file does not exist
if [ ! -f "${filepath}" ]; then
    exit 1
fi

# Random token used to name the chunks (site-specific helper script)
password=$(/all/script/password 10)

# File size in bytes
filesize=$(stat -c %s "${filepath}")

# Pick the number of chunks based on the file size
if [ "$filesize" -gt 5368709120 ]; then       # > 5 GiB
    parts=80
elif [ "$filesize" -gt 2147483648 ]; then     # > 2 GiB
    parts=50
elif [ "$filesize" -gt 1310720 ]; then        # > 1.25 MiB
    parts=20
else
    parts=2
fi

# Round up so split never produces one extra tiny chunk
splitsize=$(( (filesize + parts - 1) / parts ))
split -b "$splitsize" -a 2 "${filepath}" "/all/tmp/cup/${password}_"

# UPLOAD: one scp per chunk, all running in parallel
declare -a pwait
for tmpfile in /all/tmp/cup/"${password}"_*; do
    scp "${tmpfile}" "root@${upto}.domain.com:/all/tmp/cup/" &
    pwait+=($!)
done

# WAIT for every transfer to finish
for prid in "${pwait[@]}"; do
    wait "$prid"
done

# MERGE THE REMOTE CHUNKS into the destination file, then remove them
ssh "root@${upto}.domain.com" "cat /all/tmp/cup/${password}_* > ${remotepath} && rm -f /all/tmp/cup/${password}_*"

# REMOVE the local chunks
rm -f /all/tmp/cup/"${password}"_*
exit 0
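For the record, a hypothetical invocation might look like this (the script name and paths are made-up; the first argument is the hostname prefix the script appends .domain.com to):

# Copy /data/dump.tar to the host ams1.domain.com in parallel chunks
./parallel-scp.sh ams1 /data/dump.tar /data/dump.tar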
Assuming that your network links are not saturated (contrary to what you state in the question), you should be tuning your link to deal with the (comparatively) high bandwidth-delay product, as Andrew mentioned. (The articles referenced at that link include some info on what to tweak, when, and why.)
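As a minimal sketch of that kind of tuning on Linux (the buffer sizes are assumptions; derive your own from bandwidth x RTT):

# Raise the TCP buffer ceilings so windows can grow to the path's BDP
# (the 32 MB shown here is illustrative, not a recommendation)
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
# Window scaling is on by default on any modern kernel, but verify it
sysctl -w net.ipv4.tcp_window_scaling=1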
If in fact your network links ARE saturated (moving the maximum amount of data they can), the only solution is to add more bandwidth: more fiber trunks between the two sites, paying another carrier for transit to offload some of the peak-period traffic, or, if you're using "dedicated" links, paying for a higher CIR or adding more circuits to the loop.
How can you tell the difference?
Well, if starting more streams gets you more speed, you haven't saturated your link. You're probably just getting hit by the relatively long round-trip time from the US to Europe (compared to the round-trip time on a local network).
(There's a point of diminishing returns here as the overhead for more TCP connections will eventually cause other bottlenecks to show up.)
If adding more streams provides no net increase in speed (e.g., two streams each run at half the speed of a single stream), your link is saturated, and you need to add bandwidth to improve performance.
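A rough way to measure this, assuming you can run iperf3 on both ends (the hostname is a placeholder):

# Remote end:
iperf3 -s
# Local end: one stream for 30 s, then four parallel streams (-P 4)
iperf3 -c remote.example.com -t 30
iperf3 -c remote.example.com -t 30 -P 4
# If the -P 4 run totals significantly more throughput, the link is
# not saturated and per-stream TCP behaviour is your bottleneck.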
Other stuff to consider
You should seek to minimize the data being pushed over the pipe, using rsync or similar protocols if appropriate (rsync works best with small-ish change sets to large-ish collections of data). And never underestimate the bandwidth of a FedEx overnight package with a couple of hard disks in it, especially for initial syncs.
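For example, a typical delta transfer over ssh might look like this (host and paths are placeholders):

# Transfer only the changed data, compress on the wire, and keep
# partial files so an interrupted transatlantic run can resume
rsync -az --partial --progress /data/dir/ user@remote.example.com:/data/dir/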
I would check the TCP/IP tuning options, for example window scaling, retransmission behavior, the routing table, and ICMP. If these are all working correctly, and the OS's networking stack is not Windows XP, CentOS 5, or anything older than Vista, multi-stream transfers shouldn't be required. Or rather, they would gain you no more than about 20%, while splitting the file into pieces just fragments the filesystem and slows things down even more.
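On Linux you can check both the tuning knob and what a live connection actually negotiated, for example:

# Is window scaling enabled at all?
sysctl net.ipv4.tcp_window_scaling
# Per-connection details (wscale, cwnd, rtt) for established sockets
ss -ti state established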
https://en.wikipedia.org/wiki/Bandwidth-delay_product
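In short: a single TCP stream can have at most one receive window of unacknowledged data in flight, so throughput is capped at window/RTT. A worked example with assumed numbers (1 Gbit/s link, 80 ms transatlantic RTT):

# BDP = bandwidth * RTT = 10^9 bit/s * 0.080 s = 10 MB of in-flight data
echo $(( 1000000000 / 8 * 80 / 1000 ))   # prints 10000000 (bytes)
# With a classic unscaled 64 KB window instead, one stream tops out at
# roughly 65535 / 0.080 = ~800 KB/s, no matter how fat the pipe is.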
That's the basic theory, but there are additional factors. Depending on your OS and TCP tuning options you might have large windows in play (large windows make it go faster). Then again, some ISPs use "TCP window manipulation" as a shaping and congestion-control tool: a box in the middle knows some link is congested and attempts to quench the TCP source by editing the ACK packets, convincing the source that the receiver's window size is small. So your large windows might not really be in play, even when you think you have them switched on.
There's one more thing that could be happening: as packet queues pile up in a congested router, it can start randomly dropping packets out of the queue (see Cisco's Weighted Random Early Detection, WRED for short). The guy using only one TCP stream tends to back off more rapidly than the guy using a bunch of parallel TCP streams, so by using multiple parallel streams you can get a bigger "share" of the bandwidth on that congested queue (at the expense of others who abstain from this technique).
There's a fun tool called "tcptrace" which gives you visibility into what's going on, presuming you can capture packets at either end. Unfortunately you need to work with "xplot", which is a bit of a horrible program, but you can live with it.
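A minimal session might look like this (the interface and host are assumptions; check the flags against your tcptrace version):

# Capture the transfer at one end
tcpdump -i eth0 -w transfer.pcap host remote.example.com
# Generate the time-sequence graphs as .xpl files
tcptrace -G transfer.pcap
# View them with xplot
xplot *.xpl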