I'm running a simple rsync command between two servers. Both servers have two Ethernet interfaces in a bonded configuration. When I send one big file from one server to the other with rsync, I reach a transfer rate of about 130 MB/s.
But, and here is the problem, when I send a directory containing lots of small files, the transfer rate is 1 MB/s at best.
I've checked the CPU load on both servers (8-core i7), and it stays at 10% at most.
Since what slows the whole transfer down is opening and closing all those files, and that 'theoretically' happens on the CPU, I figured this should be easy to tune. But I don't know how.
Any tip on how to make rsync use all CPUs?
Your problem has (almost) nothing to do with the CPU.
Transferring big files is usually fast, since it can be done with sequential I/O.
Transferring lots of small files requires a lot of horsepower on the storage side of things, because it means random I/O. Low seek times, fast drives, lots of cache and a filesystem designed for huge numbers of files are a must. The CPU doesn't help much there, which is exactly what you are observing: the CPUs and the OS are mostly just waiting for disk I/O to finish.
All that a faster CPU or more cores can do is get to the waiting-for-I/O part faster. :-)
The latency of the many small random I/O operations adds up:
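As a rough back-of-the-envelope (assuming roughly 10 ms per random access on a rotating disk and a handful of metadata/data operations per file): 50,000 small files × ~3 operations × 10 ms comes to about 25 minutes spent purely waiting on seeks, regardless of how fast the network is.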
In my experience, rsync is a very good tool for keeping things in sync, but not a very good tool for shipping all the data across as fast as possible. Use it when bandwidth or storage capacity don't leave other options. If you can afford to tar all the files up and transfer them as one blob, you can expect better performance (in overall wall-clock time for the whole operation) once there are enough files, as in the sketch below.
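A minimal sketch of that tar-over-ssh approach (hostname and paths are placeholders; add compression on both ends only if the network, not the disk, is the bottleneck):

    # Stream the whole directory as a single tar blob over ssh,
    # turning thousands of per-file round trips into one sequential stream.
    tar -C /data/src -cf - . | ssh otherserver 'tar -C /data/dst -xf -'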
There is a lot of network/disk overhead when dealing with lots of small files using rsync. With small enough files, your speedup factor may be less than 1.
Pay attention to the speedup factor reported when you use -v. If it is below 1 even when you know both sides are already in sync, you are paying quite a lot of overhead, and the CPU is not the bottleneck.
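For example (host, path and the numbers themselves are only illustrative), the summary rsync prints at the end of a verbose run looks like this, and with lots of tiny files the last figure can drop below 1:

    rsync -av manysmallfiles/ otherserver:/data/
    ...
    sent 123456 bytes  received 7890 bytes  8756.40 bytes/sec
    total size is 98765  speedup is 0.75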
What Janne said: you're I/O bound, not CPU bound. Launch top (or better, atop or htop) and notice how little CPU is actually used while the small files are transferring. Also note that your processes sit in the 'D' state, i.e. uninterruptible sleep, waiting for I/O to complete.
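A quick way to confirm this while the transfer is running (standard procps/sysstat tools, nothing rsync-specific):

    # Processes currently in uninterruptible sleep (state D), i.e. blocked on I/O.
    ps -eo state,pid,comm | awk '$1 == "D"'

    # Per-device utilisation; %util close to 100 means the disks are the bottleneck.
    iostat -x 1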
Additionally, I don't believe rsync is optimized for multi-core; most of what it does is sequential, and it would require very clever work to make it go faster in that respect.
It can, however, take advantage of up to two cores if you use ssh as the transport: ssh is spawned as a separate process and does all of its encryption (and possibly compression) work outside the main rsync process. rsync itself also has a couple of somewhat CPU-intensive tasks, namely checksum calculation and MD5 hashing (I believe that's what it uses).
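If the ssh side does turn out to be the hot spot on a fast link, one common knob is a cheaper cipher and no ssh-level compression (the options below are standard OpenSSH/rsync flags, but treat the specific cipher as an illustration, not a recommendation):

    # Lighter cipher, no ssh compression: compression rarely helps on a fast LAN
    # and just burns CPU in the ssh process.
    rsync -a -e "ssh -c aes128-ctr -o Compression=no" /data/src/ otherserver:/data/dst/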