I'm running a command like this on my 36-core server (EC2 c4.8xlarge, Amazon Linux):
find . -type f | parallel -j 36 mycommand
The number of files to process is ~1,000,000, and it takes dozens of minutes. It should run 36 processes simultaneously. However, according to top, there are about 10 processes at most, and the CPU is about 70% idle. ps shows more processes, but most of them are defunct.
I guessed it was because each mycommand finished so quickly that parallel could not keep up with spawning new processes. So I tried parallel --nice 20 to allocate more CPU time to parallel itself, but this didn't work.
Does anyone have an idea to improve this?
$ parallel --version
GNU parallel 20151022
So you are running around 600 jobs per second. The overhead for a single GNU Parallel job is on the order of 2-5 ms, so once you exceed about 200 jobs per second, GNU Parallel will not perform better without tweaking.
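As a rough check of these figures, the arithmetic can be sketched in shell. The inputs come from the thread (about 1,000,000 files, about 600 jobs per second, up to 5 ms overhead per job); the variable names are only illustrative, and integer division makes the results approximate.

```shell
# Back-of-the-envelope numbers; inputs are taken from the thread.
files=1000000        # ~1,000,000 files (from the question)
rate=600             # ~600 jobs per second (estimate from the answer)
overhead_ms=5        # upper end of the 2-5 ms per-job overhead

# Wall-clock time implied by that rate, in whole minutes.
minutes=$(( files / rate / 60 ))

# Jobs per second one GNU Parallel instance can start at 5 ms per job.
threshold=$(( 1000 / overhead_ms ))

echo "implied runtime: ~${minutes} min at ${rate} jobs/s"
echo "single-instance threshold: ~${threshold} jobs/s"
```

The implied runtime is consistent with the "dozens of minutes" reported in the question, and the threshold matches the 200 jobs per second mentioned above.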
The tweak is to have more parallel instances spawning jobs in parallel; see https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround
With that setup you will have 50 GNU Parallel instances that can each spawn 100 jobs per second.
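A minimal sketch of the nesting idea, adapted to the question's 36 cores rather than copied from the man page: an outer GNU Parallel uses --pipe and --round-robin to hand blocks of file names to several inner GNU Parallel instances, each of which starts the actual jobs. The helper name spawn_nested and the defaults (6 outer instances, 6 jobs each, 1000 records per block) are assumptions to be tuned, not recommended values.

```shell
# Hypothetical helper: spawn_nested CMD [OUTER] [INNER] [RECORDS]
#   CMD      command to run once per input line (e.g. mycommand)
#   OUTER    number of inner GNU Parallel instances (assumed default: 6)
#   INNER    jobs run concurrently by each inner instance (assumed default: 6)
#   RECORDS  input lines handed over per block (assumed default: 1000)
spawn_nested() {
  local cmd=$1 outer=${2:-6} inner=${3:-6} records=${4:-1000}
  # The outer parallel only distributes stdin; the inner ones spawn the jobs.
  parallel --pipe -N "$records" --round-robin -j "$outer" \
    parallel -j "$inner" "$cmd"
}
```

With the question's pipeline this would be used as `find . -type f | spawn_nested mycommand`, giving 6 x 6 = 36 concurrent jobs while spreading the spawning overhead over 6 instances.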
Eh, if I understood your question, you want to process all the files simultaneously?
parallel will launch multiple instances of mycommand, not multiple find instances.
You are trying to open a million files, 36 at a time. Even if your command could run at full power on one CPU, you would still incur the overhead of opening those files in the first place. I/O is one of the most time-expensive operations on a computer. Your best bet would be to load as many of those files as possible into your machine's RAM beforehand, and work in RAM as much as possible. Depending on how much RAM you have, this may improve performance significantly, because once a file has been read, subsequent reads tend to be served from cache if they follow soon after. You may also want to make sure your filesystem lays files out in a cache-efficient way, and that it handles many successive reads well.
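One way to read the "load files into RAM beforehand" suggestion is to read every file once so that later reads are likely served from the operating system's page cache. This is only a sketch under that assumption: the helper name warm_cache is hypothetical, it helps only if the files fit in free memory, and it relies on the -print0 and -0 options of find and xargs.

```shell
# Hypothetical helper: warm_cache [DIR]
# Reads every regular file under DIR (default: current directory) once and
# discards the data, leaving the contents in the page cache if RAM allows.
warm_cache() {
  find "${1:-.}" -type f -print0 | xargs -0 cat > /dev/null
}
```

Running `warm_cache .` before the main pipeline would show whether disk reads are the bottleneck at all: if the runtime barely changes, the time is being spent elsewhere.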
I don't think parallel is going to help you much with this refactoring.