I am downloading files in parallel into a directory; they are fastq files, which contain a very large amount of data, and I am running out of space quickly. So I got the following script (from here, modified slightly) to compress files as they are downloaded:
inotifywait -m ./ -e create -e moved_to |
while read -r dir action filepath; do
    echo "The file '$filepath' appeared in directory '$dir' via '$action'"
    # compress file (quote the name in case of unusual characters)
    if [[ "$filepath" =~ .*fastq$ ]]; then
        pigz --best "$filepath"
    fi
done
This helped, in that I now run out of hard-drive space later than I used to, but I'm still downloading files faster than I'm compressing them. Is there a way to parallelize the compression so that I'm compressing multiple files at the same time? (I'm assuming the above code doesn't do that.)
One way I can think of to accomplish this is to run the script multiple times from different terminals, but I'm pretty sure that's a very lousy way of doing it.
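I was also imagining something like the untested sketch below, where each pigz call is put in the background and the number of simultaneous compressions is capped (MAXJOBS is just a name I made up, and wait -n needs bash 4.3+), but I don't know if this is the right approach:

MAXJOBS=4   # made-up limit; tune to your CPU and disk

inotifywait -m ./ -e create -e moved_to |
while read -r dir action filepath; do
    if [[ "$filepath" =~ .*fastq$ ]]; then
        # if we're already at the limit, wait for one background compression to finish
        while (( $(jobs -rp | wc -l) >= MAXJOBS )); do
            wait -n
        done
        # compress in the background so the loop can keep picking up new files
        pigz --best "$filepath" &
    fi
done

(I also realize pigz already uses multiple threads on a single file, so maybe running several instances only helps when one file can't keep all the cores busy?)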
I made something for you; I've named it Cerberus, after the guard dog.
https://pastebin.com/yiqajYfT
Your downloaded filenames must not contain spaces; if they do at download time, rename them to remove the spaces, or they will not be detected.
Compile it with gcc -o cerberus cerberus.c.
You'll need a subdirectory into which the compressed files will go. Your original files will be removed after compression; if you don't want this to happen, comment out line 63. You can change the compression (work) directory name, the compression program, and the compressed-file extension in the definitions section, lines 9-11. If your filenames are longer than 100 characters, increase MAXNAME on line 12.
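Putting that together, the steps are roughly as follows; the directory name below is only a placeholder for whatever is defined in the definitions section, and the run command assumes cerberus takes no arguments, so check the source before copying this:

gcc -o cerberus cerberus.c     # build
mkdir workdir                  # placeholder; use the work-directory name from line 9 of cerberus.c
./cerberus                     # assumed invocation; start it in your download directory and leave it watching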
good luck!