To delete files recursively on our IBM GPFS cluster, we use a plain Unix command such as:
rm /my/directories -fr
However, these deletions take a very long time.
The problem is that our distributed (Spark-based) jobs take about an hour to run,
and then it takes roughly another hour
to remove the temporary files they generate.
So the overall workload is very inefficient. Maybe this is because the rm
command has to list every subdirectory.
Do you know of a way to efficiently delete an entire directory (and its subdirectories) on GPFS?
Does IBM perhaps provide a special command for this?
I don’t think you can speed this up much, because “rm” triggers a lot of metadata updates on a distributed file system, and those take quite some time to complete. What you can try is to “mv” the directory to a temp folder within the same file system (!!!) and run the actual “rm” in the background.
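The move-then-delete idea could be sketched roughly as below. This is only an illustration: fast_drop and the ".trash.XXXXXX" naming are hypothetical, and the staging directory is created next to the target on the assumption that the parent directory lives in the same file system, so that mv is a rename rather than a copy.

```shell
#!/bin/sh
# Sketch: rename a directory out of the way, then delete it in the background.
# fast_drop and the ".trash.XXXXXX" template are hypothetical names.
fast_drop() {
    target=$1
    parent=$(dirname -- "$target")
    # Create the staging directory beside the target so the mv below stays
    # inside one file system (a rename, not a copy followed by a delete).
    trash=$(mktemp -d "$parent/.trash.XXXXXX") || return 1
    mv -- "$target" "$trash/" || return 1
    # The slow recursive delete runs in the background; the caller returns at once.
    rm -rf -- "$trash" &
}

# Example call (commented out so the sketch has no side effects):
#   fast_drop /my/directories
```

The job no longer waits for the delete, but the total metadata work is unchanged; it is only moved off the critical path.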
You can use a GPFS policy, which is much faster than 'rm'.
Here is an example. Say I want to remove all files under /gpfs2/mysql/performance_schema/
The policy file is:
RULE 'my_del' DELETE DIRECTORIES_PLUS WHERE PATH_NAME LIKE '/gpfs2/mysql/performance_schema/%'
Then I can run the policy with:
mmapplypolicy /gpfs2/mysql -P del.pol
You can refer to these two links for some explanation about policy and the DELETE rule:
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adv_polextip.htm
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adv_rule_syntaxdiagrams.htm
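Since a DELETE rule is destructive, it may be worth generating the policy file from a script and evaluating it in test mode first. The sketch below assumes the same rule as above; write_delete_policy is a hypothetical helper, and the "-I test" run (evaluate the rules without deleting anything) should be checked against the mmapplypolicy documentation for your release.

```shell
#!/bin/sh
# Sketch: write a one-rule DELETE policy for a directory tree.
# write_delete_policy is a hypothetical helper, not a GPFS command.
write_delete_policy() {
    dir=${1%/}          # drop a trailing slash, if any
    policy_file=$2
    # %s is the directory; %% prints the literal % wildcard used by LIKE.
    printf "RULE 'my_del' DELETE DIRECTORIES_PLUS WHERE PATH_NAME LIKE '%s/%%'\n" \
        "$dir" > "$policy_file"
}

# Usage on a GPFS node (commented out; needs a real cluster):
#   write_delete_policy /gpfs2/mysql/performance_schema del.pol
#   mmapplypolicy /gpfs2/mysql -P del.pol -I test   # dry run, nothing is deleted
#   mmapplypolicy /gpfs2/mysql -P del.pol           # actual deletion
```

Note that in a LIKE pattern "_" matches any single character as well as a literal underscore, so a pattern built from a path such as performance_schema can match slightly more than intended.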
Actually there is an 'mmfind' tool under /usr/lpp/mmfs/samples/ilm. You first need to compile mmfindUtil_processOutputFile with: make -f mmfindUtil_processOutputFile.sampleMakefile
mmfind has the same syntax as 'find', but it uses the GPFS policy engine, so it runs much faster than find on a GPFS file system. For example, you can use: mmfind sub1/ | xargs rm -f to remove the files.
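One caveat with piping a file list into plain xargs is that names containing spaces or quotes get split or misparsed. A possible workaround, assuming the listing prints one path per line and no path contains a newline, is sketched here; delete_listed is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: delete every path read from stdin, one path per line.
# delete_listed is a hypothetical helper; paths with embedded newlines are
# not handled.
delete_listed() {
    # Convert newlines to NUL bytes so xargs -0 treats each line as exactly
    # one argument, whatever spaces or quotes it contains.
    tr '\n' '\0' | xargs -0 rm -f --
}

# Usage on a GPFS node (commented out; needs mmfind to be built):
#   mmfind sub1/ | delete_listed
```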
You may also follow me at @guanglei_li and you may get additional support at "https://www.ibm.com/mysupport/s/".