Is there a smart way of deleting old files from the HDFS /tmp directory? (To be clear, I am not talking about the Unix filesystem's /tmp.)
hadoop fs -stat "%Y" "/path/*"
This outputs the modification time (in milliseconds since the epoch, UTC) of everything in /path/. Combine that with a cutoff for what you consider too young, and you can do the cleanup in a shell script kicked off by cron. This might be smarter than parsing other output from hadoop fs.
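For illustration, the cutoff idea could be sketched roughly like this. It is only a sketch under a few assumptions: the helper name select_older_than is made up for this example, the format string is extended to "%Y %n" so that each line carries a name as well as a timestamp, and %n prints only the file name, so the directory has to be prepended again before deleting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): reads "<mtime_ms> <name>" lines
# on stdin and prints the names whose modification time is older than
# <max_age_days>, measured from <now_ms> (current time in milliseconds).
select_older_than() {
  local max_age_days=$1
  local now_ms=$2
  local cutoff_ms=$(( now_ms - max_age_days * 24 * 60 * 60 * 1000 ))
  local mtime_ms name
  while read -r mtime_ms name; do
    # Skip blank or malformed lines that carry no name.
    [ -n "$name" ] || continue
    if [ "$mtime_ms" -lt "$cutoff_ms" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Possible use from a cron-driven script (assumes a 7-day cutoff; the echo
# keeps this a dry run until the output has been checked):
#
#   now_ms=$(( $(date +%s) * 1000 ))
#   hadoop fs -stat "%Y %n" "/tmp/*" \
#     | select_older_than 7 "$now_ms" \
#     | while read -r name; do echo hadoop fs -rm -r "/tmp/$name"; done
```

Names with leading or trailing whitespace would not survive the read, so it is worth checking the dry-run output before removing the echo.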
Here's (the source code of) a small tool that does the job: https://github.com/mag-/hdfs-cleanup/
I might write one on my own (or port the given one to Python) so I don't need to create a build chain for Golang in my company.
And one more for Ruby users: https://github.com/nmilford/clean-hadoop-tmp