We have a bucket with more than 500,000 objects in it.
I'm assigned a job where I have to delete files that have a specific prefix. There are around 300,000 files with the given prefix in the bucket.
For example, if there are 3 files:
abc_1file.txt
abc_2file.txt
abc_1newfile.txt
I have to delete only the files with the abc_1 prefix. I didn't find much in the AWS documentation about this.
Any suggestions on how I can automate this?
You can use the aws s3 rm command with the --include and --exclude parameters to specify a pattern for the files you'd like to delete. So in your case, the command would be:
aws s3 rm s3://bucket/ --recursive --exclude "*" --include "abc_1*"
which will delete all files that match the "abc_1*" pattern in the bucket.
The behavior of these parameters is documented here.
These instructions assume you have downloaded, installed, and configured the AWS CLI tools.
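If you'd rather script this than use the CLI (e.g. from a Python program), the same idea can be sketched with boto3. The bucket name and the boto3 calls below are assumptions shown in comments only; the filtering and batching helpers are plain Python and reflect the fact that delete_objects accepts at most 1000 keys per request.

```python
import fnmatch

def matching_keys(keys, pattern):
    """Return the keys that match a shell-style pattern, e.g. 'abc_1*'."""
    return [k for k in keys if fnmatch.fnmatch(k, pattern)]

def batches(keys, size=1000):
    """Split keys into chunks of at most `size` (the delete_objects limit)."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

# Demo on the three files from the question:
keys = ["abc_1file.txt", "abc_2file.txt", "abc_1newfile.txt"]
print(matching_keys(keys, "abc_1*"))  # ['abc_1file.txt', 'abc_1newfile.txt']

# With boto3 it might look like this (hypothetical bucket name, not run here):
# s3 = boto3.client("s3")
# paginator = s3.get_paginator("list_objects_v2")
# all_keys = []
# for page in paginator.paginate(Bucket="bucket", Prefix="abc_1"):
#     all_keys += [obj["Key"] for obj in page.get("Contents", [])]
# for batch in batches(all_keys):
#     s3.delete_objects(Bucket="bucket",
#                       Delete={"Objects": [{"Key": k} for k in batch]})
```

Listing with Prefix="abc_1" already narrows the server-side scan, so fnmatch is only needed if your pattern is more complex than a plain prefix.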
As a complement to @sippybear's excellent answer, I would recommend the following if somebody has a bucket with a trillion objects and the pattern of the files one wants to delete includes "parent directories", e.g. 'my/path/to/topdir/abc_1*':
aws s3 rm s3://bucket/my/path/to/topdir/ --recursive --dryrun --exclude "*" --include "abc_1*"
Why?
Putting the path in the S3 URI means the command only has to list the objects under that prefix, rather than iterate over every object in the bucket.
--dryrun shows what would be deleted without actually deleting anything. Always run the command with --dryrun first, even if you promptly interrupt it (ctrl-C); typos and other accidents happen, and errors when deleting large numbers of files in a bucket can be very regrettable (even if you have proper backups)... Once you're happy with what you see is about to be deleted, then remove the --dryrun.