I'd like to graph the size (in bytes, and # of items) of an Amazon S3 bucket and am looking for an efficient way to get the data.
The s3cmd tools provide a way to get the total file size using `s3cmd du s3://bucket_name`, but I'm worried about its ability to scale since it looks like it fetches data about every file and calculates its own sum. Since Amazon charges users in GB-Months, it seems odd that they don't expose this value directly.
Although Amazon's REST API returns the number of items in a bucket, s3cmd doesn't seem to expose it. I could do `s3cmd ls -r s3://bucket_name | wc -l`, but that seems like a hack.
The Ruby AWS::S3 library looked promising, but only provides the # of bucket items, not the total bucket size.
Is anyone aware of any other command line tools or libraries (prefer Perl, PHP, Python, or Ruby) which provide ways of getting this data?
This can now be done trivially with just the official AWS command line client:
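For example, something along these lines (the bucket name is a placeholder):

```bash
aws s3 ls --summarize --human-readable --recursive s3://bucket-name/
```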
Official Documentation: AWS CLI Command Reference (version 2)
This also accepts path prefixes if you don't want to count the entire bucket:
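For instance (bucket and prefix here are placeholders):

```bash
aws s3 ls --summarize --human-readable --recursive s3://bucket-name/some/prefix/
```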
The AWS CLI now supports the `--query` parameter, which takes JMESPath expressions. This means you can sum the size values given by `list-objects` using `sum(Contents[].Size)` and count them with `length(Contents[])`. This can be run using the official AWS CLI as below; support was introduced in Feb 2014.
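A sketch of such a call (the bucket name is a placeholder):

```bash
aws s3api list-objects --bucket bucket-name --output json \
  --query "[sum(Contents[].Size), length(Contents[])]"
```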
AWS Console:
As of the 28th of July 2015 you can get this information via CloudWatch. If you want a GUI, go to the CloudWatch console: (Choose Region > ) Metrics > S3
AWS CLI Command:
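A sketch of that call, using `aws cloudwatch get-metric-statistics` (the dates and region are example values; `toukakoukan.com` is the example bucket name referenced below):

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=toukakoukan.com Name=StorageType,Value=StandardStorage \
  --start-time 2015-07-15T10:00:00 \
  --end-time 2015-07-31T01:00:00 \
  --period 86400 \
  --statistics Average \
  --unit Bytes \
  --region us-east-1
```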
This is much quicker than some of the other commands posted here, as it does not query the size of each file individually to calculate the sum.
Important: You must specify both StorageType and BucketName in the dimensions argument, otherwise you will get no results. All you need to change is the `--start-time`, `--end-time`, and `Value=toukakoukan.com`. Here's a bash script you can use to avoid having to specify `--start-time` and `--end-time` manually.
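A minimal sketch of such a script (the argument handling, the default region, and the use of GNU `date` are assumptions, not the original poster's exact script):

```bash
#!/usr/bin/env bash
# Usage: ./bucket-size.sh <bucket-name> [region]
# Queries the daily BucketSizeBytes metric for the last 24 hours.
bucket="$1"
region="${2:-us-east-1}"

# GNU date assumed; computes "now" and "24 hours ago" in UTC.
now=$(date -u +%Y-%m-%dT%H:%M:%S)
yesterday=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)

aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
  --start-time "$yesterday" \
  --end-time "$now" \
  --period 86400 \
  --statistics Average \
  --unit Bytes \
  --region "$region"
```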
s3cmd can do this:
s3cmd du s3://bucket-name
If you download a usage report, you can graph the daily values for the `TimedStorage-ByteHrs` field. If you want that number in GiB, just divide by `1024 * 1024 * 1024 * 24` (that's GiB-hours for a 24-hour cycle). If you want the number in bytes, just divide by 24 and graph away.
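A quick arithmetic sketch of that conversion (the `byte_hours` argument is a placeholder for one day's `TimedStorage-ByteHrs` value):

```bash
#!/usr/bin/env bash
# Usage: ./convert.sh <timed-storage-byte-hrs>
byte_hours="$1"

# Average bytes stored over that 24-hour day
echo "$byte_hours / 24" | bc

# The same figure in GiB
echo "scale=2; $byte_hours / (1024 * 1024 * 1024 * 24)" | bc
```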
If you want to get the size from the AWS Console, open the bucket's Metrics tab in the S3 console: by default you should see the Total bucket size metric at the top.
Using the official AWS s3 command line tools:
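For instance, one way to total the sizes from a plain recursive listing (the bucket name is a placeholder, and the `awk` field assumes the default `aws s3 ls` output of date, time, size, key):

```bash
aws s3 ls s3://bucket-name --recursive \
  | awk 'BEGIN {total=0} {total+=$3} END {printf "%.2f MB\n", total/1024/1024}'
```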
This is a better command: just add the following 3 parameters `--summarize --human-readable --recursive` after `aws s3 ls`. `--summarize` is not required, but it adds a nice touch with the total size.

s4cmd is the fastest way I've found (a command-line utility written in Python):
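It is available on PyPI, so installation is typically:

```bash
pip install s4cmd
```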
Now to calculate the entire bucket size using multiple threads:
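A sketch of that call (assuming `-r` is s4cmd's recursive flag; the bucket name is a placeholder):

```bash
s4cmd du -r s3://bucket-name
```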
You can use the s3cmd utility, e.g.:
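For example (the bucket name is a placeholder; `-H` asks s3cmd for human-readable sizes, assuming your version supports it):

```bash
s3cmd du -H s3://bucket-name
```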
So, trolling around through the API and playing with some sample queries, S3 will produce the entire contents of a bucket in one request and it doesn't need to descend into directories. The results then just require summing the various XML elements, rather than repeated calls. I don't have a sample bucket that has thousands of items, so I don't know how well it will scale, but it seems reasonably simple.