I'd like to graph the size (in bytes, and # of items) of an Amazon S3 bucket and am looking for an efficient way to get the data.
The s3cmd tools provide a way to get the total file size using `s3cmd du s3://bucket_name`, but I'm worried about its ability to scale since it looks like it fetches data about every file and calculates its own sum. Since Amazon charges users in GB-Months, it seems odd that they don't expose this value directly.
Although Amazon's REST API returns the number of items in a bucket, s3cmd doesn't seem to expose it. I could do `s3cmd ls -r s3://bucket_name | wc -l`, but that seems like a hack.
The Ruby AWS::S3 library looked promising, but only provides the # of bucket items, not the total bucket size.
Is anyone aware of any other command line tools or libraries (prefer Perl, PHP, Python, or Ruby) which provide ways of getting this data?
This can now be done trivially with just the official AWS command line client:
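For example, something along these lines (the bucket name is a placeholder):

```bash
aws s3 ls --summarize --human-readable --recursive s3://bucket-name/
```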
Official Documentation: AWS CLI Command Reference (version 2)
This also accepts path prefixes if you don't want to count the entire bucket:
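For instance (bucket and prefix here are placeholders):

```bash
aws s3 ls --summarize --human-readable --recursive s3://bucket-name/some/prefix/
```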
The AWS CLI now supports the `--query` parameter, which takes JMESPath expressions. This means you can sum the size values given by `list-objects` using `sum(Contents[].Size)` and count them with `length(Contents[])`. This can be run using the official AWS CLI as below; support was introduced in Feb 2014.
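A sketch of such a call (the bucket name is a placeholder):

```bash
aws s3api list-objects --bucket bucket-name --output json \
  --query "[sum(Contents[].Size), length(Contents[])]"
```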
AWS Console:
As of the 28th of July 2015 you can get this information via CloudWatch. If you want a GUI, go to the CloudWatch console: (Choose Region > ) Metrics > S3
AWS CLI Command:
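A sketch of that call, using `aws cloudwatch get-metric-statistics` (the dates and region are example values; `toukakoukan.com` is the example bucket name referenced below):

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=toukakoukan.com Name=StorageType,Value=StandardStorage \
  --start-time 2015-07-15T10:00:00 \
  --end-time 2015-07-31T01:00:00 \
  --period 86400 \
  --statistics Average \
  --unit Bytes \
  --region us-east-1
```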
This is much quicker than some of the other commands posted here, as it does not query the size of each file individually to calculate the sum.
Important: You must specify both StorageType and BucketName in the dimensions argument, otherwise you will get no results. All you need to change is the `--start-time`, `--end-time`, and `Value=toukakoukan.com`. Here's a bash script you can use to avoid having to specify `--start-time` and `--end-time` manually.
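A minimal sketch of such a script (the argument handling, the default region, and the use of GNU `date` are assumptions, not the original poster's exact script):

```bash
#!/usr/bin/env bash
# Usage: ./bucket-size.sh <bucket-name> [region]
# Queries the daily BucketSizeBytes metric for the last 24 hours.
bucket="$1"
region="${2:-us-east-1}"

# GNU date assumed; computes "now" and "24 hours ago" in UTC.
now=$(date -u +%Y-%m-%dT%H:%M:%S)
yesterday=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)

aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
  --start-time "$yesterday" \
  --end-time "$now" \
  --period 86400 \
  --statistics Average \
  --unit Bytes \
  --region "$region"
```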
s3cmd can do this:
s3cmd du s3://bucket-name
If you download a usage report, you can graph the daily values for the `TimedStorage-ByteHrs` field. If you want that number in GiB, just divide by `1024 * 1024 * 1024 * 24` (that's GiB-hours for a 24-hour cycle). If you want the number in bytes, just divide by 24 and graph away.
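A quick arithmetic sketch of that conversion (the `byte_hours` argument is a placeholder for one day's `TimedStorage-ByteHrs` value):

```bash
#!/usr/bin/env bash
# Usage: ./convert.sh <timed-storage-byte-hrs>
byte_hours="$1"

# Average bytes stored over that 24-hour day
echo "$byte_hours / 24" | bc

# The same figure in GiB
echo "scale=2; $byte_hours / (1024 * 1024 * 1024 * 24)" | bc
```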
If you want to get the size from the AWS Console, open the bucket's Metrics tab in the S3 console: by default you should see the Total bucket size metric at the top.
Using the official AWS s3 command line tools:
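For instance, one way to total the sizes from a plain recursive listing (the bucket name is a placeholder, and the `awk` field assumes the default `aws s3 ls` output of date, time, size, key):

```bash
aws s3 ls s3://bucket-name --recursive \
  | awk 'BEGIN {total=0} {total+=$3} END {printf "%.2f MB\n", total/1024/1024}'
```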
This is a better command: just add the following 3 parameters `--summarize --human-readable --recursive` after `aws s3 ls`. `--summarize` is not required, but it adds a nice touch with the total size.

s4cmd is the fastest way I've found (a command-line utility written in Python):
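It is available on PyPI, so installation is typically:

```bash
pip install s4cmd
```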
Now to calculate the entire bucket size using multiple threads:
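A sketch of that call (assuming `-r` is s4cmd's recursive flag; the bucket name is a placeholder):

```bash
s4cmd du -r s3://bucket-name
```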
You can use the s3cmd utility, e.g.:
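For example (the bucket name is a placeholder; `-H` asks s3cmd for human-readable sizes, assuming your version supports it):

```bash
s3cmd du -H s3://bucket-name
```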
So, trolling around through the API and playing with some sample queries, S3 will produce the entire contents of a bucket in one request and it doesn't need to descend into directories. The results then just require summing the various XML elements, rather than repeated calls. I don't have a sample bucket that has thousands of items, so I don't know how well it will scale, but it seems reasonably simple.