Similar to what is described in this article[0], the company I work for uses a bastion AWS account to store IAM users and other AWS accounts to separate different running environments (prod, dev, etc.). This matters because we have multiple AWS accounts, and in a few cases several of them need access to a single S3 bucket.
One way to enable this is to set a bucket policy that allows access to the bucket through the S3 endpoint of a particular AWS account's VPC.
Bucket policy for data-warehouse:
{
    "Sid": "access-from-dev-VPCE",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
        "arn:aws:s3:::data-warehouse",
        "arn:aws:s3:::data-warehouse/*"
    ],
    "Condition": {
        "StringEquals": {
            "aws:sourceVpce": "vpce-d95b05b0"
        }
    }
}
Role policy for role EMRRole:
{
    "Sid": "AllowRoleToListBucket",
    "Effect": "Allow",
    "Action": "s3:ListBucket",
    "Resource": [
        "arn:aws:s3:::data-warehouse"
    ]
},
{
    "Sid": "AllowRoleToGetBucketObjects",
    "Effect": "Allow",
    "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
    ],
    "Resource": "arn:aws:s3:::data-warehouse/*"
}
Unfortunately, this doesn't work until I explicitly set the ACL on each object to grant full control of that object to the owner of the AWS account I'm accessing it from. If I don't, I get:
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
The instance I'm running this on (EMR) has the correct role:
[hadoop@ip-10-137-221-91 tmp]$ aws sts get-caller-identity
{
    "Account": "1234567890",
    "UserId": "AROAIGVIL6ZDI6SR87KXO:i-0eaf8a5ca52876835",
    "Arn": "arn:aws:sts::1234567890:assumed-role/EMRRole/i-0eaf8a5ca52876835"
}
The ACL for an object in the data-warehouse bucket looks like this:
aws s3api get-object-acl --bucket=data-warehouse --key=content_category/build=2017-11-23/part0000.gz.parquet
{
    "Owner": {
        "DisplayName": "aws+dev",
        "ID": "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"
    },
    "Grants": [
        {
            "Grantee": {
                "Type": "CanonicalUser",
                "DisplayName": "aws+dev",
                "ID": "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"
            },
            "Permission": "FULL_CONTROL"
        }
    ]
}
In the above ACL, the dev AWS account will be able to read the object, but another AWS account, say prod, will not be able to read it until it has been added as a "Grantee".
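The per-object workaround mentioned earlier amounts to rewriting each object's ACL. A rough sketch with the AWS CLI follows. The function name and all four arguments are placeholders, and because put-object-acl replaces the whole ACL, the existing owner grant is passed again alongside the new read grant.

```shell
# Sketch only: grant_read_to_account is a hypothetical helper, not an AWS command.
# $1 = bucket, $2 = object key,
# $3 = canonical user ID of the current object owner (keeps its FULL_CONTROL grant),
# $4 = canonical user ID of the account that should gain read access.
grant_read_to_account() {
  aws s3api put-object-acl \
    --bucket "$1" \
    --key "$2" \
    --grant-full-control "id=$3" \
    --grant-read "id=$4"
}

# Example call with placeholder IDs (requires credentials for the object owner):
# grant_read_to_account data-warehouse content_category/build=2017-11-23/part0000.gz.parquet "$OWNER_ID" "$PROD_ID"
```

This has to be repeated for every object, which is exactly the overhead the question is trying to avoid.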
My question: Is there a way to read/write objects to an S3 bucket from multiple AWS accounts without having to set ACLs on each individual object?
Note: we use Spark to write to S3 using s3a.
While I have not found a way around setting ACLs on a per-object basis, there is a way to enforce that ACLs are set correctly on upload using a bucket policy. This example policy shows how to allow an AWS account to upload objects to your bucket while requiring that the bucket owner is granted full control of all uploaded objects:
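A policy of the shape described might look like the following sketch. The account ID 111111111111 and the Sid values are placeholders, and the bucket name is borrowed from the question; the only parts taken from the description are the s3:PutObject allow and the explicit deny on uploads that lack the bucket-owner-full-control canned ACL.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUploadsFromOtherAccount",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::data-warehouse/*"
    },
    {
      "Sid": "DenyUploadsWithoutBucketOwnerFullControl",
      "Effect": "Deny",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::data-warehouse/*",
      "Condition": {
        "StringNotEquals": { "s3:x-amz-acl": "bucket-owner-full-control" }
      }
    }
  ]
}
```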
The key is the explicit deny, which checks for the x-amz-acl: bucket-owner-full-control header (mentioned by Michael-sqlbot in the comments) and fails any upload where it is not set. When using the AWS CLI to upload files, this requires the --acl bucket-owner-full-control flag to be set. Example:
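A minimal sketch of such an upload, wrapped in a small shell function so the flag is always passed. The function name and the paths in the usage comment are hypothetical; the only part taken from the answer is the --acl bucket-owner-full-control flag on aws s3 cp.

```shell
# Sketch only: upload_with_owner_control is a hypothetical wrapper name.
# $1 = local file, $2 = s3:// destination.
upload_with_owner_control() {
  # The canned ACL grants the bucket owner FULL_CONTROL on the new object,
  # which also satisfies a deny statement keyed on s3:x-amz-acl.
  aws s3 cp "$1" "$2" --acl bucket-owner-full-control
}

# Example call with placeholder paths (requires AWS credentials):
# upload_with_owner_control part0000.gz.parquet s3://data-warehouse/content_category/build=2017-11-23/part0000.gz.parquet
```

Uploads made without the flag would be rejected by a deny statement of the kind described above, so forgetting it surfaces as an immediate error rather than as an unreadable object later.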
Hopefully AWS will provide a way to address ACLs more gracefully at some point.