Initially I launched a brand-new Windows Server 2016 EC2 instance. I assigned an IAM role with full S3 admin access to this instance when launching it. I installed the AWS CLI on it, opened a CMD window, and typed "aws s3 ls". It listed all my buckets. Everything worked fine.
I then created an AMI from this instance and launched a new instance from that AMI with the same S3 full admin IAM role. "aws s3 ls" still worked.
Then, after a number of days, when I repeat the above process (launching an instance from the same AMI), "aws s3 ls" stops working with the following error:
Unable to locate credentials. You can configure credentials by running "aws configure".
It has happened many times. Every time I build a new Windows Server, install the CLI, and assign the S3 full admin role to the instance, it works. After a number of days, when I launch a new instance from the exact same AMI, "aws s3 ls" stops working.
It is so mysterious! Can someone shed some light on this please?
The IAM role does not get bundled with your AMI.
So when you launch new EC2 instances from your AMI, you must assign the IAM role to each of them, either at launch or afterwards. Without a role attached, the CLI cannot find any credentials.
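A quick way to tell a missing role apart from a connectivity problem is to ask the instance metadata service (IMDS) at 169.254.169.254 directly, since that is where the CLI fetches role credentials. Below is a minimal sketch, assuming Python 3.9+ is installed on the instance (it is not there by default on Windows Server). A 404 on the credentials path means IMDS answered but no role is attached; a refused or timed-out connection suggests IMDS is unreachable. The function name `diagnose_role` and its return strings are my own illustration, not an AWS API.

```python
import urllib.error
import urllib.request

# Instance metadata service (IMDS); the CLI fetches role credentials from here.
IMDS_BASE = "http://169.254.169.254"

# IMDS must be reached directly, so ignore any proxy environment variables.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def diagnose_role(base_url: str = IMDS_BASE, timeout: float = 2.0) -> str:
    """Return "imds-unreachable", "no-role-attached", or "role:<name>"."""
    # IMDSv2: request a short-lived session token with a PUT.
    token_req = urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with _OPENER.open(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
    except urllib.error.HTTPError:
        raise  # IMDS answered with an unexpected status; surface it
    except OSError:
        # Connection refused or timed out, e.g. no usable route to IMDS.
        return "imds-unreachable"

    # List the role attached to this instance (404 if there is none).
    cred_req = urllib.request.Request(
        f"{base_url}/latest/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with _OPENER.open(cred_req, timeout=timeout) as resp:
            role = resp.read().decode().strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "no-role-attached"
        raise
    return f"role:{role}" if role else "no-role-attached"
```

On the affected instance, `print(diagnose_role())` should report which case you are in. If it says "no-role-attached", attach the role as described above; if it says "imds-unreachable", the routing problem described in the other answers is a more likely cause.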
I had the same problem recently with a Windows EC2 instance deployed from my own AMI. As @Kugel (https://serverfault.com/users/138998/kugel) mentioned in their answer (the previous answer to this question), when you run the Get-NetRoute command you can see that the NextHop values do not change in the new instance: the instance keeps the values from the network configuration of the base instance the AMI was generated from.
To resolve this issue, I created new net routes with the new NextHop values and deleted the old ones. Taking the output given by @Kugel as an example, the new routes are created as follows:
Repeat the previous command for the rest of the destination prefixes (169.254.169.123/32, 169.254.169.249/32, 169.254.169.250/32, 169.254.169.251/32 and 169.254.169.253/32).
Once the new net routes have been created, you can delete the old ones with the command:
This command deletes every route whose NextHop is that IP.
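The create-then-delete sequence above can be sketched as a small Python helper that prints the PowerShell commands to run. This is only an illustration under stated assumptions: I am assuming the first (omitted) command covered 169.254.169.254/32, the metadata address, since it is the one missing from the list above; the interface index and both gateway addresses are hypothetical inputs you would read from your own Get-NetRoute output; and `route_fix_commands` is a name I made up. It uses only the standard New-NetRoute and Remove-NetRoute parameters (-DestinationPrefix, -InterfaceIndex, -NextHop, -Confirm).

```python
import ipaddress

# Link-local AWS prefixes from this answer. 169.254.169.254/32 is ASSUMED to be
# the prefix handled by the first command, which is not shown above.
MAGIC_PREFIXES = [
    "169.254.169.254/32",
    "169.254.169.123/32",
    "169.254.169.249/32",
    "169.254.169.250/32",
    "169.254.169.251/32",
    "169.254.169.253/32",
]


def route_fix_commands(interface_index: int, new_hop: str, old_hop: str) -> list[str]:
    """Build PowerShell commands: add routes via new_hop, then drop old_hop routes."""
    # Validate the addresses early; ip_address raises ValueError on a typo.
    new_ip = ipaddress.ip_address(new_hop)
    old_ip = ipaddress.ip_address(old_hop)
    if new_ip == old_ip:
        return []  # routes already use the current gateway; nothing to fix
    commands = [
        f"New-NetRoute -DestinationPrefix {prefix} "
        f"-InterfaceIndex {interface_index} -NextHop {new_ip}"
        for prefix in MAGIC_PREFIXES
    ]
    # Removing by NextHop deletes every route that still uses the old gateway.
    commands.append(f"Remove-NetRoute -NextHop {old_ip} -Confirm:$false")
    return commands


# Hypothetical values: interface 6, new gateway 10.2.6.1, stale gateway 10.2.4.1.
for cmd in route_fix_commands(6, "10.2.6.1", "10.2.4.1"):
    print(cmd)
```

Creating the new routes before removing the old ones matches the order used in this answer. Double-check the printed commands against your own route table before running them.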
References for commands:
Set new route: https://docs.microsoft.com/en-us/powershell/module/nettcpip/set-netroute?view=win10-ps
Delete routes: https://docs.microsoft.com/en-us/powershell/module/nettcpip/remove-netroute?view=win10-ps
I found this question while having the same issue. What ended up solving it for me is this answer, which I have quoted below so as not to leave an answer that is just a link.
As the top comment on that answer points out:
This happens to me all the time with new Windows 2019 instances. I create an instance, configure it, snapshot it, and then when I launch the AMI in a different subnet or AZ, the IAM roles are broken.
What I found is that the routing gets horribly broken. You see, AWS uses magic IPs set up on the machine to retrieve various things, including IAM credentials.
When you snapshot a machine, the Windows route table gets snapshotted with it, and when you launch an AMI, the routing table is not modified and ends up pointing to the wrong gateway.
Here's my example:
Please note how the default gateway is 10.2.6.1 and is correct for this subnet. However, all the magic routes still point to the previous gateway, 10.2.4.1, from the old AZ. To fix this, you have to fix the routes manually. If you have this in an Auto Scaling group, you might have to write a PowerShell script that runs after boot to fix the routes.
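The check such a post-boot script needs to make can be sketched in Python. AWS reserves the subnet's network address + 1 for the VPC router, which is why the correct gateway here is 10.2.6.1 while routes carried over from the old subnet point at 10.2.4.1. This is a minimal sketch under assumptions: the subnet masks are not shown in the example, so /24 is my assumption; treating anything in 169.254.169.0/24 as a "magic" route is also my assumption; and `vpc_router` and `stale_routes` are hypothetical helper names. The (destination, next hop) pairs would come from the DestinationPrefix and NextHop columns of Get-NetRoute.

```python
import ipaddress

# ASSUMPTION: the AWS link-local service addresses all fall inside this range.
MAGIC_RANGE = ipaddress.ip_network("169.254.169.0/24")


def vpc_router(subnet_cidr: str) -> str:
    """AWS reserves the subnet's network address + 1 for the VPC router."""
    net = ipaddress.ip_network(subnet_cidr, strict=False)
    return str(net.network_address + 1)


def stale_routes(routes: list[tuple[str, str]], subnet_cidr: str) -> list[tuple[str, str]]:
    """Return (destination, next_hop) pairs for magic routes that do not
    use the current subnet's router as their next hop."""
    gateway = vpc_router(subnet_cidr)
    stale = []
    for destination, next_hop in routes:
        dest_net = ipaddress.ip_network(destination, strict=False)
        if dest_net.version == 4 and dest_net.subnet_of(MAGIC_RANGE) and next_hop != gateway:
            stale.append((destination, next_hop))
    return stale


# Values modelled on the example above, assuming a /24 subnet.
routes = [
    ("0.0.0.0/0", "10.2.6.1"),
    ("169.254.169.254/32", "10.2.4.1"),
    ("169.254.169.123/32", "10.2.4.1"),
]
print(vpc_router("10.2.6.0/24"))
print(stale_routes(routes, "10.2.6.0/24"))
```

With these inputs, the default route is left alone and both 169.254.169.x routes are reported as stale, because they still use 10.2.4.1 instead of 10.2.6.1. Only those routes would need to be recreated.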