I am troubleshooting an AWS RDS Postgres instance that AWS has restarted several times over the last few days, very likely due to resource constraints. It's a testing DB that usually doesn't do much, but we recently put a higher load on it. I found that the DB's EBS volume (200 GB gp3) had depleted its throughput burst credits, and the times of the restarts coincided closely with the EBSByteBalance% metric reaching zero. After each restart, the volume apparently gets a fresh set of burst credits, as can be seen in the screenshot below:
The credits now drop a little more slowly because we have eased the load on the DB, but they are still dropping. The current read and write throughput metrics add up to only about 5 to 7 MiB/s, with occasional spikes:
According to the Amazon RDS documentation on DB instance storage, the baseline throughput for a gp3 volume smaller than 400 GB should be 125 MiB/s, which is far above what we are using. So can anyone help me understand why the EBSByteBalance% metric keeps decreasing in this scenario? Thanks!
Okay, I followed @Tim's advice and contacted AWS support. They clarified what was going on.
It turned out that the db.t4g DB instance class also has its own I/O and throughput limit, which in our case was only about 10 MB/s. I was not aware of this and had a very hard time finding these performance numbers online, but for anyone wondering in the future, they can be found here: https://instances.vantage.sh/rds/ AWS support also confirmed that the RDS instance may reboot under resource constraints, and they see this as the obvious explanation for the behaviour we observed.
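To make the mechanism a bit more concrete, here is a minimal sketch that models the instance-level limit as a token bucket: credits refill at a baseline rate, throughput above the baseline spends them, and once the bucket is empty throughput is capped at the baseline. This is an assumption for illustration only. AWS does not document the exact accounting behind EBSByteBalance% here, and the bucket size, burst rate, and the names `BurstBucket` and `minutes_until_empty` are all hypothetical. Note also the unit mix in this thread: the volume baseline and the observed metrics are in MiB/s, while the instance limit was quoted in MB/s (10 MB/s is roughly 9.5 MiB/s).

```python
from dataclasses import dataclass
from typing import Optional

MB = 1_000_000       # decimal megabyte (used for the quoted ~10 MB/s limit)
MIB = 1024 * 1024    # mebibyte (used by the RDS throughput metrics)


@dataclass
class BurstBucket:
    """Hypothetical token-bucket model of an instance-level EBS throughput limit."""
    baseline_bps: float                   # sustained rate, always available (bytes/s)
    burst_bps: float                      # max rate while credits remain (bytes/s), assumed
    capacity_bytes: float                 # bucket size (bytes), assumed value
    balance_bytes: Optional[float] = None  # None means "start with a full bucket"

    def __post_init__(self) -> None:
        if self.balance_bytes is None:
            self.balance_bytes = self.capacity_bytes

    def step(self, demand_bps: float, seconds: float = 60.0) -> float:
        """Advance the model by `seconds` of constant demand; return delivered bytes/s."""
        # While credits remain the instance may burst; once empty it is held to baseline.
        cap = self.burst_bps if self.balance_bytes > 0 else self.baseline_bps
        delivered = min(demand_bps, cap)
        # Credits accrue at the baseline rate and are spent at the delivered rate.
        self.balance_bytes += (self.baseline_bps - delivered) * seconds
        # Clamp to [0, capacity]; this coarse model may overdraw within one step.
        self.balance_bytes = max(0.0, min(self.capacity_bytes, self.balance_bytes))
        return delivered

    @property
    def balance_pct(self) -> float:
        """Analogue of a balance percentage metric in this toy model."""
        return 100.0 * self.balance_bytes / self.capacity_bytes


def minutes_until_empty(bucket: BurstBucket, demand_bps: float,
                        max_minutes: int = 24 * 60) -> Optional[int]:
    """Step minute by minute under constant demand; return when the bucket hits zero."""
    for minute in range(1, max_minutes + 1):
        bucket.step(demand_bps, seconds=60.0)
        if bucket.balance_bytes == 0.0:
            return minute
    return None  # never emptied within max_minutes


if __name__ == "__main__":
    # Assumed numbers: ~10 MB/s baseline (as reported above), made-up burst rate and bucket size.
    def make() -> BurstBucket:
        return BurstBucket(baseline_bps=10 * MB, burst_bps=260 * MB, capacity_bytes=5e9)

    steady = make()
    for _ in range(60):
        steady.step(6 * MIB)  # sustained demand below the baseline
    print(f"6 MiB/s for an hour: balance {steady.balance_pct:.1f}%")

    heavy = make()
    print("20 MiB/s empties the bucket after", minutes_until_empty(heavy, 20 * MIB), "min")
    print("delivered once empty (MB/s):", heavy.step(20 * MIB) / MB)
```

In this model, sustained demand below the baseline never drains the bucket, while any period of demand above it spends credits at the difference between the two rates. The volume's own 125 MiB/s baseline never enters the calculation, which is consistent with the volume not being the bottleneck.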
So the mystery is solved in our case. Hope this helps someone in the future!