Background:
- Postgres 10.9
- DB runs as a docker container on the dev hosts. (t3.large, gp2 500GB storage)
- DB runs in RDS for staging and production. (m5.2xlarge, gp2 1TB storage)
Everything works great, been like that a long time, and my db alter timings seem to always be faster in prod/staging versus dev (as expected).
Problem / Question:
I have a specific index create that is taking 20x longer in RDS (which is more powerful) than it does on the local dev host. Every other case I've seen in the last couple years, the RDS host is faster because it has more compute power and higher I/O speeds.
- Data and schema is identical between the instances. Using pg_dump + pg_restore to load the dev databases with fresh data nightly.
- The table is relatively large (30 million rows) compared to other tables in my db (mostly less than 1mil)
It's a simple index operation:
CREATE INDEX idx_email_records_created ON email_records(created_at);
On the local linux dev box:
db=> CREATE INDEX idx_email_records_created ON email_records(created_at);
CREATE INDEX
Time: 68523.557 ms (01:08.524)
On the RDS host:
db=> CREATE INDEX idx_email_records_created ON email_records(created_at);
CREATE INDEX
Time: 1490902.929 ms (24:50.903)
I've checked all the normal stuff: CPU load (plenty free in all cases), Memory (plenty free in all cases). Locking/table usage, etc.
The dev hosts are restored with a fresh clone of prod db nightly, so there is no discrepancy in the number of rows.
I've checked max_parallel and experimented with things like ALTER TABLE email_records SET (parallel_workers = ##);
but nothing seems to make a difference.
Any help is appreciated
Here is a summary of the root cause:
Timings were gathered after an RDS snapshot restore. Apparently it's normal for the instance to be slow after a fresh restore. I had never hit this issue before, but I've never needed to work with such a large set of data right after a restore before.
In my case, I just did whole table select for the offending table to sort of "pre-pull" the data from the initial restore. Then I ran the create index and it finished blazing fast.
So, in summary, RDS snapshot restore will have a performance lull after a fresh restore.
More info: https://cintia.me/blog/post/lazy-rds/