Greetings,
I am working on a piece of software that logs entries (and related tagging) in a PostgreSQL database for storage and retrieval. We never update any data once it has been inserted; we might remove it when the entry gets too old, but this is done at most once a day. Stored entries can be retrieved by users.
The insertion of new entries can happen rather fast and regularly, thus the database will commonly hold several millions elements.
The tables used are pretty simple : one table for ids, raw content and insertion date; and one table storing tags and their values associated to an id. User search mostly concern tags values, so SELECTs usually consist of JOIN queries on ids on the two tables.
To sum it up :
- 2 tables
- Lots of INSERT
- no UPDATE
- some DELETE, once a day at most
- some user-generated SELECT with JOIN
- huge data set
What would an optimal server configuration (software and hardware, I assume for example that RAID10 could help) be for my PostgreSQL server, given these requirements ? By optimal, I mean one that allows SELECT queries taking a reasonably little amount of time.
I can provide more information about the current setup (like tables, indexes ...) if needed.
Change some configuration settings in postgresql.conf: Writing: increase the wal-settings Reading: increase shared_buffers and set work_mem to an amount the planner can do its magic.
There several settings, you have to check and test.
RAID 10 is great for writing, reading will be fine as well. A good RAID-card with a lot of RAM and battery backup, will help a lot.
And, RAM, RAM and RAM. You can't have too much.