I have a heavy postgres query. It takes over ten minutes to run. I'd like to upgrade my hardware to get it run faster. I think more RAM will help, but my motherboard is full so I'll need a whole new motherboard. I don't want to invest unless I know I'll get better results. Here's what I see in iotop:
Total DISK READ: 46.81 M/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 27 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [kswapd0] 2514 be/4 postgres 46.81 M/s 2.45 M/s 0.00 % 18.36 % postgres: postgres db1 127.0.0.1(55328) SELECT 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] 3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0] 4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0] 5 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0] 6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0] 7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0] 8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1] 9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/1:0] 10 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
Notice how the kswapd0 is hitting the hard drive the most (99.99% IO). However, the DISK READ and DISK WRITE and SWAPIN of kswap0 are all zero. What is kswap0 doing? Is it really hitting my hard drive? Would adding more RAM to this system help at all?
You are approaching this from the wrong angle. Only after optimizing your SQL should you consider throwing more RAM/CPU/Disk (I/O bandwidth) at the problem -- and addressing the problem as you've identified it.
First ask Postgres to
EXPLAIN
(orEXPLAIN ANALYZE
) how it is performing the query.Optimize the ever-loving kittens out of that, and then if you still have performance problems investigate further to determine where the bottleneck is (sonassi gave you some good suggestions on stuff you should look at in addition to
iotop
).If you are running your web stack and DB on the same server, now is a good time to split them up as well...
are you 110% sure there are no problems with your IO-sub system - eg all hard drives are fine [smart / raid self test], write-back cache is enabled [if you have battery backed cache on raid card]?
isn't postgres saturating the IO by 48MB/s read transfer?
kswapd are doing part of the job required to deal with the virtual memory on linux. So having it at 100 % could suggest that it is doing too much work, although it could also be that it is just starving from resources taken by your PG process (e.g. having no RAM left for buffers IO).
I would check how much ram is used/unused through the usual tools.
Regarding the lack of stats besides the IO column for kswapd0, I am not sure the following explanation is correct, but it may be that the kernel cannot trace swapin/swapout in kernel space (which is where kswapd lives), because Linux simply does not swap anything out for pages allocated in kernel space.
You've failed to mention any other important pieces of information for diagnosis.
It is likely that the I/O from
kswapd
is just being pseudo reported from that ofpostgres
. One of the processes will probably be in theD
state, but all the data from the commands above would confirm. Although it is curious thatswapin
is empty.