Some users are complaining of poor performance on a brand new server. The only thing running on this machine is Oracle 10.2.
At first glance everything looks OK: load is minimal and there is nothing in the logs. The only thing I can find is that vmstat is reporting lots of swapping in and blocked processes. Where should I begin troubleshooting this one?
# vmstat -S 5 5
kthr memory page disk faults cpu
r b w swap free si so pi po fr de sr rm s0 s1 s2 in sy cs us sy id
0 0 0 12420128 16679384 0 0 672 809 809 0 0 -0 2 2 -0 875 572 652 0 0 100
0 7 0 1926560 5871472 0 0 4396 11463 11463 0 0 0 0 0 0 1796 662 1731 0 0 100
0 2 0 1925984 5934624 0 0 19058 13657 13657 0 0 0 0 0 0 4877 1336 6145 0 1 99
0 3 0 1925984 6126144 0 0 12691 13821 13821 0 0 0 0 0 0 3708 1055 4537 0 1 99
0 5 0 1925984 6093776 0 0 6033 15628 15628 0 0 0 0 0 0 2215 745 2386 0 0 100
I'm sorry, but your vmstat output isn't really showing any swapping. First, the Solaris definition of "swapping" is when an entire process is swapped in or out due to extreme memory pressure. These are your si and so columns, which are both 0. You shouldn't really see this on any but the most pathologically loaded systems. The pi and po columns can show "paging" activity. The activity that is normally called "swapping" on other systems is called "paging" in Solaris terminology. But you need to run "vmstat -p" and look at the api/apo (anonymous page-ins and anonymous page-outs) numbers; this is what people usually refer to as "swapping". The pi/po columns also include what is essentially normal filesystem activity (e.g. memory-mapped IO).
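To make that reading of the columns concrete, here is a rough Python sketch that parses the five sample lines from the question. The column list simply follows the header shown there; the names sy_faults and sy_cpu are my own labels to tell the two "sy" columns apart, and the helper names are made up for illustration.

```python
# Column order taken from the "vmstat -S" header in the question.
# "sy_faults" and "sy_cpu" are illustrative names for the two "sy" columns.
COLUMNS = [
    "r", "b", "w", "swap", "free", "si", "so", "pi", "po", "fr", "de", "sr",
    "rm", "s0", "s1", "s2", "in", "sy_faults", "cs", "us", "sy_cpu", "id",
]

# The data lines exactly as posted in the question.
SAMPLE = """\
0 0 0 12420128 16679384 0 0 672 809 809 0 0 -0 2 2 -0 875 572 652 0 0 100
0 7 0 1926560 5871472 0 0 4396 11463 11463 0 0 0 0 0 0 1796 662 1731 0 0 100
0 2 0 1925984 5934624 0 0 19058 13657 13657 0 0 0 0 0 0 4877 1336 6145 0 1 99
0 3 0 1925984 6126144 0 0 12691 13821 13821 0 0 0 0 0 0 3708 1055 4537 0 1 99
0 5 0 1925984 6093776 0 0 6033 15628 15628 0 0 0 0 0 0 2215 745 2386 0 0 100
"""


def parse_rows(text):
    """Turn vmstat data lines into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip headers or anything that is not a full data line
        rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def summarize(rows):
    """Separate true swapping (si/so) from paging (pi/po) and blocked threads (b)."""
    return {
        "swapped": any(r["si"] or r["so"] for r in rows),
        "max_blocked": max(r["b"] for r in rows),
        "max_pi": max(r["pi"] for r in rows),
        "max_po": max(r["po"] for r in rows),
    }


if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE)))
```

For the posted sample this reports no swap activity at all, up to 7 blocked threads, and heavy pi/po traffic, which is the pattern described above. Keep in mind the first vmstat line is an average since boot, not a live interval.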
To identify the device in question, iostat will help you. Something like "iostat -dxzn 1". You'll likely see some 100% busy devices, since you have processes blocked on disk IO (the vmstat b column).
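If you want to pick the busy devices out of a long iostat run automatically, something like the following Python sketch would do it. It assumes the extended "-xn" layout, where a header line names the columns and includes "%b" and "device". The sample text, the device names, and the 90% threshold are all made up for illustration; they are not from the poster's machine.

```python
def busy_devices(iostat_text, threshold=90.0):
    """Return {device: highest %b seen} for devices at or above threshold.

    Assumes "iostat -xn"-style output: a header line containing "%b" and
    "device", followed by one data line per device.
    """
    header = None
    busy = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if "%b" in fields and "device" in fields:
            header = fields  # (re)learn the column layout from each header
            continue
        if header is None or len(fields) != len(header):
            continue  # banner lines such as "extended device statistics"
        row = dict(zip(header, fields))
        try:
            pct_busy = float(row["%b"])
        except ValueError:
            continue
        if pct_busy >= threshold:
            name = row["device"]
            busy[name] = max(pct_busy, busy.get(name, 0.0))
    return busy


# Hypothetical sample, NOT real output from the system in the question.
HYPOTHETICAL_SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    1.0    0.0    8.0  0.0  0.0    0.0    0.5   0   0 c0t0d0
  310.0  120.0 9800.0 3100.0  0.0  9.7    0.0   22.6   0 100 c1t1d0
"""

if __name__ == "__main__":
    print(busy_devices(HYPOTHETICAL_SAMPLE))
```

In practice you would feed it the captured output of the iostat command instead of the hypothetical sample, and then map the flagged devices back to whatever Oracle has on them.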
I'm too much of a newbie to add a comment, so I'll "answer"...
Try running "iostat -xcn 1" to see what your hot block devices are. Are you using raw volumes or formatted? ZFS or UFS (I'm assuming this is Solaris 10...)? What's your disk layout?
You are swapping a bit. Have you tuned your semaphores and such? Using projects or /etc/system (again, assuming Solaris 10)?
I'd generally start by getting more specifics from the users. What exactly is it that is not running fast enough? Even if you get back "everything", try to identify at least one specific case, transaction, job, or similar: something to hang on to and trace.
Then trace it through Oracle (for example trace 10046 if it is not too complex), check what server resources are used (which tables, on which disks ...), find out where time is spent, ...
In my experience, when I saw 100% busy disks in "iostat -znx 3", it was either Oracle badly configured or a suboptimal execution plan doing full table scans or the like. The other approach is to look at what data is on the busy disks (tablespaces, redo, or something else), then look for IO-expensive transactions, long ops, and so on.
BTW - did you recalculate statistics and such when you moved to the new server? Oracle's dynamic optimizer is a bitch in both senses.
Well, if you're swapping, you're generally going to increase IO wait, which is going to lead to sluggish behavior. What is the load at, generally? Load can "feel" different depending on what's causing it. Load spikes due to heavy CPU utilization hamper performance, but if IO wait is heavy, a load of 15 can feel like 1000.
Looking closely at the numbers you posted shows 0 pages swapped in and out. Even with -S, if your machine is swapping, you should not see a 0 in those columns.
My bet is your machine is doing memory-mapped IO, which has similar paging characteristics but is not indicative of thrashing.
I don't have access to a Solaris machine at the moment, so I'm basing this on a copy of the man page for vmstat(1).