Problem
I am observing that poll()
calls are taking milliseconds when immediately called after a write()
operation, yet taking only microseconds when immediately called after a read()
operation.
This is in code that uses the ACE C++ communication library.
I'm trying to understand why there is such a significant difference in the system CPU time taken to call poll()
after a write()
call compared to a read()
call.
System
[root@host ~]# lsb_release -a LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch Distributor ID: RedHatEnterpriseServer Description: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Release: 5.9 Codename: Tikanga
strace
write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000048> poll([{fd=234, events=POLLIN}], 1, -1) = 1 ([{fd=234, revents=POLLIN}]) <0.006036> read(234, "\4\0S\0\0\0\0\0\1\0\0\0 write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000192> poll([{fd=234, events=POLLIN}], 1, -1) = 1 ([{fd=234, revents=POLLIN}]) <0.004996> read(234, "\4\20S\0\0\0\0\0\1\0\0\0\200\2\0\0\0\0\3\0\1\0\0\0\0\10\1\0\200\2\0\0"..., 256) = 256 <0.000076> poll([{fd=234, events=POLLIN}], 1, -1) = 1 ([{fd=234, revents=POLLIN}]) <0.000022> read(234, "\2\0\0\0\241\352\1\0\0\0\0\0\25\0\0\0\7\0\0\0\242\1\0\0\25\0\0\0\251\1\0\0"..., 640) = 640 <0.000337> write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000096> poll([{fd=234, events=POLLIN}], 1, -1) = 1 ([{fd=234, revents=POLLIN}]) <0.004102> read(234, "\4\20S\0\0\0\0\0\1\0\0\0!\1\0\0\0\0\3\0\1\0\0\0\0\10\0\0!\1\0\0"..., 256) = 256 <0.000037> poll([{fd=234, events=POLLIN}], 1, -1) = 1 ([{fd=234, revents=POLLIN}]) <0.000157> read(234, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\10\1\0\0\0\0\0\0\0\0\0\0"..., 289) = 289 <0.000047> write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000036> poll([{fd=234, events=POLLIN}], 1, -1) = 1 ([{fd=234, revents=POLLIN}]) <0.001882>
Discussion
Even though the ACE library has an option to use it there is probably little benefit in moving to epoll()
because poll()
is only being called with one file descriptor in this case.
As the system becomes loaded with 600 odd threads this poll()
call is consuming around 98% of the system call CPU time of the process involved. So I'm looking for ways to make this more efficient. It does appear, however, that calling poll()
after a write()
is the issue as this is consistently in the order of 100x slower than after a read()
call.
The difference between the time between one syscall and another is simply the difference of time between the precedent and the next one. It is simply a measurement purpose timestamp so that users will know how much time a syscall takes after another and to find out whether the app is hanging on one particular syscall.
However I am a bit befuddled to see that reads are quicker here. Simply because writes are often async but reads tend to be in straight line and if we are hopping across different filesystem blocks, disk seek can make it slow. But see the revents (requested events - third argument in poll() call) is saying POLLIN meaning data to be read. Perhaps that might explain the apparent slowness. Albeit, I will need the entire strace to be sure.
I believe I may be having issues believing what
strace
tells me. Looking at the source code ofstrace
we can see that when the system call is about to be entered the current time is captured, and the current time is captured when the system call has exited.In other words strace is measuring the wallclock time between entering and exiting the system call. Which means that it isn't measuring how much CPU time has been consumed by the system call. Blocking calls, then, can appear to consume most of the system time by the application - however this is wallclock time and has no correlation with the amount of system CPU used.