We have a Windows 2003 R2 Enterprise 64-bit server running a custom workload that is suffering from an odd performance problem. The pared-down version below produces smaller humps, but it's qualitatively the same problem.
We've reduced it to a trivial app that does nothing more than:
- listening on a socket
- joining a multicast group
- listening for packets coming in on that group
- reading and discarding the packets
The test application itself is a slightly modified version of the Boost ASIO multicast receiver example so there's not really much that ought to be going wrong. Actual code(!) below…
Every so often while running this program under load, the CPU for this process will ramp up with all of the processing happening in kernel code:
(only CPU 6 is shown here. For the duration of this test (3h17m) all other processors are idle)
As you can see from the graph, when the load spikes hit, all of the processing time is spent in kernel code. That kernel time is mostly spent in Deferred Procedure Calls (max 16.8%) and handling interrupts (max 8.5%). It looks like there's some sort of deferred cleanup happening, but we have no idea what it could be.
As far as we can tell it's happening on W2K3E-64 only.
It is happening on different hardware (HS21, HS22, HS22V, HP DL380).
Running the test application on Windows 2008 demonstrates the problem to a much smaller extent (more often but smaller humps).
How can we fix this or where should we look next?
Actual code from the example:
void handle_receive_from(const boost::system::error_code& error,
                         size_t bytes_recvd)
{
    if (!error)
    {
        ++m_receivedPackets;
        m_receivedBytes += bytes_recvd;
        m_last64TotalBytes += bytes_recvd;
        if ( ( m_receivedPackets & 0x3F ) == 0 )
        {
            printf( "Received %u bytes in %u packets. The average size of the last 64 packets was %u bytes, and the last byte received was %x.\n",
                    m_receivedBytes, m_receivedPackets, m_last64TotalBytes / 64, m_buffer[ bytes_recvd - 1 ] );
            m_last64TotalBytes = 0;
        }
        m_socket.async_receive_from(
            boost::asio::buffer(m_buffer, max_length), m_senderEndpoint,
            boost::bind(&receiver::handle_receive_from, this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
    }
    else
    {
        std::cerr << "An error occurred when performing an asynchronous read." << std::endl;
        m_socket.get_io_service().stop();
    }
}
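The constructor isn't shown above; in the stock Boost ASIO multicast receiver example it looks roughly like the sketch below. Member names have been adapted to match the handler, and the listen address, multicast address and port are assumed to be supplied by the caller, so treat this as an approximation of the actual setup rather than the exact code:

// Sketch of the receiver's constructor, following the Boost ASIO multicast
// receiver example (requires <boost/asio.hpp> and <boost/bind.hpp>).
// listen_address, multicast_address and multicast_port are assumed inputs.
receiver(boost::asio::io_service& io_service,
         const boost::asio::ip::address& listen_address,
         const boost::asio::ip::address& multicast_address,
         unsigned short multicast_port)
    : m_socket(io_service),
      m_receivedPackets(0),
      m_receivedBytes(0),
      m_last64TotalBytes(0)
{
    // Open the socket and allow multiple receivers to bind the same address.
    boost::asio::ip::udp::endpoint listen_endpoint(listen_address, multicast_port);
    m_socket.open(listen_endpoint.protocol());
    m_socket.set_option(boost::asio::ip::udp::socket::reuse_address(true));
    m_socket.bind(listen_endpoint);

    // Join the multicast group.
    m_socket.set_option(boost::asio::ip::multicast::join_group(multicast_address));

    // Post the first asynchronous receive; handle_receive_from re-arms it.
    m_socket.async_receive_from(
        boost::asio::buffer(m_buffer, max_length), m_senderEndpoint,
        boost::bind(&receiver::handle_receive_from, this,
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
}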
"It looks like there's some sort of deferred cleanup happening, but we have no idea what it could be."
This could be garbage collection, but I am not sure whether garbage collection shows up as privileged time. If this is a .NET application, you can look at the .NET CLR Memory performance counters (Gen 2 collections in particular are expensive). In the end, guessing at possible issues seems a bit backwards. Your best bet would be to profile the application during one of these spikes and see what calls it is making. You might be able to get away with just using Process Monitor to watch the syscalls.
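A quick way to correlate the spikes with those counters (and with the DPC/interrupt time mentioned in the question) is to log them from the command line with typeperf, which ships with Server 2003. The counter paths below are the standard ones; the process instance name "YourProcess" is a placeholder, and the CLR counters only exist if the process is managed:

rem Sample DPC/interrupt time on CPU 6 and (if managed) the GC counters, once per second.
typeperf "\Processor(6)\% DPC Time" "\Processor(6)\% Interrupt Time" ^
         "\.NET CLR Memory(YourProcess)\% Time in GC" ^
         "\.NET CLR Memory(YourProcess)\# Gen 2 Collections" ^
         -si 1 -o counters.csv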
I assume that the system is receiving multicast packets. Can you try and prevent it from receiving the packets and see if you see the same problem?
What about joining the multicast group but then not listening for packets?
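In terms of the constructor sketched above, that experiment is just the same setup with the initial receive left out, for example:

// Hypothetical variant for this experiment: join the group but never post
// a receive. The NIC and driver still process every incoming datagram, but
// the application never touches the data.
m_socket.set_option(boost::asio::ip::multicast::join_group(multicast_address));
// m_socket.async_receive_from(...);   // intentionally omitted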
You say it's happening on different systems, but what about the actual NIC hardware? It's possible that the NIC is the same across those different systems.
Update: If all of the systems are using Broadcom NICs, it's possible that the problem is with the NIC. In particular, the Microsoft-supplied Broadcom drivers are lousy; the ones at Broadcom's web site are much better.
You could look at two things: your thread quantum, and what's causing your DPCs (deferred procedure calls).
The thread quantum is very easy to address (probably a red herring, but you may as well check it out):
Most likely Background Services is selected; try selecting Programs instead. This shortens each thread's quantum, allowing more threads to be run in the same amount of time on the processor: you get more context switches, but less processing time per quantum.
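That setting lives under System Properties > Advanced > Performance Settings > Advanced, and it is backed by the Win32PrioritySeparation registry value. The values below are the ones the GUI normally writes (0x18 for the server default "Background services", 0x26 for "Programs"), but verify them on your build before relying on this:

rem Switch processor scheduling from "Background services" (0x18, server default)
rem to "Programs" (0x26). Verify the value for your OS before applying.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl" ^
    /v Win32PrioritySeparation /t REG_DWORD /d 0x26 /f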
Deferred Procedure Calls are a little more difficult to diagnose:
As stated by @wfaulk, this usually points to a driver issue. There's a handy tool called DPC Latency Checker that will help you diagnose these issues. Even though this is happening across multiple hardware platforms, they may all still share a common driver. Run DPC Latency Checker and follow the instructions on its site.
Three follow-up questions:
Are you using teamed NICs? They use the TCP/IP stack to communicate with each other and can cause serious DPC issues.
Do your NICs support TCP Offload? Is it enabled? (A quick way to turn it off on W2K3 is sketched after this list.)
(A complete shot in the dark) Is your test server part of a domain? GPOs get refreshed every 90 minutes by default ...
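On the offload question: on Windows Server 2003 SP2 the global TCP Chimney offload state can be switched off from the command line to see whether the DPC spikes go away. The syntax below is the 2003-era one; on Windows 2008 the equivalent command is different:

rem Disable TCP Chimney offload globally (Windows Server 2003 SP2 syntax;
rem on Windows 2008 use "netsh int tcp set global chimney=disabled" instead).
netsh int ip set chimney DISABLED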