I'm having an issue with the nonpaged pool using 129 GB of RAM on an HP ProLiant DL380 Gen9 running Windows Server 2012 R2. Using Poolmon.exe, I traced 127 GB of that to the RaCT tag, but I can't find any information on it, even after looking through the pooltag.txt file. Please let me know what RaCT is. Also, is there any other tool for tracking down a memory leak that would point at a particular driver or kernel-related library?
We recently had an issue on our live server that caused our web app to stop responding. All we got were 503 errors until we rebooted the server, after which it was fine. Eventually I traced the problem to httperr.log and found a large number of 1_Connections_Refused errors.
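Counting the error reasons by hand is tedious when the log is large. The sketch below tallies them instead. It assumes the HTTPERR logs (usually under %SystemRoot%\System32\LogFiles\HTTPERR) use the W3C-style layout, where a `#Fields:` line names the columns and one of them is `s-reason`. The function name `count_http_errors` and the script invocation are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_http_errors(lines: Iterable[str]) -> Counter:
    """Count the values of the s-reason column in HTTPERR log lines.

    Assumes a '#Fields:' directive names the columns and includes 's-reason'.
    """
    counts: Counter = Counter()
    reason_idx = None  # column position of s-reason, set by the #Fields line
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            reason_idx = fields.index("s-reason") if "s-reason" in fields else None
            continue
        if not line or line.startswith("#") or reason_idx is None:
            continue  # skip blanks, other directives, and lines before #Fields
        parts = line.split()
        if reason_idx < len(parts):
            counts[parts[reason_idx]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_httperr.py httperr1.log httperr2.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for reason, n in count_http_errors(log).most_common(10):
                print(f"{n:8d}  {reason}")
```

Running it over the logs from several days makes it easy to see when the refused-connection entries started and whether they coincide with other error reasons.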
Further investigation seemed to indicate that we had reached the nonpaged pool limit. Since then, we have been monitoring nonpaged pool usage with Poolmon.exe, and we believe we have identified the tag that is causing the problem.
Tag   Type  Allocs      Frees       Diff     Bytes       Per Alloc
Even  Nonp  51,231,806  50,633,533  684,922  32,878,688  48
When we run poolmon.exe /g, it shows the Mapped Driver as [< unknown >Event objects].
This is pretty much no help at all. My team has spent considerable time researching this problem and hasn't been able to find a way to narrow it down to a specific application or service. I get the sense that most people solve the problem by killing processes on the machine until they see nonpaged pool usage drop back down. That is not exactly what you want to be doing on a production machine.
If I open Task Manager and view the process list, I see MailService.exe with an NP Pool value of 105K, which is 36K higher than that of the process listed second. As we have had problems with our mail server in the past (which may or may not be related to this issue), my gut feeling is that it is the cause.
However, before we go off restarting services, I'd like to have a little more certainty than just a "gut feeling".
I've also tried using poolmon.exe /c, but it always returns the error:
unable to load msvcr70.dll/msvcp70.dll
and it doesn't create localtag.txt. My colleague had to download pooltag.txt from the internet because we can't figure out where it is located. As far as I can see, we don't have the Windows debugger or the Windows DDK installed. Perhaps the error occurs because neither of these is installed, but I don't know.
Finally, I tried:
C:\Windows\System32\drivers>findstr /m /l Even *.sys
This returned a fairly sizeable list of .sys files and again wasn't at all helpful with the problem at hand.
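The same literal search can be sketched in Python, which also makes it easier to see why the list is so long: a four-character literal like "Even" matches any longer string that contains it, and imported kernel function names such as KeSetEvent, stored as ASCII in a driver's import table, do. The helper names `contains_tag` and `files_containing_tag` are hypothetical, and the drivers path is assumed from %SystemRoot%.

```python
import os
from pathlib import Path
from typing import List


def contains_tag(data: bytes, tag: str) -> bool:
    """Literal, case-sensitive byte search, like findstr /l without /i."""
    return tag.encode("ascii") in data


def files_containing_tag(directory: Path, tag: str) -> List[str]:
    """Names of *.sys files in `directory` that contain `tag` (like findstr /m)."""
    hits = []
    for path in sorted(directory.glob("*.sys")):
        try:
            data = path.read_bytes()
        except OSError:
            continue  # skip files that are locked or unreadable
        if contains_tag(data, tag):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    root = os.environ.get("SystemRoot", r"C:\Windows")
    drivers = Path(root) / "System32" / "drivers"
    if drivers.is_dir():  # only meaningful on a Windows machine
        for name in files_containing_tag(drivers, "Even"):
            print(name)
```

A match only shows that the bytes appear somewhere in the file, not that the driver allocates pool with that tag.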
So my question is this: Is there any other way to narrow down the cause of this memory leak?
UPDATE:
As suggested below, I have been logging each process's Pool Nonpaged Bytes for the last day or so to see whether any process is trending up. For the most part, all of the processes appear fairly static in their usage, although two of them look to have ticked up slightly. I will continue to monitor this for the next few days.
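Eyeballing dozens of columns for an upward trend is error-prone, so a least-squares slope per counter can rank them instead. This sketch assumes the samples were logged to CSV with typeperf, for example `typeperf "\Process(*)\Pool Nonpaged Bytes" -si 60 -o np.csv`, so that column 0 is the timestamp and every other column is one process instance. The function names are hypothetical.

```python
import csv
import sys
from typing import Dict, Iterable, List


def slope(values: List[float]) -> float:
    """Least-squares slope of `values` against their index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def growth_by_counter(lines: Iterable[str]) -> Dict[str, float]:
    """Map each counter column of a typeperf-style CSV log to its growth per sample."""
    reader = csv.reader(lines)
    header = next(reader)
    names = header[1:]  # column 0 holds the timestamp
    series: Dict[str, List[float]] = {name: [] for name in names}
    for row in reader:
        for name, cell in zip(names, row[1:]):
            cell = cell.strip()
            if cell:  # a blank cell means that instance was absent in this sample
                series[name].append(float(cell))
    return {name: slope(vals) for name, vals in series.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as log:
        ranked = sorted(growth_by_counter(log).items(), key=lambda kv: kv[1], reverse=True)
    for name, bytes_per_sample in ranked[:10]:
        print(f"{bytes_per_sample:12.1f}  {name}")
```

Because absent samples are skipped rather than filled, a process that restarts partway through the log gets a slope over its recorded values only. Multiplying by the samples per day converts the slope into bytes per day.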
I also forgot to mention earlier that none of the processes appear to be using an excessive number of handles either.
UPDATE 2:
I have been monitoring this for the last couple of weeks. Both the per-process Pool Nonpaged Bytes and the total Pool Nonpaged Bytes have remained relatively stable during that time. In the meantime, Windows was updated and the server rebooted, so I am wondering whether that has solved the problem. I am definitely no longer seeing the consistent growth in Pool Nonpaged Bytes that I saw before.
Is there a version of poolmon available for Windows Server 2008 64-bit? This KB article says it only applies to versions up to Server 2003. Is this tool (or something equivalent) available for Server 2008?
(I'm new to the Windows Server world, and looking for tools to help track down an apparent kernel-space memory leak on some servers running particular web services. I would also welcome any suggestions for other tools to use.)
I've read the KB articles about poolmon, but they don't tell me how to analyze the numbers. My first guess is to look for pool tags where the value in the "Diff" column is very high. Is that correct?
In my case, that would be these tags:
Tag   Type  Allocs    Frees     Diff    Bytes     Per Alloc
Ntfr  Nonp  2690737   2528557   162180  10379976  64
Ntfn  Nonp  1397933   1304230   93703   3750928   40
NtFs  Nonp  2385330   2291634   93696   3749056   40
File  Nonp  13789939  13704656  85283   13203912  154
So that would mean the NTFS driver has a memory leak, which I doubt :) So what should I look for?
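Diff is Allocs minus Frees, i.e. allocations still outstanding (for Ntfr, 2,690,737 - 2,528,557 = 162,180), so a single snapshot shows what is currently held, which for a file system may simply be cached data. A leak would show up as a tag whose outstanding Bytes keep rising across snapshots taken over time. The sketch below compares two snapshots; it assumes rows in the layout shown above (no poolmon delta columns, no spaces inside tags), and `parse_rows` and `growing_tags` are hypothetical names.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PoolRow(NamedTuple):
    tag: str
    pool_type: str
    allocs: int
    frees: int
    diff: int
    nbytes: int


Key = Tuple[str, str]  # (tag, pool type)


def parse_rows(lines: Iterable[str]) -> Dict[Key, PoolRow]:
    """Parse rows shaped like 'Tag Type Allocs Frees Diff Bytes PerAlloc'."""
    rows: Dict[Key, PoolRow] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 6 or parts[1] not in ("Nonp", "Paged"):
            continue  # skip the header line and anything that is not a data row
        # Strip thousands separators such as "51,231,806"
        allocs, frees, diff, nbytes = (int(p.replace(",", "")) for p in parts[2:6])
        rows[(parts[0], parts[1])] = PoolRow(parts[0], parts[1], allocs, frees, diff, nbytes)
    return rows


def growing_tags(before: Dict[Key, PoolRow],
                 after: Dict[Key, PoolRow]) -> List[Tuple[str, str, int]]:
    """(tag, type, byte growth) for tags whose Bytes rose between snapshots, largest first."""
    growth = []
    for key, row in after.items():
        old = before.get(key)
        delta = row.nbytes - (old.nbytes if old else 0)
        if delta > 0:
            growth.append((key[0], key[1], delta))
    return sorted(growth, key=lambda g: g[2], reverse=True)
```

A tag that tops this list in snapshot after snapshot, hours or days apart, is a far stronger leak candidate than one with a large but stable Diff.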
I'm hunting for a pool memory leak using poolmon. The KB article explains how to capture the output manually by cutting and pasting. Isn't there a way to automate this?
Since the tool doesn't seem to support this, my idea was to run two command prompts (one for the paged pool and one for the nonpaged pool) and use a tool to take screenshots automatically. If this is possible, which tool would you suggest? Is there a tool that can extract the text from a command prompt without manual intervention?
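If the data can be obtained from a command that prints once and exits, screenshots are unnecessary: its output can be captured to timestamped text files on a schedule. The sketch below does that for any such command. Whether a given poolmon build can run non-interactively is an assumption to check against its built-in usage help; the helper name `capture_snapshots` and the output file naming are hypothetical.

```python
import datetime
import subprocess
import time
from pathlib import Path
from typing import List, Sequence


def capture_snapshots(command: Sequence[str], out_dir: Path,
                      count: int, interval_s: float) -> List[Path]:
    """Run `command` `count` times and save each run's stdout to its own file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i in range(count):
        # check=True raises CalledProcessError if the command exits non-zero
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        path = out_dir / f"snapshot-{stamp}-{i:04d}.txt"  # index keeps names unique
        path.write_text(result.stdout)
        written.append(path)
        if i + 1 < count:
            time.sleep(interval_s)
    return written
```

For example, `capture_snapshots(["typeperf", r"\Memory\Pool Nonpaged Bytes", "-sc", "1"], Path("snaps"), 144, 600)` would record the total nonpaged pool size every ten minutes for a day, leaving plain text files that can be compared or parsed later.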