We are experiencing instability on some of our Windows 2003 32-bit terminal servers (TS).
After a lot of Googling, my suspicion is that they are running out of Page Table Entries (PTEs).
From what I can gather, this is a problem when using the /3GB switch on 32-bit Windows servers, and on a TS you can easily hit the limit.
How can you verify that this is what is happening? I have no experience with perfmon and limited experience with Process Explorer, and I don't really know what I am looking for.
More info: When this happens, the Task Manager process list is always empty and the memory counters are blank. The servers typically have only around 65 users when this happens, but they run MSO and various accounting software, some of which is pretty badly written and bloated. Common memory usage per user is 200-600 MB, but our servers never run out of available RAM. A few printers are installed on the servers, sometimes up to 20. The servers ran smoothly with 70-80 users a few years back, but we have scaled them down since, as that seemed to stabilize them.
By default, kernel memory on Windows Server 2003 x86 is grossly underconfigured for a heavily used terminal server.
To view the actual in-use values on the running system, you can use Sysinternals Process Explorer, under View > System Information. If the system is configured to use the maximum amount of Paged Pool and Nonpaged Pool, the Paged Limit will be 512 MB and the Nonpaged Limit will be 256 MB.
To show this level of detail, the proper symbols must be loaded under Options > Configure Symbols.
If either the Paged Physical or the Nonpaged value is approaching its limit, there will be system instability. The registry values that configure these maximum limits are located at:
It's worth noting that having a large amount of physical memory may not be helpful, as x86 Windows can only use a rather small amount for kernel memory space, and the pools cannot grow beyond what is shown as the limit (x64 kernel memory limits are far less constraining). The limit is calculated dynamically at system startup time based on available memory and registry settings.
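As a rough illustration of the comparison described above, the in-use and limit figures read by hand from Process Explorer can be checked like this. It is only a sketch: the function name `pool_headroom` and the 80% warning threshold are assumptions made for the example, not Microsoft guidance, and the readings in the usage loop are made up.

```python
def pool_headroom(in_use_mb, limit_mb, warn_fraction=0.8):
    """Return (fraction_used, is_near_limit) for one kernel pool.

    in_use_mb and limit_mb are values copied by hand from Process Explorer
    (View > System Information). warn_fraction is an assumed threshold,
    not an official figure.
    """
    if limit_mb <= 0:
        raise ValueError("limit_mb must be positive")
    fraction = in_use_mb / limit_mb
    return fraction, fraction >= warn_fraction


# Hypothetical readings (in-use MB, limit MB), for illustration only.
readings = {
    "paged": (310.0, 512.0),
    "nonpaged": (230.0, 256.0),
}
for name, (used, limit) in readings.items():
    fraction, near = pool_headroom(used, limit)
    print(f"{name}: {fraction:.0%} of limit, near limit: {near}")
```

Taking the readings a few times during the working day, rather than once, gives a better idea of whether a pool is creeping towards its limit as users log on.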
You can get more detail about what is using the kernel memory with the following Windows Debugger commands:
!vm
- shows information similar to the Process Explorer kernel memory limits.
!poolused n
- displays information about paged/nonpaged pool usage. This can sometimes be helpful if a driver has a memory leak that is consuming excessive kernel memory.
!poolused command
http://msdn.microsoft.com/en-us/library/windows/hardware/ff564700%28v=vs.85%29.aspx
!vm command
http://msdn.microsoft.com/en-us/library/windows/hardware/ff565602%28v=vs.85%29.aspx
70 to 80 users on a 32-bit TS seems like a lot to me. Our planning number has always been 50 to 65 users. How much RAM is in the servers?
Using the /3GB switch on a TS is going to cause performance and stability problems. I've seen it first hand. It starts with not being able to load user profiles and quickly escalates from there. My suggestion would be to remove the /3GB switch and see what effect that has on performance and stability.
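To put numbers on why /3GB squeezes the kernel, here is a small arithmetic sketch. It assumes the standard 32-bit Windows layout: a 4 GiB virtual address space split 2 GiB user / 2 GiB kernel by default, and 3 GiB user / 1 GiB kernel with /3GB. The function name is made up for the example.

```python
GIB = 2 ** 30
ADDRESS_SPACE = 2 ** 32  # total virtual address space on 32-bit Windows


def kernel_space_bytes(three_gb_switch):
    """Kernel-mode virtual address space left after the user-mode share."""
    user_space = 3 * GIB if three_gb_switch else 2 * GIB
    return ADDRESS_SPACE - user_space


for switch in (False, True):
    kernel_gib = kernel_space_bytes(switch) / GIB
    print(f"/3GB={switch}: {kernel_gib:.0f} GiB of kernel address space")
```

Everything the kernel needs (system PTEs, paged pool, nonpaged pool, system cache) has to fit in that remaining space, so halving it leaves much less room for each.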
You can find a description of how to determine kernel memory usage in my article Windows x64 – All the Same Yet Very Different, Part 2: Kernel Memory, /3GB, PTEs, (Non-) Paged Pool.
In short, by using WinDbg in combination with LiveKD you can establish a live debugging session with the local machine. From there it is easy to query for things like free PTEs, usage and maximum of (non-) paged pool.