I have two instances of mssql 2005 and am using CA XOSoft replication. The master is a failover cluster and the replica is a standalone server. They are all running Server 2003 sp2 x64. Same patch levels on all servers. This setup has worked great for several months until we recently restricted the RPC ports on both nodes of the master(5000 - 6000 using rpccfg.exe). We have to implement egress filtering, thus the limiting of the ports.
We began receiving login errors for sql windows authentication and NETLOGON Event ID: 5719:
This computer was not able to set up a secure session with a domain controller in domain due to the following: Not enough storage is available to process this command. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.
We also see group policies failing to update and cluster file shares go offline at the same time. The RPC ports were set back to default when we started seeing these problems and the servers rebooted, but the problems persist. The domain controllers are not showing any errors. Running dcdiag and netdiag shows everything is fine.
We have noticed that the XOSoft service ws_rep.exe is using a lot of handles(8 - 9k), about the same number that sqlserver is using.
As soon as xosoft replication is stopped the login errors cease and everything functions correctly. I have opened a ticket with CA for XOSoft, but I'm not sure that the problem is actually xosoft, but that it is the one bringing the problem to light.
I'm looking for tips on debugging RPC problems. Specifically on limiting the ports and then reverting the changes.
Check your non-paged pool memory. You're probably bottoming out with the application running.
As to your question, I find rpcping to be an invaluable tool in these situations. Here's an example of how it's used with Exchange:
http://support.microsoft.com/kb/831051