We've got an EMC NX4 SAN box serving a CIFS share to a number of Windows Server 2008 R2 app servers. The app servers use the CIFS share to serve lots of image files (~2500 ops/sec on the share); however, neither the SAN nor the app servers show any obvious signs of stress.
Once in a while an app server will, apparently all of a sudden, drop the connection to the SAN. Any .NET code trying to serve a file from the SAN fails with:
System.IO.IOException: The specified network name is no longer available
If I RDP to the app server and try to access "\\san-name" through Explorer, I get the same error. All other app servers can access it just fine. I can also access "\\ip-of-san" perfectly, and pinging works as well.
A reboot of the app server fixes the issue, but that's a somewhat drastic fix, given that the SAN seems to be working fine and the computer can still reach it. It just looks like access via "\\san-name" has barfed.
This has happened to two different app servers during the last week, so I don't suspect a single app server of being the cause. Ignoring the cause for now - how would I restore the "\san-name" connection without rebooting the machine? And can I somehow query what went wrong?
The event logs show nothing (apart from the related ASP.NET errors caused by the issue), either on the app servers or on the SAN.
Update:
Based on the suggestions, I'll try restarting the Workstation service next time and see if that helps. It's definitely not a fix, but it's much faster than rebooting the whole machine, which is what I've been doing so far. Is there any way to query the status of the connections that the Workstation service maintains?
Update 2:
Confirmed: restarting the Workstation service "fixes" the issue. The next step is to try the registry change that raises the MaxCmds value. I won't be able to confirm that it's the cause; I can only assume so if the servers run for a long period without problems.
This sounds like the MaxCmds limit has been exhausted. Here are two good articles about it: here and here.
Here's how to change it. Create a file called update.reg and put the following in it:
Save it, double-click it, and accept the prompt. A reboot is required.
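A minimal sketch of what such an update.reg could contain, generated from Python so the value is easy to adjust. It assumes the client-side setting is the REG_DWORD named MaxCmds under HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters; the value 2048 is an illustrative assumption rather than a tuned recommendation, and the helper names build_maxcmds_reg and write_reg_file are hypothetical.

```python
# Hypothetical helpers: build and write a .reg file that sets MaxCmds.
WORKSTATION_PARAMS = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\LanmanWorkstation\Parameters"
)


def build_maxcmds_reg(max_cmds: int = 2048) -> str:
    """Return .reg file text that sets MaxCmds as a REG_DWORD.

    The default of 2048 is an illustrative assumption, not a tuned value.
    """
    if not 0 <= max_cmds <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{WORKSTATION_PARAMS}]",
        # .reg files encode DWORD values as 8 lowercase hex digits.
        f'"MaxCmds"=dword:{max_cmds:08x}',
        "",
    ]
    # Windows tools conventionally use CRLF line endings.
    return "\r\n".join(lines)


def write_reg_file(path: str = "update.reg", max_cmds: int = 2048) -> None:
    """Write the .reg file as UTF-16 with a BOM, the encoding regedit exports use."""
    # newline="" stops Python from turning the explicit \r\n into \r\r\n.
    with open(path, "w", encoding="utf-16", newline="") as fh:
        fh.write(build_maxcmds_reg(max_cmds))
```

Calling write_reg_file() on the app server produces a file you can double-click as described; 2048 is written as dword:00000800.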
Maybe try restarting the Workstation service on the app server.
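The restart can be scripted so it's quick to run the next time the share drops. A minimal sketch, assuming it runs elevated on the app server and uses the service name LanmanWorkstation (display name "Workstation"). "net stop ... /y" also stops services that depend on Workstation, and those are not restarted automatically; treating Netlogon as the only dependent to bring back is an assumption you should check with "sc enumdepend LanmanWorkstation" on your own servers. The function names and the dry_run flag are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Assumption: Netlogon is the dependent service to restart; verify with
# `sc enumdepend LanmanWorkstation` on the actual server.
DEFAULT_DEPENDENTS = ("Netlogon",)


def workstation_restart_commands(
    dependents: Sequence[str] = DEFAULT_DEPENDENTS,
) -> List[List[str]]:
    """Return the net.exe commands that bounce the Workstation service."""
    # /y answers "yes" to the prompt about stopping dependent services.
    cmds = [["net", "stop", "LanmanWorkstation", "/y"],
            ["net", "start", "LanmanWorkstation"]]
    # Dependents stopped by /y must be started again explicitly.
    cmds += [["net", "start", name] for name in dependents]
    return cmds


def restart_workstation(dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, only return) the restart commands."""
    cmds = workstation_restart_commands()
    if not dry_run:
        for cmd in cmds:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return cmds
```

Calling restart_workstation() first with the default dry_run=True lets you review the command list before running it for real with dry_run=False.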
I've had cases like this before, though not with an EMC back end. For userland applications, force-closing the connection to the remote server and reopening it will bring it back, though you may have to try a couple of times before it gets its act together. For serverland applications, recycling the Application Pool for that service works. If that fails, recycling the Workstation Service can avoid a reboot, but it's almost as drastic.
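"Try a couple of times" amounts to a retry loop around dropping and re-establishing the connection. A minimal sketch, assuming the connection is managed with "net use" and that the share path (\\san-name\images in the usage below), attempt count, and delay are hypothetical placeholders. The command runner is injectable so the loop can be exercised without touching a real share.

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_cmd(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def reconnect_share(share: str, attempts: int = 3, delay_s: float = 5.0,
                    runner: Runner = run_cmd) -> bool:
    """Force-close the connection to `share` and reopen it, retrying.

    Returns True once `net use <share>` succeeds, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        # Drop any existing (possibly stale) connection; /y skips the prompt.
        # A nonzero exit is ignored here: there may be nothing to delete.
        runner(["net", "use", share, "/delete", "/y"])
        # Re-establish the connection with the current user's credentials.
        if runner(["net", "use", share]) == 0:
            return True
        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

For example, reconnect_share(r"\\san-name\images") tries up to three times, five seconds apart. For the IIS-hosted case, the application-pool recycle mentioned above can be done on IIS 7 with "appcmd recycle apppool /apppool.name:<name>".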
On the source side:
Could you give more details on the software installed on the app server? Searching online, you'll find that this is usually caused by antivirus software, but since you don't run any, maybe it's another kernel-mode application, such as backup software?
Is the firewall active? Have you checked the event logs on the domain controller for entries about the faulty app server?
You should also sniff CIFS network traffic when the problem arises to see what happens.
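Because the failure is intermittent, a capture left running in a ring buffer is one way to have the traffic on hand when it happens. A minimal sketch that builds a tshark command line, assuming Wireshark's tshark is installed on the app server; the interface name, output path, and buffer sizes are hypothetical placeholders. The capture filter covers TCP 445 (SMB directly over TCP) and TCP 139 (SMB over NetBIOS).

```python
from typing import List

# SMB/CIFS runs directly over TCP 445, or over the NetBIOS session service on 139.
CIFS_CAPTURE_FILTER = "tcp port 445 or tcp port 139"


def tshark_ring_buffer_cmd(interface: str, out_file: str,
                           file_kb: int = 102400, max_files: int = 10) -> List[str]:
    """Build a tshark command that captures CIFS traffic into a ring buffer.

    With -b, tshark switches to a new file after `file_kb` kilobytes and keeps
    only the newest `max_files` files, so disk usage stays bounded.
    """
    return [
        "tshark",
        "-i", interface,             # capture interface (name or number)
        "-f", CIFS_CAPTURE_FILTER,   # BPF capture filter
        "-w", out_file,              # ring buffers require writing to a file
        "-b", f"filesize:{file_kb}",
        "-b", f"files:{max_files}",
    ]
```

With the defaults this keeps at most about 1 GB (10 files of 100 MB). Start it with subprocess.run(cmd) and stop it once the error reproduces; the newest files then hold the traffic around the failure.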
The only times I ran into this error were when the server or workstation somehow "lost" its link with the domain. Re-forcing domain membership did the trick (netdom resetpwd). Can you access other network shares (from the RDP session to the app server) when the problem arises?
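The netdom step can be wrapped the same way. A minimal sketch that only builds the command, assuming netdom.exe is installed on the affected app server (it is not present on every server by default) and is run there; the domain controller name and admin account used below are hypothetical. "/pd:*" makes netdom prompt for the password instead of putting it on the command line.

```python
from typing import List


def netdom_resetpwd_cmd(dc: str, domain_user: str) -> List[str]:
    """Build a netdom command that resets this machine's account password.

    Run it on the affected computer; `dc` is the domain controller to reset
    against, and `domain_user` is an account allowed to do so (DOMAIN\\user).
    """
    if "\\" not in domain_user:
        raise ValueError("domain_user should look like DOMAIN\\user")
    return [
        "netdom", "resetpwd",
        f"/s:{dc}",
        f"/ud:{domain_user}",
        "/pd:*",  # '*' makes netdom prompt for the password interactively
    ]
```

Before resetting anything, "nltest /sc_verify:<domain>" reports whether the machine's secure channel to the domain is actually broken.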
Could this be a name-resolution issue? Can you check with your DNS server? If the name doesn't resolve while the problem is happening, that would explain why access by IP still works and why access by name comes back after the app server is rebooted.
I had the same issue when some workstation users complained that they couldn't access an application stored on another server. As in your case, access by the server's IP address worked but access by name did not, so we checked DNS. Since we have a static-IP network, we changed the application to reach the other server by IP address.
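To test the name-resolution theory, resolve the SAN's name from the affected app server while the problem is happening and compare the answer with the IP that still works. A minimal sketch using Python's socket module, which goes through the operating system's resolver rather than querying a DNS server directly; "san-name" and the IP are the question's placeholders, and the function names are hypothetical.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the distinct addresses the OS resolver gives for `name`."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []  # the name did not resolve at all
    # sockaddr[0] is the address string for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})


def diagnose(name: str, known_good_ip: str) -> str:
    """Compare what `name` resolves to against an IP known to work."""
    addrs = resolve(name)
    if not addrs:
        return "unresolved"
    if known_good_ip in addrs:
        return "ok"
    return "mismatch"
```

Run diagnose("san-name", "<ip-of-san>") both while the share is failing and after it recovers; a result that differs only during the failure points at name resolution.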
Let me know if my suggestion works for you.
I ran into a similar issue: I was not able to map a share on a Windows Server 2012 machine from a Windows Server 2003 server.
The network group had implemented an AD policy that placed older Windows versions in an AD container that did not allow lower TLS versions to connect to servers running higher TLS versions. Moving the server back out of that container, or disabling the policy that blocked lower TLS versions, corrected the issue.
Here are some errors I came across in the system log:
Hope this helps resolve your issue.