First, I must apologize for the vagueness of this question. The problem is rather hard to pin down, which is why I'm posting it here.
The environment is multi-tenant Windows Server 2012 R2 servers running Citrix 7.16 (multi-tenancy is the reason App-V is used).
First, a few things about the application:
- The application is sequenced with the latest App-V 5.1.
- During sequencing, the application registers an .exe file located on a network share.
- The application's client part consists mostly of registering this file. There are no local files on the client.
- The share is read/execute (both share and NTFS permissions).
- The application worked fine until about 6 months ago. Since then, all new packages exhibit this behavior.
- The problem does not occur when the application is not virtualized in App-V.
A bit about how it manifests:
The error always occurs after normal working hours, usually a few hours after. (Our working theory is that something during a user logoff, an idle-session auto-logoff or a session reconnect triggers it.)
Essentially, users can't start the application: nothing happens when they try.
The error is easy to spot because all affected application instances show no icon in Task Manager, as if Task Manager can't access or read the resource. However, we have tested access, and the file and the share are "open for business".
If we then kill all instances of the application, users can start it again.
Perhaps relevant: the packages contain other applications that may still be running. So the virtual environment has not shut down for all users, and the package has been "in use" the whole time.
This TechNet article might be relevant; perhaps the problem is related to a cached file being a shared resource. Very important, though: this does not happen when the application is not virtualized in App-V.
We're going to try closing all idle/disconnected sessions on the tenants that have this issue to see if that helps, but even if it does, it is not a very good fix.
Other than that, I just hope someone somewhere has experienced something similar and found the root cause, or that someone smarter and more knowledgeable about the core technologies in use here can understand what is happening, or give me some ideas about what to try next.
We found some error messages in the App-V event log today (error 0x7A602510-0xF), which led to this dead end.
Yesterday we tried aggressively logging off users to rule out problems with session reconnects. No luck: just two users were logged on and active, and a third one triggered the error, with no reconnects and no other disconnected/idle sessions.
This Ars thread looks like the liveliest and most relevant one I've seen so far (thanks @TrententTye!). We will try accessing the application file in a few different ways: by FQDN, by IP, perhaps through a mapped drive. The user kttii also writes that Win2016 might have fixed the issue for them. Lastly, some WannaCry patches from May 2017 are mentioned, which line up pretty well with when we started getting the error.
A humongous thank you to everyone who has retweeted and contributed on Twitter! You guys are amazing.
edit: Found the error message and the TechNet dead end.
edit2: @TrententTye contributed this Ars thread, which looks like the same issue, going all the way from 2010/Win2003 to 2017/Win2012!
I'm answering this myself, as we found the bug and I made a workaround.
We now know that this is the bug: https://support.microsoft.com/en-us/help/2536487/applications-crash-or-become-unresponsive-if-another-user-logs-off-a-r
It has also appeared a few times when App-V is not used, but 98% of the time it occurs when the application is virtualized.
This is the workaround:
1 Create a scheduled task on the RDS/XenApp server where the bug manifests. Set it to run at boot or shortly thereafter; it must start before any user starts the app. This is the scheduled task:
Application:
PowerShell.exe
Parameters:
-command "& 'C:\Program Files (x86)\Script\ReadLockFilesInFolder.ps1' '\\server\folder\'"
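The same task can also be created from PowerShell instead of the Task Scheduler GUI. This is a minimal sketch, assuming the ScheduledTasks module that ships with Windows Server 2012 and later. The task name and running the task as SYSTEM are assumptions, since the post doesn't say which account the task uses; a task running as SYSTEM reaches the share as the server's computer account, so that account needs read access to the share.

```powershell
# Sketch: register the boot-time task from step 1 (run from an elevated prompt).
# Assumptions, not from the original post: the task name and the SYSTEM account.
$scriptPath = 'C:\Program Files (x86)\Script\ReadLockFilesInFolder.ps1'
$sharePath  = '\\server\folder\'
$taskName   = 'ReadLockFilesInFolder'   # hypothetical task name

# Same program and parameters as listed in step 1.
$action = New-ScheduledTaskAction -Execute 'PowerShell.exe' `
    -Argument "-command `"& '$scriptPath' '$sharePath'`""

# Start at boot, before any user can launch the application.
$trigger = New-ScheduledTaskTrigger -AtStartup

# SYSTEM accesses \\server\folder as the server's computer account,
# so that account needs read permission on the share and in NTFS.
$principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' -LogonType ServiceAccount

Register-ScheduledTask -TaskName $taskName -Action $action -Trigger $trigger -Principal $principal
```

Task Scheduler's default execution time limit is 3 days, so it will not cut short a script that holds its handles for 24 hours.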
2 Save this as a PowerShell script:
The script works by opening the files non-exclusively and holding the magic first handle, which prevents that handle from being released.
Notes:
The script releases the handles after 24 hours. It only locks files in the top-level folder; add "-Recurse" after Get-ChildItem to recurse through all subfolders.
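Since the post describes the script but doesn't include its body, here is a minimal sketch of the same technique; it is an illustration, not the author's script. It takes the folder as its first argument (matching the scheduled task's parameters), opens every file for reading with a share mode that doesn't block anyone else, keeps the streams in a variable so the handles stay open, and disposes them after 24 hours by default. The helper name `Open-SharedReadHandle` and the `-Recurse` and `-Hours` parameters are hypothetical additions.

```powershell
# ReadLockFilesInFolder.ps1: illustrative sketch, NOT the author's original script.
# Usage: PowerShell.exe -command "& 'C:\Program Files (x86)\Script\ReadLockFilesInFolder.ps1' '\\server\folder\'"
param(
    [Parameter(Position = 0)]
    [string]$Path,          # folder on the share whose files should be held open
    [switch]$Recurse,       # hypothetical: also include files in subfolders
    [double]$Hours = 24     # how long to hold the handles before releasing them
)

function Open-SharedReadHandle {
    # Hypothetical helper: open a file for reading without locking anyone else out.
    # The 'ReadWrite, Delete' share mode lets other processes (including users
    # starting the application) read, write, rename or delete the file meanwhile.
    param([string]$FilePath)
    [System.IO.File]::Open($FilePath,
        [System.IO.FileMode]::Open,
        [System.IO.FileAccess]::Read,
        [System.IO.FileShare]'ReadWrite, Delete')
}

# Only do work when a folder was passed; otherwise just define the helper.
if ($Path) {
    $files = Get-ChildItem -LiteralPath $Path -File -Recurse:$Recurse

    # Keep every stream in $handles so the handles stay open while the script runs.
    $handles = foreach ($file in $files) {
        try { Open-SharedReadHandle -FilePath $file.FullName }
        catch { Write-Warning "Could not open $($file.FullName): $_" }
    }

    try {
        Start-Sleep -Seconds ([int]($Hours * 3600))
    }
    finally {
        # Release the handles once the hold period is over.
        foreach ($handle in $handles) { $handle.Dispose() }
    }
}
```

Passing `-Recurse` makes the sketch hold handles for files in subfolders too, which corresponds to the "-Recurse" tweak in the notes.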
This has worked well for us. I can also confirm that the bug does not occur on Server 2016, as the KB describes. I hope this helps!