A previously well-functioning Remote Desktop Server Farm stopped working two days ago. The setup is as follows:
- DNS resolves "myfarm.mydomain.local" to the IPs of all the farm member servers
- All farm member servers are configured as farm members of farm "myfarm" on Broker MYBROKER
- All farm members are members of the local session broker group on MYBROKER
- Clients are configured to connect to "myfarm"
(All servers involved are virtual Windows 2008 R2 boxes.) Suddenly people started getting the following error (translated from German) and failed to connect:
Unable to connect to terminal server.
The terminal server farm "myfarm" that you are trying to connect to is redirecting you to server "farmmemberX.mydomain.local". Remote desktop cannot verify that this server belongs to the same server farm. This can occur if there is a server on your network with the same name as the server farm. It cannot be verified if both remote computers belong to the same remote desktop server farm. You must use the farm name, not the computer name, if you want to make a connection to a remote desktop server farm.
Contact a network administrator to obtain support if you use an RDP connection that was prepared by the administrator.
If you want to connect with a specific farm member, use "mstsc /admin"
Sometimes it also seemed that they were simply rejected as if they had entered wrong credentials (which they had not; most have their credentials saved).
Question 1. Can you explain what is behind this, specifically regarding the "it could not be verified" part: how does this verification take place when it works? After all, the redirection would not even be attempted if it were not initiated by the broker ...
What we tried: Sometimes it helped to replace the name the client connects to with something else (i.e., we added a name "foo" to DNS that resolved to the same IPs and had users connect to "foo"), but this was far from consistent.
Later we noticed that it was always the same few servers that appeared as "farmmemberX" in the above error message. We experimentally removed these from the farm (at the members themselves and in DNS) and were thus able to reduce the broken eight-server farm to a functional two-server farm. As this would not be sufficient for our user load, I wanted to clone one of these; in order to do so I first shut it down and later restarted it - from which moment on it was as bad as the other six servers. Apparently restarting the RDP servers was the fatal thing to do ... According to the logs, this particular server had not been restarted for about two months. So virtually any change made in the last two months could be relevant. Among these are:
- We added our first Win2012 AD server into our otherwise Win2003 AD structure
- I recall there were a few cases of IE10/SSL/TLS related security problems that would require manual intervention (regedit and stuff), but am still trying to remember what these might have been
- Tons of windows updates
- Things like invalidated certificates came to my mind, but I found no such thing
Question 2. Could any of these things cause this problem?
Currently, we have dropped the server farm completely, i.e., we only have "poor man's load balancing" via DNS round robin (and of course we especially miss the reconnect-to-previous-session feature).
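For anyone wanting to check what such a round-robin name actually resolves to from a given client, here is a minimal Python sketch. It is not part of our setup: `resolve_all` is a helper name made up for this illustration, and port 3389 is simply the default RDP port.

```python
import socket

def resolve_all(hostname, port=3389):
    """Return the sorted, de-duplicated IPv4 addresses that a name resolves to."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_all("myfarm.mydomain.local")` on a client should list the address of every server that is supposed to take connections; a missing or unexpected address would point at DNS rather than at the broker.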
Main Question. How can I get my farm into working condition again?
EDIT: I should have mentioned that a few clients were lucky and had no problems with the RDP farm: those who were still running Windows XP and its older RDP client ...
EDIT after comment: It seems that the updates mainly blamed, KB3002657 and KB3035017, were either not installed, or had been installed days before the problem started on the relevant servers (clients, RDP servers, broker, DCs), but I'll try uninstalling them anyway ...
UPDATE Some more info:
- I enhanced the event logging on the broker. According to that log, all is fine (no warning) and the session redirect completes normally. It is just that after some timeout the target session is removed. I tried (and failed) several connections in quick succession, and in that case the broker even logged that it tried to reuse an existing session.
- If the target RDP server is set to "RDP Security" instead of "negotiate", the redirect works (except that the expected annoying error messages are shown to the client)
- I tried a completely new farm (i.e., a different broker with different hosts) and the problem can be reproduced in this system as well. This may suggest that the problem is client-side.
UPDATE with info as requested per comments
If I set security to "TLS 1.0" (instead of "negotiate") at the RDP hosts, the problem persists. If I set it to "RDP", the farm works - but everybody has to enter their password twice. In the error situation, for some reason I now often simply get "No connection could be established with the given credentials" instead of the original error. This is accompanied by a login failure event 4625 with status 0xc000006d, substatus 0. Before you ask: All DCs have their clocks in good sync; no LanMan compatibility settings have been configured in the registry.
The certificates configured on the RDP hosts (the ones in place while the farm still worked) were issued by our internal CA, which is still trustworthy (trusted by all machines as per GPO), and are valid until at least four months in the future. For testing I changed these to "automatic" certificates and back, without success.
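As a reading aid for the event 4625 status above, here is a small Python sketch that maps a few NTSTATUS values to their symbolic names. The table is deliberately a small subset, and `describe_status` is a helper name invented for this illustration.

```python
# A small subset of NTSTATUS values that show up in logon failure events.
LOGON_STATUS = {
    0xC000006D: "STATUS_LOGON_FAILURE",
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC0000064: "STATUS_NO_SUCH_USER",
    0xC0000133: "STATUS_TIME_DIFFERENCE_AT_DC",
    0xC0000234: "STATUS_ACCOUNT_LOCKED_OUT",
}

def describe_status(code):
    """Accept an int or a hex string such as '0xc000006d'."""
    value = int(code, 16) if isinstance(code, str) else code
    return LOGON_STATUS.get(value, "unknown status 0x%08X" % value)
```

`describe_status("0xc000006d")` gives `STATUS_LOGON_FAILURE`, which is the generic "bad user name or authentication information" code and, with substatus 0, does not by itself say which part of the authentication failed.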
The original German error message text reads
Von Remotedesktopverbindung kann keine Verbindung mit dem Remotecomputer hergestellt werden.
Vom Remotecomputer "FARMNAME", mit dem Sie eine Verbindung herstellen möchten, werden Sie zum Remotecomputer "FARMMEMBER.DOMAIN" umgeleitet. Es kann nicht überprüft werden, ob die beiden Remotecomputer zur gleichen Remotedesktop-Sitzungshostserverfarm gehören. Sie müssen den Farmnamen, nicht den Computernamen, verwenden, wenn Sie eine Verbindung mit einer Remotedesktop-Sitzungshostserverfarm herstellen möchten.
Wenden Sie sich an den Netzwerkadministrator, um Unterstützung zu erhalten, wenn Sie eine RDP-Verbindung verwenden, die vom Administrator bereitgestellt wurde.
Wenn Sie eine Verbindung mit einem bestimmten Farmmitglied herstellen möchten, um es zu verwalten, geben Sie "mstsc.exe /admin" an der Eingabeaufforderung ein.
To find out if some recent client-side update is at fault, I started with a fresh Windows 7 box and tested after each bunch of updates. It seems that even the first better-than-XP client already causes problems now - but the first such client versions give a different error message (not that it makes sense):
Die Verbindung kann nicht hergestellt werden, da es sich bei dem erreichten Remotecomputer nicht um den angegebenen Computer handelt. Dies kann durch einen veralteten Eintrag im DNS-Cache verursacht werden. Verwenden Sie statt des Namens die IP-Adresse des Computers.
(In English: The connection cannot be established because the remote computer that was reached is not the one that was specified. This can be caused by an outdated entry in the DNS cache. Use the IP address of the computer instead of the name.)
It seems rather difficult to find the needle in the haystack here, but I believe this is a configuration error somewhere. Following this should give you a working baseline configuration:
- [...] <domain>
- [...] for <farmname.domain> pointing to each of your session hosts in the farm
- [...] <sessionhost.domain> with a Subject Alternative Name of <farmname.domain> and install / enable them for RD services on each of your session hosts
- [...] <farmname> at <connectionbroker.domain>
- [...] (all settings in Administrative Templates/Windows Components/Remote Desktop Services/Remote Desktop Session Host/RD Connection Broker): [...]
- [...] Distributed COM Users (built-in) [...]
- [...] <farmname.domain>
Good luck!
Thanks for all suggestions, but none of them matched. I have no idea why the observed and described pattern of malfunction (and its development over time) occurred, but the culprit is hidden in what I described as "Tons of windows updates".
It's KB3002657. An update that soon after its release became known as "breaking RDP" - or in fact breaking everything. Ironically, a quick search after the first encounter of our problems had already revealed this (at least the RDP problems, as that's what we had googled for), so we marked KB3002657 (and a few other suspicious ones) for uninstall on our WSUS (cf. my optimistic remark to that end in the OP) and otherwise froze update synchronization for the time being.

What we failed to notice was that the Windows Server 2003 version of this update considers itself not uninstallable. Thus, while we noticed during a test update how the patch got removed successfully, e.g., from a Win2008 server, we thought that the removal had also succeeded overnight on our AD servers (Win 2003), as they asked for nothing, update-wise, on the next day. Since the problem persisted, we assumed that the update had not been the problem after all (and indeed RDP was not totally broken - we managed to work around the problem at the expense of user comfort). The Win 2012 version, on the other hand, was automatically uninstallable. As a consequence, whether RDP worked or not depended on which server was used for authentication.

We wrongly concluded that a server reboot made the previously "installed" problem appear - when in fact the reboot just happened to switch authentication server priorities. We also wrongly concluded that our AD migration tests were the cause of the problems, so we demoted and removed the 2012 server and then started to look for any problems this playing with AD might have caused. Since the problem had steadily increased in intensity anyway, we were not too suspicious when we noticed that "fails often" had turned into "fails always" on the same day we got rid of the 2012 AD server (though the connection is obvious in hindsight).
When our search kept coming up with the same useless suggestions (check that the time difference between servers is less than 5 minutes - check, it's just fractions of a second; check that all relevant group memberships are set - this really gets boring when doing it a second time; check DNS entries - there's really little in DNS that could have gone wrong unnoticed; check that KB3002657 is not installed - hey, our WSUS took care of that, didn't it?) we began tearing our hair out. When yet another hint towards KB3002657 appeared, we finally scanned through the list of installed updates on our Win2003 AD server (heck, that has really become simpler with modern OSes) and were surprised to find it still installed. Uninstall manually, reboot, everybody happy immediately!
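In hindsight, the lesson is to verify the installed-update list on every server instead of trusting that WSUS did its job. A minimal Python sketch of that check follows. It assumes you have already collected one text listing of installed updates per server (for example the output of `wmic qfe get HotFixID`, assuming that command is available on the server in question); `servers_with_update` is a name made up for this illustration, not something we actually ran.

```python
import re

def servers_with_update(hotfix_listings, kb):
    """Return the names of servers whose listing still mentions the given KB.

    hotfix_listings maps a server name to the raw text of its
    installed-updates listing; kb is an identifier such as "KB1234567".
    """
    # Match the KB id as a whole token, ignoring case.
    pattern = re.compile(r"\b%s\b" % re.escape(kb), re.IGNORECASE)
    return sorted(name for name, text in hotfix_listings.items()
                  if pattern.search(text))
```

Feeding it the listings of all DCs together with the KB number from this answer would have shown right away that the Win2003 servers still had the update installed.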