Note: I've read How Often Do Windows Servers Need to be Restarted? but this question pertains to our Remote Desktop server specifically.
We have a Windows Server 2008 R2 server (a VMware ESX VM) licensed for Remote Desktop Services with 25 users; it also runs RRAS (SSTP). On an average weekday, during working hours, there are between 8 and 12 logged-in, active users with an additional 4-6 "disconnected" users. It has a 12 GHz CPU hard reservation and 16 GB RAM, also entirely reserved. The CPU reservation is expandable to 24 GHz max when needed.
Many of our users rely exclusively on the server to work. They also complain bitterly about its performance but many are unwilling to change working habits or software to improve its performance. Specifically:
- Users refuse to log off instead of disconnect
- Users insist on using Lync 2013 instead of Lync 2010 (Lync 2013 is a notorious resource hog)
I cannot overstate the significance of their refusal to log off. Disconnected users continue to hog RAM while disconnected, which means that at any given time, we have up to 16 instances of certain programs running.
I've also noticed through experience that leaks/zombies tend to add up the longer a Remote Desktop server has been running. After a reboot the server is fresh and much faster, even when comparing performance after many users have logged in. I've also read that regular reboots can be helpful.
So I have proposed regular reboots of the VM - I would like to do it weekly, say on Saturday evening - as I feel these reboots would solve a lot of the problem.
I would like to know, if you are a Windows admin,
Am I right about the fact that garbage/zombies/leaks accumulate with session time, even after a user disconnects/reconnects?
How often do you restart a similarly-utilized Windows Server with Remote Desktop Services?
Generally, I'm opposed to the idea that a Windows server should be rebooted on a regular schedule EXCEPT in relation to TS/RDS servers. We reboot ours every day. It clears up old sessions and releases in-use resources (CPU, RAM, file handles, etc.), so my suggestion would be that you configure a daily scheduled reboot of your RDS servers.
Note that this answer is only my opinion. There's no statement of fact here.
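For what it's worth, a scheduled reboot can be set up with the built-in `schtasks` and `shutdown` tools. Below is a minimal Python sketch that only builds the command line for review; it does not run anything. The task name, the 03:00 start time, and the 300-second warning delay are assumptions chosen for illustration, not values from this thread.

```python
from typing import List, Optional


def reboot_task_command(
    start_time: str,
    day: Optional[str] = None,
    task_name: str = "RDS scheduled reboot",  # hypothetical task name
    warn_seconds: int = 300,                   # assumed delay before restart
) -> List[str]:
    """Build a schtasks command that registers a recurring reboot.

    With day=None the task runs daily; with a day such as "SAT"
    it runs weekly on that day.
    """
    cmd = [
        "schtasks", "/Create",
        "/TN", task_name,
        # shutdown /r restarts, /f forces applications closed,
        # /t sets the delay in seconds before the restart.
        "/TR", f"shutdown /r /f /t {warn_seconds}",
        "/RU", "SYSTEM",
        "/ST", start_time,
    ]
    if day is None:
        cmd += ["/SC", "DAILY"]
    else:
        cmd += ["/SC", "WEEKLY", "/D", day]
    return cmd


# Daily at 03:00, or weekly on Saturday at 22:00:
daily = reboot_task_command("03:00")
weekly = reboot_task_command("22:00", day="SAT")
```

On the server itself the list could be passed to `subprocess.run(cmd, check=True)` from an elevated prompt. Because `/f` is used, anything unsaved in a session is lost at reboot time.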
Set up the appropriate group policies to log idle and disconnected users off automatically. You can control the idle timeout and the logoff separately. That should certainly minimize some of the issue during the day.
I restart my 3-server TS farm daily at 3:00am because, yes, crap can build up over time when you have lots of people using a single system. We have 3 servers shared between 60-90 people depending on the day and time of year.
I probably don't need to reboot this frequently, but we started using terminal services with Windows 2000, and our printer drivers were horrible at the time. The print spooler would basically fail after a day or two of being up. So we started rebooting nightly since we didn't have any leverage to get the printer manufacturers to fix their crappy drivers.
Depending on your cash, time, and the savviness of your users, another idea could be to stand up a second server. You'll still need to reboot occasionally, but you seem to be reaching the limits of a single server.
You should be able to use the same client CALs (licensing's not my strongest area), and depending on your virtualization solution an additional VM may already be covered by existing licensing.
Even without additional VM resources and with the extra OS overhead, you may find the system handles better as two separate VMs with 6 GHz CPU and 8 GiB memory each, assuming you can split the load evenly. There are three potential methods:
Set a long TTL on your round-robin entries if you don't want clients leaving disconnected sessions on one server once their DNS cache expires and they acquire the IP of the other server. Alternatively make the hostname of the computer they've connected to obvious (e.g. make it part of the background), and ask them to re-connect to that hostname if they want to resurrect their session.
† If they will always be using the same desktop, simply modify the hosts file on the local desktop. If they move between machines, write a script (distributed via group policy) to rewrite the hosts file so that the DNS name they currently use for the server points to the IP of the server that particular user should be using. Replace the line containing that DNS name if it already exists, or add it to the end of the file if it does not.
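The replace-or-append step described in the footnote could look roughly like the Python sketch below. It works on the file contents as a string so the logic can be checked without touching a real hosts file. The hostname `rds.example.local` and the IP addresses are made up for illustration, and the function name is my own.

```python
def set_hosts_entry(hosts_text: str, hostname: str, ip: str) -> str:
    """Return hosts-file text in which `hostname` resolves to `ip`.

    Any existing line that names `hostname` is replaced (duplicates are
    dropped); if none exists, a new line is appended at the end.
    Note: a replaced line loses any other aliases it carried.
    """
    new_line = f"{ip}\t{hostname}"
    wanted = hostname.lower()
    out = []
    replaced = False
    for line in hosts_text.splitlines():
        # Ignore anything after '#', so commented-out entries are kept as-is.
        fields = line.split("#", 1)[0].split()
        names = [name.lower() for name in fields[1:]]
        if wanted in names:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


example = "127.0.0.1 localhost\n10.0.0.1 rds.example.local # old server\n"
updated = set_hosts_entry(example, "rds.example.local", "10.0.0.2")
```

A logon script would read `%SystemRoot%\System32\drivers\etc\hosts`, pass the text through this function with the IP assigned to that user, and write the result back. Writing to that file needs administrative rights, so the script would have to run in a context that has them.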
I am familiar with the "user type" that refuses to log off. However, they seemed to have no issue understanding that the server would be rebooting nightly, so any unsaved work would be lost. This is on Server 2008 R2 TS supporting about 20 users on a single machine.
> Users refuse to log off instead of disconnect
You have a management/HR issue here rather than a technical one. If people staying logged on are affecting other people's work (by reducing performance unnecessarily) then there are only really two solutions:
Make it a technical issue and arrange for an increase in resources (more RAM, SSD in place of spinning metal, ...) if possible so that the issue goes away that way. Of course there are limits to what you can achieve by throwing new resources at a single machine but it might work.
Pursue it as a people management problem and find some way of encouraging (or failing that enforcing) appropriate discipline. Of course this may be outside your direct responsibility so it could be quite tricky depending on your office's politics...
We had a similar problem with people never restarting their desktop machines, meaning that security updates were sometimes queued for months. Security policy stated that "patches for known security issues should be installed in a timely manner, immediately in cases where exploits already exist in the wild, unless sufficient mitigations can be proven", so in the end it was simply enforced by group policy: all non-server Windows machines will reboot overnight on a Tuesday if there are pending updates, no exceptions. If anyone argues against this there are two easy counters: if we don't follow that policy we'd lose our ISO-this-that-and-the-other accreditation next time there is an audit, which is important to the business, and our contracts with our clients make statements about security policy too (as we sometimes handle their data we have to assure them that their data is safe with us), so without that enforcement we are in breach of some very expensive contracts.
> Users insist on using Lync 2013 instead of Lync 2010 (Lync 2013 is a notorious resource hog)
Is there a specific reason why, other than they want newer shinier things? If there is a feature they genuinely need then there may be little you can do about this angle.
If a chat application is the main resource problem, I wonder if there is a way to kill just instances of that program in the idle sessions instead of killing the whole sessions?
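One way this might be done, as a rough sketch only: list sessions with `quser`, pick out the disconnected ones, and build a `taskkill` command per session using its `SESSION eq` filter. The Python below only parses text and builds commands; it runs nothing. It assumes English `quser` output (state shown as `Disc`) and that the Lync 2013 process is named `lync.exe`; check both on your own server first.

```python
from typing import List


def disconnected_session_ids(quser_output: str) -> List[int]:
    """Extract session IDs whose state is 'Disc' from quser output.

    The SESSIONNAME column is blank for disconnected sessions, so the ID
    is located as the token immediately before the state token.
    """
    ids: List[int] = []
    for line in quser_output.splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if "Disc" not in tokens:
            continue
        pos = tokens.index("Disc")
        if pos >= 1 and tokens[pos - 1].isdigit():
            ids.append(int(tokens[pos - 1]))
    return ids


def kill_commands(quser_output: str, image_name: str) -> List[List[str]]:
    """Build one taskkill command per disconnected session."""
    return [
        ["taskkill", "/F", "/FI", f"SESSION eq {sid}", "/IM", image_name]
        for sid in disconnected_session_ids(quser_output)
    ]


sample = (
    " USERNAME              SESSIONNAME        ID  STATE   IDLE TIME  LOGON TIME\n"
    ">alice                 rdp-tcp#0           2  Active          .  3/1/2014 8:01 AM\n"
    " bob                                       3  Disc         1:23  3/1/2014 7:45 AM\n"
)
commands = kill_commands(sample, "lync.exe")  # process name is an assumption
```

Users whose client was killed this way would find it closed when they reconnect, so this trades one complaint for another and is worth announcing first.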
> they lose work every time I reboot without sufficient notice, ie to reboot at all they need to know by "noon" of that day
You don't state the nature of the work so this is very dependent on what that is, but they may be failing at due diligence (i.e. not doing their job properly).
If they are not saving documents regularly then they are putting their work at risk, not you. What would happen if there was a power outage or other fault that took the server down? Would they blame you for that too?
Of course if they are actively working at the time of the reboot or are needing to leave long running processes going unattended then there might be a genuine scheduling issue that you need to work out between you.
At the risk of sounding like a sales person: we use ShutdownPlus Rolling Restart. We've got it set up to try and restart our servers every night. It works pretty well; you can set it up to only restart servers after everybody has been logged off. It'll retry up to X number of times if someone is still using the RD server. The tooling can also log off users for you, if you'd like, or even power-cycle your VMs on ESXi.
I'm using it with a couple of GPOs that log off disconnected users after a couple of hours and disconnect active sessions after a certain idle time. It's a pretty graceful method, aside from the occasional rogue program that keeps sessions from closing. We've worked around those though. The way we've got it set up now, every server tries to reboot every hour from 22:00 to 7:00 until it succeeds. Effectively, servers reboot at least 2-3 times a week, which is fine by me.
Unfortunately this isn't a free program, but it does the job pretty well. I'm implementing a PowerShell script which will hopefully update the servers before rebooting as well.
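To illustrate the retry idea (this is not how ShutdownPlus is implemented, just a sketch of the same logic), the hourly check boils down to: are we inside the overnight window, is nobody left on the server, and have we not already rebooted tonight? The function names and the `already_rebooted_tonight` flag are my own inventions for the example.

```python
from datetime import time


def in_reboot_window(now: time,
                     start: time = time(22, 0),
                     end: time = time(7, 0)) -> bool:
    """True if `now` falls in [start, end), allowing the window to
    wrap past midnight (e.g. 22:00 through 07:00)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_reboot(now: time,
                  sessions_remaining: int,
                  already_rebooted_tonight: bool) -> bool:
    """Decide at each hourly attempt whether to reboot.

    Reboot only inside the window, only once per night, and only when
    no user sessions remain on the server.
    """
    return (in_reboot_window(now)
            and sessions_remaining == 0
            and not already_rebooted_tonight)
```

A scheduled task that runs hourly could call `should_reboot` with the current session count and skip the attempt whenever it returns `False`, which is why a server with a stubborn session may go a night or two without a restart.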
A straight YES/NO answer to Microsoft server reboots? Oh, if life were that easy! It does depend on the applications running on the server. Here is a simple guide, but NOT a hard and fast rule.
Physical Server Running Windows server **x Version** (Auto Reboot & Schedule) 95% can be rebooted once every fortnight without any real concerns. (Check the patch being applied is relevant and required). Ensure you fully test the patch on your test server(s) before releasing to the live/production systems.
VMware virtual servers running Windows Server x Version - Reboot once a fortnight (see above comment if patches are applied).
Physical VMware server - NEVER/rarely, and only if required, never on a schedule. (Normally very stable if kept up to date.) VMware patches/updates will require a reboot.
VMware running Windows SQL - Limit reboots. Apply Windows patches MANUALLY ONLY! Restart only IF a patch requires it, and then only after you have stopped ALL client connections. Check that connections have reconnected once the server is back up. SQL Servers can take quite a while to reboot, so plan this out of hours.
Reminder: Before making ANY changes to a VMware (Windows Server) VM, SNAPSHOT it! If the system crashes after a service pack or updates are applied, or applications fail to start, you can quickly get the server back up and running with limited downtime. Remember to make notes of errors so you can find the fix; do not leave the system alone just because it failed, as it may fail again in the future.
Hope that helps and goes a small way to clear things up.