I've only basic Windows Server knowledge and I've inherited responsibility for a Terminal Server installation with 20-30 concurrent users (Windows Server 2003).
There are intermittent performance problems, which I think ultimately come down to the low spec of the server (triple core, 4 GB memory using PAE). I'm trying to see if I can keep things running until a major upgrade later this year.
One thing I've noticed is that processes from various sessions often consume 100% of a core. I think the freezes happen when this occurs on several sessions at once. Is there anything I can do to limit the CPU use of individual sessions? Alternatively is it possible to reserve one core so that it is not used by individual sessions but is available to handle logins etc instead?
You would want to look up resource quotas (memory and CPU quotas), as described in http://kurtsh.com/2007/07/16/howto-throttle-the-cpu-on-desktops-terminal-servers/ or http://technet.microsoft.com/en-us/library/cc732553.aspx, although they may be specific to Win2008. This should give you a starting point to search, though.
When we ran terminal services, we found that restricting certain applications and practices could mitigate the type of issue you see, namely the resource hogging (we were running back on 2000, though...things seem to have improved over time.)
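On the question of reserving a core: Windows expresses processor affinity as a bitmask, with bit n standing for logical processor n. Below is a minimal sketch of working out the mask that leaves one core free. The helper name `affinity_mask` is made up for illustration, and the sketch only computes the number; it does not apply it. Affinity is set per process rather than per session (for example via Task Manager's Set Affinity or the `SetProcessAffinityMask` API), so I'm assuming you would have to apply it to each user process yourself, and I haven't tested whether that helps on your setup.

```python
def affinity_mask(total_cores, reserved):
    """Return a bitmask with one bit set for each core that sessions may use.

    total_cores: number of logical processors on the server
    reserved:    set of core indexes to keep free (hypothetical choice)
    """
    mask = 0
    for core in range(total_cores):
        if core not in reserved:
            mask |= 1 << core  # bit n corresponds to logical processor n
    return mask


# Triple-core server, keeping core 0 free for logins and system work.
mask = affinity_mask(3, {0})
print(f"affinity mask: {mask} (binary {mask:03b})")
```

With three cores and core 0 reserved, the mask covers cores 1 and 2 only.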
Some users like screen savers on the terminal. Use a policy to disable them.
Create policies for idle logoff.
Monitor certain habits such as running Flash animations in a loop. We had someone drive a terminal into the ground because they had The Weather Channel on a radar loop that leaked memory.
Use performance monitor to check for other constraints; poorly written AV software can bog you down when it launches a per-user monitoring instance, for example.
This is one of the few times where a fragmented disk can be bad since you have ~25 users with their own caches of tiny files scattered around the disk. Check for fragmentation and do an off-hours cleaning.
~25 users on a system like you described will indeed bog it down; that was about our limit on systems with terminal services before it affected other users. You can't expect miracles in tuning it at a certain point.
What RAID level are you running? Slow disk subsystems can cause bogdowns, especially if you have a morning rush of logins. Upgrade that and you may see a decent speed boost, although you said you're trying to nurse this one along until a major upgrade...
Monitor for unauthorized software installations. Doesn't take much for a certain type of software to hog everything.
Use utilities like Process Explorer and Process Monitor (procmon) from Sysinternals (free) to find what could be bogging your system down. There's no reason it should be freezing; it may be slow, but not locked up. Those utilities may help narrow down the root cause. They've been a lifesaver for us at times when we'd otherwise be left scratching our heads.
That's about all I remember off the top of my head...hopefully others will have better advice on alleviating your resource shortage on the system.
NOTE - if you're having this freeze when there are low numbers of users as well as high numbers, there could be a particular application or action causing the problem. In addition to the suggestion of the Sysinternals tools, I'd start looking for patterns; anything in the logs? Who is logged in at the time, and what were they doing? If the screen freezes but still shows what was running, can you capture what users were doing? What time of day does this happen? Can you get users to send you a note of what they were doing when it goes splat?
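If you do collect those notes, even a crude tally can make a pattern jump out. Here's a rough sketch of the idea; the report format, user names and application names are all made up for illustration, so substitute whatever you actually gather from logs and user reports.

```python
from collections import Counter

# Hypothetical freeze reports: (hour of day, users logged in, application named)
reports = [
    (9, {"alice", "bob", "carol"}, "browser"),
    (9, {"alice", "dave"}, "browser"),
    (14, {"alice", "bob"}, "spreadsheet"),
]


def tally(reports):
    """Count how often each hour, user and application appears across freezes."""
    hours, users, apps = Counter(), Counter(), Counter()
    for hour, logged_in, app in reports:
        hours[hour] += 1
        users.update(logged_in)  # one count per user present at the freeze
        apps[app] += 1
    return hours, users, apps


hours, users, apps = tally(reports)
print("most common hour:", hours.most_common(1))
print("most common user:", users.most_common(1))
print("most common app: ", apps.most_common(1))
```

A user or application that shows up in nearly every report is where I'd point the Sysinternals tools first.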
We had an issue with servers spontaneously rebooting on our TS cluster. Even Microsoft was at a loss to explain it. Turned out to be a particular application running that shouldn't have caused a reboot, but did. Removed it from the servers and our reboots went away. But it literally took months to figure it out!