I have a development box that experiences the same symptoms nearly every night:
- Can't RDP due to the "Black Screen of Death".
- Can't issue remote PowerShell commands because the "Remote Registry" service is stuck in a "Starting..." state. Can't kill processes with "Stop-Process", and "Get-Process" returns this error:
    Get-Process : Couldn't connect to remote machine.
- Can't restart the RDP service (remotely) because the "Remote Desktop Services UserMode Port Redirector" service is stuck in a "Stopping..." state, and you get the following error when you attempt it:
    Error 1061: The service cannot accept control messages at this time.
- Can't use "Invoke-Command" due to security restrictions.
- In fact, if you attempt to restart ANY service on that box remotely, it gets stuck in the "Starting..." state.
- (SQL Server) Still able to connect, but it is experiencing various issues related to SQL Agent.
I can still reboot the box via AWS, but it just does the same thing the next night. The event log doesn't contain anything that indicates what is occurring on this server. I do see messages like this in the System log:
The server {XXX} did not register with DCOM within the required timeout.
These don't really tell me a lot other than that my server is "broken"...
I'm not sure how to identify the root cause of this issue. Has anyone experienced this sort of issue before? If so, what solution did you come to?
Update 1
The issue got significantly worse and developed into a situation where I was being booted out of RDP every 15 minutes and constantly needed to Force Stop the instance. I replaced the server with a new one using the old data drives (but a new C drive) and still experience this issue on the new server every other day.
I've enabled remote PowerShell and can use that even when I can't RDP. I will gather performance metrics and attach them to this ticket.
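A minimal sketch of how such metrics could be gathered, assuming the script runs locally on the server (for example from a scheduled task) so the samples are still written when remoting hangs. The counter paths are the standard English-locale names, while the output path, the 15-second interval, and the sample count are placeholders chosen for illustration:

```powershell
# Sketch: sample a few standard counters locally and append them to a CSV.
# Assumptions: the output path, interval, and sample count are placeholders.
$counters = @(
    '\Processor(_Total)\% Processor Time',
    '\Memory\Available MBytes',
    '\PhysicalDisk(_Total)\Avg. Disk Queue Length'
)
$outFile = 'C:\PerfLogs\hang-metrics.csv'   # hypothetical output path

# Each sample set contains one CounterSample per counter path.
Get-Counter -Counter $counters -SampleInterval 15 -MaxSamples 20 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, Path, CookedValue |
    Export-Csv -Path $outFile -Append -NoTypeInformation
```

Running it locally avoids remote counter collection, which would likely be affected by the stuck "Remote Registry" service described above.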
Update 2
OK, I'm making progress. Remote PowerShell freezes outright when I try to make a remote connection to the system that is locked up; the following command (PowerShell) just never completes:
$session = New-PSSession -ComputerName DXYZServerNameHere
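One way the attempt could be bounded so that it fails with an error instead of blocking indefinitely is to pass explicit timeouts through a session option. This is only a sketch: the 30-second values are arbitrary, and whether the timeout actually fires depends on where in the connection the locked-up server stalls.

```powershell
# Sketch: bound the connection attempt so a hung server produces an error
# rather than blocking forever. Both timeouts are in milliseconds and the
# 30000 values are arbitrary assumptions.
$server = 'DXYZServerNameHere'
$option = New-PSSessionOption -OpenTimeout 30000 -OperationTimeout 30000

try {
    $session = New-PSSession -ComputerName $server -SessionOption $option -ErrorAction Stop
    "Connected to $server"
}
catch {
    # Report the failure reason instead of hanging silently.
    "Could not open a session to ${server}: $($_.Exception.Message)"
}
```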
I enabled the local admin account on the server so I can connect to it without going through my centralized AD server. That doesn't work either; I still get the black screen of death.
I have a monitoring product running against the server. Sometimes it stops reporting when the server becomes unavailable; other times it keeps reporting mostly idle numbers (CPU under 10%, plenty of memory, no especially heavy disk usage).
One of my co-workers figured out that you can restart the server from "Server Manager" on another system, though that doesn't really fix the issue.
I experience this issue less frequently on other servers; generally I just reboot them as a short-term fix.
I'm sort of out of ideas at this point. I'll continue to search around and see if anyone else has any ideas...