I have an Exchange 2016 server that BSODs roughly every 14 days. The server is virtual and runs in a clustered VMware environment with storage over iSCSI. None of our other Windows servers (including the passive copy of Exchange) BSOD. The passive Exchange node is being backed up, and the backup clears the transaction logs as it should on both the passive and the active node.
- I have tried installing the latest critical patches (none of the optional yet)
- I have tried migrating the VM in question to a new host.
Here is the information BSoD viewer gives me:
052716-21921-01.dmp 27.05.2016 10:22:16 CRITICAL_PROCESS_DIED 0x000000ef ffffe000`de10d080 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e3a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e3a0 C:\Windows\Minidump\052716-21921-01.dmp 8 15 9600 138 150 27.05.2016 10:22:47
051516-25765-01.dmp 15.05.2016 10:11:06 CRITICAL_PROCESS_DIED 0x000000ef ffffe001`0ad80900 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e3a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e3a0 C:\Windows\Minidump\051516-25765-01.dmp 8 15 9600 138 150 15.05.2016 10:11:41
042816-19328-01.dmp 28.04.2016 22:36:50 CRITICAL_PROCESS_DIED 0x000000ef ffffe001`3da4f900 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e8a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e8a0 C:\Windows\Minidump\042816-19328-01.dmp 8 15 9600 294 472 28.04.2016 22:39:45
041916-23859-01.dmp 19.04.2016 08:43:53 CRITICAL_PROCESS_DIED 0x000000ef ffffe001`23101900 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e8a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e8a0 C:\Windows\Minidump\041916-23859-01.dmp 8 15 9600 294 472 19.04.2016 08:47:04
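To put a number on the spacing between crashes, here is a minimal Python sketch that works it out from the four dump timestamps listed above. The timestamps are copied from the listing; the function name `crash_intervals_days` is just an illustrative choice.

```python
from datetime import datetime

# Crash timestamps copied from the minidump listing above (DD.MM.YYYY HH:MM:SS).
crash_times = [
    "19.04.2016 08:43:53",
    "28.04.2016 22:36:50",
    "15.05.2016 10:11:06",
    "27.05.2016 10:22:16",
]


def crash_intervals_days(stamps):
    """Return the gaps between consecutive crashes, in days."""
    parsed = sorted(datetime.strptime(s, "%d.%m.%Y %H:%M:%S") for s in stamps)
    return [(b - a).total_seconds() / 86400 for a, b in zip(parsed, parsed[1:])]


intervals = crash_intervals_days(crash_times)
for days in intervals:
    print(f"{days:.1f} days")
print(f"mean: {sum(intervals) / len(intervals):.1f} days")
```

For these four dumps the gaps come out at roughly 9.6, 16.5 and 12.0 days, so the crashes are not on a strict schedule.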
I saw a post describing the same problem on a different site, but nobody actually answered it and the post aged out.
Does anyone have any pointers on how to fix this? Would I have to install ANOTHER Exchange server and migrate to it? That would be very unfortunate.
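Your storage system is failing or too slow to keep up. If IO is stalled for too long, Exchange assumes the storage is dead and kills Wininit to force a hard reset. Because Wininit is a critical system process, killing it shows up as the CRITICAL_PROCESS_DIED (0x000000ef) bug check in your dumps.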
See https://technet.microsoft.com/en-us/library/ff625233.aspx and scroll to the end. It's the same for 2013 and 2016.
I have experienced this firsthand when using Windows Server Backup to back up Exchange. When the backup begins, it runs a consistency check on all databases in parallel. This caused Exchange to BSoD after a few minutes when the storage dropped out.
The first solution is to disable the ATS heartbeat to the storage array: https://kb.vmware.com/kb/2113956
The text is too long to copy, but TL;DR: the connection to your storage array may be dropped under heavy IO when the ATS heartbeat times out after 8 seconds. That causes an IO timeout in the VM, which in turn causes Exchange to BSoD.
The second solution is to add storage controllers to the VM and distribute the database disks between them. In my case, a single pvscsi controller would choke badly under 6 databases, but once the disks (including the OS disk etc.) were distributed over 4 pvscsi controllers, the issues disappeared. I don't have a reference for that, just personal experience on vSphere 5.5 U3.
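To illustrate the idea of spreading disks over controllers, here is a small Python sketch that plans a round-robin layout. It does not talk to vSphere at all; the disk names (`os`, `logs`, `db1` to `db6`) and the `pvscsiN` labels are made up for the example, and only the count of 4 controllers comes from my setup.

```python
from collections import defaultdict


def distribute_disks(disks, controller_count=4):
    """Assign disks to controllers round-robin; returns {controller index: [disks]}."""
    layout = defaultdict(list)
    for index, disk in enumerate(disks):
        layout[index % controller_count].append(disk)
    return dict(layout)


# Hypothetical disk names: one OS disk, one log disk, six database disks.
disks = ["os", "logs"] + [f"db{n}" for n in range(1, 7)]
layout = distribute_disks(disks)
for controller, assigned in sorted(layout.items()):
    print(f"pvscsi{controller}: {', '.join(assigned)}")
```

With eight disks and four controllers, each controller ends up with two disks, so no single controller carries all the database IO. The actual reassignment still has to be done in the VM's settings.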
You can issue a command to disable the ESE forced reboot; the cause is well explained in Don's answer.
I did this recently for a customer with a single server on ESXi, as the IO load was overwhelming Exchange. (It is still overwhelming it, as it takes ages just to open a management console, for example, but at least it doesn't reboot.)
In the command, you need to specify the correct Exchange version.
See here for Exchange versions: https://technet.microsoft.com/en-us/library/hh135098(v=exchg.150).aspx
See here for further details: http://www.tecfused.com/2014/11/exchange-2013-dag-bsod/