I have an Dell R510 running esxi 5.1 with 16GB Ram; 1 CPU (Xeon L5520 @ 2.27HGz; 4 core) running a single Server 2012 Machine. I came in this morning to this
core dump. I held down the power button and rebooted. It came back to a "VMWARE Recovery" screen and after 10-20 minutes, I came back and it was back to its usual screen (Vmware info; ip address management info, etc), and the single server on the machine had booted itself back up. The issue is that ever since, it seems incredibly sluggish. The Server seems OK; the performance monitor never has anything maxed out. The most used it seems is network; I have only a single NIC in use (we have about 10 users total access data files on the server and it is running DNS, AD and DHCP services). I've replaced a switch in between thinking it may have been the culprit, but still have the same results. Sporadically, my RDP session to the server will be lost but eventually come back up.
Any ideas on where to look for the culprit for the slowdown? Any ideas on what I could try to increase performance? Is a single NIC enough for 10 users filesharing (we use Quicken files mostly, and office xls/word) nothing too intensive.
I tried once to add another NIC and "team" them, but it broke everything and I had a hard time getting back to change things since the "teamed" nic decided to use dhcp and I couldn't get to the dhcp server to find the address (since the server is the dhcp server).
Thanks for any ideas.
UPDATE
I have since taken the server off of the network and moved it to another spot. Since it's been off the network, it has been running fine (n.b. there was one more core dump same as in pic before I took off network). Could this be the result of an external mapped iSCSI datastore? I had 2 on the host and there were errors in vmkernel log about it not being able to log on, but at least some of the errors were known OK:https://kb.vmware.com/s/article/1031221 but not sure about others. Here are a couple of shots of the vmkernel errors: https://imgur.com/a/DOA3umn
Also, here is a pastebin of the latest from vmkernel.log. I also have syslog, usb, vmkeventd, vmksummary, vmkwarning, vprobed, and vpxa.log files I can upload. https://pastebin.com/rmp3k1G5
Update 2
I've left the machine running over the weekend, and as of now, it's still running just fine. No panic, no crash. Wondering if somehow the external NAS devices mapped as datastores via iSCSI could be the cuplrit? Especially concerning since I have them on a couple of other hosts as well...
Stack trace ends with megasas that means command to your MegaRAID (you probably have some RAID array) may have unexpectedly failed or it's a driver bug. I'd recommend updating MegaRAID firmware together with ESXi driver (and probably ESXi itself while you're at it). ESXi 5.1 is ancient but you can still get updates and async drivers from VMware site.
My gut feeling says that something is wrong with storage - this could also explain sluggishness, caused by very high disk latency (you should see it in PerfMon or Resource Monitor or vSphere Client monitoring). Check RAID and disk status in megacli/storcli or reboot to WebBIOS to check array and drive status.
My first guess would be "storage".
How are disks and volumes configured? Which kind of RAID is in use?
I'm pretty sure you have a rebuilding or degraded disk array; that would explain the slowness.