Ubuntu 16.04 has been hanging on me ~1x per day. This happens when I am in the middle of web browsing or using a desktop application, not when booting. When it does, the mouse pointer will still move freely, but clicking or keystrokes have no effect on my system until I do a hard reboot.
What is the best way for me to debug this?
Here is some information:
selah@selah-Precision-Tower-5810:~$ uname -a
Linux selah-Precision-Tower-5810 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Also, in case it is relevant, I have a "very big" monitor, a Dell 42" at 3840x2160 resolution.
selah@selah-Precision-Tower-5810:~$ lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
UPDATE:
Following Artyom's advice I found the following message in my error logs:
Apr 27 09:47:25 selah-Precision-Tower-5810 kernel: nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 27 09:47:29 selah-Precision-Tower-5810 kernel: nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 27 09:47:33 selah-Precision-Tower-5810 kernel: nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Which has led me to this bug, which describes similar behavior: https://bugs.freedesktop.org/show_bug.cgi?id=93629
This is a bug in the Nouveau video driver (a kernel module). For details, check the bugs at bugs.freedesktop.org or at GitLab, especially #93629, #99900 and #100567 (which are related to SCHED_ERROR/CTXSW_TIMEOUT).
To debug the freeze, you can use the Magic SysRq key, for example:
Note: Consider holding ⇧ Shift (depending on your keyboard).
Other things to try during freeze:
Note: Consider holding ⇧ Shift (depending on your keyboard).
If nothing works, you should perform a safe reboot by Alt-SysRq-REISUB, which is:
Alt-SysRq-B: immediately reBoot the system.
Note: If the reboot combination above doesn't work, the freeze could be caused by defective hardware rather than the video driver.
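As a reminder of what each key in REISUB does, here is a small sketch. The function name `sysrq_describe` is made up for illustration, and the descriptions paraphrase the kernel's SysRq documentation; it only prints text and does not trigger anything.

```shell
#!/bin/sh
# Describe a Magic SysRq letter (paraphrased from the kernel sysrq docs).
# sysrq_describe is a hypothetical helper name, not an existing tool.
sysrq_describe() {
    case "$1" in
        r|R) echo "unRaw: switch the keyboard out of raw mode" ;;
        e|E) echo "tErminate: send SIGTERM to all processes except init" ;;
        i|I) echo "kIll: send SIGKILL to all processes except init" ;;
        s|S) echo "Sync: flush all mounted filesystems" ;;
        u|U) echo "Unmount: remount all filesystems read-only" ;;
        b|B) echo "reBoot: immediately reboot the system" ;;
        l|L) echo "show a stack backtrace for all active CPUs" ;;
        w|W) echo "dump tasks in uninterruptible (blocked) state" ;;
        *)   echo "not covered by this sketch"; return 1 ;;
    esac
}

# Print the REISUB sequence in the order the keys are pressed.
for key in R E I S U B; do
    printf 'Alt-SysRq-%s: %s\n' "$key" "$(sysrq_describe "$key")"
done
```

If the keyboard is dead but the machine is still reachable over SSH, the same letters can be written to /proc/sysrq-trigger as root (for example `echo l | sudo tee /proc/sysrq-trigger`), assuming the kernel was built with SysRq support.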
Note: If some SysRq options don't work and you see a "This sysrq operation is disabled" error, enable them with:
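A minimal sketch of checking and enabling SysRq, assuming the standard `kernel.sysrq` sysctl (0 = disabled, 1 = everything enabled, any other value is a bitmask of allowed functions). The helper name `sysrq_status` is made up for illustration; the commands that need root are left as comments.

```shell
#!/bin/sh
# Decode a kernel.sysrq value. sysrq_status is a hypothetical helper name.
sysrq_status() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "partially enabled (bitmask $1)" ;;
    esac
}

# Report the current setting, if this kernel exposes it.
if [ -r /proc/sys/kernel/sysrq ]; then
    sysrq_status "$(cat /proc/sys/kernel/sysrq)"
fi

# To enable every function until the next reboot (needs root):
#   echo 1 | sudo tee /proc/sys/kernel/sysrq
# To keep the setting across reboots, put this line in a sysctl drop-in
# file (the file name is your choice) under /etc/sysctl.d/:
#   kernel.sysrq = 1
```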
See: Configuring SysRq in Linux.
After rebooting, check your kern.log for details, especially the call traces generated by the SysRq commands above. This can help you find the matching bug report and its solution. Check the following kern.log example.
You can check the latest crash log with:
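One way to pull the relevant lines out of the log, as a sketch: it assumes the log lives at /var/log/kern.log (the usual Ubuntu location) and that the errors look like the `SCHED_ERROR`/`CTXSW_TIMEOUT` lines quoted in the question. The function name `nouveau_errors` is made up for illustration.

```shell
#!/bin/sh
# Print nouveau scheduler errors from the kernel log file given as $1.
# nouveau_errors is a hypothetical helper name.
nouveau_errors() {
    grep -E 'nouveau .*(SCHED_ERROR|CTXSW_TIMEOUT)' "$1"
}

# Show the 20 most recent matches from the current kernel log, if readable.
if [ -r /var/log/kern.log ]; then
    nouveau_errors /var/log/kern.log | tail -n 20
fi
```

If the freeze happened before the last log rotation, the same function can be pointed at the rotated file instead (typically /var/log/kern.log.1).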
Suggested solution:
Enable persistent logging
Reboot
Make sure persistent logging is enabled by browsing /var/log/journal and checking that a randomly named directory exists.
After the incident
List system boots
Extract the boot with the incident
or just
Inspect the log.
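The steps above could look roughly like this. It is a sketch that assumes systemd-journald with its default `Storage=auto` setting, where creating /var/log/journal is enough to make the journal persistent. The helper names `journal_is_persistent` and `previous_boot_errors` are made up for illustration, and the commands that change the system are left as comments.

```shell
#!/bin/sh
# Succeed if DIR (default /var/log/journal) contains at least one
# subdirectory; journald names it after the machine ID.
journal_is_persistent() {
    dir="${1:-/var/log/journal}"
    for sub in "$dir"/*/; do
        [ -d "$sub" ] && return 0
    done
    return 1
}

# Show kernel messages of priority "err" or worse from the previous boot.
previous_boot_errors() {
    journalctl -b -1 -k -p err --no-pager
}

if journal_is_persistent; then
    echo "persistent journal found"
else
    echo "no persistent journal yet"
    # Enable it (needs root), then reboot:
    #   sudo mkdir -p /var/log/journal
fi

# After the incident:
#   journalctl --list-boots                 # list system boots
#   journalctl -b -1 > previous-boot.log    # extract the boot with the incident
#   previous_boot_errors                    # or just inspect the kernel errors
```

Here `-b -1` selects the boot before the current one; use the offset shown by `journalctl --list-boots` if the freeze happened earlier than that.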