I created a MAAS server and a node. When I PXE-boot the virtual machine corresponding to this node, it starts loading Ubuntu. A few minutes later, it gets stuck in what seems to be an infinite loop, regularly printing the following message:
[ 239.617011] INFO: task touch:1060 blocked for more than 120 seconds.
[ 239.618857] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
The same message (only the number at the beginning changes) is repeated over and over. I kept it running for probably more than half an hour, and the result was the same.
This results in a “Failed tests” status for the node concerned.
The error appears with both the default installer and the fast installer.
The file /var/log/maas/maas.log contains no errors.
Where can I gather more information about the cause of this issue?
I encountered the same message and issue when I used PXE to install Ubuntu, on releases from precise and quantal through trusty. It could not be reproduced every time; the reproduction rate was about 1 in 10 or lower. So I do not think this is related to MAAS.
Could you reproduce this issue every time?
This looks like a kernel issue. Please refer to this http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/
and try:
add there 2 lines:
save, exit and reboot.
@tai271828's answer did not work for me, but it gave me an idea:
Turn a hung task into a kernel panic, so that the kernel reboots the whole system when it hits one.
The following kernel parameters make the system reboot when a hung task appears:
On panic, the kernel reboots the system after 3 seconds.
The kernel panics if a task hangs.
A task is reported as hung after 30 seconds.
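As a sketch, these three settings correspond to the standard sysctl keys below. The mapping is inferred from the descriptions above, so treat the exact key names as my assumption about what was meant:

```
# Reboot 3 seconds after a kernel panic
kernel.panic = 3
# Make the hung-task detector trigger a panic instead of only logging
kernel.hung_task_panic = 1
# Report a task as hung after 30 seconds (the message above shows 120)
kernel.hung_task_timeout_secs = 30
```

The same keys can be set on a running system with sysctl -w, for example sysctl -w kernel.panic=3.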
To make these parameters persistent, add them to /etc/sysctl.conf.
If you are also having this trouble during a PXE installation, you can add these kernel parameters to the boot menu entry after "APPEND", e.g.:
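A minimal sketch of such a boot entry, assuming a pxelinux-style configuration. The label and the kernel and initrd paths are hypothetical placeholders, so keep whatever your existing entry already has and only append the last two parameters:

```
# Hypothetical pxelinux entry; label and paths are placeholders
LABEL install
  KERNEL ubuntu-installer/amd64/linux
  APPEND initrd=ubuntu-installer/amd64/initrd.gz panic=3 hung_task_panic=1
```

Here panic=3 and hung_task_panic=1 are the boot-time counterparts of kernel.panic and kernel.hung_task_panic. As far as I know there is no dedicated boot parameter for the 30-second timeout. Newer kernels (5.8 and later, I believe) accept sysctl.kernel.hung_task_timeout_secs=30 on the command line; otherwise the timeout has to be set after boot.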