I am trying to debug an unsuccessful (hung) system start (upstart) on 14.04.2 LTS. Root is an ext4 filesystem in a LUKS container. The filesystems are in a clean state.
The boot process stops after upstart-socket-bridge (not necessarily after that specific service; e.g. when cups-daemon was installed, it stopped after that one). init -v isn't very helpful either. The only log entry that isn't merely logging the start/stop of various services is one about udev right before init:
Begin: Running /scripts/init-bottom ... done.
udev exit failed --rc=2
(Edit) Remounting the root rw initially appeared to always lead to a clean boot, but in fact it is rather unpredictable: I have had failed and successful boots either way.
Observation: Everything appears to be fine; the system simply neither remounts the root writable nor continues the boot.
Q: How do I figure out which service is at fault for getting the boot process stuck?
Update: By spawning a second shell via getty, one can run initctl list after the boot hangs. These are the running jobs:
mountnfs-bootclean.sh start/running
udev start/running, process 438
upstart-udev-bridge start/running, process 432
plymouth start/running, process 122
resolvconf start/running
ssh start/running, process 767 <-- this one was manually started
mountall start/running, process 337
mountkernfs.sh start/running
mountnfs.sh start/running
bootmisc.sh start/running
upstart-socket-bridge start/running, process 745
cryptdisks start/running
mountdevsubfs.sh start/running
mtab.sh start/running
network-interface (lo) start/running
network-interface (eth0) start/running
plymouth-ready (startup) start/running, process 315
plymouth-upstart-bridge start/running, process 316
mountall-bootclean.sh start/running
network-interface-security (network-interface/eth0) start/running
network-interface-security (network-interface/lo) start/running
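One way to narrow such a list down, as a minimal sketch: upstart prints each job as goal/state, and start/running and stop/waiting are the two resting combinations. Anything else (for example start/starting or start/pre-start) is a job that is still in transition. The helper name transitional_jobs is hypothetical, and the sketch assumes that the blocking job actually shows up in a transitional state, which the output above does not confirm.

```shell
# Print only the jobs from `initctl list` output (read on stdin) that are
# NOT in one of the two resting states, start/running or stop/waiting.
# transitional_jobs is a hypothetical helper name, not part of upstart.
transitional_jobs() {
    # grep exits 1 when every line was filtered out; treat that as success.
    grep -Ev ' (start/running|stop/waiting)(,|$)' || true
}

# Usage on the hung system, from the second console:
#   initctl list | transitional_jobs
```

If this prints nothing, every job is at rest and the hang has to be looked for elsewhere, e.g. in a job that never got started at all.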
Update 2:
- Reinstalling upstart and all its dependent packages (is a pain and) has no effect.
- Using the second console, I can just use init 5 to get the stuck system to continue booting normally.
- The system now got stuck even if I manually remounted the root rw (or used the rw kernel parameter).
- My initial observation that forcing the root writable works around the issue is incorrect.
Workaround:
It appears to be ureadahead's fault. Purging it resulted in 5 clean boots without any issue. I'll just leave the question (and the 100 extra rep) open for anyone interested in, or knowing an answer to, the original question: how, if not by random trial, could I have figured this out?
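A less drastic way to test a suspect job than purging its package, as a sketch only: upstart reads an optional <job>.override file next to the job definition, and a manual stanza in it stops the job from starting automatically. The helper name disable_job and the /etc/init default are assumptions, and whether this gives the same result as purging ureadahead was not tried here.

```shell
# Write a "manual" override for one upstart job so that it no longer
# starts automatically at boot. disable_job is a hypothetical helper name.
#   $1 = job name
#   $2 = job directory (assumed to be /etc/init when omitted)
disable_job() {
    job=$1
    dir=${2:-/etc/init}
    printf 'manual\n' > "$dir/$job.override"
}

# Usage (as root):  disable_job ureadahead
# Undo:             rm /etc/init/ureadahead.override
```

Disabling suspects one at a time this way is still trial and error, but each trial is reversible by deleting a single file.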
For reference, the (unsuccessful) debug steps I tried, which may however be useful to others:
- Install sash, then change your kernel command line (use the e key in grub or edit grub.cfg/cmdline.txt) and add init=/bin/sash, reboot, examine the situation on that shell and only then use exec init to continue booting.
- Run init with the -v switch to increase logging.
- Remount the root writable (mount -o remount,rw / before executing init) - this allows for more logging in /var/log/upstart.
- getty -n -l /bin/bash 38400 tty2 & - this helps examining the status the system is in (e.g. ps -Af, iotop).
- initctl list to figure out which services are in which state.
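Since some boots succeed and some hang, the initctl list step can be taken a bit further by saving its output once on a good boot and once on a hung one, and comparing the two. The following is a rough sketch of that comparison, not a tested recipe: the helper names normalize_jobs and diff_jobs and the file names in the usage comment are hypothetical, and it needs a writable temporary directory, so it is meant to be run later on a working system.

```shell
# Strip the ", process N" suffix from an `initctl list` snapshot and sort
# it, so that two boots can be compared line by line.
# Both helper names in this block are hypothetical.
normalize_jobs() {
    sed 's/, process [0-9][0-9]*$//' "$1" | sort
}

# Print the job lines that differ between a good-boot snapshot ($1) and a
# hung-boot snapshot ($2). Lines starting with "<" appear only in the good
# boot, lines starting with ">" only in the hung boot.
diff_jobs() {
    good=$(mktemp)
    hung=$(mktemp)
    normalize_jobs "$1" > "$good"
    normalize_jobs "$2" > "$hung"
    # diff and grep exit non-zero on differences / no matches; ignore that.
    diff "$good" "$hung" | grep '^[<>]' || true
    rm -f "$good" "$hung"
}

# Usage, with each snapshot taken via `initctl list > FILE`:
#   diff_jobs jobs.good jobs.hung
```

Jobs that are running in the good snapshot but missing or in a different state in the hung one are the first candidates to look at.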