Recently, I updated Ubuntu 10.04 Server LTS. The server reports itself as
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"
There are multiple machines here with the same install and hardware, and all work fine except for one. This one has problems during the boot process:
- When GRUB loads, it stops at the main menu and waits for keyboard input.
- After pressing a key, GRUB reports
error: no argument specified
and continues (either with or without keypress).
- Network doesn't come up: eth0 and eth1 are redefined as eth4 and eth5, and eth5 doesn't come up.
ifup eth5
works fine at the command line.
I'm still working on the last, but I suspect the first two are part of the same problem. Comparing a working /boot/grub/grub.cfg
with this one doesn't turn up significant differences: only (previous) kernel versions and root UUIDs.
A significant difference between this and other similar (working) systems in our data center is that this system is a Dell R710; the others are Dell 2950s. We have working R710s elsewhere running Ubuntu Server 10.04, but none has been updated as recently as this one.
This problem has only recently surfaced; I've updated and rebooted three systems, including this one, in the last 48 hours. Only this one had problems, but it was also the only one that stopped at the GRUB menu. The others all came back up automatically, so did I perhaps just miss seeing this error on them?
Grub is version 1.98:
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-===============================-=================================-============================================
ii grub-common 1.98-1ubuntu12 GRand Unified Bootloader, version 2 (common
ii grub-pc 1.98-1ubuntu12 GRand Unified Bootloader, version 2 (PC/BIOS
I found this thread on the Ubuntu forums but it seems to be related to Natty (11.04). Some reports suggested this bug showed up in an upgrade from Lucid (10.04) to Maverick (10.10) or from Maverick to Natty; I'm only running Lucid LTS. This thread is also related to GRUB 1.99 and this version is 1.98.
The thread suggests that the line
search --no-floppy --fs-uuid --set 857d5af9-23cd-4d9b-908b-cc075e866758
should instead be
search --no-floppy --fs-uuid --set=root 857d5af9-23cd-4d9b-908b-cc075e866758
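To see whether a given grub.cfg contains the bare form of the option, a small check along these lines could be used. This is only a sketch: find_bare_set is a hypothetical helper name, and the default path assumes the standard /boot/grub/grub.cfg location mentioned above.

```shell
#!/bin/sh
# find_bare_set (hypothetical helper): print, with line numbers, every
# "search" line whose --set option has no "=name" attached.
# Usage: find_bare_set [path-to-grub.cfg]
find_bare_set() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # "--set" followed by whitespace or end of line, i.e. not "--set=root"
    grep -nE '^[[:space:]]*search[[:space:]].*--set([[:space:]]|$)' "$cfg"
}
```

Calling find_bare_set with no arguments inspects the live configuration. Since grub.cfg is regenerated by update-grub, any hand edit made there would be overwritten the next time it runs.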
Can someone help resolve this issue? Am I just seeing this error here because GRUB stops at the menu? Why would GRUB stop at the menu on one system and not others?
UPDATE: Although the package version number is 1.98, this is GRUB 2 rather than GRUB Legacy; the same holds for version 1.99.
The more I think about this, the more convinced I am that other systems are probably reporting the same error but continuing on; the real problem here (with GRUB anyway) is the fact that the system stops at the menu. I can't test the other systems for comparison as they are back in use already.
UPDATE: I rebooted the machine again. The message first shows up (in bold) before the menu appears. After selecting the appropriate option, it shows up (on a clean screen) as described above with a Press any key to continue...
message. After about 5 seconds, the screen clears again and booting continues (minus networking).
The GRUB2 installation reports itself as version 1.99~rc1-13ubuntu3
which I think is different from what the package reports. I'm not sure if this is just the way it is or if things have been changed.
UPDATE: More on the two versions of GRUB2 mentioned. The first is the installed package, which is 1.98-1ubuntu12; the second is the version reported at the GRUB2 menu, which is 1.99~rc1-13ubuntu3. According to the changelog for the GRUB2 package, version 1.99~rc1-13ubuntu3 was introduced on April 21, 2011, in Natty (note, not Lucid) and version 1.98-1ubuntu12 was introduced June 17, 2011, in Lucid Updates. This system has never had Ubuntu Natty or Oneiric on it; it did, however, have Red Hat Enterprise Linux 5 on it. However, the version of GRUB used by RHEL 5 (at time of writing) is 0.97-13.5, which is GRUB Legacy.
So where did version 1.99~rc1-13ubuntu3 come from? I strongly suspect that this is the source of my problems, and a complete reinstall of GRUB2 to the MBR would fix my problems. However, I've had to return the system to full operation; thus, testing will have to wait.
The problem appears to be that there are two different versions of GRUB installed. One is active and loaded into the MBR; the other is "in the wings". The files in
/boot/grub
are populated from elsewhere and are not owned by any package. The files in /usr/lib/grub/i386-pc
are supposed to be the same, but in this case are not. I created this program to check for differences:
On other (working boot) systems, I get this kind of output (abbreviated):
...snip...
On the system that is not working as expected, the following output is seen from the same script:
...snip...
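The comparison program itself is not reproduced in this post. A minimal sketch of the same idea, assuming a plain byte-for-byte comparison of each packaged file against its copy in /boot/grub, might look like the following. compare_grub_dirs and its SAME/DIFFERS/MISSING labels are hypothetical and will not match the output format of the original script.

```shell
#!/bin/sh
# compare_grub_dirs (hypothetical helper, not the original script):
# compare each regular file in the packaged directory with the file of
# the same name in the boot directory.
# Usage: compare_grub_dirs [packaged-dir] [boot-dir]
compare_grub_dirs() {
    pkg_dir="${1:-/usr/lib/grub/i386-pc}"
    boot_dir="${2:-/boot/grub}"
    status=0
    for src in "$pkg_dir"/*; do
        [ -f "$src" ] || continue
        name=$(basename "$src")
        if [ ! -e "$boot_dir/$name" ]; then
            # Reported but not treated as an error: a packaged file
            # may simply have no copy in the boot directory.
            echo "MISSING  $name"
        elif cmp -s "$src" "$boot_dir/$name"; then
            echo "SAME     $name"
        else
            echo "DIFFERS  $name"
            status=1
        fi
    done
    # Non-zero exit status if any file differed
    return "$status"
}
```

Run with no arguments, it compares the two directories named above; any DIFFERS line would point to the kind of mismatch described here.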
To fix this, I copied /boot/grub and reinstalled grub-pc thusly:
This works and the packaged grub then replaces the original in /boot/grub and, presumably, in the MBR.
I've not yet tested the results of this in a system reboot; caveat lector.
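The exact commands used for the backup and reinstall are not preserved above. One plausible shape for that step is sketched below, under stated assumptions: the backup path /boot/grub.bak and the helper names run and reinstall_grub_pc are hypothetical. The sketch only prints the commands unless RUN=1 is set, because the real steps need root and touch the boot loader.

```shell
#!/bin/sh
# run (hypothetical helper): execute the command only when RUN=1,
# otherwise just print it (dry run).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# reinstall_grub_pc (hypothetical helper): keep a copy of the current
# /boot/grub, then reinstall the grub-pc package.
# Usage: reinstall_grub_pc [backup-dir]
reinstall_grub_pc() {
    backup="${1:-/boot/grub.bak}"   # backup location is an assumption
    run cp -a /boot/grub "$backup" &&
    run apt-get install --reinstall grub-pc
}
```

Calling reinstall_grub_pc on its own shows what would be executed; RUN=1 reinstall_grub_pc as root would perform it. Whether the boot sector is also rewritten depends on how the package is configured, so that part remains as unverified here as it is above.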