Basic working system details:
I used the Ubuntu 12.04 server CD to install a server.
I have 4 disks. On each disk I did the following, similar to this howto:
- created a 2GB swap partition
- created a 256 MB /boot partition
- created a 64 GB RAID10 partition (for root)
- created a big RAID10 partition taking the rest of the space
I formatted /boot as ext3. I set up RAID10 across the root partitions and across the big partitions. I formatted the root array as ext4. I created an LVM logical volume on the big array, and formatted it as ext4.
The resulting system works fine, and boots fine.
Problem details:
Then I decided to document a failure recovery procedure. As the first step, I decided I would reinstall grub.
# grub-install /dev/sda
warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
error: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
# grub-install /dev/sdb
warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
error: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
So it looks like it failed, but also seems like it gave up and didn't make changes. So I rebooted. The boot failed. It just hangs with a black screen with a blinking cursor about 4 lines down. If I boot holding down "Shift", I get the word "GRUB" to the left of the cursor, but no interactive prompt.
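The warning is about a missing partition flag, so one way to see what grub-install is complaining about is to look at the flags column of the partition table. Below is a minimal sketch, not what grub itself does internally. `has_bios_grub` is a hypothetical helper name, and it assumes parted's machine-readable output (`parted -m`), where each partition is one colon-separated line whose last field holds the flags.

```shell
#!/bin/sh
# Hypothetical helper: reads `parted -s -m DISK print` output on stdin and
# succeeds (exit 0) if any partition line carries the bios_grub flag.
has_bios_grub() {
    awk -F: '
        # Partition lines start with the partition number; flags are the last field.
        /^[0-9]+:/ && $NF ~ /bios_grub/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Example use against a real disk (needs root), left commented out:
#   parted -s -m /dev/sda print | has_bios_grub && echo "BIOS Boot Partition present"
```

On the disks as the installer left them, this check would be expected to fail, since partition 2 only carries the "boot" flag.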
At this point, I used boot-repair-disk to generate this report: http://paste.ubuntu.com/966531/
Note in the above report, it says that the bootloader does not point to the correct sector for core.img. (sda is the virtual cd; sdb is the boot disk; sdc is a mirror of sdb, but boot is not mirrored, just a separate unrelated partition is there and formatted ext3; sdd and sde have space for boot but it is not formatted)
Then I booted from the Ubuntu server CD, started the rescue system, and issued the following commands, which completed without error (where sda is the virtual CD, and b,c,d,e are the disks which were a,b,c,d in the previous grub commands):
# parted /dev/sdb set 2 bios_grub on
# parted /dev/sdc set 2 bios_grub on
# grub-install /dev/sdb
# grub-install /dev/sdc
At this point, I used boot-repair-disk to generate this report: http://paste.ubuntu.com/966561/
Note that in the above report, the problem about core.img is gone. It seems to point to the correct sector.
Now if I try to boot, I get a grub prompt. If I run "set", I see that root is found and set. If I run "ls /" I see my root directory from the raid volume, including the vmlinuz kernel file. If I type "ls /vmlinuz" it says "error: file not found." It says the same error if I use the "linux" command to try to load the kernel. The vmlinuz file is not listed if I use "ls -l /".
Overly verbose details, in case you want to follow:
I noticed there is also no /boot/grub/grub.cfg, so I ran
# grub-mkconfig -o /boot/grub/grub.cfg
But the problem remains.
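To rule out an empty or truncated config after grub-mkconfig, a quick sanity check on the generated file can help. This is only a sketch: `count_menuentries` is a hypothetical helper, and it assumes that a usable grub.cfg contains at least one line beginning with `menuentry`.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of menuentry lines in a grub.cfg and
# fails if the file is missing, empty, or contains no entries.
count_menuentries() {
    cfg=$1
    [ -s "$cfg" ] || { echo "missing or empty: $cfg" >&2; return 1; }
    # grep -c exits non-zero on zero matches, so "|| true" keeps the count of 0.
    n=$(grep -c '^[[:space:]]*menuentry ' "$cfg" || true)
    echo "$n"
    [ "$n" -gt 0 ]
}

# Example use, left commented out:
#   count_menuentries /boot/grub/grub.cfg
```

A non-zero count only shows that the file was written; it says nothing about whether grub can actually read it at boot time.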
If I use the "gptsync" tool, there is no change in this behavior.
The boot-repair-disk won't repair the system, because it wants me to boot with an EFI-enabled BIOS. I briefly looked into this, but I don't know how that works. I found a UEFI shell in my boot options, but I don't know anything about it, and don't see how to change the startup from there (e.g. to boot the CD from that EFI shell).
I have also read this page, but Ubuntu doesn't come with the "grub" command, so I can't follow it exactly. I could simply install that command, but I am more curious to find out how the Ubuntu installer managed to install it rather than having a different setup. Did it use blocklists?
Here is the output of parted, while booted on the boot-repair-disk (where here the sdb is the first hard disk, sda when booted from disk, and "boot" changes to "bios_grub" in the 2nd paste link):
Model: ATA Hitachi HUA72303 (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system     Name   Flags
 1      17.4kB  2000MB  2000MB  linux-swap(v1)  swap1
 2      2000MB  2256MB  256MB   ext3            boot1  boot (this says bios_grub in 2nd link)
 3      2256MB  66.3GB  64.0GB                  root1  raid
 4      66.3GB  3001GB  2934GB                  data1  raid
Here is an unrelated super old virtual machine for comparison (for anyone unfamiliar with boot-repair-disk): http://paste.ubuntu.com/966799/
Here is the latest paste from the problem system, after running the above grub-mkconfig, and also setting "bios_grub" back to "boot". http://paste.ubuntu.com/966808/
Comparing the two, this looks interesting:
sdb2: __________________________________________________________________________
File system:
Boot sector type: Grub2's core.img
Boot sector info:
Mounting failed: mount: unknown filesystem type ''
md/bcserver8:0: ________________________________________________________________
File system: ext4
Boot sector type: -
Boot sector info:
Operating System: Ubuntu 12.04 LTS
Boot files: /boot/grub/grub.cfg /etc/fstab /boot/grub/core.img
It looks like the RAID array has the boot files, and sdb2 is not formatted (despite this, the system booted before I ran grub-install). From the rescue CD, "mount -t ext3 /dev/sdb2 /boot" fails. But it makes sense that this would confuse things, since grub uses partition 2 explicitly (the 2 in the parted command that set bios_grub on).
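Whether a partition still holds an ext2/3/4 filesystem can also be checked without mounting it, by looking for the superblock magic number. A minimal sketch, assuming the standard ext layout (superblock at byte 1024, 16-bit magic 0xEF53 at offset 56 within it); `has_ext_magic` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file or block device given as $1 has
# the ext2/3/4 superblock magic (0xEF53, stored little-endian as bytes 53 ef)
# at byte offset 1080 (superblock at 1024 + magic field at offset 56).
has_ext_magic() {
    magic=$(dd if="$1" bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}

# Example use against a real partition (needs root), left commented out:
#   has_ext_magic /dev/sdb2 && echo "looks like ext2/3/4" || echo "no ext superblock"
```

If grub-install wrote core.img over the start of the partition, the magic bytes would most likely be gone, which would match the "unknown filesystem type" mount error in the report.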
So I did something like this:
# mkfs.ext3 -L boot1 /dev/sdb2
# mv boot boot_on_root
# mkdir boot
# mount /dev/sdb2 boot
# rsync -avHP boot_on_root/ boot/
# parted /dev/sdb set 2 bios_grub on
# parted /dev/sdc set 2 bios_grub on
# grub-install /dev/sdb
# grub-install /dev/sdc
Then rebooted, and I have the black screen again, no prompt. http://paste.ubuntu.com/966848/
So at this point, my guess is that when bios_grub is set, grub installs neither to the MBR nor into the ext3 file system on sdb2, but onto the raw partition itself, as if it were EFI... which would obviously mess up the ext3 file system there. And from my brief reading about EFI, it sounded like EFI assumes the first partition is the boot partition, but in my case the first is swap, and also it should then be FAT rather than something unmountable... so since that makes little/no sense, I'm still completely lost without a clue. [EDIT: now I have a clue... skip down a bit for update]
And now when I click repair in boot-repair-disk, it asks something else. Last time the error was hidden under the window and I had to drag the other away to see it. This time the main window is gone, and the new window says:
GPT detected. You may want to retry after creating a
BIOS-Boot partition (>1Mo, flag). Do you want to continue?
So I clicked yes, and it said it repaired successfully, and created another paste: http://paste.ubuntu.com/966862/
But I still have a black screen with a blinking cursor.
Now my theory is that /boot got overwritten by a non-FAT, non-EFI thing: just the grub code that would otherwise have lived in the unused sectors between the MBR and the first partition (roughly sectors 1-62 on a classic MBR disk). I luckily ran into a very clear statement on this page, which probably completed my understanding of what all this means. And then after I found that, Jeremy posted an answer which, if true, confirms that this is the missing key concept. http://blog.psych0tik.net/2011/08/grub-embedding-blocklists-and-bios_grub-partitions/
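A bit of arithmetic supports this theory. On a GPT disk the sectors right after the protective MBR are taken by the GPT header and the partition entry array, so there is no free post-MBR gap to embed into. The sketch below assumes the standard GPT defaults (128 entries of 128 bytes each, 512-byte sectors) rather than reading anything from the disk.

```shell
#!/bin/sh
# Standard GPT layout with 512-byte sectors (assumed defaults, not read from disk).
sector_bytes=512
gpt_entries=128        # partition entries in the table
entry_bytes=128        # bytes per entry

# Sector 0 = protective MBR, sector 1 = GPT header, then the entry array.
entry_sectors=$(( gpt_entries * entry_bytes / sector_bytes ))
first_usable=$(( 2 + entry_sectors ))

echo "first usable sector: $first_usable"
echo "first usable byte offset: $(( first_usable * sector_bytes ))"
```

This gives sector 34, or byte 17408, which is the 17.4kB where partition 1 (swap) starts in the parted output. So on these disks every sector before the first partition is already in use by the partition table, and grub needs a bios_grub partition (or blocklists) to hold core.img.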
Questions:
What is going on? Why should grub fail to boot? Why does it say "file not found"?
Why doesn't grub want to install without this setting I set with parted (which was not set by the Ubuntu installer)? I thought all I needed to install it was a separate /boot that is not in LVM nor software RAID, since my root is in RAID and the partition table is GPT.
How does the Ubuntu CD installer install it without this problem, and without the bios_grub setting?
I would also consider using EFI. If this is a good idea, and there is a standard way to set it up, I am always up for learning new things.
The quickest answer that would make me happy, even without answering all my questions, would be a set of commands that I could run from the rescue CD to fix the bootloader in the same manner that the install CD did it. It would be even nicer if I could run them from the booted system, instead of the CD.