rlucas.net: The Next Generation

Linux Software RAID and GRUB – Recovering From a Failure

A couple of weeks ago, I had the bright idea to move an internal server here at Voyager from my office into a data room. I issued the customary sudo shutdown now and proceeded to move the box.

I was dismayed not to see it boot right back up afterwards. Ouch! I had specifically configured it with software RAID because the hard drives in the old spare box were a bit dodgy. Turns out that was a good idea, since one of the drives had failed (apparently the one that had the GRUB bootloader appropriately loaded on it).

I was faced with two 20 Gig HDDs, only one of which worked exactly right, and a computer which failed to boot off the remaining HDD. A quick trip to uBid and $70 later, I had two 60 Gig drives ready to use (20 Gigs are darn near impossible to find). I knew enough about partitions and whatnot to get this much done:

– Got a bootable rescue CD with a good set of utils (PLD Linux) downloaded and burned (it’s good to have one of these handy, rather than trying to burn them as-needed — see below under “Tricky stuff” for my unfortunate experiences with that).

– Trial-and-errored the two old HDDs to find which one was failing. Removed the bad one and replaced with New HDD #1.

– Used cfdisk, the curses-based fdisk, to read the exact size and type of the good RAID partition from Old HDD. Used that information to create an identical physical partition at the beginning of New HDD #1, including the same (Linux Software RAID) partition type.

– Used dd, the bit-for-bit copier, to copy the Old HDD’s entire main partition, /dev/hda1, verbatim to New HDD #1’s identically situated partition, /dev/hdc1, both of which were unmounted at the time.

– Swapped out the Old HDD with New #2, and repeated the last couple steps to make a new partition on New #2 and copy New #1’s first partition to it.

– Used mdadm --assemble to put the two identical RAID partitions — /dev/hda1 and /dev/hdc1 — back together into a RAID array and let it re-sync them until mdadm reported them to be in good health.

– Used GRUB to re-install the MBR on both HDDs. This was a damn sight harder than it sounds (see below).
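The dd step above is easy to rehearse without touching a real disk. Here is a minimal sketch that uses two scratch files as stand-ins for /dev/hda1 and /dev/hdc1. The file names, the 4 MB size, and the block size are assumptions for illustration, not what was run on the server. It copies one file onto the other and then checks that the copy is identical byte for byte.

```shell
#!/bin/sh
# Rehearse a bit-for-bit partition copy on scratch files, not real devices.

workdir=$(mktemp -d)
src="$workdir/old_hda1.img"   # stand-in for the good partition, /dev/hda1
dst="$workdir/new_hdc1.img"   # stand-in for the new partition, /dev/hdc1

# Fill the "old partition" with 4 MB of random data.
# (2>/dev/null just hides dd's transfer statistics.)
dd if=/dev/urandom of="$src" bs=1048576 count=4 2>/dev/null

# Create the "new partition" at exactly the same size, which is the job
# cfdisk did on the real disk.
dd if=/dev/zero of="$dst" bs=1048576 count=4 2>/dev/null

# The copy itself, in the same form as: dd if=/dev/hda1 of=/dev/hdc1
# conv=notrunc stops dd from truncating the destination file first; it
# makes no difference when the destination is a real block device.
dd if="$src" of="$dst" bs=1048576 conv=notrunc 2>/dev/null

# cmp -s exits 0 only when the two files are byte-for-byte identical.
if cmp -s "$src" "$dst"; then
    copy_status="identical"
else
    copy_status="different"
fi
copied_bytes=$(wc -c < "$dst")

echo "copy is $copy_status ($copied_bytes bytes)"
rm -rf "$workdir"
```

On real hardware, double-check which device is if= and which is of= before pressing Enter, since dd will happily overwrite the good partition with the blank one.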

All in all, it was a far cry from sliding a replacement hot-swap SCSI drive into a nice hardware-based array, but at $70 it cost a fraction as much, even if the fix was hardly timely. (I use this server as a document archive, web proxy, cron-job runner, and general workhorse for background processing and speculative automated information-gathering projects, none of which are mission-critical for us at Voyager.)

Tricky stuff:

– Windows XP apparently doesn’t come with ANY ability to burn ISOs. WTF, Microsoft? Operating system ISOs are just about the only legal thing I have ever wanted to burn to a CD, and that’s the one thing you won’t support? (Well, duh, really.)

– The latest Knoppix (5.0?) just plain barfed. It may have been the speed at which the dodgy ISO burning software I downloaded burned it (errors?). In any case, I burned about an hour of my life trying different “nousb” and similar switches, to no avail.

PLD Linux’s rescue disk was small and booted perfectly (though I took care to burn it at a low speed).

– BLACK MAGICK: When booting from the rescue disk, there weren’t any md devices in /dev on which mdadm could assemble the RAID. A couple of times I needed to create the node /dev/md0 myself by issuing the command:

mknod /dev/md0 b 9 0  

Which, as I understand it, is “make node /dev/md0, type block, major number 9 (the device number Linux assigns to md, i.e. software RAID, devices), minor number 0 (the 0th such device).” Then, since mdadm refused to automatically find and assemble the drives for /dev/md0, I had to find the UUID for the RAID volume thus:

mdadm --examine --scan  

And then copy the UUID (thanks, GNU Screen!) into the command:

mdadm /dev/md0 --assemble --uuid=<WHATEVER>  
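Both bits of black magic above can be scripted rather than copied by hand. This is a minimal sketch with two helper functions whose names (block_major and array_uuid) are my own invention. The first looks up the major number that the running kernel has registered for a block driver, reading text in /proc/devices format. The second pulls the UUID out of an ARRAY line as printed by mdadm --examine --scan. It assumes each ARRAY line carries a UUID=... field and that the first array listed is the one you want.

```shell
#!/bin/sh
# block_major NAME: read /proc/devices-style text on stdin and print the
# major number listed for NAME under the "Block devices:" heading.
block_major() {
    awk -v name="$1" '
        /^Block devices:/      { in_block = 1; next }
        /^Character devices:/  { in_block = 0; next }
        in_block && $2 == name { print $1 }
    '
}

# array_uuid: read "mdadm --examine --scan" output on stdin and print the
# UUID of the first ARRAY line (assumes a UUID=... field is present).
array_uuid() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/\1/p' | head -n 1
}

# Intended use on the rescue system (as root, with the md driver loaded):
#   major=$(block_major md < /proc/devices)
#   mknod /dev/md0 b "$major" 0
#   uuid=$(mdadm --examine --scan | array_uuid)
#   mdadm --assemble /dev/md0 --uuid="$uuid"
```

The commented usage puts --assemble before the device name, which is the ordering the mdadm man page shows. If block_major prints nothing, the md driver probably isn't loaded yet.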

– Getting GRUB installed on the hard drives was, in the end, easier than I thought but was rocky due to the complexity of the issues involved and me not understanding them fully.

If you search for “software raid grub” you’ll find a number of web pages that more or less get you there with what you need to know.

For me to get GRUB to work, I did the following.

– First, I had the /dev/md0 device (the “RAID partition”) holding my / (root) filesystem, with NO separate /boot partition. That means I had to make each of /dev/hda1 and /dev/hdc1 (the “RAID constituents”) bootable. Much of what you read follows the old advice of having a separate / and /boot, which I did not have.

– Second, I had to boot from the rescue CD, get the RAID partition assembled, mount it, and chroot into the mount point of the RAID partition. Like:

mknod /dev/md0 b 9 0
mdadm /dev/md0 --assemble --uuid=<WHATEVER>
mkdir /tmp/md0
mount /dev/md0 /tmp/md0
cd /tmp/md0
chroot .

Note that since the partition names and numbers were the same on my New #1 and #2 drives as they were on the old ones (hd[ac]1), there’s no problem with the old legacy /etc/mdadm/mdadm.conf and it can tell the kernel how to assemble and mount the RAID partition (important for below).
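Before going on to install GRUB, it may be worth confirming that the assembled array has both members up and is not still rebuilding. This is a minimal sketch, and the function name md_healthy is hypothetical. It assumes the usual /proc/mdstat layout, in which each array's status line ends with a bracketed map such as [UU] (all members up) or [U_] (one member missing), and in which a rebuild in progress shows a line containing "resync" or "recovery".

```shell
#!/bin/sh
# md_healthy: read /proc/mdstat-style text on stdin. Exit 0 only if no
# array shows a missing member and no resync or recovery is under way.
md_healthy() {
    mdstat=$(cat)
    # A member map containing an underscore, e.g. [U_], means a missing disk.
    if printf '%s\n' "$mdstat" | grep -Eq '\[[U_]*_[U_]*\]'; then
        return 1
    fi
    # A rebuild in progress is reported on a "resync" or "recovery" line.
    if printf '%s\n' "$mdstat" | grep -Eq 'resync|recovery'; then
        return 1
    fi
    return 0
}

# Intended use:
#   if md_healthy < /proc/mdstat; then echo "array looks healthy"; fi
```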

– Then, once chroot’ed into the RAID partition, I ran run-time GRUB off of there, which used the existing Ubuntu menu.lst file. (“Run-time GRUB” is when you run grub as root on an already booted machine to install stuff or whatnot. This is opposed to “boot-time GRUB,” which looks pretty damn similar but is what gets run off of the master boot record (MBR) of a bootable disk onto which GRUB has been installed.) For some reason, that menu.lst had ended up binary and corrupted. Therefore, I had to scrap it and come up with a new one. Here’s the meat of my new menu.lst:

# Boot automatically after 15 secs.
timeout 15

# By default, boot the first entry.
default 0

# Fallback to the second entry.
fallback 1

# For booting Linux
title  Linux
root (hd0,0)
kernel /vmlinuz root=/dev/md0
initrd /initrd.img

# For booting Linux
title  Linux
root (hd1,0)
kernel /vmlinuz root=/dev/md0
initrd /initrd.img
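Since the original menu.lst had turned into binary junk, a quick sanity check before trusting any menu.lst seems worthwhile. This is a rough sketch only, and the function name menu_lst_ok is hypothetical. It confirms that the file contains no NUL bytes (a cheap test for "looks like text") and has at least one title line and one kernel line with a root= argument. It does not validate GRUB syntax.

```shell
#!/bin/sh
# menu_lst_ok FILE: exit 0 if FILE looks like a plausible text menu.lst.
menu_lst_ok() {
    file=$1
    # A text file should contain no NUL bytes, so stripping them must
    # leave the content unchanged.
    tr -d '\000' < "$file" | cmp -s - "$file" || return 1
    # Expect at least one boot entry...
    grep -q '^[[:space:]]*title' "$file" || return 1
    # ...and a kernel line that names a root device.
    grep -q '^[[:space:]]*kernel .*root=' "$file" || return 1
}

# Intended use, from inside the chroot:
#   menu_lst_ok /boot/grub/menu.lst || echo "menu.lst looks broken"
```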

– Using that menu.lst, the commands I entered in the GRUB shell were as follows:

device (hd0) /dev/hda
root (hd0,0)
setup (hd0)
device (hd0) /dev/hdc
root (hd0,0)
setup (hd0)

The rationale behind this is that the first three lines install GRUB onto the MBR of /dev/hda, the first bootable HDD, and point it at the GRUB files (including whichever menu.lst it finds in /boot/grub/menu.lst) on (hd0,0). The second three lines install onto the MBR of /dev/hdc, the second HDD, but fake out the installation there so that GRUB acts as though it’s on the first, bootable HDD (hd0).
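The same three-command pattern repeats for every disk, so it can be generated rather than typed. This is a minimal sketch, and the function name grub_mbr_commands is made up for illustration. It assumes the GRUB legacy grub shell, which as far as I know accepts commands on standard input when run with --batch.

```shell
#!/bin/sh
# grub_mbr_commands DISK...: print GRUB shell commands that install GRUB on
# the MBR of each disk. Each disk is mapped to (hd0) in turn, so that it can
# boot on its own if it ends up as the BIOS's first disk.
grub_mbr_commands() {
    for disk in "$@"; do
        printf 'device (hd0) %s\n' "$disk"
        printf 'root (hd0,0)\n'
        printf 'setup (hd0)\n'
    done
    printf 'quit\n'
}

# Preview the commands for the two disks in this post.
grub_mbr_commands /dev/hda /dev/hdc

# To run them for real (as root, inside the chroot):
#   grub_mbr_commands /dev/hda /dev/hdc | grub --batch
```

Printing the commands first gives you a chance to eyeball the device names before anything gets written to an MBR.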

Do you get it? After chrooting, I fired up run-time GRUB, which automatically looks in its current boot/grub for menu.lst. I told it to put MBRs on both /dev/hda and /dev/hdc to make boot-time GRUB behave as specified in the menu.lst. The menu.lst lines say “use hd0,0 (i.e. hda1) as the root directory, find the kernel (vmlinuz) and initrd there, and once the kernel gets loaded, tell it to use /dev/md0 as the real root directory,” which it can do because it reads /etc/mdadm/mdadm.conf or /etc/mdadm.conf to figure out its md devices.

What puzzled me at first was, "how does the kernel get loaded when the root is /dev/md0 and you obviously must run mdadm in order to assemble and mount /dev/md0?" The answer is that when you do the installation commands listed above, it tells the boot-time GRUB to act as though (hd0,0) (AKA /dev/hda1 or /dev/hdc1, depending on whether the BIOS points to hda or hdc for booting) is its root directory. So, boot-time GRUB, all the way up through the loading of the kernel, treats /dev/hda1 (or hdc1) as its root, and only at the stage where the kernel is loaded enough to check mdadm.conf and run mdadm does it then do a little "chroot" of its own. If I've got this completely wrong, please email me (rlucas@tercent.com) and tell me I'm a bonehead (and include a link to your more informed writeup).

There's an elegance to the whole Linux software RAID thing, but it took a darn long time to comprehend.
