Tuesday, April 14, 2009

Free RAID!

Hey all,

I have for some time wanted to dabble in the dark arts of RAID, but have never had the money to buy hard drives for the purpose. Recently, thanks to tax handouts by our government to stimulate the economy, I purchased a brand new system that featured, among other things, a one terabyte hard drive. While one hard drive is nowhere near enough for RAID, it does mean I have enough space for the next best thing - RAID within a VMware appliance. You won't get the speed of a real-world array, but you do get a glimpse into the often murky world of maintaining RAID arrays - plus some (free) VMware products will let you watch the array doing its work in real time. Did I mention you can have all of this for free?

For those of you who are wondering what I'm babbling on about, here's a quick overview:

RAID stands for "Redundant Array of Inexpensive Disks". It is a way of spanning data across several drives (to give you more space), writing the same data to several drives at once (for redundancy), storing parity data for error correction either on one drive or spread across the array, or any combination of the above.
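To make the parity idea concrete, here's a tiny illustration (with made-up byte values) of how a RAID 5 array survives losing a drive: the parity block is simply the XOR of the data blocks, so any single missing block can be recomputed from the survivors.

```shell
# Hypothetical single bytes from three data drives:
d0=170 d1=85 d2=240
# The parity byte stored on a fourth drive is their XOR
p=$(( d0 ^ d1 ^ d2 ))
echo "parity: $p"            # prints: parity: 15
# Say the drive holding d1 dies - XOR the survivors with the parity
# to rebuild the lost byte:
rebuilt=$(( p ^ d0 ^ d2 ))
echo "rebuilt d1: $rebuilt"  # prints: rebuilt d1: 85 (the original value)
```

RAIDframe does exactly this arithmetic, stripe by stripe, when it reconstructs a failed component onto a spare.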

Now, let's get the ball rolling. For this little project we'll be using OpenBSD as it has a well documented out-of-the-box software RAID solution, albeit with a little fiddling.

Go to VMware's website and look for VMware Server. You'll have to register for the website, but once you do you'll be able to easily register for any other free VMware product. Get your license key and download. VMware Server 2 has a web-based interface, whereas Server 1.0 has the older graphical interface. The latter is best for a standalone machine, so get that. Now, go to the OpenBSD website and download the latest version. Look for "install44.iso" (adjust the number for your version) - this contains all of the file sets you need.

Now, create a new virtual machine and select "Custom", with "Other" as the OS. Name it something memorable, like "OpenBSD RAID". Give it one IDE hard drive of a gigabyte - you'll need the room to compile the kernel later. Set it for NAT networking - much less hassle, unless of course the network you're on is also NAT-based. Before you fire it up, edit the virtual machine.

We're going to use SCSI for the array as it allows many more hard drives than ATAPI/IDE does. Add as many hard drives as your heart desires and space allows - just take care not to put one on the SCSI ID reserved for the controller (SCSI 0:7, and the equivalent slot on each later controller) or the VM won't even power on. If the wizard does put a disk there, edit it and move it to the next address. For this example, I have put in 7 hard drives of 512 MB each.

Now edit the CDROM and click "Use ISO Image" - point it to the OpenBSD ISO you just downloaded. This will allow you to install OpenBSD.

Now, fire up the VM. You'll be greeted with a prompt asking to Install, Upgrade, or Shell. Type i and press enter.

We're going to install OpenBSD on the first hard drive as a master copy. The following will create a partition on the first hard drive:

Available disks are: wd0 sd0 sd1 sd2 sd3 sd4 sd5 sd6.
Which one is the root disk? (or done) [wd0]
Do you want to use *all* of wd0 for OpenBSD? [no] y (enter)

Initial label editor (enter '?' for help at any prompt)
> a a (enter)
offset: [63] (enter)
size: [#########] 900m (enter)
FS type: [4.2BSD] (enter)
mount point: [none] / (enter)
> a b (enter)
offset: [########] (enter)
size: [#########] (enter)
FS type: [swap] (enter)
> q (enter)
Write new label?: [y] (enter)
Available disks are: sd0 sd1 sd2 sd3 sd4 sd5 sd6.
Which one do you wish to initialize? (or 'done') [done]
The next step *DESTROYS* all existing data on these partitions!
Are you really sure that you're ready to proceed? [no]
y (enter)

A little wait while the root partition gets formatted, then you'll be greeted with the network setup. When asked for the computer name, just name it whatever you want. It's best to use DHCP for the network. The installer will then tell you the VM's IP address, which it will keep in perpetuity (unless you move the VM to another computer) - note this down for later, when we log in with SSH. You can give it a domain if you like, or just leave it with the default.

With the network config out of the way, we can now start installing packages.

Password for root account? (will not echo) ******** (enter)
Password for root account? (again) ******** (enter)
Where are the install sets? (cd disk ftp http or 'done') [cd] (enter)
Available CD-ROMs are: cd0
Which one contains the install media? (or 'done'): [cd0] (enter)
Pathname to the sets? (or 'done') [x.y/i386] (enter)

Now it will give you a list of file sets to choose from. I recommend against installing X or anything of the like until you have built your RAID array, as you will need all the space you can get for building the custom kernel. In that spirit, type -gamexy.tgz to deselect the games set (change xy to the major and minor version number), then type 'done' and press enter. Press enter again to begin the installation.

Once all the sets have been installed, type 'done' to indicate that there are none left to install.

Start sshd(8) by default? [yes] (enter)
Start ntpd(8) by default? [no] (enter)
Do you expect to run the X Window System? [yes] n (enter) (if you're going to run X later, answer yes - it changes a system setting, machdep.allowaperture, that the X Window System needs in order to access the video hardware.)
Change the default console to com0? [no] (enter)
What timezone are you in? ('?' for list) [Canada/Mountain] (enter your timezone here) (enter)

After this it'll finalise settings and make the new system bootable. Type halt and press enter when it tells you to, then power the VM back on. No need to remove the virtual CD - VMware by default boots off the hard drive first and only falls back to the CD-ROM if there's nothing bootable on the hard drive.

The system will boot up and come to the login prompt. Once this comes up, minimise VMware server and fire up your favourite SSH client. Login as root and the password you nominated during setup.

The first thing you want to do is recompile the kernel with RAID support. This is one of the very few occasions where recompiling an OpenBSD kernel is warranted - RAIDframe isn't included in the stock kernel, as it would make the kernel too big for a stock release, and few people would use it anyway. Go to the OpenBSD website and locate your closest mirror. Find the kernel source for your version and download it (it'll be called sys.tar.gz - do not get src.tar.gz, as that contains the source code for everything but the kernel!).

Now, we unpack it. Unpack it from /usr/src so that the source lands in /usr/src/sys.
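If you're unsure what the tarball looks like inside, here's a quick self-contained sketch (using a scratch directory in place of /usr/src, since the real paths only exist on the VM): the archive is rooted at sys/, so extracting it inside /usr/src leaves the tree at /usr/src/sys.

```shell
# A throwaway directory stands in for /usr/src in this illustration
demo=$(mktemp -d)
mkdir -p "$demo/sys/arch/i386/conf"
: > "$demo/sys/arch/i386/conf/GENERIC"
tar -C "$demo" -czf "$demo/sys.tar.gz" sys
# The actual unpack step - on the VM this would simply be:
#   cd /usr/src && tar xzf /path/to/sys.tar.gz
mkdir "$demo/src"
tar -C "$demo/src" -xzf "$demo/sys.tar.gz"
ls "$demo/src/sys/arch/i386/conf"   # prints: GENERIC
```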

Now we create a custom kernel configuration. OpenBSD documentation recommends a "wrapper" configuration file that pulls in the defaults from the GENERIC config file - there should never be a reason to alter GENERIC itself. Create the wrapper, generate the build directory with config(8), then build and install the new kernel:

cd /usr/src/sys/arch/i386/conf/

cat > GENERIC.RAID <<EOF
include "arch/i386/conf/GENERIC"

# RAIDframe, with support for up to 4 raid pseudo-devices
pseudo-device raid 4

# Needed so the kernel can auto-configure (and later boot from) the array
option RAID_AUTOCONFIG
EOF

config GENERIC.RAID
cd ../compile/GENERIC.RAID/
make clean && make depend && make
mv /bsd /bsd.old
mv bsd /bsd

In one fell swoop, we have configured, built and installed the new kernel. Reboot and cross your fingers.

If all went well, you are now booting off your freshly-built custom kernel. The only downside is that OpenBSD, in all its wisdom, has opted not to pick up an eighth SCSI drive - only eight disks in total (wd0 plus sd0 through sd6) show up on boot, so it could well be a limitation in OpenBSD. If anybody can shed some light on this, please leave a comment.

Next we prepare the drives for the array. Type fdisk -i sd0 to initialise the disk's MBR (probably a good idea even if you don't intend to boot off of it), then type disklabel -E sd0 to fire up the label editor:

# disklabel -E sd0 (enter)
# Inside MBR partition 3: type A6 start 32 size 1048544
Treating sectors 32-1048576 as the OpenBSD portion of the disk.
You can use the 'b' command to change this.

Initial label editor (enter '?' for help at any prompt)
> a d (enter)
offset: [32] (enter)
size: [1048544] (enter)
FS type: [4.2BSD] RAID (enter)
> q (enter)
Write new label?: [y] (enter)

Now that your first hard drive is done, we need to mirror this across each drive. This is actually quite simple to do. First, we make a copy of the disklabel for the first disk:

# disklabel sd0 > disklabel.tpl

Then we write that disklabel to each drive in turn (just copy and paste everything after the hash/pound sign):

# for i in 1 2 3 4 5 6; do echo "y" | fdisk -i sd$i >/dev/null 2>&1; disklabel -R sd$i disklabel.tpl 2>/dev/null; done

If you then check each disk in turn, every one should show exactly the same layout.

Now we come to the fun part - configuring the RAID. To create a four-disk RAID 5 array with three hot spares, here's what you do:

cat > /etc/raid0.conf <<EOF
START array
# 1 row, 4 drives per row, 3 drives spare
1 4 3
START disks
# A list of the drives to use
/dev/sd0d
/dev/sd1d
/dev/sd2d
/dev/sd3d
START spare
# A list of drives to keep spare
/dev/sd4d
/dev/sd5d
/dev/sd6d
START layout
# 128 sectors per stripe unit, 1 stripe unit per parity unit,
# 1 stripe unit per reconstruction unit, RAID level 5
128 1 1 5
START queue
# This establishes a FIFO queue of 100 outstanding requests
fifo 100
EOF

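A note on that layout line: the 128 is measured in sectors, not bytes. At the standard 512 bytes per sector, each stripe unit works out to 64 KB per drive:

```shell
# 128 sectors per stripe unit x 512 bytes per sector, in kilobytes
echo "$(( 128 * 512 / 1024 )) KB"   # prints: 64 KB
```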
OK, now that the configuration file is done, we can get the ball rolling on the array:

raidctl -C /etc/raid0.conf raid0
raidctl -I 100 raid0
raidctl -iv raid0

The -C flag forces configuration of the array from the config file, -I writes component labels with the serial number you supply (any number will do, as long as it's unique to this array), and -iv initialises the parity, verbosely. You might see some worrying-looking errors along the way, but they're fine to ignore.

Now we partition the RAID disk. For the purpose of this explanation I will create a single volume with swap:

# disklabel -E raid0
Initial label editor (enter '?' for help at any prompt)
> a a
offset: [0]
size: [2096896] 1572608
FS type: [4.2BSD]
> a b
offset: [1572608]
size: [524288]
FS type: [swap]
> p
OpenBSD area: 0-2096896; size: 2096896; free: 0
# size offset fstype [fsize bsize cpg]
a: 1572608 0 4.2BSD 2048 16384 1
b: 524288 1572608 swap
c: 2096896 0 unused 0 0
> w
> q
No label changes.

All right, now we create a filesystem on the new RAID partition, clean up the kernel compile, and copy the installed system onto the RAID:

newfs /dev/rraid0a
rm -Rf /usr/src/sys
mount /dev/raid0a /mnt
cd /mnt
dump -0f - / | restore -rf -

First, we create a filesystem on the RAID partition, then we remove the /usr/src/sys directory where we compiled the kernel. After this we mount the RAID partition and dump the contents of the root filesystem into it. While this is taking place, bring up the VMware Server window and watch the activity on the first four hard drives - all four are working.
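If that dump | restore pipeline looks like magic, the underlying pattern is simply "archive to stdout, unpack from stdin at the destination" - the same trick works with tar. Here's a self-contained sketch using scratch directories in place of / and /mnt:

```shell
src=$(mktemp -d)    # stands in for the live root filesystem
dst=$(mktemp -d)    # stands in for the RAID mounted at /mnt
echo "kernel" > "$src/bsd"
# Archive the source tree to stdout and unpack it from stdin in the
# destination - the same stream-through-a-pipe pattern as dump | restore
(cd "$src" && tar cf - .) | (cd "$dst" && tar xf -)
cat "$dst/bsd"      # prints: kernel
```

dump/restore is preferred over tar for this job on the real system because it preserves every filesystem detail, but the plumbing is identical.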

Now just one final thing to do before we reboot - we must activate the RAID and make the system boot off of it:

raidctl -A root raid0
cat > /mnt/etc/fstab <<EOF
/dev/raid0a / ffs rw 1 1
/dev/raid0b none swap sw 0 0
EOF
This tells the kernel to mount root off the RAID array, and sets up the partitions to be mounted and swap to be activated.

Once again, we reboot and cross our fingers.

Watch as the system boots - you should see it say something like root on raid0a and /dev/rraid0a: file system is clean; not checking. If that is the case, we have successfully booted off the RAID. Watch the lightshow that the HD icons show for us.

Now, let's check the status of the array. Note that no spares appear, even though we specified them in the config file. Again, if somebody can shed light on why this is, please comment.

# raidctl -s raid0 (enter)
raid0 Components:
/dev/sd0d: optimal
/dev/sd1d: optimal
/dev/sd2d: optimal
/dev/sd3d: optimal
No spares.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

Let's have a little fun with this. Since no spares are configured for it, let's add the spares:

# raidctl -a /dev/sd4d raid0 (enter)
# raidctl -a /dev/sd5d raid0 (enter)
# raidctl -a /dev/sd6d raid0 (enter)

You may see some concerning errors in the dmesg output - these can be ignored.

Now let's simulate a disk failure. This is easy to do:

# raidctl -F /dev/sd2d raid0 (enter)

This will fail the device specified, and begin rebuilding onto a hot spare. Until this rebuild is completed, the array will limp on in "degraded" mode. You will be able to view the progress of this in the VMware window - some of the hard drives will light up as the data gets reconstructed.

Now let's have a look at the status now:

# raidctl -s raid0 (enter)
raid0 Components:
/dev/sd0d: optimal
/dev/sd1d: optimal
/dev/sd2d: spared
/dev/sd3d: optimal
/dev/sd4d: used_spare
/dev/sd6d: spare
/dev/sd5d: spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

What we have here is a hard drive that is no longer in use, and whose place has been taken (temporarily) by a hot spare. In situations where you are able to hot swap, you can pull the hard drive out and replace it, then run the following to rebuild the array:

# raidctl -R /dev/sd2d raid0 (enter)

This rebuilds the failed component back into the array as it was before. If you restart the machine before replacing the hard drive, the spare drive list is reset and one disk will simply appear as "failed". The machine will still limp on, albeit in a degraded state - presumably so the system can keep going even when you don't have the ability to hot-swap drives - and running the above command then rebuilds it back to health. All of this is made possible by the parity data, which allows the missing data to be reconstructed in the event of a failure.

To test the array to see if the system could continue in the event of a hard drive failure, fire up another SSH session and do a hard disk-intensive task - for example, getting an MD5 sum of each file on the system:

# find / -type f -exec md5 {} \;

Then, in the original session, fail a drive in the array as you did before, then rebuild. You should notice that the operation proceeds without stopping or spouting errors.

In conclusion, it should be noted that under no circumstances should RAID be considered a replacement for a good backup strategy, even if it is in mirrored mode. It is a vital component in the war against wear and tear, but it need not be your entire offensive strategy. It's also not the only way to speed your system up.

I hope this clears the air for those of you who are interested in RAID but had no idea what they were walking into until now. In a future post I'll delve even deeper into nested RAID levels using RAIDframe. Later days!
