Friday, July 23, 2010

Petabytes by the penny

Hi all,

I recently found this very interesting blog post about a mass storage company that built their own SAN infrastructure using commodity hardware. I know it's old news, but I still think it's awesome - one of these things holds over three times what my organisation's EqualLogic PS6000s can, at a fraction of the cost.

A few storage experts have waded into the debate citing numerous issues: vibration (addressed with "anti-vibration sleeves" - aka rubber bands - and a large piece of foam in the top part of the case), the high failure rate of the hard drives (which they are aware of, and which works out to replacing only about one drive a week on average across their entire infrastructure), poor throughput as a result of the use of PCI, and multiple single points of failure (the boot hard drive and the SATA cards). It's that last point I want to address here.

So what's wrong with the design? Nothing if throughput and business continuity aren't your goals (and you don't put all of your data on one machine - which these guys don't). However, there are still problems that need to be faced, especially if you intend to use only one of these units.

Consider the following layout, which assumes five 2-port SATA cards instead of three 2-port cards and a 4-port card: picture the 50 drives as a grid of cells, one cell per drive. The number pair on each cell names the SATA controller and port the drive hangs off, and the cells sit in blocks of five to represent each port multiplier.

So each controller gets 10 drives. Now say each ten-drive column of that grid is a RAID-6 array, filled with whole port-multiplier blocks. What if controller 1 dies, and both of its blocks sit in the same column? That's 10 drives down and a RAID-6 array decimated. What about controller 2, whose two blocks sit in two different columns? That's five drives gone from each of two arrays - two RAID arrays down the toilet in one fell swoop, because RAID-6 only tolerates two failed drives per array.
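Here's a rough sketch of that failure maths in Python. The wiring of port-multiplier blocks to columns below is a made-up example, not the company's actual layout, but any arrangement that fills ten-drive columns with whole five-drive blocks behaves the same way:

    # Toy model of the "whole port multiplier per column" layout. Each controller
    # has two ports, each port feeds a 5-drive port multiplier, and each RAID-6
    # column is built from two whole blocks. The wiring below is hypothetical.
    columns = {
        "array 1": [(1, 1), (1, 2)],  # both of controller 1's blocks in one column
        "array 2": [(2, 1), (3, 1)],  # controllers 2 and 3 split across two columns
        "array 3": [(2, 2), (3, 2)],
        "array 4": [(4, 1), (5, 1)],
        "array 5": [(4, 2), (5, 2)],
    }

    DRIVES_PER_BLOCK = 5
    RAID6_TOLERANCE = 2   # RAID-6 survives at most two failed drives per array

    for dead in range(1, 6):
        destroyed = [
            name for name, blocks in columns.items()
            if sum(DRIVES_PER_BLOCK for c, _ in blocks if c == dead) > RAID6_TOLERANCE
        ]
        print(f"controller {dead} dies -> arrays destroyed: {destroyed}")

    # controller 1 dies -> arrays destroyed: ['array 1']
    # controller 2 dies -> arrays destroyed: ['array 2', 'array 3']
    # ... and so on for the rest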

Not only that, the entire system hinges on a single, non-redundant boot drive. If that goes, there goes your system.

If 'twere up to me, I would run the array on EON - a version of OpenSolaris stripped down to the point where it fits into approximately 200 MB - and boot from a CompactFlash card instead of a hard drive (the image file would be backed up, so if the card failed I could just slap the image onto a new card, power down the pod and plug it in). I would then arrange the drives into five ten-drive RAIDZ2 groups, with each group taking one drive from every SATA port.

In that arrangement, the hypothetical SATA controller failure costs each array two drives at most - enough to degrade the arrays, but within what RAIDZ2 can absorb, so nothing goes offline.
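If you want to see the arithmetic, here's a rough sketch. The "c1p2d3" style device names are made up for illustration - real Solaris device paths, and the exact zpool invocation, would depend on how the OS enumerates the controllers and port multipliers:

    # Sketch of the "one drive per port" RAIDZ2 layout. Device names like
    # "c1p2d3" (controller 1, port 2, drive 3) are hypothetical placeholders.

    CONTROLLERS = range(1, 6)   # five 2-port SATA cards
    PORTS = (1, 2)              # two ports per card, one port multiplier each
    DRIVES = range(1, 6)        # five drives behind each port multiplier

    # Group k (k = 1..5) takes drive k from every controller/port pair,
    # giving five RAIDZ2 vdevs of ten drives each.
    groups = {
        k: [f"c{c}p{p}d{k}" for c in CONTROLLERS for p in PORTS]
        for k in DRIVES
    }

    # The sort of zpool command this layout implies (one raidz2 clause per group;
    # the pool name "tank" is arbitrary).
    vdevs = " ".join("raidz2 " + " ".join(disks) for disks in groups.values())
    print(f"zpool create tank {vdevs}")

    # What does losing a whole controller cost each vdev?
    RAIDZ2_TOLERANCE = 2
    for dead in CONTROLLERS:
        worst = max(sum(d.startswith(f"c{dead}p") for d in disks)
                    for disks in groups.values())
        status = "degraded but online" if worst <= RAIDZ2_TOLERANCE else "data loss"
        print(f"controller {dead} dies -> at most {worst} drives lost per vdev ({status})")

The whole point of the rotation is that no single controller owns more than two drives in any one vdev, which is exactly the amount of damage RAIDZ2 is designed to ride out.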

I hope I haven't scared anybody away from exploring cheap cloud storage - it does help greatly to know the pitfalls, and how to get around them. The key to designing storage systems is not to design them to fail by putting too many eggs in one basket.

Building these things is a great learning exercise, and a rewarding one at that. Good luck!
