Some harsh realities

Recently BitFolk has been accused of overcharging for disk space.

In general I don’t try to defend BitFolk’s price-point – the unmanaged VPS hosting market is flooded and it is very easy to find stuff hosted out of the US or continental Europe for just a couple of pounds per month. Clearly I am not going to try to compete on price alone, yet BitFolk does sit firmly towards the cheap end which I feel is fair given that there isn’t a 24-hour team of support persons in nice business premises.

This particular complaint however seems to stem from the perception that “disk is cheap.” Well, yes, it is fairly cheap. That’s why we sell it at the “fairly cheap” price of £6/5GiB/year (10p/GiB/month), with no VAT added on top. Just because you can buy a 1.5T consumer hard drive for about 9p a gigabyte doesn’t mean that you should expect to find 1GiB of usable disk space on a server in a decent datacentre for anywhere close to that figure!

I try to keep costs down by using a configuration based around 4×7.2kRPM 3.5″ SATA disks with hardware RAID. I would dearly love to have a nice shared storage solution with 10 or 15kRPM 2.5″ SAS disks, or even to use them as local storage. Lack of disk I/O is the limiting factor for how many customers I can put on one machine. The problem is that the storage costs would be around 10 times as much and the target market (mostly people looking for cheap personal hosting) will not pay for it. They don’t understand why it would be desirable; for many of them it may not even be necessary since if they do only a little I/O they get the same performance either way.

So okay, if we resign ourselves to 4×7.2kRPM SATA disks and a RAID card as local storage, the next way to keep the price down would be to buy the disks with the sweet spot for price per gigabyte. At the moment that would be 1T. The problem now is that I’d end up with roughly twice as much disk space as I could ever sell on each server. I don’t get to keep adding customers until the disk space runs out — the I/O operations per second run out first. At the moment I can sell around 700GiB per server.

I thought I would not need to explain that 2x500G in a stripe with no redundancy would be insane, but apparently not, because I am told that some people “don’t need RAID.” I have to disagree, and I feel the 49 or so other people on the server would also disagree when the first disk failure sees their service down and all their data lost (apart from the ones who have a backup strategy, right? No, really, why are you laughing?). Let’s not go there.

If you recall, I/O is what runs out first. So any sort of RAID-5 configuration is a bad idea because of the read-modify-write problem. The minimum number of disks and the most sensible RAID level then is a 4-disk RAID-10. Four 500G Western Digital Green Power drives will set me back around £165+VAT. You’re looking at around a further £225+VAT for a 3ware 9650 RAID controller. After the manufacturer lies (decimal rather than binary gigabytes) are accounted for and an operating system is installed, there’s going to be about 930GiB of usable space left. We’re now at £390 for the lot, or 41p/GiB of usable space. Excluding VAT.
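That arithmetic works out roughly like this (a sketch using only the prices quoted above; the GB-to-GiB conversion is the “manufacturer lies” part, and it ignores the space the OS install takes):

```python
# Rough cost-per-GiB for a 4-disk RAID-10, using the figures quoted
# above (2009-era prices, ex-VAT).
disk_cost_gbp = 165.0        # four 500 GB WD Green Power drives
raid_card_gbp = 225.0        # 3ware 9650 controller
disks = 4
disk_size_bytes = 500e9      # marketing gigabytes are decimal

# RAID-10 mirrors pairs of disks, so usable capacity is half the raw total.
usable_bytes = disks * disk_size_bytes / 2

# Convert decimal GB to binary GiB.
usable_gib = usable_bytes / 2**30            # ~931 GiB before the OS

total_cost_gbp = disk_cost_gbp + raid_card_gbp   # £390
cost_per_gib = total_cost_gbp / usable_gib       # ~£0.42/GiB

print(round(usable_gib), round(cost_per_gib, 2))
```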

By the way, I am repeatedly told that Linux software RAID is good enough and that I needn’t bother with hardware RAID (even a cheapy one like 3ware). I started off using Linux software RAID and still have one server using it, but that’s due for decommissioning next month. In general it does perform well enough. Unfortunately, hard drives accumulate errors, and the only way to find them is to read the disks looking for them. On software RAID that scrubbing has to happen in the main operating system; the Linux mdadm package in Debian (and presumably elsewhere) handles it by means of a cron job that runs once a month to verify all the disks. Because it runs on the host, all the data has to go through the OS, and while the machines are under moderate write load I have found that this verify process takes several days to complete and impacts I/O performance. In short it’s actually more cost effective to spend more on a RAID controller and put more customers on one machine.
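To see why a verify pass drags on for days, consider the kernel’s resync throttling (a back-of-envelope sketch; it assumes md falls back to the default `dev.raid.speed_limit_min` of 1000 KiB/s per device whenever the array is busy with customer I/O):

```python
# How long does a full verify of one member disk take if md is
# throttled to its minimum resync speed the whole time?
disk_size_bytes = 500e9              # one 500 GB member disk
min_speed_bytes_s = 1000 * 1024      # dev.raid.speed_limit_min default

seconds = disk_size_bytes / min_speed_bytes_s
days = seconds / 86400
print(round(days, 1))                # roughly 5-6 days per pass
```

In practice the verify runs faster whenever the array is idle, but on a machine with constant moderate write load it spends most of its time at or near the floor, which matches the “several days” observed above.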

Now consider the power usage. More than 60% of BitFolk’s recurring hosting costs are directly related to power. Disks aren’t huge power draws when compared to the CPU or chipset, but it’s not an inconsiderable extra cost and it’s often overlooked.

We’re already up to 41p/GiB cost price, but you may be thinking that this is no problem since at 10p/GiB/month, 700GiB sold brings in £70 a month, paying for all the disks and RAID controller after about 6 months. The reality is nothing like this. The full price has to be paid up front to get the hardware into service, but it’s going to be months before the server is full of paying customers. And if those customers don’t happen to want any extra disk space, then around 50% of this capacity will still remain unsold. The remaining capacity is not usable once the IOPS have run out, but it has to be there from the start just in case there is demand. Does 10p/GiB/month start to look more reasonable yet?
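The payback arithmetic, again as a sketch with the figures above, ignoring power, ramp-up time and every other cost:

```python
# Naive payback time for the storage hardware alone.
hardware_gbp = 390.0             # disks + RAID card, ex-VAT
price_per_gib_month = 0.10       # 10p/GiB/month as quoted
sellable_gib = 700               # IOPS-limited, not space-limited

# Best case: every sellable GiB earning from day one.
best_case_months = hardware_gbp / (sellable_gib * price_per_gib_month)

# More realistic: only half the sellable space finds a buyer.
half_sold_months = hardware_gbp / (sellable_gib * 0.5 * price_per_gib_month)

print(round(best_case_months, 1), round(half_sold_months, 1))
```

Even the best case of about 6 months assumes a full server on day one; at 50% utilisation the hardware takes nearly a year to pay for itself, before any other cost is counted.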

If not, maybe you would be better off going to a really big cloud computing vendor who can take advantage of massive economies of scale to really drive the price down for you. Like, say, Amazon S3, who will charge you $0.18/GiB/month for storing stuff in Europe. Plus $0.10/GiB to write it and $0.14/GiB to read it.
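For comparison, here is what a year of BitFolk’s £6/5GiB allocation would cost on S3 at those prices, under an assumed usage pattern (one full write, then reading it all back once a month — a sketch, not a quote):

```python
# A year of 5 GiB on S3 (EU prices quoted above) under an assumed
# access pattern: write the data once, read all of it monthly.
gib = 5
storage_usd = gib * 0.18 * 12        # $10.80 just to store it
write_once_usd = gib * 0.10          # $0.50
read_monthly_usd = gib * 0.14 * 12   # $8.40

total_usd = storage_usd + write_once_usd + read_monthly_usd
print(round(total_usd, 2))           # vs £6/year at BitFolk
```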

Finally, the entire point of paying for a virtual server is that you don’t need to worry about the hardware. If it breaks, it’s BitFolk that replaces it, hopefully without you even noticing. If you are sitting there thinking “I could buy a 1.5T hard disk for 9p a GB, screw this!” then you just don’t get it. If from the outset you are prepared to manage your own hardware, and your needs justify purchasing an entire machine, then guess what? Don’t buy a virtual server on someone else’s hardware! Buy your own hardware that is set up exactly how you want (and please feel free to have no RAID and host it under your bed). With this mindset, pretty much every “* as a Service” product is going to look expensive to you because you have missed the point.

5 thoughts on “Some harsh realities”

  1. > Unfortunately, hard drives accumulate errors and the only way to find them is to read the disks looking for them.

    That’s probably true of any RAID system. You might as well just disable the verification checks then. (You can also, obviously, tweak the speed limit.)

    I’ve found that if you monitor for errors with SMART (which you often can’t do with hardware RAID), then you can probably do away with the verification check. The check is a relatively recent addition, and it’s never found me any discrepancies (I monitor SMART to pick up bad sectors).

    Disclaimer: My experiences of hardware RAID have never been positive, so I promote the use of software RAID where possible.

  2. @Stefano,

    Believe me, I looked into this a lot before making the decision to spend an extra £225 per server!

    Already at the default speed limit it causes performance problems and doesn’t complete the verify for several days. If I lowered the speed in order to not cause performance problems then I would be looking at weeks of verify. With the 3ware I verify every week and it’s not noticeable.

    It is true of any disk (not just RAID setups) so you must verify them, otherwise you may end up in a situation where one disk dies and whilst rebuilding from the other you encounter a read error. At this point data is lost.

    SATA hard drives typically have an unrecoverable bit error rate of 1 in 10^14 bits read – or one error every ~11TiB read – so you have to read the disks to find these errors while you can still reconstruct the data from the other disk(s) and rewrite the bad sector. If you wait until you have no other copies (such as when rebuilding after a failed disk) then you risk data loss. Newer filesystems like ZFS are tackling this issue by building in checksums.

    I don’t think SMART checks will be this thorough, but FWIW 3ware cards do allow SMART commands through to the individual drives if you want to use SMART.
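    A quick check of that 11TiB figure and of the rebuild risk (a sketch; the 500GB rebuild size is just one mirror half from the setup discussed in the post):

    ```python
    # Sanity-check the unrecoverable-read-error figure: a 1-in-10^14
    # bit error rate expressed in TiB read, and the expected number of
    # errors hit while rebuilding from a single 500 GB surviving disk.
    bits_per_error = 1e14
    bytes_per_error = bits_per_error / 8            # 1.25e13 bytes
    tib_per_error = bytes_per_error / 2**40         # ~11.4 TiB

    rebuild_bytes = 500e9                           # read the whole mirror
    expected_errors = rebuild_bytes * 8 / bits_per_error   # ~0.04

    print(round(tib_per_error, 1), round(expected_errors, 2))
    ```

    A ~4% chance of an unrecoverable read per rebuild may sound small, but it grows linearly with disk size, which is why it matters more every year.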

  3. @Andy:

    Aah, right, didn’t realise you were doing the verify on the 3ware card too.

    Agreed re the bit error rate; although I can’t say I’ve personally noticed corruption caused by it, the ratio of bit error rate to disk size is getting rather unfavourable.

    An interesting post, thanks Andy. I’ve often wondered (but never got round to sitting down and working out) why the storage costs on BitFolk feel out of line with the excellent pricing of the rest of the service. Certainly when I was looking at replacing a previous dedicated “server” with a BitFolk box I worked out what it would cost to host the ~80GB of digital photos I have in various galleries and it was a bit rich for my taste.

    My solution was to host all the big-disk, low-risk stuff on a Dreamhost account, which can perform relatively sluggishly and occasionally drop off the ‘net with no dire consequences and the more important stuff on the bitfolk box.
