This article is about the various issues surrounding '''[[Wikipedia:Redundant Array of Independent Disks|RAID]] on [[Wikipedia:Linux|Linux]]'''.
==Why RAID?==
In these days of dwindling disk prices and ever-increasing capacities, it is not excessive to purchase extra disks.  A simple two-disk [[Wikipedia:SATA|SATA]] setup may only cost £100 more than the single-disk scenario, and personally I know that is a lot less than the cost of 100+ GB of data and the hours of my time that would be lost should that single disk die.

I will never again run a machine with data I care about on it without redundant disks.
===RAID is not a substitute for backups!===
This is mentioned everywhere but it can't be mentioned enough.  When a disk dies, backups can save your arse in a similar way to RAID, but beyond that the purposes are completely different — you need both!

If your single disk dies then your machine doesn't work any more, and restoring from backup can be a time-consuming process.

If you delete or corrupt a file then RAID makes sure it's deleted or corrupt on every disk — now you need to restore from a backup!  Even if you don't personally do the deleting or corrupting, all software has bugs, so this is a fact of life.
  
 
==Hardware or software?==
 
===Performance===
Given a decent CPU in your host machine (Pentium 1GHz+), software RAID using MD will most likely be faster than a hardware RAID card.  However, if your host experiences frequent high CPU load then this can degrade performance system-wide.  Hardware RAID cards have their own CPU so they can continue to provide decent performance regardless of the load on the host system, but note that this is only really relevant for the levels with parity (e.g. 5, 50, etc.).
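
As a rough check of how fast your particular CPU can do the parity maths, the Linux kernel benchmarks its XOR and RAID-6 routines when the relevant modules are loaded and logs the results.  A sketch only; the exact message format varies between kernel versions:

<pre>
# Show the kernel's own parity benchmark results (wording varies by kernel version,
# and the raid6 lines only appear once the raid6 module has been loaded)
dmesg | grep -iE 'xor|raid6'
</pre>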
  
 
More expensive RAID cards include a small amount of battery-backed storage.  This increases performance as the card can inform the operating system that the write is complete as soon as it hits its memory.
  
 
===Reliability===
I haven't personally had any issues with Linux software RAID, and I use it a lot.  Do bear in mind that MD is still under development and it is possible, though unlikely, that a bug is lying dormant or could be introduced.
  
 
Since MD runs as part of the Linux kernel, you need a working kernel to have working RAID.  If you break your kernel and your machine does not boot then you will need a rescue disk that supports MD in order to fix things.
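
As a sketch of what that recovery looks like with mdadm from a rescue environment (device and array names here are only placeholders):

<pre>
# Try to find and assemble all arrays automatically
mdadm --assemble --scan

# Or assemble a specific array from known member devices
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1

# Check the result
cat /proc/mdstat
</pre>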
  
Hardware RAID is presented to the BIOS as a normal disk and the better cards allow configuration from the BIOS as well as from userland once the system is booted.
  
If there is a serious problem with a hardware RAID card then you must rely on the vendor to help you get your data back.  If changing RAID cards then you need to restore from backup, unless you are changing only between cards from the same vendor with guaranteed compatibility.  By contrast you can usually take disks running under MD and install them into another Linux box running MD, in any order, and still have the array(s) assemble correctly.  You also have a lot of flexibility over how arrays are assembled with MD, the on-disk storage formats are public knowledge, and there are many people who can help to recover data even when it would technically be considered lost.
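
For example (device names are placeholders), after moving the disks into another MD-capable box you can inspect the superblocks and reassemble:

<pre>
# Inspect the MD superblock on a member to see which array it belongs to
mdadm --examine /dev/sdc1

# Print ARRAY lines for everything found, suitable for pasting into mdadm.conf
mdadm --examine --scan

# Assemble all arrays that were found
mdadm --assemble --scan
</pre>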
 
More expensive RAID cards include a small amount of battery-backed memory which is used to ensure that the card finishes writing all changes when power is lost.  This may be irrelevant for a server on decent redundant power.
  
 
==Levels==
For a good explanation of what the different RAID levels are, try the [[Wikipedia:Redundant Array of Independent Disks#Standard_RAID_levels|Wikipedia article]].
===Linear===
Not really a RAID level or configuration, this is just a concatenation of devices into one big device.  Usually best avoided: not only is there no redundancy, but it also tends to create hot spots, since any given piece of data lives entirely on one device and I/O is not spread across the set.
===RAID-0===
Avoid except for scratch areas or filesystems where you really don't care if they get destroyed, e.g. data that can be relatively easily generated from other data but needs to be on RAID-0 for the read performance.
===RAID-1===
A simple mirror.  Under MD, each device can be removed and used as if it were not part of a RAID set.  There isn't too much point in having a RAID-1 of a lot more than 2 devices — the chance of more than 2 devices failing simultaneously rapidly gets smaller and smaller, so extra mirrors add little, and they can also hurt write performance.
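
A minimal sketch of creating a two-device mirror under MD (partition names are placeholders):

<pre>
# Create a two-device RAID-1 array from two example partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

# Watch the initial synchronisation
cat /proc/mdstat
</pre>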
===RAID-5===
Parity is distributed across all devices and takes up one device's worth of capacity.  The minimum number of devices is 3.  RAID-5 can have quite serious write performance problems as the number of devices increases, as every write requires a parity recalculation involving all devices.
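
A sketch of the minimal three-device RAID-5 under MD (device names are placeholders):

<pre>
# Three devices: two devices' worth of capacity, with parity spread across all three
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
</pre>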
 
When there are many devices, the resync time required when a single device fails and a new one is added to replace it can be excessive, on the order of days, and performance is seriously degraded during this time.  A more serious issue is that an unreadable sector encountered on another device during the resync will kick that device out of the array too, rendering the whole array unusable.
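
A sketch of replacing a failed member and keeping an eye on the resync (array and device names are placeholders):

<pre>
# See which device has failed and how far the resync has got
cat /proc/mdstat
mdadm --detail /dev/md1

# Mark the dead device as failed, remove it, then add its replacement
mdadm /dev/md1 --fail /dev/sdc1 --remove /dev/sdc1
mdadm /dev/md1 --add /dev/sdc1

# Resync speed limits (KiB/s) can be tuned if the rebuild and normal I/O are hurting each other
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
</pre>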
 
RAID-5 is however well understood both in MD and in hardware RAID cards, and provided the environment does not involve a large amount of writing it may still be favoured as it offers some redundancy with little loss of capacity.
===RAID-6===
This level is not commonly found on hardware RAID cards, but it is supported, and recently considered stable, in MD.  It is similar to RAID-5 but stores a second, independent parity block per stripe (therefore capacity is n-2 and a minimum of 4 devices is required).  This means that the array can survive the failure of any 2 devices.  There is a slight additional write performance penalty compared to RAID-5, but this is unlikely to be noticeable except in environments with heavy write requirements (in which case neither RAID-5 nor RAID-6 is ideal).
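
A sketch of the minimal four-device RAID-6 under MD (device names are placeholders):

<pre>
# Four devices: two devices' worth of capacity, and any two devices can fail
mdadm --create /dev/md2 --level=6 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
</pre>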
 
A RAID-6 with 4 devices gives 2 devices' worth of capacity and is usually preferable to a RAID-5 with one hot spare, since in RAID-6 the device that would otherwise be the hot spare is kept continually in action, yet any two devices can still fail.  Immediate failure of a hot spare when it is pressed into service is relatively common, since it goes from being completely idle to the extreme stress of being the target of a resynchronisation.
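
For comparison, the two four-device layouts being discussed would look something like this under MD (device names are placeholders; the two commands are alternatives, not a sequence):

<pre>
# RAID-5 over three devices plus one hot spare: two devices' worth of capacity,
# but the spare sits idle until a failure occurs
mdadm --create /dev/md3 --level=5 --raid-devices=3 --spare-devices=1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# RAID-6 over the same four devices: the same capacity, all devices active, any two can fail
mdadm --create /dev/md3 --level=6 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
</pre>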
===RAID-10 or 1+0===
RAID-10 can be implemented as a stripe of RAID-1 pairs.  For example, given 6 devices, you may configure them as three RAID-1s A, B and C, and then configure a RAID-0 of ABC.
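
A sketch of that six-device example built as a manual multi-level setup under MD (device names are placeholders):

<pre>
# Three RAID-1 pairs: A, B and C
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdg1

# A RAID-0 striped across the three mirrors
mdadm --create /dev/md3 --level=0 --raid-devices=3 /dev/md0 /dev/md1 /dev/md2
</pre>

The filesystem then goes on /dev/md3.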
 
RAID-10 is recommended by database vendors and is particularly suitable for providing high performance (both read and write) and redundancy at the same time.  The downside of course is that only half the total capacity of the devices is usable.
 
RAID-10 can survive the loss of up to 1 device from each of the underlying mirrors.  In the example above, it is possible that 3 devices could fail (one from each of A, B and C) and the array would still be in operation.  If two devices from the same mirror were to fail however then the array would be lost.
 
RAID-10 is very scalable.  Other levels can suffer severe problems with write performance as the number of devices increases, but RAID-10 has no such problems.  It is however advisable to use hot spares when there are many devices, to mitigate the chances of a double device failure within one mirror.
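
A sketch of adding a hot spare to a healthy array (names are placeholders); a device added while no member has failed simply sits as a spare until it is needed, and in a nested setup like the one above the spare belongs to one of the underlying mirrors rather than to the RAID-0:

<pre>
# Add a device to a healthy mirror; it becomes a hot spare
mdadm /dev/md0 --add /dev/sdh1

# Confirm it is listed as a spare
mdadm --detail /dev/md0
</pre>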
 
Classic RAID-10 requires an even number of devices (e.g. 4, 6, 8, ...).  Some hardware RAID vendors do support proprietary versions of RAID-10 that can cope with an odd number of devices — IBM has RAID-1E for example.  Linux MD supports an odd number of devices for RAID-10.
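
MD's native raid10 personality builds this as a single array and copes with an odd number of devices.  A sketch with three devices (names are placeholders):

<pre>
# Native MD RAID-10 across three devices: two copies of every block, spread over the three
mdadm --create /dev/md4 --level=10 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
</pre>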
===RAID 0+1===
A bit like RAID-10, but the other way around — a mirror of stripes.  This configuration doesn't make much sense and is therefore only commonly available in Linux MD as a manual multi-level setup.
 
The problem with this configuration is that the whole array is degraded as soon as a single device fails, and once the device is replaced a full resync has to happen across many devices.
===Conclusion===
If you can afford it, stick with RAID-10 (possibly with hot spares) as much as possible.  If using Linux MD, bear in mind that grub/lilo cannot boot off anything but RAID-1.
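
One common way of living with that limitation (a sketch only, with placeholder partition names) is a small RAID-1 across every disk for /boot, with the remainder of each disk going into the RAID-10:

<pre>
# Small RAID-1 across the first partition of every disk, for /boot
mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Everything else as RAID-10
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
</pre>

The bootloader can then read /boot from any single member of the RAID-1.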
 
In low-write environments RAID-5 will give much better price per GiB of storage, but as the number of devices increases (say, beyond 6) it becomes more important to consider RAID-6 and/or hot spares.
 
A RAID-1 on two devices should be the minimum configuration for any machine of any importance at all.
 
[[Category:Linux]]
[[Category:Sysadmin]]
  
{{Stub}}
