
Disclaimer - The information below is not intended to be a scientific study of RAID and how it works. It is nonspecific, and the way a particular RAID controller actually stores data may differ widely. This is a basic overview of the principles of RAID, meant to help people decide whether RAID is right for them and, if so, which RAID type to choose.
 

What is RAID (Redundant Array of Inexpensive Disks)?

 

RAID is a set of disks used together to improve the performance, the reliability, or both, of a storage set. RAID can be used for striping, which increases performance by spreading a single file across two or more disks, but reduces reliability because the file then depends on every one of those disks. It can also be used for mirroring, which increases reliability by replicating data to two or more drives, but increases cost. Understanding how this works starts with the fact that data is stored on a disk as binary signals, which we describe as ones and zeros. A single text file could, for example, contain the classic “Hello World”.


In ASCII, “Hello World” becomes:

 

1001000 1100101 1101100 1101100 1101111 0100000 1010111 1101111 1110010 1101100 1100100

*Quotation marks omitted and spaces added to distinguish between individual binary sets for each character.
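
The conversion above can be reproduced with a few lines of Python. This is only a minimal sketch for illustration; the function name `to_ascii_bits` is hypothetical and not part of any RAID tool.

```python
def to_ascii_bits(text):
    """Return each character of text as a 7-bit binary string."""
    # ord() gives the ASCII code; "07b" formats it as 7 binary digits,
    # padded with leading zeros (so a space becomes 0100000).
    return [format(ord(ch), "07b") for ch in text]


bits = to_ascii_bits("Hello World")
print(" ".join(bits))
```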


Using RAID0, a performance RAID configuration, this text file would be distributed between two or more disks. The striping storage method would store the data as follows:


 

DISK0: 1001000 1101100 1101111 1010111 1110010 1100100
DISK1: 1100101 1101100 0100000 1101111 1101100

 

In actuality, RAID0 balances the stored data better, so that neither drive holds more than the other. The basic concept, however, is that one set of data is distributed between two separate drives in order to increase the read and write speed of the set. From the illustration above you can see that if one drive were lost, the data on the other drive would become meaningless: what was “Hello World” on a single drive became “HloWrd” on DISK0 and “el ol” on DISK1. In exchange, with striping and regular disk testing, the speed of your storage set can improve to at least 1.5 times a single drive’s speed.
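
The character-by-character striping used in this illustration can be sketched in Python as follows. This is a simplified model under the same assumption as the example (one character per stripe, alternating between drives), not how any particular controller works; the names `stripe` and `unstripe` are hypothetical.

```python
def stripe(chunks, disk_count=2):
    """Distribute chunks across the disks in round-robin order."""
    disks = [[] for _ in range(disk_count)]
    for index, chunk in enumerate(chunks):
        disks[index % disk_count].append(chunk)
    return disks


def unstripe(disks):
    """Rebuild the original order by reading the disks in turn."""
    total = sum(len(disk) for disk in disks)
    count = len(disks)
    return [disks[i % count][i // count] for i in range(total)]


disks = stripe(list("Hello World"))
print(["".join(disk) for disk in disks])  # ['HloWrd', 'el ol']
print("".join(unstripe(disks)))           # Hello World
```

Reading the file back requires every disk in the set, which is why a single lost drive leaves only fragments such as “HloWrd”.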


For this reason many people choose to use RAID0, even with the risk of losing data when one drive fails. In addition, because the data is distributed across the two drives of the set rather than duplicated, the usable storage space is just about the size of the two drives together. It is slightly less because of RAID metadata, which is very much like the formatting information on a drive and stores information about how the RAID is configured on the set. With RAID0, losing a single drive makes the rest of the data in the set inaccessible, as only half of the bits are available to the filesystem.


RAID1, a redundant configuration, increases the reliability of a data set by mirroring all bits written to the RAID1 array to all drives in the set. The above example of “Hello World” is saved the same way on each disk, creating an automatic copy of all the data in the set. This does impact performance, since every write has to go to every drive, but the reliability is better. Below is an illustration of the same data on RAID1 with two drives in the set:

 

 

DISK0: 1001000 1100101 1101100 1101100 1101111 0100000 1010111 1101111 1110010 1101100 1100100
DISK1: 1001000 1100101 1101100 1101100 1101111 0100000 1010111 1101111 1110010 1101100 1100100

 

As you can see, with RAID1 each drive is an exact copy of the other drives in the set, so all but one drive could fail and the data would remain intact.
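
Mirroring can be modeled in the same simplified way. The sketch below assumes each drive is just a list of characters; `mirror` and `read_mirror` are hypothetical names used only for this illustration.

```python
def mirror(chunks, disk_count=2):
    """Write an identical copy of the data to every disk in the set."""
    return [list(chunks) for _ in range(disk_count)]


def read_mirror(disks, failed):
    """Return the data from the first disk whose index is not in failed."""
    for index, disk in enumerate(disks):
        if index not in failed:
            return disk
    raise IOError("all drives in the mirror have failed")


disks = mirror(list("Hello World"), disk_count=3)
# Two of the three drives have failed; the remaining copy is still complete.
print("".join(read_mirror(disks, failed={0, 1})))  # Hello World
```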


A more common form of RAID, which offers both redundancy and performance, is RAID10. As the name suggests it is a combination of RAID0 and RAID1 which offers many of the performance gains of RAID0 while adding the redundancy of RAID1. As a minimum, RAID10 requires four drives. Storing “Hello World” with RAID10 might look something like this:


 

DISK0: 1001000 1101100 1101111 1010111 1110010 1100100
DISK1: 1100101 1101100 0100000 1101111 1101100
DISK2: 1001000 1101100 1101111 1010111 1110010 1100100
DISK3: 1100101 1101100 0100000 1101111 1101100

 

Here the specifics will differ depending on the model, but the basic concept is that the data is distributed across all four drives in two kinds of virtual sets: two mirrored sets whose members clone one another (DISK0 with DISK2, and DISK1 with DISK3), and two striped sets (DISK0 with DISK1, and DISK2 with DISK3).
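
The four-drive layout from the illustration can be sketched by striping first and then copying each half. This assumes the same character-level striping as the earlier examples and the drive names used above; `raid10_layout` is a hypothetical helper, and a real controller may arrange the drives differently.

```python
def raid10_layout(chunks):
    """Stripe the data into two halves, then mirror each half."""
    half_a = chunks[0::2]  # every even-numbered chunk
    half_b = chunks[1::2]  # every odd-numbered chunk
    return {
        "DISK0": half_a,
        "DISK1": half_b,
        "DISK2": list(half_a),  # clone of DISK0
        "DISK3": list(half_b),  # clone of DISK1
    }


layout = raid10_layout(list("Hello World"))
for name, data in layout.items():
    print(name, repr("".join(data)))
```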


This means there are two complete copies of your data stored across the four disks and, because of the way the data is stored, two drives could fail while the data remains intact, as long as they are not both copies of the same half of the data. In this example, we might lose (DISK0 and DISK1) or (DISK1 and DISK2), and so on, but losing both DISK0 and DISK2, or both DISK1 and DISK3, would lose data. Again, the specifics may differ, and any single drive failure means the failed drive needs to be replaced. RAID10 is also known as a hybrid RAID configuration because it uses RAID0 in combination with RAID1.
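
Whether the set survives a two-drive failure depends on which two drives are lost. The sketch below checks every pair, assuming the mirror pairs from the example layout above (DISK0 with DISK2, DISK1 with DISK3); the names `MIRROR_PAIRS` and `survives` are hypothetical.

```python
from itertools import combinations

# Assumption: the mirror pairs of the example layout in this article.
MIRROR_PAIRS = [("DISK0", "DISK2"), ("DISK1", "DISK3")]
DRIVES = ["DISK0", "DISK1", "DISK2", "DISK3"]


def survives(failed):
    """Data is intact if each mirror pair still has one working drive."""
    return all(
        any(drive not in failed for drive in pair)
        for pair in MIRROR_PAIRS
    )


for pair in combinations(DRIVES, 2):
    status = "data intact" if survives(set(pair)) else "data lost"
    print(pair, status)
```

Under this assumed layout, four of the six possible two-drive failures leave the data intact; the two that do not are the ones that take out both drives of a mirror pair.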