What is RAID and when to use it

Today we are going to talk about one of the storage systems most widely used by companies of any size and, increasingly, by individual users.

RAID.

You may have heard of this system before or it may be something totally unknown to you.

Be that as it may, we are going to explain what it is and how it can help you in the simplest possible way, avoiding jargon and overly technical explanations that would leave you no wiser than when you started.

Our goal is that, after this post, you will be able to determine for yourself if you need a RAID, and if so, which one would suit your needs.

Here we go!

RAID: What actually is a RAID? What does it mean?

RAID is an acronym from English (like almost everything else in this life 🙄):

Redundant Array of Independent Disks

In other words: a set of independent disks working together, with redundancy built in.

RAID is a collection of hard disks (referred to as an array) converted into a single large-volume logical drive.
To clarify concepts, a logical drive is a “virtual” division of the physical disk.

That logical drive may take up the entire physical disk space (a 500 GB physical hard disk may have a 500 GB logical drive).

We can also divide it into three logical units, one unit of 300 GB and two units of 100 GB each, for example.

The RAID system allows several physical hard disks to be joined together into a single logical drive, making the operating system think it has only one hard disk (as a general rule).

RAID Concepts

As we have indicated, the RAID system stores the information on several disks (minimum two).

Depending on the type of RAID there are important concepts to know:

Block size (also called sector size in this post): the smallest piece into which a file is split so that it can be distributed among the disks.

How is the information distributed?

Whether the RAID is managed by software or by hardware (a dedicated controller card), the information is always cut up according to the block size. A typical block size is 64 KB.

This means that the information will be split into chunks of 64 KB, which are then distributed between the disks.

If a PDF file occupies exactly 1 MB (a big assumption), that is 1,024 KB: exactly 16 blocks of 64 KB, eight on each disk of a two-disk RAID, with no space left over.

That is the “ideal” file.

In practice, this condition will almost never be met.

So, what happens if the file occupies 1025 KB instead of 1024 KB?

In this case, we would have seventeen divisions. Nine go to the first disk and eight to the second disk:

64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 64 KB
64 KB disk 2		Actual space occupied = 64 KB
64 KB disk 1		Actual space occupied = 1 KB
Total file space	1025 KB	–	–
Total space used	–	1088 KB	–
Lost space	–	–	63 KB

With the above table, we can quickly see that in this (admittedly extreme) case, with a block size of 64 KB, a file that does not fill its last block wastes storage capacity, since that block is “reserved” for that file and cannot be shared.

Therefore, if we only occupy one or twenty of the 64 KB available in a block, the rest of the space is “lost”, since it cannot be used for another file.
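The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the simplified model used in this post (every block a file starts is reserved whole, and blocks are dealt out to the disks in turn); the function name `split_into_blocks` and its parameters are our own invention, not part of any RAID tool.

```python
import math

def split_into_blocks(file_kb, block_kb=64, disks=2):
    """Distribute a file over `disks` disks in whole blocks of `block_kb` KB."""
    # Every started block is reserved in full, so we round up.
    blocks = math.ceil(file_kb / block_kb)
    per_disk = [0] * disks
    for i in range(blocks):
        per_disk[i % disks] += 1  # block i goes to disk (i mod disks)
    reserved_kb = blocks * block_kb
    return {
        "blocks": blocks,
        "blocks_per_disk": per_disk,
        "reserved_kb": reserved_kb,
        "lost_kb": reserved_kb - file_kb,
    }

print(split_into_blocks(1024))  # the "ideal" file: nothing is lost
print(split_into_blocks(1025))  # one extra KB costs a whole extra block
```

With 1025 KB the sketch reports 1088 KB reserved and 63 KB lost, the same figures as in the table.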

Another, more realistic example: a text file (from Notepad) containing just four lines of text would occupy 0.1 KB (110 bytes).

But in a RAID 0 with a 32 KB block size, this file would sit on one disk occupying a full 32 KB block, so we would be wasting a lot of space.

The most astute readers will think… “The solution then is to use a very small block size (for example, 1 KB) and thus ensure that practically no space is lost”.

This statement is valid, but we will encounter a problem: RAID performance.

A very small block size will optimize space to the maximum at the cost of very poor performance, since all files must be broken up into very small chunks in order to be written and, in turn, to be read.

A 1,024 KB file with a 64 KB block size makes a total of 16 divisions, while the same file with a 1 KB block size would make 1,024 divisions.

This process would drastically affect the performance of the file system.

If for a 1,024 KB file we used a block size of 512 KB, we would only make two divisions and it would go very fast. But if we were to save text files created with Notepad, the speed would still be high while the loss of capacity would be very noticeable: the space would be exhausted with very few files (a 1 KB Notepad file would actually occupy 512 KB).
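To see the trade-off at a glance, here is a small sketch that counts divisions and lost space for several block sizes. It follows the same simplified model as the rest of this post; `block_tradeoff` and the list of block sizes are illustrative choices of ours.

```python
import math

def block_tradeoff(file_kb, block_sizes_kb=(1, 4, 64, 512)):
    """Return (block size, number of divisions, KB lost) for each block size."""
    rows = []
    for block_kb in block_sizes_kb:
        divisions = math.ceil(file_kb / block_kb)   # more divisions = more work
        lost_kb = divisions * block_kb - file_kb    # unused part of the last block
        rows.append((block_kb, divisions, lost_kb))
    return rows

for name, size_kb in (("1,024 KB file", 1024), ("1 KB Notepad file", 1)):
    print(name)
    for block_kb, divisions, lost_kb in block_tradeoff(size_kb):
        print(f"  block {block_kb:>3} KB -> {divisions:>4} divisions, {lost_kb:>3} KB lost")
```

Small blocks mean many divisions for the large file, while large blocks mean a lot of lost space for the tiny one.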

At this point, which block size should I choose?

Although in the previous examples we have always talked about RAID 0, this answer is valid for any RAID.

The block size should match the kind of information we expect to store in the RAID.

Examples:

If the RAID is to be used for office applications (Word, Excel, PDF), a block size of 64 KB is suitable.
If the RAID is to be used as a repository for small files (such as Notepad text files, C or Visual Basic source files, etc.), a block size of 4 KB is suitable.
If the RAID is to be used for user-level multimedia (photographs and videos taken by cameras and/or cell phones without extreme quality), a block size of 128 KB or even 256 KB is suitable. Today the photographs (let alone the videos) taken by a mobile phone or camera far exceed 4 or 5 MB.
If the RAID is going to be used entirely for high definition work (RAW files, 4K and above), a block size of 512 or 1024 KB is suitable, since the files will be large (for example, 2 or 3 GB).

Parity, what is it and what is it for?

In very general terms, parity is the result of a mathematical operation over the data. Applied to RAID (only some types), it gives us “insurance” in case one of the disks is lost. To illustrate parity, let’s look at the following example:

OPERATION	A=2	B=5	C=9	D=4	RESULT
A + B + C + D = PARITY					PARITY = 20
A = PARITY – D – B – C					A = 2
B = PARITY – D – A – C					B = 5
C = PARITY – D – A – B					C = 9
D = PARITY – A – B – C					D = 4

The example above is a very simple way to understand parity.

Let’s imagine that we have 4 disks in total forming a RAID. In this example it will be a RAID 5.

Suppose we do a sum of all the contents of disk A obtaining the number 2 as a result. Performing the same operation for B we obtain 5.

With C we get 9 and with D we get 4.

Following the formula above, A + B + C + D = 20. That result is the parity, and it is stored alongside the data.

If disk B fails (reading unit breakage, desynchronization, degraded surface, etc.), we can replace the damaged disk and, thanks to the mathematical operation (B = PARITY – D – A – C), we would once again have the lost content of disk B.

Exactly the same would happen with disk A, C or D.

Also, while the drive is being rebuilt, the RAID would continue to work for the end user, with some delay when reading and writing data.

This is possible because both software and hardware RAID controllers can perform the mathematical operation on the fly to obtain the content of B and serve the data.
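The toy example with the sum can be written out as follows. It is only a sketch of the simplified addition-based idea used above, not of how a real controller works; the function names are ours.

```python
def parity_sum(values):
    """Toy parity: the sum of the value held by every disk."""
    return sum(values.values())

def rebuild(surviving_values, parity):
    """Recover the missing value: parity minus everything that survived."""
    return parity - sum(surviving_values.values())

disks = {"A": 2, "B": 5, "C": 9, "D": 4}
parity = parity_sum(disks)
print(parity)  # 20

# Disk B fails: only A, C and D can still be read.
surviving = {name: value for name, value in disks.items() if name != "B"}
print(rebuild(surviving, parity))  # 5
```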

How does parity really work in a RAID?

As we have discussed before, the information is written in blocks within the disks.

If each block (A1 A2 A3…) is 64 KB, the four blocks of a row would form a set of 256 KB called a stripe.

If we imagine that each block written to the disks is numbered as in the table below, so that we can always identify which group (stripe) it belongs to, we will be able to recover the lost information from a disk.

DISK 1	DISK 2	DISK 3	DISK 4
A1	A2	A3	AP
B1	B2	BP	B3
C1	CP	C3	C4
DP	D2	D3	D4

In this case, the parity of each stripe moves one disk backwards with every new stripe, so it ends up spread over all the disks. But it does not always have to follow this pattern (in fact, there are many possibilities). Another example of how to place parity would be like this:

DISK 1	DISK 2	DISK 3	DISK 4
A1	A2	A3	AP
B1	B2	B3	BP
C1	C2	C3	CP
D1	D2	D3	DP

In this case, the parity is written to a single dedicated disk instead of being spread over several disks (this is the layout used by RAID 4).
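Both placements can be generated with a short sketch. The names `parity_disk` and `layout` are hypothetical, and labelling each data block with its disk number is just a convention chosen for this illustration; real controllers offer several rotation patterns.

```python
from string import ascii_uppercase

def parity_disk(stripe, disks, rotating=True):
    """0-based index of the disk that holds the parity of a given stripe."""
    if rotating:
        # Last disk for the first stripe, then one disk backwards per stripe.
        return disks - 1 - (stripe % disks)
    return disks - 1  # dedicated parity disk: always the last one

def layout(stripes, disks, rotating=True):
    """Build a table of labels; works for up to 26 stripes (A to Z)."""
    rows = []
    for s in range(stripes):
        p = parity_disk(s, disks, rotating)
        letter = ascii_uppercase[s]
        rows.append([f"{letter}P" if d == p else f"{letter}{d + 1}" for d in range(disks)])
    return rows

print("Rotating parity:")
for row in layout(4, 4):
    print("\t".join(row))

print("Dedicated parity disk:")
for row in layout(4, 4, rotating=False):
    print("\t".join(row))
```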

Having clarified the operation, it is important to know that the parity in a RAID 5 is really calculated by means of XOR.

XOR (exclusive OR) is a logical operation on two bits: the result is 1 when the two bits are different and 0 when they are equal.

INPUT		OUTPUT
A	B	A XOR B
0	0	0
0	1	1
1	0	1
1	1	0

Example of parity calculation for a RAID 5 of 5 disks:

DISK	CHUNK IN THE STRIPE
Disk 1 (Data)	00101010
Disk 2 (Data)	10001110
Disk 3 (Data)	11110111
Disk 4 (Data)	10110101
Disk 5 (Parity)	xxxxxxxx

00101010 XOR 10001110 XOR 11110111 XOR 10110101 = 11100110

Disk 5 would therefore store 11100110.

If now disk 1 failed we would do the following operation:

10001110 XOR 11110111 XOR 10110101 XOR 11100110 = 00101010

And this operation returns the chunk within the stripe that belongs to disk 1.
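The two XOR operations above can be checked with a short Python sketch. It works on byte strings of any (equal) length, although here each chunk is a single byte, as in the example; `xor_parity` is a name chosen for this illustration.

```python
from functools import reduce

def xor_parity(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

data = [
    bytes([0b00101010]),  # disk 1
    bytes([0b10001110]),  # disk 2
    bytes([0b11110111]),  # disk 3
    bytes([0b10110101]),  # disk 4
]

parity = xor_parity(data)  # what disk 5 stores
print(f"{parity[0]:08b}")  # 11100110

# Disk 1 fails: XOR the three surviving data chunks with the parity.
recovered = xor_parity(data[1:] + [parity])
print(f"{recovered[0]:08b}")  # 00101010
```

The same function is used to compute the parity and to rebuild a lost chunk, which is what makes XOR so convenient here.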

However, RAID 6, for example, needs a second, different calculation in order to survive two failures: typically Reed-Solomon coding.

What types of RAID are there and how do they work?

To answer this question, we must take into account what will be the use of that RAID, in the same way that we must bear in mind what was previously commented (the block size).

The RAID system can be configured in various ways depending on the number of disks that make it up and on our needs as users.

How would we classify needs?

Total capacity of the array, performance, fault tolerance and economic cost.

We will skip obsolete RAID types today, since their use is very low or practically non-existent, as well as the very complex ones, since here we seek to give a global vision of how RAID works.

RAID 0

To carry out a RAID 0 you need a minimum of two disks.

This type of RAID adds all the sizes of the disks forming a single unit.

RAID 0 uses the striping format and performs an equal distribution of data between the disks. In a generic way, we could say that if there are two disks forming a RAID 0, 50% of the information would go to one disk and the other 50% to the other disk. If there were 3 disks, it would be 33.33% to each of them.
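A minimal sketch of that even distribution, assuming blocks are simply dealt out to the disks in turn (round-robin). The helper names `locate_block` and `share_per_disk` are made up for this example.

```python
def locate_block(logical_block, disks):
    """Map a logical block number to (disk index, position on that disk)."""
    return logical_block % disks, logical_block // disks

def share_per_disk(total_blocks, disks):
    """Fraction of all blocks that ends up on each disk."""
    counts = [0] * disks
    for block in range(total_blocks):
        disk, _ = locate_block(block, disks)
        counts[disk] += 1
    return [count / total_blocks for count in counts]

print(locate_block(5, 2))       # block 5 -> second disk, third position: (1, 2)
print(share_per_disk(1000, 2))  # [0.5, 0.5]
print(share_per_disk(1200, 3))  # one third on each disk
```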

RAID 0 offers the best performance in terms of storage capacity by adding up the available space of the drives.

Data read and write performance is high (both sequential and random).

Disadvantages: There is no redundancy or fault tolerance, so any failure in one of the disks leads to a total loss of data.

Recommended: If we prioritize system performance and access to information (graphic design, 3D and video editing). It offers high performance, especially for large files.

In case of loss of one or several disk(s), it will only be possible to recover the files that occupy the block size (or less) on the disks that are still operational.

RAID 1

To carry out a RAID 1 you need a minimum of two disks.

This type of RAID does not add up the sizes of the disks: the resulting unit has the capacity of a single disk, regardless of the number of disks that form it.

That is, 10 2 TB drives in RAID 1 format will show a single 2 TB drive. In the hypothetical case of having nine 2 TB disks and one 1.5 TB disk, the logical unit shown will be 1.5 TB, wasting 0.5 TB on each of the other disks.

Stores all data in duplicate. It does not use striping, so there is no block size to choose: every write is simply copied in full to all the units of the RAID set.

Advantages: RAID 1 offers complete redundancy. The information is duplicated throughout all the disks. It allows a fast reading of the information, but depending on the hardware used and the number of disks, the writing speed may be low.

Disadvantages: The cost is high, since storage capacity is wasted.

RAID 5

To carry out a RAID 5 you need a minimum of three disks.

This type of RAID adds all the sizes of the disks minus one (N-1, where N is the total number of disks).

If the disks have different sizes, the smallest will always be taken as the reference, discarding the remaining space of the larger disks. Theoretically a RAID 5 has no limit on the number of disks, but in practice the number of drives per array is usually kept small.

As previously discussed, RAID 5 uses block size and parity.

The information is stored in stripes following the block size, and a parity calculation is performed for each stripe. In RAID 5 the parity is distributed among all the disks rather than kept on a single one. In any case, it always tolerates the failure of one unit.

When reading the stripe, the parity block is not read, to avoid unnecessary overhead. The parity is only read and updated when a block within the stripe has been modified.
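When a single block changes, an XOR-based parity can be updated without re-reading the whole stripe: the new parity is the old parity XOR the old block XOR the new block. The sketch below illustrates this property of XOR and checks it against a full recalculation; the function names are ours, and real controllers may use other strategies.

```python
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def update_parity(old_parity, old_block, new_block):
    """New parity = old parity XOR old data XOR new data."""
    return xor_bytes(xor_bytes(old_parity, old_block), new_block)

def full_parity(blocks):
    """Parity recomputed from scratch over every data block."""
    return reduce(xor_bytes, blocks)

# Same data chunks as in the XOR example (0x2a = 00101010, and so on).
stripe = [b"\x2a", b"\x8e", b"\xf7", b"\xb5"]
parity = full_parity(stripe)

new_block = b"\xff"  # arbitrary new content for the second block
new_parity = update_parity(parity, stripe[1], new_block)
stripe[1] = new_block

print(new_parity == full_parity(stripe))  # True
```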

On the other hand, when a drive fails, the parity blocks of the functional drives are combined with the data using the math operations described above (typically XOR) to recreate the data on the fly.

Advantages: RAID 5 provides storage with fault tolerance and adequate performance. It is a balance between security, fault tolerance and performance.

Allows the failure of a unit.

When it is replaced, it will automatically “rebuild” itself with the information from the rest of the disks.

As a data storage system, it is one of the most cost-effective and efficient.

Disadvantages: Loss of write performance when the blocks are very small, since the parity calculation must be performed constantly.

In the event of a drive failure and replacement it can take a long time to rebuild the data due to parity.

During the rebuild of the failed drive, other drives may also fail, due to the workload of rebuilding the replaced drive.

RAID 6

To carry out a RAID 6 you need at least 4 disks.

The main difference from RAID 5 is fault tolerance. Two drives can fail simultaneously, although the available space is penalized, since the usable capacity will be equal to that of the number of drives minus two (N-2).
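The capacity rules given so far for RAID 0, 1, 5 and 6 can be put side by side in a small sketch. It assumes all disks have the same size; the function name `usable_capacity_tb` and the string labels for the levels are illustrative choices of ours.

```python
# Minimum number of disks per RAID type, as stated in this post.
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4}

def usable_capacity_tb(level, disks, disk_tb):
    """Usable capacity in TB for `disks` equal disks of `disk_tb` TB each."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return disks * disk_tb        # all disks added together
    if level == "1":
        return disk_tb                # every disk holds the same copy
    if level == "5":
        return (disks - 1) * disk_tb  # one disk's worth of parity
    return (disks - 2) * disk_tb      # RAID 6: two disks' worth of parity

for level in ("0", "1", "5", "6"):
    print(f"RAID {level}: {usable_capacity_tb(level, 4, 2)} TB usable from 4 x 2 TB")
```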

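Because of the extra parity calculation, RAID 6 is often managed by a dedicated hardware controller, although software implementations (such as Linux md) also support it.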

Provides high data redundancy and read performance.

The read performance is optimal, but the write performance is lower than RAID 5 due to the two parity calculations.

Parity is calculated based on the Reed-Solomon code.

Advantages: The same as a RAID 5 adding one more disk for fault tolerance.

Disadvantages: Slower writing speed in some cases. Higher economic cost (Hardware).

RAID 10 (RAID 1+0)

To carry out a RAID 10 you need a minimum of four disks.

The structure of a RAID 1+0 or RAID 10 is as follows: Two disks form a RAID 1 and the other two disks form another RAID 1. The two resulting volumes are then combined into a RAID 0.

Offers high read performance (thanks to RAID 0).

In turn, the two sets of RAID 1 provide security and fault tolerance. It allows two drives to fail as long as they do not belong to the same group (that is, as long as both drives of the same RAID 1 pair do not fail at the same time).
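This rule is easy to check by brute force. The sketch below assumes a four-disk RAID 10 in which disks 0 and 1 form one mirror and disks 2 and 3 form the other; the numbering and the name `raid10_survives` are assumptions made for the example.

```python
from itertools import combinations

# Assumed grouping: each tuple is one RAID 1 pair.
MIRROR_PAIRS = [(0, 1), (2, 3)]

def raid10_survives(failed, mirror_pairs=MIRROR_PAIRS):
    """The array survives while no mirror pair has lost both of its disks."""
    failed = set(failed)
    return all(not set(pair) <= failed for pair in mirror_pairs)

two_disk_failures = list(combinations(range(4), 2))
survivable = [pair for pair in two_disk_failures if raid10_survives(pair)]

print(f"{len(survivable)} of {len(two_disk_failures)} two-disk failures are survivable")
print(survivable)
```

Only the two combinations that take out a whole mirror pair bring the array down.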

Advantages: Performance.

Disadvantages: Cost for hardware.

RAID 0+1

To carry out a RAID 0+1, a minimum of four disks are needed.

Unlike RAID 1+0, two drives form a RAID 0 and the other two drives form another RAID 0. The two resulting volumes form a RAID 1. It is a less secure configuration than RAID 1+0: a single drive failure takes a whole RAID 0 group out of service, and a second failure in the other group means losing the data.

RAID 50 (RAID 5+0)

To carry out a RAID 50, a minimum of six disks are needed.

The RAID 50 structure is as follows:

A RAID 5 is carried out on three units and another RAID 5 on the other three remaining units (and so on with the rest of the sets, although there may be groups of 4 and 4 or 5 and 5, etc.). The resulting two or more volumes form a RAID 0.

Tolerates the failure of one disk in each RAID 5 group of the array at the same time.

Advantages: Robust system and good read performance compared to RAID 5.

Disadvantages: High failure rebuild time. Hardware cost.

RAID 60 (RAID 6+0)

To carry out a RAID 60, a minimum of eight disks is needed.

The RAID 60 structure is as follows:

A RAID 6 is carried out on four units and another RAID 6 on the other four remaining units (and so on with the rest of the sets, although there may be groups of 5 and 5, 6 and 6, etc.). The resulting two or more volumes form a RAID 0.

Advantages: Robust system against data loss (tolerates up to two disk failures per RAID 6 group, that is, up to four in an array of two groups).

Disadvantages: Low write performance and hardware cost.
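For the nested types (RAID 50 and RAID 60), capacity and fault tolerance follow from the groups that make them up. The following is a rough sketch assuming equal-size disks and equal-size groups; `nested_raid` and its parameters are hypothetical names.

```python
def nested_raid(groups, disks_per_group, disk_tb, parity_disks):
    """Summarize a RAID 50 (parity_disks=1) or RAID 60 (parity_disks=2) array."""
    if groups < 2:
        raise ValueError("a nested RAID needs at least two groups")
    if disks_per_group < parity_disks + 2:
        raise ValueError("each group has too few disks for this RAID type")
    return {
        "total_disks": groups * disks_per_group,
        "usable_tb": groups * (disks_per_group - parity_disks) * disk_tb,
        # Worst case: all the failures fall inside the same group.
        "failures_always_tolerated": parity_disks,
        # Best case: the failures are spread evenly over the groups.
        "failures_tolerated_at_best": groups * parity_disks,
    }

print("RAID 50:", nested_raid(groups=2, disks_per_group=3, disk_tb=2, parity_disks=1))
print("RAID 60:", nested_raid(groups=2, disks_per_group=4, disk_tb=2, parity_disks=2))
```

The difference between the worst and the best case is why the tolerance of these arrays is always stated per group.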

Does a RAID system replace backups?

No. We strongly recommend that you continue to make backup copies.

A RAID will help you to have constant access to your data in real time in case one of the hard drives fails (as long as it is not a RAID 0), something that an external backup cannot do.

But a RAID cannot protect you against other situations that could put your data at risk.

For example: the data is stored on physical hard drives connected to your server or device. In case of a power failure, you will not be able to access it, whereas if you have a copy in the cloud, you can access it from any other device elsewhere.

Not to mention a fire or a similar disaster, which would irretrievably destroy your data if you do not have an external copy kept well away from the hard drives.

In addition, certain cyber attacks (what we usually call viruses) can endanger the RAID data. In that case you will have to resort to the backup copy that you keep in another place, disconnected from your device.

What do I need to use a RAID?

This system is used mainly in companies that use large data servers.

But you can also use it without having access to a server.

You will need a RAID controller that can be hardware or software.

It is very likely that your computer (unless it is more than about 10 years old) already offers RAID support, in software or in hardware. You will also need a certain number of hard drives (depending on the type of RAID).

Some people prefer to mix brands of drives in the RAID to minimize the chance of several drives failing at the same time. This is largely a matter of personal preference.

For homes or small businesses there are also NAS devices, such as QNAP or Synology systems, based on RAID 0, RAID 1 or RAID 5. They are easy to configure and offer network connectivity.