RAID - redundant array of independent disks
A RAID system consists of two or more disks working in parallel. These can be hard disk drives, but the technology is increasingly used with solid-state drives as well. There are different RAID levels, each optimized for a specific situation. The levels are not standardized by an industry group or standardization committee, which explains why companies sometimes come up with their own unique numbers and implementations.

RAID Data Recovery

Many small and medium-sized businesses have turned to RAID-configured systems for their storage. The reasons most frequently cited for using RAID arrays are the high level of fault tolerance they offer and the cost-effectiveness of acquisition and maintenance.

However, if a RAID array fails because of component malfunctions (including hard drives and controller cards) or operating-system or application corruption, the data can be left inaccessible and, in many cases, corrupted.

RAID data recovery is an intricate task because data layouts often differ between manufacturers, frequently for competitive reasons. Without in-depth knowledge of how RAID arrays are configured at the hardware, firmware, and software levels, recovery attempts may not only fail but also cause further data corruption.

Using our extensive knowledge of RAID array storage technology, we can recover data from NAS, SAN, and server RAID configurations, from the earliest to the most recent on the market. Supported RAID servers and configurations include:


Popular RAID Configurations

Array Levels

  • RAID 0
  • RAID 0+1
  • RAID 1
  • RAID 1E
  • RAID 3
  • RAID 4
  • RAID 5
  • RAID 5E
  • RAID 5EE
  • RAID 6
  • RAID 10
  • RAID 50
  • RAID 51
  • RAID 60

Array Hard Drive Types

  • SAS
  • SCSI
  • Fibre Channel
  • FCoE
  • AoE
  • iSCSI
  • eSATA

RAID Data Recovery for Server Makes and Media

  • All ProLiant Series
  • All PowerEdge Series
  • IBM xSeries, Power Series (AIX, Linux) and storage subsystems
  • All Intel and AMD product lines
  • SAN and NAS based RAID Arrays and standalone storage systems
  • EMC and NetApp

Why RAID Arrays Fail

There are four general categories of failure: hardware RAID failure, human error, software RAID failure, and application failure.

Hardware RAID Failure

  • Actuator Failure
  • Bad sectors
  • Controller Failure
  • Controller Malfunction
  • Corrupted RAID Config
  • Lightning, Flood and Fire Damage
  • Damaged Motor
  • Drive physical abuse
  • Hard disk component failure and crashes
  • Hard disk drive component failure
  • Hard drive crashes
  • Hard drive failure
  • Head Crash
  • Intermittent drive failure
  • Media Damage
  • Media surface
  • Multiple drive failure
  • Power Spike
  • Power Supply Burn out or failure
  • RAID controller failure
  • RAID corruption
  • RAID disk failure
  • RAID disk overheat
  • RAID drive incompatibility
  • RAID drive overheat
  • RAID Array failed
  • Vibration damage

Human Error

  • Unintended deletion of files
  • Reformatting of drives / Array
  • Reformatting of partitions
  • Incorrect replacement of media components
  • Accidentally deleted records
  • Mistakenly overwritten database files
  • Employee sabotage
  • Lost/Forgotten password
  • Overwritten files
  • Overwritten RAID config files
  • Overwritten RAID settings
  • RAID incorrect setup
  • RAID user error

Software RAID Failure

  • Backup failures
  • Computer virus and worm damage
  • Corrupt files / data
  • Damaged files or folders
  • Directory corruption
  • Firmware corruption
  • Repartition
  • Server registry configuration
  • Missing partitions
  • RAID configuration

Application Failure

  • Applications that are unable to run or load files
  • Corrupted files
  • Corrupted database files
  • Data corrupted
  • Locked databases preventing access
  • Deleted tables


RAID 0 (block-level striping without parity or mirroring) has no redundancy. It provides improved performance and additional storage but no fault tolerance. Any drive failure destroys the array, and the likelihood of failure increases with more drives in the array.

RAID level 0 – Striping

In a RAID 0 system, data are split into blocks that are written across all the drives in the array. Because multiple disks (at least 2) are used at the same time, RAID 0 offers superior I/O performance. Performance can be improved further by using multiple controllers, ideally one controller per disk.
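The striping described above can be sketched as a mapping from a logical block address to a disk and an offset on that disk. This is a minimal illustration, assuming a fixed stripe unit (chunk) of `chunk_blocks` blocks and simple round-robin placement; the function name is hypothetical, and real controllers choose their own chunk sizes and may reserve space for metadata.

```python
def raid0_locate(lba: int, n_disks: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block address to (disk index, block offset on that disk)."""
    chunk = lba // chunk_blocks        # which chunk the block falls in
    within = lba % chunk_blocks        # position inside that chunk
    disk = chunk % n_disks             # chunks are dealt round-robin across disks
    row = chunk // n_disks             # number of full stripes before this chunk
    return disk, row * chunk_blocks + within


# Consecutive chunks land on different disks, so a large sequential
# transfer keeps every disk busy at the same time.
for lba in range(0, 16, 4):
    print(lba, raid0_locate(lba, n_disks=2, chunk_blocks=4))
```

With two disks and 4-block chunks, blocks 0 to 3 go to disk 0, blocks 4 to 7 to disk 1, blocks 8 to 11 back to disk 0 at offset 4, and so on.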


Advantages

  • RAID 0 offers great performance, in both read and write operations. There is no overhead caused by parity calculations.
  • All storage capacity is used; there is no disk overhead.
  • The technology is easy to implement.


Disadvantages

RAID 0 is not fault-tolerant. If one disk fails, all data in the RAID 0 array are lost. It should not be used on mission-critical systems.

Ideal use

RAID 0 is ideal for non-critical storage of data that have to be read/written at a high speed, such as on a Photoshop image retouching station.



In RAID 1 (mirroring without parity or striping), data are written identically to two drives, producing a "mirrored set". A read request can be serviced by either drive containing the requested data, ideally whichever involves the least seek time plus rotational latency. A write request updates both drives, so write performance depends on the slower of the two writes (the one with the larger seek time and rotational latency). At least two drives are required. While more drives may be used, many implementations support a maximum of two. The array continues to operate as long as at least one drive is functioning.

RAID level 1 – Mirroring

Data are stored twice by writing them to both the data disk (or set of data disks) and a mirror disk (or set of disks). If a disk fails, the controller uses either the data drive or the mirror drive for data recovery and continues operation. You need at least 2 disks for a RAID 1 array.

RAID 1 systems are often combined with RAID 0 to improve performance. Such a system is sometimes referred to by the combined number: a RAID 10 system.


Advantages

  • RAID 1 offers excellent read speed and a write speed comparable to that of a single disk.
  • If a disk fails, data do not have to be rebuilt; they just have to be copied to the replacement disk.
  • RAID 1 is a very simple technology.
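The mirroring behaviour described above (every write goes to all members, any healthy member can serve a read, and a replacement disk is simply filled by copying) can be modelled in a few lines. This is a toy in-memory sketch; the class and method names are hypothetical, and it ignores read scheduling, resynchronization while writes continue, and controller metadata.

```python
class MirrorSet:
    """Toy in-memory RAID 1: each member holds a full copy of the data."""

    def __init__(self, n_drives: int, size: int) -> None:
        self.drives = [bytearray(size) for _ in range(n_drives)]
        self.failed: set[int] = set()

    def healthy(self) -> list[int]:
        return [i for i in range(len(self.drives)) if i not in self.failed]

    def write(self, offset: int, data: bytes) -> None:
        # A write must update every surviving member.
        for i in self.healthy():
            self.drives[i][offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Any surviving member can serve the read; take the first one.
        members = self.healthy()
        if not members:
            raise OSError("all mirror members have failed")
        return bytes(self.drives[members[0]][offset:offset + length])

    def fail(self, i: int) -> None:
        self.failed.add(i)

    def replace(self, i: int) -> None:
        # No parity to recompute: copy a surviving member onto the new disk.
        source = self.healthy()[0]
        self.drives[i] = bytearray(self.drives[source])
        self.failed.discard(i)


mirror = MirrorSet(n_drives=2, size=16)
mirror.write(0, b"ledger")
mirror.fail(0)
print(mirror.read(0, 6))   # still readable from drive 1
mirror.replace(0)
```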


Disadvantages

  • The main disadvantage is that the effective storage capacity is only half of the total disk capacity because all data get written twice.
  • Software RAID 1 solutions do not always allow a hot swap of a failed disk (meaning it cannot be replaced while the server keeps running). Ideally a hardware controller is used.

Ideal use

RAID 1 is ideal for mission-critical storage, for instance for accounting systems. It is also suitable for small servers in which only two disks will be used.



In RAID 2 (bit-level striping with dedicated Hamming-code parity), all disk spindle rotation is synchronized, and data is striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive.[5] This theoretical RAID level is not used in practice.
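To make the Hamming-code idea concrete, here is a sketch of a Hamming(7,4) code, assuming one bit per drive across seven drives: data bits sit at positions 3, 5, 6 and 7 and parity bits at positions 1, 2 and 4. The (7,4) choice is an assumption for illustration; the text above only says parity is stored on at least one parity drive. A single corrupted bit can be located because, for a valid codeword, the XOR of the (1-based) positions holding a 1 is zero.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # 1-based positions holding data bits
PARITY_POSITIONS = (1, 2, 4)     # 1-based positions holding parity bits


def encode(data: list[int]) -> list[int]:
    """Spread 4 data bits plus 3 Hamming parity bits across 7 'drives'."""
    word = [0] * 8                             # index 0 unused
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        word[p] = sum(word[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    return word[1:]


def syndrome(word: list[int]) -> int:
    """XOR of the positions holding a 1; zero means no single-bit error."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s


def decode(word: list[int]) -> list[int]:
    fixed = list(word)
    s = syndrome(fixed)
    if s:                                      # flip the bit the syndrome points at
        fixed[s - 1] ^= 1
    return [fixed[pos - 1] for pos in DATA_POSITIONS]


word = encode([1, 0, 1, 1])
word[4] ^= 1                                   # corrupt the bit on 'drive' 5
print(decode(word))                            # recovers [1, 0, 1, 1]
```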



In RAID 3 (byte-level striping with dedicated parity), all disk spindle rotation is synchronized, and data are striped so each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive.[5] Although implementations exist,[9] RAID 3 is not commonly used in practice.

RAID level 3

On RAID 3 systems, data blocks are subdivided (striped) and written in parallel on two or more drives. An additional drive stores parity information. You need at least 3 disks for a RAID 3 array.

Since parity is used, a RAID 3 stripe set can withstand a single disk failure without losing data or access to data.
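The byte-level striping and dedicated parity drive described above can be sketched with XOR, the usual parity function. This is an illustration assuming `n_data` data drives modelled as byte arrays, with short input padded with zero bytes; the function names are hypothetical, and the sketch ignores the spindle synchronization a real RAID 3 relies on.

```python
from functools import reduce


def stripe_bytes(data: bytes, n_data: int) -> list[bytearray]:
    """Byte i goes to data drive i % n_data (zero-padded to a full stripe)."""
    padded = data + bytes(-len(data) % n_data)
    drives = [bytearray() for _ in range(n_data)]
    for i, byte in enumerate(padded):
        drives[i % n_data].append(byte)
    return drives


def xor_parity(drives: list[bytearray]) -> bytearray:
    """Byte-wise XOR across drives; this is what the parity drive stores."""
    return bytearray(reduce(lambda a, b: a ^ b, column) for column in zip(*drives))


def rebuild(drives: list[bytearray], parity: bytearray, lost: int) -> bytearray:
    """XOR of the surviving data drives and the parity drive yields the lost drive."""
    survivors = [d for i, d in enumerate(drives) if i != lost]
    return xor_parity(survivors + [parity])


data_drives = stripe_bytes(b"prepress scan", n_data=3)
parity_drive = xor_parity(data_drives)
assert rebuild(data_drives, parity_drive, lost=1) == data_drives[1]
```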


Advantages

  • RAID 3 provides high throughput (both read and write) for large data transfers.
  • Disk failures do not significantly slow down throughput.


Disadvantages

  • This technology is fairly complex and too resource intensive to be done in software.
  • Performance is slower for random, small I/O operations.

Ideal use

RAID 3 is not that common in prepress.



RAID 4 (block-level striping with dedicated parity) is equivalent to RAID 5 (see below) except that all parity data are stored on a single drive. In this arrangement, files may be distributed among multiple drives. Each drive operates independently, allowing I/O requests to be performed in parallel.[citation needed]

RAID 4 was previously used primarily by NetApp, but has now been largely replaced by an implementation of RAID 6 (RAID-DP).[10]


RAID 5 (block-level striping with distributed parity) distributes parity along with the data and requires all drives but one to be present to operate; the array is not destroyed by a single drive failure. After a drive fails, subsequent reads can be calculated from the distributed parity, so the failure is masked from the end user. RAID 5 requires at least three disks.

RAID level 5

RAID 5 is the most common secure RAID level. It is similar to RAID 3 except that data are transferred to disks by independent read and write operations (not in parallel). The data chunks that are written are also larger. Instead of a dedicated parity disk, parity information is spread across all the drives. You need at least 3 disks for a RAID 5 array.
A RAID 5 array can withstand a single disk failure without losing data or access to data. Although RAID 5 can be achieved in software, a hardware controller is recommended. Often extra cache memory is used on these controllers to improve the write performance.
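The difference between RAID 4's dedicated parity drive and RAID 5's distributed parity shows up when counting where parity updates land. In this sketch, RAID 5 parity rotates one disk to the left per stripe, like the "left" layouts used by Linux md; other implementations rotate differently, so treat the formula as an assumption. The function names are hypothetical.

```python
from collections import Counter


def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Index of the disk holding parity for a given stripe."""
    if level == 4:
        return n_disks - 1                          # always the dedicated parity drive
    if level == 5:
        return (n_disks - 1) - (stripe % n_disks)   # rotate leftward each stripe
    raise ValueError(f"unsupported level: {level}")


def parity_updates(n_disks: int, n_stripes: int, level: int) -> Counter:
    """One small write per stripe: each also rewrites that stripe's parity block."""
    return Counter(parity_disk(s, n_disks, level) for s in range(n_stripes))


print(parity_updates(4, 100, level=4))   # every parity update hits disk 3
print(parity_updates(4, 100, level=5))   # parity updates spread over all 4 disks
```

With a dedicated parity drive, every small write also queues on that one disk, which is why distributing parity helps random-write workloads.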


Advantages

Read transactions are very fast, while write transactions are somewhat slower (due to the parity that has to be calculated).
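Much of the write slowdown comes from the read-modify-write cycle for small updates: to change one data block, the controller can read the old data and old parity, then write the new data and a new parity computed as old parity XOR old data XOR new data. A minimal sketch of that identity, with blocks modelled as equal-length byte strings and hypothetical function names:

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity without touching the other data blocks in the stripe.

    Costs two reads (old data, old parity) and two writes (new data, new parity).
    """
    return xor_blocks(old_parity, old_data, new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*stripe)

new_block = b"\xff\x00"
parity = small_write_parity(stripe[1], new_block, parity)
stripe[1] = new_block
assert parity == xor_blocks(*stripe)   # matches a full recomputation
```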


Disadvantages

  • Disk failures have an effect on throughput, although this is still acceptable.
  • Like RAID 3, this is complex technology.

Ideal use

RAID 5 is a good all-round system that combines efficient storage with excellent security and decent performance. It is ideal for file and application servers.



RAID 6 (block-level striping with double distributed parity) provides fault tolerance for up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems, and becomes increasingly important as large-capacity drives lengthen the time needed to recover from the failure of a single drive. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt.[5]

 RAID 10

In RAID 10, often referred to as RAID 1+0 (mirroring and striping), data is written in stripes across primary disks that have been mirrored to the secondary disks.

RAID level 10 – Combining RAID 0 & RAID 1

RAID 10 combines the advantages (and disadvantages) of RAID 0 and RAID 1 in one single system. It provides security by mirroring all data on a secondary set of disks (disk 3 and 4 in the drawing below) while using striping across each set of disks to speed up data transfers.

What about RAID levels 2, 4, 6 and 7?

These levels do exist but are not that common, at least not in prepress environments. This is just a simple introduction to RAID systems. You can find more in-depth information on Wikipedia or on the ACNC website.

RAID is no substitute for backup!

All RAID levels except RAID 0 offer protection from a single drive failure. A RAID 6 system even survives two disks failing simultaneously. For complete security, you still need to back up the data from a RAID system.

  • A backup will come in handy if all drives fail simultaneously because of a power spike.
  • It is a safeguard if the storage system is stolen.
  • Backups can be kept off-site, at a different location. This helps if a natural disaster or fire destroys your workplace.
  • The most important reason to back up multiple generations of data is user error. If someone accidentally deletes important data and this goes unnoticed for hours, days, or weeks, a good set of backups ensures you can still retrieve those files.


The following table provides an overview of some considerations for standard RAID levels. In each case:

  • Array space efficiency is given as an expression in terms of the number of drives, n; this expression designates a fractional value between zero and one, representing the fraction of the sum of the drives' capacities that is available for use. For example, if three drives are arranged in RAID 3, this gives an array space efficiency of 1 − 1/n = 1 − 1/3 = 2/3; thus, if each drive in this example has a capacity of 250 GB, then the array has a total capacity of 750 GB, but the capacity that is usable for data storage is only 500 GB.
  • Array failure rate is given as an expression in terms of the number of drives, n, and the drive failure rate, r (which is assumed to be identical and independent for each drive). For example, if each of three drives has a failure rate of 5% over the next three years, and these drives are arranged in RAID 3, then this gives an array failure rate of 1 − (1 − r)^n − nr(1 − r)^(n − 1) = 1 − 0.95^3 − 3(0.05)(0.95^2) = 0.725% over the next 3 years.
Level | Description | Minimum # of drives** | Space efficiency | Fault tolerance | Array failure rate*** | Read performance | Write performance
RAID 0 | Block-level striping without parity or mirroring | 2 | 1 | 0 (none) | 1 − (1 − r)^n | nX | nX
RAID 1 | Mirroring without parity or striping | 2 | 1/n | n − 1 drives | r^n | nX***** | 1X
RAID 2 | Bit-level striping with dedicated Hamming-code parity | 3 | 1 − (1/n)·log2(n − 1) | Can recover from one drive failure, or repair corrupt data or parity when a corrupted bit's corresponding data and parity are good | Variable | Variable | Variable
RAID 3 | Byte-level striping with dedicated parity | 3 | 1 − 1/n | 1 drive | 1 − (1 − r)^n − nr(1 − r)^(n − 1) | (n − 1)X | (n − 1)X*
RAID 4 | Block-level striping with dedicated parity | 3 | 1 − 1/n | 1 drive | 1 − (1 − r)^n − nr(1 − r)^(n − 1) | (n − 1)X | (n − 1)X*
RAID 5 | Block-level striping with distributed parity | 3 | 1 − 1/n | 1 drive | 1 − (1 − r)^n − nr(1 − r)^(n − 1) | (n − 1)X* | (n − 1)X*
RAID 6 | Block-level striping with double distributed parity | 4 | 1 − 2/n | 2 drives | 1 − (1 − r)^n − nr(1 − r)^(n − 1) − [n(n − 1)/2]r^2(1 − r)^(n − 2) | (n − 2)X* | (n − 2)X*
RAID 10 | Mirroring without parity, and block-level striping | 4 | 1/2 | 1 drive per span | **** | nX | (n/2)X

* Assumes hardware is fast enough to support
** Assumes a non-degenerate minimum number of drives
*** Assumes independent, identical rate of failure amongst drives
**** RAID 10 can lose only 1 drive per span, up to a maximum of n/2 drives
***** Theoretical maximum, as low as 1X in practice
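The space-efficiency and failure-rate columns can be checked numerically. This sketch assumes, like the table, that drives fail independently with the same probability r over the period of interest, and treats an array as failed once more drives fail than its fault tolerance allows. RAID 10 is left out of the failure calculation because its tolerance depends on which drives fail (see note ****). The function names are hypothetical.

```python
from math import comb


def space_efficiency(level: str, n: int) -> float:
    """Fraction of raw capacity usable for data (two-way mirrors for RAID 10)."""
    return {
        "RAID 0": 1.0,
        "RAID 1": 1 / n,
        "RAID 3": 1 - 1 / n,
        "RAID 4": 1 - 1 / n,
        "RAID 5": 1 - 1 / n,
        "RAID 6": 1 - 2 / n,
        "RAID 10": 1 / 2,
    }[level]


def array_failure_probability(n: int, r: float, tolerance: int) -> float:
    """P(more than `tolerance` of n independent drives fail), each with probability r."""
    survive = sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(tolerance + 1))
    return 1 - survive


# The RAID 3 example from the text: three 250 GB drives, r = 5% over three years.
print(space_efficiency("RAID 3", 3) * 3 * 250)          # usable GB, about 500
print(array_failure_probability(3, 0.05, tolerance=1))  # about 0.00725
```

Setting `tolerance` to 0 reproduces the RAID 0 entry 1 − (1 − r)^n, and setting it to n − 1 reproduces the RAID 1 entry r^n.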

Nested (hybrid) RAID

In what was originally termed hybrid RAID,[11] many storage controllers allow RAID levels to be nested. The elements of a RAID may be either individual drives or RAIDs themselves. However, if a RAID is itself an element of a larger RAID, it is unusual for its elements to be themselves RAIDs.

As there is no basic RAID level numbered larger than 9, nested RAIDs are usually clearly described by attaching the numbers indicating the RAID levels, sometimes with a "+" in between. The order of the digits in a nested RAID designation is the order in which the nested array is built: For a RAID 1+0, drives are first combined into multiple level 1 RAIDs that are themselves treated as single drives to be combined into a single RAID 0; the reverse structure is also possible (RAID 0+1).[citation needed]

The final RAID is known as the top array. When the top array is a RAID 0 (such as in RAID 1+0 and RAID 5+0), most vendors omit the "+" (yielding RAID 10 and RAID 50, respectively).

  • RAID 0+1: striped sets in a mirrored set (minimum four drives; even number of drives) provides fault tolerance and improved performance but increases complexity.
The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the data on the RAID system is lost.
  • RAID 1+0: (a.k.a. RAID 10) mirrored sets in a striped set (minimum four drives; even number of drives) provides fault tolerance and improved performance but increases complexity.
The key difference from RAID 0+1 is that RAID 1+0 creates a striped set from a series of mirrored drives. The array can sustain multiple drive losses so long as no mirror loses all its drives.[12]
  • RAID 5+3: mirrored striped set with distributed parity (some manufacturers label this as RAID 53)[citation needed]
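The difference between RAID 1+0 and RAID 0+1 described above shows up when two drives fail. This sketch enumerates every two-drive failure in a four-drive array, assuming drives 0 and 1 form one group and drives 2 and 3 the other; in RAID 1+0 the groups are mirrors that are striped together, while in RAID 0+1 they are stripes that are mirrored. The function names are hypothetical.

```python
from itertools import combinations

GROUPS = [{0, 1}, {2, 3}]


def raid10_survives(failed: set[int]) -> bool:
    # Mirrored pairs, striped: fine as long as no mirror loses all its drives.
    return all(not group <= failed for group in GROUPS)


def raid01_survives(failed: set[int]) -> bool:
    # Striped sets, mirrored: fine as long as one striped set is fully intact.
    return any(not (group & failed) for group in GROUPS)


pairs = [set(p) for p in combinations(range(4), 2)]
print(sum(raid10_survives(f) for f in pairs), "of", len(pairs))  # RAID 1+0
print(sum(raid01_survives(f) for f in pairs), "of", len(pairs))  # RAID 0+1
```

RAID 1+0 survives four of the six possible two-drive failures; RAID 0+1 survives only two.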

RAID parity

Many RAID levels employ an error protection scheme called "parity", a widely used method in information technology to provide fault tolerance in a given set of data. Most use simple XOR parity, but RAID 6 uses two separate parities based respectively on addition and multiplication in a particular Galois field, or Reed–Solomon error correction.[13]
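The two RAID 6 parities can be sketched as follows, assuming the common construction in GF(2^8) with reducing polynomial 0x11D and generator 2 (the choice used, for example, by Linux md): P is the plain XOR of the data blocks, and Q weights data block i by 2^i before XOR-ing. Because the weights differ per drive, Q can rebuild a data block even when P is also lost. Only that one recovery case is shown; recovering two lost data blocks requires solving a small system of equations and is omitted. The function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)        # a^255 == 1 for every non-zero a


def pq_parity(blocks: list[bytes]) -> tuple[bytes, bytes]:
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        weight = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte                       # P: addition (XOR) only
            q[k] ^= gf_mul(weight, byte)       # Q: each byte weighted by 2^i
    return bytes(p), bytes(q)


def recover_with_q(blocks: list[bytes], q: bytes, lost: int) -> bytes:
    """Rebuild data block `lost` from Q and the other data blocks (P not needed)."""
    acc = bytearray(q)
    for i, block in enumerate(blocks):
        if i != lost:
            weight = gf_pow(2, i)
            for k, byte in enumerate(block):
                acc[k] ^= gf_mul(weight, byte)
    inverse = gf_inv(gf_pow(2, lost))          # undo the 2^lost weighting
    return bytes(gf_mul(inverse, byte) for byte in acc)


data = [b"RAID", b"six!", b"P+Q?"]
p, q = pq_parity(data)
assert recover_with_q(data, q, lost=1) == b"six!"
```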

Non-standard levels

Many configurations other than the basic numbered RAID levels are possible, and many companies, organizations, and groups have created their own non-standard configurations, in many cases designed to meet the specialized needs of a small niche group. Most non-standard RAID levels are proprietary:

  • Linux MD RAID10 (RAID 10) implements a general RAID driver that defaults to a standard RAID 1 with two drives, and a standard RAID 1+0 with four drives, but can have any number of drives, including odd numbers. MD RAID 10 can run striped and mirrored, even with only two drives with the f2 layout (mirroring with striped reads, giving the read performance of RAID 0; normal Linux software RAID 1 does not stripe reads, but can read in parallel).[12][14][15]
  • Hadoop has a RAID system that generates a parity file by xor-ing a stripe of blocks in a single HDFS file.[16]
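How md RAID10 copes with an odd number of drives can be illustrated with a simplified model of its default "near" layout with two copies (n2), as pictured in the md(4) manual page: the copies of consecutive chunks are laid out across the devices in turn, wrapping onto the next row. This sketch ignores chunk size, metadata offsets, and the far (f2) and offset (o2) layouts; the function name is hypothetical.

```python
def near_layout(chunk: int, copies: int, n_devices: int) -> list[tuple[int, int]]:
    """(device, row) for each copy of a chunk in a simplified 'near' layout."""
    placements = []
    for c in range(copies):
        slot = chunk * copies + c          # copies of a chunk occupy consecutive slots
        placements.append((slot % n_devices, slot // n_devices))
    return placements


# Three drives, two copies: each chunk still lives on two different drives.
for chunk in range(3):
    print(chunk, near_layout(chunk, copies=2, n_devices=3))
```

With two devices this model reduces to plain RAID 1, and with four devices to standard RAID 1+0 (mirror pairs 0/1 and 2/3, striped), matching the defaults described above.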

Data backup

A RAID system used as secondary storage is not an alternative to backing up data. In RAID levels > 0, a RAID protects from catastrophic data loss caused by physical damage or errors on a single drive within the array (or two drives in, say, RAID 6). However, a true backup system has other important features such as the ability to restore an earlier version of data, which is needed both to protect against software errors that write unwanted data to secondary storage, and also to recover from user error and malicious data deletion. A RAID can be overwhelmed by catastrophic failure that exceeds its recovery capacity and, of course, the entire array is at risk of physical damage by fire, natural disaster, and human forces, while backups can be stored off-site. A RAID is also vulnerable to controller failure because it is not always possible to migrate a RAID to a new, different controller without data loss.[17]


The distribution of data across multiple drives can be managed either by dedicated computer hardware or by software. A software solution may be part of the operating system, or it may be part of the firmware and drivers supplied with a hardware RAID controller.

Software-based RAID

Software RAID implementations are now provided by many operating systems. Software RAID can be implemented as:

  • A layer that abstracts multiple devices, thereby providing a single virtual device (e.g. Linux's md)
  • A more generic logical volume manager (provided with most server-class operating systems, e.g. Veritas or LVM)
  • A component of the file system (e.g. ZFS or Btrfs)

Volume manager support

Server-class operating systems typically provide logical volume management, which allows a system to use logical volumes that can be resized or moved. Often, features like RAID or snapshots are also supported.

  • Vinum is a logical volume manager supporting RAID 0, RAID 1, and RAID 5. Vinum is part of the base distribution of the FreeBSD operating system, and versions exist for NetBSD, OpenBSD, and DragonFly BSD.
  • Solaris SVM supports RAID 1 for the boot filesystem, and adds RAID 0 and RAID 5 support (and various nested combinations) for data drives.
  • Linux LVM supports RAID 0 and RAID 1.
  • HP's OpenVMS provides a form of RAID 1 called "Volume shadowing", giving the possibility to mirror data locally and at remote cluster systems.

File-system support

Some advanced file systems are designed to organize data across multiple storage devices directly (without needing the help of a third-party logical volume manager).

  • ZFS supports equivalents of RAID 0, RAID 1, RAID 5 (RAID Z), RAID 6 (RAID Z2) and a triple parity version RAID Z3 and any nested combination of those like 1+0. ZFS is the native file system on Solaris and also available on FreeBSD.
  • Btrfs supports RAID 0, RAID 1 and RAID 10 (RAID 5 and 6 are under development).

Operating-system support

Many operating systems provide basic RAID functionality independently of volume management:

  • Apple's OS X and OS X Server support RAID 0, RAID 1, and RAID 1+0.[18][19]
  • FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all nestings via GEOM modules and ccd.[20][21][22]
  • Linux's md supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, and all nestings.[23][24] Certain reshaping/resizing/expanding operations are also supported.[25]
  • Microsoft's server operating systems support RAID 0, RAID 1, and RAID 5. Some of the Microsoft desktop operating systems support RAID. For example, Windows XP Professional supports RAID level 0, in addition to spanning multiple drives, but only if using dynamic disks and volumes. Windows XP can be modified to support RAID 0, 1, and 5.[26] Windows 8 and Windows Server 2012 introduce a RAID-like feature known as Storage Spaces, which also allows users to specify mirroring, parity, or no redundancy on a folder-by-folder basis.[27]
  • NetBSD supports RAID 0, 1, 4, and 5 via its software implementation, named RAIDframe.[28]

Over time, the increase in commodity CPU speed has been consistently greater than the increase in drive throughput;[29] the percentage of host CPU time required to saturate a given number of drives has decreased. For instance, under 100% usage of a single core on a 2.1 GHz Intel "Core2" CPU, the Linux software RAID subsystem (md) as of version 2.6.26 is capable of calculating parity information at 6 GB/s; however, a three-drive RAID 5 array using drives capable of sustaining a write operation at 100 MB/s only requires parity to be calculated at the rate of 200 MB/s, which requires the resources of just over 3% of a single CPU core.
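The arithmetic in this example can be reproduced directly: in a three-drive RAID 5, each stripe carries two drives' worth of data and one drive's worth of parity, so sustaining 100 MB/s per drive means computing parity over 200 MB/s of data. The function name is hypothetical, and the 6 GB/s figure is the md benchmark quoted above, not a general constant.

```python
def parity_cpu_share(n_drives: int, drive_mb_s: float, parity_mb_s_per_core: float) -> float:
    """Fraction of one core needed to keep parity up with full-speed RAID 5 writes."""
    data_mb_s = (n_drives - 1) * drive_mb_s   # one drive's worth of each stripe is parity
    return data_mb_s / parity_mb_s_per_core


print(f"{parity_cpu_share(3, 100, 6000):.1%}")   # 3.3%
print(f"{parity_cpu_share(8, 100, 6000):.1%}")   # 11.7%
```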

Another concern with software implementations is the process of booting the associated operating system. For instance, consider a computer being booted from a RAID 1 (mirrored drives); if the first drive in the RAID 1 fails, then a first-stage boot loader might not be sophisticated enough to attempt loading the second-stage boot loader from the second drive as a fallback. The second-stage boot loader for FreeBSD is capable of loading a kernel from a RAID 1.[30]

Hardware-based RAID

Hardware RAID controllers use proprietary data layouts, so it is not usually possible to span controllers from different manufacturers.[citation needed] They do not require processor resources, the BIOS can boot from them, and tighter integration with the device driver may offer better error handling.

On a desktop system, a hardware RAID controller may be an expansion card connected to a bus (e.g. PCI or PCIe) or a component integrated into the motherboard. There are controllers supporting most types of drive technology, such as IDE/ATA, SATA, SCSI, SSA, Fibre Channel, and sometimes even a combination. The controller and drives may be in a stand-alone enclosure rather than inside a computer, and the enclosure may be directly attached to a computer or connected via a SAN.

Firmware/driver-based RAID

A RAID implemented at the level of an operating system is not always compatible with the system's boot process, and it is generally impractical for desktop versions of Windows (as described above). However, hardware RAID controllers are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do not contain a dedicated RAID controller chip, but simply a standard drive controller chip with special firmware and drivers; during early stage bootup, the RAID is implemented by the firmware, and once the operating system has been more completely loaded, then the drivers take over control. Consequently, such controllers may not work when driver support is not available for the host operating system.[31]

Data scrubbing / Patrol read

Data scrubbing is periodic reading and checking by the RAID controller of all the blocks in a RAID, including those not otherwise accessed. This allows bad blocks to be detected before they are used.[32]

An alternate name for this is patrol read. Patrol read is defined as a check for bad blocks on each storage device in an array that also uses the redundancy of the array to recover bad blocks on a single drive and reassign the recovered data to spare blocks elsewhere on the drive.[33]
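A patrol read over one parity stripe can be sketched as follows: read every block; if exactly one is unreadable, rebuild it from the others and write it back (standing in for remapping to a spare sector); if all are readable, verify that the parity still matches. This assumes single XOR parity with the parity block stored last in the list; the function and its return strings are hypothetical, not any controller's interface.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(blocks: list[bytes], readable: list[bool]) -> str:
    """Check one stripe (data blocks followed by its parity block) in place."""
    bad = [i for i, ok in enumerate(readable) if not ok]
    if len(bad) > 1:
        return "unrecoverable: more bad blocks than parity can cover"
    if bad:
        i = bad[0]
        blocks[i] = xor_blocks([b for j, b in enumerate(blocks) if j != i])
        readable[i] = True            # stands in for rewriting to a spare sector
        return f"repaired block {i}"
    if any(xor_blocks(blocks)):       # data XOR parity must be all zero bytes
        return "parity mismatch"
    return "ok"


data = [b"\x01\x02", b"\x03\x04"]
stripe = data + [xor_blocks(data)]
print(scrub_stripe(stripe, [True, False, True]))   # repaired block 1
```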

Problems with RAID

Correlated failures

In practice, the drives in an array are often the same age (with similar wear) and subject to the same environment. Since many drive failures are due to mechanical issues (which are more likely on older drives), this violates the assumption of independent, identical failure rates; failures are in fact statistically correlated.[5] In practice, the chance of a second failure before the first has been recovered (causing data loss) is higher than independent random failures would suggest. In a study of about 100,000 drives, the probability of two drives in the same cluster failing within one hour was observed to be four times larger than predicted by the exponential statistical distribution, which characterizes processes in which events occur continuously and independently at a constant average rate. The probability of two failures within the same 10-hour period was twice as large as predicted by an exponential distribution.[34]

A common assumption is that "server-grade" drives fail less frequently than consumer-grade drives. Two independent studies (one by Carnegie Mellon University and the other by Google) have shown that the "grade" of a drive does not relate to the drive's failure rate.[35][36]

Unrecoverable Read Errors (URE) during rebuild

Unrecoverable read errors present as sector read failures. The UBE (unrecoverable bit error) rate is typically specified at 1 bit in 10^15 for enterprise-class drives (SCSI, FC, SAS) and 1 bit in 10^14 for desktop-class drives (IDE/ATA/PATA, SATA). Increasing drive capacities and large RAID 5 redundancy groups have led to an increasing inability to rebuild a RAID group after a drive failure, because an unrecoverable sector is found on the remaining drives.[5][37] Parity schemes such as RAID 5 are particularly prone to the effects of UREs during a rebuild, as a URE affects not only the sector where it occurs but also the reconstructed blocks that use that sector for parity computation; typically, a URE during a RAID 5 rebuild will lead to a complete rebuild failure.[38]
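The rebuild risk can be estimated from the quoted rates, under the simplifying assumption that every bit read fails independently at exactly the specified rate (real drives often do better than their specification, so this is a pessimistic sketch). The example assumes a hypothetical four-drive RAID 5 of 4 TB drives, whose rebuild must read the three surviving drives, 12 TB in total; the function name is also hypothetical.

```python
import math


def p_ure(bytes_read: float, bit_error_rate: float) -> float:
    """P(at least one unrecoverable bit error) = 1 - (1 - rate)^bits, computed stably."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


rebuild_read = 3 * 4e12            # hypothetical: three surviving 4 TB drives
print(f"desktop (1 in 10^14):    {p_ure(rebuild_read, 1e-14):.0%}")
print(f"enterprise (1 in 10^15): {p_ure(rebuild_read, 1e-15):.0%}")
```

Under these assumptions the desktop-class figure comes out at roughly 62% and the enterprise-class figure at roughly 9%, which illustrates why large RAID 5 groups built from desktop drives are considered risky to rebuild.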

Double protection schemes such as RAID 6 are attempting to address this issue, but suffer from a very high write penalty. Non-parity (mirrored) schemes such as RAID 10 have a lower risk from UREs.[39] Background scrubbing can be used to detect and recover from UREs (which are latent and invisibly compensated for dynamically by the RAID controller) as a background process, by reconstruction from the redundant RAID data and then re-writing and re-mapping to a new sector; and so reduce the risk of double-failures to the RAID system[40][41] (see Data scrubbing above).

 Recovery time is increasing

Drive capacity has grown at a much faster rate than transfer speed, and error rates have fallen only a little in comparison. Therefore, larger-capacity drives may take hours, if not days, to rebuild. Rebuild time grows further if the array remains in operation, at reduced capacity, while it rebuilds.[42] Given a RAID with only one drive of redundancy (RAID 3, 4, and 5), a second failure would cause complete failure of the array. Even though individual drives' mean time between failures (MTBF) has increased over time, this increase has not kept pace with the increased storage capacity of the drives. The time to rebuild the array after a single drive failure, as well as the chance of a second failure during a rebuild, have increased over time.[43] Mirroring schemes such as RAID 10 have a bounded recovery time, as they require copying only the single failed drive, whereas parity schemes such as RAID 6 require reading all blocks of the surviving drives in the array set. Triple parity schemes, or triple mirroring, have been suggested as one approach to improving resilience to an additional drive failure during this long rebuild time.[44]
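The rebuild window and its exposure to a second failure can be roughed out with two formulas: rebuild time as drive capacity divided by sustained rebuild rate, and the chance that one of the surviving drives fails within that window under an exponential (constant-rate) model based on MTBF. The drive size, rebuild rate, MTBF, and function names below are hypothetical, and, as the section on correlated failures explains, drives in the same array fail less independently than this model assumes.

```python
import math


def rebuild_hours(drive_bytes: float, rebuild_bytes_per_s: float) -> float:
    return drive_bytes / rebuild_bytes_per_s / 3600


def p_failure_during_rebuild(survivors: int, hours: float, mtbf_hours: float) -> float:
    """P(any survivor fails within `hours`), each failing at constant rate 1/MTBF."""
    return -math.expm1(-survivors * hours / mtbf_hours)


window = rebuild_hours(4e12, 100e6)     # hypothetical 4 TB drive at 100 MB/s
print(f"{window:.1f} h")                # best case, with no competing I/O
print(p_failure_during_rebuild(survivors=3, hours=window, mtbf_hours=1_000_000))
```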

 Atomicity: including parity inconsistency due to system crashes

A system crash or other interruption of a write operation can result in states where the parity is inconsistent with the data due to non-atomicity of the write process, such that the parity cannot be used for recovery in the case of a disk failure (the so-called RAID 5 write hole).[5]
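The write hole can be reproduced with single XOR parity over two data blocks: new data reaches disk 0, the system crashes before the matching parity update, and later disk 1 fails. Reconstructing disk 1 from disk 0 and the stale parity then silently returns wrong data. The byte values and function names are arbitrary illustrations.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_after_crash(parity_updated: bool) -> tuple[bytes, bytes]:
    d0, d1 = b"\x0f\x0f", b"\xf0\x00"
    parity = xor_blocks(d0, d1)              # consistent stripe on disk
    d0 = b"\xff\xff"                         # new data for disk 0 is written...
    if parity_updated:
        parity = xor_blocks(d0, d1)          # ...and parity follows (no crash)
    # Disk 1 now fails; rebuild it from disk 0 and whatever parity is on disk.
    return d1, xor_blocks(d0, parity)


original, rebuilt = reconstruct_after_crash(parity_updated=False)
print(original == rebuilt)   # False: the stale parity corrupts the rebuilt block
```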

This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not use transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple" during the early days of relational database commercialization.[45]

 RAID write hole

The RAID write hole is a known data corruption issue in older and low-end RAIDs, caused by interrupted destaging of writes to disk.[46]

Write cache reliability

A concern about write cache reliability exists, specifically regarding devices equipped with a write-back cache: a caching system that reports the data as written as soon as it is written to cache, as opposed to the non-volatile medium.[47]

Drive error recovery algorithms

Many modern drives have internal error recovery algorithms that can take upwards of a minute to recover and re-map data that the drive fails to read easily. Frequently, a RAID controller is configured to drop a component drive (that is, to assume a component drive has failed) if the drive has been unresponsive for 8 seconds or so; this might cause the array controller to drop a good drive because that drive has not been given enough time to complete its internal error recovery procedure. Consequently, desktop drives can be quite risky when used in a RAID, and so-called enterprise class drives limit this error recovery time in order to obviate the problem.

A fix specific to Western Digital's desktop drives was once available: a utility called WDTLER.exe could limit a drive's error recovery time by enabling TLER (time-limited error recovery), which caps error recovery at 7 seconds. Around September 2009, Western Digital disabled this feature in their desktop drives (e.g. the Caviar Black line), making such drives unsuitable for use in a RAID.[48]

However, Western Digital enterprise-class drives are shipped from the factory with TLER enabled. Similar technologies are used by Seagate, Samsung, and Hitachi. Conversely, for non-RAID use, an enterprise-class drive with a short error recovery timeout that cannot be changed is less suitable than a desktop drive.[48]

In late 2010, the Smartmontools program began supporting the configuration of ATA Error Recovery Control, allowing the tool to configure many desktop class hard drives for use in a RAID.[48]
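The timing mismatch can be expressed as a simple comparison, and the Smartmontools setting as the command line it takes. smartctl's `-l scterc,READTIME,WRITETIME` option expresses limits in units of 100 milliseconds, so 7 seconds becomes 70. This sketch only builds the argument list; `/dev/sdX` is a placeholder, the function names are hypothetical, the 8-second controller timeout is the rough figure from the text, and the drive must support SCT Error Recovery Control for the command to have any effect.

```python
def drive_dropped(recovery_s: float, controller_timeout_s: float = 8.0) -> bool:
    """A controller drops a drive that stays unresponsive longer than its timeout."""
    return recovery_s > controller_timeout_s


def scterc_command(device: str, seconds: float) -> list[str]:
    """smartctl arguments capping read and write error recovery at `seconds`."""
    tenths = round(seconds * 10)             # scterc limits are in 100 ms units
    return ["smartctl", "-l", f"scterc,{tenths},{tenths}", device]


print(drive_dropped(60))                     # desktop drive retrying for a minute: True
print(drive_dropped(7))                      # TLER-style 7 s cap: False
print(scterc_command("/dev/sdX", 7))         # pass to subprocess.run() to apply
```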

Scenarios other than disk failure

While RAID may protect against physical drive failure, the data are still exposed to operator, software, hardware, and virus destruction. Many studies cite operator fault as the most common source of malfunction,[49] such as a server operator replacing the incorrect drive in a faulty RAID, and disabling the system (even temporarily) in the process.[50]

 RAID 5 in enterprise environments

Rebuilding a RAID 5 array after a failure adds stress to all of the working drives, because every area on every disk marked as "in use" must be read to rebuild the lost redundancy. If drives are close to failure, the stress of rebuilding the array can be enough to cause another drive to fail before the rebuild has finished, even more so if the server is still accessing the drives to provide data to clients, users, applications, etc. Even without complete loss of an additional drive during the rebuild, an unrecoverable read error (URE) is likely for large arrays and will typically lead to a failed rebuild.[37] Thus, it is during this rebuild of the "missing" drive that the entire RAID 5 array is at risk of catastrophic failure. The rebuild of an array on a busy and large system can take hours and sometimes days.[37] Therefore, it is not surprising that, when systems need to be highly available and highly reliable or fault tolerant, other levels, including RAID 6 or RAID 10, are chosen.[37]

With a RAID 6 array, using drives from multiple sources and manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The larger the drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead of RAID 5.[37] RAID 10 also minimises these problems.[39]

As of August 2012, Dell, Hitachi, Seagate, NetApp, EMC, HDS, SUN Fishworks and IBM had current advisories against the use of RAID 5 with high capacity drives and in large arrays.[51]