RAID Server Failure Overview

Types Of RAID Failures

Adroit Data Recovery Centre has handled a few hundred RAID data recovery cases over the years.

Many users have been led to believe that RAID cannot fail, because its fault tolerance and automatic rebuild functions are overemphasized. As a result, up-to-date backups have seldom been made by the time the data disaster unfolds.

RAID may be implemented in hardware or in software, the difference being the presence or absence of a RAID controller. Basically, a number of independent hard disks are combined to form a single, often larger, virtual volume. Depending on the RAID configuration, reads and writes may be spread across several drives simultaneously, in addition to the fault tolerance feature.
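As a rough illustration of how several disks can appear as one larger volume, the sketch below maps a logical block number of the virtual volume to a disk and a block on that disk. It assumes plain striping with a stripe unit of one block and no parity (RAID 0 style). The function name `locate_block` and the disk count are hypothetical choices for this example, and real controllers use their own stripe sizes and layouts.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block of the virtual volume to (disk index, block on that disk).

    Assumes plain striping, one block per stripe unit, no parity.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    disk = logical_block % num_disks            # consecutive blocks rotate across disks
    block_on_disk = logical_block // num_disks  # row (stripe) number on that disk
    return disk, block_on_disk


if __name__ == "__main__":
    # Show where the first eight logical blocks land on a hypothetical 4-disk array.
    for lb in range(8):
        print(lb, locate_block(lb, num_disks=4))
```

Because consecutive blocks land on different disks, a large read or write can keep several drives busy at once.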

Popular RAID manufacturers such as Mylex, Adaptec, Compaq, HP and IBM promote the idea of extended data availability and protection when a failed hard disk is detected. In a typical RAID 5 configuration, the RAID controller can rebuild the data volume onto a hot standby drive, or onto a replacement drive inserted by hot swap, without even powering off. The array is said to fail only when two disks fail simultaneously, and the probability of that is promoted as one in a million! As a result, one may tend to believe that RAID cannot fail.
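The rebuild described above relies on parity. As a minimal sketch, assuming the usual XOR parity scheme of RAID 5 and a single stripe held in memory, the following shows how one missing block can be recomputed from the surviving blocks. The function names `xor_blocks` and `rebuild_missing` and the sample data are hypothetical. A real controller also rotates parity across the disks and works on whole drives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recompute the single missing block of a stripe.

    `surviving` must hold every other block of the stripe, parity included.
    """
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
    parity = xor_blocks(data)            # parity block on a fourth disk
    # Simulate losing the second disk and rebuilding its block.
    rebuilt = rebuild_missing([data[0], data[2], parity])
    print(rebuilt == data[1])
```

The same arithmetic shows the limit of the scheme: with two blocks of a stripe missing, a single parity block no longer determines either of them.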

The reality: RAID fails

In reality, and to the surprise of most, RAID can fail and often does. Some typical scenarios are described below:

Scenario 1:
When one hard disk fails, very often there is no hot standby. As a result, the RAID array runs in degraded mode. While waiting for the replacement drive, which may take a day or two, the likelihood of a second drive failure disabling the RAID volume is very high. It is reasonable to assume that all the drives in the array come from the same batch and have been subjected to the same working stress. So if one disk fails, the others are likely to be near failure too, and one of them often does fail.
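To put rough numbers on the degraded window, the sketch below estimates the chance that at least one of the remaining drives fails before the replacement arrives. It assumes independent drives with a constant failure rate (an exponential model). The function name, the annual failure rate, the drive count, and the `stress_factor` multiplier standing in for same-batch wear are all hypothetical values chosen for illustration, not measurements.

```python
import math


def prob_second_failure(remaining_drives: int, annual_failure_rate: float,
                        window_days: float, stress_factor: float = 1.0) -> float:
    """Probability that at least one remaining drive fails within the window.

    Assumes independent drives with a constant hazard rate derived from the
    annual failure rate, multiplied by a hypothetical stress_factor.
    """
    if not 0.0 <= annual_failure_rate < 1.0:
        raise ValueError("annual_failure_rate must be in [0, 1)")
    if remaining_drives < 0 or window_days < 0 or stress_factor < 0:
        raise ValueError("arguments must be non-negative")
    # AFR = 1 - exp(-rate), so rate = -ln(1 - AFR) failures per drive-year.
    rate_per_year = -math.log(1.0 - annual_failure_rate) * stress_factor
    window_years = window_days / 365.0
    survive_one = math.exp(-rate_per_year * window_years)
    return 1.0 - survive_one ** remaining_drives


if __name__ == "__main__":
    # Hypothetical array: 4 surviving drives, 3% AFR, 2-day wait for a replacement.
    baseline = prob_second_failure(4, 0.03, 2)
    stressed = prob_second_failure(4, 0.03, 2, stress_factor=50.0)
    print(f"independent baseline: {baseline:.4%}")
    print(f"with elevated hazard: {stressed:.4%}")
```

The independent figure is only a baseline. The scenario's argument is that drives from the same batch under the same stress do not fail independently, which a single multiplier like `stress_factor` approximates only crudely.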

 

Scenario 2:
Most RAID servers have a single controller. The controller is therefore a single point of failure, and its failure is catastrophic.

 

Scenario 3:
Frequently, a power surge causes the controller or a number of disks to fail, resulting in total loss of data. A power surge has also been found to corrupt the RAID configuration settings held in the NVRAM of the controller card.

 

Scenario 4:
It is also common that, while a faulty drive is being replaced in an attempt to rebuild the RAID volume to a healthy state, the wrong procedure is followed, resulting in a wrong or partial rebuild, or in a complete system breakdown once the rebuild completes.

 

Scenario 5:
Finally, a RAID configuration with fault tolerance is at best intended to protect against physical failure, not against logical corruption such as system corruption, virus infection, or inadvertent deletion.

 

Types Of RAID failures

To summarize, RAID servers often fail as a result of the following situations and, frequently, a combination of them:

Malfunctioned controller
Missing RAID partition
Power surge
Data deletion or reformat
Virus attack
Inadvertent reconfiguration of RAID volume
RAID rebuild error or volume reconstruction problem
Multiple disk failures in an off-line state resulting in loss of the RAID volume
Wrong replacement of a good disk element belonging to a working RAID volume
Loss of RAID disk access after system or application upgrade
Loss of RAID configuration settings or system registry

If you have a RAID server failure, you may want to read the emergency RAID Rescue guide before sending the disks to us for data recovery.