WD ShareSpace RAID 5 recovery

hello guys,

Hardware/setup: WD ShareSpace, 4x 2 TB HDDs, RAID 5, latest firmware

I am confronted with the following situation:

stage 1:

via the web interface I discovered a “division by zero” failure. The RAID 5 was fully active, all drives good, and at that stage all files could still be accessed. My first thought was to shut down and restart (shame on me!). Result: the NAS came back up and running, but with a RAID failure.

stage 2:

I was not sure whether the ShareSpace had died, the config (of which I have a copy) had been corrupted, or one or more drives had failed. Since I was unsure what would happen on another restart (worst case: a new RAID gets built and destroys all data), I decided to take out the HDDs and put in a set of empty ones, to confirm that the ShareSpace itself is OK and that RAID 5 runs with the new set of drives. This is done - meaning the machine is fine, no failure. The logical conclusion is a HDD failure, and if only 1 drive out of 4 is causing the problem, that still means no data loss under RAID 5.

stage 3:

Before simply reinstalling the drives I looked for a method to recover the data first. This link was of great help to me, even though it was my first time using Linux:

http://community.wdc.com/t5/WD-ShareSpace/HOWTO-Sharespace-RAID-5-Data-Recovery/td-p/287736

Setup: 4 HDDs connected directly via SATA to a PC, booted with an Ubuntu 10.04 Live CD

Here is the code:

ubuntu@ubuntu:~$ sudo su -
root@ubuntu:~# fdisk -l

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x96563827

   Device Boot Start End Blocks Id System
/dev/sda1 1 26 208844+ fd Linux raid autodetect
/dev/sda2 27 156 1044225 fd Linux raid autodetect
/dev/sda3 157 176 160650 fd Linux raid autodetect
/dev/sda4 177 243201 1952098312+ fd Linux raid autodetect

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xadbcc383

   Device Boot Start End Blocks Id System
/dev/sdc1 1 26 208844+ fd Linux raid autodetect
/dev/sdc2 27 156 1044225 fd Linux raid autodetect
/dev/sdc3 157 176 160650 fd Linux raid autodetect
/dev/sdc4 177 243201 1952098312+ fd Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x47839a94

   Device Boot Start End Blocks Id System
/dev/sdb1 1 26 208844+ fd Linux raid autodetect
/dev/sdb2 27 156 1044225 fd Linux raid autodetect
/dev/sdb3 157 176 160650 fd Linux raid autodetect
/dev/sdb4 177 243201 1952098312+ fd Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x6b42ef2f

   Device Boot Start End Blocks Id System
/dev/sdd1 1 26 208844+ fd Linux raid autodetect
Partition 1 does not start on physical sector boundary.
/dev/sdd2 27 156 1044225 fd Linux raid autodetect
Partition 2 does not start on physical sector boundary.
/dev/sdd3 157 176 160650 fd Linux raid autodetect
Partition 3 does not start on physical sector boundary.
/dev/sdd4 177 243201 1952098312+ fd Linux raid autodetect

root@ubuntu:~# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.

root@ubuntu:~# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 --force
mdadm: forcing event count in /dev/sdb4(1) from 5107212 upto 5107220
mdadm: forcing event count in /dev/sda4(3) from 5107212 upto 5107220
mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/sdb4
mdadm: clearing FAULTY flag for device 0 in /dev/md0 for /dev/sda4
mdadm: /dev/md0 has been started with 3 drives (out of 4) and 1 spare.

root@ubuntu:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sdd4[0] sdc4[4] sda4[3] sdb4[1]
      5855125824 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [>....................] recovery = 1.9% (38922880/1951708608) finish=14149.0min speed=2253K/sec
      
unused devices: <none>

root@ubuntu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Fri Feb 5 07:34:56 2010
     Raid Level : raid5
     Array Size : 5855125824 (5583.88 GiB 5995.65 GB)
  Used Dev Size : 1951708608 (1861.29 GiB 1998.55 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Sep 10 10:00:33 2012
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 2% complete

           UUID : 2550adaa:a2816e27:fd958af3:56e7459b
         Events : 0.5107222

    Number Major Minor RaidDevice State
       0 8 52 0 active sync /dev/sdd4
       1 8 20 1 active sync /dev/sdb4
       4 8 36 2 spare rebuilding /dev/sdc4
       3 8 4 3 active sync /dev/sda4
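
Side note: for anyone wanting to double-check the members before forcing an assembly, mdadm can print each member's superblock so the event counters can be compared. This is not part of my session above, just a sketch using the device names from my setup:

    # Print the md superblock of each RAID member (read-only, changes nothing).
    # "Update Time", "State" and the "Events" counter show which members fell behind.
    for dev in /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4; do
        echo "== $dev =="
        mdadm --examine "$dev" | grep -E 'Update Time|State|Events'
    done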

What I have got so far (to my understanding):

  • the data is not (yet) lost

  • a recovery seems to be in progress, finishing in roughly 14,000 min ??! = about 10 days?

My 2 questions:

a. Any idea how to speed up the recovery? Maybe replace the failing drive with a blank new one and repeat the full procedure as shown above? (See also the sketch after question b.)

b. What about the idea of killing the recovery process, reassembling the ShareSpace with the 4 HDDs in the right order, and booting the whole thing?
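
Regarding (a): as far as I understand, the md rebuild speed is throttled by two kernel settings under /proc. I have not tried changing them on these disks; this is only a sketch, and the 50000 is an arbitrary example value in KB/s, not a recommendation:

    # show the current rebuild throttle (KB/s per device)
    cat /proc/sys/dev/raid/speed_limit_min
    cat /proc/sys/dev/raid/speed_limit_max
    # raise the guaranteed minimum rebuild speed; 50000 is just an example value
    echo 50000 > /proc/sys/dev/raid/speed_limit_min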

Any other suggestions are highly appreciated. Thanks in advance.

If the recovery has already started, leave it like that. It will take some time, but you will get your files.

Thank you, ragdexx.

I just checked the progress with cat /proc/mdstat. The result worries me a little bit: 10% after more than 20 hours, which would give another 9 days. Do you have any experience with this to keep me patient?
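
A small aside: to avoid retyping the command, the status check can also be left running; just a sketch, the interval is arbitrary:

    # re-display /proc/mdstat every 60 seconds
    watch -n 60 cat /proc/mdstat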

OK. After 4 days the rebuild process finished.

cat /proc/mdstat:

    Personalities : [raid6] [raid5] [raid4]
    …
    … [_U_U]

2 disks out. I have the feeling that the array cannot be rebuilt. Meanwhile I am starting to accept the thought that the files are lost. I am thinking about replacing the faulty disk d with a brand-new one, mounting the 4 disks in the original order back into the ShareSpace, and praying.
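
If it turns out that only one member is really dead, my understanding is that the usual Linux-side route would be to copy the partition layout from a healthy disk onto the replacement and then add the new data partition back to the degraded array. I have not done this; it is only a sketch, assuming the new disk shows up as /dev/sdd and /dev/sda is a good member:

    # copy the partition table from a good member (sda) onto the new disk (sdd)
    # -- this overwrites sdd's partition table, double-check the device names first
    sfdisk -d /dev/sda | sfdisk /dev/sdd
    # add the freshly created data partition back into the degraded array
    mdadm --manage /dev/md0 --add /dev/sdd4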