How long should a RAID 5 rebuild take?

OK, so I have an EX4 with 4 x 3TB RED disks in it.

All used and configured in a RAID 5 setup.

One disk failed, so I was able to send it away for replacement.  It took about 2 weeks which wasn’t too bad because I could continue to use the NAS without it.  I had access to all my data, no problems… well done RAID 5.

So, now I have the new replacement disk returned to me.

I insert it in the bay (bay 3) and off it goes rebuilding the RAID volume.

There's a message on the LCD saying the system is busy, and a message on the dashboard saying the system is busy and to wait until the LED turns solid blue before inserting another drive.

It stays like this for 2 days… and then the LCD just shows the name of my NAS, but the blue LED is still flickering.  The dashboard still tells me to wait for the LED to stop flickering before adding another disk.

I don’t want to add another disk.  I just want to use my NAS and access my data.  I don’t really understand why I can use it without the drive in there, but as soon as I insert a new drive and it starts to rebuild the volume, I cannot access any of my data.  I thought the idea of hot-swap drives was that you could swap them over without turning the system off and keep working.

Any idea how long it will take before I can access my data?

thanks,

lew66.

In my limited experience, 2 days does seem excessive. I am just going through a similar experience, but mine didn’t stall like yours appears to have done. I had a three-disk RAID 5 system running successfully and then added a fourth and final disk (I did power off first, but I don’t know if this was necessary). When I powered back up, it did ‘chunter’ away for a while but then settled down.

Even though I’d inserted a new disk, the capacity figures didn’t change. On advice from this forum, I entered the RAID 5 option in ‘Storage’ and chose ‘expand’. It reported that the process would take about 20 hours, which I understand is a typical length of time.
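As a rough sanity check on that 20-hour figure: a rebuild or expand has to write every sector of the new disk once, so the minimum time is roughly disk size divided by sustained write speed. A minimal sketch, assuming a 3 TB disk and a sustained 40 MB/s (the speed is my assumption, not a measured EX4 figure, and `rebuild_hours` is just an illustrative name):

```shell
# Estimate rebuild time in hours: disk size (bytes) / rebuild speed (MB/s).
# Both inputs are assumptions; real speed drops while the NAS is in use.
rebuild_hours() {
    awk -v size="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", size / (rate * 1000000) / 3600 }'
}

rebuild_hours 3000000000000 40
```

With those numbers it prints 20.8, so a rebuild measured in many days rather than hours suggests something has stalled rather than that the job is simply slow.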

So sadly, I can’t offer advice but only relate my experience. I used bays 1,2 and 3 originally and added to the fourth.
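As for why the capacity only changes after the expand step: RAID 5 gives up one disk's worth of space to parity, so usable space is roughly (number of disks - 1) x disk size. A quick sketch of that arithmetic, assuming equal 3 TB disks as in the original post (`raid5_usable_tb` is just an illustrative name):

```shell
# Usable RAID 5 capacity in TB for N equal disks: (N - 1) * size.
raid5_usable_tb() {
    disks=$1; size_tb=$2
    echo $(( (disks - 1) * size_tb ))
}

raid5_usable_tb 3 3   # three-disk array
raid5_usable_tb 4 3   # four-disk array
```

These are raw figures only; the NAS will report somewhat less once filesystem overhead is taken off. A four-disk array that is merely degraded still reports the four-disk size, because the missing member's data is reconstructed from parity.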

Thanks for your reply…bob, but we are into day 4 now.

I haven’t been able to get into the interface, because every time I sign on to it, it just tells me that the system is busy and to wait until the system light is steady blue before adding another disk.

Reading some other posts, it appears that it might take 6 days… but again, I don’t see why I cannot access my data while it’s rebuilding if I could when the faulty disk wasn’t even in there.

If anyone has any advice, please let me know.  I don’t want to turn it off now in case it’s in the middle of something and I lose data.

I’ll keep reporting back as the days tick by to let you all know.

cheers…

OK… now into day 5.

System light still flickering but LCD still shows no error only displaying my NAS name.

Trying to get into the interface still comes up with the same “please wait…” message, which means I cannot get into the system to check anything or change any settings.

Trying to log a call with WD but am now getting a “an error was encountered during a knowledge base search” message when I try to create a service call.

Cannot access any of my data still and cannot log a service call… HELP!

Did you take a backup prior to the rebuild? If not, then you have nothing to lose from waiting some more time, but the time is clearly excessive.  I would try to SSH to the unit and see what it is doing with the top command.

If you have taken a backup I’d nuke it and start again.

Hi skiwi, thanks for replying.

Unfortunately, I didn’t take a backup before the rebuild… I really didn’t think I needed one.  I was just inserting a new disk into a hot swap bay…I will wait and keep everyone posted.

I’ll try to ssh to the unit and see what it’s doing

here’s what it says … hope it makes sense to someone out there…

The "Disks: written" figure (bold and underlined) keeps increasing… maybe I just need to wait until it gets to 15487799???

Processes: 192 total, 2 running, 12 stuck, 178 sleeping, 1070 threads 15:51:22
Load Avg: 1.78, 1.80, 1.87 CPU usage: 14.73% user, 12.56% sys, 72.70% idle
SharedLibs: 71M resident, 0B data, 5140K linkedit.
MemRegions: 62960 total, 846M resident, 21M private, 180M shared.
PhysMem: 4069M used (1485M wired), 25M unused.
VM: 469G vsize, 1353M framework vsize, 122476568(0) swapins, 123459597(0) swapou
Networks: packets: 121084380/110G in, 122536214/54G out.
Disks: 15487799/683G read, 10457297/618G written.

PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP
83214 top 2.6 00:03.31 1/1 0 19 1984K 0B 216K 83213
83213 login 0.0 00:00.04 2 0 27 292K 0B 800K 83213
83211 CVMCompiler 0.0 00:00.38 2 1 31 12M 0B 4732K 83211
83207 mdworker 0.0 00:00.05 4 0 53 1384K 0B 856K 83207
83204 coresymbolic 0.0 00:00.02 2 1 21 508K 0B 376K 83204
83203 spindump_age 0.0 00:00.01 2 1 31 24K 0B 824K 83203
83202 spindump 0.0 00:00.75 2 1 43 6192K 0B 14M 83202
83198 amfid 0.0 00:00.08 2 1 26 8192B 0B 2100K 83198
83186 mdworker 0.0 00:00.21 4 0 52 24K 0B 5436K 83186
83181 mdworker 0.0 00:00.27 5 1 54 660K 0B 4712K 83181
83177 mdworker 0.0 00:00.07 4 0 53 52K 0B 2116K 83177
83176 com.apple.GS 0.0 00:00.01 3 2 22 8192B 0B 792K 83176
83172 deleted 0.0 00:00.02 2 1 33 8192B 0B 1200K 83172
83147 bash 0.0 00:00.04 1 0 15 8192B 0B 640K 83147

Guys… it looks like I was actually seeing TOP on my mac not the EX4… sorry for the confusion.
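For anyone else who trips over this, a quick way to avoid the mix-up is to check which machine the shell is actually on before reading top. A small sketch (the function name `which_host` is made up for illustration, and it assumes the NAS firmware is Linux-based, which the mdadm commands elsewhere in this thread suggest):

```shell
# Classify a kernel name as reported by `uname -s`.
which_host() {
    case "$1" in
        Darwin) echo "macOS: this is the Mac, not the NAS" ;;
        Linux)  echo "Linux: probably the NAS" ;;
        *)      echo "unknown: $1" ;;
    esac
}

which_host "$(uname -s)"
```

Process names such as mdworker and com.apple.* in the top listing are another giveaway that the output came from the Mac.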

I get a connection refused on the EX4, so not sure if it’s just busy rebuilding or if it’s dead in the water and needs a kick in the guts.

All the drive lights are solid blue, so I would think if it is rebuilding a RAID volume they would all be flashing like crazy.

The only LED flashing is the system one (top left).

Hi All,

Here’s an update on day 6.

Last night I figured it wasn’t doing anything and I couldn’t access my data anyway, so I powered down the unit by holding the power button down.

The message came up that the system was shutting down… I waited and it did.

I left it for a few minutes to let everyone catch their breath and then I turned it on.

It went through its normal checks and fired up all 4 drives.

The system light was now red and flashing and the “please wait” message was gone from the dashboard.

I could finally get into the dashboard, and it said the volume was degraded.  

I figure that’s because it is still rebuilding the RAID.

Anyway, I tried to access the data and I could, so I left it to rebuild.

This morning, the system light is still red and flashing, it still says volume degraded but all the drives appear to be working and I can access my data.

Let’s see how long this one takes… I’ll keep you all posted.

lew66.

OK, I think I’ve found the problem…

shellingye posted a while ago with what looks like the same issue…

It seems like if an auto rebuild doesn’t finish because the NAS goes to sleep for example, it will never finish.

Or to quote…

"I think WDEx4 auto rebuilding may have some bug, when it rebuilding failed, it will not try to fix,

and because the disk have a partion on it already,  so it will never try to rebuilding again.

And it will not show “Manually rebuild“ option also."

I have the same issue… manual rebuild option is not available, the size of the RAID volume doesn’t include the 4th drive, and the system still reports as volume degraded - see dashboard, but is not doing anything to fix it.

Shellingye fixed it by running some manual commands, but I’m not a linux guru so I don’t know what I need to do.

I’ll paste their solution below and if someone can please help me with the commands for my scenario to manually rebuild the RAID volume with the new disk in bay 3, I’d really appreciate it…

If I’m at the ssh prompt, what exactly do I type?

QUOTE…

I have solved this problem by my self

I use ssh login the system , and rebuild the raid by my self, the command I used:

mdadm --remove  /dev/md0 /dev/sdd1 (my broken disk is bay4, remove it from md0 array, else it will report disk is busy)

dd if=/dev/zero of=/dev/sdb2 (clean the disk, I’m not find fdisk command on WDEx4)

mdadm --add /dev/md1 /dev/sdd2 (add the disk to raid array md1, rebuilding will start)

Now that you can access the unit TAKE A BACKUP.

Once that is done, you can get down and dirty with SSH, but the potential for failure is there so TAKE A BACKUP.

WRT the commands posted, in short:

  1. the first “mdadm” removes the “ssd1” disk from the software RAID array (md0)
  2. The dd command copies /dev/zero to /dev/sdb2.  This is a kludge that uses a special Unix convention: /dev/zero is basically an endless bucket of nulls.  In other words, it clears /dev/sdb2.
  3. the 3rd command adds /sdb2 to the array
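To see what step 2 does without risking a real disk, the same dd idea can be tried on a scratch file. This is only a sketch: the file comes from `mktemp`, and the `bs`/`count` limits are my additions (the quoted command has neither, so it keeps writing until the target partition is full). Pointed at the wrong /dev/sdX, dd destroys data without asking.

```shell
# Practise on a throwaway file, never on a device you are unsure of.
scratch=$(mktemp)
printf 'old partition contents' > "$scratch"

# Overwrite the file with 4 blocks of 1024 zero bytes read from /dev/zero.
dd if=/dev/zero of="$scratch" bs=1024 count=4 2>/dev/null

# Count all bytes, then count the bytes that are not NUL.
size=$(wc -c < "$scratch" | tr -d ' ')
nonzero=$(tr -d '\000' < "$scratch" | wc -c | tr -d ' ')
echo "size=$size nonzero=$nonzero"
rm -f "$scratch"
```

It reports size=4096 nonzero=0: the old contents are gone and only zeros remain, which is why mdadm then treats the partition as blank.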

SSH to the unit and type in “mdadm --detail /dev/md0” and post what is reported back to you.

Also, “watch cat /proc/mdstat” will show you what is happening in the rebuild process.
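While a rebuild is running, /proc/mdstat gains a progress line with a percentage and an estimated finish time. If watch isn't available on the box, a small filter can pull that line out. A sketch, using a made-up sample line in the usual mdstat layout (the numbers are illustrative, not from this NAS, and `rebuild_progress` is a hypothetical helper name):

```shell
# Print "<percent> <finish estimate>" from mdstat text on stdin,
# or "no rebuild in progress" when there is no recovery/resync line.
rebuild_progress() {
    awk '
        /recovery =|resync =/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)
            }
            print pct, eta
            found = 1
        }
        END { if (!found) print "no rebuild in progress" }
    '
}

# On the NAS itself: rebuild_progress < /proc/mdstat
echo '      [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec' | rebuild_progress
```

For the sample line it prints "12.6% 127.5min". If it keeps saying "no rebuild in progress" while the array is still degraded, nothing is being rebuilt.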


@skiwi, you are awesome…

Ok, so I take it I should take a backup… only kidding, backup running now, so it might be awhile before I can actually do anything.

Couple of quickies since you’ve been so kind as to educate me…

If “ssd1” is the disk that you remove from the RAID array (md0), what is “sdb2” that you are adding?

I ran the command and this is what I got…

/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Aug 11 18:34:06 2015
     Raid Level : raid1
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
    Device Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Wed Aug 12 20:36:13 2015
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
           UUID : dc4ea6c7:50945712:39bd1eec:51f26edc
         Events : 0.5253

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       49        2      active sync   /dev/sdd1
       3       0        0        3      removed

thanks heaps…

Why does everything have to be so hard?

Now that I have access to my data, I thought I’d follow @skiwi’s advice and take a backup.

so I plug in a USB drive and set up a couple of backup jobs… one works (eventually), the other fails.  OK I think maybe there is insufficient room on the USB disk.  I disconnect it and plug it into my mac to start deleting old files.

Now the USB drive is read only… what the???  How can running a backup job from the EX4 make this external drive read only???

Tried everything I could and couldn’t delete anything, so finally I plug it back into the EX4 and from Finder, I can get to the USB disk and I start deleting… it does a count of files to delete from each folder, and then just sits there… it is deleting but VERY slowly… about 50GB deleted in about 2 hours!!!

Any idea how I might be able to speed this up?  I’m happy to format the USB disk and start the backups again, but when I plug it into my mac, that option isn’t there either.

… or I can just wait until it finishes sometime in 2017… any suggestions?

And with regard to the RAID rebuild issue (the original issue), WD Support have said that it seems like an issue and that I can use ssh to resolve it, but they cannot help me with how to do it, because ssh is out of their scope.

so in short… I had a disk failure, and the disk was replaced, but when I re-inserted it, something went wrong and WD said I can use ssh to fix it but they can’t tell me how… that about sums it up!

Any help from you gurus would be excellent.

thanks heaps.

You are correct, this sort of thing is harder than it should be.

The USB disk shouldn’t mount on the Mac read-only.  If it does, go into Finder, select the drive and then “Get Info”, and you will see the permissions section.  Click the “lock” icon and unlock your permissions (you will need to enter your admin password), and then the table will allow you to modify existing permissions or to set up new user permissions.

Make sure you do this because direct access to the USB drive is FAR FASTER than the method you are using.

Also, by default, the NAS will scan the USB drive for media (to be clear “Media Server” is turned on by default).  This makes Backups VERY SLOW.  To stop this, make sure that the USB Drive Share (access via the NAS GUI) has media serving turned off.  To be sure that that has worked, access the Twonky GUI and make sure that the USB Drive is not being scanned by Twonky.

thanks mate…

assuming I’m able to backup my data so I feel safe to get into ssh, does the info I posted earlier mean anything to you?

It looks like the drive in question would be sdc1, in bay 3.

But to my other question on your ssh commands, if “ssd1” is the disk that you remove from the RAID array (md0), what is the “sdb2” that you are adding?

It looks like I have to run 3 separate commands to get this disk back into the RAID array…

One to remove the disk

One to clear the disk

And one to add it to the array.

If I’m not stretching the friendship, can you please tell me what commands I should use to achieve these things, based on the results of the “mdadm --detail /dev/md0” command I posted?

thanks again

Yes, the names are the names of the individual disks in your RAID set.  Can you post what the “watch cat /proc/mdstat” command states?
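To unpack the naming a little: on Linux each whole disk is /dev/sda, /dev/sdb and so on, and the trailing digit is a partition on that disk. Going by the posts above (bay 3 as sdc, bay 4 as sdd), the bays appear to map to letters in order; the md0 members in your --detail output all end in 1, and the quoted fix adds a partition ending in 2 to md1. A sketch of that mapping, where the bay order is an assumption drawn from this thread rather than something the kernel guarantees, and `bay_partition` is an illustrative name:

```shell
# Map a bay number (1-4) and partition number to a device name,
# ASSUMING bay 1 = sda, bay 2 = sdb, bay 3 = sdc, bay 4 = sdd.
bay_partition() {
    bay=$1; part=$2
    case "$bay" in
        1) letter=a ;;
        2) letter=b ;;
        3) letter=c ;;
        4) letter=d ;;
        *) echo "bay must be 1-4" >&2; return 1 ;;
    esac
    echo "/dev/sd${letter}${part}"
}

bay_partition 3 2
```

For bay 3, partition 2 it prints /dev/sdc2. Always confirm against the actual mdadm and mdstat output before acting on a guessed name.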

here’s the results…

Every 2s: cat /proc/mdstat                                  2015-08-13 19:50:47

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[0] sdd2[3] sdb2[1]
      8778211776 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      bitmap: 2/2 pages [64KB], 131072KB chunk

md0 : active raid1 sdd1[2] sdb1[1] sda1[0]
      2097088 blocks [4/3] [UUU_]
      bitmap: 16/16 pages [512KB], 8KB chunk

unused devices:

thanks again for helping…

Ok, thanks.  According to the output, your 3rd drive is not in the RAID 5 array, so it has been effectively removed.

Once your backup has completed, type
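For anyone reading along, the giveaway is the status at the end of each array's "blocks" line: [4/3] means 4 slots with 3 active, and in [UU_U] each U is a working member while _ is a missing one. A sketch that lists the missing slots per array from mdstat text (`missing_slots` is an illustrative helper, and slots are counted from 0):

```shell
# Read mdstat text on stdin and report each "_" in the [UU_U]-style map.
missing_slots() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[U_]+\]/) {
            map = substr($0, RSTART + 1, RLENGTH - 2)
            for (i = 1; i <= length(map); i++)
                if (substr(map, i, 1) == "_")
                    print name ": slot " (i - 1) " missing"
        }
    '
}

# On the NAS itself: missing_slots < /proc/mdstat
missing_slots <<'EOF'
md1 : active raid5 sda2[0] sdd2[3] sdb2[1]
      8778211776 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
md0 : active raid1 sdd1[2] sdb1[1] sda1[0]
      2097088 blocks [4/3] [UUU_]
EOF
```

The two arrays disagree on which slot number is empty, but in both cases the member lists contain only sda, sdb and sdd partitions, so it is the sdc disk that is absent from each.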

“mdadm --re-add /dev/md1 /dev/sdc2”

if that works, fine, but if you get a message “resource busy” (or similar), then type

“mdadm --remove /dev/md1 /dev/sdc2”

“mdadm --add /dev/md1 /dev/sdc2”

this will attempt to add the 3rd drive (/dev/sdc2) to the existing array.

tell me if that works - a “cat /proc/mdstat” should show the array rebuilding.
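If it helps, the sequence above can be wrapped in a small function so the fallback only runs when the re-add is refused. A sketch only: `readd_member` and the `MDADM` override variable are my own names (the override is there so the logic can be dry-run with a stub instead of the real tool), and it treats any non-zero exit from --re-add as the "busy" case.

```shell
# Command to run; override MDADM with a stub to rehearse the logic.
MDADM=${MDADM:-mdadm}

# Try --re-add first; if that is refused, fall back to --remove then --add.
readd_member() {
    array=$1; part=$2
    if "$MDADM" --re-add "$array" "$part"; then
        echo "re-add accepted for $part"
    else
        "$MDADM" --remove "$array" "$part"
        "$MDADM" --add "$array" "$part"
    fi
}

# On the NAS, only after the backup has finished:
#   readd_member /dev/md1 /dev/sdc2
```

Nothing touches the array until the last line is uncommented and run by hand, which leaves time to double-check the device names first.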

Thanks mate, I’ll give it a go as soon as I am confident I’ve got my data safely tucked away… there’s a bit of it, and the reason I got this NAS was so I didn’t have to have all the smaller devices… now I’m having to backup to the smaller devices scattered around the house.

Question, do I need to do the “clean” step?

Awesomeness abounds…

You shouldn’t need to, but it would do no harm.