RAID rebuild taking days/weeks/months/not working?

I have an EX-4 configured with four 4TB disks and two RAID 1 volumes.  After a recent power failure, the NAS has been rebuilding volume one for 5 days now (volume two is not mounted).  Volume one was almost full with only 88 GB remaining.  When I check on the status, the build time is stuck on 27:39:36.  How long should I expect to wait to rebuild this volume?  Is there something I can check to determine if any real progress is being made?  I think I can access my data on the volume while it is rebuilding, but I need to determine if this unit is really protecting my data, which is why I invested so much money in it in the first place.  Unfortunately, the unit was not plugged into a UPS when power was lost, but I’d rather know now if this NAS is just a clunker since my data is important to me.  From all the issues and comments on this forum, it is looking like the WD EX-4 is not a good choice if your data and time are important.

RAID status rebuild is at 16%.  When I refresh the page, there is a number at the end of the status that does change (UI does not display the entire number/message).  At this rate, it will take a month to complete.

Log in via SSH and check

cat /proc/mdstat

That will give you more details.
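If it helps, here is a rough sketch for sampling the rebuild line every so often instead of re-running the command by hand. The function name `watch_mdstat` and the `MDSTAT_FILE` override are made up for this example, and it assumes the NAS shell provides `grep -E`, `date` and `sleep`:

```shell
# Print the rebuild progress line(s) from mdstat a fixed number of times.
# MDSTAT_FILE is a hypothetical override for trying this on a saved copy;
# left unset, the function reads /proc/mdstat.
watch_mdstat() {
    interval="$1"   # seconds between samples
    samples="$2"    # how many samples to take
    file="${MDSTAT_FILE:-/proc/mdstat}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        date
        # md reports rebuild activity as resync, recovery or reshape
        grep -E 'resync|recovery|reshape' "$file" || echo "no rebuild line found"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Example: ten samples, one minute apart
# watch_mdstat 60 10
```

If the number in parentheses keeps growing between samples, the rebuild is making progress, however slowly.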

Thanks for the suggestion.  Here is what I see.

Volume1Rebuild.png

Is this what I should expect when rebuilding RAID 1?  Finish time is 52260 minutes.  So it will take over a month to rebuild.  I hope I can find some way to speed this up.
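For scale, 52260 minutes is a bit over 36 days. A small sketch that pulls the `finish=` figure out of mdstat-style text and converts it, assuming `sed` and `awk` are available on the box (the name `mdstat_eta` is just for illustration):

```shell
# Read mdstat-style text on stdin, extract the finish= estimate (minutes)
# and print it in hours and days.
mdstat_eta() {
    sed -n 's/.*finish=\([0-9.]*\)min.*/\1/p' |
    awk '{ printf "%.1f hours (%.1f days)\n", $1 / 60, $1 / 1440 }'
}

# On the NAS:
# mdstat_eta < /proc/mdstat
```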

Hmmm.

Is your NAS “busy” while this is going on?  

The last time I rebuilt mine (4-disk RAID5), it only took about 24-36 hours, but it was pretty much idle during the rebuild.

I’ve got news for you, John1000.  Recently I went through the same problem, being stuck at the rebuilding status for countless hours, and, well, I kinda got tired of waiting for it and decided to speed things up.

Now I don’t know why WD would cap the Rebuild speed to 1031K/sec but if you wanna speed things up all you have to do is log into your device and type in:

sysctl -w dev.raid.speed_limit_min=100000

after that go ahead and type

cat /proc/mdstat

 

and you’ll see some BIG difference in speed.

~ # cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md1 : active raid1 sdc2[0] sdd2[1]
      3902822264 blocks super 1.0 [2/2] [UU]
      [====>................] resync = 22.2% (869768832/3902822264) finish=792.1min speed=63812K/sec
      
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2097088 blocks [4/4] [UUUU]
      
unused devices: <none>

Just make sure not to do anything else with the device and disable all downloading or sync services while you do this process as this will degrade the speed for the rebuild. 

WD PLEASE RELEASE THE SDK FOR THIS DEVICE IT WOULD HELP UNDERSTAND THIS DEVICE WAY MUCH BETTER.

Thanks, James_TNHD, for the awesome suggestion.  It appears that things are running much faster at the moment and that my rebuild will finish before the 4th of July; in fact, maybe tomorrow.  I’ll post back, hopefully soon, when it completes to report the results.  Thanks again.

IncreasedPerf.png

James_TNHD wrote:

Now I don’t know why WD would cap the Rebuild speed to 1031K/sec…

WD didn’t really do the capping.   Capping is setting the MAXIMUM, not the minimum.

The default  dev.raid.speed_limit_min  is 1000, which is what the WD is set for.

The md array is tending toward the minimum instead of the maximum for other reasons, probably related to I/O queue depth or something similar.  (WD actually increases the maximum from the default of 100,000 to 200,000.)
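To see what both limits are currently set to before changing anything, something like the following should work. It assumes the usual mapping of `dev.raid.*` sysctls to files under `/proc/sys/dev/raid`; the `RAID_DIR` override and the function name are invented here so the sketch can be tried against a scratch directory:

```shell
# Print the current md resync speed limits (same K/sec units as mdstat).
# RAID_DIR is a hypothetical override; on the NAS leave it unset.
show_raid_limits() {
    dir="${RAID_DIR:-/proc/sys/dev/raid}"
    for name in speed_limit_min speed_limit_max; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name") K/sec"
        else
            echo "$name: not readable under $dir"
        fi
    done
}

show_raid_limits
```

Running `sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max` should report the same two values.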

Without adjusting the parameter, I just manually forced a RAID5 resync on my EX4:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sdd2[3] sda2[0] sdc2[2] sdb2[1]
      8778211776 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [>....................] recovery = 0.0% (602496/2926070592) finish=890.1min speed=54772K/sec

md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2097088 blocks [4/4] [UUUU]

It’s chugging along quickly at 54.7 megabytes per second; eta ~15 hours.
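As a sanity check on the `finish=` figure, the estimate appears to be just the remaining blocks divided by the current speed. A sketch of that arithmetic, assuming mdstat counts in 1K blocks and reports speed in K/sec (`eta_minutes` is a made-up helper name):

```shell
# Estimate minutes remaining from the three numbers on the progress line.
eta_minutes() {
    # $1 = blocks done, $2 = total blocks, $3 = speed in K/sec
    awk -v d="$1" -v t="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}

# Numbers from the recovery line above
eta_minutes 602496 2926070592 54772
```

This comes out within a fraction of a minute of the `finish=890.1min` that mdstat printed, so at a steady speed the estimate is simple division, and it will swing a lot whenever the speed does.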

The rebuild finished the next day after configuring things differently.  Strange that setting the minimum limit higher resolved the issue and got things accomplished in less than a day, compared to over a month if I had done nothing.  As pointed out, the maximum setting is usually the culprit, not the minimum.  Thanks for the insight as to how to resolve this issue.

WD should be ashamed that they released a product which is not capable of the bare minimum and relies on the user community to work out all the issues.  Is that where we are at now?  Release a product that is not capable of providing the consumer the minimum functionality that is advertised and ostensibly justifies the price that we pay, while providing little or no support?  Thankfully, we have a forum to work things out amongst ourselves, but this is not the way it used to be or could have been in the past.  Companies used to have their reputation and survival on the line, but now everything is out the door.  Hire cheap labor and let the chips fall where they may and rely on knowledgeable consumers to help others out.

And WD is a big company that has decided to screw the consumer by releasing a half-a** product.  And if I ever speak frankly regarding why we find ourselves in this situation, my posts are removed.  After many decades in the industry where I have insight into what is happening, my opinion (facts) are removed.  Excuse me, but I have been working and dealing with these types of issues for over 3 decades, I don’t have an axe to grind, and I just try to report the facts.

Thanks for all the community contributions to this forum.  As for WD, there are a few words that I can’t use to describe what they have decided is “good enough” for us paying consumers.  And as I say this, I acknowledge that there are many (most) WD employees who are awesome.  But somewhere along the line, f-sticks in decision-making positions have decided to take the company in the wrong direction.  All we can hope for is natural selection to weed out companies that go awry, and WD seems to be going in the wrong direction at this point.

I had a similar issue last week.  I had my EX4 configured with 2 x 3TB drives in a RAID-1 mirrored volume.  I backed everything up and added a 3rd 3TB drive.  I went through the wizard to change the RAID mode from RAID-1 to RAID-5 and migrate the data.

After about 5 to 10 minutes, the progress indicator went from 0 to 5%.  It didn’t change after an hour, so I created a support ticket with WD.  After failing to receive a response from WD support after a day, I called.  (After 25 hours, the progress had gone from 5% to 6% and the progress indicator was estimating more than 30 days.)  WD support informed me that this was normal, but that the process would not take 30 days.  They estimated 3 days.  Amazingly, they were right.  After 73 hours, the process completed.  The next step of resizing the volume from 3 TB to 6TB took another couple of hours.

The information above regarding adjusting the throttling configuration is brilliant.  I wish I’d seen it prior to my experience.

How do you log into the device? I logged into the dashboard but don’t know where to enter that text. Please help!

I do apologize, but this is my first time using the board,

Can you assist me with logging into my device to change the settings? I logged into my dashboard and don’t know where to go. Please advise.

Thanks

Go to users tab → admin

and you should be able to change all the things you need there.

I have a default RAID5 made up of four 2 TB disks (2+2+2+2).

It is around 56% full.

As a test, I removed and reinserted disk 3 within a matter of seconds, and now it is automatically rebuilding volume 1.

According to /proc/mdstat:

md1 : active raid5 sde2[2] sda2[0] sdd2[3] sdb2[1]
      5847956928 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [>…]  recovery =  4.7% (91982336/1949318976) finish=945.3min speed=32743K/sec

Can I access the data during the rebuild?

It seems I cannot see the smb or aftp shares in the meantime :(

For example, from dashboard → shares → public, everything is greyed out except “oplocks”

and the only user is “admin”.

After the rebuild finishes, everything is back to normal.

Isn’t it possible to keep access to the samba shares during the RAID5 volume rebuild?

Can I expect to have full data access after my WD EX4 rebuilds following an accidental power failure?

James_TNHD,

Thank you for your excellent advice. That is going to speed things up considerably. Once the sync is complete, do you recommend reducing the speed limit back to the original setting? If yes, what parameter would you use? If no, do you think the higher speed raid syncing will impact user upload/download performance?

Thanks for the assistance.
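On putting the setting back: going by the earlier post, the original minimum is 1000, so `sysctl -w dev.raid.speed_limit_min=1000` should restore it. As far as I know, a value set with `sysctl -w` is a runtime change and would not normally survive a reboot anyway. A sketch that remembers the old value for you, assuming the sysctl is backed by `/proc/sys/dev/raid/speed_limit_min` (the function names and the `RAID_DIR` override are invented for this example):

```shell
# Temporarily raise the md minimum resync speed, then restore the old value.
# RAID_DIR is a hypothetical override used only so the sketch can be tried
# against a scratch directory; on the NAS leave it unset.
raid_min_file() {
    echo "${RAID_DIR:-/proc/sys/dev/raid}/speed_limit_min"
}

boost_min_speed() {
    # $1 = new minimum in K/sec; prints the previous value so it can be saved
    old=$(cat "$(raid_min_file)") || return 1
    echo "$1" > "$(raid_min_file)" || return 1
    echo "$old"
}

restore_min_speed() {
    # $1 = value printed earlier by boost_min_speed
    echo "$1" > "$(raid_min_file)"
}

# Example, run as root on the NAS:
# old=$(boost_min_speed 100000)
# ... wait for the rebuild to finish ...
# restore_min_speed "$old"
```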

Quote? I don’t need no stinkin quote…

James…

Thanks that command (sysctl -w dev.raid.speed_limit_min=100000) totally helped me out

My rebuild was at 59407 minutes then dropped to 13859 !!!

Then few mins later dropped to 1396 (under a day)  WOW

KUDOS!!!

Hi.

Can anyone please help me?

I have read where many EX4 users have experienced long rebuild of EX4.

Just over a week ago, I purchased a new EX4 24tb and set it to RAID 10.

Copied 8tb of files to it and all was going well. While still copying the last 2 tb, as fate would have it… a sudden power failure in the area shut down the EX4 …mid copying.

Upon startup, it began a rebuild of volume 1.

7 days have now passed, and the rebuild is still showing 9999 minutes and has completed only 3% so far.

It has not stalled and continues to rebuild files… but at this rate… 3% in 7 days… it will take around 240 days to complete the rebuild.

I have read on the community boards, that I can go to “admin - and change the things there”… but I looked there, and everywhere in the dashboard… but cannot see anywhere how to do this.

Can anyone please steer me in the right direction, please? 

I am using a Windows 8.1 computer connected to the same network as the EX4.

Many thanks and best regards

umbgumb

hello

I’ve been trying for hours to find a way to enter the text    sysctl -w dev.raid.speed_limit_min=100000     as mentioned on the forum, but I can’t find the right place to enter it. I noticed that others cannot find it either, even though we checked under    users/admin    as prompted.  There was nothing there to enter any text.

Any help would be appreciated

best regards

Jim

Hi Guys,

I understand, I was new to all this not too long ago…

You will need to enter these commands via a command line ssh… BUT be very careful because once you are in there, you CAN do damage unless you are very careful as you are going “under the covers” so to speak.

You will need to log into your portal, then go to Settings.  Select Network (on the left) and enable SSH.  You may also need to set a password.

Once you have done that, you can start a terminal session and log into the device itself… on my Mac, I connect using Terminal and then type “ssh @”, then hit Enter, and it will prompt you for a password.

Once you are logged in you can enter the commands as stated earlier in this post.
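For anyone who prefers to send the commands from their own machine instead of an interactive session, a minimal sketch follows. `NAS_USER` and `NAS_HOST` are placeholders I made up, not real defaults: use whatever user name the SSH settings page gives you and your NAS’s address on the network. The `DRY_RUN` switch is also just for this example and only prints what would be run. On Windows, a separate SSH client such as PuTTY does the same job as Terminal.

```shell
# Run one command on the NAS over SSH.
# NAS_USER and NAS_HOST are placeholders and must be set by you.
run_on_nas() {
    : "${NAS_USER:?set NAS_USER first}" "${NAS_HOST:?set NAS_HOST first}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show what would be run without connecting
        echo "ssh $NAS_USER@$NAS_HOST $*"
    else
        ssh "$NAS_USER@$NAS_HOST" "$@"
    fi
}

# Example (hypothetical values):
# NAS_USER=someuser
# NAS_HOST=192.168.1.50
# run_on_nas cat /proc/mdstat
# run_on_nas sysctl -w dev.raid.speed_limit_min=100000
```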

Hope this helps…