RAID Recovery

I had the EX4 running for a few days and experienced a drive failure.  The array was configured as RAID 10.  WD RMA’d the drive and I have plugged the replacement into the device.  I am getting the message:

Event title:Drive Inserted

Event description:A new drive has been inserted into Bay 1&2&3&4. To add this drive to your existing RAID set, navigate to Storage > RAID and click the Change RAID Mode button.

Severity:Info
Event code:2214
Event time:01-16-2014 02:18:35 PM
Firmware version:1.02.25

I have multiple TB of irreplaceable data on the device which is the reason I configured it as RAID 10.  Any ideas on how to proceed to get access to the data on the device?  The message scares me: if I do something, I may lose the current data.  Any suggestions?

The system has sent another message:

Following events are generated on your WDMyCloudEX4 ConroyCloud2.

Event title:System is in standby mode

Event description:System is in standby mode
Severity:Info
Event code:2032
Event time:01-16-2014 02:45:59 PM
Firmware version:1.02.25

WillieC wrote:

I have multiple TB of irreplaceable data on the device which is the reason I configured it as RAID 10.

First, the sermon.

RAID is not a backup!  NEVER rely only on RAID to protect your data or you will lose it.

Log into the User Interface and go to the STORAGE / RAID section – what does it tell you to do?

It should say:

[Screenshot: cap1.png]

You’d want to press the Manual Rebuild button.

My RAID profile says:

No Configured Volumes
 
Setup a RAID Mode to configure new RAID Volume(s) on this device.

Not what I was hoping for.  I am hoping someone has an idea of how to get disks 1 & 2 online.  If I understand the documentation correctly, disks 3 & 4 are a mirror of 1 & 2.  Drive 3 failed.
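To make the mirroring argument concrete, here is a minimal sketch of which drive failures a RAID 10 array can tolerate. It assumes pair-wise mirroring in which disk 3 mirrors disk 1 and disk 4 mirrors disk 2, which is only one reading of the poster's description; the EX4's actual on-disk layout is not confirmed anywhere in this thread. The function and constant names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: pair-wise mirroring based on the poster's reading of the
# documentation (disks 3 and 4 mirror disks 1 and 2). The real EX4 layout
# may differ, so treat these pairs as illustrative only.
ASSUMED_PAIRS = [(1, 3), (2, 4)]


def array_survives(failed, mirror_pairs):
    """Return True if every mirror pair still has at least one healthy disk."""
    failed = set(failed)
    return all(not set(pair) <= failed for pair in mirror_pairs)


def surviving_double_failures(mirror_pairs):
    """List every two-disk failure combination the array would survive."""
    disks = sorted(d for pair in mirror_pairs for d in pair)
    return [
        combo
        for combo in combinations(disks, 2)
        if array_survives(combo, mirror_pairs)
    ]


if __name__ == "__main__":
    # A single failure of drive 3 leaves its assumed partner (disk 1) intact.
    print(array_survives({3}, ASSUMED_PAIRS))
    # Two-disk failures are survivable only if they hit different pairs.
    print(surviving_double_failures(ASSUMED_PAIRS))
```

Under this assumed layout, losing drive 3 alone should leave all data readable from the remaining disks, which matches the degraded-but-readable state reported later in the thread.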

When drive 3 failed, were you still able to see your data using the mirrored 1&2 disks?  Was it only after you inserted the new drive, when the NAS could not find a RAID configuration to rebuild, that you could no longer see your data?

When drive 3 failed the system reported the degraded status and I could see the data.  I should have copied everything off at that point but didn’t.  I expected the rebuild to work.

I had powered the system down while I waited for the new drive.  When I replaced the drive in the EX4 and powered it up it decided I had replaced all 4 drives.

I’m not sure how the device behaves if you shut it down and replace the drive in the middle of a RAID failure before powering it up.

The instructions are fairly clear…  drives are intended to be replaced while the unit is powered up.

This would be the first device I have seen that could not be shut down from the console and hold its configuration state through the shutdown.  If this is true, this would be a very dangerous device to use and definitely not a candidate for ultimate reliability in a residential context.

This would mean the EX4 is at high risk of catastrophic loss from a power disruption whenever its RAID is operating in a degraded configuration.  It looks like a routine RMA for one of these disks could take up to 4-5 days.  I have the device behind a UPS, but a UPS is usually used either to provide time to transition to an alternate power source (standby generators, in the case of most of the data centers I have designed) or to provide a graceful shutdown if it looks like power restoration will take longer than the rated runtime of the UPS.

I hope that you are incorrect and that the device can take a shutdown command from the dashboard in a degraded state without dropping its configuration information.  If it cannot, it is poorly designed and very dangerous to use without significant caveats.  Most residential users do not have access to alternate power sources, and this winter’s snow storms in the Midwest and Northeast have provided numerous opportunities to test my UPS and perform graceful shutdowns of my other equipment.  I would be surprised if most residential users have the EX4 behind a UPS; they probably think that at worst they would lose any data transmissions in progress when the device shuts down, not suffer a catastrophic failure of the device.

So where do you stand currently with your configuration?  As I understand it, after putting in the new drive, the RAID was still unavailable and there was no option to rebuild.  If I have the scenario correct, what you have encountered is completely unacceptable.  Please keep all of us who care about our data posted on your final outcome.  Thanks.

Your description is correct.  I placed the new drive into the EX4.  I have no option to rebuild its RAID 10 configuration.  It is actually worse than that, as it appears to want to rebuild everything, with, I assume, a total data loss.  I have opened a support case with WD and will provide info on their response.  I have shut the device back down from the dashboard and packed it up for shipment, as I need to head to a client site for an engagement.  I will hand carry it and hope in the next few weeks to get the system back online.

On another note, I looked at whether I could use off-site capabilities to back up a 16 TB RAID 10 configuration (8 TB usable).  It seems most internet providers have a data limit before they start throttling back your speed.  In the case of AT&T U-verse the limit is 250 GB a month; past that point they throttle back your speeds.  This means the initial backup of a relatively loaded array would not be very practical across most residential internet connections, at least U-verse.

It looks like MediaComm has the potential to provide a 2 TB data plan before throttling the speed, so the market may be getting ready to provide better data volumes, I assume to support streaming HD movies…  I will switch my home service to a commercial circuit if I get this device back with its data, as even at 2 TB this still looks impractical.  If the data is unrecoverable, it looks like I will take the device out of service, give up on it, and move to an off-premises solution.
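The data-cap arithmetic above can be sketched roughly as follows. This is a worst-case estimate that assumes the full 8 TB of usable space needs uploading (the thread only says "multiple TB"), that the whole monthly allowance goes to the backup, and that line speed is never the bottleneck. The function names are hypothetical.

```python
import math


def raid10_usable_tb(raw_tb):
    """RAID 10 keeps one mirror copy of everything, so usable space is half of raw."""
    return raw_tb / 2


def months_to_upload(data_tb, cap_gb_per_month):
    """Whole months needed to upload data_tb while staying under a monthly cap."""
    data_gb = data_tb * 1000  # decimal units, as ISPs and drive vendors quote them
    return math.ceil(data_gb / cap_gb_per_month)


if __name__ == "__main__":
    usable = raid10_usable_tb(16)  # 16 TB raw, as described in the post
    # ASSUMPTION: the array is completely full; real usage is likely lower.
    print(usable, "TB usable")
    print(months_to_upload(usable, 250), "months at a 250 GB/month cap")
    print(months_to_upload(usable, 2000), "months at a 2 TB/month cap")
```

Even under the more generous 2 TB plan, a full initial backup would span several billing cycles, which is consistent with the poster's conclusion that it looks impractical.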

Sorry to hear about the potential data loss on this device.  It is not acceptable that you should experience data loss in the scenario you have described.  There are those who always chime in that RAID is not a backup solution.  But RAID should be able to give you some confidence that your data is safe if one drive fails.  Otherwise, why spend the money?

As for off-site backups over the internet, this solution is only viable when backing up a small amount of data.  For a typical home consumer, the bandwidth available makes backing up terabytes of data impractical.  Cloud backup solutions should only be used for a very limited amount of data for the average user.  I don’t even do cloud backup for this reason, and because cloud storage companies can come and go.  I have decided to have an on-site backup solution for the most part, with drives kept in a waterproof, fireproof enclosure.  I may consider backing up the most critical files to the cloud, but no one solution should be considered the be-all and end-all.

Good luck.  I hope your data is retrievable, and if not, that you did have a backup so nothing is lost.  I am simulating various drive failure scenarios on this NAS to determine whether it will provide the level of confidence I need to protect my data.  Although this is not rocket science, unfortunately too many companies have given up their control of QA for the bottom line.  Way too much outsourcing/insourcing to shops without the minimum level of talent to pull off even the simplest implementation of a closed system.  For the most part, we are talking about moving bits.  Nothing earth shattering here.  I guess I could get on my soap box again, and maybe I should.  It cannot be emphasized enough that we are dealing with high unemployment among US citizen engineers, and yet we are allowing hundreds of thousands of so-called qualified engineers into the US on H1-B Visas because we supposedly don’t have the talent to do the work ourselves.  Give me a break.  And along with the insourcing, so many companies are outsourcing.  I am just fed up, and who wouldn’t be?  Let’s hope that as consumers we can make a difference and not accept what is happening with corporate America.

Thanks for your comments.  I am hoping there is a way to retrieve the data.  Part of the reason I moved the data rather than copying it was that I was cleaning up two 4 TB D-Link NAS drives that I was going to use as the backup location for the EX4.  I had already done the math and figured I was going to need to use onsite backup.

You are correct: I justified the dollar expense for the EX4 based upon the upgrade in reliability I expected.  Until I work through this issue with WD, I do not consider this device ready for prime time.  Until I get some insight into what set of conditions caused this issue and whether the data is recoverable, I could not use this device, or recommend it be used, in any environment.  It is certainly not worth paying a premium for reliability that does not work.  I was told on Sunday that the issue has been escalated within WD.  I am hopeful that when someone does eventually respond, I can give you a glowing recommendation of their customer service capabilities.  Until then this hardware is in the penalty box.

I really wish I could test this myself, but I have 1.5 TB of data on my system that I can’t risk right now.

Have you called WD yet and asked for help?

I think one possible difference is that you made a drive change while the system was powered down.   It would have possibly been a different thing if you’d powered it up before inserting the disk, but you changed the hardware while it was down (and thus may have prevented it from recognizing the new drive the same way.)

I tested this on a RAID5 and didn’t have any issue, but to go to RAID10, I’d have to reformat.

I have opened a support case with WD.  I was told it is being escalated.  I will let you know what the engineers say after they work the issue.  I had hoped they would have contacted me today.  I opened the support case Thursday night.

Still waiting for WD to get back to me for the support case.  Case was entered Thursday evening.  

I was contacted yesterday by WD.  They are building a test environment to try to duplicate the issue and determine whether it is present across the product line or just in my specific chassis.  This afternoon we should get together to discuss the results of their attempts to duplicate the issue and work out a strategy to retrieve the data if possible.  I will update the group on the results.