RAID data recovery (FYPS2002)

WDC_HDD · July 28, 2012, 11:56am

In the middle of a rebuilding process, one of the drives in my RAID 6 array decided to break down and take the entire array with it. The drive is encountering multiple SMART errors. Please see pictures. Any ideas on how to remedy it? The drive is a Western Digital RE4-GP 2TB WD2002FYPS with firmware 04.05G05.

fzabkar · July 29, 2012, 8:24am

AIUI, RAID 6 can continue working in the case where two drives have failed.

So why not just replace the drive and rebuild the array?
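To make the fault-tolerance point concrete, here is a minimal sketch of the rule being referred to: RAID 6 keeps two parity blocks per stripe, so it can lose at most two drives and still rebuild. The table and function names are made up for illustration and are not taken from any controller's firmware or software.

```python
# Number of simultaneous drive failures each level can tolerate.
# RAID 5 keeps one parity block per stripe, RAID 6 keeps two.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def can_rebuild(level: str, total_drives: int, failed_drives: int) -> bool:
    """Return True if the array still holds enough information to rebuild."""
    if failed_drives < 0 or failed_drives > total_drives:
        raise ValueError("failed_drives must be between 0 and total_drives")
    return failed_drives <= PARITY_DRIVES[level]


if __name__ == "__main__":
    # Example: a 14-drive RAID 6 array with one, two, then three drives missing.
    for failed in (1, 2, 3):
        print(failed, "failed ->", can_rebuild("raid6", 14, failed))
```

The check only counts missing drives; it says nothing about whether the surviving drives can actually be read without errors during the rebuild.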

WDC_HDD · July 31, 2012, 9:46am

I should have been more clear in my original post. I did not have all the information back then. Actually, I still don’t have all the information, but I have a much better picture.

Here’s my setup:

OS: Windows Server 2008 R2

HDD: Western Digital 2002FYPS

RAID Controller: 3ware 9650SE 24ML

Array: RAID 6 (14 HDDs)

Here are the error logs from my RAID controller.

https://dl.dropbox.com/u/10737837/lsi.Win2k8R2.HOMESERVER.072412.10704.zip
https://dl.dropbox.com/u/10737837/errorlog_0.dat

A few nights ago, the array reported that there was a problem with two of the drives (drives 0 and 5). I’m not sure what the exact error was. I was in the middle of a relatively large transfer (~50GB). All of a sudden, my system froze for about 2 hours, and after that, I managed to do a normal restart. The controller does this sometimes - kicks out two drives randomly and then proceeds to rebuild them.

When the system started, I checked 3DM2 (the RAID controller software). It said that the array was degraded and proceeded to automatically rebuild the array. Everything was fine until the rebuild process hit 14%. Then I received several errors concerning drive 3 (which is strange, because the two drives that were dropped are 0 and 5). While the rebuilding process is still listed as “active” under 3DM2, it hasn’t progressed at all overnight.

I took drive 3 out of the array and ran the WD diagnostics software on it:

I also tried reading the SMART data via HD Tune:

fzabkar · July 31, 2012, 10:27am

The drive in the HD Tune report is a wreck. I don’t know how you would rebuild the array when 3 drives are bad. ISTM that your best approach may be to at least clone one drive, sector by sector, using a utility such as ddrescue which understands how to work around bad media.
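As a rough illustration of what "cloning sector by sector while working around bad media" means, here is a minimal sketch. It is not how ddrescue is implemented; a real tool does considerably more, for example keeping a map of which areas have been recovered so a run can be resumed. `FlakyDisk` is a hypothetical stand-in for a failing drive, and the 512-byte block size is an assumption.

```python
import io
from typing import BinaryIO, List


class FlakyDisk(io.BytesIO):
    """Hypothetical stand-in for a failing drive: reads at bad offsets raise."""

    def __init__(self, data: bytes, bad_offsets: List[int]):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size: int = -1) -> bytes:
        if self.tell() in self.bad_offsets:
            raise OSError("simulated unreadable sector")
        return super().read(size)


def clone_skipping_errors(src: BinaryIO, dst: BinaryIO, total_size: int,
                          block_size: int = 512) -> List[int]:
    """Copy src to dst block by block; return offsets that could not be read.

    Unreadable blocks are filled with zeros in dst so that every later block
    still lands at the same offset it had on the original drive.
    """
    bad_offsets: List[int] = []
    for offset in range(0, total_size, block_size):
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
        except OSError:
            data = b""
        if len(data) != length:
            # Read failed or came up short: record it and pad with zeros.
            bad_offsets.append(offset)
            data = data.ljust(length, b"\x00")
        dst.seek(offset)
        dst.write(data)
    return bad_offsets


if __name__ == "__main__":
    source = FlakyDisk(bytes(range(256)) * 6, bad_offsets=[512])
    target = io.BytesIO()
    print("unreadable offsets:", clone_skipping_errors(source, target, 1536))
```

The point of the design is that one unreadable block does not abort the copy: the clone keeps its layout, and the list of bad offsets shows which parts still have to come from parity or from another drive.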

WDC_HDD · July 31, 2012, 10:29am

The thing is though, the other two drives aren’t bad. The controller just habitually kicks them out of the array. Is there anything I can do to recover the data on those 2 drives?

DualportSRAM · August 2, 2012, 1:34pm

What made me post this is that it looks like you had several drives fail at nearly the same time.

I don’t know if your problem is the same one I experienced with an early WD2002FYPS (same firmware as yours), but I got the same problems running the drive in two enclosures of different makes (in which I was, and still am, using several WD1002FBYS drives without problems). Failures were also registered in the SMART log (I can’t remember which ones, and I don’t have the drive online to look).

One of my enclosures: http://www.raidsonic.de/en/products/backplanes.php?we_objectID=5896

If I connected the drive directly to the RAID controller with only one SATA cable, the problem seemed to be gone, but I don’t trust the drive.

So in my book some (or all?) of these drives are defective by design. Maybe the problem is in the SATA interface IC area?

Maybe fzabkar knows whether there are more hardware revisions of this drive? (I doubt there are.) I would consider replacing the drives with another model (or make) of RAID-grade drive once you have (I sure hope so) gotten your files recovered.

So if you have important data here, you should consider taking the drives to a specialist rather than trying to get them online yourself, because you risk destroying more files in the process.

Many times a specialist can make some kind of track-to-track copy of the drive, so work can be done on the “copy drives” instead of risking corruption of the original data (on the original drives) if a repair operation fails.

The same applies if you don’t have the same amount of drive space elsewhere.

In my country more companies have entered the hard disk repair business, so prices have gone down. Maybe you should get a quote before you spend many hours trying to get your data back?

The hard disk repair company could maybe also help you get a refund or replacement (with another model :) ) for the old defective drives from WD.

fzabkar · August 7, 2012, 10:14am

I know very little, if anything, about RAID, but all the recommendations in other storage forums say to clone the problematic drives, sector by sector, and then rebuild the array. ISTM that your best approach would be to clone the two “dropouts” as well as the wrecked drive. In fact you may not even need to clone the latter.

mangrove · August 9, 2012, 4:32pm

Yes, you can. Actually, I just did this for a friend; he had a QNAP NAS with four WD Green (EARS) drives in a RAID 5, racking up high load cycle counts (LCC) (he didn’t know he should have run WDIDLE3…) and then failing. Two disks failed with read errors and the array dropped; a third drive actually dropped during reconstruction. The LCC was around 360k. Anyway, he got all his data back.

And what did I do? I used a third-party tool. It’s called R-Studio, and it’s quite seriously one of the more impressive tools I have found during fifteen years of IT work. You attach the disks to your SATA controller, USB adapters, or a mix, and then you can create a virtual RAID disk by dragging and dropping partitions in the right order, including adding empty drives (parity info will be reconstructed on the fly). So you can use the “best” drives from your array, leaving out the worst two (in the case of RAID 6). I happened to leave one disk out completely as I only had three USB adapters (SATA will be much faster of course!) and my friend didn’t lose a single file.
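For anyone wondering how data can be reconstructed "on the fly" with a drive left out, here is a minimal sketch of the single-parity (RAID 5) case. It is not R-Studio's code, and the function names are made up for illustration. RAID 6 adds a second, independently computed parity block, which this sketch does not cover.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Fill in the single missing block (None) of a single-parity stripe.

    The stripe holds the data blocks plus the parity block. Because parity is
    the XOR of all data blocks, the XOR of every surviving block equals the
    missing one, whichever position it was in.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can restore exactly one missing block")
    survivors = [block for block in stripe if block is not None]
    rebuilt = xor_blocks(survivors)
    return [block if block is not None else rebuilt for block in stripe]


if __name__ == "__main__":
    d0, d1, d2 = b"RAID", b"data", b"test"
    parity = xor_blocks([d0, d1, d2])
    # Pretend the drive holding d1 was left out of the virtual array.
    print(rebuild_missing([d0, None, d2, parity])[1])
```

The same XOR works no matter which member of the stripe is absent, which is why a tool can accept an empty slot in place of the worst drive.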

The software doesn’t “drop” an array, it tries to reconstruct data even if it gets a read error, and it’s not affected by TLER in the same way as a real RAID controller. Also, it reads ext4 natively, so I could sit back at my Windows box and copy files from his broken Linux software RAID formatted in ext4… which is hugely awesome.

I highly recommend this software. If you want more information, check out their homepage or drop me a mail through my home page www.magnuswedberg.com (no I don’t get a commission, sadly!). I think the software cost me $80.

Edit: important addition. I didn’t have extra drives to work with and so I had to use the originals, but of course cloning the drives and using the clones to recover data is the best idea.