Impending Doom?

scottgu3 · August 13, 2013, 3:07am

All,

Sadly, my new WDC WD20EARX (2 months old max) appears to be failing (to me). I have a single EXT4 Linux partition with a whole bunch of movies for my Media Server…that appears to have some sort of DMA problem…below is my SMART report. Can anyone confirm that this drive is indeed Doomed???

Thanks,

Scott

Location	SATA device A	Drive size	1.86 TB
Make and model	ATA WDC WD20EARX-00P	Supports SMART?	Yes
SMART enabled?	Yes	Errors logged	147 errors detected
Passed drive check?	No	Model family	Western Digital Caviar Green (Adv. Format)
Make and model	WDC WD20EARX-00PASB0	Serial number	WD-deleted
Capacity	2,000,398,934,016 bytes [2.00 TB]

Additional SMART attributes

Offline data collection status	Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status	The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection	39600 seconds.
Offline data collection capabilities	SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities	Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability	Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time	2 minutes.
Extended self-test routine recommended polling time	381 minutes.
Conveyance self-test routine recommended polling time	5 minutes.
SCT capabilities	SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
Raw Read Error Rate	19352
Spin Up Time	6150
Start Stop Count	7
Reallocated Sector Ct	1
Seek Error Rate	151
Power On Hours	385
Spin Retry Count	0
Calibration Retry Count	0
Power Cycle Count	7
Power-Off Retract Count	6
Load Cycle Count	7583
Temperature Celsius	29
Reallocated Event Count	1
Current Pending Sector	1347
Offline Uncorrectable	0
Multi Zone Error Rate	0

Full SMART status report

smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.8.0-27-generic] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===

Model Family: Western Digital Caviar Green (Adv. Format)

Device Model: WDC WD20EARX-00PASB0

Serial Number: WD-deleted

LU WWN Device Id: 5 0014ee 20857cbf7

Firmware Version: 51.0AB51

User Capacity: 2,000,398,934,016 bytes [2.00 TB]

Sector Sizes: 512 bytes logical, 4096 bytes physical

Device is: In smartctl database [for details use: -P show]

ATA Version is: 8

ATA Standard is: Exact ATA specification draft version not indicated

Local Time is: Mon Aug 12 22:51:48 2013 EDT

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status: (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: (39600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities: (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability: (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 381) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 001 001 051 Pre-fail Always FAILING_NOW 19352
  3 Spin_Up_Time 0x0027 177 177 021 Pre-fail Always - 6150
  4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 7
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 1
  7 Seek_Error_Rate 0x002e 198 195 000 Old_age Always - 151
  9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 385
 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 7
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 6
193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 7583
194 Temperature_Celsius 0x0022 121 118 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0032 196 196 000 Old_age Always - 1347
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
ATA Error Count: 147 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 147 occurred at disk power-on lifetime: 354 hours (14 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d8 01 80 ef Error: UNC at LBA = 0x0f8001d8 = 260047320

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  c8 00 00 e8 00 80 ef 08 7d+21:08:02.306 READ DMA
  c8 00 80 68 00 80 ef 08 7d+21:08:02.299 READ DMA
  c8 00 40 28 00 80 ef 08 7d+21:08:02.295 READ DMA
  c8 00 20 08 00 80 ef 08 7d+21:08:02.294 READ DMA
  c8 00 08 00 00 80 ef 08 7d+21:08:02.284 READ DMA

Error 146 occurred at disk power-on lifetime: 354 hours (14 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 a8 85 00 e0 Error: UNC 8 sectors at LBA = 0x000085a8 = 34216

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  c8 00 08 a8 85 00 e0 08 7d+21:07:25.512 READ DMA
  c8 00 08 a0 85 00 e0 08 7d+21:07:25.512 READ DMA
  c8 00 08 98 85 00 e0 08 7d+21:07:25.512 READ DMA
  c8 00 08 90 85 00 e0 08 7d+21:07:25.512 READ DMA
  c8 00 08 88 85 00 e0 08 7d+21:07:25.512 READ DMA

Error 145 occurred at disk power-on lifetime: 354 hours (14 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 60 7b 00 e0 Error: UNC 8 sectors at LBA = 0x00007b60 = 31584

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  c8 00 08 60 7b 00 e0 08 7d+21:07:19.390 READ DMA
  c8 00 08 58 7b 00 e0 08 7d+21:07:19.390 READ DMA
  c8 00 08 50 7b 00 e0 08 7d+21:07:19.390 READ DMA
  c8 00 08 48 7b 00 e0 08 7d+21:07:19.390 READ DMA
  c8 00 08 40 7b 00 e0 08 7d+21:07:19.390 READ DMA

Error 144 occurred at disk power-on lifetime: 354 hours (14 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  01 51 08 20 71 00 e0 Error: AMNF 8 sectors at LBA = 0x00007120 = 28960

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  c8 00 08 20 71 00 e0 08 7d+21:07:13.518 READ DMA
  ec 00 00 00 00 00 a0 08 7d+21:07:13.494 IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08 7d+21:07:13.494 SET FEATURES [Set transfer mode]

Error 143 occurred at disk power-on lifetime: 354 hours (14 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  01 51 08 20 71 00 e0 Error: AMNF 8 sectors at LBA = 0x00007120 = 28960

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  c8 00 08 20 71 00 e0 08 7d+21:07:10.533 READ DMA
  ec 00 00 00 00 00 a0 08 7d+21:07:10.509 IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08 7d+21:07:10.509 SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 354 10160

SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Kieren · August 13, 2013, 3:41am

Analysing the test output is out of my league, but it does not look good.

However, do you have a PC which you can hook the Green drive up to, boot to Linux from a CD such as Hiren’s BootCD, and rescue as many of your files as possible? I’d be thinking data recovery while it might still be doable.

Zatick · August 13, 2013, 7:09am

Unfortunately that drive is toast.

Briefly, you have 1 reallocated sector, which means the drive has not been able to read data from that sector twice.
A single reallocated sector is enough to RMA a WD drive by the way.

You have a further 1347 sectors that cannot be read, and if you were to delete the data in them and then re-write some data back to those sectors they would most likely turn into reallocated sectors.

It’s not a DMA error, the data transmission was okay and it made it onto the disk, the problem is that the data can’t be read back now, and this is before it even tries to transmit it back to the SATA controller.
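For anyone who wants to check this against the error log in the first post, here is a minimal Python sketch that decodes one "registers were" row. It assumes 28-bit LBA commands (READ DMA, opcode c8, as the log shows) and the usual ATA error-register bit layout. The function name `decode_error_row` is made up for illustration. A link or cable problem would normally show up as ICRC, whereas the rows in this log decode to UNC or AMNF, which points at the media instead.

```python
# Sketch: decode one "registers were" row from a smartctl ATA error log.
# Assumes 28-bit LBA commands (READ DMA, opcode 0xc8), as in the log above.

# Error register (ER) bits relevant here.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error on the link/cable
    0x40: "UNC",   # uncorrectable data: the media could not be read
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
    0x01: "AMNF",  # address mark not found
}


def decode_error_row(row: str) -> tuple[list[str], int]:
    """Return (error flag names, LBA) for a row laid out as ER ST SC SN CL CH DH."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    # LBA28: low nibble of DH holds bits 27..24, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, lba


if __name__ == "__main__":
    # Rows copied from errors 147, 146 and 144 in the log.
    for row in ("40 51 00 d8 01 80 ef",
                "40 51 08 a8 85 00 e0",
                "01 51 08 20 71 00 e0"):
        flags, lba = decode_error_row(row)
        print(f"{row} -> {'+'.join(flags)} at LBA {lba}")
```

The decoded LBAs match the ones smartctl prints on the same lines (260047320, 34216, 28960), and attribute 199 (UDMA_CRC_Error_Count) is 0 in the report, which is consistent with the cable and controller path being fine.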

P.S. the head on your drive is parking approximately 20 times an hour. That green drive should be rated for 300,000 load/unload cycles, which puts it at 625 days before you’ve breached that rating.
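The arithmetic behind that estimate, as a small Python sketch. The inputs are Load_Cycle_Count (7583) and Power_On_Hours (385) from the report; the 300,000-cycle rating is the figure quoted above, not something read from the drive, and the helper names are hypothetical. It also assumes the drive stays powered on around the clock.

```python
# Sketch: reproduce the load-cycle estimate from the SMART attributes.
# The 300,000-cycle rating is the figure quoted in the post (an assumption here).

HOURS_PER_DAY = 24


def cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average head load/unload cycles per powered-on hour."""
    return load_cycles / power_on_hours


def days_to_rating(rated_cycles: int, rate_per_hour: float) -> float:
    """Days of continuous power-on time to accumulate rated_cycles from zero."""
    return rated_cycles / rate_per_hour / HOURS_PER_DAY


if __name__ == "__main__":
    rate = cycles_per_hour(7583, 385)  # attributes 193 and 9
    print(f"{rate:.1f} load cycles per hour")
    print(f"{days_to_rating(300_000, 20):.0f} days at the rounded rate of 20/hour")
    print(f"{days_to_rating(300_000, rate):.0f} days at the measured rate")
```

With the rounded rate of 20 cycles per hour this gives the 625 days quoted; the measured rate is slightly lower, so the unrounded figure comes out a few days longer.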

scottgu3 · August 13, 2013, 1:55pm

Thanks Kieren,

That’s what I was afraid of!

I am in fact using ddrescue to attempt to recover individual files. I have gparted Magic as well, but figured I’d try the ddrescue first. Fortunately, I have another nearly empty 2TB drive to move the files to in the same machine.

I haven’t tried Hirens yet, but I’ll have a look!

Thanks for the feedback and tips.

Scott

scottgu3 · August 13, 2013, 2:00pm

Zatick,

Thank you! As I said to Kieren, this is what I feared, but in anticipation, I’ve been ddrescue’ing as quickly as I can.

I’m doing what I can to recover the data, but will be RMA’ing the drive soon.

Very unusual. I’ve not had a drive (that wasn’t floating around a drawer for a couple of years between installations) fail on me in years. It took me almost a day to realize it was the drive and not the network causing my playback problems. It took only a few minutes with apt-get, smartctl and ddrescue to decide I was probably hosed, and a few seconds on the forums to confirm!

Again thanks! Off to RMA to see what I can see.

Scott

Kieren · August 13, 2013, 2:08pm

Hi scottgu3,

gparted Magic is what saved my bacon. At least, the default Linux boot on Hirens is centered around gparted. I wish you all the best in your efforts. I don’t think there will be any problem with the RMA, but good luck with the data recovery! I feel that I was very fortunate to salvage everything from my 3+ year old EARS drive.

Regards,

Kieren

scottgu3 · August 14, 2013, 12:00pm

Zatick,

The 625 day estimate may have been optimistic.

Sometime last night the thing imploded. I had given up on recovering data (taking too long), so I was rebuilding my movie collection from scratch (56 items in my Handbrake queue last night when I went to sleep), onto the other 2TB drive in the system. I decided to ignore the WD for now, and attempt to re-partition and format it before I sent it back to WD on RMA, and just left it connected and (foolishly) in the fstab.

Well…this morning, my media server once again lay on my office floor with its guts hanging out, as the drive failed (apparently disastrously), and in so doing (somehow) apparently caused enough errors on the SATA bus that my boot partition (on an SSD) hosed itself too. Now, the /root gets mounted RO, and I can’t remove the failed drive from the fstab, e2fsck tells me my superblocks aren’t so super any more…and so on…LOL.

I don’t think my Wife is going to be too keen on this whole media server idea if this sort of thing keeps happening.

Ho hummm…gparted magic here I come.

Nonetheless, you were correct, that drive IS toast!

Scott

scottgu3 · August 14, 2013, 12:03pm

Kieren,

See my recent reply to Zatick, the drive failed gloriously last night, and now gparted magic is the only thing that might save me from a total Ubuntu Server re-install!

The best part is I _could_ have taken it out yesterday before it failed, and all would be well! LOL.

Oh well, it’s not a hobby unless there is a healthy dose of frustration involved…right?

Later!

Scott

Kieren · August 14, 2013, 4:48pm

scottgu3 wrote:
[SNIP]

Oh well, it’s not a hobby unless there is a healthy dose of frustration involved…right?

Later!

Scott

Ain’t that the truth!

I guess it might be late for this, but my drive disaster was mitigated in part by sticking the drive in a ziplock freezer bag (quart size is ideal) and putting it in the freezer from an hour to overnight. I also took measures to keep it cold while copying from it. An anti-static bag inside the ziplock wouldn’t hurt, either.

scottgu3 · August 14, 2013, 6:53pm

Thanks Kieren,

To be honest, the data on the drive is simply not worth the pain of saving it. The tip may be useful in the future though.

I’m looking at a WD Red solution as a replacement…I’ll take the RMA drive and use it, then add a 3rd drive in the guise of a WD Red, then eventually replace the Green and the Seagate with two more Reds. Based on my reading, short of going to full-on enterprise-class drives, they should last a bit longer and be a bit more appropriate for a media server than the Green drive.

I’m not sure what my new backup plan is. I’m thinking I may just use some USB3 HD’s and back up the data periodically as I put it on the server. We’ll see.

Hey, at least I know what I’m doing for the rest of the week! LOL

Again, Thanks

Scott

Zatick · August 14, 2013, 9:35pm

It doesn’t sound like you had any redundancy, check out RAIDZ, or even better yet Freenas. I’ve got a Freenas server with 3x 5 disk RAIDZ5 volumes and can copy between those volumes at ~500MB/s. You don’t need a raid card, and RAIDz is really nice, self healing etc.

RAID isn’t a backup so I backup all of that data to a set of drives in my desktop in a Drivebender pool, although the Windows 8 pooling would probably do the job these days.

Have a look around Google and see what other people are doing.

Kieren · August 14, 2013, 9:41pm

Those do sound interesting. I’ve got a chassis that could do it. I’ll have to see if I have a functional board and CPU.

scottgu3 · August 15, 2013, 1:19am

Nope, no RAID yet, and really no backup. Until very recently, I had minimal data on the server, and I was considering my backup plan. (Cheap 1TB USB 3 External drives that I buy on Sale and store elsewhere seem appealing).

Right now, the server is an ASUS Mini-ITX MB running an Intel I3-2200 with support for 6 Drives (4 in RAID) on board, and the chassis has support for, say, three 3.5" drives and an optical. I could probably squeeze four 3.5" drives in there and shoe-horn the SSD in someplace.

I’m running Ubuntu 13.04 Server, headless, usually with only access from my Desktop machine through SSH.

The box is running PlexMediaServer & SAMBA, and that’s about it.

Right now, it has a 2TB (that other brand) HD, and an older Patriot TORQX 128GB SSD (boot/root/swap). The WD 2TB GREEN was intended to be the Movie Drive, and the Other Brand was a backup drive. There were 200+ Movies on the WD when it fried, and I have backups of about 100 or so of them, as well as the original media.

Longer term, I was thinking of throwing a Freenas box on the network, and using this box as a backup server while migrating the movies to the Freenas.

But…budget wise, I’m going to have to wait on that…

Frankly, I think I was just unlucky with the WD. I am however going to invest in some Reds in the near term. Given what I use the system for, 7200 RPM desktop drives seem a bit silly.

I am going to have to do some reading on Drivebender…I’m not familiar with it. Sounds like a JBOD? Ah…I see. Sort of a JBOD mixed with a bit of Dropbox style magic. Interesting.

Also not in the budget right now, but worth a thought for the future.

Thanks for the tips! Good stuff.

Scott

Zatick · August 15, 2013, 1:32am

Yeah, Drivebender is something like that. It’s just nice that you can stack odd sized drives and it presents them as a single volume. There are options to duplicate files (so 2 copies on 2 drives, auto repairing if a drive dies/is replaced), also records CRC32 info on each file so you know if there is corruption. Okay for a backup but I wouldn’t use it as primary storage…

But what caught my eye was the mention of the “support for 6 Drives (4 in RAID)”. Try to stay away from Intel ICH RAID. You’ll probably be able to push 2-3 times more data through a RAIDZ, and a RAIDZ gives you a bit more freedom in the event of motherboard/CPU failure (you just need a box running Freenas to mount your volume again). Plus you get the auto healing and all of the other benefits that go with it…

It looks like your current box would be fine for Freenas, although you typically just install Freenas on a USB flash drive (works fine). It doesn’t look like Plex Media is working that great in Freenas at the moment, but there are guys that have got it working.

scottgu3 · August 15, 2013, 1:51am

Zatick,

Very good point. I didn’t know that about RAIDZ. I’ll do some reading on that too. I’ve always stayed away from RAID since a 90’s era disaster with a Promise RAID controller and a RAID 0 array… bad juju that. I’ve been recently considering it for two reasons: 1. I’m a bit less ignorant than I was in the 90’s, and 2. things like Freenas and RAIDZ (and modern OS’s as well) have come a long, long way since those days.

Lemme do some reading on RAIDZ and I’ll get back to you!

Scott

Zatick wrote:

But what caught my eye was the mention of the “support for 6 Drives (4 in RAID)”. Try to stay away from Intel ICH RAID. You’ll probably be able to push 2-3 times more data through a RAIDZ, and a RAIDZ gives you a bit more freedom in the event of motherboard/CPU failure (you just need a box running Freenas to mount your volume again). Plus you get the auto healing and all of the other benefits that go with it…

scottgu3 · August 15, 2013, 2:09am

Hmmm…

Seems like I could simply add ZFS to Ubuntu Server, and then grab 3 2TB WD Red’s, and create a ZFS RAID-Z Pool with the drives. It’s a bit expensive at ~$300 for 4 TB, but that’s a lot cheaper than losing my data… Might need to bump up my RAM too (1 GB / TB, and I’m only running 4GB now…so there wouldn’t be a lot of RAM for Plex).
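A rough Python sketch of the sizing above, under a few stated assumptions: single-parity RAID-Z (one drive's worth of space goes to parity), ~$300 taken as the total for the three drives, and ZFS metadata and padding overhead ignored. Whether the 1 GB/TB rule of thumb is meant for raw or usable space varies by source, so both are printed. The function names are hypothetical.

```python
# Sketch: usable capacity, cost per usable TB, and the 1 GB/TB RAM rule of thumb
# for a single RAID-Z vdev. Ignores ZFS metadata and padding overhead.


def raidz_usable_tb(drives: int, drive_tb: float, parity: int = 1) -> float:
    """Approximate usable space: every drive except the parity share holds data."""
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    return (drives - parity) * drive_tb


def ram_rule_of_thumb_gb(pool_tb: float, gb_per_tb: float = 1.0) -> float:
    """RAM suggested by a GB-per-TB rule of thumb (a guideline, not a requirement)."""
    return pool_tb * gb_per_tb


if __name__ == "__main__":
    drives, drive_tb, total_cost = 3, 2.0, 300.0  # total_cost is an assumed reading of "~$300"
    usable = raidz_usable_tb(drives, drive_tb)
    print(f"usable: {usable:.0f} TB, ${total_cost / usable:.0f} per usable TB")
    print(f"RAM by rule of thumb: {ram_rule_of_thumb_gb(drives * drive_tb):.0f} GB (raw) "
          f"or {ram_rule_of_thumb_gb(usable):.0f} GB (usable)")
```

Either way the rule of thumb lands at or above the 4GB currently installed, which is why the RAM question comes up at all.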

– a little more reading reveals a bunch of Plex users doing this! –

Pretty cool stuff. Very interesting…of course now, I need to convince the significant other that 3 new WD Red’s are not just toys, but NEEDED! LOL

Thanks!

Scott

Zatick · August 15, 2013, 2:34am

Adding ZFS to your existing server is another way to go

As far as memory goes, there are a couple of schools of thought on that. There are a lot of 1GB/1TB recommendations, but there are also people saying that for lightly loaded media servers you can get away with much less.

Good luck on getting permission for your drives!

You should get away with mixing a desktop drive, WD green, and a new WD Red if it came down to it. I’ve got a bunch of green drives in my server and some “alternative” desktop models and have not had any problems with them, had one blip (pending sector) but drive did not drop. Freenas just emailed me and kept on running, didn’t miss a beat.

scottgu3 · August 15, 2013, 3:50pm

Woot! The Mrs must really like her Plex movies on the iPad. She just said “Go For It” on the 3 WD Reds.

Color me amazed.

Now I need to find a deal on them, and we’re good to go! I think I’m going to wait and see how it works with 4GB of RAM, but if I see a deal on that, I may grab a couple of 4GB Sticks to cover my butt on that part.

Again, thanks for all the tips. I really appreciate it.

Scott

Zatick · August 16, 2013, 12:51am

haha

You’re welcome