WDC WD30EFRX - 3 drives out of 3 bad?

I have never had so much trouble with new HDs as with this set of 3 Red 3TB drives. Usually it is just install, partition and forget. Given my previous good experience, I find it hard, perhaps naively, to believe I got 3 bum drives, and wonder what is going on.

I have the 3 WD30EFRX drives in a Linux (3.6.6) machine, all connected to the same motherboard (Abit NF-M2 nView(C51PVMCP51)). There is also a 4th HD, an older Samsung HD154UI; all drives use the same type of SATA cable. Zero problems with the Samsung.

Ran badblocks -w on the drives, no problems reported. 

The problems I am seeing are twofold: 

  • SMART problems: using smartmontools-6.0 I am getting all kinds of weirdness. For example, running the same command 3 times in a row gives 3 different outputs:

1) 

smartctl -l scterc /dev/sdc
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright © 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

Unknown SCT Status format version 258, should be 2 or 3.
SCT (Get) Error Recovery Control command failed

2) 

smartctl -l scterc /dev/sdc
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright © 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

3) 

smartctl -l scterc /dev/sdc
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright © 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Commands not supported

Then there are the weird messages from smartd in the logs:

Nov 25 15:32:59 charrm smartd[3823]: Device: /dev/sdc [SAT], unknown self-test status 0xa0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 1 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 3 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 4 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 5 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 7 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 9 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 10 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 11 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 12 = 0

And by running smartctl -x a few times in a row I can make the WD30EFRX drive throw errors like this:

Nov 25 15:22:03 charrm kernel: [701510.947013] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 25 15:22:03 charrm kernel: [701510.947013] ata6.00: failed command: READ LOG EXT
Nov 25 15:22:03 charrm kernel: [701510.947013] ata6.00: cmd 2f/00:06:03:00:00/00:00:00:00:00/00 tag 0 pio 3072 in
Nov 25 15:22:03 charrm kernel: [701510.947013] res 51/84:04:02:00:00/84:00:00:00:00/00 Emask 0x10 (ATA bus error)
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6.00: status: { DRDY ERR }
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6.00: error: { ICRC ABRT }
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6: hard resetting link
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6: nv: skipping hardreset on occupied port
Nov 25 15:22:03 charrm kernel: [701511.416599] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 15:22:03 charrm kernel: [701511.423516] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
Nov 25 15:22:03 charrm kernel: [701511.423526] ata6.00: revalidation failed (errno=-5)
Nov 25 15:22:08 charrm kernel: [701516.416558] ata6: hard resetting link
Nov 25 15:22:08 charrm kernel: [701516.416568] ata6: nv: skipping hardreset on occupied port
Nov 25 15:22:09 charrm kernel: [701516.883269] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 15:22:09 charrm kernel: [701516.980346] ata6.00: configured for UDMA/133
Nov 25 15:22:09 charrm kernel: [701516.980406] ata6: EH complete
Nov 25 15:22:24 charrm kernel: [701532.431206] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 25 15:22:24 charrm kernel: [701532.431221] ata6.00: failed command: READ LOG EXT
Nov 25 15:22:24 charrm kernel: [701532.431235] ata6.00: cmd 2f/00:06:03:00:00/00:00:00:00:00/00 tag 0 pio 3072 in
Nov 25 15:22:24 charrm kernel: [701532.431235] res 51/84:02:04:00:00/84:00:00:00:00/00 Emask 0x10 (ATA bus error)
Nov 25 15:22:24 charrm kernel: [701532.431241] ata6.00: status: { DRDY ERR }
Nov 25 15:22:24 charrm kernel: [701532.431245] ata6.00: error: { ICRC ABRT }
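
For readers who want to decode the `res` field themselves, here is a minimal Python sketch. It assumes the first two hex bytes of `res` are the ATA status and error registers, and it uses the conventional ATA bit names; the function name `decode_res` and the bit tables are illustrative, not part of any tool. The kernel log line appears to print only a subset of the status bits, so the sketch may list a bit (such as DSC) that the kernel omits.

```python
# Conventional ATA status/error register bit names (illustrative subset).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def _names(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(res):
    """Decode the leading 'status/error' pair of a libata 'res' string.

    Example input: '51/84:04:02:00:00/84:00:00:00:00/00'
    """
    head = res.split(":", 1)[0]          # e.g. '51/84'
    status_hex, error_hex = head.split("/")
    status = int(status_hex, 16)
    error = int(error_hex, 16)
    return _names(status, STATUS_BITS), _names(error, ERROR_BITS)


if __name__ == "__main__":
    # 'res' value copied from the READ LOG EXT failure in the log above.
    print(decode_res("51/84:04:02:00:00/84:00:00:00:00/00"))
    # 'res' value copied from the WRITE DMA EXT timeout further down.
    print(decode_res("40/00:00:02:4f:c2/00:00:00:00:00/00"))
```

For the READ LOG EXT failure this reports ERR in the status register together with ICRC and ABRT in the error register, which agrees with the `{ DRDY ERR }` and `{ ICRC ABRT }` lines the kernel printed.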

Then there is this output from smartctl -x:

Warning! SATA Phy Event Counters error: invalid SMART checksum.
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0002 2 28 R_ERR response for data FIS
0x0003 2 28 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 0 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 706937 Vendor specific

The counter for ‘R_ERR response’ increases almost every time smartctl -x is executed.
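
To track which counters move between runs, one option is to save the Phy Event Counters section of two smartctl -x runs and compare them. The sketch below is only an illustration: it assumes each counter line has the form `ID Size Value Description` as in the output above, and the helper names `parse_phy_counters` and `counter_deltas` are made up for this example.

```python
def parse_phy_counters(text):
    """Parse 'ID Size Value Description' lines into {counter_id: value}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split(None, 3)
        # Skip the header, warnings, and anything that is not a counter row.
        if len(fields) < 3 or not fields[0].startswith("0x"):
            continue
        try:
            counters[int(fields[0], 16)] = int(fields[2])
        except ValueError:
            continue
    return counters


def counter_deltas(before, after):
    """Return {counter_id: increase} for counters whose value changed."""
    return {cid: value - before.get(cid, 0)
            for cid, value in after.items()
            if value != before.get(cid, 0)}


if __name__ == "__main__":
    first_run = """ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0002 2 28 R_ERR response for data FIS
0x0003 2 28 R_ERR response for device-to-host data FIS
"""
    # Hypothetical second run, invented for illustration only.
    second_run = """ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0002 2 29 R_ERR response for data FIS
0x0003 2 29 R_ERR response for device-to-host data FIS
"""
    print(counter_deltas(parse_phy_counters(first_run),
                         parse_phy_counters(second_run)))
```

Since smartctl itself warned about an invalid checksum on this log, any values read this way should be treated with caution.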

  • Then there are plain I/O errors not related to any SMART activity, as far as I can see:

Nov 25 05:03:30 charrm kernel: [664398.333684] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 25 05:03:30 charrm kernel: [664398.333698] ata6.00: failed command: WRITE DMA EXT
Nov 25 05:03:30 charrm kernel: [664398.333712] ata6.00: cmd 35/00:00:00:44:c1/00:04:ec:00:00/e0 tag 0 dma 524288 out
Nov 25 05:03:30 charrm kernel: [664398.333712] res 40/00:00:02:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 25 05:03:30 charrm kernel: [664398.333718] ata6.00: status: { DRDY }
Nov 25 05:03:30 charrm kernel: [664398.333730] ata6: hard resetting link
Nov 25 05:03:30 charrm kernel: [664398.333734] ata6: nv: skipping hardreset on occupied port
Nov 25 05:03:31 charrm kernel: [664398.800302] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 05:03:31 charrm kernel: [664398.813998] ata6.00: configured for UDMA/133
Nov 25 05:03:31 charrm kernel: [664398.814065] sd 5:0:0:0: [sdd]
Nov 25 05:03:31 charrm kernel: [664398.814069] Result: hostbyte=0x00 driverbyte=0x08
Nov 25 05:03:31 charrm kernel: [664398.814075] sd 5:0:0:0: [sdd]
Nov 25 05:03:31 charrm kernel: [664398.814078] Sense Key : 0xb [current] [descriptor]
Nov 25 05:03:31 charrm kernel: [664398.814084] Descriptor sense data with sense descriptors (in hex):
Nov 25 05:03:31 charrm kernel: [664398.814088] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov 25 05:03:31 charrm kernel: [664398.814106] 00 00 00 01
Nov 25 05:03:31 charrm kernel: [664398.814115] sd 5:0:0:0: [sdd]
Nov 25 05:03:31 charrm kernel: [664398.814118] ASC=0x0 ASCQ=0x0
Nov 25 05:03:31 charrm kernel: [664398.814124] sd 5:0:0:0: [sdd] CDB:
Nov 25 05:03:31 charrm kernel: [664398.814126] cdb[0]=0x2a: 2a 00 ec c1 44 00 00 04 00 00
Nov 25 05:03:31 charrm kernel: [664398.814142] end_request: I/O error, dev sdd, sector 3972088832
Nov 25 05:03:31 charrm kernel: [664398.814189] ata6: EH complete

and, on another drive:

Nov 25 13:03:30 charrm kernel: [693198.277437] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 25 13:03:30 charrm kernel: [693198.277451] ata3.00: failed command: WRITE DMA EXT
Nov 25 13:03:30 charrm kernel: [693198.277466] ata3.00: cmd 35/00:08:d8:3f:7b/00:00:35:01:00/e0 tag 0 dma 4096 out
Nov 25 13:03:30 charrm kernel: [693198.277466] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 25 13:03:30 charrm kernel: [693198.277472] ata3.00: status: { DRDY }
Nov 25 13:03:30 charrm kernel: [693198.277484] ata3: hard resetting link
Nov 25 13:03:30 charrm kernel: [693198.277488] ata3: nv: skipping hardreset on occupied port
Nov 25 13:03:31 charrm kernel: [693198.744091] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 13:03:31 charrm kernel: [693198.758083] ata3.00: configured for UDMA/133
Nov 25 13:03:31 charrm kernel: [693198.758114] sd 2:0:0:0: [sda]
Nov 25 13:03:31 charrm kernel: [693198.758118] Result: hostbyte=0x00 driverbyte=0x08
Nov 25 13:03:31 charrm kernel: [693198.758124] sd 2:0:0:0: [sda]
Nov 25 13:03:31 charrm kernel: [693198.758127] Sense Key : 0xb [current] [descriptor]
Nov 25 13:03:31 charrm kernel: [693198.758133] Descriptor sense data with sense descriptors (in hex):
Nov 25 13:03:31 charrm kernel: [693198.758137] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov 25 13:03:31 charrm kernel: [693198.758155] 00 00 00 00
Nov 25 13:03:31 charrm kernel: [693198.758163] sd 2:0:0:0: [sda]
Nov 25 13:03:31 charrm kernel: [693198.758166] ASC=0x0 ASCQ=0x0
Nov 25 13:03:31 charrm kernel: [693198.758172] sd 2:0:0:0: [sda] CDB:
Nov 25 13:03:31 charrm kernel: [693198.758175] cdb[0]=0x8a: 8a 00 00 00 00 01 35 7b 3f d8 00 00 00 08 00 00
Nov 25 13:03:31 charrm kernel: [693198.758196] end_request: I/O error, dev sda, sector 5192237016
Nov 25 13:03:31 charrm kernel: [693198.758206] Buffer I/O error on device sda4, logical block 624615675
Nov 25 13:03:31 charrm kernel: [693198.758210] lost page write due to I/O error on sda4
Nov 25 13:03:31 charrm kernel: [693198.758238] ata3: EH complete

So far I have seen these I/O errors on 2 out of 3 drives; the SMART weirdness is going strong on all of them.
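
As a cross-check on the two logs above, the failing sector and transfer size can be recovered from the `CDB:` bytes. This is a small sketch assuming the standard WRITE(10) (opcode 0x2a) and WRITE(16) (opcode 0x8a) layouts and 512-byte logical sectors; the function name `cdb_lba_and_length` is hypothetical.

```python
def cdb_lba_and_length(cdb_hex):
    """Return (lba, block_count) for a READ/WRITE(10) or READ/WRITE(16) CDB.

    `cdb_hex` is the space-separated hex dump printed by the kernel.
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode in (0x28, 0x2A):        # READ(10) / WRITE(10)
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    elif opcode in (0x88, 0x8A):      # READ(16) / WRITE(16)
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    else:
        raise ValueError("unsupported opcode 0x%02x" % opcode)
    return lba, blocks


if __name__ == "__main__":
    # CDB bytes copied from the sdd and sda logs above.
    for dump in ("2a 00 ec c1 44 00 00 04 00 00",
                 "8a 00 00 00 00 01 35 7b 3f d8 00 00 00 08 00 00"):
        lba, blocks = cdb_lba_and_length(dump)
        print(lba, blocks, blocks * 512)  # 512-byte logical sectors assumed
```

The decoded LBAs are 3972088832 and 5192237016, the same sectors named in the `end_request: I/O error` lines, and the lengths (1024 and 8 blocks) correspond to the 524288-byte and 4096-byte DMA transfers in the `cmd` lines.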

So… have I just got 3 bad drives, and only need to replace them, or is there perhaps some compatibility problem and I need to forget this brand or model?

Any compatibility red flags for WD30EFRX, or rumours?

FWIW, I booted with kernel flags sata_nv.swncq=0 libata.force=noncq, as this seemed to help with the I/O errors, but after running for a couple of days the problems returned.

Thanks in advance for any suggestions.

Hello,

That’s not cool at all. I have 2 Reds with no problems at all. Try to reach WD; it is my understanding that they are working closely with users who have Red problems…

Try to rule out a software (kernel) issue first.

Post the ‘smartctl -a’ output of each device (or only one, if they are the same).

I experienced big problems in the past with my old mobo (an M4N72-E, also MCP based) involving the SATA controller and the kernel. In my case the first workaround was the kernel parameter ‘pcie_aspm=off’; the kernel was fixed later. The power management of the PCIe bus affected the SATA controller, and I got all sorts of weird and dangerous issues with several disks. My problems were with Fedora, but I noticed that other kernels worked fine, so eventually the guilty component was found.

Alternatively, you might try a different kernel, maybe even using a live CD to perform tests:

http://www.sysresccd.org/SystemRescueCd_Homepage

Also check your logs: do you see “timeout”, “ata bus error”, or “invalid chs 0” (or something similar to those)?

Well, sure, I am trying to get WD’s attention; after all, the promise of dedicated premium 24/7 support was one of the reasons I bought these drives. I am about to find out what exactly “dedicated”, “premium” and “24/7” mean.

I opened a troubleshooting case under the WD Red product family over 24 hours ago; so far I have received only a canned email, which states:

“Thank you for your email. Our goal is to answer your email within one (1) business day. However, sometimes due to heavy volume it might take little longer to respond.”

Not sure how to reconcile this with “24/7”. Anyway, the fact that dedicated premium support experts are overloaded is not a good sign for this product line.

Thanks for taking a look. 

Yes, the point of me posting this here is to figure out if I have a software or hardware incompatibility or just bad drives.

My SATA controller is not that new, so hopefully kernel bugs related to it should be ironed out by now. Here is the relevant lspci output:

00:0e.0 IDE interface [0101]: NVIDIA Corporation MCP51 Serial ATA Controller [10de:0266] (rev a1) (prog-if 85 [Master SecO PriO])
Subsystem: ABIT Computer Corp. Device [147b:1c26]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin A routed to IRQ 21
Region 0: I/O ports at 09f0 [size=8]
Region 1: I/O ports at 0bf0 [size=4]
Region 2: I/O ports at 0970 [size=8]
Region 3: I/O ports at 0b70 [size=4]
Region 4: I/O ports at e000 [size=16]
Region 5: Memory at fe02d000 (32-bit, non-prefetchable) [size=4K]
Capabilities:
Kernel driver in use: sata_nv
00: de 10 66 02 07 00 b0 00 a1 85 01 01 00 00 00 00
10: f1 09 00 00 f1 0b 00 00 71 09 00 00 71 0b 00 00
20: 01 e0 00 00 00 d0 02 fe 00 00 00 00 7b 14 26 1c
30: 00 00 00 00 44 00 00 00 00 00 00 00 0a 01 03 01

Regarding errors in the logs, yes there are both timeouts and ATA bus errors, samples are posted above. Haven’t got any invalid CHS yet.

Here is the smartctl -a output. Note the corruption in the Information section; it is intermittent, and after a few tries the output is normal, as it is for the other disk posted below:

smartctl -a /dev/sdd
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright © 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: C WD30EFRX-68AX9N0 ?
Serial Number: WD-WMC1T0500586
Firmware Version: .00A80WD
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: Unknown(0x0000) (unknown minor revision code: 0x746b)
Local Time is: Mon Nov 26 21:26:26 2012 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 34) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: (40080) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 402) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 1
3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 2
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 224
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 1
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 114 108 000 Old_age Always - 36
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 3
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 20% 193 -
# 2 Extended offline Interrupted (host reset) 90% 184 -
# 3 Short offline Completed without error 00% 171 -
# 4 Extended offline Completed without error 00% 7 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl -a /dev/sda
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright © 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD30EFRX-68AX9N0
Serial Number: WD-WMC1T0674209
LU WWN Device Id: 5 0014ee 0ae197a14
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Nov 26 21:30:59 2012 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (41520) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 417) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 189 188 021 Pre-fail Always - 5550
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 9
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 286
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 9
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 3
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 5
194 Temperature_Celsius 0x0022 108 104 000 Old_age Always - 42
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 254 -
# 2 Extended offline Interrupted (host reset) 50% 254 -
# 3 Short offline Completed without error 00% 230 -
# 4 Extended offline Completed without error 00% 34 -
# 5 Selective offline Completed without error 00% 25 -
# 6 Selective offline Completed without error 00% 24 -
# 7 Selective offline Interrupted (host reset) 90% 24 -
# 8 Conveyance offline Completed without error 00% 24 -
# 9 Short offline Completed without error 00% 22 -
#10 Short offline Interrupted (host reset) 90% 22 -
#11 Extended offline Interrupted (host reset) 90% 22 -
#12 Short offline Completed without error 00% 21 -
#13 Short offline Completed without error 00% 21 -
#14 Extended offline Completed without error 00% 18 -
#15 Short offline Completed without error 00% 11 -
#16 Short offline Aborted by host 70% 11 -
#17 Extended offline Aborted by host 90% 11 -
#18 Extended offline Aborted by host 10% 11 -
#19 Extended offline Interrupted (host reset) 80% 1 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

A non-update for those following at home:

It has been over three (3) business days since I opened the support case. Usually this would be more than enough time for regular non-premium support of a reputable company to respond. Not a peep from dedicated premium support experts of WD. It is becoming clear to me what “dedicated”, “premium” and “24/7” stand for.

Dedicated premium support experts seem to be quite busy responding to negative reviews of WD30EFRX on Newegg (it is reassuring, really, to find out that DOA drive is not “intended experience with the product”).

A few questions for those that are still happy with their WD30EFRX purchase:

 - if you are using the drive(s) on Linux, what is your kernel version and SATA controller?

 - are your drives’ serial numbers in the range WD-WMC1T05* or WD-WMC1T06*?

 - are you using the drives in software RAID (mdadm)? (mine, the one with s/n WD-WMC1T05*, gets kicked out of RAID1 every few hours, even without file system activity)

I think the best course of action for your case is to:

#1: post the problem on a specialized forum (like the distribution forum)

#2: open a bug-report on the distribution bugtracker

In my opinion, it looks like a software problem (of course, I cannot be sure). A workaround might even exist, but to find out I would open a thread on the distribution forum first.

There are some things that you might test on your own, such as jumpering the drive (if possible/supported) to 3Gbps-1.5Gbps instead of 6Gbps-3Gbps. The Spread Spectrum Clocking (SSC) setting might also be worth a try.

You can check other stuff I tested in my old (solved long ago) Fedora bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=611350

rtguille:

Thank you for the suggestions.

Re. opening a bug with the distribution: this is not likely to amount to anything, as the problem is not specific to a particular distribution. After a bit of research I found plenty of reports of similar symptoms on various combinations of hardware and Linux distributions. This bug, with some very similar symptoms, has been open since Dec 2009 with no solution: https://bugzilla.redhat.com/show_bug.cgi?id=549981

I expect WD to have the resources to determine the compatibility of their hardware with Linux. I only want one thing: an authoritative response on whether the drives are supposed to work on Linux or not. Declaring the WD30EFRX incompatible with Linux (on some hardware) might be enough of an incentive (or not) for the Linux community to take a look at this problem.

Regarding trying 1.5Gbps, adding libata.force=1.5.Gbps kernel flag does nothing, drives still work at 3.0Gbps. Does anyone know if this document  http://wdc.custhelp.com/app/answers/detail/a_id/1271/related/1/session/L2F2LzEvdGltZS8xMzU0Mjg5MTIzL3NpZC9jVFBWZ0FjbA%3D%3D applies to WD30EFRX, and if so, which section?

Regarding obtaining an ‘authoritative response’, try: http://vger.kernel.org/vger-lists.html

and subscribe to the kernel mailing lists “linux-scsi” and “linux-ide”. Try “linux-scsi” first; at least you might get an answer from the maintainer of the specific Linux kernel subsystem. Please be very polite there, because it has very high visibility. I used ‘linux-usb’ on one occasion, and I did obtain a response to a weird and rare issue that I had no hope of solving.

The bug report https://bugzilla.redhat.com/show_bug.cgi?id=549981 may never be fixed: there are too many different configurations in the same report. One should open a specific bug report and let the maintainer flag it as a duplicate if that is the case. The reporters might have the same issue, but the trigger might be different; good luck diagnosing these. It will be very hard.

As for: http://wdc.custhelp.com/app/answers/detail/a_id/1271/related/1/session/L2F2LzEvdGltZS8xMzU0Mjg5MTIzL…

I don’t know. If the drive has jumpers, I would try it. Alternatively, contact WD and ask them; they will reply in a couple of days.

Cheers.

I have similar problems with an older mainboard. Has this problem been resolved?

ISTM that the entire 512-byte Identify Device information block has been shifted to the left by 1 word, i.e. the leading 2 bytes have been chopped off. The serial numbers probably have leading spaces, so they look OK. The corrupt firmware version (.00A80WD) appears to have picked up a trailing “WD” from the next field, which is the model string.

Device Model: C WD30EFRX-68AX9N0 ?
Device Model: WDC WD30EFRX-68AX9N0

Serial Number: WD-WMC1T05nnnnn
Serial Number: WD-WMC1T06nnnnn

Firmware Version: .00A80WD
Firmware Version: 80.00A80

LU WWN Device Id: 5 0014ee 0ae197a14
no WWN output for problem drive
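
The one-word shift can be reproduced on a simulated block. The sketch below is illustrative only: it assumes the usual Identify Device layout (serial number in words 10-19, firmware revision in words 23-26, model in words 27-46, two characters per 16-bit word) and, following the guess above, a serial number padded with leading spaces. All function names are made up for the example, and the unrelated words of the block are simply left at zero.

```python
def pack_ata_string(text, n_words):
    """Pack text into 16-bit words, two characters per word."""
    padded = text.ljust(n_words * 2)
    return [(ord(padded[i]) << 8) | ord(padded[i + 1])
            for i in range(0, n_words * 2, 2)]


def unpack_ata_string(words):
    """Inverse of pack_ata_string."""
    return "".join(chr(w >> 8) + chr(w & 0xFF) for w in words)


def build_identify(serial, firmware, model):
    """Build a simulated 256-word Identify Device block (other words zero)."""
    words = [0] * 256
    words[10:20] = pack_ata_string(serial.rjust(20), 10)  # assumed left-padded
    words[23:27] = pack_ata_string(firmware, 4)
    words[27:47] = pack_ata_string(model, 20)
    return words


def shift_left_one_word(words):
    """Drop the leading word, as the corrupted read appears to have done."""
    return words[1:] + [0]


def read_fields(words):
    """Decode the three string fields the way a tool reading the block would."""
    def clean(chunk):
        return unpack_ata_string(chunk).strip(" \x00")
    return {"serial": clean(words[10:20]),
            "firmware": clean(words[23:27]),
            "model": clean(words[27:47])}


if __name__ == "__main__":
    good = build_identify("WD-WMC1T0500586", "80.00A80", "WDC WD30EFRX-68AX9N0")
    print(read_fields(good))
    print(read_fields(shift_left_one_word(good)))
```

On the shifted block the firmware field reads `.00A80WD` and the model reads `C WD30EFRX-68AX9N0`, while the serial number survives because only padding was lost. In the real output the trailing `?` on the model presumably comes from whatever word follows the model field, which this sketch leaves at zero.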

To see the raw hex data, use …

smartctl -r ioctl,2 -i /dev/ice

Alternatively, CrystalDiskInfo organises the Identify Device information in a more readable format. Note that the data are little endian.

Here is an example for a WD20EARS:
http://www.users.on.net/~fzabkar/HDD/WD20EARS/NO_AAM.TXT