I have never had so much trouble with new HDs as with this set of 3 Red 3TB drives. Usually it is just install, partition and forget. Given my previous good experience, I, perhaps naively, find it hard to believe I got 3 bum drives, and wonder what is going on.
I have the 3 WD30EFRX drives in a Linux (3.6.6) machine, all connected to the same motherboard (Abit NF-M2 nView (C51PVMCP51)). There is also a 4th HD, an older Samsung HD154UI. All drives use the same type of SATA cable. 0 problems with the Samsung.
Ran badblocks -w on the drives, no problems reported.
The problems I am seeing are twofold:
- SMART problems: using smartmontools-6.0 I am getting all kinds of weirdness. For example, running the same command 3 times in a row gives 3 different outputs:
1)
smartctl -l scterc /dev/sdc
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
Unknown SCT Status format version 258, should be 2 or 3.
SCT (Get) Error Recovery Control command failed
2)
smartctl -l scterc /dev/sdc
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
3)
smartctl -l scterc /dev/sdc
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.6.6-1-ARCH] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Commands not supported
Then there are the weird messages from smartd in the logs:
Nov 25 15:32:59 charrm smartd[3823]: Device: /dev/sdc [SAT], unknown self-test status 0xa0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 1 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 3 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 4 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 5 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 7 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 9 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 10 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 11 = 0
Nov 25 16:03:00 charrm smartd[3823]: Device: /dev/sdc [SAT], same Attribute has different ID numbers: 12 = 0
And by running smartctl -x a few times in a row I can make the WD30EFRX drive throw errors like this:
Nov 25 15:22:03 charrm kernel: [701510.947013] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 25 15:22:03 charrm kernel: [701510.947013] ata6.00: failed command: READ LOG EXT
Nov 25 15:22:03 charrm kernel: [701510.947013] ata6.00: cmd 2f/00:06:03:00:00/00:00:00:00:00/00 tag 0 pio 3072 in
Nov 25 15:22:03 charrm kernel: [701510.947013] res 51/84:04:02:00:00/84:00:00:00:00/00 Emask 0x10 (ATA bus error)
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6.00: status: { DRDY ERR }
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6.00: error: { ICRC ABRT }
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6: hard resetting link
Nov 25 15:22:03 charrm kernel: [701510.950099] ata6: nv: skipping hardreset on occupied port
Nov 25 15:22:03 charrm kernel: [701511.416599] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 15:22:03 charrm kernel: [701511.423516] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
Nov 25 15:22:03 charrm kernel: [701511.423526] ata6.00: revalidation failed (errno=-5)
Nov 25 15:22:08 charrm kernel: [701516.416558] ata6: hard resetting link
Nov 25 15:22:08 charrm kernel: [701516.416568] ata6: nv: skipping hardreset on occupied port
Nov 25 15:22:09 charrm kernel: [701516.883269] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 15:22:09 charrm kernel: [701516.980346] ata6.00: configured for UDMA/133
Nov 25 15:22:09 charrm kernel: [701516.980406] ata6: EH complete
Nov 25 15:22:24 charrm kernel: [701532.431206] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 25 15:22:24 charrm kernel: [701532.431221] ata6.00: failed command: READ LOG EXT
Nov 25 15:22:24 charrm kernel: [701532.431235] ata6.00: cmd 2f/00:06:03:00:00/00:00:00:00:00/00 tag 0 pio 3072 in
Nov 25 15:22:24 charrm kernel: [701532.431235] res 51/84:02:04:00:00/84:00:00:00:00/00 Emask 0x10 (ATA bus error)
Nov 25 15:22:24 charrm kernel: [701532.431241] ata6.00: status: { DRDY ERR }
Nov 25 15:22:24 charrm kernel: [701532.431245] ata6.00: error: { ICRC ABRT }
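To double-check how I am reading the "res" line, here is a minimal Python sketch that decodes the first two bytes (status and error registers) into the same flag names the kernel prints. The bit positions are my understanding of the ATA register layout, and decode_res is just a helper name I made up, so treat this as an assumption rather than a reference.

```python
import re

# Flag names as they appear in the kernel messages; bit positions are my
# reading of the ATA status/error register layout (an assumption).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode_bits(value, table):
    """Return the names of all flags in `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_res(line):
    """Extract status/error from a libata 'res ss/ee:...' line and decode them."""
    m = re.search(r"res ([0-9a-fA-F]{2})/([0-9a-fA-F]{2}):", line)
    if m is None:
        return None
    status = int(m.group(1), 16)
    error = int(m.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


# Line copied from the log above.
line = "res 51/84:04:02:00:00/84:00:00:00:00/00 Emask 0x10 (ATA bus error)"
print(decode_res(line))
```

For the line above this gives DRDY ERR for the status and ICRC ABRT for the error, which agrees with what the kernel reports.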
Then there is this output from smartctl -x:
Warning! SATA Phy Event Counters error: invalid SMART checksum.
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0002 2 28 R_ERR response for data FIS
0x0003 2 28 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 0 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 706937 Vendor specific
The counters for 'R_ERR response' increase almost every time smartctl -x is executed.
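To track which counters move between runs, something like the following Python sketch could compare two saved copies of the counter table. RUN1 holds a few rows taken from the output above; RUN2 is a made-up second run purely for illustration, and the function names are my own.

```python
import re

# One table row: ID, size, value, description (whitespace separated).
ROW = re.compile(r"^\s*(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_counters(text):
    """Map counter ID -> (value, description) for every table row found."""
    counters = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            counters[m.group(1).lower()] = (int(m.group(3)), m.group(4))
    return counters


def increased(before, after):
    """List (id, old, new, description) for counters that grew between runs."""
    out = []
    for cid, (val, desc) in sorted(after.items()):
        old = before.get(cid, (0, desc))[0]
        if val > old:
            out.append((cid, old, val, desc))
    return out


# Rows from the smartctl -x output above.
RUN1 = """\
0x0001 2 2 Command failed due to ICRC error
0x0002 2 28 R_ERR response for data FIS
0x0003 2 28 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
"""

# Hypothetical later run, invented for this example only.
RUN2 = """\
0x0001 2 2 Command failed due to ICRC error
0x0002 2 29 R_ERR response for data FIS
0x0003 2 29 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
"""

for cid, old, new, desc in increased(parse_counters(RUN1), parse_counters(RUN2)):
    print(f"{cid}: {old} -> {new}  {desc}")
```

In practice the two texts would be captured smartctl -x outputs rather than inline strings.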
- Then there are plain I/O errors not related to any SMART activity, as far as I can see:
Nov 25 05:03:30 charrm kernel: [664398.333684] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 25 05:03:30 charrm kernel: [664398.333698] ata6.00: failed command: WRITE DMA EXT
Nov 25 05:03:30 charrm kernel: [664398.333712] ata6.00: cmd 35/00:00:00:44:c1/00:04:ec:00:00/e0 tag 0 dma 524288 out
Nov 25 05:03:30 charrm kernel: [664398.333712] res 40/00:00:02:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 25 05:03:30 charrm kernel: [664398.333718] ata6.00: status: { DRDY }
Nov 25 05:03:30 charrm kernel: [664398.333730] ata6: hard resetting link
Nov 25 05:03:30 charrm kernel: [664398.333734] ata6: nv: skipping hardreset on occupied port
Nov 25 05:03:31 charrm kernel: [664398.800302] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 05:03:31 charrm kernel: [664398.813998] ata6.00: configured for UDMA/133
Nov 25 05:03:31 charrm kernel: [664398.814065] sd 5:0:0:0: [sdd]
Nov 25 05:03:31 charrm kernel: [664398.814069] Result: hostbyte=0x00 driverbyte=0x08
Nov 25 05:03:31 charrm kernel: [664398.814075] sd 5:0:0:0: [sdd]
Nov 25 05:03:31 charrm kernel: [664398.814078] Sense Key : 0xb [current] [descriptor]
Nov 25 05:03:31 charrm kernel: [664398.814084] Descriptor sense data with sense descriptors (in hex):
Nov 25 05:03:31 charrm kernel: [664398.814088] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov 25 05:03:31 charrm kernel: [664398.814106] 00 00 00 01
Nov 25 05:03:31 charrm kernel: [664398.814115] sd 5:0:0:0: [sdd]
Nov 25 05:03:31 charrm kernel: [664398.814118] ASC=0x0 ASCQ=0x0
Nov 25 05:03:31 charrm kernel: [664398.814124] sd 5:0:0:0: [sdd] CDB:
Nov 25 05:03:31 charrm kernel: [664398.814126] cdb[0]=0x2a: 2a 00 ec c1 44 00 00 04 00 00
Nov 25 05:03:31 charrm kernel: [664398.814142] end_request: I/O error, dev sdd, sector 3972088832
Nov 25 05:03:31 charrm kernel: [664398.814189] ata6: EH complete
and, on another drive:
Nov 25 13:03:30 charrm kernel: [693198.277437] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 25 13:03:30 charrm kernel: [693198.277451] ata3.00: failed command: WRITE DMA EXT
Nov 25 13:03:30 charrm kernel: [693198.277466] ata3.00: cmd 35/00:08:d8:3f:7b/00:00:35:01:00/e0 tag 0 dma 4096 out
Nov 25 13:03:30 charrm kernel: [693198.277466] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 25 13:03:30 charrm kernel: [693198.277472] ata3.00: status: { DRDY }
Nov 25 13:03:30 charrm kernel: [693198.277484] ata3: hard resetting link
Nov 25 13:03:30 charrm kernel: [693198.277488] ata3: nv: skipping hardreset on occupied port
Nov 25 13:03:31 charrm kernel: [693198.744091] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 25 13:03:31 charrm kernel: [693198.758083] ata3.00: configured for UDMA/133
Nov 25 13:03:31 charrm kernel: [693198.758114] sd 2:0:0:0: [sda]
Nov 25 13:03:31 charrm kernel: [693198.758118] Result: hostbyte=0x00 driverbyte=0x08
Nov 25 13:03:31 charrm kernel: [693198.758124] sd 2:0:0:0: [sda]
Nov 25 13:03:31 charrm kernel: [693198.758127] Sense Key : 0xb [current] [descriptor]
Nov 25 13:03:31 charrm kernel: [693198.758133] Descriptor sense data with sense descriptors (in hex):
Nov 25 13:03:31 charrm kernel: [693198.758137] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov 25 13:03:31 charrm kernel: [693198.758155] 00 00 00 00
Nov 25 13:03:31 charrm kernel: [693198.758163] sd 2:0:0:0: [sda]
Nov 25 13:03:31 charrm kernel: [693198.758166] ASC=0x0 ASCQ=0x0
Nov 25 13:03:31 charrm kernel: [693198.758172] sd 2:0:0:0: [sda] CDB:
Nov 25 13:03:31 charrm kernel: [693198.758175] cdb[0]=0x8a: 8a 00 00 00 00 01 35 7b 3f d8 00 00 00 08 00 00
Nov 25 13:03:31 charrm kernel: [693198.758196] end_request: I/O error, dev sda, sector 5192237016
Nov 25 13:03:31 charrm kernel: [693198.758206] Buffer I/O error on device sda4, logical block 624615675
Nov 25 13:03:31 charrm kernel: [693198.758210] lost page write due to I/O error on sda4
Nov 25 13:03:31 charrm kernel: [693198.758238] ata3: EH complete
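As a sanity check that the logs are at least self-consistent, the CDB bytes can be decoded by hand. A minimal Python sketch, assuming the standard WRITE(10) and WRITE(16) layouts (opcodes 0x2a and 0x8a) and 512-byte logical blocks; cdb_lba is just a name I picked.

```python
def cdb_lba(cdb_hex):
    """Return (lba, block_count) for a WRITE(10) or WRITE(16) CDB given as hex."""
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    opcode = b[0]
    if opcode == 0x2A:
        # WRITE(10): 4-byte LBA at bytes 2..5, 2-byte length at bytes 7..8
        return int.from_bytes(b[2:6], "big"), int.from_bytes(b[7:9], "big")
    if opcode == 0x8A:
        # WRITE(16): 8-byte LBA at bytes 2..9, 4-byte length at bytes 10..13
        return int.from_bytes(b[2:10], "big"), int.from_bytes(b[10:14], "big")
    raise ValueError(f"unhandled opcode 0x{opcode:02x}")


# CDBs copied from the two kernel logs above.
for cdb in ("2a 00 ec c1 44 00 00 04 00 00",
            "8a 00 00 00 00 01 35 7b 3f d8 00 00 00 08 00 00"):
    lba, blocks = cdb_lba(cdb)
    # Byte count assumes 512-byte logical blocks.
    print(lba, blocks, blocks * 512)
```

The decoded LBAs (3972088832 and 5192237016) match the sectors in the end_request lines, and the lengths (1024 and 8 blocks) match the 524288 and 4096 byte DMA sizes, so the failing requests look like ordinary writes of very different sizes.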
So far I have seen these I/O errors on 2 out of 3 drives; the SMART weirdness is going strong on all of them.
So… have I just got 3 bad drives, and only need to replace them, or is there perhaps some compatibility problem and I need to forget this brand or model?
Any compatibility red flags for WD30EFRX, or rumours?
FWIW, I booted with the kernel flags sata_nv.swncq=0 libata.force=noncq, as this seemed to help with the I/O errors, but after running for a couple of days the problems returned.
Thanks in advance for any suggestions.