WDC WD3000FYYZ on QNAP: I/O error, sense_key=0x3, asc=0x11, ascq=0x4

alger · July 4, 2013, 8:51pm

Installed eight WD3000FYYZ drives in a QNAP TS-EC879U-RP (8-bay, Linux-based NAS) in RAID6. All brand new. During array initialization, several messages popped up:

Host: Drive7 read error corrected.
[Harddisk 7] I/O error, sense_key=0x3, asc=0x11, ascq=0x4, CDB=28 00 26 92 03 30 00 04 00 00 .
[Harddisk 7] medium error. Please run bad block scan on this drive or replace the drive if the error persists.

… about 200 “read error corrected” messages, plus four each of the I/O and “medium” errors. I ran the initialization three times, which may account for three of the I/O and “medium” errors landing on the same block. All four I/O error entries show the same numbers. Does that mean the error happens on the same physical block / sector of the hard drive?
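
For reference, here is a minimal sketch of how the CDB in that log line could be decoded, assuming it is a standard 10-byte SCSI READ(10) command (opcode 0x28). The function name `decode_read10` is made up for illustration; it is not part of any QNAP or WD tool.

```python
def decode_read10(cdb_hex):
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes.

    Assumes the standard READ(10) layout: opcode 0x28 in byte 0,
    a big-endian starting LBA in bytes 2-5, and a big-endian
    transfer length (in logical blocks) in bytes 7-8.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks requested
    return lba, blocks


# CDB copied from the "[Harddisk 7] I/O error" log entry above.
lba, blocks = decode_read10("28 00 26 92 03 30 00 04 00 00")
print(lba, blocks)  # 647103280 1024
```

Read this way, identical CDBs mean the same read request failed each time: 1024 blocks starting at LBA 647103280. The log line alone does not say which sector inside that range was unreadable, only that the bad spot lies somewhere within it.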

According to some research, the I/O error above is RAID management reporting a URE but with failed re-allocation, “Unrecovered Read Error, Auto Reallocation Failed”. If I am getting it right, that means the drive (or the RAID brain) couldn’t write the data to a spare sector. Yet the initialization continued and eventually ended successfully. So I am a little puzzled as to what it all means.
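
The sense values can be looked up the same way. The sketch below hard-codes only a few entries from the standard SCSI sense-key and ASC/ASCQ tables (nowhere near the full lists), and `describe_sense` is a hypothetical helper name used just for this example.

```python
# Small excerpts from the SCSI sense-key and ASC/ASCQ tables; not the full lists.
SENSE_KEYS = {
    0x1: "RECOVERED ERROR",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
}
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED",
}


def describe_sense(sense_key, asc, ascq):
    """Return a readable description of a sense_key / asc / ascq triple."""
    key_text = SENSE_KEYS.get(sense_key, "unknown sense key 0x%x" % sense_key)
    code_text = ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ 0x%02x/0x%02x" % (asc, ascq))
    return key_text + ": " + code_text


# Values from the "[Harddisk 7] I/O error" log entry.
print(describe_sense(0x3, 0x11, 0x4))
```

For the logged triple this gives a medium error with an unrecovered read and a failed automatic reallocation, which matches the “medium error” message the NAS printed on the next line.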

Ran a “short” (5 minutes) SMART test - nothing. “Raw_Read_Error_Rate” - 0.

Running a “bad block scan”. This will take about 7 hours…

Questions:

What does this all mean? Is the drive bad or is it not? If it is bad, why does “Raw_Read_Error_Rate” show zero?
Are these errors enough to want / need to replace the disk? Will they be enough for WDC to issue an RMA?
If not, what tests do I need to run to determine that the drive is bad, that would be sufficient for WDC purposes?

Thanks.

John012 · July 5, 2013, 9:52pm

You could try connecting the drive to a computer as a secondary drive and testing it with Data Lifeguard Diagnostics.

How to test a drive for problems using Data Lifeguard Diagnostics for Windows

http://wdc.custhelp.com/app/answers/detail/a_id/940

alger · July 5, 2013, 10:47pm

Thanks John,

I am running a “complete” SMART test on the drive in the QNAP (no error messages so far); it is 70% done after four hours. If no problem is found (NPF), I will also run WinDLG on it. If that is still NPF, what’s next? Where did those error messages come from?

Also, what’s the purpose of running the “extended test” if a bad block scan and “Quick Test” found nothing? What does the extended test do, exactly, that the first two don’t? The extended test will take about seven hours - I really don’t have the time for it, and if I have to run it, I’d like to have a better idea what it’s for.

Thanks.

alger · July 6, 2013, 8:51am

WinDLG Extended test passed:

Test Option:	EXTENDED TEST
Model Number:	WDC WD3000FYYZ-01UL1B0
Unit Serial Number:	WD- ******** 3008
Firmware Number:	01.01K01
Capacity:	3000.59 GB
SMART Status:	PASS
Test Result:	PASS
Test Time:	01:43:21, July 06, 2013

Will try re-silvering the RAID set with that disk in a different bay, and see what happens.

David64 · July 6, 2013, 11:12am

Unfortunately, I have had issues with certain RAID controllers (HP) on Linux before, sometimes to the point of crashing the Linux kernel. If the drivers for these RAID controllers aren’t really stable and properly tested, that might actually be the cause of the issues you are experiencing.

So you would need to thoroughly test this with a non-Linux operating system such as Windows to confirm the HDDs are working properly. You might as well first check (-> Google) whether others have similar issues on Linux with that SATA/RAID controller. It might save you trouble and time!

alger · July 7, 2013, 1:38am

Thanks David, that’s the standard operating procedure, already went through it (and then some) with no good results.