WD Red 4TB HDDs - strange errors in QNAP NAS TS-653A

Hello everybody!

I'm new to this community, but I need help from a guru or the WD technicians.

A few months ago I bought a QNAP TS-653A NAS with four 4TB WD Red NAS drives configured as a single RAID5 volume.
The drive model is WD40EFRX-68WT0N0, firmware 82.00A82.
The NAS has since been upgraded to the latest firmware: 4.2.4 of 13/03/2017.

I have recently been having problems with the NAS: disks 1 and 2 of my RAID5 appear to be failing, and the NAS software marks them as Abnormal and puts the RAID5 into degraded/read-only mode.
This has now happened three times: the first two times with both drives, the last time with HDD 2 only.

I have run some tests, and afterwards the drives report as good, so I don't understand where the problem lies.
In fact the test passes, and the scan run from the QNAP console also looks OK.
I am now rebuilding the array to recover from read-only mode, since all four drives are green and in “good” status (at least, that is what the STORAGE - Disks/VJBOD menu says).

Here is the NAS log:

[Pool 1] RAID Group 1 is in degraded and read-only mode.
[Bad Stripe Log]: Host Drive 1 sector(78291264, 8, 1) error found.
[Bad Block Log]: Host Drive 1 sector(78291264, 8, 1) error found.
[Volume DataVol1, Pool 1] Host: Disk 2 failed.
[Bad Block Log]: Host Drive 2 sector(78291264, 8, 1) error found.
A drive has been detected but is inaccessible. Please check it for faults.
A drive has been detected but is inaccessible. Please check it for faults.
Host: Disk 2 Read I/O error, sense_key=0x0, asc=0x0, ascq=0x0, CDB=88 00 00 00 00 00 00 08 00 40 00 00 00 08 00 00 .
Host: Disk 2 unplugged.
Hot-remove Host: Disk 2 failed.
Host: Disk 1 unplugged.
Hot-remove Host: Disk 1 failed.
Host: Disk 2 Read I/O error, sense_key=0x0, asc=0x0, ascq=0x0, CDB=88 00 00 00 00 00 00 09 80 b0 00 00 00 08 00 00 .
[…]
Host: Disk 2 pd error cleared.
[Host: Disk 2] Bad Blocks Scan completed.
Host: Disk 1 pd error cleared.
[Host: Disk 1] Bad Blocks Scan completed.
[Antivirus] Virus definitions updated.
[Host: Disk 1] Started scanning bad blocks.
[Host: Disk 2] Started scanning bad blocks.
[…]
A drive has been detected but is inaccessible. Please check it for faults.
[Volume DataVol1, Pool 1] Host: Disk 1 failed.
[Volume DataVol1, Pool 1] Host: Disk 2 failed.
A drive has been detected but is inaccessible. Please check it for faults.
Host: Disk 1 Read I/O error, sense_key=0x0, asc=0x0, ascq=0x0, CDB=88 00 00 00 00 00 00 08 00 40 00 00 00 08 00 00 .
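For reference, the CDB in those Read I/O error lines is a standard SCSI READ(16) command (opcode 0x88), so the starting LBA and block count can be decoded from it. A quick sketch (the helper name is mine):

```python
# Decode a SCSI READ(16) CDB as it appears in the QTS log.
# Per the SCSI command layout: byte 0 is the opcode, bytes 2-9 the
# starting LBA (big-endian), bytes 10-13 the transfer length in blocks.
def decode_read16(cdb_hex: str):
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    assert cdb[0] == 0x88, "not a READ(16) command"
    lba = int.from_bytes(cdb[2:10], "big")       # starting logical block address
    nblocks = int.from_bytes(cdb[10:14], "big")  # number of blocks to read
    return lba, nblocks

print(decode_read16("88 00 00 00 00 00 00 08 00 40 00 00 00 08 00 00"))
# -> (524352, 8): an 8-block read starting at LBA 524352
```

Note also that sense_key=0x0, asc=0x0, ascq=0x0 means the drive returned no sense data at all, which fits a drive dropping off the bus (the "unplugged" messages) rather than reporting a genuine media error.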

As you can see, if I do a complete scan of the drive the error condition is cleared and the RAID5 can be recovered… but after one day the problem comes back, now only with disk 2.
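While the array rebuilds, you can also SSH into the NAS and watch the md rebuild progress in /proc/mdstat. A small sketch parsing a sample status line (the device names and figures here are illustrative, not taken from my NAS):

```python
import re

# Example /proc/mdstat excerpt for a 4-disk RAID5 mid-rebuild
# (illustrative values; read the real thing with: cat /proc/mdstat).
sample = """md1 : active raid5 sda3[0] sdb3[1] sdc3[2] sdd3[3]
      11691190848 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [=====>...............]  recovery = 27.4% (1071126528/3897063616) finish=212.4min speed=221665K/sec"""

def rebuild_status(mdstat_text):
    """Return (percent done, minutes remaining), or None if no rebuild is running."""
    m = re.search(r"recovery = ([\d.]+)%.*finish=([\d.]+)min", mdstat_text)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(rebuild_status(sample))  # -> (27.4, 212.4)
```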

Probably I need to replace these two drives, but it is the first time I have had two WD Red NAS drives fault at the same time! I just want to be sure they really need replacing… they should be under warranty, but the NAS will probably lose all its data if I do something wrong during the replacement operation.

Can you help me confirm whether they need to be replaced?

Thanks,
Skyeagle

Dear dsWv42,

Below is the output of the QNAP Diagnostic Tool v1.1.1 (HDD Analyzer subsection), with the SMART report:

Model : TS-653A
Firmware : 4.2.4 (20170313)
NAS : Q161I05043

==========[ BAY 1, WDCWD40EFRX-68WT0N03815447, WD-WCC4E2ZZ76RK ]
ID Description RawValue Value WorstValue Threshold Status
001 Raw_Read_Error_Rate 0x0 200 200 051 Good
003 Spin_Up_Time 0x1ec3 182 177 021 Good
004 Start_Stop_Count 0x35a 100 100 000 Good
005 Retired_Block_Count 0x0 200 200 140 Good
007 Seek_Error_Rate 0x0 100 253 000 Good
009 Power-On_Hours 0xc2f 096 096 000 Good
010 Spin_Retry_Count 0x0 100 100 000 Good
011 Calibration_Retry_Count 0x0 100 253 000 Good
012 Power_Cycle_Count 0x2b 100 100 000 Good
192 Power-Off_Retract_Count 0x19 200 200 000 Good
193 Load_Cycle_Count 0x101e 199 199 000 Good
194 Temperature_Celsius 0x1e 122 114 000 Good
196 Reallocated_Event_Count 0x0 200 200 000 Good
197 Current_Pending_Sector 0x0 200 200 000 Good
198 Uncorrectable_Sector_Count 0x0 100 253 000 Good
199 SATA_R-Error_Count 0x0 200 200 000 Good
200 Multi_Zone_Error_Rate 0x0 100 253 000 Good

==========[ BAY 2, WDCWD40EFRX-68WT0N03815447, WD-WCC4E6ZZPZKF ]
ID Description RawValue Value WorstValue Threshold Status
001 Raw_Read_Error_Rate 0x0 200 200 051 Good
003 Spin_Up_Time 0x1eaa 183 178 021 Good
004 Start_Stop_Count 0x345 100 100 000 Good
005 Retired_Block_Count 0x0 200 200 140 Good
007 Seek_Error_Rate 0x0 200 200 000 Good
009 Power-On_Hours 0xc2f 096 096 000 Good
010 Spin_Retry_Count 0x0 100 100 000 Good
011 Calibration_Retry_Count 0x0 100 253 000 Good
012 Power_Cycle_Count 0x2b 100 100 000 Good
192 Power-Off_Retract_Count 0x19 200 200 000 Good
193 Load_Cycle_Count 0xfc6 199 199 000 Good
194 Temperature_Celsius 0x1d 123 114 000 Good
196 Reallocated_Event_Count 0x0 200 200 000 Good
197 Current_Pending_Sector 0x0 200 200 000 Good
198 Uncorrectable_Sector_Count 0x0 100 253 000 Good
199 SATA_R-Error_Count 0x0 200 200 000 Good
200 Multi_Zone_Error_Rate 0x0 100 253 000 Good

==========[ BAY 3, WDCWD40EFRX-68WT0N03815447, WD-WCC4E4JHAJDT ]
ID Description RawValue Value WorstValue Threshold Status
001 Raw_Read_Error_Rate 0x0 200 200 051 Good
003 Spin_Up_Time 0x1ef5 181 177 021 Good
004 Start_Stop_Count 0x34e 100 100 000 Good
005 Retired_Block_Count 0x0 200 200 140 Good
007 Seek_Error_Rate 0x0 100 253 000 Good
009 Power-On_Hours 0xc52 096 096 000 Good
010 Spin_Retry_Count 0x0 100 100 000 Good
011 Calibration_Retry_Count 0x0 100 253 000 Good
012 Power_Cycle_Count 0x2b 100 100 000 Good
192 Power-Off_Retract_Count 0x1a 200 200 000 Good
193 Load_Cycle_Count 0xee4 199 199 000 Good
194 Temperature_Celsius 0x1f 121 113 000 Good
196 Reallocated_Event_Count 0x0 200 200 000 Good
197 Current_Pending_Sector 0x0 200 200 000 Good
198 Uncorrectable_Sector_Count 0x0 100 253 000 Good
199 SATA_R-Error_Count 0x0 200 200 000 Good
200 Multi_Zone_Error_Rate 0x0 100 253 000 Good

==========[ BAY 4, WDCWD40EFRX-68WT0N03815447, WD-WCC4E6ZZPATN ]
ID Description RawValue Value WorstValue Threshold Status
001 Raw_Read_Error_Rate 0x0 200 200 051 Good
003 Spin_Up_Time 0x1f0e 181 176 021 Good
004 Start_Stop_Count 0x34b 100 100 000 Good
005 Retired_Block_Count 0x0 200 200 140 Good
007 Seek_Error_Rate 0x0 100 253 000 Good
009 Power-On_Hours 0xc51 096 096 000 Good
010 Spin_Retry_Count 0x0 100 100 000 Good
011 Calibration_Retry_Count 0x0 100 253 000 Good
012 Power_Cycle_Count 0x2b 100 100 000 Good
192 Power-Off_Retract_Count 0x1a 200 200 000 Good
193 Load_Cycle_Count 0xece 199 199 000 Good
194 Temperature_Celsius 0x1e 122 115 000 Good
196 Reallocated_Event_Count 0x0 200 200 000 Good
197 Current_Pending_Sector 0x0 200 200 000 Good
198 Uncorrectable_Sector_Count 0x0 100 253 000 Good
199 SATA_R-Error_Count 0x0 200 200 000 Good
200 Multi_Zone_Error_Rate 0x0 100 253 000 Good
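One note on reading those tables: the HDD Analyzer prints RawValue in hexadecimal. Converting the health-relevant BAY 1 values to decimal (attribute names copied from the report above):

```python
# RawValue column from the BAY 1 table above, converted hex -> decimal.
raw = {
    "Power-On_Hours": "0xc2f",
    "Power_Cycle_Count": "0x2b",
    "Load_Cycle_Count": "0x101e",
    "Temperature_Celsius": "0x1e",
    "Reallocated_Event_Count": "0x0",
    "Current_Pending_Sector": "0x0",
}
decoded = {name: int(value, 16) for name, value in raw.items()}
for name, value in decoded.items():
    print(f"{name}: {value}")
# Power-On_Hours comes out to 3119 hours (~130 days), temperature 30 C,
# and the reallocation/pending counters are genuinely zero.
```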

As I wrote, the bad disks were numbers 1 and 2.
This report was taken after rebuilding the RAID volume, following the successful scan of HDD 2.

I hope this helps in understanding what is happening.

Regards,
SkyEagle

Greetings Sky Eagle,
I have the same drives (WD40EFRX-68WT0N0 82.00A82) in my QNAP, and one of my disks is behaving in the same manner. I did find a fault in the attribute titled:

197 Current_Pending_Sector 0x0 200 200 000

I am curious: what was your solution? Did you purchase a new drive, or did you find a way to correct the issue on your QNAP?

Same here on HDD-1 (out of 4) on a TS453-PRO (FW: 4.3.4.0435).

Removed the drive and tested it with ‘HD Sentinel Pro’.

The abnormal status reported by QTS (and subsequently by HD Sentinel) is a CRC, or 'Cyclic Redundancy Check', error.

This seems to indicate that the checksum for that cluster/sector (or in my case, two sectors) doesn't match the data stored in that sector. But even more bizarrely, the offending sectors are 'moving'.
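For what it's worth, in SMART terms CRC errors are counted by attribute 199 (listed as SATA_R-Error_Count in the reports above) and usually indicate corruption on the SATA link (cable or backplane) rather than on the platters, whereas attributes 5/196/197/198 track media problems. A rough triage heuristic (my own sketch, not an official tool):

```python
# Rough SMART triage: nonzero media counters point at the platters,
# nonzero CRC counters point at the cable/backplane. The IDs are
# standard SMART attribute numbers; the rule itself is my heuristic.
MEDIA_IDS = {5, 196, 197, 198}  # reallocated/pending/uncorrectable sectors
LINK_IDS = {199}                # UDMA/SATA CRC error count

def triage(attrs):
    """attrs maps SMART attribute ID -> raw count; returns 'media', 'link' or 'clean'."""
    if any(attrs.get(i, 0) > 0 for i in MEDIA_IDS):
        return "media"
    if any(attrs.get(i, 0) > 0 for i in LINK_IDS):
        return "link"
    return "clean"

print(triage({199: 12}))         # nonzero CRC count only -> 'link'
print(triage({197: 0, 199: 0}))  # everything zero -> 'clean'
```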

With repeated tests in HD Sentinel, some clusters in the problematic zone(s) get flagged as pending; on a subsequent run they pass and are removed from the list of 'pending' sectors, only to be replaced by other clusters within the same sector(s).

It's also worth pointing out that at no point does any sector get flagged as outright 'BAD'. This is probably because the read errors occur on random clusters within the problematic sector and are not consistent.

The upshot is that without the drive outright marking the problematic clusters/sectors as 'BAD', an RMA is impossible. To the cynic in me, this suggests that WD may have put something into the NASware 3.0 firmware that resists marking clusters bad, to prevent people from RMA'ing drives over what could be seen as a 'minor' issue.

However, after discovering this issue, and given that these drives run 24/7 (you know, as advertised!), I would say the drive exhibiting issues cannot be trusted with my valuable data, outright bad sectors or not.

I've bought a replacement (WD40EFRX-68N32N0) and will keep testing the problem drive until it either marks the suspect sectors as bad (so I can RMA it) or the problem resolves itself; the remaining three original WD40EFRX-68WT0N0 drives are still in the NAS. Anxious times ahead, no doubt.

I am also having a similar-sounding problem on a QNAP TS-563 (5-bay) loaded with WD 4TB Red drives (WD40EFRX): bays 4 and 5 report the drives as unplugged, or in error. I switch the drives around and they work fine in bays 1-3. I've tried six different WD drives; all work in bays 1-3, none work in bays 4 and 5. I've had no trouble with these drives when installed in a Windows PC. I sent the server to QNAP; they replaced the backplane, tested it for two days, and sent it back "working". Ten minutes after I plug in the WD drives, I get the error again. So the server is working fine and the drives are fine… what is the problem? Sorry, no solution.

I have QNAP TS853A disks 2 - 4 -6 get ejected regardless of what drive is in (as I have changed them) and QNAP just advise more tests! this unit is only a few months old and has been doing this after just a few weeks. That would mean 3 out of my 8 disks would have to go bad get rejected then they are tested and found good, only to be repeated, over a lengthy period of time. My disks are WD80EFZX-68UW8N0. Question, is the controller in WD disks compatible with QNAP disk error checking? I have stopped QNAP disk checking and this seems to have sorted the problem although I need a longer test period really.