WDC WD5000AAKS won’t reallocate bad sectors


#1

Hi.  My computer dual-boots Windows XP and the latest openSUSE Linux.  Recently, I saw an error message in the system log saying I had 1 unreadable (pending) sector and 1 offline uncorrectable sector.  I read what I could find online, including SMART’s homepage, and decided to force my hard drive to reallocate the sector by writing to it.

I located the sector by running a short self-test (smartctl -t short /dev/sda) and reading the LBA of the first error from the self-test log.  Next I determined that the sector was in my Windows NTFS partition (mounted with ntfs-3g).  I confirmed that it didn’t belong to any file (this is irrelevant to the discussion).  Finally I used dd to write to the sector.
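For anyone following along, here is a minimal sketch of the arithmetic behind "determining which partition and block the sector is in". It assumes 512-byte logical sectors and a 4096-byte filesystem block; the partition start of 63 used in the example is a hypothetical value (my real partition table is not shown in this thread), and `locate` and `dd_write_command` are illustrative helper names. The dd command string is only one plausible form of the write, not necessarily the exact command I ran, and the helper only builds the string without executing anything.

```python
SECTOR_SIZE = 512  # assumed logical sector size of the drive


def locate(lba, part_start, fs_block_size=4096):
    """Map an absolute LBA to a position inside a partition.

    part_start is the partition's first sector, as listed by a tool
    such as fdisk in sector units (hypothetical input here).
    """
    rel = lba - part_start
    if rel < 0:
        raise ValueError("LBA lies before the start of the partition")
    byte_offset = rel * SECTOR_SIZE
    return {
        "sector_in_partition": rel,
        "fs_block": byte_offset // fs_block_size,
        "offset_in_block": byte_offset % fs_block_size,
    }


def dd_write_command(device, lba):
    """Build (but never run) a dd command that zeroes one sector."""
    return (
        f"dd if=/dev/zero of={device} "
        f"bs={SECTOR_SIZE} count=1 seek={lba}"
    )


if __name__ == "__main__":
    # 22650227 is the failing LBA from the logs in this thread;
    # 63 is a made-up partition start used only for illustration.
    print(locate(22650227, 63))
    print(dd_write_command("/dev/sda", 22650227))
```

Double-check the device name and the seek value before running a command like this for real, because it overwrites whatever is stored at that sector.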

The command returned an error message.  I checked the system log, which had recorded the error; the log entry said that auto reallocate failed.  I checked Reallocated_Sector_Ct in smartctl -A and it reads 0, so I know I haven’t used up the reserve sectors.  Why didn’t the hard drive’s firmware remap the bad sector?

The SMART statistics and the system error log are posted in the next two posts.


#2

smartctl 5.39 2009-08-08 r2872~ [x86_64-unknown-linux-gnu] (openSUSE RPM)

Copyright © 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Caviar Blue Serial ATA family

Device Model:     WDC WD5000AAKS-22TMA0

Serial Number:    WD-WCAPW2092043

Firmware Version: 12.01C01

User Capacity:    500,107,862,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   7

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Wed May  5 12:27:59 2010 PDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x85) Offline data collection activity

was aborted by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 121) The previous self-test completed having

the read element of the test failed.

Total time to complete Offline 

data collection: (12000) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine 

recommended polling time: (   2) minutes.

Extended self-test routine

recommended polling time: ( 150) minutes.

Conveyance self-test routine

recommended polling time: (   6) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0003   168   165   021    Pre-fail  Always       -       6566

  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2485

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9731

 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2470

192 Power-Off_Retract_Count 0x0032   198   198   000    Old_age   Always       -       1956

193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2485

194 Temperature_Celsius     0x0022   107   101   000    Old_age   Always       -       43

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1

198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       1

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1

ATA Error Count: 1343 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It “wraps” after 49.710 days.

Error 1343 occurred at disk power-on lifetime: 9719 hours (404 days + 23 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 73 9d 59 e1  Error: UNC at LBA = 0x01599d73 = 22650227

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 70 9d 59 01 00   6d+06:48:15.915  READ DMA

  27 00 00 00 00 00 00 00   6d+06:48:15.915  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 00 00   6d+06:48:15.912  IDENTIFY DEVICE

  ef 03 46 00 00 00 00 00   6d+06:48:15.909  SET FEATURES [Set transfer mode]

  27 00 00 00 00 00 00 00   6d+06:48:15.909  READ NATIVE MAX ADDRESS EXT

Error 1342 occurred at disk power-on lifetime: 9719 hours (404 days + 23 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 73 9d 59 e1  Error: UNC at LBA = 0x01599d73 = 22650227

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 70 9d 59 01 00   6d+06:48:13.292  READ DMA

  27 00 00 00 00 00 00 00   6d+06:48:13.292  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 00 00   6d+06:48:13.289  IDENTIFY DEVICE

  ef 03 46 00 00 00 00 00   6d+06:48:13.286  SET FEATURES [Set transfer mode]

  27 00 00 00 00 00 00 00   6d+06:48:13.286  READ NATIVE MAX ADDRESS EXT

Error 1341 occurred at disk power-on lifetime: 9719 hours (404 days + 23 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 73 9d 59 e1  Error: UNC at LBA = 0x01599d73 = 22650227

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 70 9d 59 01 00   6d+06:48:10.670  READ DMA

  27 00 00 00 00 00 00 00   6d+06:48:10.669  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 00 00   6d+06:48:10.666  IDENTIFY DEVICE

  ef 03 46 00 00 00 00 00   6d+06:48:10.664  SET FEATURES [Set transfer mode]

  27 00 00 00 00 00 00 00   6d+06:48:10.663  READ NATIVE MAX ADDRESS EXT

Error 1340 occurred at disk power-on lifetime: 9719 hours (404 days + 23 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 73 9d 59 e1  Error: UNC at LBA = 0x01599d73 = 22650227

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 70 9d 59 01 00   6d+06:48:08.043  READ DMA

  27 00 00 00 00 00 00 00   6d+06:48:08.042  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 00 00   6d+06:48:08.039  IDENTIFY DEVICE

  ef 03 46 00 00 00 00 00   6d+06:48:08.037  SET FEATURES [Set transfer mode]

  27 00 00 00 00 00 00 00   6d+06:48:08.036  READ NATIVE MAX ADDRESS EXT

Error 1339 occurred at disk power-on lifetime: 9719 hours (404 days + 23 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 73 9d 59 e1  Error: UNC at LBA = 0x01599d73 = 22650227

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 70 9d 59 01 00   6d+06:48:05.276  READ DMA

  27 00 00 00 00 00 00 00   6d+06:48:05.276  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 00 00   6d+06:48:05.272  IDENTIFY DEVICE

  ef 03 46 00 00 00 00 00   6d+06:48:05.269  SET FEATURES [Set transfer mode]

  27 00 00 00 00 00 00 00   6d+06:48:05.260  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

1  Short offline       Completed: read failure       90%      9719         22650227

2  Short offline       Completed: read failure       10%      9713         22650227

3  Short offline       Aborted by host               10%      9713         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
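
The four attributes that matter for this question (5, 196, 197 and 198) are easy to lose in the dump above, so here is a rough Python sketch that pulls the raw values out of the attribute table. The regular expression is my own guess at the column layout as printed by this smartctl version, and `parse_attributes` is an illustrative name, not part of smartmontools. The sample rows are copied from the table above.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute id: {"name", "updated", "raw"}} for matching rows."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "updated": m.group(7),   # "Always" or "Offline"
                "raw": int(m.group(9)),  # leading integer of RAW_VALUE
            }
    return attrs


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    for attr_id, info in sorted(parse_attributes(SAMPLE).items()):
        print(attr_id, info["name"], info["updated"], info["raw"])
```

Note that the UPDATED column reads "Offline" for attribute 198, while 5, 196 and 197 read "Always".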


#3

May  5 00:38:43 mycomputer kernel: [85894.786104] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

May  5 00:38:43 mycomputer kernel: [85894.786129] ata1.00: BMDMA stat 0x24

May  5 00:38:43 mycomputer kernel: [85894.786153] ata1.00: cmd c8/00:08:70:9d:59/00:00:00:00:00/e1 tag 0 dma 4096 in

May  5 00:38:43 mycomputer kernel: [85894.786157]          res 51/40:00:73:9d:59/00:00:00:00:00/e1 Emask 0x9 (media error)

May  5 00:38:43 mycomputer kernel: [85894.786190] ata1.00: status: { DRDY ERR }

May  5 00:38:43 mycomputer kernel: [85894.786202] ata1.00: error: { UNC }

May  5 00:38:43 mycomputer kernel: [85894.795366] ata1.00: configured for UDMA/133

May  5 00:38:43 mycomputer kernel: [85894.795406] ata1: EH complete

May  5 00:38:46 mycomputer kernel: [85897.410921] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

May  5 00:38:46 mycomputer kernel: [85897.410950] ata1.00: BMDMA stat 0x24

May  5 00:38:46 mycomputer kernel: [85897.410977] ata1.00: cmd c8/00:08:70:9d:59/00:00:00:00:00/e1 tag 0 dma 4096 in

May  5 00:38:46 mycomputer kernel: [85897.410980]          res 51/40:00:73:9d:59/00:00:00:00:00/e1 Emask 0x9 (media error)

May  5 00:38:46 mycomputer kernel: [85897.411041] ata1.00: status: { DRDY ERR }

May  5 00:38:46 mycomputer kernel: [85897.411054] ata1.00: error: { UNC }

May  5 00:38:46 mycomputer kernel: [85897.420357] ata1.00: configured for UDMA/133

May  5 00:38:46 mycomputer kernel: [85897.420397] sd 0:0:0:0: [sda] Unhandled sense code

May  5 00:38:46 mycomputer kernel: [85897.420411] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

May  5 00:38:46 mycomputer kernel: [85897.420432] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]

May  5 00:38:46 mycomputer kernel: [85897.420458] Descriptor sense data with sense descriptors (in hex):

May  5 00:38:46 mycomputer kernel: [85897.420474]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 

May  5 00:38:46 mycomputer kernel: [85897.420514]         01 59 9d 73 

May  5 00:38:46 mycomputer kernel: [85897.420532] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed

May  5 00:38:46 mycomputer kernel: [85897.420561] end_request: I/O error, dev sda, sector 22650227

May  5 00:38:46 mycomputer kernel: [85897.420615] ata1: EH complete

May  5 00:47:22 mycomputer smartd[2239]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

May  5 00:47:22 mycomputer smartd[2239]: Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

May  5 00:47:22 mycomputer smartd[2239]: Device: /dev/sda [SAT], previous self-test completed with error (read test element)

May  5 00:47:22 mycomputer smartd[2239]: Device: /dev/sda [SAT], new Self-Test Log error at hour timestamp 9719

May  5 00:47:22 mycomputer smartd[2239]: Device: /dev/sda [SAT], ATA error count increased from 1277 to 1343


#4

I downloaded Western Digital’s hard drive repair software and ran the extended test.  The result was “Errors Repaired”, exit code 223.  When I rebooted into Linux and checked the SMART statistics, there were now zero current pending sectors but still 1 offline uncorrectable.

What is odd is that there are also zero reallocated sectors.  I ran smartctl -t short and it completed without error; then I used dd to read the previously troublesome sector, and it read fine (it contained all zeros).
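
The read-back check can also be scripted. This is only a sketch under the assumption of 512-byte logical sectors; `read_sector` and `is_all_zero` are illustrative names, and the path could be a whole-disk device such as /dev/sda (root needed) or an ordinary image file. It goes through the normal page cache, so it is not a substitute for a SMART self-test.

```python
SECTOR_SIZE = 512  # assumed logical sector size


def read_sector(path, lba):
    """Read exactly one sector at the given LBA from a device or image file."""
    with open(path, "rb", buffering=0) as f:
        f.seek(lba * SECTOR_SIZE)
        data = f.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise IOError(f"short read at LBA {lba}: got {len(data)} bytes")
    return data


def is_all_zero(data):
    """True if every byte in the buffer is 0x00."""
    return not any(data)


if __name__ == "__main__":
    import sys

    # Usage: python check_sector.py /dev/sda 22650227
    path, lba = sys.argv[1], int(sys.argv[2])
    sector = read_sector(path, lba)
    print("all zeros" if is_all_zero(sector) else "contains data")
```

An unreadable sector would surface here as an I/O error (an OSError in Python) rather than as returned data.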

I now have 3 questions:  (1) What did the disk repair software do?  Did it remap the bad sector, or did it physically fix it somehow?  (2) What does it mean that the drive still has 1 offline uncorrectable sector?  (3) Why didn’t the drive repair itself when I tried to write to the bad sector earlier?