Constant disk activity, power only, not SMR

dude · December 28, 2023, 11:59am

I have a couple of older WD drives (12+ years) that make a constant disk activity noise, even when there is no SATA cable connected. It sounds like an old-fashioned cash register: tick tick tick brrrr, tick tick tick brrrr. The noise repeats every 1 to 2 seconds and causes the drives to get hot, even when idle.

The drives are not SMR as far as I know; SMR probably didn’t even exist when these drives were designed. The drives have some read errors, but no pending or reallocated sectors. I’ve already tried all sorts of surface tests using Hard Disk Sentinel, including re-initialize (took 18 hours), without finding any issues.

Again, the noise occurs even with no SATA cable connected, power only, which tells me the problem is internal to the drive and not caused by the PC BIOS or the host system. The drives used to be quiet. My best guess is a firmware issue, or that the drives are wearing out.

Affected are: WD20EADS WD2001FASS

Any ideas?

Kiriakos-GR · December 28, 2023, 9:18pm

Please define “hot”: what is the actual HDD temperature?
Also tell us your room temperature over the past 24 hours.

Powering an HDD without a SATA connection is an abnormal condition for any user.
Not even the WD factory would test a drive the way you did.

dude · December 29, 2023, 1:28am

I still haven’t resolved the problem, but I may have discovered the reason.

There’s a small DOS program called WDIDLE3.EXE that lets you read, set, or disable the drive’s idle timeout. When the idle timer expires, I guess the drive is supposed to park the heads.

The WD Black has the idle timer disabled. The WD Green has it set to 30 seconds. As soon as I read the current timer setting (wdidle3 /r), the seeking/reading activity stops and the drive stays quiet.

Unfortunately, setting or disabling the timer makes no difference, and after a system restart the constant drive activity every couple of seconds is back. However, this tells me the problem is in the firmware.

I also noticed that the “power on hours” are most certainly incorrect. Both drives are 12+ years old and have been in use for at least 5 years. The Black reports 36864 hours, and the Green shows 6 hours! Perhaps the counter resets after 65535 hours (16-bit)? Could this also explain the problem with the idle timer?
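The rollover idea above can be checked with a little arithmetic. This is only a rough sketch, assuming the raw counter really is an unsigned 16-bit hour count, which is unconfirmed for these drives; the function names are made up for illustration.

```python
# Sketch: what a wrapping 16-bit power-on-hours counter would display.
# Assumption (not confirmed for these drives): the raw value is an
# unsigned 16-bit count of hours that wraps from 65535 back to 0.

COUNTER_MODULUS = 1 << 16  # 65536 distinct values in a 16-bit counter


def displayed_hours(true_hours: int) -> int:
    """Return the value a wrapping 16-bit counter would report."""
    return true_hours % COUNTER_MODULUS


def years_until_wrap(hours_per_day: float = 24.0) -> float:
    """Years of operation before the counter wraps at the given daily usage."""
    return COUNTER_MODULUS / (hours_per_day * 365)


if __name__ == "__main__":
    # A drive with 65536 + 6 real hours would display 6 after one wrap.
    print(displayed_hours(65536 + 6))
    # Years of 24/7 operation needed to reach the wrap point.
    print(round(years_until_wrap(), 2))
```

Under that assumption a drive running around the clock would wrap after roughly seven and a half years, so a single wrap on a 12-year-old drive is at least plausible.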

Well, WD may not have anticipated that the drives would run that long, although a typical MTBF (mean time between failures) is 1 million hours, if I recall correctly.

Maybe it would be enough to simply refresh the firmware.

fzabkar · December 29, 2023, 2:24am

Could you show us the SMART reports from CrystalDiskInfo?

CrystalDiskInfo - Crystal Dew World [en]

I believe that the normalised value of the Power-On-Time attribute counts the number of months that the drive has been in use. For example, a Current value of 40 would mean that the drive is 60 months old (= 100 - 40).

As for the idle activity, the drive could be performing background scanning to preemptively detect weak sectors.

Another possibility is that the headstack is being moved to prevent lubricant buildup when flying above the same track for any length of time.

dude · December 29, 2023, 3:16am

The power on time for WD Black reads 02BB (hex), or 699 hours. You think this is months? 58 years?! The value has obviously been reset not too long ago, just like the WD Green, which I bought in 2009, now showing 6 days. I have a bunch of other WD RE4 2TB drives from 2011 showing 0x9004 (36868) hours. It’s clearly showing hours, not months.
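The conversions quoted here are easy to reproduce. A minimal sketch, using only the two raw hex values from the post; the helper names are illustrative, not part of any tool.

```python
# Sketch: convert the raw power-on-time values quoted above from hex
# and show what they would mean if read as hours versus months.

HOURS_PER_DAY = 24
MONTHS_PER_YEAR = 12


def raw_hex_to_int(raw_hex: str) -> int:
    """Parse a raw SMART value written in hex, with or without a 0x prefix."""
    return int(raw_hex, 16)


def hours_to_days(hours: int) -> float:
    """Interpret the value as hours and express it in days."""
    return hours / HOURS_PER_DAY


def months_to_years(months: int) -> float:
    """Interpret the value as months and express it in years."""
    return months / MONTHS_PER_YEAR


if __name__ == "__main__":
    for raw in ("02BB", "0x9004"):
        value = raw_hex_to_int(raw)
        print(raw, value,
              round(hours_to_days(value), 1), "days if hours,",
              round(months_to_years(value), 2), "years if months")
```

Read as months, 0x02BB would be a little over 58 years, which is the absurdity being pointed out; read as hours it is about 29 days.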

As for the features you mention, do these old drives even have them? Please keep in mind that I’ve been using these drives for years, and I know what sounds normal and what doesn’t. This latest drive activity is abnormal. As mentioned in my previous response, the drive activity appears to be connected to the idle timer. The strange power on time is something I just noticed while analyzing the drives.

fzabkar · December 29, 2023, 4:38am

I’m saying that you should also look at the normalised values of the POH attribute, not just the raw value. I agree that the unit for the raw values is hours, and that it has possibly rolled over. However, the unit for the normalised values (Current / Worst / Threshold) is months, at least in some models.
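The reading of the normalised value described here amounts to a simple subtraction. A small sketch, assuming the scheme holds for the drive in question (it is said to apply only to some models) and assuming a starting value of 100; the 730 hours per month used for the rough conversion is also an assumption.

```python
# Sketch: estimate drive age from the normalised Power-On-Time value.
# Assumptions: the normalised value starts at 100 and drops by one per
# month of use, and a month is taken as roughly 730 hours.

NORMALISED_START = 100
APPROX_HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def age_in_months(normalised_current: int) -> int:
    """Months of use implied by the normalised Current value."""
    return NORMALISED_START - normalised_current


def approx_hours(months: int) -> int:
    """Very rough hour count for comparison with the raw value."""
    return months * APPROX_HOURS_PER_MONTH


if __name__ == "__main__":
    months = age_in_months(40)  # the example value from the thread
    print(months, "months, roughly", approx_hours(months), "hours")
```

Comparing this estimate with the raw hour count is one way to spot a raw counter that has rolled over or been reset.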

As for background scanning, this has been a feature for many years. In fact, there should be a SMART attribute called “Offline Scan Uncorrectable Sector Count”.

The other feature is called “Pre-emptive Wear Leveling (PWL)”.

https://wfcache.advantech.com/www/certified-peripherals/documents/96hd1tb-st-wd7ke1_datasheet.pdf

This WD feature provides a solution for protecting the recording media against mechanical wear. In cases where the drive is so busy with incoming commands that it is forced to stay in a same cylinder position for a long time, the PWL control engine initiates forced seeks so that disk lubricant maintains an even distribution and does not become depleted. This feature ensures reliability for applications that perform a high incidence of read/write operations at the same physical location on the disk.

dude · December 29, 2023, 5:00am

dude · December 29, 2023, 5:22am

I think it’s obvious that the drive can hardly have 6662 power on cycles in 162 days. So 162 days is nonsense. How are the “normalized” values going to tell me anything useful about the power on hours? How good or bad is a value of 100 (months?)

PWL? Perhaps, but when does it kick in? Why did I not hear this before? And why does using wdidle3 to read the idle timer stop the pesky disk activity until the next power cycle?
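The implausibility can be put into numbers. A rough sketch using the two figures from the post; the second calculation assumes, purely as a hypothesis, that the hour counter is 16-bit and has wrapped exactly once.

```python
# Sketch: power cycles per day implied by 6662 cycles over the reported
# 162 days, and over the same period plus one hypothetical 16-bit wrap.

HOURS_PER_DAY = 24
WRAP_HOURS = 1 << 16  # 65536 hours hidden by one wrap (assumption)


def cycles_per_day(power_cycles: int, days: float) -> float:
    """Average number of power cycles per day of power-on time."""
    return power_cycles / days


def days_with_one_wrap(reported_days: float) -> float:
    """Power-on days if the hour counter had silently wrapped once."""
    return reported_days + WRAP_HOURS / HOURS_PER_DAY


if __name__ == "__main__":
    print(round(cycles_per_day(6662, 162), 1))
    print(round(cycles_per_day(6662, days_with_one_wrap(162)), 1))
```

Taken at face value the counter implies about 41 power cycles for every 24 hours of power-on time; with one wrap assumed, the figure drops to a little over 2.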

Kiriakos-GR · December 29, 2023, 8:12am

WinDlg_v1_37 will automatically fix any corruption that is repairable.

dude · December 29, 2023, 1:23pm

I’ve written 1 TB of random data to see if it makes any difference, but it doesn’t. The activity is not PWL, which doesn’t run every 1 - 2 seconds.

dude · December 29, 2023, 1:24pm

DataLifeGuard reports 0 errors. There are no surface problems.

Kiriakos-GR · December 29, 2023, 2:36pm

In that case, all options for an easy fix have been exhausted.

Find your nearest recycling collection point.

dude · December 29, 2023, 4:17pm

Setting or disabling the idle timer using wdidle3 has no effect (including after a power-off), but simply reading the current setting solves the issue.

So the question is, how can I run “wdidle3 /R” in Windows x64 at system startup? The program apparently requires 16-bit DOS, and is not 32-bit compatible.

Alternatively, refreshing the firmware might do the trick as well, but I can’t find any software.

Btw, I already tried compiling idle3-tools (open-source) using mingw under Windows, but it requires Linux libraries. I also tried WINEVDM and vDOS, but neither works in this case.

fzabkar · December 29, 2023, 4:50pm

Yes, it does appear that there is a SMART bug. It’s not the first that I’ve seen.

As for PWL, this was introduced by IBM in response to a high failure rate in their “Deathstars”. I think they were referred to as “patrol seeks”.

fzabkar · December 29, 2023, 4:52pm

Wdidle3 tries to modify a parameter in firmware module 02. You can dump these SA modules using the demo version of WDMarvel.

If you wish to proceed in more depth, I invite you to repost your question to the HDD Oracle forum:

https://www.hddoracle.com/index.php

dude · December 29, 2023, 6:32pm

I want to try a few more things under Linux using hdparm. Perhaps it’s not really related to the idle timer, but just to accessing the firmware parameters. I am also considering dumping the firmware using a CH341A BIOS programmer, which I successfully used in the past to transfer the BIOS from a defective logic board to a replacement board, thereby repairing the drive. I think I have a copy of the WD Black and Green firmware and will compare the dumps in a hex editor to see where the differences are. Maybe I can adapt the “new” firmware to work with the existing drive, reflash, and the problem is solved.

fzabkar · December 29, 2023, 6:56pm

Most of the firmware is stored in a reserved System Area (SA) on the platters. That’s where the idle timer is located.

I have written a ROM parsing tool:

https://web.archive.org/web/20230522150548/http://www.users.on.net/~fzabkar/FreeBasic_W32/WD/wdROMv17.bas

https://web.archive.org/web/20230522150548/http://www.users.on.net/~fzabkar/FreeBasic_W32/WD/wdROMv17.exe

WDMarvel can dump the ROM (“BIOS”) via SATA. No need for a programmer.

These tutorials should give you some insight into the firmware:

http://www.hddoracle.com/viewtopic.php?p=19087#p19087

https://forum.hddguru.com/viewtopic.php?f=16&t=6562

An alternative to WDMarvel is HDDSuperTool (Linux based). One of the included scripts is able to dump the firmware of older WD models.

https://drive.google.com/drive/folders/1VhE9sRsqtG5S9uNt2qSuDR5vGtMKvpVQ

https://github.com/ISpillMyDrink/OpenSuperClone

https://www.hddsuperclone.com/

dude · December 29, 2023, 10:34pm

Thanks a lot for the info. I will review it.

Meanwhile I tried hdparm -J and also compiled idle3-tools and ran:
./idle3ctl -d /dev/sdb
Idle3 timer disabled

It turns out that reading or setting the idle timer with any of these tools does not stop the disk activity, unlike the wdidle3 DOS utility, whose effect I cross-checked and verified several times.

So it’s probably nothing to do with the idle timer after all, but something else that the DOS utility is doing to silence the drive.

dude · December 30, 2023, 12:27am

More tests:

root@gpubench:/media/dude/wdgreen# hdparm --idle-unload /dev/sdb
/dev/sdb:
issuing idle_immediate_unload command

After that the drive still spins, but no more disk activity. It’s supposed to park the heads and put the drive into a lower power state. This could be what the wdidle3 DOS utility is doing, but who knows?

After using the drive again, however, the disk activity is back, and remains on.
root@gpubench:/media/dude/wdgreen# dd if=/dev/random of=junk bs=2000M count=1
2097152000 bytes (2,1 GB, 2,0 GiB) copied, 18,0685 s, 116 MB/s

So back to square one. The periodic disk activity actually follows a different pattern on the WD Black and the Green. On the Green it comes in 5 second intervals. On the Black the disk activity is just constant, without any pauses. Very annoying.

dude · December 30, 2023, 5:11am

I noticed the following with smartctl -a /dev/sdb

Offline data collection status: (0x05) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (29160) seconds.

So I turned it on:
smartctl --offlineauto=on
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Automatic Offline Testing Enabled every four hours.

Let’s see what happens. Though I wonder: if automatic offline testing was disabled, what is causing the constant disk activity?

The other thing I wonder:
193 Load_Cycle_Count 0x0032 124 124 000 Old_age Always - 230030

If the idle timer is disabled, why the high number? My understanding is that the idle timer is what causes the load cycles. Isn’t that an unusually high value?
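To judge whether 230030 is high, it helps to relate it to power-on time. The sketch below parses the attribute line quoted above and computes load cycles per power-on hour. The power-on hours must be supplied by the reader; the 43800 hours (five years of continuous use) in the example is an assumption for illustration, not a value read from the drive, and the names are made up.

```python
# Sketch: parse one row of the smartctl attribute table and relate the
# raw Load_Cycle_Count to an assumed number of power-on hours.
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flag: str
    value: int       # normalised current value
    worst: int
    thresh: int
    attr_type: str
    updated: str
    when_failed: str
    raw_value: int


def parse_attribute_line(line: str) -> SmartAttribute:
    """Split a whitespace-separated attribute row into its ten columns."""
    parts = line.split()
    if len(parts) < 10:
        raise ValueError(f"expected 10 columns, got {len(parts)}")
    return SmartAttribute(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], int(parts[9]),
    )


def cycles_per_hour(load_cycles: int, power_on_hours: float) -> float:
    """Average head load/unload cycles per power-on hour."""
    return load_cycles / power_on_hours


if __name__ == "__main__":
    row = "193 Load_Cycle_Count 0x0032 124 124 000 Old_age Always - 230030"
    attr = parse_attribute_line(row)
    assumed_hours = 5 * 365 * 24  # assumption: five years of 24/7 use
    print(attr.name, attr.raw_value,
          round(cycles_per_hour(attr.raw_value, assumed_hours), 2))
```

Because the reported power-on hours look unreliable on these drives, the resulting rate is only as good as the hour figure fed into it.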