Bug Report - Ryzen AM4 intermittent WHEA errors with Western Digital hard drives

A few other users and I, with various hardware configurations on the AM4 platform, have reported very similar problems: intermittent WHEA fatal hardware errors in the Windows error logs. After converting the raw data in the error messages from hex into text strings, we found that the identifiers for Western Digital internal hard drives are the common denominator.
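For anyone wanting to repeat the hex-to-text step, here is a minimal Python sketch. It assumes you have copied the raw data bytes from the event's Details tab in Event Viewer as a hex string. It decodes them and pulls out runs of printable ASCII, much like the Unix `strings` tool. The helper names are illustrative. `swap_word_bytes` is an extra assumption: ATA IDENTIFY data stores model strings with the two bytes of each 16-bit word swapped, and it is not certain whether a given WHEA dump uses that ordering, so the script prints both readings.

```python
import sys


def hex_to_bytes(raw_hex: str) -> bytes:
    """Turn a hex dump like '57 44 43 20 ...' into raw bytes."""
    # Drop spaces and line breaks that come along with a copied dump
    return bytes.fromhex("".join(raw_hex.split()))


def swap_word_bytes(data: bytes) -> bytes:
    """Swap the two bytes in each 16-bit word (ATA IDENTIFY string order)."""
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len long, like `strings`."""
    runs: list[str] = []
    current = bytearray()
    for b in data:
        if 0x20 <= b <= 0x7E:  # space through tilde
            current.append(b)
            continue
        if len(current) >= min_len:
            runs.append(current.decode("ascii"))
        current.clear()
    if len(current) >= min_len:
        runs.append(current.decode("ascii"))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python whea_strings.py "57 44 43 20 ..."
    data = hex_to_bytes(" ".join(sys.argv[1:]))
    for text in printable_runs(data) + printable_runs(swap_word_bytes(data)):
        print(text)
```

If a drive model such as `WDC WD10EZRX-00D8PB0` shows up in either list, that is the drive the error points at.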

In my case, I have two Western Digital drives: 1) WDC WD10EZRX-00D8PB0 and 2) WDC WD10EADS-11M2B3. The error logs show two WHEA errors posted every five days or so, both within a short period of time (~30 min), one for each hard drive. Both drives function perfectly, and there are no stability issues.

This has continued for months despite several BIOS updates on my B550 motherboard. The same issue has been reported with Ryzen 3000 and Ryzen 5000 CPUs and on A520 and B550 motherboards from different vendors, among others. This may be an issue with either Western Digital or AMD, but I’m not sure at this stage. It needs further investigation, as it has likely caused many unnecessary RMAs.

This finding explains why some users have ongoing WHEA errors while others don’t.

Did you get any further with this, or isolate the issue any more?

I have exactly the same issue: a WHEA error every 4 or 5 days, though I have to wait about a day for the error to appear. The system is completely stable. I’m on X470 with Ryzen 3000.

I swapped out nearly every part in the computer before finding out it was the hard drives. I’m seeing this on other forums too: people run the raw WHEA error data through a hex-to-text converter, and it names the drive at the root of the fault. It’s mainly WD Blue drives I’ve seen people having issues with, though I notice yours are Greens.

I have three identical Blue drives in the system, so I’m currently testing them one at a time. (I was hoping it was just one dodgy drive, but it worries me now that yours have spat out errors on both.)

Nah man, still getting the WHEA errors.

Something interesting to note: I actually replaced my two WD Green drives with a single brand-new WD Red NAS drive (for reasons unrelated to the errors) on 8 August. The logs show the errors going from doubled-up entries to single entries.


Even with the brand-new Red NAS drive, there’s no change in the frequency of the errors: about twice a month. The only difference is that they’re now single entries, because I only have one WD drive.

I did send a bug report to AMD but didn’t get any response. As far as I’m concerned, I’ve done what I can on my end to get a fix happening. Really, it’s a non-issue and doesn’t bother me anymore. But from the perspective of new buyers, this could have led to many unnecessary RMAs of Ryzen CPUs.

In terms of my system stability, it’s good as gold. Haven’t had a system crash for what feels like over six months now.

You’ve told me something I really didn’t want to hear, and something I’ve been worried about! I have parts for a second Ryzen build (a media PC), and I recently bought six 14TB Red Plus drives for it (as well as six older 4TB Red drives). I was hoping the issue was just with the Blue drives in my first build, but in the back of my mind I’ve been worried about whether I’ll get errors on the Reds. No one had mentioned WHEA errors with Reds, so I convinced myself I’d be OK with them.

It’s funny you mention the RMAs: I’ve already had offers from manufacturers to take back my memory, motherboard, GPU and CPU (as I initially thought one of these was causing the issue). It was only because I had spare parts for my second build that I’ve been able to swap them all over, and the errors still happen at the same frequency. The only thing I haven’t swapped over is the motherboard, as this is a hassle, but I plan on doing it next week. I also have a Toshiba 3.5” drive I’m going to try, to see if that gives an error (every other HDD I own is Western Digital). Changing the GPU (Nvidia or Radeon), trying different brands of memory, and trying both a Ryzen 5 3600 and a Ryzen 7 3700X made no difference. The frequency of the errors stayed exactly the same, no matter which parts were used.

It’s useful that you mention getting two errors each time with two drives. I had mostly been getting three errors, and I have three of the same Blue drives. I had hoped it was just one dodgy drive throwing out three errors, but from what you’re saying, it’s likely all three drives are giving one error each.

You were fortunate (I say that jokingly) in that your errors appeared within 30 minutes, so for testing purposes it’s quite quick. I’ve had to wait 2 or 3 days after each change to see whether an error still appears, so it’s been very time-consuming. If I turned my computer off daily, I’d never get any errors.

I’m glad you can forget about it. I’d like to be able to do the same, but it’s driving me mad and I can’t seem to let it go. If I get the errors on the second build with my Red drives, I’ll probably get rid of the motherboard and CPU and go over to Intel. Although the system is completely stable, I don’t want to risk data corruption.

I do have a WD SSD in my system, and that hasn’t shown any errors; it’s just the 3.5” drives.

I get an error every 2 or 3 days. Strangely, I had a period of five months with no errors at all. I’ve gone back through the logs to work out why, and I can’t find any reason for it. This was before I realised I was getting the errors, so maybe I had updated the BIOS or something that stopped them. I have since retried the BIOS versions I think I had back then, but I can’t get the errors to stop.

Another thing I noticed is that I only ever get a maximum of three errors (so possibly one per drive), regardless of how long the computer stays on. I left it on for 10 days while testing: the first errors came after 2 or 3 days, then no further ones. So whatever is causing it only triggers once. Over the last few months of testing, I’ve noticed the errors more often than not appear around the 36-hour mark, which is strange.
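To check the 36-hour pattern across several boots, you can subtract the boot time from each error time. Here is a minimal sketch, assuming you have already noted the boot time and the WHEA error timestamps by hand; the timestamps below are made up for illustration, and `hours_since_boot` is just an illustrative helper. Because it uses wall-clock time, the result includes any time the PC spent asleep.

```python
from datetime import datetime


def hours_since_boot(boot: datetime, errors: list[datetime]) -> list[float]:
    """Uptime in hours (1 d.p.) at which each error after `boot` was logged."""
    # Wall-clock difference, so time spent in sleep is counted too
    return [round((e - boot).total_seconds() / 3600, 1) for e in errors if e >= boot]


# Hypothetical timestamps, not taken from a real log
boot = datetime(2021, 9, 1, 8, 0)
errors = [datetime(2021, 9, 2, 20, 0), datetime(2021, 9, 2, 20, 25)]
print(hours_since_boot(boot, errors))  # [36.0, 36.4]
```

Errors logged before `boot` are dropped, so you can pass in a whole list of error times and run it once per boot.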

Can I just ask a few questions:

What brand of motherboard do you have? (Mine is an ASUS X470 Crosshair VII Hero, as is the second one I have.)

Do you run HWiNFO for monitoring? (I just found out there is a known issue with it causing WHEA errors with GPUs, something to do with the sensors during long-term monitoring, and I think I’ve always had it running in the background.)

Is there any correlation between your errors and the computer going to sleep? (According to Event Viewer, my first three months’ worth of errors each happened just after the system woke from sleep.) I haven’t let it sleep for about a year now, and I still get the errors, though. As my errors happen after quite some time, I did wonder whether it had anything to do with the drives waking and sleeping.

Is your Red drive a standard Red, a Plus or a Pro? (My recent ones are Plus; the older ones are standard Reds from before WD reclassified them.)

What brand of PSU do you use? (Mine is a BeQuiet Straight Power 11 750W.)

Are you using Windows 10 or 11? (I’m on Windows 10; at some point soon I will test on Windows 11.)

Were yours Event ID 1 errors?

Thanks for replying, any info is helpful.

Yeah, unfortunately it doesn’t seem to matter what the WD drive is. My upgraded drive is a 4TB WD Red Plus NAS drive, code “WDC WD40EFZX-68AWUN0”. The WHEA Logger errors are Event ID 1, same story.
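To pull the same WHEA Logger Event ID 1 entries without scrolling through Event Viewer, Windows’ built-in `wevtutil` can query the System log with an XPath filter. Below is a minimal Python wrapper, assuming Windows 10 or 11; `build_wevtutil_command` is just an illustrative helper, and the exact text output may differ between builds.

```python
import subprocess
import sys

# XPath filter: WHEA-Logger events with Event ID 1 in the System log
WHEA_QUERY = (
    "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger'] "
    "and (EventID=1)]]"
)


def build_wevtutil_command(count: int = 20) -> list[str]:
    """Query for the newest `count` WHEA Event ID 1 entries."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        "/f:text",   # human-readable output instead of XML
        "/rd:true",  # reverse direction: newest events first
        f"/c:{count}",
    ]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(build_wevtutil_command(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Listing the newest events first makes it easy to line the dates up against the drive swaps described in this thread.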

PC specs: Ryzen 5 5600X, Asus B550-F Gaming WiFi, 64GB GSkill Ripjaws V, 1TB Samsung 970 Evo SSD (which posts no errors btw), BeQuiet Straight Power 11 650W Platinum, Windows 10 Pro OS.

I don’t run any monitoring software in the background at all. I use Sleep mode a lot but that doesn’t seem to have any real impact.

To me the only logical connection is Windows 10, Ryzen and Western Digital. I think there is a hard drive maintenance program scheduled to run somewhere in Windows 10 that fails and posts the errors. There may be a way to get deeper into Windows to find out exactly what it is, but I lost interest before getting that far. Life gets busy and it doesn’t cause hardware instability so I ended up putting that project aside.

I agree with you about the three logical connections. I have noticed though with the others that are getting errors, there seems to be a lot of Asus boards, a lot of Samsung Evos, and a fair few BeQuiet power supplies. That could just be coincidence though.

I agree with the scheduler theory: mine nearly always happens around the 36-hour mark, and I’ve often thought something runs around that time and causes it.

I’ve taken time off work for a while, so my life isn’t busy. It’s not good, though, as this is keeping me up late at night and everything else is on hold while I spend so much time on it. I’ve spent a good solid month on this now, although I do feel I’m quite near the end. I just want to try the other motherboard, a Toshiba drive and Windows 11; at least then it will confirm what my issue is.

I’ll probably swap the WD drives in my main build for non-WD ones. I don’t know what I’ll do with my second build and its ton of WD NAS drives. It’ll bug me, as I don’t think I could live with it spitting out errors, so I may go over to Intel on that one.

Thanks for the info. I’ll post a final update once I’ve done the few remaining tests I have planned.

OK, so my issue has been resolved.

It was one of two things: running SFC /scannow, or a loose wire. I ran SFC, then straight afterwards unplugged all the drives, re-plugged them one at a time and tested each on its own. The errors haven’t shown up since. I stupidly didn’t save a log of the scan, but I think it was the scan that resolved the issue.

My feeling is it was a driver issue. Buried deep in Event Viewer under a storage folder, I found that a “port reset - drive starting up” event occurred at the time of each error. Some of these events didn’t cause an error, so I don’t know why the drive spinning up sometimes caused an error and sometimes didn’t, but this spin-up was certainly the starting point of the error.
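The pattern described above (every error lined up with a port reset, but not every port reset produced an error) is easy to check once you have both sets of timestamps. Here is a minimal sketch, assuming you have exported the WHEA error times and the storage “port reset” event times as `datetime` values. The two-minute window, the function names and the sample timestamps are all arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=2)  # assumed tolerance between a reset and an error


def resets_before_errors(errors, resets, window=WINDOW):
    """Map each WHEA error time to the port resets logged within `window` before it."""
    return {e: [r for r in resets if e - window <= r <= e] for e in errors}


def resets_without_errors(errors, resets, window=WINDOW):
    """Port resets that were not followed by a WHEA error within `window`."""
    return [r for r in resets if not any(r <= e <= r + window for e in errors)]


# Hypothetical timestamps for illustration only
errors = [datetime(2021, 9, 2, 20, 0, 30)]
resets = [datetime(2021, 9, 2, 20, 0, 0), datetime(2021, 9, 3, 1, 0, 0)]
print(resets_before_errors(errors, resets))
print(resets_without_errors(errors, resets))
```

An error with an empty list in the first result would argue against the spin-up theory, while the second list shows how often a reset happens harmlessly.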

The WHEA errors also mention “storport”, the Windows storage port driver. These errors started right from the beginning, so I don’t believe there was driver corruption; I think the AMD chipset drivers may have been causing the issue.

Strangely, I had a five-month gap without any errors in the middle. I can’t see what stopped them, but it looks like I reinstalled the AMD chipset drivers around the time the errors started again (this could be coincidence).

If you haven’t already, I would suggest running SFC /scannow from an administrator Command Prompt to see if it reports any driver or file issues.