OS 5 Continually Over-Heating My PR4100

Updated to OS 5 (5.08.115) and have had nothing but problems with the PR4100 system overheating. Didn’t happen before update from OS 3.

Reading through the forum, I’ve seen other people with problems with very high CPU usage. I believe that may be the root cause of the over-heating. As suggested, I’ve turned off ALL Cloud Access in ALL shares and also disabled Cloud Access totally. Supposedly this stops the incessant indexing.

What else can I do to stop the CPU from performing unneeded processing???

I’ve attached two screen shots of Dashboard status taken during a simple file copy from my desktop system to the PR4100.

  • #1 As you can see the CPU activity is going crazy with just a simple data copy to the PR4100 from my desktop system.
  • #2 The system overheated 14 times here. Unfortunately, I kept dismissing notifications before I decided to write this post. I estimate there were around 50 notifications in total for this one copy (of course I also got an e-mail for each of these!!!).

In this example, when I started the copy, the fan went from off to jet engine loud in about 20 seconds from the start of the copy (with about 7 immediate overheating notifications in that 20 second time frame). Even during the copy when the fan was still running (at a more reasonable speed) the over-heating notifications kept coming in.

As I stated at the start of this post, this overheating DID NOT HAPPEN prior to the OS 5 update. So I don’t need the usual wisdom of moving the server to another cooler spot, etc. That’s not the problem.

If the CPU is busy making thumbnails, etc. of media, DON’T. I don’t need it. I’m not planning on using Cloud Access anymore!!! It seems to be totally broken. Besides, I’ll let PLEX do what needs to be done with my media.

Does anyone (WD ???) have any thoughts on how to stop this insanity??? I’m really at the point of just shutting the server down until there is a fix. I don’t want to damage the server, either the fan or any electronics.

PLEASE HELP!!!

@dpkline Have you contacted support and provided the system logs?
If yes, what’s your case number?

If not, please collect the system logs, contact support and let me know the case number.

https://support-en.wd.com/app/ask

SBrown,

Thanks for the reply.

Case number: 201222-001249

@dpkline We’ve forwarded your issue and logs to our engineering group.

When doing a RAID rebuild, I aim an extra small computer case fan at my NAS units.
That is fairly easy to do with a top-vented NAS, but PR units need air blowing on the side.

I did that to keep the unit cool for 14 hours while it was indexing.

SBrown, FYI. I’ve updated the case with the post below and also attached another log.

Having more problems. System this morning went into thermal shutdown.

From idle (with fan off), I started copying files to the PR4100 (6 files totaling 6.45GB). The fan started almost immediately (10-15 sec.) after the start of the copy, but stayed at a medium speed throughout the copy. During the copy of the 6th file, the system shut down (“Thermal Shutdown Immediate” notification after re-boot). The fan never increased in speed before the shutdown to compensate. The shutdown came out of nowhere.

After rebooting, I finished copying other files I had planned on copying (111 files totaling 108.55GB).
During the copying I got 5 over-temp notifications, but the fan speed was again medium speed and the over-temp condition was back to normal after about 15sec. each time.

Don’t know what triggered the shutdown with so little work being done. As before, the CPU during all of the copying was high (spikes up to over 90%), but different behavior with the second batch of copies (i.e. no shutdown).

After the copying, I performed a compare of all 117 files (115GB) with no over-temp notifications. The CPU ran around 30% with some spikes up to 50%. The fan was still on (medium) from the copying, but slowed down over time as the compares went on.
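
For anyone who wants to log the CPU spikes outside the Dashboard, here is a minimal sketch. It assumes SSH access to the NAS and a Linux-style `/proc/stat` (an assumption, not something confirmed for the PR4100 in this thread), and the function names are made up for illustration. It computes the busy percentage between two samples of the aggregate `cpu` line.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(x) for x in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
    total = sum(ticks[:8])  # guest time is already counted inside user/nice
    return idle, total


def cpu_busy_percent(before, after):
    """Busy share (0-100) between two 'cpu' lines sampled some time apart."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    d_total = total1 - total0
    if d_total <= 0:
        return 0.0
    return 100.0 * (d_total - (idle1 - idle0)) / d_total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (assumes a Linux-style procfs)."""
    with open(path) as f:
        return f.readline()
```

To use it, call `read_cpu_line()` twice a few seconds apart during a copy and pass both results to `cpu_busy_percent`. A longer interval smooths out short spikes, so the numbers may not match the Dashboard exactly.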

WRITING to the drives under OS 5 seems to be compromising my system. I can’t trust it anymore.
READING from the system seems to have no noticeable problems.

@dpkline Thanks for the update.
I’ve forwarded the update and sent you a PM for more information.

SBrown,
Added the requested info to the case, along with some other info (useful or not?).
Thanks for your assistance.

UPDATE:
Over the holidays I tried various experiments with my older WD EX4 NAS and subsequently with the PR4100 that is over-heating. I now believe I know the reason for the over-heating conditions on my PR4100. I believe that OS 5 does not play well with an ENCRYPTED RAID 5 array! At least not MY specific PR4100.

I copied many files (totaling about 600GB) to my encrypted EX4. CPU usage averaged around 50% with spikes up to around 90%. But I never got any over-heating notifications.

I then re-formatted the EX4 to NOT be encrypted and re-ran the same copies.
CPU usage averaged around 30% with spikes up to around 65%. No over-heating.

The EX4 is no longer supported by WD and is therefore presumably running OS 3 (it only reports firmware level, not OS level).

Now back to the PR4100.
It was encrypted at the start of the tests, the same configuration as reported above that caused the over-heating. I did the same copies. CPU usage averaged around 60% with MANY spikes up to 100%, followed by a flurry of over-heating conditions. I couldn’t leave the system unattended, as I did have a previous thermal shutdown condition (see above).

I re-formatted the PR4100 to NOT be encrypted and tried the same copies again.
CPU usage averaged around 20% with spikes up to around 55%. No over-heating!!!

Prior to OS 5, I never had over-heating problems with the PR4100 with an encrypted RAID 5 array.
Whatever OS 5 is doing it appears (in my opinion) it can’t successfully do both its base workload and encryption without potentially frying my NAS.
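
To put the rough figures from these tests side by side, here is a small sketch. The numbers are the approximate Dashboard readings quoted above ("around" values, not precise measurements), and the names `readings` and `encryption_overhead` are just for illustration.

```python
# Approximate Dashboard readings quoted in this post: (average CPU %, peak CPU %).
readings = {
    "EX4": {"encrypted": (50, 90), "unencrypted": (30, 65)},
    "PR4100": {"encrypted": (60, 100), "unencrypted": (20, 55)},
}


def encryption_overhead(device):
    """Return (extra average points, extra peak points) with encryption on."""
    enc_avg, enc_peak = readings[device]["encrypted"]
    raw_avg, raw_peak = readings[device]["unencrypted"]
    return enc_avg - raw_avg, enc_peak - raw_peak


for device in readings:
    avg_delta, peak_delta = encryption_overhead(device)
    print(f"{device}: +{avg_delta} points average, +{peak_delta} points peak")
```

On these rough numbers, encryption adds about 20 points of average CPU on the EX4 and about 40 on the PR4100 under OS 5. The PR4100 peak is capped at 100%, so the true peak overhead there could be higher than the difference suggests.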

WD, is this a known problem with OS 5, in that it appears not to be able to support encryption without it being problematic? With an encrypted array, can the CPU usage be throttled back somehow where the 100% CPU usage spikes occur, hopefully avoiding the over-heating?

At this point, without some sort of timely solution from WD, my PR4100 is no longer of much value to me (at least not as my primary NAS). I need the encryption or another NAS that can handle the encryption.

After the re-formatting to do these tests, all data was wiped on the PR4100. I am at a cross-roads. I need to rebuild the NAS from scratch regardless. Since I have to rebuild from scratch, do I do it on the PR4100 or do I move to newer technology? A quandary.

Hey dpkline,
I also have the PR4100 in a 24TB setup with RAID 5, and my CPU usage has been at more or less 99% around the clock since 23.12.2020. But my PR4100 didn’t overheat a single time. Can you tell the temperatures when it switched off? How far do your temperatures rise during indexing? What apps are you running? Did you deactivate PLEX? Do any other programs have access to your NAS? I also didn’t copy any big data on or off the PR4100 while indexing, which is kind of stupid if you need to work with the files, but everything which puts more stress on the CPU will increase the heat…
I really think that your encrypted array is the problem here, it causes some more steps than just reading for the CPU.

dustbin,

Sorry, never checked temps when the fan switched off (or on for that matter). Was always more concerned with the constant notifications. Had to turn off e-mail notifications, there were so many. :slight_smile:

As for indexing, I disabled ALL cloud services/activity (including greyed out ones in all shares). Supposedly this should have stopped indexing, at least from what I’d read in other posts on this site.

The only app I was originally running was PLEX. Not sure of the chronology, but after updating to OS 5,
I never got around to running PLEX again before my over-heating problems started. And definitely not now since I need to re-build the entire array. PLEX alone is 17.2TB that needs to be copied onto the PR4100. 4x10TB hardfiles.

Useless engineering group. They will provide you with no viable solution.

20220809 - Still HAPPENING!!!
Same here, been round and round with WD troubleshooting and finally had to add this device to SolarWinds to properly identify the issue. Every FW update the issue gets worse. So SW is reporting thermal shutdown - however, this device is running VERY cool to the touch, I mean, seriously cool to the touch - not much warmer than room temperature.

Cached Memory and Shared Memory are pegged out. This never changes.


FW - 5.23.114 - happening MULTIPLE times a day now
MIB:notifyPendingThermalshutdown : sysUpTime = 10 minutes 42.45 seconds, notifyPowerSupportNum = 1

Seriously?
MIB:notifyNoDrivesInstalled : sysUpTime = 3 hours 20 minutes 43.77 seconds, notifyInterfaceNum = 2
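
If it helps anyone sifting through a pile of these, here is a minimal sketch for parsing trap lines in the format pasted above into structured records. The format is inferred only from the two lines shown in this post, so other trap types may not match; the function names are made up, and support for a `day` unit is an assumption.

```python
import re

# Pattern inferred from the two trap lines quoted in this post.
TRAP_RE = re.compile(
    r"^MIB:(?P<name>\w+)\s*:\s*sysUpTime\s*=\s*(?P<uptime>[^,]+),"
    r"\s*(?P<key>\w+)\s*=\s*(?P<value>\S+)$"
)
UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def uptime_seconds(text):
    """Convert text such as '3 hours 20 minutes 43.77 seconds' to seconds."""
    total = 0.0
    for amount, unit in re.findall(r"([\d.]+)\s*(day|hour|minute|second)s?", text):
        total += float(amount) * UNIT_SECONDS[unit]
    return total


def parse_trap(line):
    """Return a dict with the trap name, uptime in seconds, and the extra field."""
    match = TRAP_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised trap line: {line!r}")
    return {
        "name": match["name"],
        "uptime_s": uptime_seconds(match["uptime"]),
        match["key"]: match["value"],
    }
```

Sorting the parsed records by `uptime_s` shows how soon after each boot the thermal-shutdown trap fires, which may be useful to include in a support case.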

WD Support has asked me to compromise my network by enabling 10+ year old protocols. Thank goodness, 1 - I know better and 2 - I have access to a security team to confirm these outdated protocols are a security risk.

24TB RAID 10 - 65% full - Since engineering CANNOT figure out their FW updates, at this point, does anyone recommend a NAS solution that will handle encryption?

This is what I came into this morning.

image

I am very interested in a fix. I have too many machines dependent on the NAS uptime - I need to be able to take a vacation.
If there is a lack of thermal paste, or the need to add a heat sink, I AM IN! If it is as complex as using ssh to stop a service, I can do that too (it has been a long time since I have done this). One question though: will a FW update or reboot change the settings back to stock?

forgot an image
image

My bet is something has triggered the “indexing” process on the machine.

That is known to load up processors and RAM on all sorts of WD machines.

Turning on cloud services is the main indexing trigger. Indexing will take a few days or weeks.

Where did you find the breakdown of the CPU by core load?

It comes from a SolarWinds install, which has extensive analytics. It does not report the same way or as fast as the WD GUI; it is more of a longer snapshot, so when the CPUs are maxed out for that long, something is wrong. Working with WD to remedy the issue.
Cheers

Do you have any logs of temperatures and fan speed during the high CPU usage? I am experiencing high disk temperatures with idle fan speeds indicating the OS5 fan profile is not responding.
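
For anyone who does collect such logs, here is a rough sketch of how to flag the pattern described above: disks running hot while the fan stays at idle speed. The `Sample` fields, the thresholds, and the sample values are all hypothetical and chosen only for illustration; they are not WD specifications or real measurements.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int         # minutes since logging started
    disk_temp_c: float  # hottest disk temperature, in Celsius
    fan_rpm: int        # reported fan speed


def unresponsive_fan(samples, hot_c=50.0, idle_rpm=500):
    """Return samples where the disk is hot but the fan is still near idle.

    hot_c and idle_rpm are assumed thresholds; adjust them for your unit.
    """
    return [s for s in samples if s.disk_temp_c >= hot_c and s.fan_rpm <= idle_rpm]


# Made-up values purely for illustration, not measurements from any NAS.
log = [
    Sample(0, 38.0, 0),
    Sample(10, 52.0, 400),
    Sample(20, 55.0, 2400),
    Sample(30, 61.0, 0),
]
for s in unresponsive_fan(log):
    print(f"minute {s.minute}: {s.disk_temp_c} C with fan at {s.fan_rpm} rpm")
```

A run of flagged samples during a long copy would support the idea that the fan profile is not reacting to temperature, as opposed to the unit simply being overloaded.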

image