OS 5 Continually Over-Heating My PR4100

Do you have the corresponding CPU temperatures, drive temperatures, and fan speeds?

I do not have that set up. I will look into it when time allows. My brief digging only turns up what the NAS is willing to support.
I do get alerts once the temperature reaches the warning level, but only a general averaged temperature for the device (not per component, as with a custom PC build).
My disk temps are all over the place, though recently they have been running really cool. When the CPU(s) stay below 80%, everything has been stable.
One thing I must mention - the GUI was reporting CPU at 10% to 15%, while my SW graphs clearly show one core was maxed. The GUI reports the average across all cores while one is hung at 100%, which causes the overheating since no other processes can start or stop on that core. Watching the process count: under minimal load it is normally in the high 20s; under medium load, under 100; with the CPU maxed, well over 100, as if the OS is starting new processes to get past the hung ones that will not stop.
#notanengineer
Side note, I did upgrade my RAM to 16GB (still happening) and updated Acronis on all machines to the current version [(not maxed yet) jinx]. I keep throwing so much at this that I am not sure which solution is the best. I know enough from all previous communications with WD that this was a bad choice for my use case. I should have bought a new server with more HDD bays.
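
To illustrate how an all-core average can hide one pegged core, here is a minimal sketch. The function name, the 95% threshold, and the four readings are all hypothetical values chosen for illustration; they are not taken from the PR4100 or from SolarWinds.

```python
def summarize_cpu(per_core_percent, pegged_threshold=95.0):
    """Return the all-core average and the indexes of cores at or above the threshold."""
    average = sum(per_core_percent) / len(per_core_percent)
    pegged = [i for i, pct in enumerate(per_core_percent) if pct >= pegged_threshold]
    return average, pegged


# Hypothetical readings for a four-core CPU: one core hung, three nearly idle.
readings = [100.0, 4.0, 3.0, 5.0]
average, pegged = summarize_cpu(readings)
print(f"average: {average:.1f}%  pegged cores: {pegged}")
# prints: average: 28.0%  pegged cores: [0]
```

A dashboard that only shows the average would look lightly loaded here, which is why per-core graphs are the more useful thing to alert on.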

image
I have tried 250 different MIBs - none are compatible with WD

245 MIBs for cpu temp - none will talk to WD,
just get the overall generic temp alert

That is disappointing. The only temperatures I can pull are from the SMART data on the HDDs. They are displayed on the WD My Cloud dashboard, or they can be pulled via SSH with smartctl.
Is it possible the fan logic is dependent on the HDD temperatures rather than the CPU temperature, leading to your overheating?
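
For anyone wanting to script the smartctl route, here is a rough sketch of pulling the temperature out of the attribute table that `smartctl -A` prints. The sample text below is made up to match the usual ATA attribute layout (attribute 194, raw value in the tenth column); the function name is hypothetical, and some drives report temperature under a different attribute, so treat this as a starting point rather than a guaranteed parser.

```python
# Made-up sample in the usual `smartctl -A` attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4012
194 Temperature_Celsius     0x0022   110   098   000    Old_age   Always       -       42
"""


def drive_temp_celsius(smartctl_output):
    """Return the raw value of SMART attribute 194 as an int, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 whitespace-separated columns;
        # the raw value starts at the tenth one.
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


print(drive_temp_celsius(SAMPLE))
```

On the NAS itself, the text would come from running `smartctl -A` against each drive over SSH; the device names depend on the unit and are not shown here.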

I will look into creating a custom data point that SW can read.

Update:
Nope - not going to happen. WD will need to re-write their OID to allow for SNMP to trap these values.

Update,
9 days, 2 hours, and some minutes of uptime.
Here is the curse so far;

  • Maxed out RAM
  • updated acronis on all machines
  • rebooted all machines
    Since the updates, I have not had a temperature alert, all is running… jinx!
    I still get a lingering (near hang) once in a while, but seriously, I have to use SolarWinds to monitor CPU% so I know when to log into the drive and check its logs to see which PC is getting rejected calls.
    Since OS5 I cannot help but remember the link below.
    NSFW - Sony Releases New Stupid Piece Of ■■■■ That Doesn't ■■■■■■■ Work
    Still need a solution if anyone has experience with a NAS that can handle what it advertises.

image
Still no forced shutdown, but seriously, in a 30 device / user environment - well below the spec of this NAS - this is happening.


CPU does not go below 64% with only 8 devices communicating to it. It is sad to say WD does not care about backward compatibility in regard to previous versions of “their” software.

  • At least now that all devices are on the same version, the overheating issue is nil; the fan is kicking up as it should when it needs to.
    Thank you to all that helped with this matter, in the thread or via PM.
    Thank you!

20221109
Since this post, there has been nothing but issues with this drive again.
Again, if anyone comes across this page before they purchase - DO NOT BUY a WD NAS!

Just a quick comment: I am not having any of the aforementioned overheating issues, which makes me wonder about the ambient room temperature around the units that are having problems. I am currently upgrading my somewhat aged PR4100 with 16 GB of RAM, running RAID 5, from four 4 TB WD Red drives to four 10 TB WD Red Plus drives. Drive temps are 53, 53, 50, and 42. The drive at 42 is the last of the 4 TB drives; the other three are the new 10 TB drives already installed. As I write this, the PR4100 is rebuilding the third of the new 10 TB drives. Once that is completed I will do the same with the fourth and final drive; then I will expand the RAID so that I have access to the full capacity of the new larger drives. Hopefully that will all work as I have described. From what I am reading, my data should be retained throughout this process.
When I first started this process, about 3 days ago, I did have one issue: I powered down the unit and installed the first of the new 10 TB drives into bay #1 (far left). The unit noted that the drive in bay 1 had been degraded (of course) and started to rebuild that drive. Several hours later the rebuild failed and drive 3 showed an error. My theory is that, under the read/write stress, the drive in bay 3 overheated because it was near failure due to a worn bearing. I have no way to confirm that. I removed the new drive from bay 1 and put the original drive back. The unit then rebuilt the original drive in bay 1. I ran the fast disk test and all four of the original drives showed good. The drive in bay 3 no longer showed an error. At that point I powered the unit down and replaced the drive in bay 3 with a new 10 TB drive. I powered the unit up and used the manual rebuild feature (I had turned off auto rebuild) to start the rebuild of drive 3. All went well.
After that I again powered the unit down, took out the drive in bay 1, put the same new 10 TB drive that I had previously had in that bay back in, and powered up the unit, and it rebuilt that drive. I am now doing the same for the drive in bay #2. After it is completed I will do the drive in bay #4, and then I will use the expand feature (leaving the RAID as RAID 5) so that I can access all of the space on the new drives.
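
As a quick check on why the extra space only shows up after the last drive is swapped and the array is expanded: RAID 5 usable capacity is (number of drives - 1) times the smallest drive. Here is a small sketch of that arithmetic for the stages described above. The function name is hypothetical, and the figures are nominal TB before any filesystem overhead.

```python
def raid5_usable_tb(drive_sizes_tb):
    """Nominal RAID 5 capacity: one drive's worth of parity, limited by the smallest member."""
    if len(drive_sizes_tb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_tb) - 1) * min(drive_sizes_tb)


stages = {
    "original four 4 TB drives": [4, 4, 4, 4],
    "three 10 TB drives, one 4 TB left": [10, 10, 10, 4],
    "all four 10 TB drives, after expand": [10, 10, 10, 10],
}
for label, sizes in stages.items():
    print(f"{label}: {raid5_usable_tb(sizes)} TB")
# prints 12 TB, 12 TB, and 30 TB for the three stages
```

So with one 4 TB drive still in the array, the usable space is the same 12 TB as before; the jump to 30 TB nominal only becomes possible once all four drives are 10 TB.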

Since the introduction of OS 5, overheating is generally associated with the indexing process. Indexing is only performed if cloud services are enabled.