WD MC loses network connectivity

I know there have been many threads on this topic before but the issue was never resolved.

I wanted to share some findings which pretty much confirm what might be happening.

First off, I have this script set up as a cron job on the WD drive:


#!/bin/bash
# Log file for this watchdog script
LOG=/shares/data/Drivers/scripts/no_ip_reboot_output.txt

# Ping the default gateway (route flagged "UG") once and print the
# "received" count from ping's summary line (empty if ping could not run)
check_gateway() {
    netstat -nr | grep "UG" | awk '{ print $2 }' | xargs ping -q -w 1 -c 1 | grep "received" | awk '{ print $4 }'
}

test_host=$(check_gateway)
# Note: [ and ] need spaces around them, otherwise bash looks for a command named "[..."
if [ "$test_host" == "0" ] || [ -z "$test_host" ]; then
    echo "service networking restart" >> "$LOG"
    /etc/init.d/networking restart
    sleep 60

    # Still no gateway after restarting networking: reboot
    test_host=$(check_gateway)
    if [ "$test_host" == "0" ] || [ -z "$test_host" ]; then
        echo "Rebooting" >> "$LOG"
        shutdown -r now
    fi
fi

Today I encountered that infamous “network loss” issue, where the drive looks to be up and running (solid blue LED, normal green network LEDs on the back), yet it cannot be accessed in any manner and cannot even be pinged.

(I have a static IP assignment FROM the router, based on the device's MAC address.)

Once I found the drive in that state today, I left it alone until the next scheduled run of the script above, to see if it would reboot itself. Guess what: IT DID NOT.

So there are ONLY two possibilities left now:

1- The OS on the WD drive crashed or kernel-panicked, so that it could not even run a cron job

OR

2- My script relies on the fact that if the drive cannot ping its gateway (my router), it restarts networking and then, if the gateway is still unreachable, reboots. So possibly the drive could still ping the router, and hence it never restarted?

If that were the case, then why can't I ping the drive from my computer when it is in this state?

Your router should always be pingable as an indication it’s working.

What I would test for is whether your device (My Cloud) actually has an IP address assigned to itself, and if not, THEN restart/reboot accordingly.
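A minimal sketch of that test, assuming the NAS's LAN interface is called eth0 and that the `ip` command is available on the unit (both are assumptions, not confirmed in this thread). `first_ipv4` and `ip_status` are hypothetical helper names meant to be pasted into the watchdog script:

```shell
#!/bin/bash
# Minimal sketch: check whether the NAS itself still holds an IPv4 address.
# ASSUMPTION: the LAN interface is named eth0; override IFACE if yours differs.
IFACE=${IFACE:-eth0}

# Reads `ip -4 addr show` output on stdin and prints the first IPv4 address
# (without the /prefix), or nothing if no "inet" line is present.
first_ipv4() {
    awk '$1 == "inet" { split($2, a, "/"); print a[1]; exit }'
}

# Prints "ok" if $IFACE has an IPv4 address, "missing" otherwise
# (including when the interface or the `ip` command does not exist).
ip_status() {
    if [ -n "$(ip -4 addr show dev "$IFACE" 2>/dev/null | first_ipv4)" ]; then
        echo ok
    else
        echo missing
    fi
}
```

In the cron script this could then drive the restart, e.g. `[ "$(ip_status)" = missing ] && /etc/init.d/networking restart`. Checking the local address and pinging the gateway answer different questions, so keeping both checks may be the safer choice.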

I’m leaning toward the OS not working properly to the point that the cron job itself is not being run.

What was your device doing, if you know, prior to it reaching this state?  

Well, if the WD drive had lost its own IP, then shouldn't it be UNABLE to ping the router? That's my thinking.

Also, if the OS has gone down or something, then no script will make it reboot.
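One way to tell the two possibilities apart is a heartbeat: a tiny cron job that appends a timestamp and the gateway-ping result to a file every few minutes. After the next hang and power cycle, entries that continue right up to the power cycle would point to possibility 2 (cron kept running), while entries that stop early are consistent with possibility 1. This is a sketch only; the script name heartbeat.sh, its log path, and the function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical heartbeat logger, meant to be run from cron every few minutes.
# After a hang and power cycle, the last line shows when cron last ran and
# whether the gateway answered at that moment.
# ASSUMPTION: the log path below is illustrative; pick any persistent location.
HEARTBEAT_LOG=${HEARTBEAT_LOG:-/shares/data/Drivers/scripts/heartbeat.log}

# Reads `netstat -nr` output on stdin and prints the first gateway whose
# Flags column (4th field) contains "UG".
gateway_from_netstat() {
    awk '$4 ~ /UG/ { print $2; exit }'
}

# Appends "<date> <time> gw=<address|none> ping=<ok|fail>" to the log.
write_heartbeat() {
    local gw status
    gw=$(netstat -nr 2>/dev/null | gateway_from_netstat)
    if [ -n "$gw" ] && ping -q -c 1 -w 1 "$gw" >/dev/null 2>&1; then
        status=ok
    else
        status=fail
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') gw=${gw:-none} ping=$status" >> "$HEARTBEAT_LOG"
    # Flush to disk so the entry is more likely to survive a hang right afterwards.
    sync
}

# Run the function named on the command line (e.g. "write_heartbeat");
# with no arguments the file only defines the functions.
if [ $# -gt 0 ]; then
    "$@"
fi
```

A matching /etc/crontab entry might be `*/5 * * * * root /shares/data/Drivers/scripts/heartbeat.sh write_heartbeat`. Writing to the data volume this often will likely keep the disk from entering standby, so a longer interval may be preferable.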

As for what it was doing prior to going down, that's a good point… I have modified the drive's default cron jobs to run on my own schedule. That should not be the cause of the issue, though: those changes are recent, and I have run into this scenario in the past even with the default settings.

That being said, I noticed in “user.log” that the drive woke up at 11:25 PM to run the various cron jobs, and that seems to be the last entry… until I power cycled the drive in the morning at 8:42 AM.

Below are the user.log entries and my crontab:

Apr 21 22:18:48 nas1 logger: exit standby after 760 (since 2015-04-21 22:06:08.790199000 -0400)
Apr 21 23:14:07 nas1 logger: exit standby after 8 (since 2015-04-21 23:13:59.670199000 -0400)
Apr 21 23:25:08 nas1 logger: exit standby after 295 (since 2015-04-21 23:20:13.890199000 -0400)
Apr 22 08:42:34 nas1 S15mountDataVolume.sh: begin script: start
Apr 22 08:42:44 nas1 _: pkg: kernel-mindspeed-sequoia
Apr 22 08:42:44 nas1 _: pkg: wd-nas
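The silence in user.log described above can also be found mechanically. Below is a sketch that prints every gap between consecutive log lines longer than a threshold. It assumes GNU `date` (for `date -d`) and treats the year-less syslog timestamps as UTC in the current year; `to_epoch` and `find_gaps` are hypothetical names:

```shell
#!/bin/bash
# Sketch: find long silences in a syslog-style log such as user.log.
# ASSUMPTIONS: GNU `date -d` is available, the lines carry no year (the
# current year is assumed), and timestamps are read as UTC so that DST
# changes cannot distort the gap lengths.

# Converts "Mon DD HH:MM:SS" into epoch seconds; fails on unparsable input.
to_epoch() {
    TZ=UTC date -d "$1" +%s 2>/dev/null
}

# Reads log lines on stdin and prints every gap longer than $1 seconds as
# "<seconds> <previous timestamp> -> <next timestamp>".
find_gaps() {
    local threshold=$1 prev_ts="" prev_epoch="" mon day hms rest ts epoch
    while read -r mon day hms rest; do
        # Skip blank or truncated lines
        [ -n "$hms" ] || continue
        ts="$mon $day $hms"
        epoch=$(to_epoch "$ts") || continue
        if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$threshold" ]; then
            echo "$((epoch - prev_epoch)) $prev_ts -> $ts"
        fi
        prev_ts=$ts
        prev_epoch=$epoch
    done
}
```

For example, `find_gaps 3600 < /var/log/user.log` (that log path is an assumption about the unit) would flag the roughly nine-hour gap between 23:25:08 and 08:42:34 in the excerpt above. A gap only shows that nothing was logged; it is consistent with a hang but does not prove one.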

Crontab:

# m h dom mon dow user command
25 23 * * * root cd / && run-parts --report /etc/cron.hourly
26 23 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
27 23 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
28 23 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

0 8 1,16 * * root /shares/data/Drivers/scripts/rsync_with_delete.sh
28 3,8,12,16,18,21,23 * * * root /shares/data/Drivers/scripts/no_ip_reboot.sh
0 3,15 * * * root /shares/data/Drivers/scripts/rsync.sh

20 23 * * 1,5 root /shares/data/Drivers/scripts/wdmc_start.sh
20 09 * * 2,6 root /shares/data/Drivers/scripts/wdmc_stop.sh

It seems you still have access to the device if you were able to grab the logs.

How are you connecting to the device without an IP address? I am thinking maybe a crossover cable, but I'd just like to be sure.

Are you able to interact with the device in a way to manually run your cron job and see if it reboots itself?

I think you misunderstood me. The drive was only accessible after I had power cycled it; I grabbed the logs after that.

Once the drive goes into the limbo state, there is NO WAY to access it: it's not pingable, no SSH, no FTP, no web GUI, no Samba, no NFS, nothing…

If I disconnect the drive from the router and then run my cron job, it detects that the gateway is not there, so the condition becomes true and it reboots as per the script.

However, earlier today I found the drive in the limbo state at around 8:15 AM…

I waited until 8:28, when no_ip_reboot.sh is next scheduled to run, to see if the cron job would kick in and reboot it, but it never did. That's when I realized that the OS itself might have crashed!!!

From your last post, the more logical assumption is that the operating system did in fact crash in some way.

You power cycled it and it came back up as normal. Typically I would monitor this type of situation to see whether it occurs again, and whether it follows some pattern (time of day, certain activities, etc.).

Aside from that, I can only think to look at the IP lease settings on the router side, but since you stated you have it set to static, I'm going on the hunch that your leases are set properly for your particular network.

Correct, the IP lease from the router was still active at the time. I even rebooted the router first, but that didn't help; that's when I power cycled the drive…
I don't think there's any pattern to this. It has happened to me in the past at random times during the day, sometimes even twice in a day. Anyway, this issue happens to many others as well; there have been a lot of threads on it in this forum.

Hello,

I suggest the following steps:

  1. Enable extended logging on the unit

    Settings > Utilities

  2. When the issue occurs again:
  • power off/on the unit
  • create and save a System Report to your local PC
  • open a support case with WD and attach the logs

When opening the case, mention this Community thread.

  3. Disable extended logging and reboot

I believe the controller board inside these drives has a serial port… I might look into hooking it up the next time I run into the same problem…