So, another update. I improved my script further so that it does a few checks:
1- Ping the NAS itself,
2- Ping the router,
3- Check if local port 80 (http) or 22 (ssh) is responding.
If any of the checks fail, reboot the drive. The script itself works!
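For anyone who wants to try something similar, here is a minimal sketch of those three checks in plain sh. It is not my exact script: the function names, the IP addresses and the timeouts are placeholders I made up for illustration, and it assumes `ping` accepts `-c`/`-W` and that an `nc` with `-z`/`-w` is installed (the BusyBox `nc` on some firmware does not have `-z`). It also assumes the port check counts as failed only when neither 80 nor 22 answers.

```shell
#!/bin/sh
# Watchdog sketch. The addresses below are placeholders, not real values.
NAS_IP="${NAS_IP:-127.0.0.1}"
ROUTER_IP="${ROUTER_IP:-192.168.1.1}"

# Exit status 0 if the host answers one ping within 2 seconds.
host_pings() { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }

# Exit status 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() { nc -z -w 3 "$1" "$2" >/dev/null 2>&1; }

# Print "reboot" or "ok" from four exit statuses (0 = check passed):
# $1 NAS ping, $2 router ping, $3 HTTP port, $4 SSH port.
decide() {
    if [ "$1" -ne 0 ] || [ "$2" -ne 0 ]; then
        echo reboot            # either ping failed
    elif [ "$3" -ne 0 ] && [ "$4" -ne 0 ]; then
        echo reboot            # neither service port answered
    else
        echo ok
    fi
}

# Run every check once and print the verdict; nothing is rebooted here.
run_checks() {
    host_pings "$NAS_IP";      nas_rc=$?
    host_pings "$ROUTER_IP";   router_rc=$?
    port_open "$NAS_IP" 80;    http_rc=$?
    port_open "$NAS_IP" 22;    ssh_rc=$?
    decide "$nas_rc" "$router_rc" "$http_rc" "$ssh_rc"
}
```

From cron you would then call something like `[ "$(run_checks)" = reboot ] && reboot`. Keeping the decision in `decide` means the logic can be tried by hand (for example `decide 0 0 1 1`) without touching the network or the drive.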
Yesterday my drive went into the limbo state again: solid blue light, the IP stayed pingable (I don't know for how long), but no ssh, no http (dashboard), etc.
I left it like that for 12 hours. During that time my script should have run a few times via cron.
The drive never rebooted, so I power cycled it manually the next day. I looked at the cron log and noticed that my cron jobs never triggered during that period. I have a few scripts set up in cron, but none of them ran.
So I can 100% confirm that this is an OS-level issue where the OS just crashes and the drive is not able to recover by itself.
UPDATE: One thing I noticed is that even though there are several cron jobs that run every day, including the WD default ones, the “cron.log” file is missing an entire day: yesterday, July 15. Note that my drive only crashed at about 8 PM EST, but there are no cron entries for any of July 15…
I power cycled the drive this morning, July 16, at 7:58 AM EST. Could it be an NTP issue? The date/time are always correct on the drive.
Jul 13 23:15:01 nas1 /USR/SBIN/CRON[23195]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 13 23:20:01 nas1 /USR/SBIN/CRON[23386]: (root) CMD ( /shares/data/Drivers/scripts/wdmc_start.sh)
Jul 13 23:20:01 nas1 /USR/SBIN/CRON[23387]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorSmartStatus.sh )
Jul 13 23:20:10 nas1 /USR/SBIN/CRON[23384]: (CRON) info (No MTA installed, discarding output)
Jul 13 23:25:01 nas1 /USR/SBIN/CRON[23728]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 13 23:25:01 nas1 /USR/SBIN/CRON[23729]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 13 23:25:01 nas1 /USR/SBIN/CRON[23730]: (root) CMD ( /shares/data/Drivers/scripts/re_nice.sh)
“cron.log” 517L, 65056C
Jul 14 22:35:01 nas1 /USR/SBIN/CRON[25732]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 14 22:39:01 nas1 /USR/SBIN/CRON[25938]: (root) CMD ( [! -f /tmp/standby] && [-x /usr/lib/php5/maxlifetime] && [-d /var/lib/php5] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Jul 14 22:40:01 nas1 /USR/SBIN/CRON[26013]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorSmartStatus.sh )
Jul 14 22:45:01 nas1 /USR/SBIN/CRON[26395]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 14 22:55:01 nas1 /USR/SBIN/CRON[27061]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 14 23:00:01 nas1 /USR/SBIN/CRON[27420]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/userDataRAIDMonitor.sh)
Jul 14 23:00:01 nas1 /USR/SBIN/CRON[27421]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorSmartStatus.sh )
Jul 14 23:00:01 nas1 /USR/SBIN/CRON[27422]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorVolume.sh)
Jul 14 23:00:02 nas1 /USR/SBIN/CRON[27417]: (CRON) info (No MTA installed, discarding output)
Jul 14 23:05:01 nas1 /USR/SBIN/CRON[27869]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 14 23:09:01 nas1 /USR/SBIN/CRON[28181]: (root) CMD ( [! -f /tmp/standby] && [-x /usr/lib/php5/maxlifetime] && [-d /var/lib/php5] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Jul 14 23:15:01 nas1 /USR/SBIN/CRON[28559]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 14 23:20:01 nas1 /USR/SBIN/CRON[28748]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorSmartStatus.sh )
Jul 14 23:25:01 nas1 /USR/SBIN/CRON[28935]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 14 23:25:01 nas1 /USR/SBIN/CRON[28936]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 14 23:26:02 nas1 /USR/SBIN/CRON[28977]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Jul 14 23:26:03 nas1 cracklib: no dictionary update necessary.
Jul 16 07:58:26 nas1 /usr/sbin/cron[6561]: (CRON) INFO (pidfile fd = 3)
Jul 16 07:58:26 nas1 /usr/sbin/cron[6562]: (CRON) STARTUP (fork ok)
Jul 16 07:58:26 nas1 /usr/sbin/cron[6562]: (CRON) INFO (Running @reboot jobs)
Jul 16 07:58:26 nas1 /USR/SBIN/CRON[6583]: (root) CMD (/usr/local/sbin/monitorVolume.sh)
Jul 16 07:58:26 nas1 /USR/SBIN/CRON[6582]: (root) CMD (/usr/local/sbin/20-checkRAID.sh reboot)
Jul 16 07:58:26 nas1 /USR/SBIN/CRON[6580]: (root) CMD (/usr/local/bin/transmission-daemon)
Jul 16 07:58:26 nas1 /USR/SBIN/CRON[6581]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/userDataRAIDMonitor.sh)
Jul 16 07:58:29 nas1 /USR/SBIN/CRON[6568]: (CRON) info (No MTA installed, discarding output)
Jul 16 08:00:01 nas1 /USR/SBIN/CRON[7045]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorVolume.sh)
Jul 16 08:00:01 nas1 /USR/SBIN/CRON[7046]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/userDataRAIDMonitor.sh)
Jul 16 08:00:01 nas1 /USR/SBIN/CRON[7044]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorSmartStatus.sh )
Jul 16 08:00:03 nas1 /USR/SBIN/CRON[7041]: (CRON) info (No MTA installed, discarding output)
Jul 16 08:02:01 nas1 /USR/SBIN/CRON[7362]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/rotateLogs.sh)
Jul 16 08:02:01 nas1 /USR/SBIN/CRON[7363]: (root) CMD ([! -f /tmp/standby] && /etc/init.d/saveclock.sh reload)
Jul 16 08:02:02 nas1 /USR/SBIN/CRON[7360]: (CRON) info (No MTA installed, discarding output)
Jul 16 08:05:01 nas1 /USR/SBIN/CRON[7606]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 16 08:09:01 nas1 /USR/SBIN/CRON[7951]: (root) CMD ( [! -f /tmp/standby] && [-x /usr/lib/php5/maxlifetime] && [-d /var/lib/php5] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Jul 16 08:15:01 nas1 /USR/SBIN/CRON[8392]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 16 08:20:01 nas1 /USR/SBIN/CRON[8644]: (root) CMD ([! -f /tmp/standby] && /usr/local/sbin/monitorSmartStatus.sh )
Jul 16 08:25:01 nas1 /USR/SBIN/CRON[8979]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 16 08:28:01 nas1 /USR/SBIN/CRON[9171]: (root) CMD (/no_ip_reboot_nas1.sh )
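
A jump like the one from the Jul 14 23:26:03 entry to the Jul 16 07:58:26 entry can also be found without scrolling through the whole file. The following is a rough sketch, assuming the log uses the classic syslog "Mon DD HH:MM:SS" timestamp, stays within one calendar year, and that a plain POSIX awk is available. The function name `find_log_gaps` and the one-hour default threshold are my own choices for illustration, and the month offsets ignore leap years.

```shell
#!/bin/sh
# find_log_gaps FILE [MIN_GAP_SECONDS]
# Print every pair of consecutive timestamped lines that are at least
# MIN_GAP_SECONDS apart (default 3600). Lines without a timestamp are skipped.
find_log_gaps() {
    awk -v min_gap="${2:-3600}" '
        BEGIN {
            # Days elapsed before the first of each month (non-leap year).
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            split("0 31 59 90 120 151 181 212 243 273 304 334", offs, " ")
            for (i = 1; i <= 12; i++) days_before[names[i]] = offs[i]
        }
        ($1 in days_before) && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($3, t, ":")
            # Seconds since the start of the year.
            now = ((days_before[$1] + $2 - 1) * 24 + t[1]) * 3600 + t[2] * 60 + t[3]
            stamp = $1 " " $2 " " $3
            if (seen && now - prev >= min_gap)
                printf "%s -> %s (%d s)\n", prev_stamp, stamp, now - prev
            prev = now; prev_stamp = stamp; seen = 1
        }
    ' "$1"
}
```

Called as `find_log_gaps cron.log 3600`, it lists each pair of entries that are an hour or more apart. For the two entries around the crash it reports a gap of 117143 seconds, which is about 32.5 hours.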