[SOLVED] Solid White Light (not blinking) Boot Loop Every 42 Seconds

usury · February 15, 2020, 4:31am

After reading numerous posts about recovering bricked MyCloud devices (special thanks to Fox_eye), and every post related to solid white light (slightly orangish I always thought), and then thinking I’d have to resort to TTL UART Access, I managed to find a solution to my problem. Perhaps this solution applies to your device. I’ve tried to include enough relevant terms so anyone searching for a related problem in the future might find this post.

Mine is a Gen 1 (with the Debian-flavor OS), not the Gen 2 (with the Busybox-flavor OS).

Background:
I managed to lock myself out of my otherwise normally working MyCloud Gen1. By this I mean I could no longer SSH into my device. This was courtesy of a series of configuration changes with unintended consequences on my part. The Dashboard web UI was still accessible and ssh could be enabled/disabled. That didn’t help my situation since I had edited /etc/ssh/sshd_config to disable password auth and require public key auth. The factory sshd_config is NOT restored during a 40 second reset.

For the really curious, I had set up some bind mountpoints (in /etc/fstab) to appear as dirs under /home. However, those dirs actually live on /DataVolume so the contents survive firmware upgrades. I subsequently learned (after a reboot) that /DataVolume itself isn’t mounted until some point after /etc/fstab is reached during normal initialization. Ergo, /etc/fstab could NOT bind-mount my relocated /home dirs. Ergo, no authorized_keys file for ssh to find in a user dir. Ergo, locked out.
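The dependency is easier to see once the bind entries are listed on their own: each one names a source directory that must already be available when /etc/fstab is processed. A minimal sketch, assuming the usual fstab field order (source, target, type, options); the helper name bind_sources and the example paths are hypothetical, not taken from the device.

```shell
# bind_sources FSTAB
# Prints the source path (field 1) of every non-comment fstab entry whose
# options (field 4) include "bind".
bind_sources() {
    awk '$1 !~ /^#/ && $4 ~ /(^|,)bind(,|$)/ { print $1 }' "$1"
}

# Example entry this would report (hypothetical paths):
#   /DataVolume/home/alice  /home/alice  none  bind  0  0
# Usage:
#   bind_sources /etc/fstab
```

Any path it prints under /DataVolume will fail to bind-mount if /DataVolume is not yet mounted at that point in the boot.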

Problem:
At this point, the device still had its normal blue light. I powered down the device through the web UI and disassembled it. I removed the HDD and put it into my SATA docking station attached to my linux workstation (Fedora 31, but that probably doesn’t really matter). The drive showed up. I could easily mount the raid partitions. I edited the proper etc/ssh/sshd_config to remove my restrictions, which should have allowed me to ssh into the device normally.

I put it all back together and… solid white light (not blinking). I mention not blinking because blinking white light is a well-known published indicator for “device initializing”. There are no normal HDD noises. The network port on the device doesn’t indicate a link, though the link activity light flickers from time to time. The network port it’s connected to at the switch occasionally thinks there’s a link.

Furthermore, exactly every 42 seconds, the white light blinks off (for less than a second). This repeats indefinitely. For hours. I left it overnight and nothing about this changed. I read oh so many anecdotes about this happening to people. None were specific about having previously removed the drive from the enclosure and attaching it to another system like I had done, but perhaps they had.

What Happened
I pieced together what must have happened after reading lots and lots of posts, and some particularly excellent posts/replies from Fox_eye, like the one at the bottom of this post.

In one post, a contributor mentions that when attaching the HDD to a different host system (like my linux workstation, though it could be a “live” boot cd/usb linux), the raid partitions appear as /dev/md_127 instead of /dev/md0. That was the case for me, but I hadn’t thought anything of it.

I did start thinking about the boot process and trying to capture a log of it.

I got curious about the contents of the other partitions on the HDD. Again using Fox_eye’s information, I learned that partitions 5/6 hold identical copies of the kernel (it’s the same thing that appears in the file /boot/uImage). Partitions 7/8 are nearly identical to each other as well. (Partitions 7/8 ultimately come from the files /usr/local/share/k1m0.env and k1m1.env respectively.) The file /boot/boot.env also contains very similar information.

bootargs="console=ttyS0,115200n8, init=/sbin/init"
bootargs="$bootargs root=/dev/md0 raid=autodetect"
bootargs="$bootargs rootfstype=ext3 rw noinitrd debug initcall_debug swapaccount=1 panic=3"
bootargs="$bootargs mac_addr=$eth0.ethaddr"
bootargs="$bootargs model=$model serial=$serial board_test=$board_test btn_status=$btn_status"
bootm /dev/mem.uImage

One of my two WD MyCloud Gen 1 machines looks for /dev/md0.
The other one looks for /dev/md1.
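
To check which array name a particular unit expects, the root= argument can be pulled out of the env file while the root filesystem is mounted on another machine. A minimal sketch; the helper name root_device_from_env and the mount point /mnt/mycloud are hypothetical.

```shell
# root_device_from_env FILE
# Prints the value of the first root= kernel argument found in FILE.
# The pattern requires "root=" exactly, so rootfstype= is not matched.
root_device_from_env() {
    sed -n 's/.*root=\([^ "]*\).*/\1/p' "$1" | head -n 1
}

# Usage (hypothetical mount point):
#   root_device_from_env /mnt/mycloud/boot/boot.env
```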

Apparently, attaching the drive to my linux workstation trampled the device ID the raid is known by. I’m an experienced Linux user but I know next to nothing about raid setup. I had been under the impression that all device IDs are assigned by the system the device is attached to. Attach it to a different system, and the device will get a different device ID. Apparently not for raids? Or perhaps it was an artifact of connecting it to my workstation via a SATA-USB docking station?
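
One likely explanation (an assumption; nothing in this thread confirms it) is that the 0.90 raid superblock stores a “preferred minor” number, that the kernel’s raid autodetect uses it to name the array, and that assembling the array under a different number on the workstation rewrote that field. The value can be read from the output of mdadm --examine. A minimal sketch; the helper name preferred_minor is hypothetical.

```shell
# preferred_minor
# Reads `mdadm --examine` output on stdin and prints the number from the
# "Preferred Minor" line, which mdadm shows for 0.90 metadata.
preferred_minor() {
    sed -n 's/^ *Preferred Minor *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (as root; replace X with the letter your system assigns):
#   mdadm --examine /dev/sdX1 | preferred_minor
```

If this prints something other than the number in the root= argument, that mismatch would be consistent with the boot loop described above.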

The Fix
I reattached the HDD to my workstation and used the information appearing on this page from Fox_eye’s Website.

mdadm --stop /dev/md_127
mdadm --zero-superblock --force /dev/sdX1
mdadm --zero-superblock --force /dev/sdX2
sync
mdadm --create /dev/md0 --level=1 --metadata=0.9 --raid-devices=2 /dev/sdX1 /dev/sdX2
mdadm is the command-line tool for managing raids.
In sdX, replace X with the actual letter your system assigns to the HDD.
The last command recreates the raid as /dev/md0.

I was already many trials-and-fails into device de-bricking, so recreating the raid was just another trial. However, it might have been possible to do something less intensive, like the following, as a first attempt.

mdadm --stop /dev/md_127
mdadm -A /dev/md0 /dev/sdX1 /dev/sdX2
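
For 0.90 metadata, mdadm’s assemble mode also accepts --update=super-minor, which rewrites the preferred minor stored in each superblock to match the array being assembled. The sketch below is untested on a MyCloud. The helper names run and reassemble_as_md0 are hypothetical, and the commands are only printed unless DRY_RUN=0 is set.

```shell
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reassemble_as_md0 ARRAY DISK
#   ARRAY: the name your workstation gave the array (check /proc/mdstat)
#   DISK:  the whole-disk device, e.g. /dev/sdX
reassemble_as_md0() {
    array=$1
    disk=$2
    run mdadm --stop "$array"
    run mdadm --assemble /dev/md0 --update=super-minor "${disk}1" "${disk}2"
}

# Dry run, prints the two mdadm commands without touching the disk:
#   reassemble_as_md0 /dev/md127 /dev/sdX
```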

At this point I believe I also “flashed” the new firmware onto the raid and the other partitions, again using instructions from Fox_eye’s Website.

Basically, download the latest firmware directly from WD
Unzip the download, then unpack the *.deb archive
Then using dd…
copy rootfs.img into /dev/md0
copy uImage (the kernel) into /dev/sdX5 and sdX6 (same source for both destinations)
copy k1m0.env into /dev/sdX7 and k1m1.env into /dev/sdX8
Where to find these things in the *.deb archive and how to perform these steps is outlined in a number of posts, and again, Fox_eye’s Website.
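
The copy steps above can be written out as a plan before anything is run. This sketch only prints the dd commands. It assumes the four files have already been extracted into one directory (the layout inside the *.deb is not covered here), and the helper name flash_plan is hypothetical.

```shell
# flash_plan SRC_DIR DISK
#   SRC_DIR: directory holding rootfs.img, uImage, k1m0.env and k1m1.env
#   DISK:    the whole-disk device, e.g. /dev/sdX
# Prints one dd command per destination; nothing is written to any device.
flash_plan() {
    src=$1
    disk=$2
    echo "dd if=$src/rootfs.img of=/dev/md0"
    echo "dd if=$src/uImage of=${disk}5"
    echo "dd if=$src/uImage of=${disk}6"
    echo "dd if=$src/k1m0.env of=${disk}7"
    echo "dd if=$src/k1m1.env of=${disk}8"
}

# Usage (hypothetical directory):
#   flash_plan /tmp/firmware /dev/sdX
```

Reviewing the printed lines before running them by hand is a cheap guard against pointing dd at the wrong drive letter.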

After reattaching the main board to the HDD and connecting ethernet/power, it was apparent the device was already acting differently - normally. Normal HDD noises. The white light stayed on (again, not blinking) like it normally had, without repeating every 42 seconds.

It took about 3-4 minutes, but the blue light came on and everything worked. I could get in via ssh and do a better job of handling bind mountpoints and /etc/fstab, more below if you’re interested.

Side Observations
MyCloud Gen1 uses runlevel 2 (init 2). All the startup scripts can be found under /etc/rc2.d/. The prefix S* is for start and K* is for kill (stop). /DataVolume/ isn’t mounted until S15 on mine, though I suspect the order could be different depending on which services are configured to run.
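
To see the start order on your own unit, the S* entries can be listed in name order, which is the order they are started in. A minimal sketch; the helper name start_order is hypothetical.

```shell
# start_order DIR
# Prints the names of the S* entries in DIR, sorted the way the shell
# expands the glob (by name).
start_order() {
    for script in "$1"/S*; do
        # Skip the unexpanded pattern when DIR has no S* entries.
        [ -e "$script" ] || continue
        basename "$script"
    done
}

# Usage:
#   start_order /etc/rc2.d
```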

You can enable (or permanently disable, at least until the next firmware upgrade) any services that appear in /etc/init.d by using the command update-rc.d. For example…

$> update-rc.d wdmcserverd disable
$> update-rc.d wdphotodbmergerd disable

The old-school /etc/rc.local is another “service” as far as the OS is concerned. This is where I had originally put a call to mount -a to ensure my custom bind mountpoints are reached.

/etc/rc.local is NOT configured to run for init 2 (nor for any other runlevel). Furthermore, the contents of /etc/rc.local wouldn’t have survived firmware upgrades.

However, there is an S98user-start -> /CacheVolume/user-start which is configured to run. Like /DataVolume, /CacheVolume survives firmware upgrades. /CacheVolume/user-start is a great place to put any custom startup commands or anything at all that a person would have otherwise put in /etc/rc.local.
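
As a sketch of what such a script could look like for the bind mounts described earlier: S98 runs after S15, so /DataVolume should already be mounted, but a short wait guards against a different ordering. The helper name wait_for_mount and the paths under /home are hypothetical.

```shell
#!/bin/sh
# Sketch of a /CacheVolume/user-start script.

# wait_for_mount TARGET [TRIES] [MOUNTS_FILE]
# Checks the mount table up to TRIES times (default 30), sleeping one second
# after each failed check. Returns 0 once TARGET is listed as a mount point,
# or 1 if it never appears. MOUNTS_FILE defaults to /proc/mounts.
wait_for_mount() {
    target=$1
    tries=${2:-30}
    mounts=${3:-/proc/mounts}
    while [ "$tries" -gt 0 ]; do
        # Field 2 of each mount table line is the mount point.
        if awk -v t="$target" '$2 == t { f = 1 } END { exit !f }' "$mounts"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example use inside user-start (hypothetical user directory):
# if wait_for_mount /DataVolume; then
#     mount --bind /DataVolume/home/alice /home/alice
# fi
```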

–edits: typos, clarity, and a bit more info

brannonb · February 16, 2020, 8:50pm

Very well written write-up. I had always wondered why, when I connected the HDD to my linux box, I got the flashing white light. I assumed it was an eth0 problem because, like you said, the ethernet port blinks like once every 42 seconds. This is a well thought out, well written and very understandable write-up on this device. Thanks for taking the time to share with us.

Brannon