Broken Software Mirror

Hi all,

Today I purchased a WD My Cloud 4TB.

After installing it, I updated the FW to the latest version (v04.00.00-607).

As I like to know what I have to deal with, I enabled SSH to understand how it all works.

This is when I discovered what looks to me like a faulty software RAID config:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Jul 12 20:25:55 2012
     Raid Level : raid1
     Array Size : 1999808 (1953.27 MiB 2047.80 MB)
  Used Dev Size : 1999808 (1953.27 MiB 2047.80 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Jul 29 21:23:21 2014
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 49266c09:bc456f15:83decb8c:900f3e2c
         Events : 0.742

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        2        1      active sync   /dev/sda2

# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Jul 29 20:48:49 2014
     Raid Level : raid1
     Array Size : 1999808 (1953.27 MiB 2047.80 MB)
  Used Dev Size : 1999808 (1953.27 MiB 2047.80 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jul 29 21:31:57 2014
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : e073f245:c99fa050:997ceb89:519129b4 (local to host WDMyCloud)
         Events : 0.1071

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

md0 is missing sda1, and md1 is missing sda2.

I would say there should be only one array, md0, with both sda1 and sda2 as RAID members.

From the size it looks like it is the rootfs, which makes sense to keep on a RAID 1 volume.

I can get md0 back into working order, but the question is how this can happen, and whether it is perhaps a bug in the latest FW.
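For reference, this is roughly what I have in mind to get md0 back in order. It is an untested sketch and assumes that md0 (the array with sda2) holds the current rootfs data and that md1 is not the array the system is actually running from, so I would verify that first:

# Check which md device the rootfs is mounted from -- do NOT stop that one:
df -h /
cat /proc/mdstat
# Stop the stray single-disk md1 so sda1 becomes free:
mdadm --stop /dev/md1
# Wipe the old RAID metadata from the freed partition (discards md1's copy of the data):
mdadm --zero-superblock /dev/sda1
# Add the partition back into md0 and let it resync:
mdadm /dev/md0 --add /dev/sda1
# Watch the rebuild progress:
cat /proc/mdstat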

Thanks,

Christian

Based on your other post (http://community.wd.com/t5/WD-My-Cloud/Admin-Folder-Disappeared-on-first-outing-with-MyCloud/m-p/772149#M18820) it is unclear what exactly you have. Do you have the My Cloud or the My Cloud Mirror? They are different beasts as far as hardware and firmware are concerned. The My Cloud is a single-bay device, while the Mirror is a dual-bay device. You posted this in the Mirror sub-forum, while the other post is in the My Cloud sub-forum.

EDIT: Please ignore my comment, as moderators have now moved this post to the My Cloud board.

Yes, they moved it to the My Cloud Mirror board by accident, but it is now back on the My Cloud board where it belongs :wink:

Coming back to this.

I have not looked into this further, and it looks like it cleared itself:

WDMyCloud:~# cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 sda2[1] sda1[0]
      1999808 blocks [2/2] [UU]
      
unused devices: <none>
WDMyCloud:~# crontab -l
no crontab for root
WDMyCloud:~# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Jul 29 20:48:49 2014
     Raid Level : raid1
     Array Size : 1999808 (1953.27 MiB 2047.80 MB)
  Used Dev Size : 1999808 (1953.27 MiB 2047.80 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sat Aug 9 14:41:42 2014
          State : active 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : e073f245:c99fa050:997ceb89:519129b4 (local to host WDMyCloud)
         Events : 0.25556

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8        2        1      active sync   /dev/sda2

But I also get the following messages now:

Aug 9 03:05:05 WDMyCloud kernel: [918569.259787] EXT4-fs (sda4): re-mounted. Opts: user_xattr,barrier=0,data=writeback,noinit_itable
Aug 9 03:05:06 WDMyCloud kernel: [918569.883426] md: cannot remove active disk sda1 from md1 ...
Aug 9 03:05:06 WDMyCloud kernel: [918569.972771] md: cannot remove active disk sda2 from md1 ...
Aug 9 03:05:08 WDMyCloud kernel: [918572.027727] EXT4-fs (sda4): re-mounted. Opts: user_xattr,barrier=0,data=writeback,noinit_itable,init_itable=10

Always at the same time. It started around Jul 30 with this:

Jul 30 03:05:03 WDMyCloud kernel: [54421.389433] EXT4-fs (sda4): re-mounted. Opts: user_xattr,barrier=0,data=writeback,noinit_itable
Jul 30 03:05:03 WDMyCloud kernel: [54421.530411] md0: detected capacity change from 2047803392 to 0
Jul 30 03:05:03 WDMyCloud kernel: [54421.536299] md: md0 stopped.
Jul 30 03:05:03 WDMyCloud kernel: [54421.539354] md: unbind<sda2>
Jul 30 03:05:03 WDMyCloud kernel: [54421.598860] md: export_rdev(sda2)
Jul 30 03:05:05 WDMyCloud kernel: [54423.238730] md: cannot remove active disk sda1 from md1 ...
Jul 30 03:05:05 WDMyCloud kernel: [54423.514785] md: bind<sda2>
Jul 30 03:05:05 WDMyCloud kernel: [54423.583396] md: recovery of RAID array md1
Jul 30 03:05:05 WDMyCloud kernel: [54423.587537] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jul 30 03:05:05 WDMyCloud kernel: [54423.593567] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jul 30 03:05:05 WDMyCloud kernel: [54423.603258] md: using 2048k window, over a total of 1999808k.
Jul 30 03:06:38 WDMyCloud kernel: [54516.424670] md: md1: recovery done.
Jul 30 03:06:39 WDMyCloud kernel: [54517.914815] EXT4-fs (sda4): re-mounted. Opts: user_xattr,barrier=0,data=writeback,noinit_itable,init_itable=10

Looking through the cron jobs, I found it is caused by the following cron entry:

/etc/cron.d/20-checkRAID

It runs this script at exactly the time I get the messages:

/usr/local/sbin/20-checkRAID.sh

I haven't checked the script yet, but I believe it is looking for md0 rather than md1, and that's why it is trying to remove disks from the RAID.
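When I get to it, something like this should confirm the suspicion (just a sketch, using the paths from the cron entry above):

# Show when the cron entry fires and what it calls:
cat /etc/cron.d/20-checkRAID
# See which md device the script actually references:
grep -nE 'md[01]' /usr/local/sbin/20-checkRAID.sh
# Compare with what the running system currently has:
cat /proc/mdstat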

This is rather worrying.

derchris wrote:

Coming back to this.

… [snip] …

I haven't checked the script yet, but I believe it is looking for md0 rather than md1, and that's why it is trying to remove disks from the RAID.

This is rather worrying.

Just noticed similar messages on my drive, also timed to the 20-checkRAID.sh script running, but in my case it is md0 instead of md1:

[556683.000417] EXT4-fs (sda4): re-mounted. Opts: user_xattr,barrier=0,data=writeback,noinit_itable
[556683.258876] md: cannot remove active disk sda1 from md0 ...
[556683.339855] md: cannot remove active disk sda2 from md0 ...
[556684.466608] EXT4-fs (sda4): re-mounted. Opts: user_xattr,barrier=0,data=writeback,noinit_itable,init_itable=10

I haven't much experience with RAID; are these warnings something that should be sorted out?
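In the meantime I guess I can at least check whether the mirror itself still looks healthy, with the same commands used earlier in the thread (not run yet, just what I plan to do):

cat /proc/mdstat
mdadm --detail /dev/md0

If that reports [2/2] [UU] and a clean/active state, I assume the mirror is intact despite the warnings.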