PR4100 stuck on "Resizing"

I have a PR4100 that had 4 8TB drives in it with RAID 5.

I replaced the 4 8TB drives with 4 10TB drives, one at a time, letting it rebuild the RAID in between each drive swap out.

I then went through the procedure to expand the RAID 5 volume to include the extra space.

Since then, it’s been saying “Resizing”, but doesn’t seem to actually be doing anything.

That was November 7th, almost two months ago. Now I’m stuck. The reason I upgraded the drives is that I was running out of room (Plex), so there’s about 20TB of data and nothing large enough to back it up to so that I can wipe the NAS and start setup from the beginning. It won’t let me upgrade to the new firmware. It won’t even let me power down.

I know how to SSH into it, but I’m not a Linux person, so I don’t know my way around the OS very well. Is there anything that can be done within the OS to fix this? If so, please go slow and explain things so a non-Linux person can follow along.

Thanks for any help anyone can give.

Upgrading 4 8TB drives to 10TB drives one at a time is not recommended, as the data is at risk (degraded RAID) for more than a week across the rebuilds.
Anyway, you got quite far.
Please share the RAID status:

cat /proc/mdstat

And list the filesystems and their sizes:

df -h

I tried to keep I/O to a minimum during the rebuilds (mostly just accessing the web interface to check on the rebuild status), and made sure the RAID was in a healthy state before swapping the next drive.


Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[4] sdd2[7] sdc2[6] sdb2[5]
29286720960 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 2/10 pages [8KB], 524288KB chunk

md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
2097088 blocks [4/4] [UUUU]
bitmap: 0/128 pages [0KB], 8KB chunk


Filesystem Size Used Available Use% Mounted on
mdev 7.8G 12.0K 7.8G 0% /dev
/dev/loop0 98.3M 98.3M 0 100% /usr/local/modules
/dev/mmcblk0p6 18.4M 339.0K 16.6M 2% /usr/local/tmp_wdnas_config
tmpfs 1.0M 0 1.0M 0% /mnt
tmpfs 40.0M 9.0M 31.0M 23% /var/log
tmpfs 100.0M 3.3M 96.7M 3% /tmp
/dev/sda4 975.9M 2.3M 957.6M 0% /mnt/HD_a4
/dev/sdb4 975.9M 1.3M 958.6M 0% /mnt/HD_b4
/dev/sdc4 975.9M 1.3M 958.6M 0% /mnt/HD_c4
/dev/sdd4 975.9M 1.3M 958.6M 0% /mnt/HD_d4
/dev/sde2 4.5T 3.6T 960.2G 79% /mnt/USB/USB2_e2
/dev/sdf2 4.5T 4.1T 498.7G 89% /mnt/USB/USB2_f2
/dev/sdg2 4.5T 3.4T 1.1T 75% /mnt/USB/USB3_g2
/dev/sdh2 4.5T 3.4T 1.1T 75% /mnt/USB/USB4_h2
/dev/md1 21.7T 18.4T 3.1T 86% /mnt/HD/HD_a2
cgroup 7.8G 0 7.8G 0% /sys/fs/cgroup

Mandatory disclaimer: all these commands are at your own risk. It is recommended to call WD support first.

EDIT: growing the main filesystem can be done while it is mounted:

resize2fs /dev/md1
Original post:

The first command, cat /proc/mdstat, shows the RAID array status.
Your main volume is /dev/md1 and it is stable; otherwise it would say ‘rebuilding’.
Your data is safe.

The filesystem is currently still at 21.7T capacity.
I believe the disk partitions are laid out properly on a restore, but your box may have failed to grow the filesystem on those partitions due to leftover running processes.

In order to grow a filesystem, you need to unmount it.
You could try to

umount /dev/md1

But it may say the device is busy.
Find the processes using it and kill those process IDs (PIDs):

fuser -cv /mnt/HD/HD_a2
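For example, if fuser reports a couple of PIDs (the numbers below are just placeholders; use the ones fuser actually prints):

kill 1234 5678      # replace with the PIDs from the fuser output
kill -9 1234        # last resort, only if a process ignores the plain kill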

You must ensure all 3rd party apps are stopped and cleaned up.
List them with

ls /shares/Volume_1/Nas_Prog

Stop them by running

cd /shares/Volume_1/Nas_Prog/some_app
./stop.sh $(pwd)
./clean.sh $(pwd)
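If several apps are installed, a small loop can stop and clean them all in one go (just a sketch; it assumes every app directory ships the usual stop.sh and clean.sh scripts):

for app in /shares/Volume_1/Nas_Prog/*/; do
    cd "$app" || continue
    [ -x ./stop.sh ] && ./stop.sh "$(pwd)"      # stop the app if it has a stop script
    [ -x ./clean.sh ] && ./clean.sh "$(pwd)"    # then clean up its leftovers
done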

Then try to unmount again

umount /dev/md1

If that succeeds, check the filesystem and grow it (resize2fs defaults to the maximum space available on the partition).

e2fsck /dev/md1
resize2fs /dev/md1
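Note that resize2fs will usually refuse to grow an unmounted filesystem unless it has just been checked, so if it complains, force a full check first:

e2fsck -f /dev/md1     # -f forces a full check even if the filesystem looks clean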

Then reboot.

If it didn’t grow, it could be that the partitions are still too small.
In that case, please show the partition table:

gdisk /dev/sda

Press p to print the partition table. Press q to quit.
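If you’d rather not enter the interactive prompt at all, gdisk can print the table read-only and exit in one go:

gdisk -l /dev/sda      # -l lists the GPT partition table and quits without changes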
Minor remark: please add code markup to your posts here on the forum by selecting the text and pressing CTRL+SHIFT+C.

Again the disclaimer: use at your own risk.

Thank you so much for this help. Your instructions were extremely easy to follow and very helpful.

I started the process using your original walk-thru instructions.

I shut down the apps (Plex) through the web UI before starting.
I then connected using PuTTY and tried to unmount, and it said the device was busy (like you’d said it might).
I ran the check for running processes. There were a couple, so I did a quick Google search and killed them.
I checked for running apps, and it didn’t list any (Plex is all I run, and it’d already been stopped).
It then allowed me to do the unmount.
I did the check, and it said it was clean, and gave the number of files and blocks.
I then started the resize around 4:30am EST. It echoed back the resize2fs version and date, and it’s been sitting there ever since (over 13 hours) without returning the cursor to me. I’m hearing no sound, feeling no vibration, and seeing no flashing HD access lights on the NAS: no feedback at all on whether anything is actually happening. Is there a command to check the status if I connect with a second instance of PuTTY?

Should I continue to wait? If so, how long?

Thanks,
Scott

It’s odd that it doesn’t do anything… there should be activity similar to a rebuild.
Try opening a second PuTTY session and see if the filesystem is growing with

watch df /dev/md1

This command re-runs df against the md1 filesystem every 2 seconds.
If it really is stuck and doing nothing, you may press CTRL-C. EDIT: and reboot.

The resize2fs command also has a -p flag to display the progress.
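So if you end up re-running it by hand, something like this gives visible feedback:

resize2fs -p /dev/md1      # -p prints a progress bar for each resize pass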

I started the watch a couple minutes after 6:00am EST. There wasn’t any change to any of the numbers for over an hour, so I killed the resize.

I went through all the steps again, just to make sure I didn’t miss anything. Still got the same results - when I run the resize2fs, it just sits there.

I did a bit of Googling on “resize2fs not working” and tried some of the things I found. One of the first things I learned is that resize2fs is supposed to display

resize2fs 1.42.9 (28-Dec-2013)
Resizing the filesystem on /dev/md1 to XXXXXX blocks.

as soon as it starts. When I run it, it doesn’t do that; it just displays the version number and sits there, never displaying the second line.

The next thing I found was people saying that fdisk needed to be run to change the partition size before the filesystem could be resized. I tried fdisk (no parameters) just to see what it’d say, and it says it’s not found, so that’s out.

After thinking for a while about why it wouldn’t run, I wondered if maybe it was because an instance of it was already being run by the WD OS. So I searched for how to find out if resize2fs was already running, and found “top”. Ran it, and it shows this:

PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
10618 10617 root R 26160 0.1 3 25.0 resize2fs -fp /dev/md1 28600297M

Since those aren’t the parameters I used (and I’d killed mine), it must be coming from the WD OS.
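(A quicker way to check for this without top, assuming the box’s BusyBox build includes ps and grep:

ps w | grep '[r]esize2fs'      # the [r] keeps grep from matching its own command line

This prints the resize2fs line if one is running, or nothing at all.)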

So, it looks like my options are:

  1. Just let the version from the OS keep running, though if it hasn’t completed in two months, it may never complete.
  2. Kill the one from the OS that’s running, and try to start my own.
  3. Buy 20TB of HD, copy my stuff over to it, and rebuild the NAS from the ground up.
  4. Wipe everything out and rebuild the NAS from the ground up. It’s TV shows I’ve put into Plex, mostly from the ’80s / ’90s / ’00s, from when I was overseas in the military and didn’t get to see them back then. I could always get them again, but redownloading 18TB of data is a daunting prospect; it took me years to collect it.

Looks like, at least for the moment, I’m at a standstill until I figure out what way I want to go with this.

Thanks for all your help. Hopefully, if nothing else, others that run into a resizing issue will see this thread and it’ll help them out.

Good find.
Here are some numbers for shrinking a partition:

Here’s a guy who killed the process and rebooted after 800 hours:
https://bbs.archlinux.org/viewtopic.php?pid=1174040#p1174040
He says that the old data was fine…

Borrowing a 20TB NAS might be an option for you… or just an empty NAS to seat the old 8TB disks.
After backing up the critical data, I’d dare to kill the process and run fsck and resize2fs -p -d 14 manually.
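Spelled out, that sequence would look something like this (same disclaimer, and only after the backup):

umount /dev/md1                # the filesystem must be unmounted for a full check
e2fsck -f /dev/md1             # forced full check after killing the stuck resize
resize2fs -p -d 14 /dev/md1    # -p shows progress, -d 14 turns on debug output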

WD My Cloud uses GPT partition tables, managed with gdisk instead of fdisk. Here’s a lucky Google result for more info. You don’t need to make any changes with gdisk here.

Main takeaway: don’t resize 20TB+; just get another box and transfer the data. It’s much faster, and you don’t put your data at risk by running on a degraded array for a week.
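For the transfer itself, rsync over SSH between the two boxes is one way to do it (a sketch; the hostname and paths are placeholders for your own setup):

# -a preserves permissions and timestamps, -H keeps hard links intact
rsync -aH --progress root@old-nas:/mnt/HD/HD_a2/ /mnt/HD/HD_a2/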

You may want to look into Debian + OMV + ZFS for more control over your array.
It lets you monitor the status of scrubs and the so-called resilver process pretty easily… here with ext4 you’re left in the dark.
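For comparison, on ZFS a single command reports pool health and scrub/resilver progress (the pool name here is a placeholder):

zpool status tank      # shows pool state plus scrub/resilver progress and an ETA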

Did you ever find a solution other than starting over and reformatting and rebuilding a new empty RAID 5 volume and restoring all your data?

Unfortunately, no. After letting it sit for several days in the stalled state, I finally just had to get an EX4100 with 4 10TB drives, copy the data over, then wipe the box back to nothing and start like it was new.

On the up side, I ended up doubling my NAS capacity because of the EX4100. On the down side, that wasn’t cheap.

Interesting thread… I’ve got two PR4100s stuffed with 6TB Reds and a stack of 10TB Reds ready to go. Both on plain RAID 5 with three accounts, Plex, SMB, and shares.

One 4100 is backed up to the other (I believe in backups)… My plan has been to swap out the 6TBs from one, pop in the 10TBs, and then copy the data back in.

I didn’t know it supported resizing.

My concern with swapping the drives is that I’ll lose the configuration. I was informed a while back that it resides in the storage, not on an internal “drive”… Is that true?