Disks OK but deteriorated RAID and data lost

laucha13 · November 10, 2023, 12:23am

Hi,

I have an EX4100 with 2 x 4 TB Red disks in RAID 1.
First, the case LED for one of them changed from blue to red. Now both LEDs are red.
The diagnostic indicates:

disks are OK
RAID is deteriorated.

In one of the shared folders, a lot of pictures (those of our wedding and kids, among others) are no longer shown, even though they must still be there, because the disks are still 95% full.

I’m transferring the second shared folder, which seems to have all its data intact, to a new disk.

How can I restore the lost pictures?
Is it a good idea to delete the second folder once it has been transferred?

Any fix or idea is welcome. If I don’t answer, it’s because my wife has already killed me.

Thank you all!

laucha13 · November 10, 2023, 12:34am

Hi Cerberus … I have no idea what that is (I’m going to read about it), and for now I think it’s better to wait for the second folder transfer to finish, no?
It will take 2 more hours.

My problem is with the disks in slots 1 and 2 (volume 2). Are the commands still the same?

Thank you.

laucha13 · November 10, 2023, 12:45am

I know you know, but I don’t know at all how to do that. The link you sent doesn’t explain everything, but there are other links in that post, so I need to read before I can do what you’re asking.
I’ll send you the results once I have them, probably tomorrow. Once more, thank you for your help!

laucha13 · November 10, 2023, 1:14am

Nothing is crystal clear to me because I know nothing about this, but I will figure it out. I’ve installed PuTTY; now I need to learn what it is and how to use it … Get some rest, and thank you.

laucha13 · November 10, 2023, 2:02am

The forum doesn’t allow me to post all the information.
I already did what you asked, but the information seems to be for my new 10 TB disks; my problem is with the old 4 TB ones.

NAS_user · November 10, 2023, 2:09am

The WD NAS boxes run a customized version of the Linux O/S.

The dashboard you normally access through settings is a simple graphical user interface to the Linux O/S.

Using PuTTY, you are essentially opening a “command line window” into the WD NAS box, which then allows you to enter Linux O/S commands. This is exactly like opening a Command Prompt window in Windows and then executing DOS commands.

The commands he is having you enter are disk diagnostic commands, which will provide much more information than the read-outs available on the WD NAS dashboard. From your description, the money is on ONE of the two drives being bad . . . and it needs to be replaced.

You ALWAYS want a backup for any data on ANY system. So getting the data off the NAS is a good start.

Not sure what you mean by “deleting second folder”. Is this a second share on the drive? And between the two shares the drive is 95% full?

I wouldn’t delete anything just yet. And since this is RAID 1, the two drives should be duplicates of each other. There are multiple ways to access data on the disks . . . how are you doing it?

Are you using Windows File Explorer or Mac Finder?
Are you using the web app?

You can also access the files (in the Linux directories on the drive) directly using WinSCP. Or you can pull the drives and access the data directly using a PC (and a Linux reader program), but you are not at that point yet.

NAS_user · November 10, 2023, 2:11am

Break the text into multiple chunks to post.

What do you mean by “information seems to be for my new 10TB, my problem is in the old 4TB”?
Did you do something like swap disks in the unit?
For RAID 1 to use the full capacity of both disks, they should be the same size. If they are not, the “RAID array” will be the size of the SMALLER drive.

There is a way to migrate from 2 x 4 TB to 2 x 10 TB, but there are a number of steps along the way.
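
As a small illustration of the size rule above, here is a minimal sketch. The function name `raid1_usable` is made up for this example, and sizes are given in whole TB for simplicity.

```shell
# raid1_usable SIZE_A SIZE_B: print the usable capacity of a two-disk RAID 1
# mirror, which is limited by the smaller member (sizes in whole TB here).
raid1_usable() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

raid1_usable 4 10    # one 4 TB disk mirrored with one 10 TB disk
```

So a mirror built from one 4 TB and one 10 TB disk would offer only about 4 TB; the rest of the larger disk goes unused by that array.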

Djeep · November 10, 2023, 11:08am

Try rebuilding the RAID first. If that doesn’t work, consider using data recovery software for the missing pictures. Hold off on deleting the second folder until you’re sure your data is safe.

laucha13 · November 12, 2023, 1:50am

Hi Djeep,

I succeeded in transferring one of the 2 shared folders from my 4 TB HDDs to my new 10 TB disks.

In the other shared folder, where I had the pictures, all the folders starting with numbers (dates) are missing, at least in Explorer.

Do you have a link on how to rebuild the RAID?

Thank you for your help!

laucha13 · November 12, 2023, 2:18am

Hi NAS_user,

I have 2 x 4 TB Red disks in RAID 1 in slots 1 and 2, with 2 shared folders on them. One of the disks started to show the “deteriorated” status. The disks are 95% full.

I bought 2 x 10 TB Red Plus.

To be safe, I turned off the system, pulled the 2 x 4 TB out of slots 1 and 2, and inserted the 2 x 10 TB in slots 3 and 4. I formatted them as RAID 1.
I turned the system off again and reinserted the 2 x 4 TB in slots 1 and 2.

So now I have all 4 slots filled.

I don’t know exactly when, but the second 4 TB disk also started to show the deteriorated status.

On the 2 x 4 TB I have 2 shared folders. I succeeded in transferring one of them to my 10 TB disks using Windows File Explorer, so I thought about deleting it to free up space on the disk (not done yet).

In the other shared folder (with the pictures), all the folders starting with numbers (e.g. 20211031_Eva 1st year birthday) are missing, at least in File Explorer on Win10.

I executed the commands requested by Cerberus, but the information seems to be for the 10 TB disks only. I will post the results tomorrow in several pieces.

If you have any ideas, please let me know, and thank you for your help!

Cerberus · November 12, 2023, 6:26am

Let’s try this again.

Enable SSH, then run the following commands, one at a time, and post the results. Run a single command, post the results of that command, then run another command, etc…

smartctl -a /dev/sda;
smartctl -a /dev/sdb;
smartctl -a /dev/sdc;
smartctl -a /dev/sdd;

How to Access WD My Cloud Using SSH (Secure Shell)

laucha13 · November 12, 2023, 3:28pm

Hi Cerberus,

I’m sending you screenshots of the 4 commands. The damaged disks are in the sdc and sdd commands.

I will send the screenshots for the good disks in the next post.

Thank you.

laucha13 · November 12, 2023, 3:30pm

laucha13 · November 12, 2023, 3:32pm

root@WDMyCloudEX4100 ~ # smartctl -a /dev/sdc;
smartctl 7.2 2020-12-30 r5155 [armv7l-linux-4.14.22-armada-18.09.3] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E7LKH4J1
LU WWN Device Id: 5 0014ee 2b6781ee8
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Nov 12 10:13:33 2023 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (51840) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.

laucha13 · November 12, 2023, 3:33pm

Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 518) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 182
3 Spin_Up_Time 0x0027 182 175 021 Pre-fail Always - 7875
4 Start_Stop_Count 0x0032 090 090 000 Old_age Always - 10101
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 040 040 000 Old_age Always - 44435
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 146
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 102
193 Load_Cycle_Count 0x0032 197 197 000 Old_age Always - 9994
194 Temperature_Celsius 0x0022 125 111 000 Old_age Always - 27
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         1         -
# 2  Short offline       Completed without error       00%         1         -
# 3  Short offline       Completed without error       00%         0         -
# 4  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

laucha13 · November 12, 2023, 3:34pm

root@WDMyCloudEX4100 ~ # smartctl -a /dev/sdd;
smartctl 7.2 2020-12-30 r5155 [armv7l-linux-4.14.22-armada-18.09.3] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E3SVJS27
LU WWN Device Id: 5 0014ee 2b678d648
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Nov 12 10:17:48 2023 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

laucha13 · November 12, 2023, 3:35pm

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (52980) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 530) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 182 178 021 Pre-fail Always - 7883
4 Start_Stop_Count 0x0032 090 090 000 Old_age Always - 10255
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 044 044 000 Old_age Always - 40895
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 146
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 105
193 Load_Cycle_Count 0x0032 197 197 000 Old_age Always - 10143
194 Temperature_Celsius 0x0022 123 114 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         1         -
# 2  Short offline       Completed without error       00%         1         -
# 3  Short offline       Completed without error       00%         0         -
# 4  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

laucha13 · November 12, 2023, 3:38pm

So I sent you the 4 screenshots. As I said, the problematic disks are in the sdc and sdd commands.
I also sent them in text format, split into 2 posts per command: the first 2 text posts are for the sdc command and the last 2 text posts are for the sdd command.

Thank you for your help.

Cerberus · November 12, 2023, 3:51pm

There’s no need, the forum can handle the output of a single command.

Ignoring the 10 TB drives (/dev/sda and /dev/sdb) from now on. The first 4 TB drive (/dev/sdc) has issues, the second 4 TB drive (/dev/sdd) is ok.

/dev/sdc - Raw_Read_Error_Rate 182
/dev/sdd - No problems detected

Run the following commands, one at a time, then post the results. Text please, screenshots are a PITA to read.

cat /proc/mdstat;
mdadm --detail /dev/md1;
dmesg -t -l warn,emerg,alert,crit,err;

The output of the first two commands should fit within a single post, but the output of the third command may require a separate post.
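
For readers who want to repeat the SMART comparison above (/dev/sdc with a Raw_Read_Error_Rate of 182, /dev/sdd with nothing flagged), here is a minimal sketch of a filter that prints only error-related attributes with a non-zero raw value. The helper name `smart_nonzero` and the choice of attribute IDs are assumptions made for this example, not something smartctl provides. The sketch also assumes the standard attribute table layout, with the raw value in the tenth column.

```shell
# smart_nonzero: read `smartctl -A` (or `-a`) output on stdin and print the
# name and raw value of a few error-related attributes whose raw value is not 0.
# The attribute IDs below are an illustrative selection, not an official list.
# The `$3 ~ /^0x/` check keeps only attribute-table rows (FLAG column).
smart_nonzero() {
    awk '$1 ~ /^(1|5|196|197|198|199)$/ && $3 ~ /^0x/ && $10 != "0" { print $2, $10 }'
}

# Demo on three rows taken from the /dev/sdc output posted in this thread.
smart_nonzero <<'EOF'
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       182
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
EOF
```

On the NAS itself, the same function could be fed with `smartctl -A /dev/sdc | smart_nonzero`. A non-zero raw value is a hint worth following up rather than proof of failure, since raw values are vendor-specific.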

laucha13 · November 12, 2023, 3:56pm

root@WDMyCloudEX4100 ~ # cat /proc/mdstat;
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdd2[1]
3902822264 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [32KB], 262144KB chunk

md1 : active raid1 sdb2[0] sda2[1]
9762240320 blocks super 1.0 [2/2] [UU]
bitmap: 0/10 pages [0KB], 65536KB chunk

md0 : active raid1 sda1[3] sdb1[2] sdd1[1] sdc1[0]
2094080 blocks super 1.2 [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
root@WDMyCloudEX4100 ~ # mdadm --detail /dev/md1;
/dev/md1:
Version : 1.0
Creation Time : Wed Nov 8 18:41:32 2023
Raid Level : raid1
Array Size : 9762240320 (9310.00 GiB 9996.53 GB)
Used Dev Size : 9762240320 (9310.00 GiB 9996.53 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

 Intent Bitmap : Internal

   Update Time : Sun Nov 12 10:55:54 2023
         State : clean
Active Devices : 2

Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Consistency Policy : bitmap

          Name : WDMyCloudEX4100:1  (local to host WDMyCloudEX4100)
          UUID : 264015ac:0e40e9ec:0684827a:bc239a93
        Events : 2

Number   Major   Minor   RaidDevice State
   0       8       18        0      active sync   /dev/sdb2
   1       8        2        1      active sync   /dev/sda2
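
In the /proc/mdstat output above, md2 shows [2/1] [_U], meaning one of its two members is missing, while md1 (the array examined with mdadm) is the healthy 10 TB mirror. Here is a minimal sketch for picking degraded arrays out of that kind of output. The helper name `degraded_arrays` is made up for this example, and the sketch assumes the usual mdstat layout, where the status line ends with the member map.

```shell
# degraded_arrays: read /proc/mdstat-style text on stdin and print the name of
# each md array whose member map (the last field, e.g. [_U]) contains "_",
# which marks a missing member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array the next lines belong to
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}

# Demo on lines taken from the mdstat output posted above.
degraded_arrays <<'EOF'
md2 : active raid1 sdd2[1]
      3902822264 blocks super 1.0 [2/1] [_U]
md1 : active raid1 sdb2[0] sda2[1]
      9762240320 blocks super 1.0 [2/2] [UU]
EOF
```

On the NAS, this could be run as `degraded_arrays < /proc/mdstat`. For the array it reports (md2 here), `mdadm --detail /dev/md2` would give the equivalent of the md1 report above for the 4 TB mirror.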