No configured RAID volumes

Only part of the command failed. But the important part is that it was able to start the RAID 5 array on the /dev/md1 device with 3 drives (out of 4). This is what the later Linux commands were intended to do, in the event that the first command failed entirely. It probably worked because the /dev/md1 device was manually created using the mknod /dev/md1 b 9 1 command prior to trying to start the RAID 5 array.
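For anyone following along, the manual sequence described here can be sketched roughly as below. This is only a sketch under stated assumptions: the member partitions (/dev/sda2, /dev/sdb2, /dev/sdc2) are inferred from this discussion and may differ on your unit, and the `run` and `start_degraded_md1` helpers are hypothetical names used for illustration. It only prints the commands unless DRY_RUN=0 is set, because running them against the wrong devices can destroy data.

```shell
#!/bin/sh
# Sketch only: recreate the md device node and start a degraded array.
# Assumption: the three surviving members are sda2, sdb2 and sdc2.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

start_degraded_md1() {
    # Block device, major 9 (md driver), minor 1 -> /dev/md1
    [ -b /dev/md1 ] || run mknod /dev/md1 b 9 1
    # --run starts the array even though one member is missing
    run mdadm --assemble --run /dev/md1 "$@"
}

start_degraded_md1 /dev/sda2 /dev/sdb2 /dev/sdc2
```

Review the printed commands first, and only set DRY_RUN=0 once you are sure the device names match your system.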

Now, it’s important to identify exactly which drive bay is using the /dev/sdd2 device name, because something happened to cause that drive to be kicked out of the RAID 5 array. Typically, the /dev/sdd2 device name means partition 2 of the drive in bay 4.

You may have inadvertently removed a good drive, because nothing indicates that the /dev/sda2 drive had any problems. Typically, the /dev/sda2 device name means partition 2 of the drive in bay 1.

This stage of the process is VERY DANGEROUS, and one wrong move can cause the loss of all data. Therefore, I STRONGLY advise against experimenting by pulling random drives.

What happens if you power off the NAS and return the first drive back to bay 1, then power on the NAS again?

Hi,

I reinstalled the drive and now they are all working. However, it is now telling me that Disk 1 is failing. This was the one that gave me a problem, and I was running a test on it when I had the power failure.

So, as it says, I will have to copy the data and then get a new drive. I just need to work out how to copy from drive 1 to the others.

It’s not saying “copy the drive”. . . .it’s saying “backup the data”.

In other words, the system is saying it has a bad drive, and it’s time to make sure the entire array is backed up. (You should do this even if you had 4 healthy drives.)

Once you have a backup, THEN replace the “bad” drive (if it truly is bad).

With a fresh drive installed, the NAS will rebuild the array by reconstructing the contents of drive 1 from the striped data and parity on drives 2, 3, and 4. THEN you will have redundancy restored.
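One way to confirm whether redundancy is actually restored is to look at /proc/mdstat, where a healthy four-drive array shows something like "[4/4] [UUUU]" and an array with a missing member shows "[4/3] [UUU_]". The following is a minimal sketch, not anything the NAS firmware provides; `degraded_arrays` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list md arrays whose status line shows a missing member.
# An underscore inside the [UUUU] map marks a member that is absent.

degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /\[[0-9]+\/[0-9]+\] \[[U_]+\]/ {
            if ($0 ~ /\[[U_]*_[U_]*\]/) print name
        }
    '
}

# Prints nothing when every array has all of its members.
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

While a rebuild is still running, the array will keep showing up as degraded until the new drive has been fully synced.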

It is important you do the backup FIRST before attempting to rebuild the array in case you have a second questionable drive. If you only have one read cycle left. . .you want to use it by making a backup rather than attempting a raid array rebuild.

Then it’s the drive in bay 1 after all. This is why I said it’s important to identify exactly which drive bay is using the /dev/sdd2 device name, because it’s the drive that was kicked out of the RAID array. It often gets complicated because Linux device names are not guaranteed to be assigned to a particular device, and they’re subject to change.
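Since device names can move around, one way to tie a name like /dev/sdd to a physical bay is to compare the serial number smartctl reports with the label printed on each drive. A minimal sketch, assuming the four disks appear as /dev/sda through /dev/sdd; `serial_from_info` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the serial number reported for each disk so it can be
# matched against the label on the physical drive in each bay.

# Extract the serial number from `smartctl -i` output read on stdin.
serial_from_info() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    # Skip devices that are not present; stop if smartctl is unavailable.
    [ -b "$dev" ] || continue
    command -v smartctl >/dev/null 2>&1 || break
    printf '%s  %s\n' "$dev" "$(smartctl -i "$dev" | serial_from_info)"
done
```

Checking serial numbers with the NAS powered off and the drive in hand is far safer than pulling drives to see what happens.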

Why didn’t you say this in the first place? Had I known, I would have simply asked you to remove that drive before attempting to get the /dev/md1 RAID device back online.

Regardless, at least it’s back online, and I STRONGLY suggest creating backups BEFORE replacing the failed drive and attempting to rebuild the RAID array. Drives often fail during the process, and if you lose a second drive, all of your data is toast.

For a final snapshot of the health of each drive, run the following commands and post the results.

  • smartctl -a /dev/sda
  • smartctl -a /dev/sdb
  • smartctl -a /dev/sdc
  • smartctl -a /dev/sdd

They’re similar to commands I asked you to run previously, but provide more detailed information about each drive.
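If the full output is too much to read through, the attributes most commonly watched for a deteriorating drive are the reallocated, pending, and offline-uncorrectable sector counts (IDs 5, 197, and 198). The sketch below filters just those raw values; `key_attributes` is a hypothetical helper, and the device names are assumed to be /dev/sda through /dev/sdd.

```shell
#!/bin/sh
# Sketch: pull the raw values of a few failure-related SMART attributes.
# Only real attribute rows are matched (third column is a 0x.... flag),
# so other tables in the smartctl output are ignored.

key_attributes() {
    awk '$3 ~ /^0x[0-9a-f]+$/ && ($1 == 5 || $1 == 197 || $1 == 198) {
        print $2 "=" $10
    }'
}

for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    [ -b "$dev" ] || continue
    command -v smartctl >/dev/null 2>&1 || break
    echo "$dev:"
    # -A prints only the vendor-specific attribute table
    smartctl -A "$dev" | key_attributes
done
```

Non-zero raw values here are usually worth a closer look, but zeros do not prove a drive is healthy.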

Hi Cerberus,

Thank you for your help, it is really appreciated.

Fortunately, the machine is still in its 3-year warranty, by a month, so I should be able to get a replacement drive.

Annoyingly, it is now showing the discs as OK, but I still think drive 1 is on its way out.

root@MyCloudPR4100 ~ # smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-4.14.22] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0
Serial Number: WD-WX52D60DSJ9U
LU WWN Device Id: 5 0014ee 2682f4db1
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Nov 8 20:00:59 2023 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 17) The self-test routine was aborted by
the host.
Total time to complete Offline
data collection: ( 6420) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 22) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 207 201 021 Pre-fail Always - 2650
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 69
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 065 065 000 Old_age Always - 25678
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 31
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 22
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 46
194 Temperature_Celsius 0x0022 113 102 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 10% 25678 -
# 2 Short offline Completed without error 00% 25676 -
# 3 Extended offline Aborted by host 10% 25647 -
# 4 Short offline Completed without error 00% 25646 -
# 5 Short offline Aborted by host 90% 25646 -
# 6 Extended offline Aborted by host 10% 25646 -
# 7 Extended offline Aborted by host 10% 25634 -
# 8 Short offline Completed without error 00% 25623 -
# 9 Short offline Aborted by host 90% 25623 -
#10 Short offline Completed without error 00% 25623 -
#11 Short offline Completed without error 00% 24807 -
#12 Extended offline Aborted by host 70% 24806 -
#13 Short offline Completed without error 00% 24805 -
#14 Extended offline Aborted by host 10% 24805 -
#15 Short offline Completed without error 00% 24789 -
#16 Short offline Completed without error 00% 24784 -
#17 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

And the next test

root@MyCloudPR4100 ~ # smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-4.14.22] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0
Serial Number: WD-WX52D60H761R
LU WWN Device Id: 5 0014ee 212d9cb9f
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Nov 8 20:03:31 2023 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 21) The self-test routine was aborted by
the host.
Total time to complete Offline
data collection: (16380) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 344) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 205 201 021 Pre-fail Always - 2725
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 39
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 065 065 000 Old_age Always - 25679
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 31
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 21
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 113 102 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 50% 25679 -
# 2 Short offline Completed without error 00% 25677 -
# 3 Short offline Completed without error 00% 25674 -
# 4 Extended offline Aborted by host 90% 25647 -
# 5 Short offline Completed without error 00% 25646 -
# 6 Short offline Aborted by host 90% 25646 -
# 7 Extended offline Completed without error 00% 25640 -
# 8 Extended offline Completed without error 00% 25629 -
# 9 Short offline Completed without error 00% 25623 -
#10 Short offline Aborted by host 90% 25623 -
#11 Short offline Completed without error 00% 25623 -
#12 Extended offline Aborted by host 90% 24806 -
#13 Short offline Completed without error 00% 24805 -
#14 Extended offline Aborted by host 10% 24805 -
#15 Short offline Completed without error 00% 24789 -
#16 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

and the third

root@MyCloudPR4100 ~ # smartctl -a /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-4.14.22] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0
Serial Number: WD-WX62D60C48V7
LU WWN Device Id: 5 0014ee 212d9f3e3
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Nov 8 20:04:30 2023 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 21) The self-test routine was aborted by
the host.
Total time to complete Offline
data collection: (12840) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 345) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 212 200 021 Pre-fail Always - 2383
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 42
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 065 065 000 Old_age Always - 25679
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 32
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 22
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 19
194 Temperature_Celsius 0x0022 113 102 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 50% 25679 -
# 2 Short offline Completed without error 00% 25677 -
# 3 Short offline Completed without error 00% 25674 -
# 4 Extended offline Aborted by host 90% 25647 -
# 5 Short offline Completed without error 00% 25646 -
# 6 Short offline Aborted by host 90% 25646 -
# 7 Extended offline Completed without error 00% 25640 -
# 8 Extended offline Completed without error 00% 25629 -
# 9 Short offline Completed without error 00% 25623 -
#10 Short offline Aborted by host 90% 25623 -
#11 Short offline Completed without error 00% 25623 -
#12 Extended offline Aborted by host 90% 24806 -
#13 Short offline Completed without error 00% 24806 -
#14 Extended offline Aborted by host 10% 24806 -
#15 Short offline Completed without error 00% 24789 -
#16 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

and the last one

root@MyCloudPR4100 ~ # smartctl -a /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-4.14.22] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0
Serial Number: WD-WX62D60AUCPR
LU WWN Device Id: 5 0014ee 2682f4303
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Nov 8 20:10:10 2023 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 21) The self-test routine was aborted by
the host.
Total time to complete Offline
data collection: (16020) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 344) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 206 199 021 Pre-fail Always - 2691
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 42
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 065 065 000 Old_age Always - 25679
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 32
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 22
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 19
194 Temperature_Celsius 0x0022 114 102 000 Old_age Always - 33
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 50% 25679 -
# 2 Short offline Completed without error 00% 25676 -
# 3 Short offline Completed without error 00% 25674 -
# 4 Extended offline Aborted by host 90% 25647 -
# 5 Short offline Completed without error 00% 25646 -
# 6 Short offline Aborted by host 90% 25646 -
# 7 Extended offline Completed without error 00% 25640 -
# 8 Extended offline Completed without error 00% 25629 -
# 9 Short offline Completed without error 00% 25623 -
#10 Short offline Aborted by host 90% 25623 -
#11 Short offline Completed without error 00% 25623 -
#12 Extended offline Aborted by host 90% 24806 -
#13 Short offline Completed without error 00% 24806 -
#14 Extended offline Aborted by host 10% 24806 -
#15 Short offline Completed without error 00% 24789 -
#16 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

At last, the mystery of why the /dev/sdd drive was dropped from the RAID array is solved. And it’s not good, because it means it will happen again, and again, eventually resulting in complete data loss.

The problem was caused by the infamous Western Digital Red (SMR) hard drives, which are not suitable for use in RAID arrays. Google it, and you’ll see what I mean.

Heed my warning, create backups before it’s too late, because all four of the drives are Western Digital Red (SMR) hard drives.

/dev/sda

Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0

/dev/sdb

Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0

/dev/sdc

Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0

/dev/sdd

Model Family: Western Digital Red (SMR)
Device Model: WDC WD40EFAX-68JH4N0
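To check other drives the same way, you can look for the "(SMR)" tag that smartctl adds to the Model Family line, as seen in the output above. This is a rough sketch: `is_smr_family` is a hypothetical helper, and the tag only appears when the drive is listed in the smartctl drive database, so a missing tag does not prove a drive is CMR.

```shell
#!/bin/sh
# Sketch: report disks whose smartctl Model Family line is tagged (SMR).

# Succeeds if the `smartctl -i` output on stdin is tagged (SMR).
is_smr_family() {
    grep -q '^Model Family:.*(SMR)'
}

for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    [ -b "$dev" ] || continue
    command -v smartctl >/dev/null 2>&1 || break
    if smartctl -i "$dev" | is_smr_family; then
        echo "$dev: flagged as SMR by the smartctl database"
    fi
done
```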

For backups, you can buy 12TB hard drives for about $200 USD each, where a single drive should easily hold all of your data. My personal preference is Toshiba N300 or X300 hard drives.

I have both types, which have been quiet and reliable for more than five years of continuous use.

Are you saying that because they are SMR drives; or something else you are seeing in the data dump?

Btw: I don’t disagree with your assessment of SMR drives in the general case.

That’s basically it in a nutshell.

The performance of SMR hard drives is abysmal, which makes them totally unsuitable for use in a RAID environment. In this case, all subsequent errors were a chain of events caused by a single SMR drive getting kicked out of the RAID array because it didn’t respond quickly enough.

SMR hard drives also don’t last very long, having a mere fraction of the life expectancy of CMR hard drives. SMR technology is good for sales, but bad for reliable data storage, so I’m sure you can guess why WD tried to sneak SMR drives into NAS channels as WD “Red” drives. However, corporate greed backfired, and WD’s reputation was shattered.

I am still mourning the transition of small 2.5" HDD’s to SMR. I used to buy them like candy. . . .

Now I have bought a few 1 and 4 TB SSDs from a competitor - - -they are nice - - -but the cost/performance ratio on these is all wrong. (Plus, 5TB was a good size for me.)

For 2.5" drives, I wouldn’t waste my money on spinners, even if they used CMR technology. For small portable drives, SSD is the way to go, but not anything made by WD because they burned that bridge long ago and I don’t trust them.

Hello Cerberus

Just to let you know that WD have agreed to replace all four drives under warranty, so thank you very much for the link regarding the issue in your previous post. :grinning: :grinning:

Hopefully there will be some Black Friday deals on backup drives. I need an external one as I don’t have anywhere to install an internal drive.

Cheers

You should insist that WD replace the SMR hard drives with new CMR hard drives. They may balk at first, but keep pushing it up the chain of command until you get a more favorable response. Others have done it.

I’ve got more than 100 TB of data spread out among multiple NAS boxes, and I literally have a case full of hard drives with backups, so that’s where my thoughts naturally go first.

You could always use an external drive dock, but if an external USB hard drive works best for you, that’s ok too. The important thing is creating backups ASAP.

Good day! @Cerberus, I have the same issue as lambo911. I want to try all the commands that you asked for, but when I run the command “cat /proc/mdstat;” I get different output. Mine is:

root@MyCloudPR4100 dev # cat /proc/mdstat;
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid1 sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
2094080 blocks super 1.2 [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Can I still continue with the remaining commands that you asked for?