Best practice for preemptive RAID disk replacement

A couple of years ago two disks on my old (now defunct) EX4 died without warning at about the same time. (The second drive failed while the RAID 5 was rebuilding after I replaced the first failed drive.) So everything was lost. Bummer.

I’d probably kept the old disks going past their expected lifetime, but was a bit grumpy that the EX4 didn’t give any kind of warning of impending death. I guess I was expecting too much. Live and learn. The fact that the EX4 was getting pretty flaky may have contributed to the problem, too.

I now have an EX4100 running RAID 5 + spare. I’m not sure the spare is that much better than a 4-disk RAID 5, though the automatic activation of the spare would be handy. (The EX4100 does that, right? The manual doesn’t say.)

But, to my question: I’d like to protect myself against this kind of double failure in the future, and wonder how other EX Series users deal with this. One option, of course, would be to have a couple of spare drives on the shelf ready to swap in as needed. (I have one already, probably should get another.)

I’m using WD Red drives, by the way. Whether that’s the best idea is another question, but part of the idea of RAID is that you can use cheaper (less reliable) disks and deal with the consequences.

Another option would be to replace the oldest disk at intervals of L/4, where L is the expected life of a disk. Expected life would probably be more than the warranty period, but maybe not a lot more. And I’m thinking that the newest disk should be the new spare.
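A minimal sketch of that staggered-replacement idea, assuming a hypothetical expected life L of 6 years and four disks that all start out new (both numbers are illustrative, not from the thread). Every L/4 years the oldest disk is swapped for a new one; `stagger_schedule` is a hypothetical helper.

```python
def stagger_schedule(life_years, start_ages, rounds):
    """Hypothetical helper: every life_years / len(start_ages) years, retire the
    oldest disk and fit a new one. Returns the sorted disk ages after each swap."""
    interval = life_years / len(start_ages)
    ages = sorted(start_ages)
    history = []
    for _ in range(rounds):
        ages = [a + interval for a in ages]  # every disk ages by one interval
        ages.remove(max(ages))               # retire the oldest disk
        ages = sorted(ages + [0.0])          # fit a brand-new disk
        history.append(ages)
    return history


# Illustrative numbers only: L = 6 years, four disks bought new together.
for ages in stagger_schedule(6.0, [0.0, 0.0, 0.0, 0.0], rounds=5):
    print(ages)
```

After the first cycle the ages settle at 0, L/4, L/2 and 3L/4, so no two disks share an install date and none runs past L. Under the scheme described above, the youngest disk (the first entry in each list) would be the one designated as the spare.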

I’d be interested to hear thoughts and experience on this.

Definitely have thoughts on this.

RAID 5 + spare... is that not RAID 6? RAID that is tolerant of TWO disk failures?

So, first: recognize that a NAS has common-mode failure potential not just from the disks, but from the NAS hardware itself. Or FIRE. To address this, one should maintain an independent backup of the data on the NAS.

Second, you want offsite backup. In my case, my data “lives” on an external drive. It is “backed up” to the NAS. There is also a second (generally older) backup that lives outside the house.

Third, my “data” drive is occasionally retired and replaced with a fresh drive.

  • This provides a “snapshot in time” that protects against ransomware
  • This provides a “snapshot in time” that protects against my being stupid and deleting something without realizing it for three months.
  • I usually have to migrate to larger drives when I make this switch.

Of note, with the advent of SMR (shingled magnetic recording) tech in external drives, I have migrated to lower-capacity SSDs for “primary data”.

I work in an industry (non-IT) that uses backup equipment. We have similar issues with machines that have the same number of run hours all failing at the same time. We generally run machinery on a 90/10 rule (one machine runs 27 days a month; the other runs 3 days a month). If I were panicky about a 4-bay NAS, I would replace one drive per year.
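The 90/10 rule can be sketched numerically. Assuming a hypothetical rated life of 810 run-days per machine (chosen only so the arithmetic comes out even), the primary wears out first while the standby has used only a ninth of its life, so the two are unlikely to fail together. The helper name is hypothetical.

```python
def standby_wear_when_primary_wears_out(rated_run_days, primary_days_per_month,
                                        standby_days_per_month):
    """Hypothetical helper: months until the primary reaches its rated run-days,
    and the fraction of the standby's rated life used by that point."""
    months = rated_run_days / primary_days_per_month
    standby_used = months * standby_days_per_month
    return months, standby_used / rated_run_days


# 27 days/month on the primary, 3 days/month on the standby (the 90/10 split).
months, standby_fraction = standby_wear_when_primary_wears_out(810, 27, 3)
print(f"primary wears out after {months:.0f} months; "
      f"standby has used {standby_fraction:.0%} of its life")
```

Disks in a RAID array all run together, so the closest NAS analogue of the standby is an idle hot spare; staggering install dates (one drive per year) is the other way to keep wear out of sync.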

I like a mirrored RAID so I can get going again faster after a single-drive failure.

I.e., faster than doing a full reload from the USB external disk (my total backup).

(The RAID NAS is a backup.)

When you retire both older NAS disks because their SMART error counts are going up, they can serve as an older copy of your data.

JBOD for Mac, and RAID 10 on 4-disk NAS units for Windows 10 (I want the extra speed of RAID 10, not the extra storage).

Some NAS systems do watch SMART data to give drive-failure warnings,

but it’s best to have 3 sets of backups.
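A minimal sketch of the kind of trend check a NAS might run on SMART data. It assumes you already have periodic raw readings (for example, collected from `smartctl -A` output; the parsing is not shown) and flags any of three commonly watched attributes whose count has gone up. The attribute choice and the "any increase" threshold are assumptions, not what any particular NAS firmware does.

```python
# Commonly watched SMART attributes (ID -> smartctl attribute name).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def rising_attributes(readings):
    """readings: list of {attribute_id: raw_value} dicts, oldest first.
    Returns the names of watched attributes whose raw value has increased."""
    flagged = []
    for attr_id, name in WATCHED.items():
        values = [reading.get(attr_id, 0) for reading in readings]
        if len(values) >= 2 and values[-1] > values[0]:
            flagged.append(name)
    return flagged


# Hypothetical monthly readings for one disk.
history = [{5: 0, 197: 0, 198: 0}, {5: 2, 197: 0, 198: 0}, {5: 8, 197: 1, 198: 0}]
print(rising_attributes(history))
```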

RAID 5 + spare... is that not RAID 6? RAID that is tolerant of TWO disk failures?

No, they’re different. RAID 6 uses two independent sets of parity, so it can recover if any two disks fail. RAID 5 + spare leaves one disk unused and runs the remainder as RAID 5. Let’s not discuss relative merits here, but I suspect the thinking is that the idle disk, being idle, is much less likely to fail, and the system can automatically start rebuilding onto the spare if another disk fails. It can recover from a failed spare plus one other disk, but of course not from the simultaneous failure of two non-spare disks.
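One way to see the difference is to enumerate every pair of disks in a four-bay unit and check which layout survives losing both at once. This sketch assumes the second failure arrives before any rebuild onto the spare finishes (as happened during the rebuild in the original post); if the rebuild completes first, RAID 5 + spare survives too. Disk names are illustrative.

```python
from itertools import combinations

DISKS = ["d1", "d2", "d3", "d4"]


def raid5_spare_survives(failed, spare="d4"):
    # Three active disks in RAID 5 plus one idle spare; assumes no rebuild
    # onto the spare has completed, so only one active disk may be lost.
    active_failures = [d for d in failed if d != spare]
    return len(active_failures) <= 1


def raid6_survives(failed):
    # Two sets of parity across all four disks: any two may fail.
    return len(failed) <= 2


for name, survives in [("RAID 5 + spare", raid5_spare_survives),
                       ("RAID 6", raid6_survives)]:
    pairs = list(combinations(DISKS, 2))
    ok = sum(survives(set(p)) for p in pairs)
    print(f"{name}: survives {ok} of {len(pairs)} simultaneous two-disk failures")
```

RAID 5 + spare survives only the three pairs that include the spare, while RAID 6 survives all six.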

I’m beginning to think that RAID 10 would be a better approach, though that reduces capacity.
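For comparison, here is usable capacity for four identical disks, plus the same two-disk enumeration for RAID 10. The 4 TB disk size is an arbitrary example and the mirror pairing (d1+d2, d3+d4) is an assumption. With four bays, RAID 10 ends up with the same usable space as RAID 5 + spare or RAID 6; it only costs capacity compared with plain four-disk RAID 5.

```python
from itertools import combinations

DISK_TB = 4  # arbitrary example disk size


def usable_disks(layout, n):
    """Usable capacity in whole-disk units for n identical disks."""
    if layout == "RAID 5":
        return n - 1   # one disk's worth of parity
    if layout == "RAID 5 + spare":
        return n - 2   # parity plus one idle spare
    if layout == "RAID 6":
        return n - 2   # two disks' worth of parity
    if layout == "RAID 10":
        return n // 2  # every disk is mirrored
    raise ValueError(f"unknown layout: {layout}")


for layout in ["RAID 5", "RAID 5 + spare", "RAID 6", "RAID 10"]:
    print(f"{layout}: {usable_disks(layout, 4) * DISK_TB} TB usable")

# RAID 10 loses data only if both disks of the same mirror fail.
MIRRORS = [{"d1", "d2"}, {"d3", "d4"}]


def raid10_survives(failed):
    return not any(mirror <= failed for mirror in MIRRORS)


pairs = [set(p) for p in combinations(["d1", "d2", "d3", "d4"], 2)]
print(f"RAID 10 survives {sum(raid10_survives(p) for p in pairs)} "
      f"of {len(pairs)} two-disk failures")
```

So, under these assumptions, RAID 10 sits between RAID 5 + spare (3 of 6) and RAID 6 (6 of 6) for simultaneous double failures, and rebuilds are simpler because they are plain copies from the surviving mirror.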

So, first: recognize that a NAS has common-mode failure potential not just from the disks, but from the NAS hardware itself. Or FIRE. To address this, one should maintain an independent backup of the data on the NAS.

Second, you want offsite backup. In my case, my data “lives” on an external drive. It is “backed up” to the NAS. There is also a second (generally older) backup that lives outside the house.

Oh, yes, of course. Actually, my NAS is used primarily for two things:

  • Time Machine backups for several home computers.
  • An archive of stuff that isn’t generally needed but might be wanted, and that is available on all computers.

That’s all for quick and easy access. The NAS itself doesn’t handle “backup” in the sense of retrieving old files; that’s the job of Time Machine, and it works quite well.

But you’re right, the NAS itself is still a point of failure, though unless it takes multiple disks with it when it dies, it can be replaced.

So, off-site backup, and “belt and suspenders.” I take a second Time Machine backup to a local disk on each computer, then occasionally swap them with off-site disks.

So that means:

  • If a computer fails, either the NAS or my Time Machine disk can be used to rebuild. (The local disk is much faster.)
  • If I need to recover a deleted file, the NAS backup will have it unless it’s past the Time Machine history. If the NAS fails at about the same time, the file should be on at least one of the Time Machine disks, but I might have to retrieve the right one from off-site.
  • If a Time Machine disk fails I can replace it with the off-site copy, then start a new one.
  • If my house burns down or ransomware attacks, I lose at most what I created since the last Time Machine disk swap.
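The list above is essentially a coverage table: for each failure, which copies are left? A small sketch, with the copy names taken from the post and the mapping of events to lost copies as a simplifying assumption (for instance, that fire or ransomware takes out every on-site copy but not the off-site disk).

```python
# Where each backup copy lives (names follow the post).
COPIES = {
    "NAS Time Machine backup": "on-site",
    "local Time Machine disk": "on-site",
    "off-site Time Machine disk": "off-site",
}

# Assumed copies lost in each event (a simplification for illustration).
LOST = {
    "computer fails": set(),
    "NAS fails": {"NAS Time Machine backup"},
    "local Time Machine disk fails": {"local Time Machine disk"},
    "fire or ransomware": {name for name, where in COPIES.items() if where == "on-site"},
}


def surviving_copies(event):
    """Copies still available after the given event."""
    return [name for name in COPIES if name not in LOST[event]]


for event in LOST:
    print(f"{event}: {', '.join(surviving_copies(event))}")
```

Every event leaves at least one copy; the off-site disk is only as fresh as the last swap, which is why the last bullet still loses recent work.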

The “archive” stuff on the NAS doesn’t change often, and I don’t worry about recovering deleted files. I occasionally copy the archive off to a disk kept off site, though I’m thinking about one of the cloud backup services that the EX4 supports. (Not intending to start a discussion on this. I’ll consult the forum.)

If I were panicky about a 4-bay NAS, I would replace one drive per year.

That at least addresses my original question, and is about in line with what I was thinking. I’d probably make the oldest disk the new spare, then replace it with the new disk.

Huh. RAID 5 + spare... not sure I would ever make that choice.

TBH: For home use. . . not sure any RAID structure makes a whole lot of sense.

  • RAID with striping for performance only makes sense if your PCs are connected via a LAN cable. Who does that anymore?

  • RAID for redundancy... meh... with multiple backups still necessary, what does the extra disk in the NAS really buy you?

Full disclosure: I have two 2-bay NAS units. One is a backup device with mirrored disks. I recognize now that I have this only because it looks cool. The second device is something I just play around with... Had RAID 1 with OS/5. Now RAID 1 with OS/3. Soon to be JBOD with OS/3. 🙂

Larger storage facilities experience these problems as well, so servers have rows of 15 disks, stacked 2, 3, or even 4 rows to a server. Each 15-disk row tolerates two disk faults. The rows across several servers are then given another layer of redundancy, so that a whole server going down does not break the storage pool.
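The two layers can be sketched as a simple check. The 15-disk rows that tolerate two faults come from the post; the pool-level rule (tolerating the loss of any three rows, enough to cover one downed server) and the four-server, three-row layout are hypothetical, since real storage systems implement the outer layer in different ways.

```python
from dataclasses import dataclass

ROW_FAULT_TOLERANCE = 2  # each 15-disk row survives two failed disks


@dataclass
class Server:
    failed_per_row: list  # number of failed disks in each row on this server
    down: bool = False    # the whole server is offline


def lost_rows(server):
    """A downed server loses all its rows; otherwise a row is lost only if
    more disks failed than the row can tolerate."""
    if server.down:
        return len(server.failed_per_row)
    return sum(1 for failed in server.failed_per_row if failed > ROW_FAULT_TOLERANCE)


def pool_survives(servers, pool_row_tolerance):
    """Second layer: the pool survives as long as no more than
    pool_row_tolerance rows are lost across all servers."""
    return sum(lost_rows(s) for s in servers) <= pool_row_tolerance


# Hypothetical pool: 4 servers x 3 rows; the pool layer tolerates losing 3 rows.
servers = [Server([0, 2, 0]), Server([1, 0, 0]),
           Server([0, 0, 0], down=True), Server([2, 2, 1])]
print(pool_survives(servers, pool_row_tolerance=3))
```

Here one whole server is down and several rows have lost two disks each, yet only three rows are actually lost, so the pool survives.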