I’ve got the same situation over here; just purchased 2 x 4TB WD Red WD40EFAX and I’m trying to use them with ZFS in Linux (zfs 0.8.3 and kernel 4.17.14), on a HP Gen8 Microserver.
The SMART info looks clean, except for the IDNF errors in the extended log, which show up during ZFS resilvering. I ran a complete badblocks test, as well as the short/long SMART tests, and no further errors occurred.
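Roughly the checks I ran, sketched for an assumed /dev/sdX (the `count_idnf` helper is just my own convenience wrapper, not a standard tool):

```shell
# Surface + SMART checks (run as root; badblocks shown read-only here):
#   badblocks -sv /dev/sdX        # full read-only surface scan
#   smartctl -t short /dev/sdX    # then later: smartctl -t long /dev/sdX
#   smartctl -l xerror /dev/sdX   # extended error log, where IDNF shows up

# Convenience helper: count IDNF entries in smartctl's extended error log.
count_idnf() {
    grep -c 'IDNF'
}
# e.g.  smartctl -l xerror /dev/sdX | count_idnf
```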
My complete SMART output + kernel errors: pastebin log.
I will move them to a Windows machine and use WD’s test tool on them, but until then I’m doing a couple more tests with a new ZFS pool, to see if I can reproduce the problem. I also plan on testing btrfs (although I don’t think this looks like a software/ZFS issue).
Hmm, I had no idea they were SMR drives (didn’t even know about this technology). Are you saying that this could be the problem?
I’ve tested the drive in a Windows box; other than the fact that their own tool (Data Lifeguard) doesn’t see the drive, it’s fine (no kernel error messages, SMART checks out OK in various tools, and a full surface test completed with no issues…).
Benchmark testing is fairly conclusively pointing to the drives being SMR.
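The telltale is a sustained sequential write whose throughput collapses once the drive’s persistent CMR cache fills. A sketch of how the bandwidth samples can be eyeballed (the `detect_collapse` helper and the quarter-of-initial-rate threshold are my own assumptions, not an official test):

```shell
# Feed this "seconds MBps" samples from a long sequential write (e.g. from
# fio's bandwidth log, or timed dd runs). On a CMR drive the rate stays
# roughly flat; on a DM-SMR drive it falls off a cliff once the persistent
# cache is exhausted. The 1/4-of-initial-rate threshold is arbitrary.
detect_collapse() {
    awk 'NR == 1 { first = $2 } { last = $2 }
         END { if (first > 0 && last < first / 4) print "collapse"
               else print "steady" }'
}
```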
I’m talking to OpenZFS experts at the moment, including a couple of vendors who recommend REDs in their products - they’re alarmed to find that SMRs are in the channel without being differentiated, and will be testing the EFAXs in the next few days to try and verify my findings.
Of course one of the more painful problems with WD drives is the inability to upgrade their firmware in Linux or BSD…
*HINT HINT - Some of us don’t actually HAVE windows boxes - even in the work environment.
Does the WD Windows tool report the FW version as current? (Firmware Version: 82.00A82 here)
I’m assuming that as the drives are brand new into the channel there’s no update but you never know.
My issues are on a home ZFS setup on Linux/ZFS 0.8.3 too.
As we’re about to pull the trigger at $orkplace on a (LARGE) setup to replace a 400TB TrueNAS that’s been running flawlessly for the last 5 years, I communicated my concerns/experience to the vendor I’ve been dealing with, because we had “lots and lots and lots” of problems with previous products and I really don’t want the users to find any hint of latencies or instabilities which can be attributed to the fileserver. They have a blanket policy that SMR is best kept “far far away” from performance arrays, and they really don’t like the idea that these drives are being submarined into the marketplace like this, as it’s the kind of thing that drives up warranty/support costs rapidly.
I did a bit of googling and it seems to be a known fact (?) that the WDx0EFAX models are SMR. I should have suspected something fishy when I saw they were cheaper than the WD40EFRX while having a larger cache, too.
My drives have the same firmware, 82.00A82 (it’s in the pastebin of my original post too); their Windows tool (WD Data Lifeguard 1.36) didn’t even detect the drive, so I don’t know if there’s the possibility of an upgrade.
I’m returning the drives tomorrow. They “seem” fine, they just don’t want to work with ZFS in my setup. I’ll probably try my luck with some IronWolfs again (the internet says they’re not SMR, but I didn’t find any official info), although last time I ordered, 1 out of 2 was DoA…
This issue is starting to get traction in a few forums, and it’s been confirmed the drives cannot be used to rebuild RAID6 arrays either.
EU laws are quite tough on false advertising. Changing the underlying characteristics of drives advertised as suitable for NAS and RAID use isn’t going to go down at all well with regulators.
Confirmed inasmuch as: EFAX appear to be SMR, whilst EFRX are “CMR” (conventional).
Best not to use “PMR” as a term for the older drives - SMR (shingling) is an extension on top of PMR technology and I just had a WD regional marketing manager latch onto “PMR” to claim “the drives are PMR” and therefore there isn’t an issue.
The consensus is that in this particular instance the issue is rotten firmware, and there’s no good reason why the drives should be returning these codes.
I’m also getting feedback that it’s difficult-to-impossible to rebuild RAID5/RAID6 arrays using EFAX drives, not just RAIDZ/Z2/Z3 arrays.
It’s not just WD pulling this silliness. Examples have been cited of disguised DM-SMR units from Seagate too (e.g. the ST3000DM007, and some IronWolf models have been confirmed).
I’ve sent a heads-up to the smartmontools developer list to let them know what’s going on.
Hopefully ways will be quickly developed to flag disguised DM-SMR drives.
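For reference, recent Linux kernels already expose a zone model per block device, but it won’t catch these drives: drive-managed SMR presents itself as a conventional device, so sysfs reads “none” for them - which is exactly why flagging would need model/firmware knowledge instead. A quick sketch (the helper name is mine; the sysfs path is the real one):

```shell
# Report the kernel's zone model for a drive: "none", "host-aware" or
# "host-managed". DM-SMR drives like the EFAX report "none", i.e. they are
# indistinguishable from CMR here -- hence the need for explicit flagging.
zone_model() {
    # $1 = sysfs root (normally /sys), $2 = device name, e.g. sda
    cat "$1/block/$2/queue/zoned" 2>/dev/null || echo "unknown"
}
# e.g.  zone_model /sys sda
```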
Just to add more fun: TDMR (Two Dimensional Magnetic Recording) is a way of describing the zoning and block-reassignment (indirection) functions necessary in an SMR drive, and you essentially can’t have one without the other; there’s no need for this functionality in a CMR drive. That means the implications for issues are intertwined whether you see drives described as SMR or TDMR.
Thank you for raising this issue. This article suggests that all Reds up to 6TB use SMR. Do you have any info to confirm that EFRX models are still CMR?
ixSystems (makers of TrueNAS and FreeNAS) have confirmed my findings that this is a firmware bug:
“At least one of the WD Red DM-SMR models (the 4TB WD40EFAX with firmware rev 82.00A82) does have a ZFS compatibility issue which can cause it to enter a faulty state under heavy write loads, including resilvering. This was confirmed in our labs this week during testing, causing this drive model to be disqualified from our products. We expect that the other WD Red DM-SMR drives with the same firmware will have the same issue, but testing is still ongoing to validate that assumption.
In the faulty state, the WD Red DM-SMR drive returns IDNF errors, becomes unusable, and is treated as a drive failure by ZFS. In this state, data on that drive can be lost. Data within a vdev or pool can be lost if multiple drives fail.”
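In practice that faulty state shows up in `zpool status` as a FAULTED or UNAVAIL member once the IDNF errors pile up. A trivial way to scan for it (my own one-liner, assuming the usual `zpool status` config-table layout):

```shell
# List vdev members that ZFS has kicked out: in zpool status output, the
# config table has the device name in column 1 and its state in column 2.
faulted_members() {
    awk '$2 == "FAULTED" || $2 == "UNAVAIL" { print $1 }'
}
# e.g.  zpool status tank | faulted_members
```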