How to fix my ZFS pool mistakes

@abies_exarchia@lemm.ee · 2 years ago

How to fix my ZFS pool mistakes

Max-P · 2 years ago

That article is from 2013, so I’m a bit skeptical about the claims about under 1 TB drives. It was probably reasonable advice back then when 1 TB capacities were sorta cutting edge. Now we have 20+ TB hard drives, nobody’s gonna be making arrays of 750 GB drives.

I have two 4TB drives in a simple mirror configuration and have resilvered it a few times due to oopsies and it’s been fine, even with my shitty SATA ports.

The main concern is bigger drives take longer to resilver because well, it’s got much more data to shuffle around. So logically, if you have 3 drives that are the same age and have gotten the same amount of activity and usage, when one gives up it would be likely for the other 2 to be getting close as well. If you only have 1 drive of redundancy, then this can be bad because temporarily, you have no redundancy so one more drive failure and the zpool is gone. If you’re concerned about them all failing at the same time, the best defense is either different drive brands, or different drive ages.

But you do have backups, so, if that pool dies, it’s not the end of the world. You can pull it back from your 18TB mirror array. And it’s different drives, so those are unlikely to fail at the same time as your 3x4TB drives, let alone 2 more of them. You need 4 drives to give up in total in your particular case before your data is truly gone. That’s not that bad.

It’s a risk management question. How much risk do you tolerate? How’s your uptime requirements? For my use case, I deemed a simple 2 drive mirror to be sufficient for my needs, and I have a good offsite backup on a single USB external drive, and an encrypted cloud copy of things that are really critical and I can’t possibly lose like my Keepass database.

@Hopfgeist@feddit.de · 1 year ago

Bit error rates have barely improved since then. So the probability of an error whenr reading a substantial fraction of a disk is now higher than it was in 2013.

But as others have pointed out. RAID is not, and never was, a substitute for a backup. Its purpose is to increase availability. And if that is critical to your enterprise, these things need to be taken into account, and it may turn out that raidz1 with 8 TB disks is fine for your application, or it may not. For private use, I wouldn’t fret. but make frequent backups.

This article was not about total disk failure, but about the much more insidious undetected bit error.