Back when I was even less experienced in self-hosting, I set up my media/backup server using a RAIDZ1 array of 3 x 8TB disks. It’s been running well for a while, and I haven’t had any problems or disk errors.

But today I read a post about ‘pool design rules’ stating that RAIDZ1 configurations should not have drives over 1TB because the chances of errors occurring during re-silvering are high. I wish I had known this sooner.

What can I do about this? I send ZFS snapshots to 2 single large (18TB) hard drives for cold backups, so I have the capacity to do a migration to a new pool layout. But which layout? The same article I referenced above says not to use RAIDZ2 or RAIDZ3 with fewer than 6 drives… I don’t want to buy 3 more drives. Do I buy an additional 8TB drive (for a total of 4 x 8TB) and stripe across two sets of mirrors? Does that make any sense?

Thank you!
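
For a rough side-by-side of the layouts mentioned in the question, here is a minimal sketch. It assumes idealized capacity (no metadata, padding, or TB/TiB overhead), and the helper names `raidz` and `striped_mirrors` are made up for illustration; they are not part of any ZFS tooling.

```python
def raidz(n_disks: int, size_tb: float, parity: int) -> dict:
    """Idealized RAIDZ vdev: parity disks' worth of space is lost to redundancy."""
    return {
        "usable_tb": (n_disks - parity) * size_tb,
        # Number of arbitrary disk failures the vdev is guaranteed to survive.
        "survives_any": parity,
    }


def striped_mirrors(n_pairs: int, size_tb: float) -> dict:
    """Pool striped across two-way mirrors: each pair contributes one disk of space."""
    return {
        "usable_tb": n_pairs * size_tb,
        # Guaranteed to survive only one failure; a second failure is
        # survivable only if it hits a different mirror pair.
        "survives_any": 1,
    }


layouts = {
    "RAIDZ1, 3 x 8TB (current)": raidz(3, 8, parity=1),
    "2 x mirror, 4 x 8TB": striped_mirrors(2, 8),
    "RAIDZ2, 4 x 8TB": raidz(4, 8, parity=2),
}

for name, info in layouts.items():
    print(f"{name:28s} usable ~{info['usable_tb']:.0f} TB, "
          f"survives any {info['survives_any']} disk failure(s)")
```

Under these assumptions all three layouts land at roughly the same usable space, so the choice comes down to redundancy and resilver behaviour rather than capacity.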

  • Max-P
    8 months ago

    That article is from 2013, so I’m a bit skeptical about the claims about under 1 TB drives. It was probably reasonable advice back then when 1 TB capacities were sorta cutting edge. Now we have 20+ TB hard drives, nobody’s gonna be making arrays of 750 GB drives.

    I have two 4TB drives in a simple mirror configuration and have resilvered it a few times due to oopsies and it’s been fine, even with my shitty SATA ports.

    The main concern is bigger drives take longer to resilver because well, it’s got much more data to shuffle around. So logically, if you have 3 drives that are the same age and have gotten the same amount of activity and usage, when one gives up it would be likely for the other 2 to be getting close as well. If you only have 1 drive of redundancy, then this can be bad because temporarily, you have no redundancy so one more drive failure and the zpool is gone. If you’re concerned about them all failing at the same time, the best defense is either different drive brands, or different drive ages.

    But you do have backups, so if that pool dies, it’s not the end of the world. You can pull it back from your 18TB backup drives. And those are different drives, so they are unlikely to fail at the same time as your 3x8TB drives, let alone 2 of them. You need 4 drives to give up in total in your particular case before your data is truly gone. That’s not that bad.

    It’s a risk management question. How much risk do you tolerate? What are your uptime requirements? For my use case, I deemed a simple 2-drive mirror to be sufficient for my needs, and I have a good offsite backup on a single USB external drive, and an encrypted cloud copy of things that are really critical and that I can’t possibly lose, like my Keepass database.

    • @Hopfgeist@feddit.de
      8 months ago

      Bit error rates have barely improved since then. So the probability of an error when reading a substantial fraction of a disk is now higher than it was in 2013.
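
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits (a figure commonly quoted on consumer drive datasheets, and a worst-case bound rather than a measured rate) and treats bit errors as independent. The function name `p_read_error` is made up for this example.

```python
import math


def p_read_error(bytes_read: float, ber: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error.

    Assumes independent bit errors at rate `ber` per bit read
    (assumed datasheet-style figure, not a measurement).
    """
    bits = bytes_read * 8
    # 1 - (1 - ber)^bits, computed via log1p/expm1 to stay accurate for tiny ber.
    return -math.expm1(bits * math.log1p(-ber))


TB = 1e12  # decimal terabyte, as used on drive labels

# 1 TB: a 2013-era disk; 8 TB: one full disk of this pool;
# 16 TB: both surviving disks of a full 3 x 8TB raidz1 during a resilver.
for size_tb in (1, 8, 16):
    print(f"{size_tb:>2} TB read: {p_read_error(size_tb * TB):.1%}")
```

Real drives often do better than the datasheet bound, and a resilver only reads allocated data, so this is closer to a pessimistic estimate than a prediction.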

      But as others have pointed out, RAID is not, and never was, a substitute for a backup. Its purpose is to increase availability. And if that is critical to your enterprise, these things need to be taken into account, and it may turn out that raidz1 with 8 TB disks is fine for your application, or it may not. For private use, I wouldn’t fret, but make frequent backups.

      This article was not about total disk failure, but about the much more insidious undetected bit error.