It’s RAID rebuild times.
The bigger the drive, the longer the time.
The longer the time, the more likely the rebuild will fail.
That said, modern RAID is much more robust against this kind of fault, but still: with one parity drive, one dead drive, and a rebuild in progress, losing another drive means you’re fucked.
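To put a rough number on "the longer the time, the more likely the rebuild will fail", here is a back-of-the-envelope sketch. It assumes drives fail independently at a constant rate, and the annualized failure rate (`afr`) is a made-up input, not a measured figure. Real rebuilds stress the surviving drives and failures tend to be correlated, so treat the result as optimistic.

```python
def second_failure_probability(surviving_drives: int,
                               rebuild_hours: float,
                               afr: float) -> float:
    """Chance that at least one surviving drive fails during the rebuild.

    Assumes independent drives with a constant failure rate, where `afr`
    is a hypothetical annualized failure rate (e.g. 0.02 for 2% per year).
    """
    hours_per_year = 8760
    # Probability that one drive survives the window, scaled from the
    # yearly survival probability (1 - afr).
    one_drive_survives = (1 - afr) ** (rebuild_hours / hours_per_year)
    # All surviving drives must make it through for the rebuild to finish.
    return 1 - one_drive_survives ** surviving_drives


if __name__ == "__main__":
    # Same array and failure rate, two different rebuild windows.
    for hours in (10, 100):
        p = second_failure_probability(surviving_drives=7,
                                       rebuild_hours=hours,
                                       afr=0.02)
        print(f"{hours:>4} h rebuild: {p:.4%} chance of another failure")
```

Under these assumptions the risk grows almost linearly with the rebuild window, so a drive that takes ten times longer to rebuild is roughly ten times more exposed.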
Just rebuilt onto Ceph and it’s a game changer. Drive fails? Who cares, replace it with a bigger drive and go about your day. If the total drive count is large enough, and depending on whether you use EC or replication, recovery can pull data from tons of drives instead of a handful.
It’s still the same issue, RAID or Ceph. If a physical drive can only write 100 MB/s, a 36 TB drive will take 360,000 seconds (6,000 minutes, or 100 hours) to fill. During that 100-hour window you’re down a drive and vulnerable to a second failure. Both RAID and Ceph can be configured for more redundancy at the cost of usable capacity, but even Ceph fails (dropping to read-only mode, or losing data) if too many physical drives fail.
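The arithmetic above is easy to check for other drive sizes. A minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and a sustained write speed that never drops, which is optimistic for a drive under rebuild load:

```python
def rebuild_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Hours needed to write a whole drive at a constant sustained speed."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = capacity_mb / write_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # The example from the comment: 36 TB at 100 MB/s.
    print(rebuild_hours(36, 100))  # 100.0
```

Doubling the write speed halves the window, which is why the write rate of the single replacement drive is the bottleneck no matter how many drives are feeding it data.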