
This is why I don’t bother with SMART. Sit down and work out the maths: relying on SMART greatly increases your rate of replacement of drives, without a corresponding increase in the reliability of your data.
I’m struggling to understand your reasoning here.
It’s a well-known phenomenon in the mathematics of probability, known as the base-rate fallacy. Remember that people’s intuitions about probability are notoriously misleading. That’s why you have to actually do the maths.
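To make the base-rate point concrete, here is a toy Bayes calculation. Every number in it is a made-up assumption for illustration, not a measured failure or warning rate:

```python
# Toy Bayes calculation of P(drive fails | SMART warning).
# All rates below are illustrative assumptions, not real data.
p_fail = 0.02             # assumed annual failure rate (the base rate)
p_warn_given_fail = 0.6   # assumed: SMART warns ahead of 60% of failures
p_warn_given_ok = 0.05    # assumed false-warning rate on healthy drives

# Total probability of seeing a warning in a year.
p_warn = p_warn_given_fail * p_fail + p_warn_given_ok * (1 - p_fail)

# Bayes' rule: most warnings come from the large healthy population,
# so the posterior stays well under 100% despite the warning.
posterior = p_warn_given_fail * p_fail / p_warn
print(f"P(fail | warning) = {posterior:.2%}")
```

With these particular made-up numbers the posterior comes out around 20%, i.e. roughly four out of five drives replaced on a warning would have been fine, which is the "increased replacement rate without increased reliability" claim in arithmetic form.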
I see. Although I’m still not sure it applies here, if you follow my earlier comments: pay attention to SMART only when it tells you bad things are happening.

Drives that are in good condition and are not failing quite simply do *not* have increasing SMART counters (for the relevant counters: reallocated sector count, uncorrectable errors, etc.). Drives that do have these counts increasing are quite simply going to fail, and the rate at which they fail is very closely correlated with the rate at which these counters increase. There’s no base-rate fallacy here, because any drive that is showing increasing counters is a problem drive.

I re-read your first email on this subject, and you even acknowledged that Backblaze makes the same point I am making, but you don’t put any weight on avoiding downtime. That’s up to you - it’s not how I’d approach it though :)
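The reply’s claim can be put through the same toy Bayes calculation: if you condition only on the specific counters named above (reallocated sectors, uncorrectable errors), and you assume healthy drives essentially never show those increasing, the posterior flips. Again, every rate here is an assumed number for illustration:

```python
# Same toy Bayes calculation, restricted to the specific counters
# the reply names. Key assumption: the false-positive rate for
# "reallocated sectors increasing" on a healthy drive is near zero.
p_fail = 0.02            # assumed annual failure rate (the base rate)
p_inc_given_fail = 0.6   # assumed: counters rise before 60% of failures
p_inc_given_ok = 0.001   # assumed: almost never rises on healthy drives

# Total probability of seeing the counters increase in a year.
p_inc = p_inc_given_fail * p_fail + p_inc_given_ok * (1 - p_fail)

# With a near-zero false-positive rate, the low base rate no longer
# dominates and the posterior is high.
posterior = p_inc_given_fail * p_fail / p_inc
print(f"P(fail | counters increasing) = {posterior:.2%}")
```

Under these assumptions the posterior lands above 90%, which is why the whole disagreement reduces to one empirical question: how often do those counters increase on drives that go on to work fine?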