BackBlaze Q4 2015 results are in!


smitbret

Diamond Member
Jul 27, 2006
3,389
23
81
backblaze is a bunch of retards when it comes to data presentation

they release their data in such a mess of incomparable results that it's worthless as is

they talk about failure rates but they're comparing across different age drives which you just can't do

i mean saying these drives only have a 1% failure rate but we've only had them 6 months, is that better or worse than a drive that has a 10% failure rate after 5 years? No one knows

what they need to do is very simple:
a chart of cumulative failure rate by age

what % died by 6 months, 1 year, 2 years, etc

some drives will only have 6 months of data as they are new, some will go all the way through 5 years, that's fine, but at least the numbers at those points that they do have in common will be directly comparable

I find their presentation to be just fine. I don't need to know a 5 year failure rate vs. any other brand if the 1st year rate is 20%; I am not buying that drive. I don't think the BackBlaze data intends to tell you how Drive A compares to Drive B but it is incredibly useful in identifying drives at the extremes of the bell curve; i.e. HGST as a brand seems to be in a league of their own for reliability and you'd have to be an idiot to trust a 1.5TB or 3TB Seagate.

If I was paying them for research then I might feel justified in complaining about their data presentation but to get this much info about some broad reliability scores across this many consumer grade HDDs, and to get it for free,
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
I find their presentation to be just fine. I don't need to know a 5 year failure rate vs. any other brand if the 1st year rate is 20%

where do you see the 1st year rate for any drive?

it's not there, which is exactly my point
 

smitbret

Diamond Member
Jul 27, 2006
3,389
23
81
where do you see the 1st year rate for any drive?

it's not there, which is exactly my point

Sure, then pay someone for that information. My point is that their data is complete enough that you can easily find the dogs, especially if you just follow along as data is released. You can hit their blog and look back, too. The data is there, it's just not spoon-fed and probably shouldn't be.

Here's a link with annual failure rates going back to 2013:
https://www.backblaze.com/blog/hard-drive-reliability-q3-2015/
Pretty easy to find the losers there.

Here's a link to the stored data:
https://www.backblaze.com/hard-drive-test-data.html
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
http://imgur.com/a/rH6Ot

Ross Lazarus a day ago
In case anyone's interested in survival (or failure) methods applied to the most up-to-date raw Backblaze data, Kaplan-Meier curves for the 31 million or so rows by manufacturer and model are at http://imgur.com/a/rH6Ot. Generated in R after some Python processing of the raw CSV. Happy to share code.
I think they give interesting subtleties and details you simply cannot get from failure rates. These ignore the unreliable uptime hours column and simply take time under observation between first and last appearance of the model_serial combination. KM deals with right censoring properly so you can distinguish early and late failure patterns from manufacturers and models. Models with <500 obs are filtered out to simplify the plot.
Oh, and if you're wondering about the bogus manufacturer "ST400LM012" on the KM by manufacturer, it comes from (eg) 2014/2014-09-26.csv as 2014-09-26,S2ZYJ9DDB27222,ST500LM012 HN,500107862016,0 so this method helps reveal duff data so typical of large datasets too - in a sense all the backblaze and other analysis results are slightly wrong because of that small drive naming error - at least I assume the drive is a mislabelled seagate? Relatively small error but shows how hard it is to keep big data honest.
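
For readers who want to try the approach described in this comment, here is a minimal sketch in plain Python. It is not Ross Lazarus's code (that was R plus Python and is not shown in the thread). The column names `date`, `serial_number`, `model` and `failure` are assumptions based on the sample row quoted above, the inline `SAMPLE` data is made up, and counting days inclusively (last minus first, plus one) is just one reasonable choice. Drives that never fail are treated as right-censored at their last appearance.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Made-up sample in the assumed daily-snapshot layout (one row per drive per day).
SAMPLE = """date,serial_number,model,capacity_bytes,failure
2015-01-01,A1,MODEL_X,1000,0
2015-01-02,A1,MODEL_X,1000,0
2015-01-03,A1,MODEL_X,1000,1
2015-01-01,B2,MODEL_X,1000,0
2015-01-05,B2,MODEL_X,1000,0
"""

def drive_spans(rows):
    """Collapse daily rows into {(model, serial): (days_observed, failed)}."""
    spans = {}
    for row in rows:
        key = (row["model"], row["serial_number"])
        day = date.fromisoformat(row["date"])
        first, last, failed = spans.get(key, (day, day, False))
        spans[key] = (min(first, day), max(last, day),
                      failed or row["failure"] == "1")
    # Time under observation: first to last appearance, counted inclusively.
    return {k: ((last - first).days + 1, failed)
            for k, (first, last, failed) in spans.items()}

def kaplan_meier(records):
    """Kaplan-Meier estimate from (days_observed, failed) pairs.

    Returns [(t, S(t))] at each day t on which at least one failure occurred.
    Drives with failed=False are right-censored: they leave the risk set
    at their last observed day without counting as a failure.
    """
    deaths = defaultdict(int)
    exits = defaultdict(int)
    for days, failed in records:
        exits[days] += 1
        if failed:
            deaths[days] += 1
    at_risk = sum(exits.values())
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # failed and censored drives both leave the risk set
    return curve

spans = drive_spans(csv.DictReader(io.StringIO(SAMPLE)))
print(kaplan_meier(spans.values()))
```

To get one curve per model, group the `spans` items by the first element of each key and call `kaplan_meier` on each group. On the full dataset a survival-analysis library would be the more sensible route; this only shows the mechanics.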

Ross Lazarus Andy Klein a day ago
Kaplan-Meier curves for the 31 million or so rows by manufacturer and model are at http://imgur.com/a/rH6Ot if anyone cares. Generated in R after some Python processing of the raw CSV.
I think they give interesting details you just don't get from failure rates. These ignore the unreliable uptime hours column and simply take time under observation between first and last appearance of the model_serial combination. KM deals with right censoring properly so you can distinguish early and late failure patterns from manufacturers and models. Models with <500 obs are filtered out to simplify the plot.
Oh, and if you're wondering about the bogus manufacturer "ST400LM012" on the KM by manufacturer, it comes from (eg) 2014/2014-09-26.csv as 2014-09-26,S2ZYJ9DDB27222,ST500LM012 HN,500107862016,0 so this method helps reveal duff data too!
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
The data is there, it's just not spoon fed and probably shouldn't be.

People should intentionally make data visualizations obtuse and uninformative? why?

because only super clever people deserve to have good drives?
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
People should intentionally make data visualizations obtuse and uninformative? why?

because only super clever people deserve to have good drives?

People get paid big bucks for that kind of thing.

They release all the data, so the entire complaint is moot.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
People get paid big bucks for that kind of thing.

is backblaze offering a 'free' view of the data and a 'paid' view of the data? no? so your comment is moot

They release all the data, so the entire complaint is moot.

incorrect. it mitigates it, but it does not make their visualization choices any less stupid
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
is backblaze offering a 'free' view of the data and a 'paid' view of the data? no? so your comment is moot



incorrect. it mitigates it, but it does not make their visualization choices any less stupid

And there's my cue to stop spoon-feeding people on this forum and go back into lurk mode after my 2-year hiatus.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
backblaze is a bunch of retards when it comes to data presentation

they release their data in such a mess of incomparable results that it's worthless as is

they talk about failure rates but they're comparing across different age drives which you just can't do

i mean saying these drives only have a 1% failure rate but we've only had them 6 months, is that better or worse than a drive that has a 10% failure rate after 5 years? No one knows

what they need to do is very simple:
a chart of cumulative failure rate by age

what % died by 6 months, 1 year, 2 years, etc

some drives will only have 6 months of data as they are new, some will go all the way through 5 years, that's fine, but at least the numbers at those points that they do have in common will be directly comparable

Download the data and give it a try yourself. Let's see what you come up with.
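
For anyone taking that up, a rough sketch of the "what % died by 6 months, 1 year, 2 years" table could look like the Python below. Everything in it is illustrative: the `(model, first_day, fail_day)` tuples, the integer day numbering, the checkpoint lengths and the example drives are all made up, and it assumes a drive only leaves the data by failing (no retirements or migrations). A model gets `None` at a checkpoint when none of its drives were installed long enough ago to have reached that age, which is the "some drives will only have 6 months of data" case.

```python
from collections import defaultdict

# Checkpoint ages in days (approximate lengths, chosen for illustration).
CHECKPOINTS = {"6 months": 182, "1 year": 365, "2 years": 730}

def cumulative_failure_by_age(drives, end_day, checkpoints=CHECKPOINTS):
    """Fraction of drives that failed by each age, per model.

    drives:  iterable of (model, first_day, fail_day), where days are integer
             day numbers and fail_day is None for drives that never failed.
    end_day: last day covered by the data.
    Returns {model: {checkpoint label: fraction, or None if no data}}.
    """
    by_model = defaultdict(list)
    for model, first_day, fail_day in drives:
        by_model[model].append((first_day, fail_day))

    table = {}
    for model, items in by_model.items():
        row = {}
        for label, age in checkpoints.items():
            # Only drives installed at least `age` days before the data ends
            # could have been watched for the whole period.
            cohort = [(first, fail) for first, fail in items
                      if end_day - first >= age]
            if not cohort:
                row[label] = None
                continue
            failed = sum(1 for first, fail in cohort
                         if fail is not None and fail - first <= age)
            row[label] = failed / len(cohort)
        table[model] = row
    return table

# Hypothetical drives: an older model with full history and a newer one.
EXAMPLE = [
    ("OLD", 0, 100), ("OLD", 0, None), ("OLD", 0, 500), ("OLD", 0, None),
    ("NEW", 700, 750), ("NEW", 700, None),
]
for model, row in cumulative_failure_by_age(EXAMPLE, end_day=900).items():
    print(model, row)
```

Because every model is measured at the same ages, the numbers at the checkpoints they share are directly comparable, which is the point of the original complaint. If drives are also pulled for reasons other than failure, a Kaplan-Meier estimate handles that better than this cohort count.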
 

Elixer

Lifer
May 7, 2002
10,376
762
126
Well crap, just lost a wd30efrx with under 1000 hours on it, and this drive did go through a 24 hour burn-in as well.
SMART had no errors at all.
Went to write to the drive and... it started throwing disk errors in the Windows event logs.
"Reset to device, \Device\RaidPort0, was issued." (Which pretty much freezes the system for a time)
That is 2 Reds that have died on me with no warnings.
Cable swaps didn't help, nor did trying it on another machine.

3TB HDs seem to be problematic for both WD & Seagate.
 