Why isn't ECC memory used more?

Page 4 - AnandTech community

NP Complete

Member
Jul 16, 2010
57
0
0
Interesting disk failure rate research: http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf

Seems to indicate a ~8% checksum failure rate on SATA disks (see the nearline failure rates) - roughly in line with the ~8% DIMM failure rate. I couldn't find the publication date, but since some references are cited ~2008, I'm betting publication is 2008-2009.

Once again, the disk research shows that subsequent errors have a high correlation with initial errors.

Definitely an eye opener: we have a ~16% error rate in storage/memory subsystems. Confoundingly, it seems that failures aren't normally distributed - the distribution seems to be "none" or "lots". Maybe companies are banking on the so-called "bathtub" curve, and counting on their devices failing hard rather than lingering in a state causing lots of silent data corruption?
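
A rough sanity check on that ~16% figure, as a sketch only: it assumes the two ~8% rates are annual per-machine probabilities and that DIMM and disk errors are independent of each other, neither of which the studies establish. Under those assumptions the chance of hitting at least one of the two is slightly below the simple sum.

```python
# Assumed inputs: the ~8% figures quoted above, treated as independent
# per-machine probabilities (an assumption, not a result from the papers).
p_dimm = 0.08
p_disk = 0.08

# Probability that at least one of the two subsystems shows an error.
p_either = 1 - (1 - p_dimm) * (1 - p_disk)

print(f"simple sum:       {p_dimm + p_disk:.4f}")
print(f"at least one hit: {p_either:.4f}")
```

If the errors are correlated (the "none or lots" pattern hints they might be), the real figure could differ in either direction.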

In any case, I'm very curious about how this pertains to my own home system - maybe the ECC and ZFS proponents are onto something? I still have a bit of faith left in hardware and OS companies' business analysis, but I definitely want some independent data now.
 

anikhtos

Senior member
May 1, 2011
289
1
0
hmm
8% at dimm
8% at hard drive
well the computer is not looking that safe anymore
well i had my share of crashes
and files corrupted on the hard drive
so thinking seriously about a xeon machine with zfs to store data
16% is quite alarmingggggggggggggggg
 

paperwastage

Golden Member
May 25, 2010
1,848
2
76
1. ECC doesn't cost $100 extra
2. You're confusing uptime and correctness

Consumers don't need five-nines uptime, so paying for dual power supplies and similar redundancy doesn't make sense.

But they still need data integrity. If the computer is running, it should not be corrupting their data.

Maybe a more accurate way of my reasoning:

I'm talking about reliability though here, and tradeoffs for reliability

A PSU has a 0.000001% (making up a number here) chance of failing, but it is possible. Should we include dual PSUs in all machines? No, because it's not cost-effective

RAM has a 0.1% chance of failing, while ECC RAM halves that to 0.05% (I'm making up numbers). Should we include ECC in all machines? Well, it depends on whether it's cost-effective... hence, you look at the tradeoffs between cost and performance

Let me give you the reason why Intel isn't requiring ECC, and shouldn't be required to use ECC.

If Intel gives ECC support to consumer boards, boards will be slightly more expensive (the mobo manufacturer has to pay a little more to certify ECC traces, test all the different types of ECC RAM there are... ECC is sometimes finicky too)

Consumers who care about cost will shift over to AMD(cheaper). Intel loses $

Consumers who were at AMD(because they support ECC) might come over to Intel Consumer. Intel gains $.

Consumers who were paying for Intel high-end Xeon ECC will shift over to consumer. Intel loses $

Net: If SUM(Intel loses $ and gains $) < 0, then Intel won't do it

(or Intel raises the $ of consumer market to regain margin.... now, do ALL consumers like this increase? The ones who don't care about ECC do not like it, and the tradeoff b/w function and cost is negative for those consumers)

Now, substitute market data, Intel focus groups, surveys (i.e. real data) into this. Intel decides it is not worth it and doesn't do it.
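
The reasoning above boils down to a sum of three terms. A minimal sketch of it, with hypothetical function and parameter names and made-up dollar figures (none of these numbers come from Intel or from any market data):

```python
def intel_net_change(lost_to_amd, gained_from_amd, lost_from_xeon):
    """Net change in profit if consumer parts got ECC.

    lost_to_amd:     profit lost from cost-sensitive buyers who switch to AMD
    gained_from_amd: profit gained from buyers who were on AMD only for ECC
    lost_from_xeon:  profit lost from Xeon buyers who move down to consumer parts
    All three are hypothetical inputs that real market data would replace.
    """
    return gained_from_amd - lost_to_amd - lost_from_xeon


def would_enable_ecc(lost_to_amd, gained_from_amd, lost_from_xeon):
    # The rule of thumb from the post: only do it if the net is positive.
    return intel_net_change(lost_to_amd, gained_from_amd, lost_from_xeon) > 0


# Made-up figures, in millions of dollars.
print(intel_net_change(lost_to_amd=20, gained_from_amd=15, lost_from_xeon=200))
print(would_enable_ecc(lost_to_amd=20, gained_from_amd=15, lost_from_xeon=200))
```

The sketch only restates the argument; whether the Xeon term really dominates is exactly what the rest of the thread disputes.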

Consumers aren't worse off as there IS AN INTEL ECC version they could buy... just at a higher price
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
Maybe a more accurate way of my reasoning:

I'm talking about reliability though here, and tradeoffs for reliability

you're talking about uptime and i'm talking about correctness/data integrity

A PSU has a 0.000001% (making up a number here) chance of failing, but it is possible. Should we include dual PSUs in all machines? No, because it's not cost-effective

consumers don't need 5 nine uptime, a computer can break and that's ok

what's NOT OK is silently corrupting data


Let me give you the reason why Intel isn't requiring ECC, and shouldn't be required to use ECC.

If Intel gives ECC support to consumer boards, boards will be slightly more expensive (the mobo manufacturer has to pay a little more to certify ECC traces, test all the different types of ECC RAM there are... ECC is sometimes finicky too)

Consumers who care about cost will shift over to AMD(cheaper). Intel loses $

you might actually have a point if
1) ECC cost that much more
2) AMD was half-way competitive

but it doesn't and they aren't

Intel is doing it solely so they differentiate their xeon line and increase margins there

in the meantime they are knowingly selling defective systems to consumers

Consumers aren't worse off as there IS AN INTEL ECC version they could buy... just at a higher price

safety is not an option

Consumers aren't given the option of the gas tank that is cheaper but more likely to explode
Consumers aren't given the option of the restaurant that can't pass the health inspection
Consumers shouldn't be given the option of buying non-ECC memory
 

anikhtos

Senior member
May 1, 2011
289
1
0
well i do not think ecc support is the only difference between server cpus and consumer cpus??

they could easily give ecc to all and keep virtualization for the server cpus
that is fair
why on earth do i get a cpu with virtualization as a consumer?!?!?
ecc would make more sense for a stable pc
after all most consumers can hardly do anything more than install games
and play them
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
Consumers aren't given the option of the gas tank that is cheaper but more likely to explode
Consumers aren't given the option of the restaurant that can't pass the health inspection
Consumers shouldn't be given the option of buying non-ECC memory
Consumers shouldn't be given the option of buying vinyl records since they lose data when you read from them.
 

anikhtos

Senior member
May 1, 2011
289
1
0
Consumers shouldn't be given the option of buying vinyl records since they lose data when you read from them.
it is not the same
vinyl was great when it appeared
and what happened when a medium appeared that could not lose data??
ohhhhhhh noooooooo vinyl is dead long live the cd
so
in the one case the whole industry forced us to make the change
buy a cd player, rebuy our music on cd instead of vinyl
how much did that cost?
and we argue that the industry fears a 10-15$ rise in cost to switch to ecc ram only
and when that switch is made there will be nothing to compare against
so now that hard drives are so expensive, did people stop buying them, or stop buying pcs, because they have to pay more than they used to???
of course not
maybe the market for standalone hard drives is rather lower
but pcs still sell
because in all the systems the hard drive costs the same
so there is no offer of a pc with pre-flood hard drives, sooooo???
 

fuzzymath10

Senior member
Feb 17, 2010
520
2
81
I think the message we need to take home is that

1) ECC and non-ECC memory have the same risk due to flaws, but non-ECC theoretically has a higher probability of the flaw occurring.
2) We are given the option to purchase both types
3) The "fancier" version costs more. Perhaps not much more, but more

Almost every product available for purchase is segmented based on these three rules. There is no reason to say that customers are being "deceived", because the alternate product does not eliminate the flaw causing the "deception"; it typically just offers a reduced risk of the flaw occurring. We underestimate the desire for people to pay less (even pennies less).

Nobody ever said non-ECC memory is 100% reliable, nor did anybody ever claim that hard drives, optical drives, or any other piece of hardware is 100% reliable. A warranty is offered for the off-chance that the hardware fails.

Should cheap blank CD/DVD media be sold when high quality discs are just a bit more expensive? A cheap disc might burn properly but become slowly unreadable at a faster rate than better quality media. People will still choose to buy a spindle for $10 vs $12.

Cheap fans have terrible bearings that will result in more noise and reduced cooling ability (and increase risk of hardware failure) compared to high quality fans. Should we go after all products with cheap fans because it might result in premature failure of the cooling system? This could be devastating while gaming.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
1) ECC and non-ECC memory have the same risk due to flaws, but non-ECC theoretically has a higher probability of the flaw occurring.

nothing theoretical about it, non-ECC has more undetected errors
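
For anyone wondering how ECC catches errors that plain memory lets through, here is a toy sketch using a Hamming(7,4) code. This is an illustration only: real ECC DIMMs typically use a wider SECDED code (64 data bits plus 8 check bits), and the function names here are made up for the example.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Layout (1-based positions): p1 p2 d1 p3 d2 d3 d4
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if clean."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the flipped bit
    if pos:
        c[pos - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], pos


stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one bit flipping in memory
recovered, where = decode(stored)
print(recovered, "flipped bit was at position", where)
```

Without the check bits, the flipped bit would simply be read back as valid data, which is the "undetected" part.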

We underestimate the desire for people to pay less (even pennies less).

No, that is precisely why we have government regulation to save consumers from themselves. Otherwise manufacturers would be in a constant race to sacrifice safety for savings.

It's why the government regulates auto safety
It's why the government regulates food safety
It's why the government regulates airline safety
It's why Intel SHOULD step up and stop making safety optional

Cheap fans have terrible bearings that will result in more noise and reduced cooling ability (and increase risk of hardware failure) compared to high quality fans. Should we go after all products with cheap fans because it might result in premature failure of the cooling system?

This isn't about hardware failure. This is about SILENTLY CORRUPTING DATA.

Hardware components can fail, that's a given and it's 'ok'

Silently corrupting data is not ok

Safety is not an option.
 

fuzzymath10

Senior member
Feb 17, 2010
520
2
81
It's why the government regulates auto safety
It's why the government regulates food safety
It's why the government regulates airline safety
It's why Intel SHOULD step up and stop making safety optional

Yet we still have car crashes, we still get food poisoning, airplanes still crash. The threshold is arbitrary, and based on a REASONABLE level of likelihood of adverse events occurring. Data corruption does not appear to be a major problem (even if we agree it does occur with some degree of frequency) to warrant moving to tighter tolerances, just as the government may decide that the current regulated levels are acceptable. If we wanted to have greater security, then nobody could fly, we'd either be spoon-fed by govt officials or grow our own food, and the roads would be empty because no car is 100% safe (or even 99.999% safe for that matter)

I'm sure Chinese coach buses, Greyhound, and Megabus all have differing levels of crash frequencies, but all are hopefully above a minimum level required by the government. The fact that they all do not have the same safety level is a consumer choice, as long as they are all approved to operate by the regulator.

You are allowed to choose where to do your banking, but your money is not 100% safe anywhere, even if all banks are approved to operate. Bank A might have lower solvency than Bank B, but both are adequately solvent. Even if the government has deposit insurance, that guarantee is only as good as the government offering it.

Silently corrupting data is not ok

Safety is not an option.

Safety is an option, as long as its probability is sufficiently high. We cannot say for sure if it's 99%, 99.9%, or 99.99%. Just because non-ECC memory silently corrupts more frequently than ECC memory doesn't mean that it is dangerous, if both are above an adequate security level, which appears to be the case based on how frequently debates about the topic show up.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
If we wanted to have greater security, then nobody could fly

false, we would fly everywhere because it's the safest mode of transportation

You are allowed to choose where to do your banking, but your money is not 100% safe anywhere

So because we can't be 100% safe we shouldn't try to improve safety at all? Is that what you're saying?

ECC is ORDERS OF MAGNITUDE safer than non-ECC. We're not talking 10% safer or 50% safer, it's closer to 10,000% safer.

And for the minimal cost, this is VERY LOW HANGING FRUIT, in fact I wouldn't be surprised if Intel was the target of class-action lawsuits in the future for knowingly selling defective components because they didn't take even the simplest steps to ensure data integrity.

Just because non-ECC memory silently corrupts more frequently than ECC memory doesn't mean that it is dangerous

when the difference is that great, yes it does

It is beyond negligent to continue to support non-ECC memory
 

anikhtos

Senior member
May 1, 2011
289
1
0
after every plane crash the government checks
1) if all the regulations were met
if not, the people not meeting them get punished

if the regulations were met, then it investigates why the plane crashed
if they find something is wrong with the plane design, then they issue an order
and all planes fix that design flaw

1) airplanes are the safest way to travel
2) all train accidents are human fault nowadays

nope, i agree ecc should have been the norm
it is one thing for non-ecc memory to cause you a bsod more often
and another thing to store wrong data.
to save some $ consumers won't mind a bsod now and then??
that is not the case, everyone hates bsods
and usually they happen when you do the same thing
many times i tried to run even a game and the system was crashing??
poor game programming? poor memory?
something was wrong and the system was crashing
that was not pleasant

and the consumers are not educated, nor do they want to be
what the consumers want is for the government to look after their welfare
you do not know what makes a good plane
but what you know is that the government makes sure that only good planes fly
many do not know what the parts of a car are
but it does not matter, all cars have to pass safety tests

what about computers?!?!?!?!
there is nothing
the market is free for the companies
what a shame
how many of you remember when a gb=1024 mb??
later some companies could not make large hard drives so they changed it to gb=1000mb
and what did the government do? nothing to protect the consumers
after 5-6 years of chaos
we introduced the gib in place of the gb
wow
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
AMD does support ECC on their CPUs, BIOS and motherboard design willing. But it is in their memory-controller. Intel actually goes out of their way to disable the ECC support on their consumer CPUs.

Artificial product differentiation.
Intel works very hard to make absolutely sure that ECC cannot be used unless you purchase an approved enterprise processor for a massive markup.
http://en.wikipedia.org/wiki/Xeon

ECC is good computing and should be mandated for all RAM on all systems. Its cost delta is insignificant, except that it has been forcibly inflated in order to siphon more money from companies that MUST have ECC in their servers.
 

NP Complete

Member
Jul 16, 2010
57
0
0
ECC is good computing and should be mandated for all RAM on all systems. Its cost delta is insignificant, except that it has been forcibly inflated in order to siphon more money from companies that MUST have ECC in their servers.

ECC is good. Errors are bad. Intel is inflating prices through segmentation.

These are all facts that can be discussed rationally.

Cost delta, acceptable error rates and others are also facts that can be discussed, but you provide no hard numbers.

Everything else is zealotry - especially stating that ECC should be mandated on ALL systems. Why not mandate UPS on all systems? RAID arrays, CPU error detection modules sold on high end (Itanic, etc) parts are "good", and help prevent data loss or errors. I think you're ignoring costs, and instances where reliability isn't as important as cost.

Either back up your assertions with numbers, or don't state them as facts.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
Why not mandate UPS on all systems?

if you do your part (save frequently) it doesn't matter

RAID arrays

if you do your part (backup frequently) it doesn't matter

The thing about memory errors is that even if you do everything you should, you still get screwed

CPU error detection modules sold on high end (Itanic, etc) parts are "good", and help prevent data loss or errors.

I'm not familiar enough with those to comment on them

I think you're ignoring costs

Not so. ECC is only marginally more expensive while providing hugely substantial protections.

and instances where reliability isn't as important as cost.

any system that is sold to a consumer COULD (and LIKELY WILL) be used for something critical. Thus all consumer models should do everything reasonable to protect their data. ECC is beyond reasonable, it is willful negligence to exclude it.

Either back up your assertions with numbers, or don't state them as facts.

How can you deny that Intel is using ECC to artificially segment the market for monetary reasons?
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Everything else is zealotry - especially stating that ECC should be mandated on ALL systems. Why not mandate UPS on all systems? RAID arrays, CPU error detection modules sold on high end (Itanic, etc) parts are "good", and help prevent data loss or errors. I think you're ignoring costs, and instances where reliability isn't as important as cost.

Mandated not in terms of "it is the law" but in terms of "our CPUs support only ECC RAM".
Having a memory controller support both ECC and non-ECC inflates costs.
Mandatory the same way Intel CPUs "mandate" that you use DDR3 rather than giving you a choice of DDR3, 2, 1 and non-DDR memory.

From Least to Most expensive:
1. No ECC
2. ECC
3. Both ECC and non-ECC supported simultaneously.
 

NP Complete

Member
Jul 16, 2010
57
0
0
You're ignoring a company's moral and legal obligation to take reasonable care of customers' data and not sell defective products.

There is no legal obligation whatsoever - CPUs, RAM, etc all come with warranties which expressly limit liability due to damages from data loss. No company in their right mind would ever sell a part that warrantied against data loss unless they added the cost of data loss into the price of the product.

Case in point - imagine you've sold <insert computer part> to the CEO of large company X. He's editing the final draft of a sale proposal to buy out another company for $$$ billions. <insert computer part> crashes/errors out and the CEO loses the sale. If you've sold him a part with a no-data-loss guarantee, you're on the line for $$$ billions due to data loss.

There is no guarantee on any RAM, ECC or non-ECC. Warranties are limited to the cost of parts only, and time limited as well.

Yes, there are such things as anti-lemon laws, where a part is bought with certain expectations, but you have in no way demonstrated that there is any illegal activity going on in RAM sales, or RAM is sold with any implied warranty against data loss, corruption or failures.

In a similar vein, how is it immoral to sell non-ECC RAM? Anyone who buys RAM believing that it'll always work flawlessly is clearly delusional. Anyone who assumes their data is 100% safe is also delusional. Show me a clause in a RAM warranty guaranteeing data integrity, and you may have a point about "illegal" and "immoral".
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
There is no legal obligation whatsoever - CPUs, RAM, etc all come with warranties which expressly limit liability due to damages from data loss.

And such limitations have been repeatedly shown to not be worth the electrons they're written on

No company in their right mind would ever sell a part that warrantied against data loss unless they added the cost of data loss into the price of the product.

Stop making straw men. No one's saying warranty against ALL data loss. Rather, reasonable and prudent steps should be taken. Bad memory is a known issue. There is a known solution that is affordable and reasonable. Not implementing said solution that addresses 99.999% of cases thus becomes a huge liability issue. You know there's a problem, you know the solution, you INTENTIONALLY withhold the solution to boost your bottom line.

That is the stuff of a trial lawyer's dreams.

Case in point - imagine you've sold <insert computer part> to the CEO of large company X. He's editing the final draft of a sale proposal to buy out another company for $$$ billions. <insert computer part> crashes/errors out and the CEO loses the sale. If you've sold him a part with a no-data-loss guarantee, you're on the line for $$$ billions due to data loss.

Again, it's not about guaranteeing anything, it's about taking reasonable precautions. When this case comes before a jury, it will be shown that the manufacturer knew memory errors were a huge problem and yet took precisely ZERO steps to prevent and/or detect them. THAT is what is going to kill them.

Not that somebody lost data, but that Intel intentionally took steps to make it FAR more likely to happen. That is where their liability comes in.


There is no guarantee on any RAM, ECC or non-ECC. Warranties are limited to the cost of parts only, and time limited as well.

Again, not about guarantees.

Yes, there are such things as anti-lemon laws, where a part is bought with certain expectations

This isn't about a 'lemon' that has to be repeatedly returned to a shop, this is about negligence and liability

An engine that repeatedly fails to start falls under the lemon law

An engine that explodes in a fireball falls under negligence and liability

See the difference?


If an engine due to some huge fluke explodes once, the manufacturer will probably not get hit hard.

If the manufacturer KNEW the engine was likely to explode yet decided to keep making it year after year, they're going to get crushed in court.

See the difference?

People will forgive one-off issues, they don't forgive known design-flaws that aren't addressed, or even worse, intentionally added simply to save a few bucks.
 

moriz

Member
Mar 11, 2009
196
0
0
Here's an interesting question: what is the chance of a memory error changing, say, $50 to $5 on a spreadsheet? That is to say, how many of those 8% annual memory errors on non-ECC RAM actually translate to tangible errors that make a difference?

Something tells me that the actual percentage is too low to really matter. It would've been noticed a long time ago otherwise.
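
To make the $50 question a bit more concrete, here is a toy sketch of what a single flipped bit does to the number 50. It assumes the value sits in memory as a plain 8-bit unsigned integer; a real spreadsheet would normally hold a floating-point value, so treat this purely as an illustration.

```python
value = 50  # 0b00110010, assumed stored as an 8-bit unsigned integer

# Flip each of the 8 bits in turn and record the resulting value.
flipped = [value ^ (1 << bit) for bit in range(8)]

for bit, result in enumerate(flipped):
    print(f"bit {bit} flipped: {value} -> {result}")

print("5 reachable with one flip:", 5 in flipped)
```

Under this assumption a single flip never turns 50 into exactly 5, but it does turn it into other plausible-looking amounts, which is the kind of error nobody would spot by eye.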
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Stop making straw men. No one's saying warranty against ALL data loss. Rather, reasonable and prudent steps should be taken. Bad memory is a known issue. There is a known solution that is affordable and reasonable. Not implementing said solution that addresses 99.999% of cases thus becomes a huge liability issue. You know there's a problem, you know the solution, you INTENTIONALLY withhold the solution to boost your bottom line.

That is the stuff of a trial lawyer's dreams.

This is a fairly good argument.
Trial lawyers + jury... I could see this perhaps working.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
Something tells me that the actual percentage is too low to really matter. It would've been noticed a long time ago otherwise.

It matters to those who are impacted.
 

bradley

Diamond Member
Jan 9, 2000
3,671
2
81
Virtuallarry asks a great question. I had ECC modules in my old Abit IT5H. We haven't regressed since then, but have greatly advanced in manufacturing standards. ECC would be easy to incorporate for the home user without ever missing a beat, not that such a standard could ever be completely necessary. Although I had a friend with dirty EMF BSODs fixed by using ECC memory.

Once ARM starts to become a threat, you will probably see Intel flip on ECC and hyperthreading etc. AMD never gets any credit for not playing those games. It's just weird to see how many defend status quo segmentation with some faux stake in nothingness.
 

moriz

Member
Mar 11, 2009
196
0
0
It matters to those who are impacted.

And the chance of being impacted is pretty damn small.

Don't get me wrong, I have no problems with ecc memory. I simply cannot justify the additional cost and the very small chance of it making a difference. Similarly, I won't object if the feature is available at small price premiums, except that's not the case, and I'm not willing to pay for features that will likely never benefit me. Absolute bit accuracy is not required for me, and I keep thorough backups of critical stuff.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
And the chance of being impacted is pretty damn small.

More than 15% of all machines will be impacted, that's not 'pretty damn small'

I keep thorough backups of critical stuff.

What good is a backup of a corrupted file?

Similarly, I won't object if the feature is available at small price premiums, except that's not the case, and I'm not willing to pay for features that will likely never benefit me.

People like you are EXACTLY why it needs to be mandatory.

If the government didn't regulate airline safety, you would always go with the cheapest airline because it was 'unlikely' your particular flight would crash.

If it's not mandatory, market pressure will push manufacturers to shave pennies and eliminate it, meaning consumers who DO need it (which is most of them) won't get it.

Consumers shouldn't have to know anything about ECC just like you shouldn't have to inspect the kitchen of every restaurant you visit.

It should just work and be safe.
 