Why isn't ECC memory used more?

Mark R

Diamond Member
Oct 9, 1999
8,513
14
81
ECC doesn't fix defective chips. A defective ECC DIMM will behave the same way as a regular DIMM.

Actually, no it won't.

With a regular DIMM, you won't know if corrupt data has been used in a calculation or saved.

With ECC, corruption in a DIMM will, if minimal, have no effect on the system, except to trigger an error report in the OS. The corruption will be silently repaired, and the system will operate normally.

If moderate, the error is guaranteed to be detected immediately, and the OS will immediately trigger a critical error condition (BSOD on Windows), so that the corrupted data cannot be saved to disk or transmitted to another computer.

If the corruption is catastrophic, with ECC there will still be a chance that the system will detect the error and insta-BSOD, but this level of corruption may slip past the checks.
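
The three cases above (repair, guaranteed detection, possible slip-through) are what a single-error-correct, double-error-detect (SECDED) code gives you. Here is a minimal sketch in Python, scaled down to 8 data bits so it fits in a post; the 13-bit layout and the function names are assumptions for illustration, not any memory controller's actual code (real ECC DIMMs protect 64 data bits with 8 check bits).

```python
from functools import reduce
from operator import xor

# Toy extended Hamming code: slot 0 is an overall parity bit,
# slots 1..12 form a Hamming(12,8) word. Layout chosen for illustration only.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two slots hold data bits
PARITY_POS = [1, 2, 4, 8]                # power-of-two slots hold check bits
N = 13

def syndrome(word):
    """XOR of the indices of all set bits; 0 for a valid codeword."""
    return reduce(xor, (i for i, bit in enumerate(word) if bit), 0)

def encode(byte):
    """Return the 13-bit codeword (as a list of 0/1) for an 8-bit value."""
    word = [0] * N
    for k, pos in enumerate(DATA_POS):
        word[pos] = (byte >> k) & 1
    s = syndrome(word)
    for p in PARITY_POS:
        word[p] = 1 if s & p else 0      # cancels the data bits' syndrome
    word[0] = sum(word) % 2              # makes the total number of ones even
    return word

def decode(word):
    """Return (status, byte); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    s = syndrome(word)
    odd = sum(word) % 2
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < N:
        word[s] ^= 1                     # odd parity: treat as one flip at slot s
        status = "corrected"
    else:
        status = "uncorrectable"         # e.g. two flips: parity even, syndrome nonzero
    byte = sum(word[pos] << k for k, pos in enumerate(DATA_POS))
    return status, byte
```

Flipping any one of the 13 bits of `encode(0xA7)` makes `decode` report "corrected" and return 0xA7; flipping any two makes it report "uncorrectable". Three or more flips are the catastrophic case: they may be flagged, or they may look like a single-bit error and be miscorrected.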

There are also different ways of building ECC DIMMs, and they make a difference. Most memory chips are 8 bits wide: 8 are installed on a 64-bit DIMM, 9 on a regular ECC DIMM.

If you build an ECC DIMM from 18 4-bit chips, then it is possible to wire the chips up so that if any one chip malfunctions, the error is guaranteed to be corrected in real-time, and if any 2 chips fail, detection is guaranteed. This has been called "chipkill" as an entire chip could fall off the DIMM and the system would continue to operate without error. The more expensive RAM and more complex wiring on the DIMM PCB make this a premium option.
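
A toy sketch of how a whole failed chip can be corrected: treat each chip's 4-bit output as one symbol and compute check symbols over GF(16). Everything specific here is an assumption for illustration: 8 data chips plus 2 check chips instead of 18, a RAID-6-style P/Q construction, and all the names. It corrects any corruption confined to one chip; guaranteed detection of two failed chips needs more check symbols and is not shown. Real chipkill products use their own codes.

```python
# GF(16) log/antilog tables, built from the primitive polynomial x^4 + x + 1.
EXP = [0] * 30
LOG = [0] * 16
_x = 1
for _i in range(15):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x10:
        _x ^= 0b10011
for _i in range(15, 30):
    EXP[_i] = EXP[_i - 15]

def mul(a, b):
    """Multiply two GF(16) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

N_DATA = 8  # hypothetical DIMM: 8 data chips + 2 check chips, 4 bits each

def encode(data):
    """Append check symbols P (plain XOR) and Q (XOR weighted by alpha^chip)."""
    p = q = 0
    for chip, d in enumerate(data):
        p ^= d
        q ^= mul(EXP[chip], d)
    return list(data) + [p, q]

def decode(word):
    """Return (status, data), repairing one bad chip if the syndromes allow it."""
    data = list(word[:N_DATA])
    s0, s1 = word[N_DATA], word[N_DATA + 1]
    for chip, d in enumerate(data):
        s0 ^= d
        s1 ^= mul(EXP[chip], d)
    if s0 == 0 and s1 == 0:
        return "ok", data
    if s0 == 0 or s1 == 0:
        # Only one syndrome is nonzero: under the single-chip assumption
        # the fault is in a check chip, so the data symbols are intact.
        return "corrected", data
    chip = (LOG[s1] - LOG[s0]) % 15      # s1 / s0 = alpha^chip locates the bad chip
    if chip < N_DATA:
        data[chip] ^= s0                 # s0 is the error pattern itself
        return "corrected", data
    return "uncorrectable", data         # locator points at no real chip
```

With this layout, XOR-ing any nonzero 4-bit pattern into any single one of the 10 symbols still decodes to the original data, which is the "a chip could fall off" property on a small scale.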
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
i guess the $ for consumers(extra cost in RAM, latency) isn't really worth the benefits of ECC...

Tell that to the consumer whose computer keeps crashing because of bad ram, yet they are given no indication what the actual problem is.

Intel is being foolishly shortsighted and is going to end up losing customers to 'alternative' platforms (tablets and whatnot) even quicker


Now imagine Intel included a 'random error generator' in their chips and differentiated their product line by how often it was triggered. A $50 chip might be set to 1 error/hour while a $120 chip would be 1 error/week and the $180 chip was 1 error/month. And if you wanted the error generator disabled completely, you had to pay $800.

Would anyone stand for that sort of pricing scheme?
 

anikhtos

Senior member
May 1, 2011
289
1
0
Tell that to the consumer whose computer keeps crashing because of bad ram, yet they are given no indication what the actual problem is.

Intel is being foolishly shortsighted and is going to end up losing customers to 'alternative' platforms (tablets and whatnot) even quicker


Now imagine Intel included a 'random error generator' in their chips and differentiated their product line by how often it was triggered. A $50 chip might be set to 1 error/hour while a $120 chip would be 1 error/week and the $180 chip was 1 error/month. And if you wanted the error generator disabled completely, you had to pay $800.

Would anyone stand for that sort of pricing scheme?
Intel divides the market into so many different segments,
but really ECC should be a must in every computer, not a server option.
Virtualization, yes, that feature can be reserved for servers, but not ECC,
which makes the whole system more stable.
And by the way, it would then be up to the consumer to install ECC for a stable system or non-ECC for a few percent more performance.

But if ECC were the norm, I bet we would have ECC chips running at the higher speeds and lower CAS latencies that non-ECC RAM has now.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,477
10,137
126
Tell that to the consumer whose computer keeps crashing because of bad ram, yet they are given no indication what the actual problem is.

Intel is being foolishly shortsighted and is going to end up losing customers to 'alternative' platforms (tablets and whatnot) even quicker


Now imagine Intel included a 'random error generator' in their chips and differentiated their product line by how often it was triggered. A $50 chip might be set to 1 error/hour while a $120 chip would be 1 error/week and the $180 chip was 1 error/month. And if you wanted the error generator disabled completely, you had to pay $800.

Would anyone stand for that sort of pricing scheme?

Wow, that's a really great analogy, thanks.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Really? You'll be able to see logs of errors with a non-ECC DIMM? I have had an ECC DIMM that just locked a machine up, but never gave data errors, so yes, it happens, but that's not the general case.

What I meant was ECC isn't designed to correct defective memory. ECC is designed to correct random bit errors, such as the errors created by cosmic rays.
 

NP Complete

Member
Jul 16, 2010
57
0
0
Tell that to the consumer whose computer keeps crashing because of bad ram, yet they are given no indication what the actual problem is.

Intel is being foolishly shortsighted and is going to end up losing customers to 'alternative' platforms (tablets and whatnot) even quicker


Now imagine Intel included a 'random error generator' in their chips and differentiated their product line by how often it was triggered. A $50 chip might be set to 1 error/hour while a $120 chip would be 1 error/week and the $180 chip was 1 error/month. And if you wanted the error generator disabled completely, you had to pay $800.

Would anyone stand for that sort of pricing scheme?

This is a horrible analogy - you're conflating two issues: someone selling a purposely defective device to generate more profit, versus someone selling a device that mitigates inherent defects to generate more profit.

To borrow a slightly different but always popular analogy - would people pay more to buy a car that goes 100K miles before a tune-up versus one that only goes 50K? Yes, and the more people buy the car that goes 100K miles without a tune-up, the smaller the price differential. Some people either value reliability more, or have more money, but not everyone does.

Errors are inherent in chip design - besides hard errors (those being discussed here), there are issues such as metastability & the like. Everything has a failure rate - designers analyze the failure rate and weigh cost of reducing the failure rate versus the overall cost of the device.

Should Intel charge more for enabling the ECC device? Perhaps not - AMD seems to be able to sell all their chips with ECC support enabled and still make money. But then again, Intel is vastly more profitable than AMD, which partially enables them to make faster chips... the question is not a black & white issue, but rather one with shades of grey.

And this doesn't even begin to address the actual chance of failures - if you analyze the numbers published by Google, you'll see that the error rate is relatively low for a typical consumer device. The numbers in the paper look big because Google is big. Unfortunately, Google merely presents some raw numbers in their paper and does no economic analysis of the costs of implementing ECC vs non-ECC - I think those results would be very interesting.
 

bigted41

Junior Member
Mar 22, 2008
15
0
0
i know that 1366 xeons support the ecc and intel made that xeon server chip compatible with the consumer desktop boards, so if i were so inclined i could be running a xeon on my 1366 and use some ECC ram at home.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
14
81
i know that 1366 xeons support the ecc and intel made that xeon server chip compatible with the consumer desktop boards, so if i were so inclined i could be running a xeon on my 1366 and use some ECC ram at home.

That assumes that the consumer level board manufacturers connected up all the ECC pins (instead of leaving them unconnected) and added ECC support to the BIOS.

Historically, a number of consumer level board manufacturers have deliberately disabled ECC on their boards, because they didn't want to pay for the testing.
 

paperwastage

Golden Member
May 25, 2010
1,848
2
76
Tell that to the consumer whose computer keeps crashing because of bad ram, yet they are given no indication what the actual problem is.

What I meant was ECC isn't designed to correct defective memory. ECC is designed to correct random bit errors, such as the errors created by cosmic rays.

^^ that... ECC isn't supposed to protect against defective ram, just bits accidentally flipped b/c of cosmic rays

http://en.wikipedia.org/wiki/ECC_memory#Problem_background
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
I like to use cars as an example.

Making V6 and V8 engines on two separate lines is expensive.

Instead make all of them V8s. Except you remove two spark plugs and sell a "V6" model for $10k to justify and prop up the perceived higher value of marketing the V8 model as a "premium" item for $30k even when they cost the same to produce.

When people discover they can just enable the two missing spark plugs and pay the real price for the item, you start welding the holes shut to protect your false $20k markup. It's that last step that makes it malicious and vindictive.

At least Intel has the K models now and isn't waging war on overclocking like they used to (and lying that it's to stop counterfeiting). Cache amounts, number of cores, max stable clocks, etc.: I understand yields and disabling defective parts of chips and binning them because of that, the same way you use every part of a beef cow (different cuts, ground beef, etc.) to avoid throwing dies away and wasting money.

But going the extra mile in disabling something that actually *works* is unacceptable. Doubly so later in the mature and refined manufacturing process when your yields approach 100% and now you aren't disabling defective units to make viable products at different price points, but are now damaging perfectly functional units just to maintain an arbitrary market segmentation and prop up prices on undamaged "premium" parts.

Actually they do this with cars too. Sometimes the only difference between the X model and the Z model is 20 HP achieved by a different tune file, and then they try to block aftermarket access to the ECU so you can't make the change yourself.

Go f**k yourself. Once I've bought something I will do what I please with it, it's mine now.
 
tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
^^ that... ECC isn't supposed to protect against defective ram, just bits accidentally flipped b/c of cosmic rays

http://en.wikipedia.org/wiki/ECC_memory#Problem_background

ECC doesn't care whether cosmic rays, defective chips or santa claus caused the bit error, a bit error is a bit error and ECC will fix it regardless

If it's a multi-bit error and ECC can't fix it, at least ECC will alert you to the fact that something is wrong with the memory vs leaving you wondering what happened.

This nonsensical distinction between 'defective ram' and 'random error' needs to stop now.

It just doesn't matter, ECC is always better.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
This is a horrible analogy - you're conflating two issues: someone selling a purposely defective device to generate more profit, versus someone selling a device that mitigates inherent defects to generate more profit.

A computer system without ECC IS purposely defective.


To borrow a slightly different but always popular analogy - would people pay more to buy a car that goes 100K miles before a tune-up versus one that only goes 50K?

You're completely missing the point. I don't have a problem with something breaking, as long as it's obvious it's broken. SILENTLY CORRUPTING DATA IS NEVER ACCEPTABLE, PERIOD.

Big servers pay for redundancy to ensure that they never go down. Thus you have stuff like redundant power supplies. One power supply goes down, the other keeps you running until the bad unit is replaced.

Consumers don't need the guaranteed uptime. If a power supply fails, well their system will be down till they get it replaced.

What consumers do need and deserve is GUARANTEED CORRECTNESS. If their computer is up and running, they should be confident that it isn't corrupting their tax returns or making that document they've been working on for hours unreadable.

Yes, and the more people buy the car that goes 100K miles without a tune-up, the smaller the price differential. Some people either value reliability more, or have more money, but not everyone does.

This isn't about reliability, it's about correctness.

To use your car analogy, if a manufacturer wanted to use cheaper spark plugs that didn't last as long, that's ok. If they wanted to use cheaper spark plugs that had a chance of causing your car to explode in a fireball that's NOT OK.

No matter how cheap the design, IT MUST FAIL SAFE.

Anything else is DEFECTIVE.

Data correctness is NOT AN OPTION.

Everything has a failure rate - designers analyze the failure rate and weigh cost of reducing the failure rate versus the overall cost of the device.

It's not about the failure rate, it's ensuring that failures don't corrupt your data.

A design that allows a failure to corrupt your data is DEFECTIVE, plain and simple.

Should Intel charge more for enabling the ECC device?

Not only should they not charge extra, they should make it mandatory so all systems require it. And not only should they make it mandatory, they should be faced with a massive class action lawsuit for knowingly selling defective products.

And this doesn't even begin to address the actual chance of failures - if you analyze the numbers published by Google, you'll see that the error rate is relatively low for a typical consumer device. The numbers in the paper look big because Google is big.

8% of all DIMMs suffered at least one error in a year. 8% is 8% and doesn't matter how big Google is.
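
To put the 8% figure in per-machine terms, here is a back-of-the-envelope sketch. It rests on two assumptions that are mine, not the paper's: that DIMMs are affected independently of each other, and that a rate measured on Google's servers carries over to a home desktop. The function name is made up for the example.

```python
def p_any_dimm_error(per_dimm_rate, n_dimms):
    """Chance that at least one of n_dimms sees an error in a year,
    assuming each DIMM is affected independently at per_dimm_rate."""
    return 1.0 - (1.0 - per_dimm_rate) ** n_dimms

# Per-machine odds for typical desktop configurations at the quoted 8% rate.
for n in (1, 2, 4):
    print(f"{n} DIMM(s): {p_any_dimm_error(0.08, n):.1%}")
```

Under those assumptions a four-DIMM machine has roughly a 28% chance per year that at least one DIMM sees an error. The calculation says nothing about how many errors occur or whether any of them lands on data that matters.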
 
Smoblikat

Diamond Member
Nov 19, 2011
5,184
107
106
I like to use cars as an example.

Making V6 and V8 engines on two separate lines is expensive.

Instead make all of them V8s. Except you remove two spark plugs and sell a "V6" model for $10k to justify and prop up the perceived higher value of marketing the V8 model as a "premium" item for $30k even when they cost the same to produce.

When people discover they can just enable the two missing spark plugs and pay the real price for the item, you start welding the holes shut to protect your false $20k markup. It's that last step that makes it malicious and vindictive.

At least Intel has the K models now and isn't waging war on overclocking like they used to (and lying that it's to stop counterfeiting). Cache amounts, number of cores, max stable clocks, etc.: I understand yields and disabling defective parts of chips and binning them because of that, the same way you use every part of a beef cow (different cuts, ground beef, etc.) to avoid throwing dies away and wasting money.

But going the extra mile in disabling something that actually *works* is unacceptable. Doubly so later in the mature and refined manufacturing process when your yields approach 100% and now you aren't disabling defective units to make viable products at different price points, but are now damaging perfectly functional units just to maintain an arbitrary market segmentation and prop up prices on undamaged "premium" parts.

Actually they do this with cars too. Sometimes the only difference between the X model and the Z model is 20 HP achieved by a different tune file, and then they try to block aftermarket access to the ECU so you can't make the change yourself.

Go f**k yourself. Once I've bought something I will do what I please with it, it's mine now.

Damn straight dude. I'm still angry that Intel disables hyperthreading on the 2500K although it SUPPORTS IT. Look at AIDA64, I believe: on my i5 it says HTT: Supported
HTT Status: Disabled.
Just like ECC or overclocking, it's just pure evil and greed. That's why I support AMD: they might not have any cool features, but at least they don't block the ones they do have.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
Damn straight dude. I'm still angry that Intel disables hyperthreading on the 2500K although it SUPPORTS IT. Look at AIDA64, I believe: on my i5 it says HTT: Supported
HTT Status: Disabled.
Just like ECC or overclocking, it's just pure evil and greed. That's why I support AMD: they might not have any cool features, but at least they don't block the ones they do have.

Now I actually don't have a problem with Intel disabling optional features (like hyperthreading) to help differentiate their product lines.

It's not unlike Nvidia using one chip and then fusing off different numbers of shaders to create different products.

But, ECC is not optional. It's like a car company offering to upgrade your gas tank so it won't explode in a fireball in an accident. That is a mandatory requirement no matter the level of the car.

All cars are required to meet certain passenger safety standards and likewise all chips should be required to meet certain data safety standards.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
ECC doesn't care whether cosmic rays, defective chips or santa claus caused the bit error, a bit error is a bit error and ECC will fix it regardless

How about you look it up before you go spouting off bad information?
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Intel is probably listening to its customers. Do OEMs like Dell and HP want to have to increase their costs by $10-$20 per PC to enable ECC? I doubt it. Enabling ECC costs more in terms of memory, MB validation/testing, and CPU testing/validation. It may not seem like much to us (enthusiasts) and I would love ECC, but it is probably a cost decision.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,477
10,137
126
Intel is probably listening to its customers. Do OEMs like Dell and HP want to have to increase their costs by $10-$20 per PC to enable ECC? I doubt it. Enabling ECC costs more in terms of memory, MB validation/testing, and CPU testing/validation. It may not seem like much to us (enthusiasts) and I would love ECC, but it is probably a cost decision.

Would it be only a "cost decision", to release CPUs on the market that only pass functional testing 99% of the time? That at times, they might make errors during use, or be unstable?

Because that's what the lack of ECC is.

It's really ironic that AMD got a bad rep back in the day for being unstable, etc., when it's Intel that is more unstable than AMD these days, due to the lack of ECC when using large amounts of RAM, and running processing 24/7.
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
I'm not even saying I care for or want ECC. I'm just saying it starts getting devious when things that are already there and are actually working are intentionally disabled to prop up perceived value of the "superior" product, rather than being disabled and binned as lower parts due to random yield errors.
 

tynopik

Diamond Member
Aug 10, 2004
5,245
500
126
Please define a real world hardware failure mode that would cause a single bit error.

a weak cell that isn't always read reliably

your turn: define a real world hardware failure where ECC isn't helpful (stick is totally dead and machine won't boot doesn't count)
 

NP Complete

Member
Jul 16, 2010
57
0
0
Now I actually don't have a problem with Intel disabling optional features (like hyperthreading) to help differentiate their product lines.

It's not unlike Nvidia using one chip and then fusing off different numbers of shaders to create different products.

But, ECC is not optional. It's like a car company offering to upgrade your gas tank so it won't explode in a fireball in an accident. That is a mandatory requirement no matter the level of the car.

All cars are required to meet certain passenger safety standards and likewise all chips should be required to meet certain data safety standards.

How is ECC "not optional" when hyperthreading, or even more cores on a CUDA chip, is optional? Your statement hyperbolizes the issue - you make it seem like without ECC computers will be crashing every day, and documents will "rot" into an unusable state quickly.

Companies actually do perform quality analysis on the products they ship - if fatal errors were as high as you seem to indicate, then most companies wouldn't be stupid enough to ship the product.

Yes, ECC decreases the chance of crashes and bit rot, but the incidence is already low enough for most consumers that it isn't worth extra money. So what if I get a single pixel corrupted in my picture album? So what if my OS becomes unstable every 2-3 years? I'm willing to re-install my OS, or re-touch up my pictures occasionally if it means I can upgrade my computer more often.

Would I store nuclear launch codes on a consumer-grade desktop computer? Hell no, I'd probably use several servers, all with ECC to make sure that I didn't lose incredibly important and sensitive data.

If you want ECC, and want the extra assurance ECC provides, have at it! I also hope you're following good backup procedures, since even ECC has uncorrectable error rates of a few hundred per year, which can result in data loss.
 

NP Complete

Member
Jul 16, 2010
57
0
0
a weak cell that isn't always read reliably

your turn: define a real world hardware failure where ECC isn't helpful (stick is totally dead and machine won't boot doesn't count)


Just for fun - what happens if your single bit error is in your ECC code? ECC codes are stored in memory like the rest of the data and aren't immune to corruption.

If you have a single bit error in your ECC code, it'll likely show up as an uncorrectable error to the OS.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Just for fun - what happens if your single bit error is in your ECC code? ECC codes are stored in memory like the rest of the data and aren't immune to corruption.

If you have a single bit error in your ECC code, it'll likely show up as an uncorrectable error to the OS.
I'm not sure about Ye Old Days, but with Chipkill and equivalents, that should not be the case.
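
On the check-bit question: in a textbook Hamming code, at least, the check bits are part of the codeword and are protected like every other bit, so a single flipped check bit yields a syndrome that points straight at it. A minimal Hamming(7,4) sketch follows; the function names are made up for the example, and this is not how any particular memory controller is implemented.

```python
DATA_SLOTS = (3, 5, 6, 7)    # slots 1, 2 and 4 hold the check bits

def syndrome(bits):
    """XOR of the indices of set bits in slots 1..7; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if bits[pos]:
            s ^= pos
    return s

def encode(nibble):
    """Return an 8-element list; index 0 is unused, slots 1..7 hold the codeword."""
    bits = [0] * 8
    for k, pos in enumerate(DATA_SLOTS):
        bits[pos] = (nibble >> k) & 1
    s = syndrome(bits)
    for p in (1, 2, 4):
        bits[p] = 1 if s & p else 0
    return bits

def read_back(bits):
    """Repair a single flipped bit, wherever it is; return (nibble, bad_slot)."""
    bits = list(bits)
    s = syndrome(bits)
    if s:
        bits[s] ^= 1         # the syndrome is the index of the flipped slot
    nibble = sum(bits[pos] << k for k, pos in enumerate(DATA_SLOTS))
    return nibble, s
```

Flipping slot 1, 2 or 4 (a check bit) in `encode(0b1011)` is handled exactly like flipping a data slot: `read_back` reports the slot and returns the original 0b1011.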
 