Is ECC a big deal?

Caiwyn

Member
May 19, 2000
82
0
0
Well, I've come to find that the i815 chipsets don't support ECC (error-correcting) RAM. So now I have to decide what to do. Should I just get non-ECC RAM and write ECC off as unnecessary?

As I've mentioned before, the machine I'm building is for an audio recording workstation. That's a heavy content creation workload with memory-intensive applications.

So my question is this... how important do you think ECC would be? Is it not that big a deal? Is it useless? Is it a good idea to have? Tell me!
 

Citadel535

Senior member
Jan 16, 2001
816
0
0
If you read Crucial's big memory guide on their web site, it is really used in servers, where fault tolerance is critical. It is actually slower than non-ECC memory.

With the stability of most systems today (non-ECC memory included) you shouldn't have anything to worry about.
 

TunaBoo

Diamond Member
May 6, 2001
3,280
0
0


<< dont worry about it for desktop or gaming machines. >>



That's not what he needs it for.


But for his needs, he does not need ECC. ECC doesn't have fault tolerance, just fixes random flips of bits (which happen anywhere from once a day to once a year, depending on who you ask).
 

Caiwyn

Member
May 19, 2000
82
0
0
Well, I checked out Crucial's site... I understand that ECC checks to make sure that data has been written to the memory module correctly, and corrects some errors but only detects others. My question now is, if there -is- an error and you don't have error checking, will you experience a crash in either your application or the entire system due to the memory error? Or does it just pass along the errors in that data to wherever it's going? I can handle a possible crash, but I'm not keen on my data being less-than-perfect. This is an audio recording workstation, after all.
 

TunaBoo

Diamond Member
May 6, 2001
3,280
0
0
It could possibly do either. I would say, though, that 99% of the time it would be a crash, and 1% data corruption.

You are never 100% safe. But with normal SDRAM you are pretty darn safe.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Do you need ECC? This is an interesting question that can best be answered by answering the questions "how important is the data on this machine?" and "how important is the uptime on this machine?" If the answer to either of these is "very important" then you need ECC. If the answer to either is "pretty important", then play off the cost of an ECC-capable motherboard and ECC memory against the cost of data corruption and/or downtime.

It most certainly is not useless. Studies have shown that soft-error induced bit flips at sea level occur approximately once in 3 months of power-on time per 256MB. Some studies show this number to be higher (Micron), some studies show it to be lower (IBM), but this is an approximate guess. When an SER event occurs, it could corrupt data, or cause instability in the system, or, in the more unlikely event, it could cause a system failure requiring an OS rebuild. Whether or not you need it depends on how bad these events would be for your use of the system.



<< It is actually slower than non-ECC memory. >>

This is true, but only by a very small amount. ECC takes a performance hit only in read-after-modified-write situations, where a new value is written back to memory and is read again within a cycle or two, in which case there will be a small latency hit. In most real-world cases (including games) this performance hit is less than 1%.



<< ECC isn't any fullproof error checking either. >>

No, but to bypass it you need two bit flips in the ECC memory line (64 bits on SDRAM DIMMs). This is fairly unlikely, especially if the chipset does a memory scrub on an ECC detect. ECC is not foolproof, but it is pretty good for everything except mission-critical situations, in which case you may want to consider some additional form of redundancy.



<< ECC doesnt have fault tolerence, just fixes random flips of bits. >>

ECC is fault tolerant. It fixes single-bit memory errors in a memory line on the fly.



<< It could possible do either. I would say tho 99% would be a crash, and 1% data corruption. >>

If you look at the typical instruction/data mix on memory reads and writes, it is much closer to 50%/50% than 1%/99%, but it clearly depends heavily on the application. In this case, with memory-intensive applications, an error is more likely to cause data corruption.

The common response that I get after my posts about ECC is: "pm, if ECC is so important then why don't all PCs use it?" and the answer is cost. Consumer PCs are a cut-throat low-margin business and every penny counts. ECC is additional cost for a gain that is unrecognizable to most consumers. The average consumer won't leave their computer on 24/7, and for them loss of data or downtime on the system is an annoyance but simply is "just Windows doing its thing."

For commercial applications, I recommend ECC since usually downtime is money, data corruption can be a disaster and the cost of ECC is very minor.

Patrick Mahoney
IPF Microprocessor Design
Intel Corp.
 

TravisBickle

Platinum Member
Dec 3, 2000
2,037
0
0
pm, I don't know why you bother. Most of these people would turn off their cache ECC and fire alarms if they thought it would help them get 2 fps more in QIII to brag about.
Did someone mention something about the stability of most systems today? Yeah, right..
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Travis, I think that ECC is poorly understood, and I have a personal interest in it because it looks like my next project will be running simulations to check for SER rates on various memory structures for future Intel CPUs.
 

Caiwyn

Member
May 19, 2000
82
0
0
Wow, PM... thanks for the info. My next question is:

How minor can the data corruption be? It's one thing to lose a file to data corruption. Believe it or not, that's an acceptable problem. It's another to have static in the file during the recording process (again, this is an audio recording workstation). Will ECC ensure a cleaner file, or will it simply ensure that the file isn't completely trashed?
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
It's hard to say what will happen. It really depends on the file format used - whether or not there are checksums on the file and how it's decoded. Most likely if there are no checksums and the decoding method is simply a stream, then it will cause sound corruption not file corruption. But if there is some form of error checking (like checksums) built into the format then it would cause an invalid file. Since most sound formats are built around small size, I would say that the former is more likely than the latter.
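As a rough illustration of the checksum point, here is a minimal Python sketch. It assumes a hypothetical format that stores a CRC-32 alongside its audio payload (plain PCM WAV does not do this); the buffer contents and the helper name `flip_bit_in_buffer` are made up for the example.

```python
import zlib

def flip_bit_in_buffer(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted (a simulated soft error)."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)

# Stand-in for a chunk of raw audio data; the values are arbitrary.
audio = bytes(range(256)) * 4

# A checksummed format would compute this when the data is written.
stored_crc = zlib.crc32(audio)

# Simulate one flipped bit somewhere in the payload.
damaged = flip_bit_in_buffer(audio, 100, 3)

print("clean data matches stored CRC:  ", zlib.crc32(audio) == stored_crc)
print("damaged data matches stored CRC:", zlib.crc32(damaged) == stored_crc)
```

With a checksum, the reader can tell the data is damaged and reject it. Without one, the flipped bit is simply played back as part of the audio.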

I guess my main point is that an ECC-enabled motherboard is not that much more expensive, and ECC memory is only a very small increase (if we are talking high-quality memory such as Crucial), so we are talking maybe 5% more system cost (probably less) to protect against potential problems. It's an insurance policy. Whether or not the insurance is worth it depends on how important the thing you are insuring (uptime and data integrity) is to you.
 

Caiwyn

Member
May 19, 2000
82
0
0
Oh, don't get me wrong... I would love to have ECC if I could get it. But I'm running a P3 1 GHz chip, and the only chipsets that support PC133 SDRAM with ECC are VIA boards... I'm not fond of VIA chipsets - my experience with Intel's chipsets as far as stability goes is far better. The 820 chipset supports ECC RDRAM, and I'm not averse to buying RDRAM if I have to, but apparently the 820 chipset doesn't support ATA/100, and that's more important for what I'm doing (believe it or not, it makes a big difference when you're recording a five-minute track while playing back 20 others mixed together, which is standard recording procedure).

Now, I do happen to be recording in standard PCM WAV format. If you have any idea what the possibilities are for sound corruption using such formats, I'd really appreciate it.
 

kylef

Golden Member
Jan 25, 2000
1,430
0
0
If you happened to get a data error in the middle of a PCM audio stream, you're just going to mess up one of the samples (16-bit samples at 44.1 kHz), so we're not talking major here. If the bit happened to be the most significant bit of the sample, then you could end up sending the level +/- 32,768 away from what it should have been. If the error is at the least significant bit side, it could only cause a +/- 1 level alteration. And note that this will affect ONE of the roughly 44,000 samples that occur in each second.
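To put numbers on that, here is a small Python sketch that flips individual bits of one signed 16-bit sample and prints how far the value moves. The sample value 1000 and the helper name `flip_bit` are arbitrary choices for illustration, and it assumes the usual little-endian two's-complement layout of 16-bit PCM.

```python
import struct

def flip_bit(sample: int, bit: int) -> int:
    """Flip one bit (0 = least significant, 15 = sign bit) of a signed 16-bit sample."""
    # Reinterpret the signed value as its raw 16-bit pattern.
    (raw,) = struct.unpack("<H", struct.pack("<h", sample))
    raw ^= 1 << bit
    # Reinterpret the modified pattern as a signed value again.
    (flipped,) = struct.unpack("<h", struct.pack("<H", raw))
    return flipped

sample = 1000  # arbitrary example level
for bit in (0, 7, 15):
    flipped = flip_bit(sample, bit)
    print(f"bit {bit:2d}: {sample} -> {flipped} (change of {flipped - sample:+d})")
```

A flip in bit k always moves the sample by exactly 2**k in one direction or the other, so the audible damage ranges from nothing you could hear (low bits) to a single loud click (top bits).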

I'm not sure what header information is stored with PCM audio streams, but if that happened to get corrupted, then the problem would be significantly worse; but the header is small relative to the size of the audio stream.

I have ECC memory in my system because I do scientific computing calculations and cache simulations in an academic environment, and I don't want ANYTHING corrupting our data. But I have to admit that the likelihood of a random bit error occurring on these chips is pretty low.

IBM has done years of studies on these "soft errors" which you can read about here. Basically, radioactivity and cosmic rays are the two culprits that cause these random errors to occur. Cosmic rays are bombarding earth from space and can penetrate multi-story buildings. Higher altitudes are more susceptible to cosmic ray induced errors than altitudes near sea level. In all cases, with appropriate shielding (and we're talking LOTS of shielding), the error rate could be reduced to near zero. But regardless, IBM found that as chip densities increase, error rates have gone up. I can't remember where I read this, but one estimate suggested that at higher altitudes a typical 128 MB SDRAM chip might see 1 or 2 one-bit errors a month; it's just blind luck whether the BIT actually ends up affecting anything important. At lower altitudes and in buildings with extensive shielding, the rate drops to around 1 error a year.

ECC memory can fix these single bit errors on the fly without a hiccup. In addition, it can catch most multi-bit errors and let the operating system know that an error has occurred (assuming you're running a professional OS such as Linux or Windows NT/2k).
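For anyone curious how "fix one bit, detect two" can work, here is a toy Python sketch of an extended Hamming code on 4 data bits (8 bits stored in total). This is only an illustration of the general idea: real ECC DIMMs protect 64 data bits with 8 check bits, and the exact code a given chipset uses is not something this sketch claims to reproduce. The function names and bit layout are assumptions made for the example.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # positions 1, 2, 4 hold Hamming parity; 0 holds overall parity

def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of 0/1)."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Each parity bit covers every position whose index has bit p set.
        for j in range(1, 8):
            if j != p and j & p:
                code[p] ^= code[j]
    code[0] = sum(code[1:]) % 2   # overall parity over positions 1..7
    return code

def decode(code: list) -> tuple:
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for j in range(1, 8):
        if code[j]:
            syndrome ^= j
    overall = sum(code) % 2
    if overall:
        # Odd parity: assume a single flip; the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even parity but nonzero syndrome: two flips, cannot be repaired.
        status = "uncorrectable"
    else:
        status = "ok"
    nibble = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return nibble, status

word = encode(0b1011)
single = list(word)
single[5] ^= 1            # one simulated soft error
double = list(single)
double[2] ^= 1            # a second error in the same word

print(decode(word))
print(decode(single))
print(decode(double))
```

Note the limit of the scheme: three or more flips in the same word can look like a single correctable error and be "fixed" to the wrong value, which is why ECC is very good but not perfect.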

It is true that ECC memory is slightly slower; most people have suggested that memory performance takes a 1-2% hit.

As for motherboard support, that's a tough one. Because few people understand ECC and many more are simply not interested when they hear about the small performance hit, some motherboard manufacturers no longer implement ECC support in their BIOSes, even if the chipset has an ECC memory controller. For instance, Via's Kt133 chipset DOES have an ECC memory controller onboard, but the Iwill KK series boards do not support ECC SDRAM. MSI's K7T-Turbo uses the same chipset and DOES support ECC memory.

As for stability, that's a different concern. I've never had a problem with my Via Apollo Pro 133a-based board (made by Gigabyte) under Win2k using 512 MB of CAS2 ECC PC133 Crucial SDRAM. YMMV, of course.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Nice link, kylef. I'd read it before, but it's nice to brush up on it.

FWIW, I live in Colorado at ~ 1 mile above sea level, so my probability is 4x that of the crowd at sea level. At 2 miles above, the probability jumps to 10x (due to reduced air pressure resulting in fewer 'air' atoms resulting in less shielding).
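For a feel of what those multipliers mean, here is a back-of-the-envelope Python sketch. It takes the rough figure quoted earlier in this thread (about one flip per 3 months of power-on time per 256MB at sea level, treated here as 90 days), scales it linearly with memory size and the altitude factor, and assumes flips arrive independently (a Poisson model). All of that is an assumption for illustration, not measured data, and the function names are made up.

```python
import math

def expected_flips(memory_mb: float, power_on_days: float,
                   days_per_flip_per_256mb: float = 90.0,
                   altitude_factor: float = 1.0) -> float:
    """Expected number of soft-error bit flips under a simple linear scaling."""
    return (memory_mb / 256.0) * (power_on_days / days_per_flip_per_256mb) * altitude_factor

def prob_at_least_one(expected: float) -> float:
    """Probability of one or more flips, assuming a Poisson process."""
    return 1.0 - math.exp(-expected)

# 512MB machine left on for 30 days: sea level, ~1 mile up (4x), ~2 miles up (10x).
for label, factor in (("sea level", 1.0), ("1 mile", 4.0), ("2 miles", 10.0)):
    n = expected_flips(512, 30, altitude_factor=factor)
    print(f"{label:9s}: expected flips = {n:.2f}, P(at least one) = {prob_at_least_one(n):.0%}")
```

Whether any given flip matters still depends on what was stored in that bit at the time.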

I have noticed that IBM's papers seem more pessimistic on error rates (meaning they make things sound bad), while Micron's are the other way around. I'm personally convinced that it's because IBM is trying to sell you their "Chipkill" solution, which is a step beyond ECC... but maybe I'm just being cynical and reading too much between the lines.

As far as performance, I agree with the 1-2% number - although I've seen programs that run in the cache that end up with essentially no degradation, so you could say 0-2%. 1-2% is so insignificant though. Upgrading from NT4.0 to Windows 2000 will do worse to your system's performance.

And as far as stability, I concur on the Via 133a chipset. I have the Asus P3V4X (w/ 256MB of Crucial ECC) and it runs stable. I reboot rarely (once or twice a month) and I use it constantly. No complaints with the 133A here either.
 

Sukhoi

Elite Member
Dec 5, 1999
15,313
88
91
pm, did you move? I thought you lived in California. Very informative post too, I'm glad you came back to the forums.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
I moved back in 1997...? I used to live in SiliValley, but gave up and headed for more normal locales - even if the ambient radiation level is 4x higher. Thanks!
 