Conroe bug can cause corruption


Mucker

Platinum Member
Apr 28, 2001
Originally posted by: phile
Most of this goes over my head, so I prefer to consider this problem in terms of how it will affect my use of the product. All I know is that I've been pounding an E6300 and an E6600 for 3 weeks without so much as a hiccup. I've been playing games, using office apps, Photoshop, video encoding, benchmarking, etc., and haven't noticed any peculiarities. How could a major problem go unnoticed through several weeks of intense use? I'll remain concerned, but unworried.

-phil

:thumbsup:

Errata have always been with us. I am sure Intel will make good if problems arise...

 

ZOXXO

Golden Member
Feb 1, 2003
Originally posted by: Viditor
Originally posted by: Henny
Well guess what. Athlon 64 currently has 249 errata items and many of them can also cause data corruption:

AMD Errata

Errata IS, HAS, and ALWAYS will be part of any silicon design.

Comment: 249 is not A64, that's all AMD lines.

Question: which ones can cause corrupted data?

Search the PDF for "corruption" and you'll get 13 instances.
 

xtknight

Elite Member
Oct 15, 2004
What makes that any worse than one like this?:

AI30 (affected steppings marked X X; status: Plan Fix): (E)CX May Get Incorrectly Updated When Performing Fast String REP MOVS or Fast String REP STOS With Large Data Structures

This sounds like it could cause data corruption too.
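For context on why an incorrect (E)CX matters: architecturally, REP STOSB stores AL to [EDI], advances EDI and decrements ECX until ECX reaches zero, and because REP string instructions can be interrupted and resumed, ECX is the record of how much work is left. The Python sketch below models only that documented contract (direction flag clear, byte-sized stores); `rep_stosb` is a hypothetical helper, and it does not model the erratum itself or the fast-string microcode path.

```python
def rep_stosb(memory: bytearray, edi: int, al: int, ecx: int) -> tuple[int, int]:
    """Model REP STOSB with the direction flag clear (DF = 0).

    Each iteration stores AL at memory[EDI], increments EDI and decrements
    ECX; the loop ends when ECX reaches 0. Returns the final (EDI, ECX).
    """
    while ecx != 0:
        memory[edi] = al & 0xFF
        edi += 1
        ecx -= 1  # ECX always holds the number of stores still to do
    return edi, ecx


buf = bytearray(8)
edi, ecx = rep_stosb(buf, edi=2, al=0xAA, ecx=4)
print(buf.hex(), edi, ecx)  # 0000aaaaaaaa0000 6 0
```

If ECX came back wrong after an interruption, the resumed instruction would store too many or too few bytes, which is why an erratum like this one can plausibly corrupt data.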
 

Conky

Lifer
May 9, 2001
Originally posted by: Henny

Well guess what. Athlon 64 currently has 249 errata items and many of them can also cause data corruption:

AMD Errata

Errata IS, HAS, and ALWAYS will be part of any silicon design.
Darn it, and I thought we'd found the Achilles' heel! :laugh:


P.S. Anyone else find it humorous that not one or two but three AMD ads were on that theinquirer link? theinquirer.net has always been a dubious source of info on computers... sorta the Al Jazeera of computer reporting.

 

Mucker

Platinum Member
Apr 28, 2001
Originally posted by: Beachboy
Originally posted by: Henny

Well guess what. Athlon 64 currently has 249 errata items and many of them can also cause data corruption:

AMD Errata

Errata IS, HAS, and ALWAYS will be part of any silicon design.
Darn it, and I thought we'd found the Achilles' heel! :laugh:


P.S. Anyone else find it humorous that not one or two but three AMD ads were on that theinquirer link? theinquirer.net has always been a dubious source of info on computers... sorta the Al Jazeera of computer reporting.

Did you click on the link to the Intel PDF in the article? TI is just conveying the message... nothing really dubious about it.
 

Pabster

Lifer
Apr 15, 2001
Yes, Yes, Serious Error! Time to dump all our Conroes!

And there's never been errata in an AMD processor.

This seems like fanboy drivel.
 

dmens

Platinum Member
Mar 18, 2005
Sheesh, talk about an overreaction based on absolute ignorance of the actual bug signature. If only caches were as simple as the example given.
 

Keysplayr

Elite Member
Jan 16, 2003
So I guess the sky is falling yet again? None of the benchmarks and reviews of Core 2 Duo we've seen over the past half year mentioned any sort of data corruption. Problems with mobo BIOSes that support Core 2 Duo? Yes. Corruption? No.
 

dexvx

Diamond Member
Feb 2, 2000
39 errata for an NGMA CPU is amazingly good.

But yes, this is sensationalist. The bug is quite hard, if not impossible, for consumers to replicate. People who have no idea wtf they're talking about shouldn't be spilling their BS. I find it so ironic that the people posting the BS are posting from A64s with 100+ errata.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Originally posted by: Pabster
Originally posted by: dexvx
I find it so ironic that the people posting the BS are posting from A64s with 100+ errata.

Ditto

Not that I don't agree with you, and it's a nit, but as was said, the 249 was for ALL AMD processors, not Athlon 64s. The 39 is just Conroe, right?

But yes, sounds like an over reaction to me.
 

koitsu

Member
Feb 13, 2004
I just finished reading through the following two specifications from Intel, as well as other information:

http://www.intel.com/design/processor/datashts/313278.htm
http://www.intel.com/design/processor/designex/313685.htm

There's no mention of the two cores being able to share L1 caches. This simply does not happen in present-day architectures. The only "sharing" of L1 cache I know of is the fact that each CPU's L1 cache is split into instruction and data portions. That's something completely unrelated, though.

This discussion reminds me of a problem I pondered back when working on a project that required two parallel PIC chips: cache coherency. I was introduced to the term "dirty caching", where one CPU has different data in its L1 or L2 cache compared to another CPU. On older CPUs, the solution was simple: one processor would spinlock (a busy-waiting lock, similar in purpose to a mutex lock in threaded applications), while the other would write its data **back to system memory**, release its lock, and let the other processor pick the modified data back up. The performance hit? Major. The lock itself wastes cycles, and the actual read-to-cache-then-write-back-to-memory operation wastes cycles as well. About locks:

http://en.wikipedia.org/wiki/Spinlock
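A spinlock busy-waits on an atomic test-and-set until it wins, instead of sleeping like a typical mutex. A minimal Python sketch, assuming `threading.Lock.acquire(blocking=False)` as a stand-in for the hardware test-and-set; `SpinLock` and `worker` are hypothetical names for illustration, and in real Python code you would simply use `threading.Lock`:

```python
import threading


class SpinLock:
    """Busy-waiting lock, for illustration only.

    threading.Lock.acquire(blocking=False) is atomic, so it stands in for
    the hardware test-and-set instruction a real spinlock would use.
    """

    def __init__(self) -> None:
        self._flag = threading.Lock()

    def acquire(self) -> None:
        # Spin: retry the atomic test-and-set until it succeeds. These
        # iterations are the wasted cycles of waiting on a lock.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self) -> None:
        self._flag.release()


counter = 0
lock = SpinLock()


def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        lock.acquire()
        try:
            counter += 1  # critical section protected by the spinlock
        finally:
            lock.release()


threads = [threading.Thread(target=worker, args=(1_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```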

This is why MESI was created. MESI is a cache-coherence protocol that lets multiple CPUs inform one another of the state (Modified, Exclusive, Shared or Invalid) of each cache line they hold. Rather than explain it, read about it:

http://en.wikipedia.org/wiki/MESI_protocol
http://www.tecchannel.de/ueberblick/archiv/402092/index4.html
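To make the protocol concrete, here is a toy Python model of one cache line shared by two private caches on a snooping bus. It assumes whole-line writes, a single line, and a simple bus; `Bus`, `Cache` and `State` are hypothetical names, and this is a textbook MESI sketch, not a description of how Conroe's shared L2 actually tracks lines:

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Bus:
    """Snooping bus for a single cache line backed by main memory."""

    def __init__(self, memory_value: int) -> None:
        self.memory = memory_value
        self.caches: list[Cache] = []

    def read_miss(self, requester: Cache) -> tuple[int, bool]:
        """Other caches snoop the read; a Modified copy is written back first."""
        shared = False
        for cache in self.caches:
            if cache is requester or cache.state is State.INVALID:
                continue
            if cache.state is State.MODIFIED:
                self.memory = cache.value  # write back the dirty line
            cache.state = State.SHARED
            shared = True
        return self.memory, shared

    def invalidate_others(self, requester: Cache) -> None:
        """Gain ownership: every other valid copy becomes Invalid."""
        for cache in self.caches:
            if cache is requester or cache.state is State.INVALID:
                continue
            if cache.state is State.MODIFIED:
                self.memory = cache.value  # write back before dropping it
            cache.state = State.INVALID


class Cache:
    """A private cache holding (at most) the one line the bus manages."""

    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.state = State.INVALID
        self.value = None
        bus.caches.append(self)

    def read(self) -> int:
        if self.state is State.INVALID:  # read miss goes to the bus
            self.value, shared = self.bus.read_miss(self)
            self.state = State.SHARED if shared else State.EXCLUSIVE
        return self.value

    def write(self, value: int) -> None:
        if self.state in (State.SHARED, State.INVALID):
            self.bus.invalidate_others(self)
        # From E the upgrade to M is silent: no other cache holds the line.
        self.state = State.MODIFIED
        self.value = value


bus = Bus(memory_value=0)
core0, core1 = Cache(bus), Cache(bus)
core0.read()          # core0: I -> E (no other copy exists)
core1.read()          # core0: E -> S, core1: I -> S
core0.write(42)       # core0: S -> M, core1: S -> I (memory still holds 0)
seen = core1.read()   # core0 writes back and drops to S; core1: I -> S
print(seen, core0.state.value, core1.state.value, bus.memory)  # 42 S S 42
```

The point of the protocol shows up in the last read: core1 never sees the stale 0, because core0's write invalidated its copy and the snooped read forced a write-back.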

A form of MESI was implemented by Intel with the PPro. There's a brief mention of it here on Intel's site:

http://www.intel.com/support/processors/pentiumpro/sb/CS-011163.htm

I had to do some digging to find out whether Intel kept using MESI after the PPro, and they do:

http://www.aceshardware.com/Spades/read.php?article_id=30000187

I know for a fact AMD CPUs support MESI, at least since the Athlon series (maybe earlier, not sure!)

MESI is probably what keeps this bug from happening in practice. But then I wonder why Intel even bothers to describe the problem as "data corruption" when, with MESI, this shouldn't happen...

My impression of the Errata document is that there's something else going on here, or that Intel knows of a particular situation where MESI is ignored and a cache fetch results in dirty data.

Someone really needs to contact Intel and find out what exactly they mean by their little Errata snippet.
 

koitsu

Member
Feb 13, 2004
@fbrdphreak

Please read my previous post. Also, I'm not sure if you're familiar with actual processor cache misses -- by your description of your experience, you're describing memory page hits/misses, which are not the same thing as CPU cache hits/misses.

If you want to really analyse what effect an application has on the CPU itself, you need to spend some time looking at PMCs (performance monitoring counters), which are implemented *in the CPUs themselves*. Intel CPUs have had these since the original Pentium, before the PPro. PMCs are documented (for Intel CPUs) here; see Chapter 18.11:

ftp://download.intel.com/design/Pentium4/manuals/25366920.pdf

I'm going to keep reading more about the Core 2 Duo implementations via the IA-32 docs from Intel. Lots to read (lots to ignore too), but you might find it interesting depending on how low-level a programmer you are. Don't let the URL fool you:

http://www.intel.com/design/pentium4/manuals/index_new.htm

Have fun. I'll post more later, once I've had a chance to read more...
 

Some1ne

Senior member
Apr 21, 2005
Intel states in the Errata that they plan on releasing a fix for this (in English: a newer stepping (version) of the CPU should resolve the problem). For those who already own CPUs, you *might* be able to get a replacement CPU with a later stepping.

So is there a form to sign up for these replacements, or are they all made up? My chip seems to be fine, though if given a chance to swap it out for one that might overclock even better, free of charge, I certainly would. Especially if I could keep the old one too...why would Intel want it back anyways, "defective" as it is? Bet ebay wouldn't complain about taking it, though.
 

fbrdphreak

Lifer
Apr 17, 2004
Originally posted by: koitsu
@fbrdphreak

Please read my previous post. Also, I'm not sure if you're familiar with actual processor cache misses -- by your description of your experience, you're describing memory page hits/misses, which are not the same thing as CPU cache hits/misses.

If you want to really analyse what effect an application has on the CPU itself, you need to spend some time looking at PMCs (performance monitoring counters), which are implemented *in the CPUs themselves*. Intel CPUs have had these since the original Pentium, before the PPro. PMCs are documented (for Intel CPUs) here; see Chapter 18.11:

ftp://download.intel.com/design/Pentium4/manuals/25366920.pdf

I'm going to keep reading more about the Core 2 Duo implementations via the IA-32 docs from Intel. Lots to read (lots to ignore too), but you might find it interesting depending on how low-level a programmer you are. Don't let the URL fool you:

http://www.intel.com/design/pentium4/manuals/index_new.htm

Have fun. I'll post more later, once I've had a chance to read more...
I'm familiar with cache misses. I got to write a branch predictor in Java (by my choice, I know more Java than C, LOL). I took this class along with getting my Computer Engineering degree from NCSU. I don't claim to be an expert, but I did test several dozen cache configurations (varying L1/L2 sizes) using several different cache activity simulations to measure the difference cache sizes had on hits/misses & subsequent performance. I also studied pretty deeply in general CPU pipelines & cache execution.
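The kind of experiment described above (varying cache size and counting hits/misses) can be sketched in a few lines of Python. This assumes a single-level, direct-mapped cache with 64-byte lines and a synthetic trace that sweeps an 8 KiB array twice; `simulate_direct_mapped` is a hypothetical helper, not the simulator from that class:

```python
def simulate_direct_mapped(trace: list[int], num_lines: int,
                           line_size: int = 64) -> tuple[int, int]:
    """Return (hits, misses) for a direct-mapped cache over an address trace."""
    tags = [None] * num_lines
    hits = misses = 0
    for addr in trace:
        block = addr // line_size      # which memory block the byte lives in
        index = block % num_lines      # the one cache line it may occupy
        tag = block // num_lines       # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fill, evicting whatever was there


    return hits, misses


# Synthetic trace: touch each 64-byte line of an 8 KiB array, twice.
trace = list(range(0, 8 * 1024, 64)) * 2
for num_lines in (64, 128, 256):
    size_kib = num_lines * 64 // 1024
    print(f"{size_kib:>2} KiB:", simulate_direct_mapped(trace, num_lines))
```

This prints (0, 256) for 4 KiB and (128, 128) for 8 KiB and 16 KiB: the 4 KiB cache evicts every line before it is reused, while any cache that holds the whole 8 KiB working set hits on every access of the second sweep.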

To clarify, I never said the L1's are shared. The errata implied (from how I understood it) that C1 could somehow reference C2-L1, and I have no idea how/why. In retrospect, it might mean that when C1-L1 misses, it goes to the shared L2. At that point, if it is retrieving data that is being modified by C2, there is the potential for corruption. Maybe that's what they mean, I'm not entirely sure.

I'd love to read documents like that, I find the truly nitty-gritty, technical details of microprocessors very interesting. Unfortunately with my little project, two part time jobs, final class to get my EE degree, job hunting, and relationships/family, I have literally negative amounts of free time. :frown:

I do love a good discussion tho
 

koitsu

Member
Feb 13, 2004
Originally posted by: Some1ne
So is there a form to sign up for these replacements, or are they all made up? My chip seems to be fine, though if given a chance to swap it out for one that might overclock even better, free of charge, I certainly would. Especially if I could keep the old one too...why would Intel want it back anyways, "defective" as it is? Bet ebay wouldn't complain about taking it, though.

Intel should be able to give you a CPU with a different stepping, assuming you provide some sort of proof that you're being affected by one of the flaws documented in the Errata. (Meaning, if you had a B1, you could probably get a B3, assuming that addressed the issue you were encountering.)

But don't hold me to that statement! I'm going purely off of my experiences with owning Pentium CPUs in the days of the FDIV bug (which Intel honoured replacements for) and the 0xF00F bug (which I believe Intel would replace if you asked, but by that time most operating systems had implemented patches).
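For reference, the classic FDIV check divides 4195835 by 3145727 and multiplies back: with a correct quotient the residual is essentially zero, while the quotient a flawed Pentium returned (the widely published value 1.333739068902037589) leaves a residual of about 256. The Python sketch below plugs in that published quotient rather than exercising a flawed FPU; `residual` is a hypothetical helper:

```python
X, Y = 4195835.0, 3145727.0

# Quotient reported for X / Y on a flawed Pentium (widely published value).
FLAWED_QUOTIENT = 1.333739068902037589


def residual(quotient: float) -> float:
    """X - quotient * Y, which is ~0 when quotient is a correct X / Y."""
    return X - quotient * Y


print(abs(residual(X / Y)) < 1e-6)       # True: correct division
print(round(residual(FLAWED_QUOTIENT)))  # 256: the FDIV bug's signature
```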

That said, I'll remind people that since the Pentium 2 (possibly earlier), the CPUs are basically just "emulators" running microcode that translates x86 instructions into whatever the manufacturer decides is correct. You can actually update the microcode inside of the CPU. Yes, you read that right -- you can actually change the functionality/behaviour of the opcodes in the CPU by updating the code that's in it. Here's valid proof of this, running under Linux:

http://www.urbanmyth.org/microcode/

Now, that said, there's something I need to state about this: microcode updates do not get permanently stored on the CPU. There is no "flash" chip or EEPROM or anything like that; the updates disappear once the CPU is reset in some way (power-cycling the system is guaranteed to do it). You have to load them every time you want a newer revision. If you want something that's permanent, you have to get an updated processor from the manufacturer.
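On Linux you can see which microcode revision is currently loaded: on x86, /proc/cpuinfo reports a `microcode` field per logical CPU. Because updates are volatile, that value reflects whatever the BIOS and/or the OS loaded during the current boot. A small Python sketch that parses it; `microcode_revisions` is a hypothetical helper, and the field may be absent on non-x86 systems or very old kernels:

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> dict[int, int]:
    """Map each logical CPU number to the microcode revision the kernel reports."""
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            # Usually printed as hex ("0x..."); fall back to decimal otherwise.
            base = 16 if value.lower().startswith("0x") else 10
            revisions[cpu] = int(value, base)
    return revisions


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(microcode_revisions(cpuinfo.read_text()))
```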

That said: a friend of mine took the time to read the thread here and point out one shortcoming of my statements: Intel claims that this bug can be worked around with a BIOS upgrade. I initially stated I didn't see how this was possible. I was downright wrong (and I've no shame admitting that). My friend reminded me that CPU microcode updates can actually be done via the BIOS. Your machine will boot the BIOS, and during the POST phase, will actually spend some time applying microcode changes to work around such issues.

So, if this is indeed a serious problem people are encountering, BIOS manufacturers can provide a BIOS update that can provide a fix. That should be a relief for anyone who's worrying about it (including me). Heck, maybe this is what some motherboard manufacturers have already done since the Core 2 Duo release. Pure speculation on my part -- who knows.

So, as for my own concern -- am I still going to refuse to get a Core 2 Duo until this is fixed? NOPE! I'm going to purchase one once the 6600s are in stock at the sites I purchase hardware from. Numerous people in this thread have reported 100% success with the Core 2 Duos under heavy load when using threaded applications, and I would expect this problem to surface almost immediately under such conditions. It doesn't, so it must indeed be a rare sort of thing.

Some other flaws in the Errata do also appear major (someone mentioned buggy REP STOSB/STOSW support when fast strings are enabled, for example)... but like some of the processor fanboys here on the forum mentioned, AMD has its share of issues too. I don't want to sound like I'm being anti-Intel or even pro-AMD. I'm fair about it. (Heck, I'm a lot more concerned with NB and SB chipset flaws than I am with CPU bugs).

All in all, this shouldn't stop anyone from buying a Core 2 Duo or Core 2 Extreme. But it is something to keep a close eye on... and I really do hope someone (Anand?) can get Intel to give a statement about it, or at least get the Errata updated with some clarification.

Remember, it's a good thing that Intel is at least aware of these issues -- companies which aren't aware of problems usually deny they exist... then some smart-ass mathematician comes along and says "What's with this broken rounding going on?!", and before you know it, you're having to re-release a product you thought was "rock solid". ;-)
 

dmens

Platinum Member
Mar 18, 2005
augh what a crapflood.

First off, Intel would not have let Merom out the door if the common-case invalidate and flush didn't work. The key phrase in the errata document is "certain conditions". In validation terms, that can be translated as "****** corner case". Silicon escapes happen, but with a year of silicon debug, there shouldn't be any crazy showstoppers.

Also, ucode is in ROM, so it is permanent. It is programmed into the silicon right before tapeout. There are ways to insert flows (so-called millicode), but core ucode is permanent. Workarounds for bugs usually involve an MSR toggle in software.
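Such an MSR toggle can be done on Linux through the `msr` driver, which exposes each CPU's MSRs as /dev/cpu/N/msr (read or write 8 bytes at an offset equal to the MSR index; requires root and the msr module). The Python sketch below uses IA32_MISC_ENABLE (0x1A0), whose bit 0 is the fast-strings enable bit, purely as an example of the mechanism; it is not Intel's documented workaround for any of the Conroe errata, and `with_bit`, `read_msr` and `write_msr` are hypothetical helpers. Writing MSRs on a live system can make it unstable.

```python
import os
import struct

IA32_MISC_ENABLE = 0x1A0  # MSR index, per Intel's manuals
FAST_STRINGS_BIT = 0      # bit 0 of IA32_MISC_ENABLE: fast-strings enable


def with_bit(value: int, bit: int, enabled: bool) -> int:
    """Return a 64-bit MSR value with one bit set or cleared."""
    mask = 1 << bit
    new = value | mask if enabled else value & ~mask
    return new & 0xFFFF_FFFF_FFFF_FFFF  # keep it a 64-bit MSR value


def read_msr(cpu: int, msr: int) -> int:
    """Read an MSR through Linux's msr driver (root only)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def write_msr(cpu: int, msr: int, value: int) -> None:
    """Write an MSR through Linux's msr driver (root only)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)


# Example (root only): clear the fast-strings bit on CPU 0.
# value = read_msr(0, IA32_MISC_ENABLE)
# write_msr(0, IA32_MISC_ENABLE, with_bit(value, FAST_STRINGS_BIT, False))
```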

Freaking out over errata seems to be an Inq trademark... they did the same with Yonah. What a bunch of turds.
 

Conky

Lifer
May 9, 2001
Originally posted by: Some1ne
Intel states in the Errata that they plan on releasing a fix for this (in English: a newer stepping (version) of the CPU should resolve the problem). For those who already own CPUs, you *might* be able to get a replacement CPU with a later stepping.

So is there a form to sign up for these replacements, or are they all made up? My chip seems to be fine, though if given a chance to swap it out for one that might overclock even better, free of charge, I certainly would. Especially if I could keep the old one too...why would Intel want it back anyways, "defective" as it is? Bet ebay wouldn't complain about taking it, though.

Stop being a tard. As has been pointed out repeatedly in this thread, errata are a normal part of computer hardware.

I just ordered a new E6400 and I'm not sweating even a little bit.
 

koitsu

Member
Feb 13, 2004
Originally posted by: galvelan
Here's your update... And for those who must wait for the new stepping, I guess you've got it. I agree with the others, though: it's probably still not a big deal, or we would have heard about some kind of problem by now.

http://www.theinquirer.net/default.aspx?article=33942

Good news indeed. I think I'll wait a few weeks for this stepping to be released before making my CPU purchase, "just in case".

A good thread in general, guys. Big thanks to galvelan for bringing this issue to light in the first place (at least from a forum perspective). Always good to shed light on stuff like this...
 