Intel processors crashing Unreal engine games (and others)

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
29,462
24,160
146
I suspect they have come to grips with the fact there is no preventing the avalanche of RMAs. And what is going to happen to flaky CPUs when the voltage is lowered by the MC patch? These questions and more, next on 60 minutes.
 

RnR_au

Platinum Member
Jun 6, 2021
2,006
4,901
106
Maybe intel laid off too many QA staff and can't even catch something basic like this for 2 years
Don't laugh...

Let me set the scene: It's late in 2013. Intel is frantic about losing the mobile CPU wars to ARM. Meetings with all the validation groups. Head honcho in charge of Validation says something to the effect of: "We need to move faster. Validation at Intel is taking much longer than it does for our competition. We need to do whatever we can to reduce those times... we can't live forever in the shadow of the early 90's FDIV bug, we need to move on. Our competition is moving much faster than we are" - I'm paraphrasing. Many of the engineers in the room could remember the FDIV bug and the ensuing problems caused for Intel 20 years prior. Many of us were aghast that someone highly placed would suggest we needed to cut corners in validation - that wasn't explicitly said, of course, but that was the implicit message. That meeting there in late 2013 signaled a sea change at Intel to many of us who were there. And it didn't seem like it was going to be a good kind of sea change. Some of us chose to get out while the getting was good. As someone who worked in an Intel Validation group for SOCs until mid-2014 or so I can tell you, yes, you will see more CPU bugs from Intel than you have in the past from the post-FDIV-bug era until recently.

As a former Intel employee this aligns closely with my experience. I didn't work in validation (actually joined as part of Altera) but velocity is an absolute buzzword and the senior management's approach to complex challenges is sheer panic. Slips in schedules are not tolerated at all - so problems in validation are an existential threat, your project can easily just be canned. Also, because of the size of the company, the ways in which quality and completeness are 'achieved' are hugely bureaucratic and rarely reflect true engineering fundamentals. Intel's biggest challenge is simply that it's not 'winning big' at the moment and, rather than showing strong leadership and focus, the company just jumps from fad to fad, failing at each (VR is dead, long live automotive).

From https://news.ycombinator.com/item?id=16058920
 

Jan Olšan

Senior member
Jan 12, 2017
396
680
136
Intel found the root cause.



It's not that clear; there are some doubts about whether this is really the ultimate solution. Intel's statement is constructed in such a way that it may refer to only part of the issues present. So it's possible this is a repeat of the erratum where incorrect eTVB behaviour got fixed. People assumed that was the root cause and the solution too, but Intel had to clarify that it was only a contributing condition, not the root cause.

 

RnR_au

Platinum Member
Jun 6, 2021
2,006
4,901
106
The current voltage adjustment microcode is permanent according to the bios notes...



Will be interesting to see how this impacts performance and whether or not the Intel OC crowd wants anything to do with this bios and the one coming in August. Could be very good news for Zen 5 reviews though!
 

Det0x

Golden Member
Sep 11, 2014
1,216
3,789
136
The current voltage adjustment microcode is permanent according to the bios notes...

Will be interesting to see how this impacts performance and whether or not the Intel OC crowd wants anything to do with this bios and the one coming in August. Could be very good news for Zen 5 reviews though!
Would you like to update your comment below @zir_blazer ?

Just saw this on the hwbot discord


The latest rumors also kinda point to the ring bus on Raptor Lake being what's failing, which would also explain why the error logs point to a random core (P/E) and/or the memory controller, etc.

What the hell does he mean by "Microcode can't be undone"? Microcode is uploaded to the Processor by Firmware around POST time (nowadays it can also be done during early OS boot) and goes into a sort of volatile patch RAM. It is not some kind of internal NVRAM.
UNLESS Microcode is blowing some kind of eFuses...

What I do recall is some kind of bug involving the Intel XTU tool where changing the Uncore/Cache setting was sorta permanent and required a full Firmware reflash, since settings are stored in NVRAM but the Firmware itself doesn't clear whatever setting XTU was changing. There were multiple reports about that one, and I have memories of having personally encountered it.
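
Since microcode is loaded fresh into the Processor at each boot, one way to see which revision is actually live is to read it back from the OS. Below is a minimal sketch, assuming Linux on x86, where /proc/cpuinfo reports a "microcode" field per logical CPU. The function names are made up for illustration, and this only shows the loaded revision, not whether a given revision contains any particular fix.

```python
import re
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Return the set of distinct microcode revisions found in
    /proc/cpuinfo-style text (one 'microcode : 0x...' line per logical CPU)."""
    pattern = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {match.group(1).lower() for match in pattern.finditer(cpuinfo_text)}


def current_microcode_revisions(path="/proc/cpuinfo"):
    """Read the live revisions from the running system (Linux/x86 assumed)."""
    return microcode_revisions(Path(path).read_text())


if __name__ == "__main__":
    # Normally a single revision is reported for all logical CPUs.
    print(sorted(current_microcode_revisions()))
```

Running this before and after a BIOS update or downgrade would show whether the loaded revision actually changed.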
 

Joe NYC

Platinum Member
Jun 26, 2021
2,463
3,347
106
Paul Alcorn's take here. He says the microcode is the root cause:

"Intel's advisory says an erroneous CPU microcode is the root cause of the incessant instability issues...."

He also says the damaged chips stay damaged:

"The bug causes irreversible degradation of the impacted processors. We're told that the microcode patch will not repair processors already experiencing crashes, but it is expected to prevent issues on processors that aren't currently impacted by the issue. For now, it is unclear if CPUs exposed to excessive voltage have suffered from invisible degradation or damage that hasn't resulted in crashes yet but could lead to errors or crashes in the future.

Intel advises all customers having issues to seek help from its customer support. Because the microcode update will not repair impacted processors, the company will continue to replace them. Intel has pledged to grant RMAs to all impacted customers."

 

GTracing

Member
Aug 6, 2021
78
192
76
In that Tom's article, Intel says there will be no drop in performance with the August microcode.

We're told that the microcode patch currently doesn't exhibit any adverse performance impact (i.e., the chip running slower), but testing is ongoing. We can expect Intel to share more information about performance in the future.
 

zir_blazer

Golden Member
Jun 6, 2013
1,191
483
136
Would you like to update your comment below @zir_blazer ?
The BIOS and ME (Management Engine) regions on the SPI Flash EEPROM can technically be updated/downgraded separately. Your worst case scenario is that the ME region is locked and doesn't allow you to downgrade via Software (you need Intel tools to unlock it to update; flashrom can't write to it if it is protected), forcing you to use an external programmer like a CH341A to downgrade.
MSI FlashBIOS on the boards I know about (MSI PRO Z690-A, Z790-P) flashes the ME too, so you can use USB recovery to downgrade. Heck, the Z690-A didn't even have a protected ME region and you could do it via Software.
Again, it is not some eFuse level thing that permanently modifies the Processor like it happened with consoles.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,491
5,054
96
If it's a bug with the core voltage going out of spec, wouldn't one of the testing guys say "Hey, this pin is spiking to 1.6V, that doesn't seem right"?
That's not the issue.
The stock (which is what, 1.4v 1t?) voltage isn't particularly safe to begin with.
Guess Intel will have to learn the sacred art of Clock Stretching.
 

GTracing

Member
Aug 6, 2021
78
192
76
That's not the issue.
The stock (which is what, 1.4v 1t?) voltage isn't particularly safe to begin with.
Guess Intel will have to learn the sacred art of Clock Stretching.
If that's the case, then why does Raptor Lake still degrade and crash when underclocked and undervolted below Alder Lake? And why is Intel claiming that there'll be no loss in performance?
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
29,462
24,160
146

Sounds like the current available microcode drops performance by about 4%. Should be interesting to see what the August microcode will do performance wise.
"This situation has not been awesome?" That's the understatement of the year. Then there's the asterisk flashing by for the 300 MHz downclock, saying it isn't meant as a permanent fix, and telling you to install Nvidia drivers 10 times in a row to see if you are affected.

 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
29,462
24,160
146
Heh yeah I lol'ed at that part too. Intel needs to release a detection tool.
I don't even need to shoot the messenger, comments are roasting this shill. I was trying to remember why I did not sub to his channel last year, then I saw this and was like "oh yeah, that's right." 🤣
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,054
15,195
136
If that's the case, then why does Raptor Lake still degrade and crash when underclocked and undervolted below Alder Lake? And why is Intel claiming that there'll be no loss in performance?
I will believe anybody but Intel after this. When EVERYBODY agrees it's fixed, and a few websites have confirmed no performance loss after benchmarking "fixed" CPUs, then I will agree it's fixed, and agree with the benchmarks.

For now it's still the same thing, "we think we know", but we will see.

Edit: and adroc is right, the chip is already cooked.
 

PJVol

Senior member
May 25, 2020
696
618
136
I don't understand people like that. He's gonna continue using his degraded CPU by limiting it instead of asking for refund or replacement
There are various types of FET degradation. IIRC, things like HCI (hot-carrier injection) or xBTI (bias temperature instability) cause "ageing", or an increase in Vth, and this may be mitigated to some extent by the FW or microcode. EM (electromigration), by contrast, ultimately leads to circuit failure.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
29,462
24,160
146
Maybe they got one with the oxidation issue? Or maybe there’s more going on. We’ll have to wait for the update to drop and lots of testing to be done.
My gut brain says when the problems reach mainstream awareness, they tell you there is a microcode patch coming that helps, in an attempt to calm things down. They lean on their vast propaganda machine to disseminate it, then go back to silent running. Always fighting a delaying action and buying more time.

I am stoked to read that our crowd is at the midway point of a Scooby Doo episode, and knows the mystery has not been solved yet. The red flags are too many to ignore. The hypothesis floating around for weeks that this is to mute the damage during the Zen 5 launch rings true with me tactically speaking. What they did not factor in, is GN, HUB, Kit Guru, and maybe even LTT, are going to make certain their audience knows there is a HUGE asterisk attached to Raptor Lake. That is not going to help Arrow Lake because we have already seen the negative halo effect and it is spreading like a nuclear mushroom cloud.
 