Intel processors crashing Unreal engine games (and others)

Ranulf · Jul 17, 2024

Framechasers has been a big Intel fan for the past few years and sells OC'd Intel chips and settings advice for people who want to max out fps in CoD etc.

Just the title screen is a bit... explosive. Intel's fault but also the tech-tubers' fault. i9s should be bought by enthusiasts only, no warranty from Intel. Take your chances I guess and sub to the guy to get fixes to run the chips the best.

DAPUNISHER · Jul 17, 2024

coercitiv said:
To understand the scale, Wendell looked closely at 210 data center machines with 13th/14th gen CPUs in a population of ~2800 machines for which he had access to broader statistics and info from the data centers.

That's an excellent listen, well worth the time. Covers it A-Z, even the difficulties due to survivorship bias and odd behavior like blackscreens mucking things up. Wendell talks about how his testing left him with no clear answers. He thinks it is a crap sandwich* of multiple factors too (* my words)

One of the hosts of the podcast had a bad time with raptor. He was EAC banned because the errors triggered it. Everyone agreed the lack of clear messaging or promise to make things right is the biggest problem. Like Wendell said, if there's a problem, there's a problem. They happen. But what they do about it is more important.

DAPUNISHER · Jul 17, 2024

EDIT: Back to our regularly scheduled program.

Adding a little humor and my own anecdote - There are over 12 million members of PCMR. This is the title of one of the latest threads on the topic.

Awareness is growing rapidly, no putting the cat back in the bag.

cusideabelincoln · Jul 17, 2024

Ranulf said:
Framechasers has been a big Intel fan for the past few years and sells OC'd Intel chips and settings advice for people who want to max out fps in CoD etc.

Just the title screen is a bit... explosive. Intel's fault but also the tech-tubers' fault. i9s should be bought by enthusiasts only, no warranty from Intel. Take your chances I guess and sub to the guy to get fixes to run the chips the best.

This guy has the worst takes and contradicts himself so much that I've had him ignored from my feeds for years now. He uses blame and scare tactics, questioning the validity of data from other people while refusing to provide any data himself... unless you pay for it. Don't pay for it; he sneaks in his "fix" towards the end of the video and claims none of his CPUs have died (not that I would believe his numbers). Ironically, while he's blaming techtubers and calling them wrong, the tuned CPUs he sells have boosting turned off, which lowers max voltage, and high voltage is exactly what techtubers currently suspect is killing these CPUs anyway. Dude calls other people wrong and then says convoluted BS that actually agrees with them.

It's also laughable he's blaming techtubers for pushing Cinebench scores and citing Gamers Nexus as a prime example... when they don't even publish Cinebench scores in their reviews.

NTMBK · Jul 17, 2024

Timur Born said:
If this were a "UE engine" only problem then I would point to the engine instead of the CPU. But I am always open to put in the work for science, so despite the optics being revolting I will now install The First Descendant.

Uh huh, for science

branch_suggestion · Jul 17, 2024

Ranulf said:
Framechasers has been a big Intel fan for the past few years and sells OC'd Intel chips and settings advice for people who want to max out fps in CoD etc.

Just the title screen is a bit... explosive. Intel's fault but also the tech-tubers' fault. i9s should be bought by enthusiasts only, no warranty from Intel. Take your chances I guess and sub to the guy to get fixes to run the chips the best.

Ignore grifters.

Thunder 57 · Jul 17, 2024

branch_suggestion said:
Ignore grifters.

With stupid shocked-face YouTube pictures.

Timur Born · Jul 18, 2024

NTMBK said:
Uh huh, for science

Yeah, way to demonstrate why I wrote "despite the optics being revolting". Thanks for underlining the point.

DrMrLordX · Jul 18, 2024

gdansk said:
I suspect the Supermicro W680 boards, personally, for the "data center" game servers being unreliable.
Because I have one such board, running with two different 12600Ks at DDR5-3600 ECC. I've had issues to the point that I am submitting an RMA for the board if it shuts off randomly again. I'm pretty sure the CPU isn't the problem because, like I said, I tried two different 12600Ks. Neither showed issues in a cheap ASRock Z690 DDR4 board, but they both shut down randomly in the Supermicro W680 board.

Just my experience.

And I don't think it is related to the client crashes - where it does seem to be CPU issues.

The same game server datacenter that collaborated with Wendell reported far fewer problems with the Alder Lake-S CPUs they used in what were probably W680 motherboards. It doesn't seem like the boards were at fault.

coercitiv · Jul 18, 2024

Timur Born said:
Ah, so this is a forum where people don't treat each other with respect and take them seriously?

You were treated with respect and taken seriously, and then you proceeded to mock other people's answers: apparently I'm the guy with "we don't know". If you lack the ability to exercise restraint and show respect to others, don't expect to be respected in return.

I'm not the guy with "we don't know", I'm the guy who took the time to answer you and enumerated a host of theories this thread has gone through, none of which have been proven true, all of which may still be true or play a role. I'm the guy who took the time to explain how degradation can happen at different speeds depending on contributing factors, with a real world anecdotal example. I gave YOU my time out of respect.

Next time you dismiss people because you don't like their answers don't complain about being dismissed in return. I get it, maybe you had a long/hard day at work, maybe something one of us wrote rubbed you the wrong way (it happens to all of us). Guess what, maybe I had a hard day too, maybe I did not have the time to write the perfect answer to your satisfaction. I did my best though.

I come here to relax and read about my hobby, not to satisfy the whims of others. Over the years I have learned to tone down my sarcasm because I acknowledge other people may also want to just read and relax. You should probably take some time to reflect as well, think about why you tend to mistreat people when you get even the impression you're being mistreated. (you were not mistreated, not initially anyway)

Timur Born · Jul 18, 2024

I got the impression that my original question wasn't even addressed, and my first post on the discussion was rather shut down because people jumped right on the bandwagon of "my CPU" (which was given as *one* example). But I see now that the answer is we don't know (and don't care to discuss it anymore) and the thread is about discussing UE related problems *not* based on the known 13/14th gen instabilities. I understand and thanks for the answer.

On a side note, if people wonder what makes UE so special compared to all the other engines I have checked so far: it's the only engine that includes a workload that triggers the AVX offset. In Fortnite - which has not crashed for me yet, unlike The First Descendant - it is about 8% of CPU time when the game is 100% CPU bottlenecked.

This does not mean UE is the only game engine using AVX, but it's the only one I have found so far that uses AVX instructions heavily enough to trigger the offset (or rather use the dedicated AVX ratios).

moinmoin · Jul 18, 2024

Timur Born said:
On a side note, if people wonder what makes UE so special compared to all the other engines I have checked so far: it's the only engine that includes a workload that triggers the AVX offset. In Fortnite - which has not crashed for me yet, unlike The First Descendant - it is about 8% of CPU time when the game is 100% CPU bottlenecked.

Both are UE games though?

Also again the "special" thing about UE is that it includes a tool that actually checks data integrity. So that's why (especially recent) UE games are the best software to check for the crashes this thread is about: They offer the workload necessary to trigger it to begin with as well as catch faulty data that most other software doesn't even check for. So for most other software you may also be able to trigger the issues, but the resulting faulty data will just cause random behaviour, if any at all. That's the issue with tracking unchecked faulty data.
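To illustrate the point about checked versus unchecked data, here is a minimal Python sketch. It is not Unreal Engine's or Oodle's actual code: zlib stands in for the real compressor, CRC32 stands in for whatever integrity check the engine uses, and the `corrupt` flag is a made-up way to simulate a single bit flipped by unstable hardware.

```python
import zlib


def pack(payload: bytes) -> tuple[bytes, int]:
    """Compress a payload and record a CRC32 of the original bytes."""
    return zlib.compress(payload), zlib.crc32(payload)


def unpack_checked(blob: bytes, expected_crc: int, corrupt: bool = False) -> bytes:
    """Decompress and verify the result against the stored checksum.

    `corrupt` is a hypothetical test hook that flips one bit in the
    decompressed output to mimic faulty data produced by the hardware.
    """
    data = bytearray(zlib.decompress(blob))
    if corrupt and data:
        data[0] ^= 0x01  # simulated single-bit error
    if zlib.crc32(bytes(data)) != expected_crc:
        # Checked software fails loudly here instead of using bad data.
        raise ValueError("checksum mismatch after decompression")
    return bytes(data)


if __name__ == "__main__":
    blob, crc = pack(b"shader bytecode")
    print(unpack_checked(blob, crc))  # clean run returns the payload
    try:
        unpack_checked(blob, crc, corrupt=True)
    except ValueError as err:
        print("caught:", err)
```

Without the final comparison the flipped bit would pass through silently, which is the "random behaviour, if any at all" case for software that does not check its data.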

The main issue is that we still don't know what's creating said faulty data, nor what possible safeguards could prevent it from happening. To me Buildzoid's guess that the ring is being affected by high voltage would explain the most, but that would unfortunately also be the potential cause that is hardest to safeguard against. Until Intel finally publishes the statement they promised months ago (if at all?), we won't know the true cause and solution.

Timur Born · Jul 18, 2024

And both have not shown any kind of checksum based errors on my CPU sample yet.

RAD/Oodle: When starting an Unreal Engine-based game, the most common failure is of this type:

DecompressShader(): Could not decompress shader (GetShaderCompressionFormat=Oodle)
...
However, this problem does not only affect Oodle, and machines that suffer from this instability will also exhibit failures in standard benchmark and stress test programs. Any programs which heavily use the processor on many threads may cause crashes or unpredictable behavior. There have been crashes seen in RealBench, CineBench, Prime95, Handbrake, Visual Studio, and more. This problem can also show up as a GPU error message, such as spurious "out of video memory" errors, even though it is caused by the CPU.

The First Descendant triggers very particular blue screens for me that take a minute to unfold and drag down the system one part at a time. So the crashes I have seen in TFD so far are of a different kind from the usual Oodle-based reports. It's noteworthy that after every crash the shader compilation process started anew (as did the whole game) and it didn't fail any of the four compilation runs.

Still waiting for the servers to go back online to check after a Clear CMOS + Intel spec settings. Meanwhile that pinup girl image keeps showing with no way to exit it other than ALT-F4. My trust in this game is not high enough to base any stability based conclusions on it, but it seems to be what people are looking at right now.

Timur Born · Jul 18, 2024

moinmoin said:
To me Buildzoid's guess that the ring is being affected by high voltage would explain the most, but that would unfortunately also be the potential cause that is hardest to safeguard against. Until Intel finally publishes the statement they promised months ago (if at all?), we won't know the true cause and solution.

Which could then again be connected to sample bins = VID range, with the Ring running on Vcore.

Skatterbencher:
Then, it seems there’s a rule where the Core VID must be a minimum of 30 mV higher than the Ring VID.
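Read literally, that rule could be sketched as below. This is only an interpretation of the quoted sentence: the function name is hypothetical, and the assumption that the core VID gets raised (rather than the ring VID lowered) to satisfy the rule is mine, not something the quote states.

```python
MIN_CORE_OVER_RING_MV = 30  # margin taken from the quoted rule


def effective_core_vid_mv(core_vid_mv: int, ring_vid_mv: int) -> int:
    """Return the core VID after applying the 'core >= ring + 30 mV' rule.

    Assumption: if the requested core VID is too low, it is lifted to
    ring VID + 30 mV. Real hardware may resolve the conflict differently.
    """
    return max(core_vid_mv, ring_vid_mv + MIN_CORE_OVER_RING_MV)


if __name__ == "__main__":
    # A high ring VID request drags the core VID up with it.
    print(effective_core_vid_mv(core_vid_mv=1300, ring_vid_mv=1350))
    # A core VID that already has enough margin is left alone.
    print(effective_core_vid_mv(core_vid_mv=1400, ring_vid_mv=1350))
```

Under that assumption, a sample whose ring asks for more voltage would also end up with a higher core voltage, which is the kind of link between ring behaviour and Vcore being discussed here.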

It may also play a role that there are two differently aggressive Ring ratio scaling algorithms in place in my Gigabyte BIOS. The less aggressive one is used when Ring Min/Max are set to "Auto", which corresponds to 8/50. When I manually set Min/Max to the very same 8/50, the Ring ramps up much more aggressively, stays at higher clocks at higher load and generally uses more voltage accordingly. No idea how other mainboards handle this, but maybe they default to the more aggressive ramp-up!?

IEC · Jul 18, 2024

Timur Born said:
And both have not shown any kind of checksum based errors on my CPU sample yet.

The First Descendant triggers very particular blue screens for me that take a minute to unfold and drag down the system one part at a time. So the crashes I have seen in TFD so far are of a different kind from the usual Oodle-based reports. It's noteworthy that after every crash the shader compilation process started anew (as did the whole game) and it didn't fail any of the four compilation runs.

Still waiting for the servers to go back online to check after a Clear CMOS + Intel spec settings. Meanwhile that pinup girl image keeps showing with no way to exit it other than ALT-F4. My trust in this game is not high enough to base any stability based conclusions on it, but it seems to be what people are looking at right now.

I get zero crashes on TFD on my 7800X3D and my 12600K.

biostud · Jul 18, 2024

IEC said:
I get zero crashes on TFD on my 7800X3D and my 12600K.

Didn't Wendell mention something about texture decompression?

igor_kavinski · Jul 18, 2024

Timur Born said:
No idea how other mainboards handle this, but maybe they default to the more aggressive ramp-up!?

Could you set your BIOS at Unlimited or Extreme, then go into Intel XTU and turn on AI OC and post a screenshot of what settings it applies? Then try running TFD with XTU Auto OC active to see if the bluescreens still happen?

TheELF · Jul 18, 2024

coercitiv said:
Wendell's video talks about 14900K game servers that fail stability tests @ 5.3 GHz with JEDEC spec DDR5. For servers with dual DIMMs per channel they went as low as DDR5-4200 and even disabled E-cores (with a single DIMM per channel they used DDR5-5200). The combination of lower clocks and slower memory helped the most, but did not completely solve the issue. Updating the BIOS also seemed to help, but not enough to make the machines pass all tests.

Yes, he even shows y-cruncher running at 5.3 GHz all core with all cores actually running at 5.3 GHz... which is overclocking, because you have to let turbo decide how high it should boost any given core at any given time.
And the max DDR5 speed Intel allows for 4 modules is 4000 MHz.
On 2DPC mobos the max speed allowed at all is 4400.

So the "fix" is to overclock both the CPU and the RAM, makes you think about what the original settings were.

Hitman928 · Jul 18, 2024

TheELF said:
Yes, he even shows y-cruncher running at 5.3 GHz all core with all cores actually running at 5.3 GHz... which is overclocking, because you have to let turbo decide how high it should boost any given core at any given time.
And the max DDR5 speed Intel allows for 4 modules is 4000 MHz.
On 2DPC mobos the max speed allowed at all is 4400.

So the "fix" is to overclock both the CPU and the RAM, makes you think about what the original settings were.

Where did you see that he's running static all core at 5.3 GHz? That's not what he shows in his video. . .

Edit: You are correct on max memory speed, but Intel also lists max 1DPC memory as 5600 MHz. How many people actually use 5600 MHz memory? How many people do you think would accept that they can only use 5600 MHz memory? Technically that's the fastest officially supported speed, but that's not the capability that platform was sold on (just like AMD buyers wouldn't be happy with being stuck with 5200 MHz memory).

blckgrffn · Jul 18, 2024

Hitman928 said:
Where did you see that he's running static all core at 5.3 GHz? That's not what he shows in his video. . .

Edit: You are correct on max memory speed, but Intel also lists max 1DPC memory as 5600 MHz. How many people actually use 5600 MHz memory? How many people do you think would accept that they can only use 5600 MHz memory? Technically that's the fastest officially supported speed, but that's not the capability that platform was sold on (just like AMD buyers wouldn't be happy with being stuck with 5200 MHz memory).

@DAPUNISHER's meme of Raptor wondering whether overclocking voids warranties when platforms are sold overclocked out of the box seems rather applicable here.

I mean is it really Intel's fault everyone runs those CPUs way out of spec? How could they have possibly known? Aren't they and their reputation being ruined the real victim here? /s

And another /s just for good measure.

Timur Born · Jul 18, 2024

igor_kavinski said:
Could you set your BIOS at Unlimited or Extreme, then go into Intel XTU and turn on AI OC and post a screenshot of what settings it applies? Then try running TFD with XTU Auto OC active to see if the bluescreens still happen?

Which setting do you mean by "unlimited"? Usually there is only ICCmax with that specific wording, but I wouldn't want to set that higher than 400 A. Bad enough that my CPU hit over 415 A a few times already. Power or TVB temp is no problem, it will just be current or temperature throttled at some point anyway.

I need to leave the house, but I will do a quick run of TFD with Clear CMOS + Intel "normal" spec settings now. 125/253 W, 307 A, TVB temp 70°C, AVX offset stock (-5 I think), AC LL = DC LL = LLC (0.900 mOhm on my GB board), JEDEC memory (4800 MT/s).

KompuKare · Jul 18, 2024

Well, the poster in 1020 above did mention in the Bartlett thread that Intel can just do nothing and a class action would fail (since everything is overclocked these days).

Basically: copy Nvidia during Nvidia's Bumpgate and let the consumer suffer.

TheELF · Jul 18, 2024

Hitman928 said:
Where did you see that he's running static all core at 5.3 GHz? That's not what he shows in his video. . .

Oops, indeed they do clock down.
Still, setting an all-core multiplier is circumventing normal turbo.
Anyway, the point is that at least some things are overclocked in Wendell's video.

Hitman928 · Jul 18, 2024

TheELF said:
Oops, indeed they do clock down.
Still, setting an all-core multiplier is circumventing normal turbo.
Anyway, the point is that at least some things are overclocked in Wendell's video.

He was underclocking though. He didn't set an all-core frequency; he lowered the max boost frequency to 5.3 GHz to try to stabilize the processors.

Edit: the boost algorithm is still on, he just limits it from exceeding 5.3 GHz because going past that caused instability on a large percentage of the tested systems.
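The difference between the two approaches can be sketched like this. The function names and the example request values are hypothetical; only the 5.3 GHz figure comes from the discussion above.

```python
def capped_boost_ghz(turbo_request_ghz: float, cap_ghz: float = 5.3) -> float:
    """Boost algorithm stays active; its result is only limited from above."""
    return min(turbo_request_ghz, cap_ghz)


def fixed_all_core_ghz(turbo_request_ghz: float, fixed_ghz: float = 5.3) -> float:
    """Static all-core setting: the turbo request is ignored entirely."""
    return fixed_ghz


if __name__ == "__main__":
    # Hypothetical frequencies the boost algorithm might ask for.
    for request in (6.0, 5.3, 4.8):
        print(request, capped_boost_ghz(request), fixed_all_core_ghz(request))
```

With a cap, a core that the boost algorithm would run at 4.8 GHz still runs at 4.8 GHz and only requests above 5.3 GHz are cut down, which is why a cap is an underclock. A fixed multiplier would hold 5.3 GHz regardless of what turbo would have chosen.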

Timur Born · Jul 18, 2024

@igor_kavinski
After the Clear CMOS I played through the tutorial once with Intel 307 A specs and once with my usual UV + OC (resetting all game data), both worked without a crash.

The reason I came up with the Clear CMOS at once is because I saw this kind of BSOD before and it seems to be kind of PCIe connected. I don't fully trust the mainboard anymore and the whole thing may have killed my PCIe Creative X-Fi. When the defective X-Fi was inserted I could reproduce problems, but I think that XTU's Watchdog hard cold boot crash "reset" - that sometimes even triggers BIOS errors/resets - may also trigger the (likely PCIe) instability. So the Clear CMOS may have fixed it for now with no connection to my CPU or memory UV/OC at all.
