Question DEGRADING Raptor Lake CPUs

Page 9 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Kocicak

Golden Member
Jan 17, 2019
I noticed some reports about degrading i9 13900K and KF processors.

I experienced this problem myself when I ran it at 6 GHz under a light load (3 threads of Cinebench), at an acceptable temperature and a non-extreme voltage. After only a few minutes it crashed, and afterwards it could not run even at stock settings without bumping the voltage a bit.

I was thinking about the cause of this, and I believe the problem is that people do not appreciate how high these frequencies are, and that the real comfortable frequency limit of these CPUs is probably somewhere around 5500 or 5600 MHz. These CPUs are made on the same process (possibly improved somehow) as the Alder Lake CPUs. Look at the frequencies the 12900KS runs at. The frequency gain from the process tweak may not be as large as some people presume.

The 13900K CPUs are probably heavily binned to find dies with some cores that can reliably run at 5800 MHz. Some 13900Ks probably have little or no OC headroom left, and pushing them will cause them to degrade or break.

The conclusion for me is that the best thing you can do for your 13900K or 13900KF is to disable the 5800 MHz peak, which will let you apply a negative voltage offset, and then set the all-core maximum frequency to some comfortable level; I guess the maximum could be 5600 MHz. With the lowered voltage, this frequency should be gentler on the processor than running it at the original 5500 MHz at a higher voltage. You can also run it at lower frequencies, allowing an even larger voltage drop, but then the CPU slowly loses its point (unless you want a high-efficiency CPU intended for heavy multithreaded loads).

Running it with a power limit matched to your cooling solution, to keep the CPU at a sensible temperature, will certainly help too.
 

Hulk

Diamond Member
Oct 9, 1999
If this issue is not due to degradation, would someone here please elaborate on what other defects would cause some processors to perform normally but others to fail?

Having had both a 13900K that degraded under high auto voltages and a 14900K that has not degraded under manual (sane) voltages, I have never experienced a failure that was not due to too high frequency or too low voltage or a combination of the two.

I stand by my belief that the root of the issue here is Intel publishing specs that are simply too high for the CPUs to reliably attain and/or maintain. They rated them too high.

How many people are keeping their CPUs under 70 °C, as specified by Intel, when opportunistically boosting frequency? Or are people forcing high frequencies out of spec?

Don't get me wrong, Intel is partly at fault here, in that their specs for Turbo 1/2/3, Thermal Velocity Boost, etc. are confounding.

Mobile parts are experiencing the same issues, but to a lesser extent, because they operate with a more stringent guard band.

In conclusion, I will be surprised if the cause of this is architectural and not process related.
 
Reactions: lightmanek
Jul 27, 2020
What is proven is that there are abnormally high failure rates for Raptor Lake CPUs.
I think popular benchmarks like Geekbench/3DMARK/CB r23/24 etc. should all incorporate data integrity checks going forward. CPU failed data consistency check? CPU failed benchmark and gets ZERO score. That should motivate Intel to do their job properly in future.
 

Nothingness

Diamond Member
Jul 3, 2013
I think popular benchmarks like Geekbench/3DMARK/CB r23/24 etc. should all incorporate data integrity checks going forward. CPU failed data consistency check? CPU failed benchmark and gets ZERO score. That should motivate Intel to do their job properly in future.
SPEC does that: the output of the benchmark is checked against the reference.
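The idea of validating a benchmark's output against a reference and zeroing the score on mismatch can be sketched roughly as below. This is a minimal, hypothetical illustration in Python, not how SPEC, Geekbench, 3DMARK or Cinebench actually implement validation; `run_workload`, `score_run`, the toy workload and the zero-on-mismatch rule are all assumptions made for the example.

```python
import hashlib
import time


def run_workload(n: int) -> bytes:
    """Hypothetical deterministic workload: integer mixing whose result depends
    only on n, so every correctly functioning CPU must produce identical bytes."""
    acc = 0
    for i in range(n):
        acc = (acc * 6364136223846793005 + i) % (1 << 64)  # 64-bit LCG-style step
    return acc.to_bytes(8, "little")


def digest(data: bytes) -> str:
    """SHA-256 of the workload output, used as the reference fingerprint."""
    return hashlib.sha256(data).hexdigest()


def score_run(n: int, reference_digest: str) -> float:
    """Time the workload, then return 0.0 if its output does not match the
    reference recorded on a known-good machine (hypothetical scoring rule)."""
    start = time.perf_counter()
    output = run_workload(n)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    if digest(output) != reference_digest:
        return 0.0  # data integrity failure: the run gets no score
    return n / elapsed  # hypothetical throughput-style score


if __name__ == "__main__":
    n = 100_000
    # A real harness would ship the reference digest with the benchmark;
    # here it is computed once just for the demo.
    ref = digest(run_workload(n))
    print(score_run(n, ref) > 0)   # output matches the reference
    print(score_run(n, "0" * 64))  # mismatch forces a zero score
```

A hashed fingerprint keeps the stored reference small; a real suite would more likely compare outputs with tolerances where floating point is involved.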
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
There is nothing proved yet.
Do explain then, won't you, why Intel has acknowledged the issues and announced an investigation is ongoing? Or why they pushed a microcode update they say is partly responsible. Or why they just came out of lurker mode to assure everyone that mobile CPUs are not susceptible to the instability issues the other CPUs have.

The only thing not proven yet is WHY they are becoming unstable, not IF they are.
 
Jul 27, 2020
Having had both a 13900K that degraded under high auto voltages and a 14900K that has not degraded under manual (sane) voltages, I have never experienced a failure that was not due to too high frequency or too low voltage or a combination of the two.
Ahem. Your sig shows you cheaped out on the mobo and went with DDR4. No offense but as @coercitiv wondered out aloud, it could be a combination of factors that is causing the issues and DDR5 might be one of them, maybe coz it pushes the RPL IMC to work harder than it has to for DDR4? So unless you want to get a DDR5 mobo and tempt fate, I can't say confidently that you really have an issue-free 14th gen Core i9 CPU.
 

Hulk

Diamond Member
Oct 9, 1999
Ahem. Your sig shows you cheaped out on the mobo and went with DDR4. No offense but as @coercitiv wondered out aloud, it could be a combination of factors that is causing the issues and DDR5 might be one of them, maybe coz it pushes the RPL IMC to work harder than it has to for DDR4? So unless you want to get a DDR5 mobo and tempt fate, I can't say confidently that you really have an issue-free 14th gen Core i9 CPU.
True. But my system as configured has been thus far issue free.

Any rig can be pushed to have issues.
 
Jul 27, 2020
But my system as configured has been thus far issue free.
That may very well be the key to your success with your 14900K. That same CPU in a DDR5 mobo may degrade over time and I posited that Intel most likely knew this before shipping these CPUs worldwide and shipped them anyway because maybe their test sample size was too small and they underestimated the extent of the problem.
 

Hulk

Diamond Member
Oct 9, 1999
That may very well be the key to your success with your 14900K. That same CPU in a DDR5 mobo may degrade over time and I posited that Intel most likely knew this before shipping these CPUs worldwide and shipped them anyway because maybe their test sample size was too small and they underestimated the extent of the problem.
Of course that is possible. An overworked or overclocked memory controller?

I would bet Intel is going to try to stay quiet and wait this out, meaning they will replace CPUs that are returned under warranty and move on to ARL and LNC. "Don't look at what I have in that hand, look at what I have in this hand!" A recall would be a disaster, especially if it included BGA parts.
 
Reactions: igor_kavinski

9949asd

Member
Jul 12, 2024
Do explain then, won't you, why Intel has acknowledged the issues and announced an investigation is ongoing? Or why they pushed a microcode update they say is partly responsible. Or why they just came out of lurker mode to assure everyone that mobile CPUs are not susceptible to the instability issues the other CPUs have.

The only thing not proven yet is WHY they are becoming unstable, not IF they are.
Ask Intel, not me.
 
Reactions: DAPUNISHER

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
Are the non K processors also having issues?
That's where things get as clear as mud. There are reports of HX and T series parts. Some vanilla i9 and i7 too, even a few i5s like the 13500. I personally am not going to spread that info around unless or until it is confirmed to be a fab issue that includes those SKUs. While we have plenty of verifiable reports with impeccable credentials, I think some are taking the opportunity to "grind an axe" with Intel, which is what makes me a little spicy. They need to stop leveraging this to wage their jihad, so the actual issues can be addressed.
 

Doug S

Platinum Member
Feb 8, 2020
That's where things get as clear as mud. There are reports of HX and T series parts. Some vanilla i9 and i7 too, even a few i5s like the 13500. I personally am not going to spread that info around unless or until it is confirmed to be a fab issue that includes those SKUs. While we have plenty of verifiable reports with impeccable credentials, I think some are taking the opportunity to "grind an axe" with Intel, which is what makes me a little spicy. They need to stop leveraging this to wage their jihad, so the actual issues can be addressed.

When issues like this come to light, you tend to get a lot of "piling on", with one-off reports that people assume are related to the pile. It's like when there were very real reports of MCAS-related crashes and near misses on the Boeing 737 MAX, and conspiracy theorists were soon blaming it for MH370, even though that airplane was a 777 and did not have MCAS.

So I could easily believe you have a case where a few random people with, say, a "T" CPU that failed (which may or may not have been operated out of spec in some manner), who never would have spoken up (or wouldn't have received much notice if they did), decided to say "it happened to me too!" because of the larger issue with the K series.
 

TheELF

Diamond Member
Dec 22, 2012
This is proven untrue by the terrible failure rates on the W series boards.
W-series boards allow both RAM and CPU overclocking. We have no idea whether they auto-overclock by default without asking the user, as the Z boards do, so citing W boards alone is not enough; we would need more info on how the CPUs were run from day one.
Survivorship bias is powerful mojo for those experiencing it.

On top of the W series, we have word a major S.I. is failing something like 12% of Raptor Lake during the initial stress testing. That's a lot of faulty CPUs out of the gate.
Again, we have no idea how they run the stress test. If it's an overclocking stability test and the CPUs fail at the settings they use because those settings are way above normal, then they just have to use lower settings. We don't know, because they leave that stuff out.
On the contrary, Wendel's video alone had good data from people who operate thousands of these CPUs in a professional environment. More data points are coming in every day. The latest GN video has a source from a top OEM indicating immediate failure rates of 10-25% of 13th gen Raptor Lake CPUs. Keep in mind those are CPUs that probably never shipped to customers and had the failures out-of-the-gate. That doesn't include ones that can go bad after 6-12 months of use.

What is proven is that there are abnormally high failure rates for Raptor Lake CPUs.
Being professional in one domain doesn't mean they know what they are doing in another...
Do explain then, won't you, why Intel has acknowledged the issues and announced an investigation is ongoing? Or why they pushed a microcode update they say is partly responsible. Or why they just came out of lurker mode to assure everyone that mobile CPUs are not susceptible to the instability issues the other CPUs have.

The only thing not proven yet is WHY they are becoming unstable, not IF they are.
They didn't acknowledge the issues, they stated that they will investigate to see if there is any issue, and all the stuff you listed is a result of that.
Here, take careful note of the wording, for example.
"Intel and its partners are continuing to investigate user reports regarding instability issues"
 

coercitiv

Diamond Member
Jan 24, 2014
The "T" CPUs are on the list of affected CPUs in the Gamers Nexus video, a list of ~6 million CPUs they had from a major OEM with a failure rate of 10-25%. They're also included on the alleged leak list of CPUs confirmed by Intel to have issues. For the mobile CPUs we lack enough info, at worst we can consider them suspects. However, the absence of a proper statement from Intel and the relatively weak denial of problems with the mobile CPUs does not inspire me with confidence to say the least.

I would not go as far as saying we're certain T CPUs are affected, but any theory based on the idea that T CPUs are fine should also be discarded for now. If the 4.9 GHz 13700T is going down... all the talk about CPUs being pushed too far is also going to the recycle bin.

To me it seems that all the culprits we have suspected so far are catalysts for another problem: they accelerate the process induced by the root cause. This would explain why the 13900K/14900K are the first ones to go down (statistically speaking), would explain why users who take care of their settings aren't seeing degradation as fast as others, and would also put the 13600K and 13700T at the bottom of the affected CPUs.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
They didn't acknowledge the issues, they stated that they will investigate to see if there is any issue,
Their latest statement -

“Intel is aware of a small number of instability reports on Intel Core 13th/14th Gen mobile processors. Based on our in-depth analysis of the reported Intel Core 13th/14th Gen desktop processor instability issues, Intel has determined that mobile products are not exposed to the same issue. The symptoms being reported on 13th/14th Gen mobile systems – including system hangs and crashes – are common symptoms stemming from a broad range of potential software and hardware issues. As always, if users are experiencing issues with their Intel-powered laptops we encourage them to reach out to the system manufacturer for further assistance.”
They have done an in-depth analysis and now acknowledge the desktop issue. There is no real wiggle room for alternative explanations, since the statement goes on to attribute the mobile reports to common causes of instability.
 

Kocicak

Golden Member
Jan 17, 2019
I tried to evaluate Black's equation with an activation energy of 0.9 eV and a current density exponent of 1.2, doubling the current density and raising the temperature from 60 °C (333 K) to 100 °C (373 K), and I got a 66 times shorter time to failure. Did I make a mistake in the calculation? It seems wrong.
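For reference, Black's equation is MTTF = A * J^(-n) * exp(Ea / (k*T)), so the prefactor A cancels when two operating points are compared. The short Python check below plugs in only the parameters stated in the post (Ea = 0.9 eV, n = 1.2, current density doubled, 333 K to 373 K); the function name is hypothetical. It gives a factor of roughly 66, so the arithmetic itself appears consistent. Whether these parameters describe Intel's actual interconnects is not known.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_acceleration_factor(ea_ev: float, n: float, j_ratio: float,
                              t1_k: float, t2_k: float) -> float:
    """Return MTTF(t1, J) / MTTF(t2, j_ratio * J) from Black's equation
    MTTF = A * J**(-n) * exp(Ea / (k * T)). The prefactor A cancels."""
    current_term = j_ratio ** n  # higher current density shortens life
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t1_k - 1.0 / t2_k))
    return current_term * thermal_term


if __name__ == "__main__":
    af = black_acceleration_factor(ea_ev=0.9, n=1.2, j_ratio=2.0, t1_k=333.0, t2_k=373.0)
    print(f"MTTF shrinks by a factor of about {af:.0f}")  # roughly 66
```

Most of the factor comes from the exponential temperature term (about 29x); doubling the current density contributes 2^1.2, or about 2.3x.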
...
I just found something supporting my (probably not very accurate) result that electromigration is strongly temperature dependent: increasing the temperature by 57/60 °C caused a tenfold decrease in the time to reach the same number of failures.
...
It seems that there could be a real problem with oxidation, which causes some interconnects to have high resistance and fail under higher-stress conditions.
...
Perhaps the problem is that these interconnects do not fail directly, but instead create hotspots that cause premature degradation of the lower layers of the chip.
...
This manufacturing issue must impact their server chips too. I wonder if they mitigate it by extensive testing and burn-in of their server CPUs.
...
I realised now that server CPUs usually run at pretty low voltages and frequencies.

If the problem is accelerated degradation due to hotspots at high-resistance interconnects, caused by a flaw in the manufacturing process, a low enough voltage and frequency will almost entirely prevent these hotspots from occurring, even in those huge and very complex server CPU dies.


I would not go as far as saying we're certain T CPUs are affected, but any theory based on the idea that T CPUs are fine should also be discarded for now. If the 4.9 GHz 13700T is going down... all the talk about CPUs being pushed too far is also going to the recycle bin.

To me it seems that all the culprits we have suspected so far are catalysts for another problem: they accelerate the process induced by the root cause. This would explain why the 13900K/14900K are the first ones to go down (statistically speaking), would explain why users who take care of their settings aren't seeing degradation as fast as others, and would also put the 13600K and 13700T at the bottom of the affected CPUs.
4.9 GHz is not such a low frequency that it would reliably prevent any overheating.

If we presume that:

1) A manufacturing process shortcoming produces a random number of randomly located spots that may overheat locally, to a random degree that depends on the voltage needed to reach a particular frequency

2) Different parts of the silicon die are differently sensitive to temperature and will degrade at different speeds

we get a rather complicated situation that does not yield any clear picture.

For example, we can have a 13700T with a single hotspot that happens to be quite severe and is located in a part of the die that is highly sensitive to temperature,

and a 14900K with multiple hotspots that all happen to be mild and are located in parts of the die that are quite resilient to temperature.

In this case, even the 13700T with its modest frequency, under heavy professional use, can degrade faster than that 14900K mildly used in a home PC.
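The hotspot argument above can be made concrete with a toy model. Everything in it is a hypothetical assumption for illustration only: each hotspot gets a severity, each die region a temperature sensitivity, "stress" lumps together voltage, frequency and duty cycle, and the degradation rate is simply stress times the sum of severity times sensitivity. None of the numbers come from real measurements.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    severity: float     # hypothetical: how much local overheating the defect causes (0..1)
    sensitivity: float  # hypothetical: how temperature-sensitive that die region is (0..1)


def degradation_rate(hotspots: list[Hotspot], stress: float) -> float:
    """Toy model (assumption): rate = stress * sum(severity * sensitivity)."""
    return stress * sum(h.severity * h.sensitivity for h in hotspots)


if __name__ == "__main__":
    # Hypothetical 13700T: one severe hotspot in a sensitive region, heavy professional use.
    t_chip = degradation_rate([Hotspot(0.9, 0.9)], stress=0.8)
    # Hypothetical 14900K: several mild hotspots in resilient regions, mild home use.
    k_chip = degradation_rate(
        [Hotspot(0.2, 0.2), Hotspot(0.3, 0.1), Hotspot(0.1, 0.3)], stress=0.5
    )
    print(t_chip > k_chip)  # the lower-clocked chip degrades faster in this scenario
```

The point is only that, once defect placement is random, nominal frequency alone does not order the chips by degradation speed.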
 
Reactions: KompuKare

Hitman928

Diamond Member
Apr 15, 2012
W-series boards allow both RAM and CPU overclocking. We have no idea whether they auto-overclock by default without asking the user, as the Z boards do, so citing W boards alone is not enough; we would need more info on how the CPUs were run from day one.

The companies Wendel talked to have tried multiple things to resolve the issue and received a large number of replacement CPUs from Intel. One company, after trying everything they could, getting replacement CPUs from Intel, and the replacement CPUs also failed, is spending additional money to swap out all of their Intel servers with AMD ones. Do you really think they would go through all of this and spend the money to swap all of their systems to AMD if it was just a default BIOS setting? Additionally, Supermicro boards were failing just as much as Asus ones. Supermicro is not in the business of supporting overclockers, especially on workstation boards. Their whole model is to provide rock stable platforms with minimal customization options. If the failure rate is the same using those boards, there is basically no hope in pinning the problem on motherboard settings.
 

evident

Lifer
Apr 5, 2005
This really blows. I bought two i7-13700Ks. Please correct me if I'm wrong: even if I don't overclock them, they can still fail? I have one configured as my gaming rig and the other as a Proxmox home server thingy.
 

coercitiv

Diamond Member
Jan 24, 2014
Please correct me if I'm wrong: even if I don't overclock them, they can still fail?
Based on what we know today, yes, they can fail even without overclocks. You'll have to wait a while longer until Intel decides to come out with a clear statement on the root cause and the spread of the issue.
 

Hitman928

Diamond Member
Apr 15, 2012
This really blows. I bought two i7-13700Ks. Please correct me if I'm wrong: even if I don't overclock them, they can still fail? I have one configured as my gaming rig and the other as a Proxmox home server thingy.

If you already have the systems, aren't overclocking, and aren't seeing any issues, I would just hold tight for now. Hopefully your CPUs end up not being affected, or at least not within the lifetime you use them. If you want, you could try out some of the known games/apps that seem to expose the instability to see if your CPUs are problematic at this time. At least then you'd know you need to RMA them if they fail, or you'd have some peace of mind that they are probably OK if they don't.
 

9949asd

Member
Jul 12, 2024
I surmised as much. It's your prerogative to set that standard. I do not share it though. It doesn't matter why the broken clock is correct twice a day, it is still correct.

Here's how I see it. The bar isn't nearly that high to clear.

Thread title plus bold sentences are a hypothesis + Mark understood and agreed with the hypothesis in a post that many have almost exactly repeated the last few months = bar cleared.
Here is the statement; just like I said, the voltage was too high.
 

Attachments

IMG_2833.png
Reactions: DAPUNISHER