Overheating damage??

koshling

Member
Nov 15, 2005
I've been running utterly stable for about 5 months with a stock-clocked 4770 (Noctua DH14U in an FT02). Recently I've been running a VERY CPU-intensive process for LONG periods (overnight), and I noticed that it runs the CPU at or near its thermal limit (bouncing off Tj max at 100 degrees). Under 'normal' load I get core temps of around 50-60 degrees (when running a bunch of active VMs, parallel compilations etc.), so this load profile is extraordinary (it's an aggressively multi-threaded, aggressively utilization-optimizing Monte Carlo Tree Search general game player).

As of last Friday my system became unstable, and it now hard-hangs (no interrupts, so no mouse movement and no response to Ctrl-Alt-Del; only a power cycle clears it [the display freezes rather than breaking up, so the GPU is fine]), typically after about 20 minutes of high load.

So, a few questions:

1) Is it possible (probable) that running things near the thermal max for an extended period has caused physical damage to the CPU? (isn't that what auto-throttling is supposed to prevent?)

2) Given that it is unstable at load, but not when lightly loaded, what does that tell us about the nature of the damage? (i.e. - is it likely that I can avoid it by lowering load in some [reasonable] way)?

3) Can I be certain the damage is to the CPU? If so, should I consider replacing it? If so, what steps should I take to prevent a recurrence with a new CPU if I can't rely on the auto-throttling (i.e. what is a safe level below the throttle-trigger level)?

4) Related to (2) and (3), is there some way to lower the core temperature at which the CPU auto-throttles? (I can't seem to find one in my BIOS.)
 

Techhog

Platinum Member
Sep 11, 2013
I don't think that hitting 100C at stock clocks with a DH-14U is normal under any sort of load, no matter what you do with it, unless you have a very high ambient temp. I think you might want to check your paste and how the heatsink is seated.
 

koshling

Member
Nov 15, 2005
Quote:
I didn't see it on the first reading either, but his original post says no.

Correct - no. I suspect I have one core with poor thermal contact, either due to a manufacturing issue between the core and the heat spreader, or due to TIM/HS contact externally. If I reduce the thread load so that only 2 cores are fully loaded at any one time I seem to be fine (with temps maxing out at 60 or so), which suggests that always having dark (well, darkish anyway) silicon on an adjacent core solves the issue. I'll try re-seating the heatsink as suggested. Can anyone point me at a good guide for doing this (in particular, the most appropriate way to clean off the old TIM before applying new)?
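The workaround of keeping only about two cores fully loaded, together with question 4 (a lower throttle point), can be approximated in software rather than in the BIOS. The sketch below is a hypothetical governor, not anything from the thread or from Intel: it assumes some separate routine supplies the hottest core temperature, and the 85/75 °C thresholds and the 2 to 8 worker range (the 4770 has 4 cores / 8 threads) are illustrative guesses, not recommended values.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerGovernor:
    """Hypothetical software throttle: decides how many worker threads may run,
    based on the hottest core reading. Thresholds are assumptions, chosen to sit
    well below the ~100 C Tj max at which the CPU itself throttles."""
    min_workers: int = 2
    max_workers: int = 8
    high_c: float = 85.0   # shed one worker at or above this (assumed value)
    low_c: float = 75.0    # add one worker back at or below this (assumed value)
    active: int = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.max_workers  # start at full load

    def update(self, hottest_core_c: float) -> int:
        """Feed one temperature sample; returns the allowed worker count."""
        if hottest_core_c >= self.high_c and self.active > self.min_workers:
            self.active -= 1
        elif hottest_core_c <= self.low_c and self.active < self.max_workers:
            self.active += 1
        # Between low_c and high_c nothing changes (hysteresis band).
        return self.active


if __name__ == "__main__":
    gov = WorkerGovernor()
    for reading in (70.0, 88.0, 91.0, 80.0, 72.0):  # made-up sensor readings
        print(reading, gov.update(reading))
```

Calling update() once per sampling interval, and having worker threads check `active` before picking up new work, would give an effective throttle point below Tj max; the gap between high_c and low_c stops the worker count from flapping on every sample.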
 

ShintaiDK

Lifer
Apr 22, 2012
Cores should not become defective due to heat, since they throttle.

Are you sure it's not a memory-related issue? Have you tried running memtest?
 

Techhog

Platinum Member
Sep 11, 2013
ShintaiDK said:
Cores should not become defective due to heat, since they throttle.

Are you sure it's not a memory-related issue? Have you tried running memtest?

Yeah, looking at memory might be a good idea. The temps seem very wrong, but I don't think that's causing the stability issue.
 

koshling

Member
Nov 15, 2005
Techhog said:
Yeah, looking at memory might be a good idea. The temps seem very wrong, but I don't think that's causing the stability issue.

If it were a bad bit in the memory, then surely reducing the threading load (which doesn't change memory access patterns, apart from there being fewer accesses to the same structure instances, so no change in the amount used etc.) would not have stabilized things, would it?

I can see that it might have reduced the stressfulness of the access pattern, and thus reduced timing stress, but before I reduced the threading intensity I had also disabled the XMP profile and dropped the memory back to looser timings (it wasn't overclocked relative to the memory spec previously, but it was running under an XMP profile). Since loosening the memory timings made no difference, and actual memory usage is unchanged by the threading change (which DID reduce CPU load and heat significantly), doesn't that suggest it's not a memory issue? It could be a memory controller issue perhaps (but that's back to the CPU die)...?
 

The Day Dreamer

Senior member
Nov 5, 2013
I think your cabinet cooling is at fault.

1. Clean your processor heatsink. You will be surprised by the amount of dust it can hold.

2. Re-apply a new layer of thermal paste.

3. Install additional fans.

4. Try liquid cooling if you're willing to spend more bucks on your current rig.
 

koshling

Member
Nov 15, 2005
The Day Dreamer said:
I think your cabinet cooling is at fault.

1. Clean your processor heatsink. You will be surprised by the amount of dust it can hold.

2. Re-apply a new layer of thermal paste.

3. Install additional fans.

4. Try liquid cooling if you're willing to spend more bucks on your current rig.

If it were that, the core temps would not swing between 30 degrees and 100 degrees (and back) in 1-2 seconds, which they do. That implies not much mass is being heated, i.e. the problem is local thermal conductivity, not heat distribution more broadly. Also, the HS is pretty new (and dust free; the rig is only a few months old), and the airflow is VERY good indeed (3 140mm fans with an unobstructed path blowing straight through the area the HSF is in [with the HSF fan mount aligned to the same airflow]; the FT02 is an extremely good air-cooled case). Furthermore, the core temps do not respond to changing the fan speeds, which is expected since the case is not getting hot: it's definitely just local thermal conductivity out of the cores that is at issue.
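The argument that a 30 to 100 degree swing in 1-2 seconds points to a small heated mass behind a poor thermal path can be illustrated with a single-resistance, single-capacitance (lumped RC) thermal model: the size of the swing is roughly power × thermal resistance, and its speed is set by the time constant R × C. All parameters below are assumptions for illustration: 84 W is the 4770's rated TDP, while the 0.5 J/K heat capacity and the 0.8 vs 0.3 K/W resistances are made-up values, and the heatsink is idealized as staying at 30 °C.

```python
import math


def core_temp_c(t_s, start_c, power_w, r_k_per_w, c_j_per_k):
    """Lumped-RC estimate of die temperature t_s seconds after a constant power
    step, relative to a heat sink idealized as fixed at start_c."""
    tau_s = r_k_per_w * c_j_per_k                 # thermal time constant (s)
    rise_k = power_w * r_k_per_w                  # steady-state temperature rise (K)
    return start_c + rise_k * (1.0 - math.exp(-t_s / tau_s))


# Hypothetical comparison: same small heated mass, different contact resistance.
for label, r in (("poor contact", 0.8), ("good contact", 0.3)):
    temps = [round(core_temp_c(t, 30.0, 84.0, r, 0.5), 1) for t in (0.5, 1.0, 2.0)]
    print(label, temps)
```

With these assumed numbers both cases settle within a couple of seconds because the heated mass is tiny, but only the poor-contact case ends up near 100 °C. That pattern fits a contact/TIM problem better than a case-airflow problem, though a one-stage model is far too crude to say which interface is at fault.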
 

Schmide

Diamond Member
Mar 7, 2002
Dude! Jumping from 30 to 100 in 1-2 seconds is the exact symptom of bad HSF contact. Reseat it!
 

ehume

Golden Member
Nov 6, 2009
koshling said:
If it were that, the core temps would not swing between 30 degrees and 100 degrees (and back) in 1-2 seconds, which they do. That implies not much mass is being heated, i.e. the problem is local thermal conductivity, not heat distribution more broadly. Also, the HS is pretty new (and dust free; the rig is only a few months old), and the airflow is VERY good indeed (3 140mm fans with an unobstructed path blowing straight through the area the HSF is in [with the HSF fan mount aligned to the same airflow]; the FT02 is an extremely good air-cooled case). Furthermore, the core temps do not respond to changing the fan speeds, which is expected since the case is not getting hot: it's definitely just local thermal conductivity out of the cores that is at issue.

I ran into the same issue. It was unaffected by reseating. As I see it, the TIM between the CPU chip and the IHS can only pass so much heat. Beyond that point it acts as an insulator, and your temp goes up really fast.

Idontcare noted that the gap between the IHS and the chip varies. My suspicion is that if your CPU throttles at stock, you have gotten a package with too much distance between the chip and the IHS.
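The point about the chip-to-IHS gap can be put in rough numbers with the one-dimensional conduction formula ΔT = P·L / (k·A): the temperature drop across a uniform layer grows linearly with its thickness L. Every value here is an assumption for illustration only: 84 W (the 4770's rated TDP), a die area of roughly 177 mm², and a TIM conductivity of 3 W/(m·K). Real heat flux is concentrated over the cores rather than spread evenly, so actual hotspot gradients would be larger than this estimate.

```python
def tim_delta_t(power_w, thickness_m, k_w_per_mk, area_m2):
    """Temperature drop (K) across a uniform conductive layer: dT = P * L / (k * A)."""
    return power_w * thickness_m / (k_w_per_mk * area_m2)


AREA_M2 = 177e-6   # assumed ~177 mm^2 die, uniform heat flux (a simplification)
K_TIM = 3.0        # assumed TIM conductivity, W/(m*K)

# Hypothetical gap thicknesses under the IHS, in micrometres.
for gap_um in (50, 100, 150):
    print(gap_um, "um ->", round(tim_delta_t(84.0, gap_um * 1e-6, K_TIM, AREA_M2), 1), "K")
```

Doubling the gap doubles the drop across that layer, so a package with an unusually thick layer under the IHS could plausibly run noticeably hotter than a good one at the same power, which fits the "defective package, swap it" reading.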

After you try reseating your heatsink, you will have done everything you can at your end. At that point, you may wish to start a conversation with Intel to see what they can suggest. But my guess is that with overheating in a CPU at stock, it's time to swap it for a non-defective CPU.
 

Puppies04

Diamond Member
Apr 25, 2011
Probably a long shot but do the caps on your mobo, specifically around the CPU area look normal?
 

Torn Mind

Lifer
Nov 25, 2012
If possible, run a heavy load with the stock heatsink (I suspect the thermal material is still on there; leave it and just attach the heatsink after you clean the top of the CPU) and see if it does the same thing. If it does, it is warranty claim time. In fact, you can probably skip this step if you take photos of the "compound smear" and have proof your Noctua covers the chip sufficiently.
 

sm625

Diamond Member
May 6, 2011
A search for the word "delid" yielded 0 responses. I am disappoint. You clearly need to delid and replace the TIM under your IHS.
 

VirtualLarry

No Lifer
Aug 25, 2001
sm625 said:
A search for the word "delid" yielded 0 responses. I am disappoint. You clearly need to delid and replace the TIM under your IHS.

This should not be required of his CPU. If that were the case, it was clearly manufactured defectively. He's not overclocking, and he is using a larger-than-stock heatsink.

We've all heard about how Haswell throttles at stock on the stock heatsink when running AVX code.
 

Techhog

Platinum Member
Sep 11, 2013
sm625 said:
A search for the word "delid" yielded 0 responses. I am disappoint. You clearly need to delid and replace the TIM under your IHS.

Nobody "needs" to delid. Don't be ridiculous. Telling someone that voiding their warranty and risking destroying their CPU is their only option, without being absolutely sure, is a sign that you have no business giving people advice.
 

Torn Mind

Lifer
Nov 25, 2012
sm625 said:
A search for the word "delid" yielded 0 responses. I am disappoint. You clearly need to delid and replace the TIM under your IHS.

Yeah, and if he bungles the delid, he's out of money and a CPU. The CPU should be working without such issues under the stock configuration Intel provides; his aftermarket cooler is total overkill and yet his chip still reached a malfunctioning state. This is a manufacturing defect on the part of Intel and he should simply make a warranty claim and get a new CPU that hopefully doesn't kill itself under full load for an extended period of time at stock values. Even Idontcare pointed out that you don't just go around delidding things without thinking about whether you can afford to assume the risk of losing the CPU.
 