Recently, AMD was nice enough to sell us their first 7nm product: Radeon VII. Those of us who bought one or read between the lines on reviews found out about the Tjunction temperature reading from the card. How is this relevant to a speculation thread about Zen2? Consider this:
When Zen first launched, many of us 1700X and 1800X owners began to notice Tctl, a temperature reading exactly 20C above Tdie/package temp. In AMD's own words on Tctl:
https://community.amd.com/community/gaming/blog/2017/03/13/amd-ryzen-community-update
Consider carefully what AMD had to say about it back in 2017, and keep in mind that thermal throttling on models that had the Tctl offset was based off Tctl, not Tdie/package temp.
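To make the offset concrete, here is a minimal sketch of the Tctl/Tdie relationship described above. The table and function names are made up for illustration; the only numbers in it are the +20C offsets on the 1700X and 1800X. Models not listed are assumed to carry no offset.

```python
# Tctl offsets in degrees C for the launch models that carried one.
# Hypothetical lookup table for illustration; unlisted models are assumed to have no offset.
TCTL_OFFSET_C = {"1700X": 20.0, "1800X": 20.0}


def tdie_from_tctl(tctl_c: float, model: str) -> float:
    """Recover Tdie from the reported Tctl by removing the model's offset."""
    return tctl_c - TCTL_OFFSET_C.get(model, 0.0)


def tctl_from_tdie(tdie_c: float, model: str) -> float:
    """The value the chip reports (and throttles on) for a given Tdie."""
    return tdie_c + TCTL_OFFSET_C.get(model, 0.0)


if __name__ == "__main__":
    # An 1800X reporting 90C Tctl is sitting at 70C Tdie.
    print(tdie_from_tctl(90.0, "1800X"))
```

The point is that any throttle limit expressed in Tctl is reached 20C "early" in Tdie terms on those two chips.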
Fast forward to the present, with Radeon VII's two different temperature readings (Temp/Junction Temp). Junction Temperature, which reads higher whenever the card is under load, does NOT behave the same way as Tctl, except that it also sets the curve for thermal throttling.
At idle, Tjunction (or Tjunct) is about the same as Tdie. Under load, Tjunct is around 25-30C higher than Tdie. I saw a guy at OCN use a hacked-together water-cooling solution for his Radeon VII, and even when he pushed his Tdie down to 40C under load, he was still showing 60C Tjunct (a 20C delta). What gives?
It's all about them hotspots.
I had thought, back in 2017, that the Tctl we were seeing on the "overclocker"-oriented Zen chips might have been related to hotspots. AMD might have been having problems measuring temperatures across the entire die, so they provided themselves with a failsafe by assuming that, if the edge of the die (or the solder immediately contacting the die) had a temperature of X, some point in the die might reach a temperature as high as 20C above that. So if they throttled accordingly, they could protect the die. That was never established as a fact.
Now we have a 7nm process. Hotspots are a bigger problem with a smaller process. I don't know how AMD measures Tjunct - are they measuring die temperature from the die edge, and Tjunct from the thermal pad junction point? I don't know. But they're consistently showing higher temps for Tjunct. It also seems like it's very difficult to move Tjunct downward. People have moved it down maybe 5C on Radeon VII by replacing the stock graphite pad with liquid metal. And as I cited above, moving to water cooling only shrank the gap to Tdie by about 5-10C (from 25-30C down to 20C). But what Tjunct is doing is effectively protecting the GPU from hotspot temps - either AMD is actually tracking hotspots effectively and showing us the temperatures, or they're using some kind of an estimation algorithm based on Tdie (instead of just adding +20C like they did on early Ryzen chips). Either way, they throttle based on that temp to protect the hottest areas of the die.
Zen2 will probably behave the same way.
From a practical point of view, the only way for us to cope with that is to try to cool the coolest areas of the die as best we can; those are probably the only accurate temps we can record anyway. During load, the hotspots on our Ryzen 3xxx CPUs may be 20-30C higher than anywhere else. So if "anywhere else" is 50-60C, then we may be in for some trouble. Especially when overclocking.
Based on my experiences moving from Vega FE (14nm) to Radeon VII (7nm), I can tell you that the average GPU temp on my 14nm card could easily stay below 75C with an undervolt and an extremely aggressive fan profile. Moving on to Radeon VII, under "similar" settings (I use quotes, because AVFS makes things really weird on Vega FE/Vega64), I can keep Tdie down in the same range while using a performance-oriented setting. Stock, temps are actually higher due to ridiculous voltage but I digress. The thing to keep in mind here is that GPU temp vs. GPU temp, 14nm and 7nm were about the same once I had the card locked in to a tune I liked for my setup. The difference is in the hotspots. Using my performance tune (1940 MHz) I get Tjunct values in the range of 100-102C. The GPU will throttle at around 105-110C Tjunct. Eww. Also keep in mind that Radeon VII is using maybe 50W less power than Vega FE in this comparative scenario.
So if I use my 14nm 1800X as a starting point, during a heavy load while overclocked to the highest speeds the chip will normally allow (4 GHz), I can easily get Tdie temps in the range of 65-70C, with Tctl being 20C higher every time. So if my Tdie temps on Zen2, fully overclocked to whatever clockspeed it allows under those settings, approach 65-70C, I'm going to be seeing hotspot temps at (or estimated to be at) 95-100C. Or higher. Yow. And that's with an NH-D15 with a pair of 3000 rpm industrialPPC fans. All that solder and the heatspreader on the die may cause hotspotting to get even worse (as opposed to Radeon VII, which is direct-die except for the graphite pad).
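For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of that arithmetic. It is pure speculation in code form: the 20-30C hotspot delta is borrowed from the Radeon VII observations, and the 105C throttle point is Radeon VII's lower bound used as a stand-in, since nobody outside AMD knows the Zen2 limit yet. The function names are made up for illustration.

```python
def estimated_hotspot_range(tdie_c, delta_range_c=(20.0, 30.0)):
    """Return (low, high) hotspot estimates for a given Tdie.

    delta_range_c is an assumption taken from Radeon VII behaviour
    under load, not a measured Zen2 figure.
    """
    low_delta, high_delta = delta_range_c
    return tdie_c + low_delta, tdie_c + high_delta


def throttle_headroom(tdie_c, delta_c, throttle_c=105.0):
    """Degrees of margin left before the hotspot hits the throttle point.

    throttle_c defaults to Radeon VII's ~105C Tjunct as a placeholder;
    the real Zen2 limit is unknown. A negative result means throttling.
    """
    return throttle_c - (tdie_c + delta_c)


if __name__ == "__main__":
    for tdie in (50.0, 60.0, 65.0, 70.0):
        low, high = estimated_hotspot_range(tdie)
        margin = throttle_headroom(tdie, delta_c=30.0)
        print(f"Tdie {tdie:.0f}C: hotspot {low:.0f}-{high:.0f}C, "
              f"worst-case headroom {margin:.0f}C")
```

With the worst-case 30C delta, a 65-70C Tdie lands at the 95-100C I mentioned; with the milder 20C delta it would be 85-90C.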
Bottom line, I do not think overclockers will be very happy with hotspots on their 7nm CPUs. We may have to push those "average" temperatures into the basement to prevent massive hotspot throttling, and that may, in turn, require us to dial back our overclocks or go for heftier cooling. Custom water may be a requirement.