AMD Raven Ridge 'Zen APU' Thread

maddie · Jun 3, 2017

Glo. said:
No memory cell ever will have 30-50W TDP, unless the sizes will change dramatically.

I was responding to Valantar who claimed that Navi could not have stacked memory on top of the GPU as dissipating ~150W was unworkable.

My response was that a scalable design might only have a fraction of the total power budget per 3D stack and asked, can you dissipate 30W - 50W from a stacked 3D GPU/memory structure?

I believe the answer is yes, as this is equivalent in W/mm^2 ratio to a Ryzen die.

In addition the TSVs in the HBM2 memory contribute positively to vertical heat transfer.

maddie · Jun 3, 2017

Valantar said:
It sounds like you're mixing up VRAM power and GPU power (or you're not expressing yourself quite clearly). I (we?) was talking about stacking HBM on top of a relatively high powered chip like a CPU, GPU or APU. Even though the HBM does consume some power, it's bound to be far lower than the chip beneath, unless you're talking about a Core m competitor, in which case a couple of stacks of low-clocked HBM2 would probably be roughly equal. The problem here is twofold: first (and the lesser of the two) is that you're concentrating heat-generating chips into a smaller surface area, i.e. concentrating the heat. The second, and bigger issue is that the stacked-on-top HBM will act as a thermal insulator between the CPU/GPU/APU die and the IHS(/cold plate in a laptop), making thermal transfer (cooling) exponentially more difficult. In essence, you're trapping part of the heat from the lower chip, not allowing it to be carried away by the cooling system.

While switching from PoP to stacked dice will lower this effect (as @imported_jjj was pointing out) it will in no way remove it entirely. And the hotter the chip below, the bigger the problem this becomes (as more heat generated means it needs to be dissipated quicker to avoid overheating). You'll run a very high risk of your CPU entering heat soak, while your cooling solution will become far less efficient.

This barely works for ultra-low power devices like tablets (where, as I said, the iPad outperforms everything else in sustained loads largely due to having off-package RAM), but can never, ever work with more power hungry chips than that.

Silicon is not a bad heat conductor, as you seem to think. You are acting as if it's actually going to heavily insulate the lowest CPU/GPU logic die. A few degrees higher operating temps will easily compensate for the higher heat production.

I don't know if you have ever solved these types of engineering problems, but the solutions in this scenario [very thin layers, fair conductor] are not even close to as dire as you might believe.

Heat conduction rates:
Silicon 149 W/(m·K)
Aluminium 237 W/(m·K)

Now look at a silicone TIM:
Dow Corning TC-5022 4 W/(m·K)

This means that a silicon layer can be (149/4) ≈ 37 times thicker than a TC-5022 TIM layer for the same power dissipation at the same temperature differential. You can't rely on verbal reasoning to be precise here; you must use numbers, and they don't agree with your reasoning.
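The 37x figure can be checked with a minimal sketch, assuming simple one-dimensional steady-state conduction through uniform layers (resistance per unit area = thickness / conductivity). The helper name `equivalent_thickness` is made up for illustration; the conductivity values are the ones quoted above.

```python
# Thermal conductivities in W/(m·K), as quoted in the post above.
K_SILICON = 149.0
K_ALUMINIUM = 237.0
K_TIM = 4.0  # Dow Corning TC-5022


def equivalent_thickness(thickness, k_from, k_to):
    """Thickness of a k_to layer with the same conductive resistance
    per unit area as a `thickness` layer of k_from (same length unit)."""
    return thickness * k_to / k_from


# How much thicker silicon can be than TC-5022 for the same temperature drop.
silicon_vs_tim = equivalent_thickness(1.0, K_TIM, K_SILICON)
print(f"silicon can be {silicon_vs_tim:.2f}x thicker than the TIM")
```

This only compares bulk materials; it says nothing about interfaces between layers or about heat generated inside the upper layer.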

Valantar · Jun 3, 2017

maddie said:
I was responding to Valantar who claimed that Navi could not have stacked memory on top of the GPU as dissipating ~150W was unworkable.

My response was that a scalable design might only have a fraction of the total power budget per 3D stack and asked, can you dissipate 30W - 50W from a stacked 3D GPU/memory structure?

I believe the answer is yes, as this is equivalent in W/mm^2 ratio to a Ryzen die.

In addition the TSVs in the HBM2 memory contribute positively to vertical heat transfer.

But... but... but Ryzen doesn't have a second layer of silicon stacked on top of it and hindering the efficiency of the IHS! Ryzen generates its heat in a single layer, with a relatively uniform spread! It doesn't trap the majority of its heat production beneath an insulating layer! You see the difference, right?

GPUs need the silicon to stay beneath roughly 95°. For a 4-500mm² chip generating 150W+, that means that the heat needs to be dissipated very, very quickly. Adding any insulation (even if that insulation generates its own heat(which at least to me sounds even worse)) between the GPU die and the heatsink will dramatically reduce the efficiency of the heatsink.

Also, the best thing you can hope for with thermal vias is for the stacking to not insulate quite as much. It can never, ever, give as efficient thermal transfer as a layer of TIM or solder and then an IHS or heatsink. That should be obvious.

Not to mention a very crucial issue of die stacking: HBM(2) has a fixed size. Which means that either you dimension the die beneath to exactly match the dimensions of X number of memory dice, or you need extremely precision-engineered heatsinks that can contact both memory and the non-stacked parts of the die beneath equally. Or, of course, you fill in the gaps with TIM, giving horrible thermal transfer.

In other words: there are no good solutions for stacking memory on top of high power ICs. It's simply not viable.

maddie · Jun 3, 2017

Valantar said:
But... but... but Ryzen doesn't have a second layer of silicon stacked on top of it and hindering the efficiency of the IHS! Ryzen generates its heat in a single layer, with a relatively uniform spread! It doesn't trap the majority of its heat production beneath an insulating layer! You see the difference, right?

GPUs need the silicon to stay beneath roughly 95°. For a 4-500mm² chip generating 150W+, that means that the heat needs to be dissipated very, very quickly. Adding any insulation (even if that insulation generates its own heat(which at least to me sounds even worse)) between the GPU die and the heatsink will dramatically reduce the efficiency of the heatsink.

Also, the best thing you can hope for with thermal vias is for the stacking to not insulate quite as much. It can never, ever, give as efficient thermal transfer as a layer of TIM or solder and then an IHS or heatsink. That should be obvious.

Not to mention a very crucial issue of die stacking: HBM(2) has a fixed size. Which means that either you dimension the die beneath to exactly match the dimensions of X number of memory dice, or you need extremely precision-engineered heatsinks that can contact both memory and the non-stacked parts of the die beneath equally. Or, of course, you fill in the gaps with TIM, giving horrible thermal transfer.

In other words: there are no good solutions for stacking memory on top of high power ICs. It's simply not viable.

Please read my post #777 above yours in this thread.

You keep writing statements like these:

"For a 4-500mm² chip generating 150W+, that means that the heat needs to be dissipated very, very quickly."
"It doesn't trap the majority of its heat production beneath an insulating layer"

What are the heat conduction rates? What additional temperature delta is needed to allow the same heat transfer rate?

You keep repeating "the insulating layer". This is a very vague and arbitrary statement. The heat conduction values of the various materials prove it to be wrong.

Heat conduction rates:
Silicon 149 W/(m·K)
Aluminium 237 W/(m·K)
Dow Corning TC-5022 4 W/(m·K)

The HBM2 stack has a height of 0.72mm. This 0.72mm of silicon has the same thermal resistance as a (0.72x(4/149))mm or 0.019mm silicone TIM layer, or a ((237/149)x0.72)mm or 1.145mm aluminium layer.

Are you really saying that a 1.145mm aluminium layer on top of a 40W 92mm^2 die (HBM2 module area) will cause severe heat transfer problems? This is the exact equivalent scenario for a heat transfer solution: a 0.72mm silicon layer or a 1.145mm aluminium layer. Both situations have the same effect on heat flow, or on the temp delta needed for a given heat flow.
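To put a number on the temp delta, here is a rough sketch using Fourier's law for a uniform slab, ΔT = P·t / (k·A). It assumes the stack behaves like solid silicon with uniform one-dimensional heat flow; bond layers, underfill, hot spots, and heat generated inside the memory itself are ignored, so treat the result as a lower bound rather than a prediction. The function name `delta_t` is illustrative.

```python
def delta_t(power_w, thickness_mm, area_mm2, k):
    """Temperature drop in kelvin across a uniform slab under
    1-D steady-state conduction: dT = P * t / (k * A)."""
    thickness_m = thickness_mm * 1e-3
    area_m2 = area_mm2 * 1e-6
    return power_w * thickness_m / (k * area_m2)


# 40 W through 92 mm^2, figures taken from the post above.
dt_silicon = delta_t(40, 0.72, 92, 149.0)                   # 0.72 mm of silicon
dt_aluminium = delta_t(40, 0.72 * 237 / 149, 92, 237.0)     # ~1.145 mm of aluminium

print(f"silicon:   {dt_silicon:.2f} K")
print(f"aluminium: {dt_aluminium:.2f} K")
```

Under these assumptions both layers give the same drop, roughly 2 K, which is where the "few degrees" estimate comes from.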

imported_jjj · Jun 3, 2017

maddie said:
I was responding to Valantar who claimed that Navi could not have stacked memory on top of the GPU as dissipating ~150W was unworkable.

My response was that a scalable design might only have a fraction of the total power budget per 3D stack and asked, can you dissipate 30W - 50W from a stacked 3D GPU/memory structure?

I believe the answer is yes, as this is equivalent in W/mm^2 ratio to a Ryzen die.

In addition the TSVs in the HBM2 memory contribute positively to vertical heat transfer.

HBM is way too hot to be stacked on high performance logic today, way too hot.
It's not about total power to begin with, to simplify it, it's about the hot spots. If you stack a hot thing on a very hot thing, they both get way hotter.

I'll include a slide for folks that don't have the time to watch the vid

Valantar · Jun 3, 2017

maddie said:
Are you really saying that...

No. What I'm saying is this: adding more layers means adding more material that doesn't aid (and thus harms) thermal conductivity. You still need the layer of TIM, you still need the cold plate, and so on. Adding the stacked dice doesn't change any of this, doesn't remove the requirement for any of the above. As such, any intermediate layer, no matter how thin, acts as an insulator as long as its thermal conductivity is lower than that of the cold plate. Isn't that clear? If the intermediate layer itself generates heat, that exacerbates the problem, as this will heat up both the stacked layer and the base layer. Not to mention that stacking chips necessitates an air gap (unless you figure out a way to fill this with something that conducts heat but not electricity, and doesn't interfere with solder, in which case you'd be a billionaire), which, no matter how thin, introduces massive thermal resistance.

With heat sensitive heat generating components, you try to keep them separated. That's not only simple logic, but sound engineering principles.

maddie · Jun 3, 2017

Valantar said:
No. What I'm saying is this: adding more layers means adding more material that doesn't aid (and thus harms) thermal conductivity. You still need the layer of TIM, you still need the cold plate, and so on. Adding the stacked dice doesn't change any of this, doesn't remove the requirement for any of the above. As such, any intermediate layer, no matter how thin, acts as an insulator as long as its thermal conductivity is lower than that of the cold plate. Isn't that clear? If the intermediate layer itself generates heat, that exacerbates the problem, as this will heat up both the stacked layer and the base layer. Not to mention that stacking chips necessitates an air gap (unless you figure out a way to fill this with something that conducts heat but not electricity, and doesn't interfere with solder, in which case you'd be a billionaire), which, no matter how thin, introduces massive thermal resistance.

With heat sensitive heat generating components, you try to keep them separated. That's not only simple logic, but sound engineering principles.

Good lord man. The most it would mean is a few degrees extra needed.

I agree with you that any additional layer adds to the resistance, and in a perfect world we would not have any, but all engineering incorporates compromises, and good engineering knows which compromises are worth having. Simple logic can be very misleading.

My belief is that the problem with this tech is not heat transfer, as you seem to believe, but failure rates in the assembly of the stack. Each layer has an assembly failure rate, and the individual rates compound across the stack, making good yields difficult at this time. The more levels you add, the faster the yield of the total stack falls (geometrically with the number of levels), until it drops to uneconomic levels.

f = failure rate for each level
n = # levels

HBM2 KGD = (1-f)^n
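A small sketch of how that formula behaves, assuming every level fails independently with the same rate f. The 5% per-level failure rate below is a made-up illustration, not a figure from any vendor.

```python
def stack_yield(f, n):
    """Fraction of good stacks when each of n levels fails
    independently with probability f: (1 - f) ** n."""
    return (1.0 - f) ** n


F_PER_LEVEL = 0.05  # hypothetical per-level assembly failure rate

for levels in (1, 2, 4, 8):
    good = stack_yield(F_PER_LEVEL, levels)
    print(f"{levels}-Hi: yield {good:.1%}, stack failure rate {1 - good:.1%}")
```

With these assumed numbers a 4-Hi stack yields about 81% and an 8-Hi stack about 66%, before counting the logic die underneath.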

maddie · Jun 3, 2017

imported_jjj said:
HBM is way too hot to be stacked on high performance logic today, way too hot.
It's not about total power to begin with, to simplify it, it's about the hot spots. If you stack a hot thing on a very hot thing, they both get way hotter.

I'll include a slide for folks that don't have the time to watch the vid

Can you link the slide, it's a bit illegible? I'll look at it and reply with my thoughts.

imported_jjj · Jun 3, 2017

maddie said:
Can you link the slide, it's a bit illegible? I'll look at it and reply with my thoughts.

The slide is also in the video; here's the entire slide deck from that Xilinx presentation in PDF form (slides 14 and 15 are on thermal issues) https://www.hotchips.org/wp-content...1-HBM-package-Suresh_Ramalingam-Xilinx-v5.pdf
Stacking on high perf logic is an entirely different beast though, as there is a localized thermal issue in the areas where the dice overlap. My point was to show that the memory itself is hot. Some years down the road things will get there, but it's too soon.

maddie · Jun 3, 2017

imported_jjj said:
The slide is also in the video; here's the entire slide deck from that Xilinx presentation in PDF form (slides 14 and 15 are on thermal issues) https://www.hotchips.org/wp-content...1-HBM-package-Suresh_Ramalingam-Xilinx-v5.pdf
Stacking on high perf logic is an entirely different beast though, as there is a localized thermal issue in the areas where the dice overlap. My point was to show that the memory itself is hot. Some years down the road things will get there, but it's too soon.

I'm seeing an 8 degree delta between the top and lowest layer for a 4-Hi stack. Unfortunately I can't tell how much heat, in watts, is being transferred, as some is surely coming from the main logic die. I do see and agree with your point about the hot spots in the base layer as a limiting factor.

AtenRa · Jun 4, 2017

lolfail9001 said:
We both know how that 40% number is calculated, i wonder how does that translate in real life. Especially with power claim.

No, I don't know how that 40% is calculated, but since we know that embedded Raven Ridge will support up to DDR4-3200, that is 50% higher bandwidth than Bristol Ridge's DDR4-2133. Add all the Vega enhancements, plus 14nm LPP vs 28nm bulk, and 40% higher iGPU perf at half the power over Bristol Ridge comes very close to being within normal expectations.

mtcn77 · Jun 4, 2017

AtenRa said:
No, I don't know how that 40% is calculated, but since we know that embedded Raven Ridge will support up to DDR4-3200, that is 50% higher bandwidth than Bristol Ridge's DDR4-2133. Add all the Vega enhancements, plus 14nm LPP vs 28nm bulk, and 40% higher iGPU perf at half the power over Bristol Ridge comes very close to being within normal expectations.

704/512=1.375

imported_jjj · Jun 4, 2017

mtcn77 said:
704/512=1.375

Except that would mean a 40% increase in power too, but they claim half the power. To get from 140% to 50% they would need a 64% reduction in power per "perf". We don't know how perf is defined, FLOPS or gaming perf; likely gaming, since that favors the stats.
64% lower power from just the process is too much, and the math is more complex than that.
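The arithmetic behind the 64% figure, as a quick sketch. It assumes, as the post does, that power would scale linearly with shader count (704/512) at unchanged efficiency, and it rounds the perf gain to the claimed 40%.

```python
sp_ratio = 704 / 512        # shader count ratio quoted earlier in the thread
perf_ratio = 1.40           # claimed iGPU performance vs Bristol Ridge
power_ratio = 0.50          # claimed power vs Bristol Ridge

# If efficiency were unchanged, power would scale with perf (140%).
# Getting to 50% instead needs this reduction in power per unit of perf:
reduction = 1 - power_ratio / perf_ratio
perf_per_watt_gain = perf_ratio / power_ratio

print(f"SP ratio: {sp_ratio:.3f}")
print(f"power per perf reduction: {reduction:.1%}")
print(f"perf/W gain: {perf_per_watt_gain:.1f}x")
```

So the claim amounts to roughly a 2.8x perf/W improvement, however the contributions from process, architecture, and bandwidth are split.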

mtcn77 · Jun 4, 2017

imported_jjj said:
Except that would mean a 40% increase in power too, but they claim half the power. To get from 140% to 50% they would need a 64% reduction in power per "perf". We don't know how perf is defined, FLOPS or gaming perf; likely gaming, since that favors the stats.
64% lower power from just the process is too much, and the math is more complex than that.

14nm? It is that simple; there isn't much calibration to their former APUs. I find the OEM partners mostly to blame.

imported_jjj · Jun 4, 2017

mtcn77 said:
14nm? It is that simple; there isn't much calibration to their former APUs. I find the OEM partners mostly to blame.

LOL

Mopetar · Jun 4, 2017

Improved performance might also come from better memory bandwidth. Pick a benchmark that was memory starved and RR is going to look even better.

We don't know enough about Vega yet to definitively say how much of a leap that alone is over GCN 1.3, but we know how big the Zen cores will be, so if Vega is in the same ballpark it's not too difficult to imagine that there are a few scenarios where it looks that good.

majord · Jun 4, 2017

imported_jjj said:
LOL

Did you expect 14nm to yield zero performance/watt increase?

Polaris is at least 50% higher perf/watt over its direct 28nm cousin, but watt @ x clock and shader count is a bit 'untested', and may well be significantly higher. It's also unknown what the Vega architecture will bring to the table with bandwidth-starved APUs.

-Vega vs Tonga
-14nmFF vs 28nm bulk
-Higher memory bandwidth
-Potential for higher TDP share for the GPU (thanks to Zen's efficiency, and much, much smarter SoC power management)

All may well combine to equal a doubling in perf/watt, or to be more specific, perf/SoC TDP. And I don't mean in some highly theoretical best-case scenario either; it seems quite realistic, actually.

Perhaps think about it this way: based on what we know about RX Vega, it's clocked ~60% higher than an identically configured (SP count, ROP, etc.) Fiji at comparable TDP, and the educated assumption is that it will bring at least 50% higher performance.

RR, on the other hand, has a 34% higher SP count than BR, and therefore will barely need to be clocked any higher than current GCN 1.3 Bristol Ridge to achieve that 40% higher performance...

Consider that Watt/clock is always more advantageous than clock/Watt, and you can see how such claims are not so unrealistic.

imported_jjj · Jun 4, 2017

majord said:
Did you expect 14nm to yield zero performance/watt increase?

Polaris is at least 50% higher perf/watt over its direct 28nm cousin, but watt @ x clock and shader count is a bit 'untested', and may well be significantly higher. It's also unknown what the Vega architecture will bring to the table with bandwidth-starved APUs.

-Vega vs Tonga
-14nmFF vs 28nm bulk
-Higher memory bandwidth
-Potential for higher TDP share for the GPU (thanks to Zen's efficiency, and much, much smarter SoC power management)

All may well combine to equal a doubling in perf/watt, or to be more specific, perf/SoC TDP. And I don't mean in some highly theoretical best-case scenario either; it seems quite realistic, actually.

Perhaps think about it this way: based on what we know about RX Vega, it's clocked ~60% higher than an identically configured (SP count, ROP, etc.) Fiji at comparable TDP, and the educated assumption is that it will bring at least 50% higher performance.

RR, on the other hand, has a 34% higher SP count than BR, and therefore will barely need to be clocked any higher than current GCN 1.3 Bristol Ridge to achieve that 40% higher performance...

Consider that Watt/clock is always more advantageous than clock/Watt, and you can see how such claims are not so unrealistic.

You are entirely missing the point: the dude claimed a 64% reduction in power consumption from the process shrink and no gains except from increased core count.
And that was laughable, thus the LOL.
Too bad you can't read...

mtcn77 · Jun 4, 2017

imported_jjj said:
You are entirely missing the point: the dude claimed a 64% reduction in power consumption from the process shrink and no gains except from increased core count.
And that was laughable, thus the LOL.
Too bad you can't read...

How rude of you. In fact, 65% is what GF quotes. Core count increase is 37.5% and I'll let you in on a 2.5% IPC discount.
If you look at Anandtech's article, they specifically state one or the other;
Samsung 10nm:"This manufacturing process allowed the company to make its chips 30% smaller compared to ICs made using its 14LPE process as well as reducing power consumption by 40% (at the same frequency and complexity) or increase their frequency by 27% (at the same power and complexity)."
TSMC:"a ~50% higher transistor density, a 20% performance improvement at the same power and complexity or a 40% lower power consumption at the same frequency and complexity."

imported_jjj · Jun 5, 2017

mtcn77 said:
How rude of you. In fact, 65% is what GF quotes. Core count increase is 37.5% and I'll let you in on a 2.5% IPC discount.
If you look at Anandtech's article, they specifically state one or the other;
Samsung 10nm:"This manufacturing process allowed the company to make its chips 30% smaller compared to ICs made using its 14LPE process as well as reducing power consumption by 40% (at the same frequency and complexity) or increase their frequency by 27% (at the same power and complexity)."
TSMC:"a ~50% higher transistor density, a 20% performance improvement at the same power and complexity or a 40% lower power consumption at the same frequency and complexity."

Instead of just making the same funny claim over and over again, how about you get smarter and learn something.
The process doesn't give them 64% and AMD's claim is a combination of many factors that have been enumerated here a bunch of times by a bunch of folks, including myself a few pages back.
It's ok to be wrong, it's idiotic to persist in your mistake.

Insulting other members is not allowed.
Markfw
Anandtech Moderator

mtcn77 · Jun 5, 2017

imported_jjj said:
Instead of just making the same funny claim over and over again, how about you get smarter and learn something.
The process doesn't give them 64% and AMD's claim is a combination of many factors that have been enumerated here a bunch of times by a bunch of folks, including myself a few pages back.
It's ok to be wrong, it's idiotic to persist in your mistake.

Okay, you asked for it:
"GF’s technology is based on Samsung’s 14nm LPP (Low Power Process), and while it can perform some customization work to validate higher TDPs, AMD’s Zen CPUs are expected to top out around 95W TDP. GPUs, in contrast, can reach much higher values. It’s not uncommon for high-end graphics cards to hit 250W, and ultra-high-end cards can break 300W.

GlobalFoundries is claiming that its new process will deliver up to 50% gains in performance and 65% reduction in total power consumption."

NostaSeronx · Jun 5, 2017

mtcn77 said:
50% gains in performance and 65% reduction in total power consumption.

Recent numbers are 55% gains in performance or 65% reduction in power. From GF28HPP, which AMD had not bothered to use. (GF28SHP(Kaveri) -> GF28A(Carrizo, Bhavani(Desktop Puma), Carrizo-L(Desktop Puma+)) -> GF28HPA(Bristol Ridge, Stoney Ridge))

22FDX is more conventionally measured against AMD nodes;
<70% higher performance or ~75% lower power from AMD's current Bristol/Stoney Ridge node. (Of course these numbers are with biasing: 1.5V~1.8V. Which is more than ~1.3V peak that is seen with 28nm FDSOI designs.)

My beef @ AMD isn't for this thread though.

mtcn77 · Jun 5, 2017

NostaSeronx said:
Recent numbers are 55% gains in performance or 65% reduction in power. From GF28HPP, which AMD had not bothered to use. (GF28SHP(Kaveri) -> GF28A(Carrizo, Bhavani(Desktop Puma), Carrizo-L(Desktop Puma+)) -> GF28HPA(Bristol Ridge, Stoney Ridge))

22FDX is more conventionally measured against AMD nodes;
<70% higher performance or ~75% lower power from AMD's current Bristol/Stoney Ridge node. (Of course these numbers are with biasing: 1.5V~1.8V. Which is more than ~1.3V peak that is seen with 28nm FDSOI designs.)

My beef @ AMD isn't for this thread though.

I don't eat too much beef. Beef=megamitochondria.
Seriously, why no steady-state testing? Even Haswell with its integrated VR doesn't respond seamlessly. It is just faster to use it at a steady multiplier, with near-threshold voltage tuning done and the core not leaking unnecessarily under load.
It is too much arbitrage. Setting the offsets in the BIOS doesn't work either. PowerNow itself is the culprit of the stutter; power isn't the limit, as my TDP is more than enough to keep steady.

T1beriu · Jun 9, 2017

Possibly new Raven Ridge Engineering Samples APUs

It seems the new ES have a modified code name structure. Bottom image is the "old?" decipher.

Source: Videocardz

Valantar · Jun 9, 2017

Doesn't that look more or less like the same system, just that the table is missing some designations?
Z=Qual. Sample
M=Mobile
200=2.0GHz base
0="0th" revision (yeah, this doesn't make much sense)
C4=some TDP (35W?) Might the "A" in the TDP signify chips without GPUs, and "C" signify APUs? Seems counterintuitive, but who knows?
T=soldered? Mobile socket whatsitsname?
4=4 cores (yeah, I've got no explanation why they'd suddenly start spelling that out)
M=cache config? Something about the iGPU?
F2=stepping.

Of course, we have no confirmation that the cipher above is actually entirely correct. Might, for example, one of the letters in two-symbol combinations somewhere actually be a designation for the iGPU, that we simply didn't know as all chips until recently were pure CPUs?
