> I gather it means a reduction in margin due to moving down the product stack.

I'm leaning towards him talking about performance.
> I gather it means a reduction in margin due to moving down the product stack.

Agreed. Dual-sourcing Zen 3 on some inferior node? I don't think it would be worth it.
> I think it's just about Milan cores being that much better than Intel's best.

I think Rome is already better core-for-core than Intel's best, or at least equal. Not to mention blowing them away in cores per socket.
> Two Anandtech posts that might be relevant. It seems that 5nm is ramping very quickly, plus it's using a totally different fab than 7nm production, so no shared equipment.

Well, if TSMC cannot give AMD more wafers, then AMD has to be able to compete with Intel's leading-edge offerings either by using less silicon per package (better cores, cache structure, and thermal/power management to allow more frequency for a given package power rating), or by coming up with wafers from another source or node. TSMC has extra capacity in the 8 and 10nm nodes, but neither shares design rules with 7nm. Samsung is moving into N7 territory with its leading-edge node, but, again, with far different design rules. We've heard absolutely nothing in the rumor mill about AMD sourcing from Samsung.
> I think it's just about Milan cores being that much better than Intel's best.

I've wondered if AMD could find a way to fit a lower-performance Zen 3 CCD on GloFo's improved 12LP node. More specifically, could AMD put an eight-core CCX with 16 MB of L3 in a CCD on GloFo's 12LPP, and use a maximum of four of them on a pin-compatible EPYC package? 12LPP is supposed to be denser with better power characteristics than 12LP, and CCDs with half the L3 should be small enough to fit in the place of two N7 CCDs. This could be sold to the lower end of the market for I/O-heavy SKUs or just lower-performance packages.
That's obviously not likely to happen, but I think it's technically possible.
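As a rough feasibility check on that 12LPP-CCD idea, here is a back-of-the-envelope sketch. All inputs are assumptions, not published figures: a Zen 2 CCD of roughly 74 mm² on N7, about half of that die being L3/SRAM, and an assumed ~3x logic-density gap between N7 and a 12nm-class node.

```python
# Rough feasibility check for porting a half-L3 CCD to 12LPP.
# Every constant here is an assumption for illustration, not an official spec.
ZEN2_CCD_MM2   = 74.0   # assumed approx. Zen 2 CCD area on N7
CACHE_FRACTION = 0.5    # assumed share of the CCD occupied by L3/SRAM
DENSITY_RATIO  = 3.0    # assumed N7 : 12LPP logic-density ratio

# Halve the L3 (32 MB -> 16 MB), then scale the whole die to 12LPP.
n7_area_half_l3 = ZEN2_CCD_MM2 * (1 - CACHE_FRACTION / 2)
lpp_area        = n7_area_half_l3 * DENSITY_RATIO
budget          = 2 * ZEN2_CCD_MM2   # footprint of two N7 CCDs

print(f"~{lpp_area:.0f} mm^2 on 12LPP vs a ~{budget:.0f} mm^2 two-CCD footprint")
```

With these assumptions the 12LPP die slightly overshoots a two-CCD footprint, so the idea would hinge on how much denser 12LPP really is versus 12LP.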
@maddie: True, but if there is more 5nm capacity available, don't you think AMD would want to move too?
If TSMC can move enough of their mobile customers to 5nm, that would leave more N7+ wafers for AMD. AMD's move to 5nm may be rather sluggish.
I think whenever Intel still (rarely) wins vs Rome, it's because of cache hierarchy or AVX512. Milan should eliminate that first (sometimes) advantage.
> Current rumor is ~15% improvement on Int and ~10% on FP and a ~5% improvement on frequency.

They have stated a 50% FP performance boost, but Intel leads by quite a bit in some AVX benchmarks.
The 32 MB monolithic cache on each CCX should take care of almost all cases where Intel was leading due to cache size accessible from a single core. Top Intel part is 28 cores with 38.5 MB of cache.
> Except... it's not really true.

Haven't read most of the replies on this thread, but a '22 triple rate for yield makes sense, as 2022 will probably be when Zen 4 releases on AM5 with whatever the rumored spec sheet ends up being.
> AM5, in my opinion, is right there with SP4. It either doesn't exist or doesn't exist in the way people think it does.

Yeah, who would have thought a new process being compared to an older, still-active node would be only a fraction of the latter's size. Mind = blown. With AM5 being a new platform offering new tech, don't be surprised if Zen 4 launches 16-18 months after Zen 3, falling in line with your own suggestion of 2Q2022.
> They would have to achieve an insane leap in power efficiency to stack 4 logic dies on top of each other at desktop frequencies.

SP3 -> SP5 (Server X3D: 8x4 Hi => 256 cores)
AM4 -> AM6 (Desktop X3D: 2x4 Hi => 64 cores)
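Those speculative core counts are just stack arithmetic; a quick sketch, taking the poster's assumed configuration of 8 cores per CCD (none of these are announced products):

```python
# Speculative X3D core-count arithmetic from the roadmap guess above.
# CCD counts, stack heights, and cores per CCD are the poster's assumptions.
def stacked_cores(ccds: int, stack_height: int, cores_per_ccd: int = 8) -> int:
    """Total cores if each CCD site carries a vertical stack of logic dies."""
    return ccds * stack_height * cores_per_ccd

server  = stacked_cores(ccds=8, stack_height=4)  # "8x4 Hi"
desktop = stacked_cores(ccds=2, stack_height=4)  # "2x4 Hi"
print(server, desktop)  # 256 64
```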
> Well, we are still at least 1.5 years away from the stacked sockets.

They would have to achieve an insane leap in power efficiency to stack 4 logic dies on top of each other at desktop frequencies. Unless the stack is using some kind of interior thermal via like ICECool, the heat build-up would just be way too much to handle. At server frequencies, maybe. It seems more likely that they would start with 2-high stacks at 5nm to begin with, then 4-high at N3 or, more likely, N2.
What if the stacking doesn't happen on the CCDs? What if the stacking is on the large IO die? You wouldn't want to stack SRAM cache on the CCDs, as they need to dissipate heat at prodigious rates. However, while the IO die gets warm, it's not a huge energy sink. What if the IO die moves to the enhanced 12LPP process with a reduced Z-height and instead stacks a large L4 cache on top? If the CCDs maintain 32 MB of L3, and they stick with eight CCDs max, then the L4 would have to be 256 MB exclusive to be of any real use, or 1 GB inclusive. Can that much L4 SRAM even fit on a 12LPP chip that size?
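To put a rough number on that last question: a back-of-the-envelope area estimate, assuming a 14/12nm-class high-density SRAM bit cell of about 0.08 µm² (an assumed figure) and counting only the raw array, so no tags, sense amps, decoders, or redundancy, which in practice add substantial area on top:

```python
# Back-of-the-envelope: raw array area for an L4 SRAM die on a 12nm-class node.
# BITCELL_UM2 is an assumed high-density bit-cell size; real arrays add
# significant overhead for tags, sense amps, decoders, and redundancy.
BITCELL_UM2 = 0.08  # assumed 14/12nm-class HD SRAM bit cell, in um^2

def sram_array_mm2(megabytes: float) -> float:
    bits = megabytes * 2**20 * 8
    return bits * BITCELL_UM2 / 1e6  # 1 mm^2 = 1e6 um^2

for size_mb in (256, 1024):
    print(f"{size_mb} MB -> ~{sram_array_mm2(size_mb):.0f} mm^2 of raw array")
```

Even before overhead, a 1 GB inclusive L4 lands near reticle-limit territory, while 256 MB exclusive looks at least conceivable on a large IO-die-sized chip.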
> I would actually expect most of the improvement to be cache hierarchy re-design.

A 15% increase for the integer core points to significant changes. My guess is a slightly widened core (+1 ALU) among other refinements to match. Hopefully that 15% holds for more than just the multithreaded Int increase.
> Current rumor is ~15% improvement on Int and ~10% on FP and a ~5% improvement on frequency.

Basically an improved/enhanced Zen 2, rather than a next-generation Zen 3. So Milan should be launching as Zen2+ on N7e (N7 -> N7P -> N7e; note N7e != N7+/N6); if not, then there will be problems.
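If those rumored numbers hold, the IPC and frequency gains compound multiplicatively; a quick check of the implied throughput uplift (all percentages are the rumor quoted above, not confirmed figures):

```python
# Compound uplift implied by the rumored gains (rumor, not confirmed figures).
ipc_int, ipc_fp, freq = 0.15, 0.10, 0.05

int_uplift = (1 + ipc_int) * (1 + freq) - 1  # IPC and clock multiply
fp_uplift  = (1 + ipc_fp)  * (1 + freq) - 1
print(f"Int: ~{int_uplift:.1%}, FP: ~{fp_uplift:.1%}")
```

So roughly a 20% Int and 15% FP per-core throughput bump, if the rumor is accurate.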
> You could have cases where a huge number of threads are trying to share data. Technically, you could have up to 56 threads with SMT on sharing ~40 MB of cache with Intel. With AMD, it will be up to 16 threads with 32 MB. That is a big improvement, since current Zen 2 is 8 threads and 16 MB. I am still wondering if AMD is going to pull off some kind of larger-cache variant. There is still that one slide where they have "32+ MB L3" for Milan. It does look like there would be room for longer chips on the EPYC package.

Icelake-SP should increase that to 42 MB. Still, it's difficult to envision many scenarios where 1x42 MB will win out over 8x32 MB of L3.
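The trade-off in those two posts is easy to see as per-thread arithmetic. A small sketch, using the core counts and cache sizes quoted in this thread (the Icelake-SP and Milan figures are rumors, not official specs):

```python
# Cache capacity reachable per thread, using figures quoted in the thread.
# The Icelake-SP and Milan entries are rumored numbers, not official specs.
configs = {
    "Intel 28C (current top part)": {"threads": 56, "shared_mb": 38.5},
    "Intel Icelake-SP (rumored)":   {"threads": 56, "shared_mb": 42.0},
    "AMD Zen 2 CCX":                {"threads": 8,  "shared_mb": 16.0},
    "AMD Milan CCX (rumored)":      {"threads": 16, "shared_mb": 32.0},
}

for name, c in configs.items():
    per_thread = c["shared_mb"] / c["threads"]
    print(f"{name}: {c['shared_mb']} MB shared by {c['threads']} threads "
          f"-> {per_thread:.2f} MB/thread")
```

Intel's advantage is the large pool reachable by one thread; AMD's is more cache per thread once all cores are loaded, which is the 8x32 MB vs 1x42 MB argument above.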
> They would have to achieve an insane leap in power efficiency to stack 4 logic dies on top of each other at desktop frequencies. Unless the stack is using some kind of interior thermal via like ICECool, the heat build-up would just be way too much to handle. At server frequencies, maybe. It seems more likely that they would start with 2-high stacks at 5nm to begin with, then 4-high at N3 or, more likely, N2.

It doesn't seem like they would want to stack multiple logic dies at all. Stacked memory dies are much more likely, since they do not consume as much power. I have been wondering about stacking a cache chip with a CPU die, but SRAM takes a bit of power; it may be a non-starter unless they have come up with a way to cool it. There was an AMD patent a while back about using an integrated TEC to cool stacks of memory and logic, but that doesn't seem like it would be very effective. They seem to always have the logic die at the bottom. It seems like it would make more sense to put the logic die on top and the memory on the bottom. I wonder what prevents the memory die from being under the logic die.
> That'd be an ingenious approach to increasing the core count further, even with N7+ (or whatever node AMD now uses for Zen 3) not significantly increasing density over Zen 2/N7. And as always, smaller dies also mean higher yield. Though so far there was no single indication (or was there?) that Zen 3 is actually going to increase the core count.

I have wondered if it would make sense to have something like a 2-CCX chiplet with 16 cores stacked on top of a cache die that is just the L3 cache and fabric interface. It seems like you would want the CPU die on top for cooling. With a stacked die, you could have a much larger cache and also much shorter connections, since the cache would be right under (or on top of) the CPU cores rather than halfway across the chip. I wouldn't think TSVs would incur much of any latency penalty. You could probably do 64 MB per CCX in a very compact die stack. The current Zen 2 CCD is actually around 50% cache, so you would have double the cores and cache in roughly the same footprint. It may also make better use of available process tech, since the best process for SRAM cache may not be optimal for the CPU cores. Also, such a stacked die wouldn't need to be placed on an interposer. You could make the bottom SRAM die different depending on whether it is meant for an interposer or not.