Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging looks to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

nicalandia · Nov 8, 2021

Markfw said:
OMG... I would say that Intel has a real problem in the server area. No wonder Facebook (meta) contracted for a bunch of these.

Direct link to Microsoft Benches

Performance & Scalability of HBv3 VMs with Milan-X CPUs

Article contributed by Amirreza Rastegari, Jon Shelley, Jithin Jose, Evan Burness, and Aman Verma. A Preview program for Azure HBv3 VMs enhanced with AMD EPYC..

techcommunity.microsoft.com

uzzi38 · Nov 8, 2021

DisEnchantment said:
Gen-Z is in the PPR, so pretty much confirmed.

CCIX lost when AMD joined CXL, and CXL and Gen-Z agreed on an MoU.

EFB is inside a fanout package.
Some folks are saying the EFB bridges are derivatives of ASE/SPIL FOEB bridges. They look nothing like TSMC LSI bridges (which is polymer and RDL interconnect layers with the bridge), which brings more context when AMD said they invested heavily in bringing up the packaging supply chain (from last ER Q&A).
That would greatly help on cost and capacity.

Turns out there's a really nice comparison chart of FOEB vs EMIB from this article:

SPIL Fan-out Embedded Bridge (FOEB) Technology - 3D InCites

A closer look at SPILs Scalable chiplet package using FOEB package for the server, high-performance computing, router, and switcher markets.

www.3dincites.com

moinmoin · Nov 8, 2021

Markfw said:
OMG... I would say that Intel has a real problem in the server area.

Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.

Markfw · Nov 8, 2021

moinmoin said:
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.

And since Ryzen is derived from EPYC (my opinion), desktop will see Alder Lake crushed shortly... Maybe not with Zen3D, but certainly with Zen4.

nicalandia · Nov 8, 2021

Markfw said:
And since Ryzen is derived from EPYC (my opinion)

How is that even an opinion or question? AMD CCDs are so modular that it's basically the same from Ryzen desktop (not including the monolithic APUs) to EPYC. They are identical in design (top-binned CCDs go to EPYC).

DisEnchantment · Nov 8, 2021

Hello..., what is this strip running from end to end on top of the substrate?
I bet the renders are accurate since the package is more or less decided since samples are already shipped to folks.
This strange outline and the placement of SMDs is hinting this is not the same old Milan approach.

Not proportional because I changed the 3D perspective with GIMP.

Mopetar · Nov 8, 2021

tamz_msc said:
Still some info would have been nice. Workloads that see benefit from AVX-512 are more bound by system memory BW rather than cache, so Milan-X and Genoa/Bergamo are for different target applications.

I think they easily could have and should have announced it. One can always say they have something for another future event, but they don't lose anything announcing now. Anyone who really cares about AVX-512 right now isn't buying AMD anyway since they don't have it. The only company they could hurt by announcing support now would be Intel. Of course after everything they did announce maybe AMD just didn't feel like slipping the boot in.

DisEnchantment · Nov 8, 2021

Mopetar said:
I think they easily could have and should have announced it. One can always say they have something for another future event, but they don't lose anything announcing now. Anyone who really cares about AVX-512 right now isn't buying AMD anyway since they don't have it. The only company they could hurt by announcing support now would be Intel. Of course after everything they did announce maybe AMD just didn't feel like slipping the boot in.

I doubt AVX512 is any more important from an ISA perspective than the new SEV extensions (which they collectively called Infinity Guard)
SEV is actively used by Google, Microsoft, TenCent, others and any major updates are worth a mention (more than AVX512) but they didn't because there will be a time for that.

Ajay · Nov 8, 2021

DisEnchantment said:
I doubt AVX512 is any more important from an ISA perspective than the new SEV extensions (which they collectively called Infinity Guard)
SEV is actively used by Google, Microsoft, TenCent, others and any major updates are worth a mention (more than AVX512) but they didn't because there will be a time for that.

And all the large cloud customers certainly know more than us already.

tamz_msc · Nov 8, 2021

DisEnchantment said:
I doubt AVX512 is any more important from an ISA perspective than the new SEV extensions (which they collectively called Infinity Guard)
SEV is actively used by Google, Microsoft, TenCent, others and any major updates are worth a mention (more than AVX512) but they didn't because there will be a time for that.

SEV has major vulnerabilities because it leverages the PSP, which has been shown to be vulnerable to voltage-glitching attacks.

jamescox · Nov 8, 2021

BorisTheBlade82 said:
Yes, for sure. 8c per CCD have been pretty sure for a long time.
But regarding Bergamo: 8 CCD x 16c (2 CCX)?

I fear this is still only for GPUs. Also there is no way how the Milan 12CCD unit could be connected via silicon bridges - geometrically speaking 😉
Or were talking about Desktop/Mobile SKU?

First thought is two 8 core CCX on one CCD for Bergamo, but that doesn’t increase the number of cores per L3 cache. A 16-core CCX for Bergamo seems odd also. I am still wondering if it is 2 to 4 cores sharing larger L2 cache with a single 16 MB L3. They could go to a 4 core cluster, just at the L2 level. This would fit with the cache hierarchy statements.

I am also wondering how they will connect to the IO die. Are the Zen 4 CCDs going to have more / wider interconnect to the IO die? For 96 cores with 8-core CCDs, they need at least 12 links, which isn’t divisible by 8 Bergamo CCDs. 24 links would be though, so perhaps regular 8-core CCDs have 2 links and Bergamo CCDs have 3 links. More links means they can operate at lower clock / lower power. If they have less FP performance then their bandwidth requirements might be lower, but there is 2x the number of cores per CCD.
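To make the link arithmetic above concrete, here is a minimal sketch. The core and CCD counts come from the post; the function name and the idea that both packages share one IO-die link budget are only assumptions for illustration, not anything AMD has confirmed.

```python
from math import gcd

def ccd_count(total_cores: int, cores_per_ccd: int) -> int:
    """Number of CCDs needed for a given core count (must divide evenly)."""
    ccds, remainder = divmod(total_cores, cores_per_ccd)
    if remainder:
        raise ValueError("core count must be a whole number of CCDs")
    return ccds

genoa_ccds = ccd_count(96, 8)       # 96 cores on 8-core CCDs
bergamo_ccds = ccd_count(128, 16)   # 128 cores on 16-core CCDs

# Smallest link total that splits evenly across either CCD count
# (assumes one shared IO-die link budget, which is speculation).
shared_links = genoa_ccds * bergamo_ccds // gcd(genoa_ccds, bergamo_ccds)

print("Genoa CCDs:", genoa_ccds, "links per CCD:", shared_links // genoa_ccds)
print("Bergamo CCDs:", bergamo_ccds, "links per CCD:", shared_links // bergamo_ccds)
```

With those inputs the smallest shared budget is 24 links, which is where the 2-per-CCD versus 3-per-CCD split comes from.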

I have wondered for quite some time if they would branch the architectures since huge numbers of server applications have little to no use for massive (in die area and power) floating point units, so I have been expecting them to do some kind of cut down core. Not all servers are HPC machines. A similar argument holds for mobile, where massive FP units in the CPU are likely a waste of die area for most applications. If they can use super high density libraries, like what is used for the stacked cache chip, then perhaps they will not even take that big of a hit on cache size.

jamescox · Nov 8, 2021

DisEnchantment said:
Hello..., what is this strip running from end to end on top of the substrate?
I bet the renders are accurate since the package is more or less decided since samples are already shipped to folks.
This strange outline and the placement of SMDs is hinting this is not the same old Milan approach.

Not proportional because I changed the 3D perspective with GIMP.

Is this the same image as what is behind Lisa Su in the article here:

AMD Bergamo to hit 128 Cores and Genoa at 96 Cores

AMD Bergamo with up to 128 Zen 4c cores and AMD Genoa with up to 96 Zen 4 cores on track for AMD's next-generation of EPYC server CPUs

www.servethehome.com

These do not look like Bergamo. It appears to have 12 CCDs, which would be Genoa. Compared to Rome / Milan, this appears to have some surface mount capacitors in the middle, between the CPU chiplets, in addition to those along the top and bottom. Bergamo should have 8 CCDs only (for some reason).

There could be some surprise with that though. It isn’t coming out for a while, so perhaps it actually uses embedded silicon interconnect. Fitting 8 die close enough to the IO die for embedded silicon interconnects seems like it would be difficult, but it isn’t impossible given the dimensions of the normal Zen 4 CCD. Given the supposedly leaked specs, the IO die is 24.79 x 16 mm. The Zen 4 / Genoa CCD is 10.7 x 6.75. 24.79 divided by 4 is about 6.2 mm, so it isn’t that much of a stretch that they could put 4 die along each side, directly adjacent to the IO die with a slightly smaller or differently shaped die, or a larger IO die.
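The fit check in the paragraph above can be written out as a quick sketch. The dimensions are the supposedly leaked ones quoted in the post and are unconfirmed; the 4-dies-per-side layout and the helper names are assumptions made up for this example.

```python
# Package-geometry check using the leaked dimensions quoted in the post.
IOD_LONG_EDGE_MM = 24.79    # IO die long edge (leaked, unconfirmed)
CCD_SHORT_EDGE_MM = 6.75    # Zen 4 CCD short edge (leaked, unconfirmed)
DIES_PER_SIDE = 4           # 8 Bergamo CCDs, 4 along each long side (assumption)

def width_budget_mm(iod_edge_mm: float, dies: int) -> float:
    """Edge length available to each die if `dies` sit side by side along the IO die."""
    return iod_edge_mm / dies

def shortfall_mm(iod_edge_mm: float, dies: int, die_edge_mm: float) -> float:
    """Length by which the row of dies exceeds the IO die edge (<= 0 means it fits)."""
    return dies * die_edge_mm - iod_edge_mm

budget = width_budget_mm(IOD_LONG_EDGE_MM, DIES_PER_SIDE)
gap = shortfall_mm(IOD_LONG_EDGE_MM, DIES_PER_SIDE, CCD_SHORT_EDGE_MM)

print(f"budget per die: {budget:.2f} mm, CCD short edge: {CCD_SHORT_EDGE_MM} mm")
print(f"row of {DIES_PER_SIDE} dies overshoots the IO die edge by {gap:.2f} mm")
```

So a regular Zen 4 CCD turned short-edge-on is roughly half a millimetre too wide per die, which is why a slightly smaller or differently shaped die, or a larger IO die, would be needed.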

Another possibility is that the Bergamo CCD has little to no L3 cache and the IO die has the L3 or L4 cache. It might be made on 6 nm, so having large caches is plausible, like the 128 MB infinity caches on GPUs. It will need to be a different version of the IO die to use embedded silicon bridges (of some kind; I can’t keep the names straight), but that would fit in with the lower power usage and extreme density. The penalty for going to the IO would be much lower than with serdes based solutions. It might also be lower latency making a somewhat monolithic last level cache reasonable. The IO die might be of similar size, even with the cache. If they don’t have any serdes for the IFOP connections, that would likely save a lot of die area and power that could be used for caches.

eek2121 · Nov 8, 2021

leoneazzurro said:
And these instructions are quite probably disabled and reserved for Sapphire Rapids as for distributing properly the workload the core capability must be the same:

Intel Architecture Day 2021: Alder Lake, Golden Cove, and Gracemont Detailed

www.anandtech.com

And in fact, for activating the AVX512 support with Alder Lake, the E-cores had to be disabled in the AT test.
Look, I am not saying that for sure Bergamo will not have AVX512, but it is very unlikely that a dense, cloud-optimized design uses such an area/power-hungry feature, which is basically unused in the target workloads those CPUs should be optimized for.

Are you implying it will be added in Genoa only to be removed in Bergamo? We know Genoa has AVX-512 support.

I suspect the smaller Zen4 cores in Bergamo will either have a much smaller L2 and larger L3, or they will strip down some of the cores, and use the neat little trick described in their “big.little” patent for the instructions the small cores don’t support. I seriously doubt the chip won’t support AVX-512 at all. Bergamo lands around the time Intel catches up on process, assuming no delays.

moinmoin said:
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.

Depends on whether the performance increase carries across to other workloads, of course.

As far as SPR vs. Milan, SPR is definitely competitive on the IPC front, we know that already. Intel allows up to 4S, so technically Intel wins at core density as well. In the end it will come down to clocks and power consumption.

It seems like Milan-X may stomp SPR, however. We will see. I want to see general workloads. Database transactions, web service benchmarks, etc.

Zucker2k · Nov 9, 2021

Abwx said:
Some info:

Microsoft has issued documentation for the Milan-X HBv3 VMs with the following performance projections and VM size details and technical overview:

Up to 80% higher performance for CFD workloads

Up to 60% higher performance for EDA RTL simulation workloads

Up to 50% higher performance for explicit finite element analysis workloads

Up to 120 AMD EPYC 7V73X CPU cores (EPYC with 3D V-cache, “Milan-X”)

Up to 96 MB L3 cache per core (3x larger than standard Milan CPUs, and 6x larger than “Rome” CPUs)

350 GB/s DRAM bandwidth (STREAM TRIAD), up to 1.8x amplification (~630 GB/s effective bandwidth)

448 GB RAM

200 Gbps HDR InfiniBand (SRIOV), Mellanox ConnectX-6 NIC with Adaptive Routing

2 x 900 GB NVMe SSD (3.5 GB/s (reads) and 1.5 GB/s (writes) per SSD, large block IO)

AMD's EPYC Milan-X is Official: 3D V-Cache Brings Up To 768MB of L3 Cache, 64 Cores (Updated)

L3 Cache taken to the extreme

www.tomshardware.com
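A quick sanity check of the figures in the quoted list, as a rough sketch. The 350 GB/s, 1.8x, 96 MB, 3x and 6x numbers are from the list and the 768 MB total is from the article headline; the 8 CCDs per socket (64 cores at 8 cores per CCD) is an assumption used to tie them together.

```python
STREAM_TRIAD_GBS = 350   # measured DRAM bandwidth, from the quoted list
AMPLIFICATION = 1.8      # "up to" cache amplification factor, from the quoted list
L3_PER_CCD_MB = 96       # Milan-X L3 visible to a core, from the quoted list
CCDS_PER_SOCKET = 8      # assumption: 64 cores / 8 cores per CCD

effective_gbs = STREAM_TRIAD_GBS * AMPLIFICATION    # effective bandwidth claim
l3_per_socket_mb = L3_PER_CCD_MB * CCDS_PER_SOCKET  # should match the headline total
milan_l3_mb = L3_PER_CCD_MB / 3                     # implied by "3x larger"
rome_l3_mb = L3_PER_CCD_MB / 6                      # implied by "6x larger"

print(f"effective bandwidth: ~{effective_gbs:.0f} GB/s")
print(f"L3 per socket: {l3_per_socket_mb} MB")
print(f"implied L3 reachable by one core: Milan {milan_l3_mb:.0f} MB, Rome {rome_l3_mb:.0f} MB")
```

The numbers are self-consistent: the amplified bandwidth lands on the quoted ~630 GB/s and the per-socket L3 matches the 768 MB in the headline.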

Oh yeah, Intel is in trouble alright. AMD is going for the jugular here, and it'll be interesting to see how Intel responds.

This is a giant stride in computing. Kudos to AMD for being bullish with the way they keep pushing chip development on x86. Simply stupendous!

Edit: @Markfw what don't you like about my post?

Arkaign · Nov 9, 2021

Zucker2k said:
Oh yeah, Intel is in trouble alright. AMD is going for the jugular here, and it'll be interesting to see how Intel responds.

This is a giant stride in computing. Kudos to AMD for being bullish with the way they keep pushing chip development on x86. Simply stupendous!

The big thing that has saved Intel, considering how much better the Zen and Epyc families have been on so many levels vs the competition, is limited volume from TSMC. Especially with Apple, GPUs etc. competing for limited capacity, it's really bottlenecking the potential sales in a huge way.

Exciting times, Zen4 should be an absolute beast. Any idea when it will release?

uzzi38 · Nov 9, 2021

Arkaign said:
The big thing that has saved Intel, considering how much better the Zen and Epyc families have been on so many levels vs the competition, is limited volume from TSMC. Especially with Apple, GPUs etc. competing for limited capacity, it's really bottlenecking the potential sales in a huge way.

Exciting times, Zen4 should be an absolute beast. Any idea when it will release?

In 2022. Genoa is already sampling according to AMD.

beginner99 · Nov 9, 2021

eek2121 said:
Are you implying it will be added in Genoa only to be removed in Bergamo? We know Genoa has AVX-512 support.

Came here to suggest this. No one needs AVX-512 for most cloud deployments, so why not simply get rid of it and save die space and power? Hosting of web applications doesn't even really need AVX2, so they could even cut back on that on top.

Yes, you can do data science / compute tasks on the cloud, but then just offer an Intel machine or Genoa for that. This is the exception rather than the rule.

Joe NYC · Nov 9, 2021

BorisTheBlade82 said:
I fear this is still only for GPUs. Also there is no way how the Milan 12CCD unit could be connected via silicon bridges - geometrically speaking 😉
Or were talking about Desktop/Mobile SKU?

I agree. Looking at the picture of Genoa and arrangement of the chiplets, in the presentation, there is no way the chiplets could be connected to IOD via bridges.

Maybe in Zen 5

uzzi38 · Nov 9, 2021

Joe NYC said:
I agree. Looking at the picture of Genoa and arrangement of the chiplets, in the presentation, there is no way the chiplets could be connected to IOD via bridges.

Maybe in Zen 5

I don't. I think @DisEnchantment is absolutely on the ball in pointing out the strip on the Genoa diagram running under the surface of each package.

Joe NYC · Nov 9, 2021

moinmoin said:
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.

I don't think they are having a party at Ampere either, when they realized that AMD turned the Bergamo ship, and the guns are pointed directly at them...

Timorous · Nov 9, 2021

Arkaign said:
The big thing that has saved Intel, considering how much better the Zen and Epyc families have been on so many levels vs the competition, is limited volume from TSMC. Especially with Apple, GPUs etc. competing for limited capacity, it's really bottlenecking the potential sales in a huge way.

Exciting times, Zen4 should be an absolute beast. Any idea when it will release?

The possibility of having Milan-X for workloads that love cache and Genoa for more general workloads on different nodes is really going to help AMD's supply situation.

Joe NYC · Nov 9, 2021

Markfw said:
And since Ryzen is derived from EPYC (my opinion), desktop will see Alder Lake crushed shortly... Maybe not with Zen3D, but certainly with Zen4.

It seems that Microsoft was able to secure most of the early Zen3D production, to be able to make this big splash with the announcement (and immediate availability of some Milan-X based VMs).

Desktop was put on the sideline. We will see if it was a good decision on part of AMD. Probably a good short term financial decision, but losing some of the focus on desktop may have a cost...

gdansk · Nov 9, 2021

Timorous said:
The possibility of having Milan-X for workloads that love cache and Genoa for more general workloads on different nodes is really going to help AMD's supply situation.

One wonders, however, if there will be a "Genoa X" at some future date. I suppose that would come after 5nm supply improves. Or perhaps they can stack 7/6nm cache on 5nm CCD?

Joe NYC · Nov 9, 2021

uzzi38 said:
I don't. I think @DisEnchantment is absolutely on the ball in pointing out the strip on the Genoa diagram running under the surface of each package.

I got to that post by @DisEnchantment after I posted mine. Sounds intriguing, I did not know it was a possibility for EFB to span longer distance and possibly several chips ...

It would be fantastic if it was possible.

DisEnchantment · Nov 9, 2021

beginner99 said:
Came here to suggest this. No one needs AVX-512 for most cloud deployments, so why not simply get rid of it and save die space and power? Hosting of web applications doesn't even really need AVX2, so they could even cut back on that on top.

Yes, you can do data science / compute tasks on the cloud, but then just offer an Intel machine or Genoa for that. This is the exception rather than the rule.

Another way to look at this is that AMD did not take the brute-force approach when implementing AVX512.
They probably used multiple cycles within each FMA pipe or fused two pipes to do it with minimal transistor cost.
This sounds more like an approach they would rather take. Same story like Zen1 when supporting AVX2. You could see the fp blocks in Zen1 are tiny compared to Zen2 which added 256bit FMA pipes
So they did not introduce bloat in the design but at the same time did not fragment the ISA support across the SKUs which makes sense.
From a software point of view, it is full-featured x86-64-v4, which is what most cloud vendors could optimize their distros for.
Kind of reminds me of what Clark said about making CPUs for software of the future, and the future of x86 software is going to be targeted at x86-64-v4.

Anyway, now that they let the cat out, Linux patches can start coming in.
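For anyone wanting to check a machine against the x86-64-v4 level mentioned above, here is a minimal sketch. It only looks at the AVX-512 subsets that v4 adds on top of v3 (AVX512F, BW, CD, DQ, VL) and assumes the lowercase flag names Linux prints in /proc/cpuinfo; the function names are made up for the example, and it does not verify the lower levels.

```python
# AVX-512 subsets that the x86-64-v4 level adds on top of x86-64-v3.
V4_EXTRA_FLAGS = frozenset({"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"})

def missing_v4_flags(cpu_flags):
    """Return the v4-specific flags absent from `cpu_flags`, sorted by name."""
    return sorted(V4_EXTRA_FLAGS - set(cpu_flags))

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed (Linux only); empty set if unavailable."""
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

if __name__ == "__main__":
    missing = missing_v4_flags(read_cpu_flags())
    if missing:
        print("missing for x86-64-v4:", ", ".join(missing))
    else:
        print("all AVX-512 subsets required by x86-64-v4 are reported")
```

On a Zen 3 or Alder Lake box with AVX-512 fused off this should list all five flags as missing.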
