Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Page 112

Vattila

Senior member
Oct 22, 2004
805
1,394
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Reactions: richardllewis_01

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
OMG... I would say that Intel has a real problem in the server area. No wonder Facebook (Meta) contracted for a bunch of these.

Direct link to Microsoft Benches



 


uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146
Gen-Z is in the PPR, so pretty much confirmed.


CCIX lost out when AMD joined CXL, and CXL and Gen-Z agreed on an MoU.


EFB is inside a fanout package.
Some folks are saying the EFB bridges are derivatives of ASE/SPIL FOEB bridges. They look nothing like TSMC LSI bridges (which are polymer and RDL interconnect layers with the bridge), which adds context to AMD saying they invested heavily in bringing up the packaging supply chain (from the last ER Q&A).
That would greatly help on cost and capacity.

Turns out there's a really nice comparison chart of FOEB vs EMIB from this article:

 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
OMG... I would say that Intel has a real problem in the server area.
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,758
14,785
136
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.
And since Ryzen is derived from EPYC (my opinion), desktop will see Alder Lake crushed shortly... Maybe not with Zen 3D, but certainly with Zen 4.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136


Hello..., what is this strip running from end to end on top of the substrate?
I bet the renders are accurate since the package is more or less decided since samples are already shipped to folks.
This strange outline and the placement of SMDs is hinting this is not the same old Milan approach.

Not proportional because I changed the 3D perspective with GIMP.
 

Mopetar

Diamond Member
Jan 31, 2011
8,021
6,473
136
Still some info would have been nice. Workloads that see benefit from AVX-512 are more bound by system memory BW rather than cache, so Milan-X and Genoa/Bergamo are for different target applications.

I think they easily could have and should have announced it. One can always say they have something for another future event, but they don't lose anything announcing now. Anyone who really cares about AVX-512 right now isn't buying AMD anyway since they don't have it. The only company they could hurt by announcing support now would be Intel. Of course after everything they did announce maybe AMD just didn't feel like slipping the boot in.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
I think they easily could have and should have announced it. One can always say they have something for another future event, but they don't lose anything announcing now. Anyone who really cares about AVX-512 right now isn't buying AMD anyway since they don't have it. The only company they could hurt by announcing support now would be Intel. Of course after everything they did announce maybe AMD just didn't feel like slipping the boot in.
I doubt AVX512 is any more important from an ISA perspective than the new SEV extensions (which they collectively called Infinity Guard)
SEV is actively used by Google, Microsoft, TenCent, others and any major updates are worth a mention (more than AVX512) but they didn't because there will be a time for that.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
I doubt AVX512 is any more important from an ISA perspective than the new SEV extensions (which they collectively called Infinity Guard)
SEV is actively used by Google, Microsoft, TenCent, others and any major updates are worth a mention (more than AVX512) but they didn't because there will be a time for that.
And all the large cloud customers certainly know more than us already.
 
Reactions: prtskg

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
I doubt AVX512 is any more important from an ISA perspective than the new SEV extensions (which they collectively called Infinity Guard)
SEV is actively used by Google, Microsoft, TenCent, others and any major updates are worth a mention (more than AVX512) but they didn't because there will be a time for that.
SEV has major vulnerabilities because it leverages the PSP, which has been shown to be vulnerable to voltage-glitching attacks.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
Yes, for sure. 8c per CCD have been pretty sure for a long time.
But regarding Bergamo: 8 CCD x 16c (2 CCX)?


I fear this is still only for GPUs. Also there is no way how the Milan 12CCD unit could be connected via silicon bridges - geometrically speaking 😉
Or were talking about Desktop/Mobile SKU?

First thought is two 8 core CCX on one CCD for Bergamo, but that doesn’t increase the number of cores per L3 cache. A 16-core CCX for Bergamo seems odd also. I am still wondering if it is 2 to 4 cores sharing larger L2 cache with a single 16 MB L3. They could go to a 4 core cluster, just at the L2 level. This would fit with the cache hierarchy statements.

I am also wondering how they will connect to the IO die. Are the Zen 4 CCDs going to have more / wider interconnect to the IO die? For 96 cores with 8-core CCDs, they need at least 12 links, which isn’t divisible by the 8 Bergamo CCDs. 24 links would be, though, so perhaps regular 8-core CCDs have 2 links and Bergamo CCDs have 3 links. More links means they can operate at lower clock / lower power. If they have less FP performance then their bandwidth requirements might be lower, but there is 2x the number of cores per CCD.
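
The link-count reasoning above can be checked with a few lines of Python. This is only a rough sketch: it assumes 12 CCDs for the 96-core part (96 cores / 8 cores per CCD) and 8 CCDs for Bergamo, as in the post, and the function name is made up for illustration. It says nothing about how AMD actually wires the links.

```python
import math

GENOA_CCDS = 96 // 8   # 12 CCDs, assuming 8 cores per CCD
BERGAMO_CCDS = 8       # CCD count assumed in the post


def links_per_ccd(total_links, ccd_count):
    """Return links per CCD if they divide evenly, otherwise None."""
    if total_links % ccd_count != 0:
        return None
    return total_links // ccd_count


# Smallest link count that splits evenly across both layouts (Python 3.9+).
smallest_common = math.lcm(GENOA_CCDS, BERGAMO_CCDS)

for total in (12, smallest_common):
    print(total,
          links_per_ccd(total, GENOA_CCDS),
          links_per_ccd(total, BERGAMO_CCDS))
```

With 12 links the 8-CCD layout does not divide evenly, while 24 (the least common multiple of 12 and 8) gives 2 links per CCD for the 12-CCD part and 3 per CCD for the 8-CCD part.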

I have wondered for quite some time if they would branch the architectures since huge numbers of server applications have little to no use for massive (in die area and power) floating point units, so I have been expecting them to do some kind of cut down core. Not all servers are HPC machines. A similar argument holds for mobile, where massive FP units in the CPU are likely a waste of die area for most applications. If they can use super high density libraries, like what is used for the stacked cache chip, then perhaps they will not even take that big of a hit on cache size.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
View attachment 52649

Hello..., what is this strip running from end to end on top of the substrate?
I bet the renders are accurate since the package is more or less decided since samples are already shipped to folks.
This strange outline and the placement of SMDs is hinting this is not the same old Milan approach.

Not proportional because I changed the 3D perspective with GIMP.

Is this the same image as what is behind Lisa Su in the article here:


These do not look like Bergamo. It appears to have 12 CCD, which would be Genoa. Compared to Rome / Milan, this appears to have some surface mount capacitors in the middle, between the CPU chiplets, in addition to those along the top and bottom. Bergamo should have 8 CCD only (for some reason).

There could be some surprise with that though. It isn’t coming out for a while, so perhaps it actually uses embedded silicon interconnect. Fitting 8 die close enough to the IO die for embedded silicon interconnects seems like it would be difficult, but it isn’t impossible given the dimensions of the normal Zen 4 CCD. Given the supposedly leaked specs, the IO die is 24.79 x 16 mm. The Zen 4 / Genoa CCD is 10.7 x 6.75. 24.79 divided by 4 is about 6.2 mm, so it isn’t that much of a stretch that they could put 4 die along each side, directly adjacent to the IO die with a slightly smaller or differently shaped die, or a larger IO die.
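
As a quick sanity check of the geometry above, here is a small Python sketch using only the supposedly leaked dimensions quoted in the post. It assumes the dies sit edge to edge with no gap between them, which is optimistic, and the helper name is hypothetical.

```python
IOD_LONG_EDGE_MM = 24.79   # leaked IO die long edge, from the post
CCD_LONG_EDGE_MM = 10.7    # leaked Zen 4 CCD dimensions, from the post
CCD_SHORT_EDGE_MM = 6.75


def dies_that_fit(edge_mm, die_edge_mm):
    """How many dies of width die_edge_mm fit along edge_mm with zero spacing."""
    return int(edge_mm // die_edge_mm)


# Width available per die if four dies share the IO die's long edge.
slot_mm = IOD_LONG_EDGE_MM / 4

# How many standard CCDs fit with their short edge against the IO die.
fit = dies_that_fit(IOD_LONG_EDGE_MM, CCD_SHORT_EDGE_MM)

# How much longer the IO die edge would need to be for four standard CCDs.
shortfall_mm = 4 * CCD_SHORT_EDGE_MM - IOD_LONG_EDGE_MM

print(f"slot per die: {slot_mm:.2f} mm, standard CCDs that fit: {fit}, "
      f"shortfall for four: {shortfall_mm:.2f} mm")
```

Under these assumptions only three standard CCDs fit per side, and four would overshoot by a little over 2 mm, which is why a slightly narrower die or a larger IO die would be needed.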

Another possibility is that the Bergamo CCD has little to no L3 cache and the IO die has the L3 or L4 cache. It might be made on 6 nm, so having large caches is plausible, like the 128 MB infinity caches on GPUs. It will need to be a different version of the IO die to use embedded silicon bridges (of some kind; I can’t keep the names straight), but that would fit in with the lower power usage and extreme density. The penalty for going to the IO would be much lower than with serdes based solutions. It might also be lower latency making a somewhat monolithic last level cache reasonable. The IO die might be of similar size, even with the cache. If they don’t have any serdes for the IFOP connections, that would likely save a lot of die area and power that could be used for caches.
 

eek2121

Diamond Member
Aug 2, 2005
3,053
4,281
136
And these instructions are quite probably disabled and reserved for Sapphire Rapids, since distributing the workload properly requires all cores to have the same capabilities:


And in fact, for activating the AVX512 support with Alder Lake, the E-cores had to be disabled in the AT test.
Look, I am not saying that for sure Bergamo will not have AVX512, but it is very unlikely that a dense, cloud-optimized design would use such an area/power-hungry feature, which is basically unused in the target workloads those CPUs should be optimized for.
Are you implying it will be added in Genoa only to be removed in Bergamo? We know Genoa has AVX-512 support.

I suspect the smaller Zen4 cores in Bergamo will either have a much smaller L2 and larger L3, or they will strip down some of the cores, and use the neat little trick described in their “big.little” patent for the instructions the small cores don’t support. I seriously doubt the chip won’t support AVX-512 at all. Bergamo lands around the time Intel catches up on process, assuming no delays.
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.
Depends on whether the performance increase carries across to other workloads, of course.

As far as SPR vs. Milan, SPR is definitely competitive on the IPC front, we know that already. Intel allows up to 4S, so technically Intel wins at core density as well. In the end it will come down to clocks and power consumption.

It seems like Milan-X may stomp SPR, however. We will see. I want to see general workloads. Database transactions, web service benchmarks, etc.
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
Some info:

Microsoft has issued documentation for the Milan-X HBv3 VMs with the following performance projections and VM size details and technical overview:

  • Up to 80% higher performance for CFD workloads
  • Up to 60% higher performance for EDA RTL simulation workloads
  • Up to 50% higher performance for explicit finite element analysis workloads
  • Up to 120 AMD EPYC 7V73X CPU cores (EPYC with 3D V-cache, “Milan-X”)
  • Up to 96 MB L3 cache per core (3x larger than standard Milan CPUs, and 6x larger than “Rome” CPUs)
  • 350 GB/s DRAM bandwidth (STREAM TRIAD), up to 1.8x amplification (~630 GB/s effective bandwidth)
  • 448 GB RAM
  • 200 Gbps HDR InfiniBand (SRIOV), Mellanox ConnectX-6 NIC with Adaptive Routing
  • 2 x 900 GB NVMe SSD (3.5 GB/s (reads) and 1.5 GB/s (writes) per SSD, large block IO)
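
The bandwidth and cache figures in the list above are easy to cross-check. A minimal sketch in Python, taking the quoted numbers at face value (variable names are just for illustration):

```python
# Figures copied from the quoted Microsoft bullets.
DRAM_BW_GBS = 350        # STREAM TRIAD
AMPLIFICATION = 1.8      # "up to" factor
MILAN_X_L3_MB = 96

# Effective bandwidth implied by the amplification factor.
effective_bw = DRAM_BW_GBS * AMPLIFICATION

# Baseline L3 sizes implied by the "3x" and "6x" claims.
implied_milan_l3 = MILAN_X_L3_MB / 3
implied_rome_l3 = MILAN_X_L3_MB / 6

print(f"effective bandwidth: ~{effective_bw:.0f} GB/s")
print(f"implied standard Milan L3: {implied_milan_l3:.0f} MB")
print(f"implied Rome L3: {implied_rome_l3:.0f} MB")
```

The implied baselines of 32 MB and 16 MB line up with the L3 per CCX on standard Milan and Rome respectively, so the 3x and 6x ratios are consistent with the 96 MB figure.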

Oh yeah, Intel is in trouble alright. AMD is going for the jugular here, and it'll be interesting to see how Intel responds.

This is a giant stride in computing. Kudos to AMD for being bullish with the way they keep pushing chip development on x86. Simply stupendous!

Edit: @Markfw what don't you like about my post?
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,377
126
Oh yeah, Intel is in trouble alright. AMD is going for the jugular here, and it'll be interesting to see how Intel responds.

This is a giant stride in computing. Kudos to AMD for being bullish with the way they keep pushing chip development on x86. Simply stupendous!

The big thing that has saved Intel, considering how much better the Zen and Epyc families have been on so many levels vs the competition, is limited volume from TSMC. Especially with Apple, GPUs etc. competing for limited capacity, it's really bottlenecking the potential sales in a huge way.

Exciting times, Zen4 should be an absolute beast. Any idea when it will release?
 

uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146
The big thing that has saved Intel, considering how much better the Zen and Epyc families have been on so many levels vs the competition, is limited volume from TSMC. Especially with Apple, GPUs etc. competing for limited capacity, it's really bottlenecking the potential sales in a huge way.

Exciting times, Zen4 should be an absolute beast. Any idea when it will release?
In 2022. Genoa is already sampling according to AMD.
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
Are you implying it will be added in Genoa only to be removed in Bergamo? We know Genoa has AVX-512 support.
Came here to suggest this. No one needs AVX-512 for most cloud deployments, so why not simply get rid of it and save die space and power? Hosting of web applications doesn't even really need AVX2, so they could even cut back on that as well.

Yes, you can do data science / compute tasks on the cloud, but then just offer an Intel machine or Genoa for that. This is the exception rather than the rule.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
I fear this is still only for GPUs. Also there is no way how the Milan 12CCD unit could be connected via silicon bridges - geometrically speaking 😉
Or were talking about Desktop/Mobile SKU?

I agree. Looking at the picture of Genoa in the presentation and the arrangement of the chiplets, there is no way the chiplets could be connected to the IOD via bridges.

Maybe in Zen 5
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
Yeah, Intel had a problem before already that just got much bigger. This event kind of was a low blow against them: AMD now added not one, not two, but three new server lines (for Intel not) to look forward to, all while Intel is already down and Sapphire Rapids may not look that good against the current Milan to begin with, never mind Milan-X, Genoa and Bergamo.

I don't think they are having a party at Ampere either, when they realized that AMD turned the Bergamo ship, and the guns are pointed directly at them...
 
Reactions: scineram

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
The big thing that has saved Intel, considering how much better the Zen and Epyc families have been on so many levels vs the competition, is limited volume from TSMC. Especially with Apple, GPUs etc. competing for limited capacity, it's really bottlenecking the potential sales in a huge way.

Exciting times, Zen4 should be an absolute beast. Any idea when it will release?

The possibility of having Milan-X for workloads that love cache and Genoa for more general workloads, on different nodes, is really going to help AMD's supply situation.
 
Reactions: Tlh97 and Arkaign

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
And since Ryzen is derived from EPYC (my opinion), desktop will see Alder Lake crushed shortly... Maybe not with Zen 3D, but certainly with Zen 4.

It seems that Microsoft was able to secure most of the early Zen 3D production, to be able to make this big splash with the announcement (and immediate availability of some Milan-X based VMs).

Desktop was put on the sideline. We will see if it was a good decision on AMD's part. Probably a good short-term financial decision, but losing some of the focus on desktop may have a cost...
 
Reactions: Drazick

gdansk

Platinum Member
Feb 8, 2011
2,492
3,396
136
The possibility of having Milan-X for workloads that love cache and Genoa for more general workloads, on different nodes, is really going to help AMD's supply situation.
One wonders, however, if there will be a "Genoa X" at some future date. I suppose that would come after 5nm supply improves. Or perhaps they can stack 7/6nm cache on 5nm CCD?
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
Came here to suggest this. No one needs AVX-512 for most cloud deployments, so why not simply get rid of it and save die space and power? Hosting of web applications doesn't even really need AVX2, so they could even cut back on that as well.

Yes, you can do data science / compute tasks on the cloud, but then just offer an Intel machine or Genoa for that. This is the exception rather than the rule.
Another way to look at this is that AMD did not take the brute-force approach when implementing AVX512.
They probably used multiple cycles within each FMA pipe or fused two pipes to do it with minimal transistor cost.
This sounds more like an approach they would take. Same story as Zen 1 when supporting AVX2. You can see the FP blocks in Zen 1 are tiny compared to Zen 2, which added 256-bit FMA pipes.
So they did not introduce bloat in the design, but at the same time did not fragment the ISA support across the SKUs, which makes sense.
From a software point of view, it is full-featured x86-64-v4, which is what most cloud vendors could optimize their distros for.
Kind of reminds me of what Clark said about making CPUs for the software of the future, and the future of x86 software is going to be targeted at x86-64-v4.

Anyway, now that they let the cat out, Linux patches can start coming in.
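
For anyone who wants to check the v4 feature level mentioned above (x86-64-v4 in the psABI naming) on a Linux box, here is a minimal Python sketch. It assumes the flag spellings Linux uses in /proc/cpuinfo and only looks at the AVX-512 subsets that v4 adds on top of v3, so it is not a full microarchitecture-level check. The function names are made up for illustration.

```python
from pathlib import Path

# AVX-512 subsets that the x86-64 psABI requires for the v4 level,
# spelled the way Linux prints them in /proc/cpuinfo.
V4_ADDED_FLAGS = frozenset(
    {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}
)


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def missing_v4_flags(flags):
    """Return the v4-specific flags that are absent, sorted by name."""
    return sorted(V4_ADDED_FLAGS - set(flags))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_v4_flags(parse_flags(cpuinfo.read_text()))
        if missing:
            print("missing v4 flags:", ", ".join(missing))
        else:
            print("all v4-specific AVX-512 flags present")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

A chip that executes 512-bit operations over narrower pipes would still report all five flags, which is the point being made above: software sees the same ISA level either way.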
 