Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,394
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Reactions: richardllewis_01

inf64

Diamond Member
Mar 11, 2011
3,759
4,212
136
Regarding the 7600X vs 5800X3D gaming performance, I will leave just these two images here:





The above implies that the 7600X will at least tie or slightly edge out the 5800X3D in games, which is expected. This also means that even the lowest-tier Zen 4 with V-Cache (7800X3D?) will be in a different league of performance versus any other AMD (non-V-Cache) or Intel part (yes, Raptor Lake included).
 

Attachments

  • 1662136347778.png
    182.3 KB · Views: 11
  • 1662136497994.png
    195.7 KB · Views: 11

Saylick

Diamond Member
Sep 10, 2012
3,372
7,104
136
Going by Locuza's diagram and what AMD might pull off with regard to the Zen4c FP register size (compact FPU as found on PS5), this is how much smaller Zen4c will be compared to Zen4.

View attachment 67019

View attachment 67021

I just don't see how AMD will be able to halve the Zen4 CCD.
If I remember correctly, Zen 4c has similar IPC to Zen 3 so I would not be surprised if Zen 4 gets some of the enhancements scaled back, e.g. smaller mop cache. Not sure if Zen 4c is going to be literally Zen 3 but with a new name, but there's also probably going to be more extensive use of higher density libraries to get the area back down. Per leak, Bergamo clocks only in the low 2 GHz range, so there's really no penalty for just blatantly using ultra-high density libraries as much as possible.
 
Reactions: Tlh97

ryanjagtap

Member
Sep 25, 2021
110
132
96
Going by Locuza's diagram and what AMD might pull off with regard to the Zen4c FP register size (compact FPU as found on PS5), this is how much smaller Zen4c will be compared to Zen4.

View attachment 67019

View attachment 67021

I just don't see how AMD will be able to halve the Zen4 CCD.
What if they exclude the L3$ altogether and have a shared L2 instead? I've heard rumors that the Zen 4c CCD would be like Matisse with dual CCX. Would a design like a dual 8-core complex with a shared 8MB L2$ be feasible? [I don't know much about chip design; I just wanted to throw my thoughts into this, because like it or not L3$ takes up almost 50% of the die area. Sorry in advance if I'm wrong or what I'm suggesting is ridiculous.]
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
If I remember correctly, Zen 4c has similar IPC to Zen 3 so I would not be surprised if Zen 4 gets some of the enhancements scaled back, e.g. smaller mop cache. Not sure if Zen 4c is going to be literally Zen 3 but with a new name
I have not seen that report (Zen4c having similar IPC to Zen3). As far as I am aware, Zen4c is ISA-equivalent to Zen4 (AVX512, BF16, 1MiB L2 per core) but much more compact. We have seen how dense they can make the 3D Cache layer (twice as dense as the same-sized cache found on Zen3); perhaps they are applying that to the core? Meaning in the area of 8MiB they fit 16MiB.

My diagram of Zen4c: the first uses regular 5nm libraries with just half the L3$ and a compact FPU, and the one below uses super-dense SRAM libraries. It is still not half, but close to it.

 
Reactions: Kaluan

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
What if they exclude the L3$ altogether and have a shared L2 instead? I've heard rumors that the Zen 4c CCD would be like Matisse with dual CCX. Would a design like a dual 8-core complex with a shared 8MB L2$ be feasible? [I don't know much about chip design; I just wanted to throw my thoughts into this, because like it or not L3$ takes up almost 50% of the die area. Sorry in advance if I'm wrong or what I'm suggesting is ridiculous.]
Doing away with the L3$ altogether is not feasible; a big chunk of IPC is found there.

What would be more feasible is to use the same dense libraries found on the 3D Cache layer (twice as dense).
 
Reactions: ryanjagtap

Saylick

Diamond Member
Sep 10, 2012
3,372
7,104
136
I have not seen that report (Zen4c having similar IPC to Zen3)
Here, I'll snip a quote from Charlie at SemiAccurate (behind paywall) to elaborate. You're right, I misread his reporting.
The layout for Bergamo is simple, take the sIOD from Genoa and put on eight Bergamo CCDs and you have a very different device from its bigger brother. Each CCD has 16 cores rather than the 8 cores of a Genoa CCD but the L3 cache is the same 32MB on both. Interestingly, Bergamo CCDs take a page from the Naples CCD and split each physical die into two logical CCXs each with 8C and 16MB of L3. It wasn’t explicitly stated in the AMD OEM briefings but it is highly likely that the CCXs will not be able to talk to each other directly but instead communicate by going back to the sIOD. This strongly implies a split xGMI3 link like Naples and since it worked decently there, it should perform well enough on Bergamo.

The cores are the same ISA so everything that Genoa has, Bergamo has, and the sIOD is the same so the platform is the same SP5 base as well.

...

Now we come to the interesting data, performance. AMD is making some amazingly bold claims about Bergamo, they outright state that it, “Targets up to 2X improvement over Milan 7763 across ALL key foundational workloads”, their caps. For reference that 7763 is the top Milan SKU with 64C at 2.45/3.5GHz and pulls 280W TDP.

...

Slightly more nuanced is that if the Zen4 core has an IPC gain over the Zen3 core, and it does, that will allow AMD to drop clocks vs Milan at the same performance level and save even more power in the real world. This can be clawed back by higher performance or seen as lowered TCO for customers.

...

That jaw stuff comes when you turn off threading or what AMD calls SMT Off mode. In this mode, Bergamo has 2x the cores as Milan but the same thread count, not exactly impressive. That said AMD claims it will beat a 64C/128T 7763 by *60%*! Let me repeat that, a Bergamo running in SMT off mode will have the same thread count as the current top Milan but be overall 60% faster.
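For anyone who wants to sanity-check the core and thread counts implied by the quote, here is a rough sketch in Python. The variable names are made up for illustration; the inputs are only the figures stated in the quoted text (8 CCDs, 16 cores and 32MB L3 per CCD, two CCXs per CCD, and a 64C/128T Milan 7763).

```python
# Figures taken from the quoted SemiAccurate text; all names are illustrative.
BERGAMO_CCDS = 8
CORES_PER_CCD = 16
CCX_PER_CCD = 2
L3_PER_CCD_MB = 32

MILAN_7763_CORES = 64
MILAN_7763_THREADS = 128  # 64 cores with SMT on

bergamo_cores = BERGAMO_CCDS * CORES_PER_CCD
cores_per_ccx = CORES_PER_CCD // CCX_PER_CCD
l3_per_ccx_mb = L3_PER_CCD_MB // CCX_PER_CCD
l3_per_core_mb = L3_PER_CCD_MB / CORES_PER_CCD

# With SMT off each core runs one thread; with SMT on, two.
bergamo_threads_smt_off = bergamo_cores
bergamo_threads_smt_on = bergamo_cores * 2

print(f"Bergamo cores: {bergamo_cores}")
print(f"Per CCX: {cores_per_ccx} cores, {l3_per_ccx_mb} MB L3")
print(f"L3 per core: {l3_per_core_mb} MB")
print(f"Threads, SMT off / on: {bergamo_threads_smt_off} / {bergamo_threads_smt_on}")
print(f"Same thread count as Milan 7763 with SMT off: "
      f"{bergamo_threads_smt_off == MILAN_7763_THREADS}")
```

So the SMT-off comparison in the quote pits 128 single-threaded cores against 64 cores running two threads each.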
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Here, I'll snip a quote from Charlie at SemiAccurate (behind paywall) to elaborate. You're right, I misread his reporting.
Thanks for sharing. That is an impressive performance boost.

This is the smallest I could make the 16C/32T CCD, taking out the TSVs and using super-dense L3$ SRAM libraries (as found on 3D V-Cache parts), which are twice as dense as the HP libraries.

8C/16T Zen4 with 32MiB L3$ at the top

16C/32T with 16MiB super-dense L3$ and compact FPU at the bottom
 

eek2121

Diamond Member
Aug 2, 2005
3,042
4,258
136
Going by Locuza's diagram and what AMD might pull off with regard to the Zen4c FP register size (compact FPU as found on PS5), this is how much smaller Zen4c will be compared to Zen4.

View attachment 67019

View attachment 67021

I just don't see how AMD will be able to halve the Zen4 CCD.

They could simply be eliminating all the dark area that allows desktop parts to scale to a high frequency. They wouldn’t need that for a server part. Between that and maybe switching to a density-optimized library, it might be doable?

I am actually extremely interested in this as well. It is pretty impressive. I would really like to see a part where they stack all the L3. Such a design could save a ton of space as well.
 

LightningZ71

Golden Member
Mar 10, 2017
1,657
1,939
136
Wait, I'm not following something here. Why does the Bergamo CCX have to be half the size of the Genoa CCX? In the Genoa package, there are (up to) 12 CCDs (allowing 96 cores max), each with an 8 core CCX. That gives you essentially three "units" of space in each quadrant of the processor. To have an additional 8 core CCX, you only need to shrink the existing CCXs by 25%. That gives you .75 X 4, or roughly three units of space taken up. In addition, there is the space that's "wasted" by the edge of each CCD package and the air gap between them. That gets reclaimed a bit by only having two CCDs per quadrant. Are we expecting Bergamo to use the exact same package as Genoa? I hardly think that that makes sense given the link requirements of the different CCD layouts. It is still possible I guess, just that the two CCDs in Bergamo would have to use the inner and outer CCD positions from Genoa's setup and let the extra CCD size take up space in the middle position without using the contact pad. This would limit their link throughput to the IOD, but, maybe it's not that big of a deal in that market?
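To put numbers on the area argument above, here is a back-of-the-envelope sketch. It assumes one Genoa 8-core CCX counts as one "unit" of area and ignores die edges and the gaps between dies; the variable names are just for illustration.

```python
# One Genoa 8-core CCX = 1.0 "unit" of area (simplifying assumption).
QUADRANTS = 4
CORES_PER_CCX = 8
GENOA_CCX_PER_QUADRANT = 3    # 12 CCDs / 4 quadrants, one CCX per CCD
BERGAMO_CCX_PER_QUADRANT = 4  # one extra 8-core CCX per quadrant

# Area budget per quadrant if the package stays the same.
budget_units = GENOA_CCX_PER_QUADRANT * 1.0

# Largest relative CCX size that still fits four CCXs into that budget.
max_ccx_scale = budget_units / BERGAMO_CCX_PER_QUADRANT
required_shrink = 1.0 - max_ccx_scale
bergamo_units = BERGAMO_CCX_PER_QUADRANT * max_ccx_scale

genoa_cores = QUADRANTS * GENOA_CCX_PER_QUADRANT * CORES_PER_CCX
bergamo_cores = QUADRANTS * BERGAMO_CCX_PER_QUADRANT * CORES_PER_CCX

print(f"Each Bergamo CCX may be at most {max_ccx_scale:.2f}x a Genoa CCX "
      f"(a {required_shrink:.0%} shrink)")
print(f"Area used per quadrant: {bergamo_units} of {budget_units} units")
print(f"Cores: Genoa {genoa_cores}, Bergamo {bergamo_cores}")
```

Under these assumptions a 25% shrink per CCX is enough to keep the same silicon area per quadrant, which is a much lower bar than halving the CCX.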

Just halving the L3 and opting for the more compact FPU design that reduces its throughput somewhat is more than enough to make Bergamo possible. You don't even have to switch to denser libraries.

I'm also not astounded by Bergamo with SMT off having a bunch more throughput than Milan. Why wouldn't that be expected? In Milan, you have 64 cores with 128 threads. AMD's implementation of SMT has been shown to give about 20% more performance over having it turned off, load depending of course. Having twice as many physical cores of a newer design SHOULD be expected to produce significantly higher throughput numbers than a lower number of physical cores with SMT enabled. Yes, the smaller L3 can be a situational hit, but having the larger L2 can certainly help with some of that pain. I would actually be rather disappointed if it didn't achieve at the very least 50% more performance than the comparable Milan with half the cores. With SMT on, Bergamo should be able to just about double Milan (160% * 1.2 should be in the 192% range).
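The estimate above can be written out as a small sketch. The 20% SMT uplift is the rough figure from this post and the 1.6x number is the claim quoted earlier in the thread; the "implied per-core ratio" at the end is only my own derivation from those two inputs, not something AMD has stated, and the names are illustrative.

```python
SMT_UPLIFT = 1.20             # rough SMT gain, load dependent (assumption)
BERGAMO_SMT_OFF_VS_MILAN = 1.60  # claimed: Bergamo SMT off vs Milan 7763 SMT on

MILAN_CORES = 64
BERGAMO_CORES = 128

# Bergamo with SMT on, relative to Milan with SMT on.
bergamo_smt_on_vs_milan = BERGAMO_SMT_OFF_VS_MILAN * SMT_UPLIFT

# Express Milan (SMT on) in "SMT-off Milan core" equivalents.
milan_core_equivalents = MILAN_CORES * SMT_UPLIFT

# Per-core throughput of a Bergamo core (SMT off) relative to a Milan core
# (SMT off) that would be needed to hit the 1.6x claim.
implied_per_core_ratio = (
    BERGAMO_SMT_OFF_VS_MILAN * milan_core_equivalents / BERGAMO_CORES
)

print(f"Bergamo SMT on vs Milan SMT on: {bergamo_smt_on_vs_milan:.0%}")
print(f"Milan SMT on is worth about {milan_core_equivalents:.1f} SMT-off cores")
print(f"Implied per-core ratio for the 1.6x claim: {implied_per_core_ratio:.2f}")
```

In other words, under these assumptions each Bergamo core only needs roughly Milan-level single-thread throughput at its own clocks for the 60% claim to hold.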

None of that seems outlandish...
 
Reactions: Tlh97 and Vattila

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Wait, I'm not following something here. Why does the Bergamo CCX have to be half the size of the Genoa CCX? In the Genoa package, there are (up to) 12 CCDs (allowing 96 cores max), each with an 8 core CCX. That gives you essentially three "units" of space in each quadrant of the processor. To have an additional 8 core CCX, you only need to shrink the existing CCXs by 25%. That gives you .75 X 4, or roughly three units of space taken up. In addition, there is the space that's "wasted" by the edge of each CCD package and the air gap between them. That gets reclaimed a bit by only having two CCDs per quadrant.
I just don't see the room on the CPU package with the 4 quadrants. Let me find the data again.
 

SteinFG

Senior member
Dec 29, 2021
517
608
106
I have not seen that report (Zen4c having similar IPC to Zen3). As far as I am aware, Zen4c is ISA-equivalent to Zen4 (AVX512, BF16, 1MiB L2 per core) but much more compact. We have seen how dense they can make the 3D Cache layer (twice as dense as the same-sized cache found on Zen3); perhaps they are applying that to the core? Meaning in the area of 8MiB they fit 16MiB.

My diagram of Zen4c: the first uses regular 5nm libraries with just half the L3$ and a compact FPU, and the one below uses super-dense SRAM libraries. It is still not half, but close to it.
Papermaster said in the Ryzen 7000 reveal stream that Zen4c has half the "core" area. I assume they also excluded the TSVs and that there's 2MB L2 per core.
The 3D cache layer is dense because there's almost no logic, no ring, no shadow tags etc.
It'll definitely be interesting to see the floorplan if they ever show it.
 

Saylick

Diamond Member
Sep 10, 2012
3,372
7,104
136
This is an X-ray picture taken of the Genoa package. Can someone with more brains than me fit 128 cores on that package with 8 chiplets that are not twice as big as the normal ones?

View attachment 67033


I am thinking that two large chiplets that carry 16 Cores each could fit this way

View attachment 67034
You're probably right. Probably 8 larger CCDs for Bergamo. They'll need to adjust the substrate/routing of Infinity Fabric to the CCDs but the underlying pins and socket don't need to change.
 

FangBLade

Member
Apr 13, 2022
199
395
106

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
So after everything that's happened around first party benchmarks over the past 5 years, now you're saying AMD is pulling a stunt here? You have got to be kidding.

You think AMD is beyond shenanigans? That's very naive. Although I was admittedly wrong in my assessment concerning Zen 4's FCLK being 3 GHz, the fact is that the 12900K can and does benefit from memory faster than DDR5-6000, whereas the Zen 4 rig was configured optimally with an overclocked FCLK set at 2 GHz (default is 1733 MHz), and an overclocked UCLK specifically to exploit the DDR5-6000 in 1:1 config.

We don't even know what the latency is like in a 2:1 config, which will be used by those wanting to achieve higher memory clocks. But if AMD is putting an emphasis on 1:1 ratio, then 2:1 is likely going to be suboptimal.
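As a rough illustration of what 1:1 versus 2:1 means for the clocks being discussed, here is a small sketch. It assumes the usual convention that the DDR5 memory clock is half the data rate and that the ratio is memory clock to UCLK; the function name is made up for this example, and the FCLK values are just the ones mentioned above.

```python
def uclk_mhz(data_rate_mts: int, memclk_per_uclk: int) -> float:
    """Memory controller clock for a given DDR data rate and MEMCLK:UCLK ratio.

    Assumes MEMCLK = data rate / 2 (double data rate) and
    UCLK = MEMCLK / ratio, where ratio is 1 for 1:1 and 2 for 2:1.
    """
    memclk = data_rate_mts / 2
    return memclk / memclk_per_uclk


DATA_RATE = 6000     # DDR5-6000
FCLK_DEFAULT = 1733  # MHz, per the post above
FCLK_TUNED = 2000    # MHz, per the post above

for ratio in (1, 2):
    print(f"DDR5-{DATA_RATE} at {ratio}:1 -> UCLK {uclk_mhz(DATA_RATE, ratio):.0f} MHz")

print(f"FCLK overclock: {FCLK_TUNED / FCLK_DEFAULT - 1:.1%} over default")
```

Dropping to 2:1 halves the memory controller clock at the same data rate, which is why it would be expected to cost latency unless the memory speed goes up a lot.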

The AMD EXPO slide said "low latency DDR5 down to 63ns" (ostensibly in gear 1), which is already higher than what Intel can achieve in gear 2.
 
Reactions: Kaluan

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
When you're saying it's 'more' than what ADL can achieve, you're forgetting that Zen4 still has its memory controller off-CCD and still achieving comparable latency despite the fact, while keeping all the benefits of utilizing a chiplet design.

I'd call that a huge win in that area.

Yes but it's likely running in gear 1 mode, which you can't do on Intel CPUs with DDR5 as it's limited to gear 2 only.

Disabling the E-cores and overclocking the ring bus can result in latencies as low as the 40s (ns) with Alder Lake, which is as close to gear 1 as you can get.
 