Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,394
136
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) and new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Last edited:
Reactions: richardllewis_01

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Don’t know if this has already been discussed. I just noticed that the model listing on Wikipedia has 2, 3, and 6 chiplet devices listed.

Don't trust Wikipedia. It can be edited by a Monkey.

The lowest chiplet count is 2 per package.

Any CPU model with 384 MiB of L3$ is a 12-chiplet design.
Any CPU model with 256 MiB of L3$ is an 8-chiplet design.
Any CPU model with 128 MiB of L3$ is a 4-chiplet design.
Any CPU model with 64 MiB of L3$ is a 2-chiplet design.



In that case, the EPYC 9174F is a high-speed CPU, boosting up to 4.4 GHz, with an 8-chiplet design and two cores active per chiplet.
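As a quick sanity check, that mapping is just total L3 divided by the 32 MiB a full Zen 4 CCD carries. A minimal sketch in Python, with the 32-MiB-per-CCD figure as the assumption (later posts in this thread show it does not hold for every SKU):

```python
# Rough rule of thumb, assuming 32 MiB of L3 per Zen 4 CCD. This is an
# assumption, not an AMD spec -- some SKUs apparently ship with only 16 MiB
# of L3 enabled per CCD, as discussed further down the thread.
L3_PER_CCD_MIB = 32

def chiplet_count(total_l3_mib: int) -> int:
    """Infer the CCD count of an EPYC SKU from its total L3 cache."""
    return total_l3_mib // L3_PER_CCD_MIB

for l3 in (384, 256, 128, 64):
    print(f"{l3} MiB L3 -> {chiplet_count(l3)} CCDs")
# 384 -> 12, 256 -> 8, 128 -> 4, 64 -> 2
```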

 

MarkPost

Senior member
Mar 1, 2017
239
345
136
That's Alder Lake with DDR4, so it's not the same. I appreciate the reference though. This is from TPU:




And you can check out Phoronix's results HERE for another point of reference which is much more exhaustive.

Sorry to say, but those TPU compiling benchmarks are crap, plain and simple. Did you notice that (for example) the 7900X and 7950X finish in basically the same time? No way. Compiling tasks take advantage of the number of threads, and of course a 7950X will be faster than a 7900X (the same goes for other models).

I've compiled Unreal 4 on my 7950X, both with its full 16/32 configuration and simulating a 7900X (deactivating two cores on each CCD). CPU frequency is stock (in my system, 5.1-5.2 GHz while compiling). This is the comparison between them:



As can be seen, the 16/32 is clearly faster than the 12/24 (not far off 20%), not barely faster as TPU shows. I really would like to know how in hell they configure their benchmarks.
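As a rough plausibility check on that gap (a sketch, assuming compile throughput scales near-linearly with core count, which is generous but not unreasonable for large Unreal builds):

```python
# Quick check: what should a 7950X (16C/32T) vs. a simulated 7900X (12C/24T)
# look like if the build is actually thread-bound? Scaling assumption only.
cores_7950x, cores_7900x = 16, 12

ideal_speedup = cores_7950x / cores_7900x   # ~1.33x with perfect scaling
observed_speedup = 1.20                     # the roughly 20% gap reported above

print(f"ideal:    {ideal_speedup:.2f}x")
print(f"observed: {observed_speedup:.2f}x")
# ~20% faster on 33% more cores is plausible for a compile job; a near-zero
# gap (as in the TPU numbers) would mean the benchmark isn't thread-bound.
```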

About Phoronix: well, both are within margin-of-error times, but we have to keep in mind that the 13900K is running with unlimited power enabled, so it's not apples to apples from a performance POV.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Can we keep the Intel chat out of this thread named "Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)" ?

Surprised a moderator has not stepped in to kick out the Intel brigade from here.

Just create a thread titled "Zen 4 vs Alder Lake" and post to your heart's content....

Yeah I agree. I'm bowing out. Had some good debates but it's clear no one was going to be convinced and we shouldn't be discussing Intel vs AMD in this thread.
 
Reactions: therealmongo

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
So I asked Locuza for an annotation of the sIOD. It looks as if there really are only 12 GMI links. Regarding Bergamo, we are back to square one: either they use a "narrow narrow mode" (only one link for 16 cores and 2 CCXs), or they might use an entirely different IOD and maybe another implementation of the interconnect.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,413
136
So I asked Locuza for an annotation of the sIOD. It looks as if there really are only 12 GMI links. Regarding Bergamo, we are back to square one: either they use a "narrow narrow mode" (only one link for 16 cores and 2 CCXs), or they might use an entirely different IOD and maybe another implementation of the interconnect.
It will just be one.

People get hung up on bandwidth, but it's stupid. Unless your workload is streaming or otherwise prefetcher-friendly at a high cache level, you will be bound by the total number of outstanding memory requests anyway. Then look at the target market and what workloads get run on those machines, and move on.
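To put a number on that (a back-of-the-envelope sketch via Little's Law; the per-core miss count and latency below are illustrative assumptions, not measured Genoa figures):

```python
# Little's Law: sustained bandwidth = outstanding requests * line size / latency.
# The MLP and DRAM latency values are assumptions for illustration only.
line_size_bytes = 64
outstanding_misses_per_core = 10   # assumed memory-level parallelism per core
dram_latency_s = 90e-9             # assumed loaded DRAM latency

bw_per_core = outstanding_misses_per_core * line_size_bytes / dram_latency_s
print(f"~{bw_per_core / 1e9:.1f} GB/s per core")   # ~7.1 GB/s

# With 16 cores behind one GMI link, the cores' ability to keep requests in
# flight -- not the link's raw bandwidth -- is often the real ceiling.
```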
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
It will just be one.

People get hung up on bandwidth, but it's stupid. Unless your workload is streaming or otherwise prefetcher-friendly at a high cache level, you will be bound by the total number of outstanding memory requests anyway. Then look at the target market and what workloads get run on those machines, and move on.
Not arguing with your opinion; I am just having a hard time understanding what those "typical cloud workloads" might be. Surely some AI/ML, databases, and storage, but isn't VOD streaming a huge part of that market as well?
 

JustViewing

Member
Aug 17, 2022
159
268
96
Not arguing with your opinion; I am just having a hard time understanding what those "typical cloud workloads" might be. Surely some AI/ML, databases, and storage, but isn't VOD streaming a huge part of that market as well?
I think they are for VM/Docker-style workloads running microservices. Each VM will likely contain 2-4 cores, and the main objective is running a number of microservices in parallel. In these types of workloads, high single-threaded performance is not the most important part; network latency and database access likely play a bigger part than single-threaded performance.

While one can argue that fewer high-performance cores can do a similar job, that would force sharing cores between different services running in VMs/Docker. With all the side-channel security holes, I doubt anyone would want to share cores.
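A minimal sketch of what that isolation can look like in practice, assuming a Linux host where sysfs exposes SMT siblings and containers are pinned with plain `docker run --cpuset-cpus`; the service names are hypothetical:

```python
# Sketch: give each container one dedicated physical core plus its SMT sibling,
# so no two services ever share a core (side-channel hygiene). Reads the
# standard Linux sysfs topology files; service names are made up.
from pathlib import Path

def smt_siblings(cpu: int) -> str:
    # e.g. "0,128" on a 128-core / 256-thread Bergamo-class part
    p = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return p.read_text().strip()

services = ["auth-svc", "cart-svc", "search-svc"]   # hypothetical microservices
for i, svc in enumerate(services):
    print(f"docker run --cpuset-cpus={smt_siblings(i)} --name {svc} ...")
```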
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
Don't trust Wikipedia. It can be edited by a Monkey.

The lowest chiplet count is 2 per package.

Any CPU model with 384 MiB of L3$ is a 12-chiplet design.
Any CPU model with 256 MiB of L3$ is an 8-chiplet design.
Any CPU model with 128 MiB of L3$ is a 4-chiplet design.
Any CPU model with 64 MiB of L3$ is a 2-chiplet design.


In that case, the EPYC 9174F is a high-speed CPU, boosting up to 4.4 GHz, with an 8-chiplet design and two cores active per chiplet.

I don't trust Wikipedia either. The Wikipedia article has the 9224 as a 3-chiplet device with 96 MB of cache and 24 cores, while the AMD link has it at 64 MB of cache. It can't be a 2-chiplet device with 24 cores, and I don't think it would be a 3-chiplet device, so I expect it is actually a 4-chiplet device with only 16 MB of L3 per chiplet. Not much else makes sense, so not all 64 MB cache devices are 2 chiplets. I am wondering if the 9124 (non-F) is also 4 chiplets with 4 cores each and only 16 MB of L3 per core. Edit: per CCD.

Wikipedia also has the 9454 as 6 chiplets x 8 cores with 192 MB, rather than 8 chiplets x 6 cores. Since AMD lists it with 256 MB, it is likely 8 chiplets x 6 cores.
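That line of reasoning can be written down as a small consistency check (a sketch; the assumptions are that a Zen 4 CCD has at most 8 cores and ships with either its full 32 MiB of L3 or a cut-down 16 MiB):

```python
# Enumerate chiplet configurations consistent with a SKU's core count and
# total L3. Assumptions: at most 8 cores per CCD, and either 32 MiB or a
# cut-down 16 MiB of L3 enabled per CCD.
def plausible_configs(cores: int, l3_mib: int):
    for ccds in range(1, 13):
        if cores % ccds or cores // ccds > 8 or l3_mib % ccds:
            continue
        l3_per_ccd = l3_mib // ccds
        if l3_per_ccd in (16, 32):
            yield ccds, cores // ccds, l3_per_ccd

# EPYC 9224: 24 cores, 64 MiB -> only (4 CCDs, 6 cores, 16 MiB) fits, so
# "64 MiB = 2 chiplets" cannot hold for every model.
print(list(plausible_configs(24, 64)))   # [(4, 6, 16)]
print(list(plausible_configs(16, 64)))   # 9124: [(2, 8, 32), (4, 4, 16)]
```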
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
I think they are for VM/Docker-style workloads running microservices. Each VM will likely contain 2-4 cores, and the main objective is running a number of microservices in parallel. In these types of workloads, high single-threaded performance is not the most important part; network latency and database access likely play a bigger part than single-threaded performance.

While one can argue that fewer high-performance cores can do a similar job, that would force sharing cores between different services running in VMs/Docker. With all the side-channel security holes, I doubt anyone would want to share cores.
User-facing microservices are often very performance sensitive, and for code that's often single thread bound, that means an affinity for higher performance cores. Yes, network and database latency are huge hits, but compute is not insignificant. But obviously, very application dependent.
 

JustViewing

Member
Aug 17, 2022
159
268
96
User-facing microservices are often very performance sensitive, and for code that's often single thread bound, that means an affinity for higher performance cores. Yes, network and database latency are huge hits, but compute is not insignificant. But obviously, very application dependent.
While what you said is true, they can often load-balance microservices by spinning up more instances when demand is high, so the high core count of Bergamo will help in these situations.

Also, Bergamo is by no means a low-performance core; it will most likely deliver 70%-90% of the bigger core.
 
Last edited:

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,413
136
Not arguing with your opinion; I am just having a hard time understanding what those "typical cloud workloads" might be. Surely some AI/ML, databases, and storage, but isn't VOD streaming a huge part of that market as well?
The space where you actually want high memory bandwidth per core is highly oversubscribed VM farms. Especially in a non-cloud environment, 32 cores per socket is the best place to be. I'm not seeing any real issues here.
 
Reactions: MadRat

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Regarding whether Bergamo reuses Genoa's sIOD or gets a new one: I think everybody would agree that Bergamo is a bigger deal than Siena, and the latter has to get a new sIOD design anyway due to running on a different platform with a different I/O spec. It may well be the case that Bergamo changes the package structure (possibly even in preparation for Turin), which could explain a lot of things: Genoa's sIOD not being prepared for it, Zen 4c dies repeatedly being said not to appear on client PC systems, etc.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Again, did you even analyze the numbers?
Did you even understand what I wrote ?
"There is little variance among reviewers for RPL numbers. There is huge variance for Zen 4"

If the tune-up of the systems is "not rocket science", then why do your preferred outlets have much lower numbers for Zen 4?
No, I am not underestimating the competence of most youtubers. If anything, I am giving them too much credit by calling them clueless as opposed to downright incompetent.
It is all about the clicks, and getting a review up faster than the competition gives them more clicks.
Again, why are they getting much lower numbers for Zen 4? It is just basic "not rocket science stuff", right?

Go and read the RPL review by Techspot and check the Zen 4 numbers.
Now go check one by the sites you shill for and compare the Zen 4 numbers.
The numbers by Techspot are much higher, because these guys did the basic "not rocket science stuff".

You were totally unable to tell me how the reviewers you shill for configured their Zen 4 systems.
Here is the answer for you:
"out-of-the-box BIOS with drivers installed by Windows from a master Intel image"

P.S. It is not just one game that might be bugged; Techspot showed Zen 4 ahead across the board. If their RPL numbers were lower, you could claim that was the reason. Their RPL numbers are the same as everyone else's, but their Zen 4 numbers are higher, so this is the key. And they are not the only ones getting higher Zen 4 numbers.
You got it all backwards... getting higher results with Zen4 is not because of setting up the system right or better.

Better results for AMD hardware are just glitches.
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
Regarding whether Bergamo reuses Genoa's sIOD or gets a new one: I think everybody would agree that Bergamo is a bigger deal than Siena, and the latter has to get a new sIOD design anyway due to running on a different platform with a different I/O spec. It may well be the case that Bergamo changes the package structure (possibly even in preparation for Turin), which could explain a lot of things: Genoa's sIOD not being prepared for it, Zen 4c dies repeatedly being said not to appear on client PC systems, etc.
I am not sure if Siena will get its own IOD. Rather, I can imagine them harvesting Genoa's IOD. That thing is pretty big at ~400 mm² on 6 nm, so it might be worth it.

Regarding Bergamo: I am now back to expecting its own IOD, different packaging, the same new interconnect as Zen 5, and re-use of Zen 4c later on as the little part in client 😉
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I am not sure if Siena will get its own IOD. Rather, I can imagine them harvesting Genoa's IOD. That thing is pretty big at ~400 mm² on 6 nm, so it might be worth it.
I doubt that. It would be a repeat of the non-Pro Threadripper: Essentially a budget platform reusing an IOD of which only half is actually used. Non-Pro Threadripper failed since there was too little profit in it while requiring dies in high demand in the high profit server market. I can't imagine AMD wanting to repeat that with Siena.
 

Tigerick

Senior member
Apr 1, 2022
686
576
106
Based on leaks, Bergamo still uses SP5 and positions itself above Genoa at 112 and 128 Zen 4c cores, so at least initially they won't overlap with each other.

Siena uses the same CCD as Bergamo but is designed for SP6 with 1P support only. With a maximum of 4 CCDs, Siena can only support up to 64 Zen 4c cores. SP6 supports half the memory channels, i.e. 6, so AMD could design a smaller IOD for the Siena platform.
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
Based on leaks, Bergamo still uses SP5 and positions itself above Genoa at 112 and 128 Zen 4c cores, so at least initially they won't overlap with each other.

Siena uses the same CCD as Bergamo but is designed for SP6 with 1P support only. With a maximum of 4 CCDs, Siena can only support up to 64 Zen 4c cores. SP6 supports half the memory channels, i.e. 6, so AMD could design a smaller IOD for the Siena platform.
Sorry, but you got several things wrong. Bergamo is not above Genoa - just a different offering for a different market.
Siena uses the same CCD as Genoa, not Bergamo.
 
Reactions: Tlh97 and scineram

MadRat

Lifer
Oct 14, 1999
11,923
259
126
Why does it seem Intel wants to avoid head-to-head matchups in core counts?

Their core counts seem like they would pose a mathematical inconsistency for schedulers, where cores that lie outside a power-of-two count may get irregular schedules.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
Why does it seem Intel wants to avoid head-to-head matchups in core counts?

Their core counts seem like they would pose a mathematical inconsistency for schedulers, where cores that lie outside a power-of-two count may get irregular schedules.
There's nothing special about power of 2 core counts from a scheduling perspective. And Intel's behind in core counts because they've been unable to engineer an economically viable solution with more than they have. Process node differences and the inherently larger core sizes vs Zen probably contribute greatly.
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
AMD has already published a product list with accurate numbers. There's absolutely no reason to keep quoting Wikipedia when it is clearly wrong.
Yes, but does AMD give the chiplet count and cache per chiplet anywhere? I'm just trying to figure out where the inconsistency came from. It seems like the 9124 (16-core) and 9224 (24-core) are both 4-chiplet devices, just with 16 MB of L3 per chiplet rather than 32 MB. AMD lists 64 MB of L3 for the 9224, which must be a 4-chiplet device since it has 24 cores. I don't think there will be any 2-chiplet devices. This seems to be a bit of extra marketing segmentation.

The point is: 64 MB L3 = 2 chiplets is wrong.
 
Last edited:
Reactions: Tlh97 and scineram

jamescox

Senior member
Nov 11, 2009
640
1,104
136
I doubt that. It would be a repeat of the non-Pro Threadripper: Essentially a budget platform reusing an IOD of which only half is actually used. Non-Pro Threadripper failed since there was too little profit in it while requiring dies in high demand in the high profit server market. I can't imagine AMD wanting to repeat that with Siena.
How high is the defect rate on TSMC 6 nm vs. GF 14 nm for the IO die? I don't see that any of these need a different IO die. With 12x DDR5, 12x GMI, and more than 128 PCI-Express lanes, I would expect them to have some partially defective parts. They actually only need the fully functional IO die for the 12-chiplet parts; the 4- and 8-chiplet parts could have non-functional GMI links that are simply not used. Using the same IO die in Siena could also allow them to disable defective memory controllers.

Edit: the obvious thing for 6 channels is a straight cut-in-half IO die. It's probably not difficult to tape out a cut-down die since it is already somewhat modular internally. Whether they do that may just depend on whether they need a channel to sell IO dies with defects.

I was hoping for some stacking with Bergamo or Siena with the 2023 release dates, but it seems we don’t get that until Zen 5.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
I am not sure if Siena will get its own IOD. Rather, I can imagine them harvesting Genoa's IOD. That thing is pretty big at ~400 mm² on 6 nm, so it might be worth it.
I think that because the IO die is such a large piece of silicon, AMD will be incentivized to make a cut-down version for Siena. Their volumes are easily getting to the point where the unit cost savings will eclipse the cost of a separate tape-out.
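A back-of-the-envelope way to frame that break-even, where every number is a placeholder assumption rather than an actual AMD figure:

```python
# Break-even check: dedicated cut-down Siena IOD vs. reusing Genoa's ~400 mm^2
# die. All costs and dies-per-wafer counts below are made-up placeholders.
wafer_cost_usd     = 10_000    # assumed N6 wafer cost
full_iod_per_wafer = 140       # assumed good dies at ~400 mm^2
half_iod_per_wafer = 290       # assumed good dies at ~200 mm^2 (better yield)
tapeout_cost_usd   = 30e6      # assumed cost of the extra tape-out

saving_per_unit = wafer_cost_usd / full_iod_per_wafer - wafer_cost_usd / half_iod_per_wafer
breakeven_units = tapeout_cost_usd / saving_per_unit
print(f"~${saving_per_unit:.0f} saved per IOD, break-even near {breakeven_units/1e6:.1f}M units")
```

That is the argument being made above: at sufficient volume, the per-unit silicon saving eclipses the one-time tape-out cost.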
 