Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,392
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Last edited:
Reactions: richardllewis_01

tomatosummit

Member
Mar 21, 2019
184
177
116
That's exactly what Intel does with the E cores in its ADL desktop and mobile chips though.
I should have said area in a more overall sense, including per package and per U.
And the performance target is different again. Intel's mainstream desktop die, the one that has reduced silicon space, does not have any ecores and there's not a rumour or leak of any server products introducing hybrid cores that I'm aware of.
Although I personally think the ecores on the desktop K series are a bit of a marketing stunt. 2 more Golden Cove cores would have served a similar performance per area uplift with none of the inconsistencies when you take into consideration the efficiency of GC at sane wattage and Gracemont's poor scaling at higher power. It's more important for mobile computing and marketing today.

Then on desktop applications would 32 zen4c cores beat 24 zen4 or zen4 with stacked cache even? If it's 3x ccd then we're back to bergamo questions of why can't it be 48 zen4c cores or is it another IO die and/or packaging technology. Not the kind of complications I foresee at that stage of am5's life cycle when there's still zen4-3d or even zen5 on the horizon by the time 4c is ready.
 

turtile

Senior member
Aug 19, 2014
618
296
136
zen4c does not seem like a consumer core to me, amd stated it's for cloud providers to get as many cores in as area allows, does not read to me as something useful for desktop cpus, especially if they can jump to 24 cores for raphael. If its changes reduce its absolute single core performance for any reason, be it cache changes or reduced target frequency, then it's probably a no go for a ryzen product.
3d stacked cache is in a similar situation, it's targeted at hpc style workloads but importantly will give improved gaming performance so can happily go onto consumer desktop.

Since the cache won't shrink much from 7nm to 5nm, cutting the L3 cache in half will easily allow for 32 more cores. And considering that AMD has already done this for the last couple of mobile lines, I won't be surprised one bit if this ends up in the Ryzen mobile CPU. The rumors say that AMD will use Zen 5 + Zen 4c. That will provide super fast cores and cores slightly behind in one package.
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
zen4c does not seem like a consumer core to me, amd stated it's for cloud providers to get as many cores in as area allows, does not read to me as something useful for desktop cpus, especially if they can jump to 24 cores for raphael. If its changes reduce its absolute single core performance for any reason, be it cache changes or reduced target frequency, then it's probably a no go for a ryzen product.
3d stacked cache is in a similar situation, it's targeted at hpc style workloads but importantly will give improved gaming performance so can happily go onto consumer desktop.

As for 3 ccd and 3 memory channels I'd welcome that. Exposing 32 pcie lanes wouldn't go amiss either.
Would finally give a good reason to have a top level chipset over Bx50 motherboards and reintroduce relatively cheaper hedt workstations after tr's price hike.
1700 pin count might be too few for all of that though. My guess is still 128bit memory.
I suspect Zen4c to be stacked, possibly with a lot of cache off die. If some desktop/mobile variant of it includes off die cache, then it wouldn’t surprise me that it would do very well on desktop applications also. It seems that most of AMD’s mobile APUs have half the L3 of the desktop variant anyway, so it may do well even without off die / stacked cache. High density server and mobile actually have some similar design goals if you just consider the core, which the MCM or stacked packages allow us to do. It would obviously make more sense in mobile than desktop where you could likely just use all large cores and burn a lot more power. I wouldn’t mind such a device on desktop though. It would probably do great at compiling code, if it has sufficient cache.

We don’t know what has been cut on the Zen 4c core. I would suspect FP resources to be reduced significantly, since massive vector FP units are a waste for many types of servers and these are probably second in die area after caches. It also takes a lot of power having very wide interconnect just to supply the data. I have wondered if they actually expanded to 16 cores by having 2 cores share each L2 cache with larger and higher density L2. If they did that, then it might be possible to support the same instruction set with AVX512 by sharing a single unit between the two cores. I assume that would be unpopular with enthusiasts, but if it is server or possibly mobile only, then I don’t think it is an issue.

It seems like they will need something like Zen4c in mobile to compete with Apple processors. Zen4c seems directly targeted at low power, high density servers made possible by ARM cores. It isn’t that much of a stretch that it might help compete with Apple M1 Pro/Max processors in mobile also. Even if it is an 8 or 12 core with a little extra leakage, it may still be a better mobile processor than Zen 4.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,287
2,887
106
We don’t know what has been cut on the Zen 4c core. I would suspect FP resources to be reduced significantly, since massive vector FP units are a waste for many types of servers and these are probably second in die area after caches. It also takes a lot of power having very wide interconnect just to supply the data. I have wondered if they actually expanded to 16 cores by having 2 cores share each L2 cache with larger and higher density L2. If they did that, then it might be possible to support the same instruction set with AVX512 by sharing a single unit between the two cores. I assume that would be unpopular with enthusiasts, but if it is server or possibly mobile only, then I don’t think it is an issue.

It seems like they will need something like Zen4c in mobile to compete with Apple processors. Zen4c seems directly targeted at low power, high density servers made possible by ARM cores. It isn’t that much of a stretch that it might help compete with Apple M1 Pro/Max processors in mobile also. Even if it is an 8 or 12 core with a little extra leakage, it may still be a better mobile processor than Zen 4.

I wonder how much overhead it would save to cut SMT, and if it would make any sense.

Some chips that Zen4c will compete with don't have it: Graviton 3, Ampere, Apple M1...
 

eek2121

Diamond Member
Aug 2, 2005
3,032
4,222
136
I suspect Zen4c to be stacked, possibly with a lot of cache off die. If some desktop/mobile variant of it includes off die cache, then it wouldn’t surprise me that it would do very well on desktop applications also. It seems that most of AMD’s mobile APUs have half the L3 of the desktop variant anyway, so it may do well even without off die / stacked cache. High density server and mobile actually have some similar design goals if you just consider the core, which the MCM or stacked packages allow us to do. It would obviously make more sense in mobile than desktop where you could likely just use all large cores and burn a lot more power. I wouldn’t mind such a device on desktop though. It would probably do great at compiling code, if it has sufficient cache.

We don’t know what has been cut on the Zen 4c core. I would suspect FP resources to be reduced significantly, since massive vector FP units are a waste for many types of servers and these are probably second in die area after caches. It also takes a lot of power having very wide interconnect just to supply the data. I have wondered if they actually expanded to 16 cores by having 2 cores share each L2 cache with larger and higher density L2. If they did that, then it might be possible to support the same instruction set with AVX512 by sharing a single unit between the two cores. I assume that would be unpopular with enthusiasts, but if it is server or possibly mobile only, then I don’t think it is an issue.

It seems like they will need something like Zen4c in mobile to compete with Apple processors. Zen4c seems directly targeted at low power, high density servers made possible by ARM cores. It isn’t that much of a stretch that it might help compete with Apple M1 Pro/Max processors in mobile also. Even if it is an 8 or 12 core with a little extra leakage, it may still be a better mobile processor than Zen 4.

I was thinking about this last night. If Zen4c is destined only for server products, it is possible they could go with a lower clock speed (2.5-3.3 GHz) and stack them.
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
I was thinking about this last night. If Zen4c is destined only for server products, it is possible they could go with a lower clock speed (2.5-3.3 GHz) and stack them.
Stacking cpu cores seems unlikely and lower clocks than Genoa are almost a certainty. I am thinking that it has a lot of cache in bridge chips that will be stacked. The IO die and cpu die might all be in the same plane. I suspect that this might fit in 1 reticle sized area. Even with very low power cores, 128 of them in that small of an area will be high thermal density. For regular Zen 4 Genoa, the cpu chiplets will be quite far apart, especially in the 4 cpu chiplet variants. Low core count variants will also have the cores quite far apart. The 7FX3 Milan parts only have 1, 2, or 3 active cores per chiplet for maximum thermal / power and cache per core. With Bergamo specifically designed for high core counts, it doesn’t seem like they would even have a low core count variant.
 
Reactions: Joe NYC

jamescox

Senior member
Nov 11, 2009
640
1,104
136
I wonder how much overhead it would save to cut SMT, and if it would make any sense.

Some chips that Zen4c will compete with don't have it: Graviton 3, Ampere, Apple M1...
One of the ideas behind SMT was that it doesn’t add much die area. A lot of the features that it exploits, like register renaming, are already present in most OOO, super-scalar architectures. It doesn’t seem worthwhile to tear that out. Also, server applications that Bergamo targets are often the type of code where SMT does best. Such server code is very branch heavy with a lot of unpredictable branches and memory accesses. Such server applications often have very low IPC, so running more threads to try to keep the core busy can be effective here.
 

jpiniero

Lifer
Oct 1, 2010
14,806
5,431
136

Genoa apparently supports up to 12 TB of memory with dual sockets albeit at 4000 only. Don't ask how much 12 TB of DDR5 will cost.
 
Reactions: lightmanek

jamescox

Senior member
Nov 11, 2009
640
1,104
136

Genoa apparently supports up to 12 TB of memory with dual sockets albeit at 4000 only. Don't ask how much 12 TB of DDR5 will cost.
Well, minimum config to populate all channels will be 12 modules per socket, so dual socket system will require 24 modules. SP5 based systems will not be cheap. It would be nice if they could make a half size socket really. 24 modules is 384 GB, even with “tiny” (by server standards) 16 GB modules. The larger modules go up in price fast, so it will be good to be able to get large memory size without needing 128 GB or larger modules. Dual socket would be 1.5 TB with (hopefully) cheaper 64 GB modules.
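The module arithmetic above can be checked with a quick sketch. It assumes 12 memory channels per SP5 socket with one module per channel, as described in this post; the function name and its parameters are made up for illustration.

```python
def total_capacity_gb(sockets: int, module_gb: int,
                      channels_per_socket: int = 12,
                      dimms_per_channel: int = 1) -> int:
    """Total installed memory in GB when every channel is populated."""
    modules = sockets * channels_per_socket * dimms_per_channel
    return modules * module_gb

# Dual socket, one module per channel (24 modules in total).
print(total_capacity_gb(2, 16))   # 384 GB with 16 GB modules
print(total_capacity_gb(2, 64))   # 1536 GB, i.e. 1.5 TB with 64 GB modules
```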
 

tomatosummit

Member
Mar 21, 2019
184
177
116
Well, minimum config to populate all channels will be 12 modules per socket, so dual socket system will require 24 modules. SP5 based systems will not be cheap. It would be nice if they could make a half size socket really. 24 modules is 384 GB, even with “tiny” (by server standards) 16 GB modules. The larger modules go up in price fast, so it will be good to be able to get large memory size without needing 128 GB or larger modules. Dual socket would be 1.5 TB with (hopefully) cheaper 64 GB modules.
I'm very confused as to where all the dimm slots will go. Two epyc sockets and 32 dimms seem to fill the entire width of a rack currently. This is now 50% more dimms and I don't think even ocp racks are that wide.
Are dual socket with 24/48 dimms just not going to exist in their current form? Or is it uneven 12+4 layouts, maybe even some planned nvdimms.
On the other hand I assume the cpu and IO die are laid out in a similar fashion into four sections so 8 or 4 channels populated will still see good performance across most applications.
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
I'm very confused as to where all the dimm slots will go. Two epyc sockets and 32 dimms seem to fill the entire width of a rack currently. This is now 50% more dimms and I don't think even ocp racks are that wide.
Are dual socket with 24/48 dimms just not going to exist in their current form? Or is it uneven 12+4 layouts, maybe even some planned nvdimms.
On the other hand I assume the cpu and IO die are laid out in a similar fashion into four sections so 8 or 4 channels populated will still see good performance across most applications.
A lot of servers I have seen are dual socket with 8 modules each arranged one behind the other. Then they put 4 such servers in a 2u or something like that. Going to 12 slots may still be too wide for the sockets arranged front to back with memory on left and right of each socket, at least for the half width servers. I suppose they could do 4, 8, or 12 (1, 2, or 3 slots per quadrant of the IO die) but you will lose some memory bandwidth if you don’t use the full 12.
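To put a rough number on that bandwidth loss, here is a back-of-the-envelope sketch. It assumes each DDR5 channel moves 8 bytes (64 data bits) per transfer and runs at 4800 MT/s; both figures are assumptions for illustration, and real sustained bandwidth will be below this theoretical peak.

```python
BYTES_PER_TRANSFER = 8  # assumed 64-bit data width per DDR5 channel

def peak_bandwidth_gbs(channels: int, mts: int = 4800) -> float:
    """Theoretical peak bandwidth in GB/s for the populated channels."""
    return channels * mts * BYTES_PER_TRANSFER / 1000

# 1, 2, or 3 populated slots per IO die quadrant.
for channels in (4, 8, 12):
    share = channels / 12
    print(f"{channels:2d} channels: {peak_bandwidth_gbs(channels):6.1f} GB/s "
          f"({share:.0%} of a fully populated socket)")
```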
 
Last edited:
Reactions: Tlh97 and Joe NYC

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,722
14,753
136
A lot of servers I have seen are dual socket with 8 modules each arranged one behind the other. Then they put 4 such servers in a 2u or something like that. Going to 12 slots may still be too wide for the sockets arranged front to back with memory on left and right of each socket, at least for the half width servers. I suppose they could do 4, 8, or 12 (1, 2, or 3 slots per quadrant of the IO die) but you will lose some memory bandwidth if you don’t use the full 12.
Can you say vertical? Like a memory riser, up from a slot, that holds like 4-6 modules in each slot while taking the width of 3?
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Well, minimum config to populate all channels will be 12 modules per socket, so dual socket system will require 24 modules. SP5 based systems will not be cheap. It would be nice if they could make a half size socket really. 24 modules is 384 GB, even with “tiny” (by server standards) 16 GB modules.

That is a very good point. More channels is always better: the economic advantage of filling the required memory with smaller DIMMs is cool, but it also enables usage of those larger DIMMs at fewer DPC to keep memory speeds higher.
2DPC seems to have an 800MT/s penalty (for RDIMM), so 2R 64GB modules giving 768GB and 4800 speed looks perfect to me.
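A small sketch of that trade-off, assuming 12 channels per socket, DDR5-4800 at one DIMM per channel (1DPC), and the 800 MT/s penalty at 2DPC mentioned here. The helper name is hypothetical, and the numbers are the thread's, not a confirmed spec.

```python
CHANNELS_PER_SOCKET = 12
BASE_SPEED_MTS = 4800       # assumed 1DPC speed
TWO_DPC_PENALTY_MTS = 800   # RDIMM penalty quoted in the post

def config(sockets: int, dpc: int, module_gb: int) -> tuple[int, int]:
    """Return (capacity in GB, memory speed in MT/s) for a configuration."""
    capacity = sockets * CHANNELS_PER_SOCKET * dpc * module_gb
    speed = BASE_SPEED_MTS - (TWO_DPC_PENALTY_MTS if dpc == 2 else 0)
    return capacity, speed

print(config(1, 1, 64))    # (768, 4800): one socket, 64 GB modules at 1DPC
print(config(2, 2, 256))   # (12288, 4000): assumes 256 GB modules at 2DPC
```

The second line lines up with the 12 TB at 4000 figure quoted earlier in the thread, if one assumes 256 GB modules in all 48 slots.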
 
Last edited:

MadRat

Lifer
Oct 14, 1999
11,922
259
126
How long until the chip-stacking technology takes over memory production? Having smaller form factors would go a long way to squeezing more power per square inch of your form. Just add more active cooling.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
I would disagree here. The biggest server OEMs and hyperscalers get their DDR5 without any scalper middlemen, that may plague consumer market.

And it will be just a question of quantities and Milan vs. Genoa breakdown.

Granted, these are different process nodes, but if AMD has a surplus (slow uptake) in one area, it can shift wafers to another area, with insatiable demand - GPUs. Once N31 and N32 are production worthy, AMD can make more of those, since they use different memory...



There is a huge installed base of Intel CPUs users, who were holding out so long for a worthwhile upgrade that they were almost ready to switch to AMD. And AMD gave them a reason (excuse) to stay with Intel - without feeling like an idiot.



AMD has followed a wise strategy of staying in every market they have a long term desire to grow in, despite the fact that there are silicon shortages, despite the fact that there is an opportunity cost for staying in market segments (GPUs, consoles for example) that may be less profitable today than other ones.

In light of that, the retreat from the high end of the desktop segment is not outright destruction of one part of AMD's TAM, but it makes it a highly uphill battle to realize more market share in desktop. A full year wasted, until Raphael (Zen 4) is released a year from now.



TLDR of that is AMD could have had a point from a tie in gaming, with Zen3d.

But by postponing the release of Zen 3D, the complete focus of this year's big hardware review season is a big fat L for AMD. And AMD will not recover from the loss by announcing Zen3d desktop when no one is paying attention. It will be a year until we have a similar attention to CPUs with Zen 4 generation release vs. Raptor Lake from Intel.

Another way to look at it, AMD could have still been in gaming leadership with Zen3D (by getting a tie in the reviews) with just a single stack of L3. By early next year, that single stack of L3 may be a yawn. And at this time, AMD / TSMC may not have a solution for the challenge of > 1 layer of V-Cache yet.

If your original point stands that DDR5 will be a problem throughout 2022, and it slows Zen 4 adoption, AMD could benefit from yet another bump to Zen 3 generation, by giving it another upgrade with multiple levels of V-Cache in H2 2022.

But, it is not like the competition is going to be running away, due to DDR5. Sapphire Rapids is DDR5 only, so DDR5 shortage would affect it as much as it would affect Genoa.
None of us knows the real reason behind the changed HEDT plans. The speculation is totally fine and also really interesting, but the whole forum seems to be making judgments based on pure speculation...
 

biostud

Lifer
Feb 27, 2003
18,367
4,909
136
So if zen4 does not launch with 3Dcache, do you think it will be a regular part of the AMD launch CPU cycle?

zen3
zen3+3Dcache
zen4
zen4+3Dcache?
...
...
 

DrMrLordX

Lifer
Apr 27, 2000
21,770
11,089
136
So if zen4 does not launch with 3Dcache, do you think it will be a regular part of the AMD launch CPU cycle?

zen3
zen3+3Dcache
zen4
zen4+3Dcache?
...
...

Could be. TSMC won't be ready for stacked N5 until Q4 2022 at the earliest anyway. We already know AMD is producing Genoa without stacked cache, so it stands to reason that Raphael will use the same CCDs (without stacked cache).
 
Reactions: Tlh97 and Joe NYC