Ran out of poll options.
Why nothing about execution: ALU, load/store, dispatch/retire?
Nice find. I think this is what excites Mike Clark. I have always wondered about this type of execution. It looks like deep integration of an FPGA into the CPU. In the patent they also talk about optimizing execution based on whether the workload is integer or floating point. They could probably also map CPU instructions to trigger execution of GPU code. Ultimate HSA!
I hope we will see some of AMD's new patents implemented.
Well, they will have to add more execution ports... because Cinebench.
Top of my personal wishlist for the FPU is not wider ALUs, but faster gather.
Seems I was totally wrong about the packaging on Genoa. But at least what I was discussing has been applied in some form to MI300.
Hmmm ... I don't think that is the route AMD will take with Genoa.
View attachment 51995
Zen4 CCD from the Gigabyte leak likely has two SDP/IF links.
On top of that, supporting 96 or even 128 cores would mean they need to support up to 512 SerDes links.
That is way too much power wasted, and the routing for Rome above is already very complicated.
On Rome they had to route the links underneath the CCD.
And at ISSCC 2021, Sam Naffziger already alluded to interposers/higher density interconnects (highlighting by me). This was before Lisa announced 3D V-Cache.
View attachment 51998
In fact, from this slide we knew the second item was already coming to Zen3. (Cache, while not exactly memory, is backed by SRAM, which is memory.)
From TSMC's official data, CoWoS-L with LSI/Si bridges is proven and it reaches 3x reticle size, which can cover all chiplets for a hypothetical 16 CCD EPYC.
View attachment 51997
Anyway, I think AMD will most likely go with some sort of interposer, probably CoWoS-R (if not CoWoS-L) if there is really no need for super high density interconnects, i.e. if a 4um contact pitch (CoWoS-R) is enough instead of the high density CoWoS-L (<1um pitch).
If not, they will burn power linking those 96/128 cores; it is not sustainable.
You can read the paper by Naffziger yourself.
Zen4 and Zen5 should use the same packages. (Just like Zen...Zen3)
Most likely with the same CCD and IO die arrangements in the packages as well.
I would expect:
Zen4 ---> Zen5
(1) General use of PAM4 for the SERDES so:
- PCIe-5 --> PCIe-6 doubles the bandwidth using the same frequency but 2 bits instead of 1 bit per clock edge.
- XGMI3 --> XGMI4 doubles the bandwidth between the IO die and the CCDs using the same number of pins
(2) Doubling the number of cores for each CCD is enabled by doubling the SERDES bandwidth.
- 16 cores per CCD
- The same number of SERDES IO lines
- L3 VCache in increments of 128 MB per die
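The PAM4 point in (1) comes down to simple arithmetic, sketched here in Python. This is only an illustration: the function name is made up, the rates are raw per-lane numbers that ignore encoding and FEC overhead, and the 32 GBaud symbol rate is the PCIe 5.0/6.0 figure. Whether XGMI would follow the same pattern is the post's speculation, not something the code shows.

```python
def raw_rate_gbps(symbol_rate_gbaud: float, bits_per_symbol: int) -> float:
    """Raw per-lane bit rate: symbols per second times bits carried per symbol."""
    return symbol_rate_gbaud * bits_per_symbol

# PCIe 5.0 uses NRZ signalling: 1 bit per symbol at 32 GBaud per lane.
pcie5 = raw_rate_gbps(32.0, bits_per_symbol=1)

# PCIe 6.0 uses PAM4 signalling: 2 bits per symbol at the same 32 GBaud.
pcie6 = raw_rate_gbps(32.0, bits_per_symbol=2)

# Same symbol rate and same number of lanes, twice the raw bandwidth.
print(pcie5, pcie6, pcie6 / pcie5)
```

The doubling comes entirely from bits per symbol, which is why the pin count and the channel frequency can stay the same.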
View attachment 52102
This might not be like Bergamo, which is supposedly using a dual-CCX design to get 16 cores per chiplet, but this is pure speculation at this point, of course.
We do see core counts growing, and we will continue to increase the number of cores in our core complex that are shared under an L3. As you point out, communicating through that has both latency problems, and coherency problems, but though that's what architecture is, and that's what we signed up for. It’s what we live for - solving those problems. So I'll just say that the team is already looking at what it takes to grow to a complex far beyond where we are today, and how to deliver that in the future.
Zen5 in a nutshell.
Ran out of Poll options
I have my doubts about Strix Point being on any variant of N3.
Granite Ridge would probably be on N4 and Strix Point on N3/E
Seems like 24C Zen4 AM5 would have accomplished that while sticking with the EPYC:AMx 4:1 ratio of previous generations.
Growing core counts would help keep Intel off their back. If AMD had gone to 32 cores for Zen 4, for example, Raptor Lake would have been DOA for multicore workloads.
That would be cool, but I'm not convinced. Average consumers might somewhat favor Intel's total core count 'advantage' right now, but I predict they will stop caring in 2 years+. I think a 50% increase to a 12c CCX is more likely, as that will keep the dies small and (most importantly) cheap, and people frankly won't need more.
3) Core count - between 50% and 2x more. I still lean more towards 2x increase, so 32C/64T should be the new flagship for mainstream desktop (like 7950x is now).
Stacking of cores and memory (L3) is a norm so I expect they will evolve in the right direction.
IMHO we are likely to see a 2025 Zen5 / RDNA4 APU which probably will be fabbed on an N3 variant node.
"You should expect to see, you know, 4nm and 3nm versions of Zen 5 and you will see them in 2024" - Lisa Su
Oooooof 😨
View attachment 68589
Strix Point is Zen 5 on an "advanced node" in 2024.
Phoenix Point is Zen 4 on N4 in early 2023
View attachment 68591
During FAD 22 Q & A, Forrest was trying to be obscure when he said we can expect Zen CPU cores to be on multiple nodes going forward but then Lisa jumped in and straight up said we can expect Zen 5 on both N3 and N4 in 2024.
May not be Strix Point on N3 but some Zen 5 SoC will be on N3 in 2024.
Found it, timestamped video
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.
But those efficiency gains are very, very significant: -22% power at iso-perf if AMD were to go for N4P.
I don't suppose it will be; N4X is too leaky for chiplets that are shared with server processors.
Since it's in 2024, wouldn't AMD be using N4X? They did use N5 HPC for Zen4.
TSMC Unveils N4X Node: Extreme High-Performance at High Voltages
www.anandtech.com
Actually it is, relatively speaking, because we will never really know what AMD used as the baseline for DTCO for Zen 4 and what they have changed in the PDK.
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.
Nah... 50% is way too much, AMD is not Apple. At best I expect Zen 5 to gain 20 to 25% more transistors per core (area * scaling), or around 85mm2 CCDs (assuming they even still have the CCD concept by then). Zen 3 is barely a 9% MTr gain over Zen 2. N5 --> N4 is hardly any density gain. Bean counters will not allow a CCD of around 110mm2. Not sure what they are making with N3, I guess mobile, but I don't know.
...we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gains) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, the high end chips will continue to get significantly more expensive.
Also, even if AM5 gives them some room to grow, those scalers above give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?
Perhaps it's only on DT and mobile Granite Ridge that we'll see N4P. Strix Point, APUs, and Turin will be on N3E. Basically, only the markets that demand the best perf/W and best density get the best node. DT is not one of those markets, especially as it's a market where people are more perf/$ sensitive. N4P would give AMD the ability to keep costs in check while raising perf/$.
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.
Does beg the question, however. With Zen 5 widely expected to bring a much "bigger" core, but only small improvements on the process side (pre-N3), then what're the implications for core counts and/or cost? Rumors seem to indicate that Turin is looking to be around 120 cores, give or take. If, for discussion purposes, we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gains) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, the high end chips will continue to get significantly more expensive.
Also, even if AM5 gives them some room to grow, those scalers above give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?
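For anyone who wants to check the scalers above, here is a quick Python sketch of the same back-of-the-envelope math. The function name is invented for illustration, and every input is the post's own assumption (120 rumored Turin cores vs 96, a +50% bigger core, a 1.06x N4 density gain), not a measured figure.

```python
def relative_area(core_ratio: float, per_core_growth: float, density_gain: float) -> float:
    """Silicon area relative to the baseline: more cores and bigger cores grow it,
    a denser process shrinks it."""
    return core_ratio * per_core_growth / density_gain

# Per-core area: +50% transistors offset by the assumed 1.06x N4 density gain.
per_core = relative_area(1.0, 1.50, 1.06)

# Total core area for a rumored ~120-core part vs a 96-core part.
total = relative_area(120 / 96, 1.50, 1.06)

print(f"per-core area: {per_core:.2f}x, total core area: {total:.2f}x")
```

This gives roughly 1.42x per core and 1.77x in total, matching the "+40% area per core" and "1.77x the silicon area" figures. Swapping 1.50 for the 1.20 to 1.25 growth suggested in the reply shows how sensitive the result is to that one assumption.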