Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

itsmydamnation · Sep 29, 2022

Why is there nothing about execution ALUs, load/store, or dispatch/retire?

I'm guessing 6-wide decode, 6-wide ALU (two clusters of 3, plus Zen 3/4-like other units sharing register ports), 10 issue/dispatch, 10-12 retire, 512+ ROB, and an extra port to L1D (load or store).

SMT8

edit: plus uop spill to L1I

poke01 · Sep 30, 2022

AMD needs to go wider. Intel and Apple are already 6 and 8 wide respectively.

Saylick · Sep 30, 2022

6 wide decode, 12 op/cycle mop cache dispatch, 10 wide execute. Keep the double pumped AVX512, no real need for single cycle AVX512. Larger caches in general (mop, BTB, L1, L2, etc). Clocks will not increase much, only 6 GHz boost. IPC gains well north of 20%, ideally 25%+ via 50% wider overall core (width and IPC never scale proportionally). More accelerators.

DisEnchantment · Sep 30, 2022

itsmydamnation said:
Why is there nothing about execution ALUs, load/store, or dispatch/retire?

Ran out of Poll options

bakyt115 · Sep 30, 2022

I hope we will see implementations of some of AMD's new patents.

Patents Show Possible AMD Zen 5 Architectural Layout: Dual Fetch/Decode, Like Intel's Little Cores [Report] | Hardware Times

The other day a patent was discovered that used AMD’s entire Zen core architecture to explain the workings of a modern microprocessor. The problem? It was an Intel patent. The controversy was resolved with Intel releasing the following statement: When filing a patent application, citation to...

www.hardwaretimes.com

https://www.sumobrain.com/patents/wipo/Processor-with-multiple-op-cache/WO2022066560A1.html

https://www.reddit.com/r/hardware/comments/koqglp

Also, there is a patent about storing the uop cache in L3.

Insert_Nickname · Sep 30, 2022

I would think going wider would be the logical option, but beyond that I have no idea.

I expect to upgrade to 5 or its successor eventually. But both my main systems run on 3, so there is no rush. That gives the platform time to mature too.

eek2121 · Sep 30, 2022

I suspect AMD will up the core count to stay competitive. They are doing okay now, but if Intel sticks with 8+16 they may have a harder time in the future.

inf64 · Sep 30, 2022

Here is what I expect:

1) IPC - AMD had targets in the 40-50% range for radical uarch. departures or redesigns. Examples are Excavator -> Zen 1 (~50%, overshot the target) and Zen 1 (non + version) -> Zen3 ( ~40% cumulative). Therefore, I expect them to target the same 40-50% range going from Zen3 -> Zen 5. My guess is 45% IPC jump target Vs Zen 3 (vanilla) which should mean that Zen 5 could have ~28-30% higher IPC versus Zen 4 (vanilla).

2) Clocks - I expect stagnation or at best a slight bump for the max ST target (nothing too radical, ~5%; this should put Zen 5 in the ~6 GHz range at best for single-threaded workloads).
For MT workloads, I expect no changes as we will have more cores (so 5-5.2 GHz all-core boost).

3) Core count - between 50% and 2x more. I still lean more towards 2x increase, so 32C/64T should be the new flagship for mainstream desktop (like 7950x is now).
Stacking of cores and memory (L3) is a norm so I expect they will evolve in the right direction.

4) SMT and AVX512 - I expect no change in SMT (number of threads), so I expect they will use 2 threads per core. AVX512 could become 2x 512 native implementation but I doubt this will happen (so AVX512 implementation remains the same IMO).

5) Accelerators could be a game changer if software support is there. This could accelerate certain workloads by an order of magnitude higher VS traditional core count increases.

6) big.LITTLE - I don't expect a "big.LITTLE" approach on mainstream desktop parts (Ryzen 8000); AMD will have specific accelerators for targeted workloads. On mobile parts it's a possibility though, something like Zen 5c or Zen 4c on 3nm along with Zen 5 cores (on the same monolithic die, sharing the same ISA and L3 cache).

So yeah, no wonder AMD's engineers are excited about it: on paper it could be a monumental performance and efficiency jump. For ST, up to a similar ~30% versus flagship Ryzen 7000. For MT, up to 2.5x faster than 16C Zen 4 (if they launch 32C parts). Couple that with possibly huge performance jumps in accelerated (Xilinx) workloads, along with 3nm power improvements, and Zen 5 looks ready to take on Arrow Lake.
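
For anyone who wants to check the arithmetic behind point 1 and the MT estimate, here is a rough sketch. The 13% Zen 3 -> Zen 4 IPC uplift is an assumption (roughly AMD's stated average), the function name is made up for illustration, and the MT figure assumes ideal scaling with core count at unchanged clocks, so treat it as an upper bound rather than a prediction.

```python
# Sketch only: IPC gains compound multiplicatively, so the Zen 4 -> Zen 5 gain
# implied by a Zen 3 -> Zen 5 target is a ratio, not a difference.

def implied_gain(cumulative_gain, prior_gain):
    """Gain still needed after prior_gain to reach cumulative_gain overall."""
    return (1 + cumulative_gain) / (1 + prior_gain) - 1

ZEN3_TO_ZEN4 = 0.13          # assumption: roughly AMD's stated Zen 4 IPC uplift
ZEN3_TO_ZEN5_TARGET = 0.45   # the guess from point 1

zen4_to_zen5 = implied_gain(ZEN3_TO_ZEN5_TARGET, ZEN3_TO_ZEN4)

# Same calculation for the ends of the 40-50% target range.
low = implied_gain(0.40, ZEN3_TO_ZEN4)
high = implied_gain(0.50, ZEN3_TO_ZEN4)

# Ideal MT scaling: 2x the cores at the same clocks, times the per-core gain.
mt_speedup_32c = (32 / 16) * (1 + zen4_to_zen5)

print(f"Zen 4 -> Zen 5 IPC: {zen4_to_zen5:.1%} (range {low:.1%} to {high:.1%})")
print(f"32C Zen 5 vs 16C Zen 4, ideal scaling: {mt_speedup_32c:.2f}x")
```

With these inputs the implied Zen 4 -> Zen 5 gain comes out at about 28%, and the ideal-scaling MT figure lands slightly above 2.5x.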

Tuna-Fish · Sep 30, 2022

I don't think the FPU will be widened until there is substantial amount of AVX-512 code out there. Certainly not Zen5, probably not Zen6 or 7 either. If Intel reintroduces it soon, then Zen8 or so would be realistic.

Top of my personal wishlist for the FPU is not wider ALUs, but faster gather.

JustViewing · Sep 30, 2022

bakyt115 said:
I hope we will see implementations of some of AMD's new patents.

Nice find. I think this is what excites Mike Clark. I always wondered about this type of execution. It looks like deep integration of an FPGA into the CPU. In the patent they also talk about optimizing execution based on whether the workload is integer or floating point. They could probably also map CPU instructions to trigger execution of GPU code. The ultimate HSA!

naad · Sep 30, 2022

Whole package change like ADL: massive increase in int/fp registers and in-flight loads and stores, 6 wide or more at least, OoO resources out of the wazoo. I fully expect them to either double or 1.5x (almost) everything Zen 4 has.

Though I think avx512 will remain "double pumped", avx512's advantage is in its shuffles/masks, not its vector throughput, AMD probably knows this best

DisEnchantment · Oct 2, 2022

Tuna-Fish said:
Top of my personal wishlist for the FPU is not wider ALUs, but faster gather.

Well, they will have to add more execution ports... because Cinebench

DisEnchantment said:
Hmmm ... I dont think that is the route AMD will take with Genoa.

View attachment 51995
Zen4 CCD from the Gigabyte leak likely has two SDP/IF links.
On top of that to support 96 or even 128 cores would mean they need to support up to 512 SerDes links.
Way too much power wasted and looking at the routing for Rome above already is very complicated.
On Rome they had to route the links underneath the CCD.

And at ISSCC 2021, Sam Naffziger already alluded to interposers/higher density interconnects (highlighting by me). This was before Lisa announced 3D V-Cache.
View attachment 51998
In fact from this slide we knew the second item already is coming to Zen3. (Cache while not exactly memory is backed by SRAM which is memory)

From TSMC's official data, CoWoS-L with LSI/Si bridges is proven and it reaches 3x reticle size, which can cover all chiplets for a hypothetical 16 CCD EPYC.
View attachment 51997

Anyway, I think AMD will most likely go with some sort of interposer: probably CoWoS-R rather than CoWoS-L if there is really no need for super high density interconnects, i.e. if a 4um contact pitch (CoWoS-R) is enough instead of the high density CoWoS-L (<1um pitch).
If not, they will burn power linking those 96/128 cores; it is not sustainable.
You can read the paper by Naffziger yourself.

https://d3smihljt9218e.cloudfront.net/lecture/13766/slideshow/a8919637b2ff693a934db77ff29044fd.pdf

Seems I was totally wrong about the packaging on Genoa. But at least what I was discussing has been applied in some form to MI300.

Granite Ridge would probably be on N4 and Strix Point on N3/E
N4 would offer very minor density increase vs Raphael [92.7 MTr/mm2], probably edging past 100 MTr/mm2 (in line with TSMC's N4P projections).
But the efficiency gains are very significant: -22% power at iso-perf if AMD were to go for N4P.
Also N4P would be in lots of supply by 2024, F18P1-4 and F21P1 in AZ. Around 180K wpm.
F18 P5, P6, P7, P8 would be online by end of 2H23. TSMC is yet to finish construction on P8, but P5 and P6 are already completed with equipment installation and P7 is under way.
There will be an oversupply of wafers due to slowing semi sales.
Granite Ridge on N4 and Strix Point on N3 would be a good strategy to use capacity from both nodes. Across these two nodes there is 380K wpm of wafers.
There are no customers for so much capacity in this current environment, not sure what TSMC will do. This is on top of 200K+ of N6/N7

I think we can safely assume 6 GHz in normal operation as guaranteed on Granite Ridge. Currently I am already hitting 5.88 GHz out of the box with my 7950X. When limited to a Tjmax of 85 C, my 7950X runs at 5.735 GHz in a hysteresis loop.

Right now the bigger question is how AMD will add more cores and where the space is. AM5 seems too tight to add another chiplet.
I am inclined to think in the direction of @Hans de Vries as mentioned here

Hans de Vries said:
Zen4 and Zen5 should use the same packages. (Just like Zen...Zen3)
Most likely with the same CCD and IO die arrangements in the packages as well.

I would expect:

Zen4 ---> Zen5

(1) General use of PAM4 for the SERDES so:
- PCIe-5 --> PCIe-6 doubles the bandwidth using the same frequency but 2 bits instead of 1 bit per clock edge.
- XGMI3 --> XGMI4 doubles the bandwidth between the IO die and the CCD's using the same number of pins

(2) Doubling the number of cores for each CCD is enabled by doubling the SERDES bandwidth.
- 16 cores per CCD
- The same number of serdes IO lines
- L3 VCache in increments of 128 MB per die
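
The bandwidth doubling in (1) can be sketched as below. The 32 GBaud per-lane symbol rate is an assumption taken from PCIe 5.0 (32 GT/s with NRZ), the function names are illustrative, and the figures are raw line rates that ignore encoding and FEC overhead.

```python
import math

def bits_per_symbol(levels):
    """A signal with `levels` distinct amplitude levels carries log2(levels) bits."""
    return int(math.log2(levels))

def raw_rate_gbps(symbol_rate_gbaud, levels):
    """Raw per-lane bit rate, ignoring encoding and FEC overhead."""
    return symbol_rate_gbaud * bits_per_symbol(levels)

SYMBOL_RATE_GBAUD = 32  # assumption: PCIe 5.0 per-lane symbol rate

nrz_rate = raw_rate_gbps(SYMBOL_RATE_GBAUD, levels=2)   # NRZ: 2 levels, 1 bit/symbol
pam4_rate = raw_rate_gbps(SYMBOL_RATE_GBAUD, levels=4)  # PAM4: 4 levels, 2 bits/symbol

print(f"NRZ:  {nrz_rate} Gb/s per lane")
print(f"PAM4: {pam4_rate} Gb/s per lane ({pam4_rate / nrz_rate:.0f}x at the same symbol rate)")
```

The same reasoning would apply to any SERDES link moved from NRZ to PAM4 at an unchanged symbol rate, which is the premise behind keeping the same number of IO lines in (2).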

View attachment 52102

Cores on top of cache, like in MI300, could leave ample room for adding more cores, with multiple layers for some of them in V-Cache scenarios.
Thermals would be a major challenge to be solved for a hypothetical >16 core chip, but the improved power characteristics of N4 could help a bit.
Is there even enough BW for more than 16 cores with DDR5 in all core loads? Currently with my 7950X I am getting 84 GB/s BW @6000 MT/s, which is almost 50% more than what I get on my 5950X
An 84 mm2 CCD would be a decent jump in MTr count if it were not stacked, roughly a 25%+ MTr gain (for comparison, Zen 3 got a 9% MTr gain over Zen 2). For stacked cores, it is completely unknown.
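
To put the bandwidth question in numbers, here is a rough sketch. It assumes a dual-channel DDR5 setup with a 128-bit total bus (an assumption about the platform, not something measured here), uses the 84 GB/s figure above, and splits bandwidth evenly per core, which real workloads will not do.

```python
# Sketch only: theoretical peak DRAM bandwidth vs. the measured figure,
# and how the measured bandwidth divides across hypothetical core counts.

def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits=128):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8) / 1e9

MEASURED_GBS = 84.0                       # measured on the 7950X at 6000 MT/s
peak = peak_bandwidth_gbs(6000)           # dual-channel DDR5-6000, assumed 128-bit
efficiency = MEASURED_GBS / peak

# Even split of the measured bandwidth across all cores under all-core load.
per_core = {cores: MEASURED_GBS / cores for cores in (16, 24, 32)}

print(f"Peak: {peak:.0f} GB/s, measured is {efficiency:.1%} of peak")
for cores, gbs in per_core.items():
    print(f"{cores} cores: {gbs:.2f} GB/s per core")
```

On these assumptions the measured figure is already close to the theoretical peak, so doubling the core count on the same memory setup would roughly halve the bandwidth available per core.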

BTW, Mike Clark already alluded to adding more cores to the CCX but as mentioned, L3 latency is going to suffer.

We do see core counts growing, and we will continue to increase the number of cores in our core complex that are shared under an L3. As you point out, communicating through that has both latency problems, and coherency problems, but though that's what architecture is, and that's what we signed up for. It’s what we live for - solving those problems. So I'll just say that the team is already looking at what it takes to grow to a complex far beyond where we are today, and how to deliver that in the future.

This might not be like Bergamo, which is supposedly using dual-CCX designs to get 16 cores per chiplet, but this is pure speculation at this point, of course.

eek2121 · Oct 2, 2022

Growing core counts would help keep Intel off their back. If AMD had gone to 32 cores for Zen 4, for example, Raptor Lake would have been DOA for multicore workloads.

coercitiv · Oct 2, 2022

DisEnchantment said:
Ran out of Poll options

Zen5 in a nutshell.

soresu · Oct 2, 2022

DisEnchantment said:
Granite Ridge would probably be on N4 and Strix Point on N3/E

I have my doubts about Strix Point being on any variant of N3.

It should be announced the same year as Granite Ridge, possibly even earlier than Granite Ridge if AMD follows its current APU announcement at CES cadence with Phoenix Point announcement for CES 2023.

All this makes me think it's most likely to land on the same process as the Granite Ridge CCDs.

IMHO we are likely to see a 2025 Zen5 / RDNA4 APU which probably will be fabbed on an N3 variant node.

soresu · Oct 2, 2022

eek2121 said:
Growing core counts would help keep Intel off their back. If AMD had gone to 32 cores for Zen 4, for example, Raptor Lake would have been DOA for multicore workloads.

Seems like 24C Zen4 AM5 would have accomplished that while sticking with the EPYC:AMx 4:1 ratio of previous generations.

CakeMonster · Oct 2, 2022

inf64 said:
3) Core count - between 50% and 2x more. I still lean more towards 2x increase, so 32C/64T should be the new flagship for mainstream desktop (like 7950x is now).
Stacking of cores and memory (L3) is a norm so I expect they will evolve in the right direction.

That would be cool but I'm not convinced, average consumers might somewhat favor Intel's total core count 'advantage' right now but I predict they will stop caring in 2 years+. I think 50% increase to 12c CCX is more likely as that will keep the dies small and (most importantly) cheap and people frankly won't need more.

Huge disclaimers about server markets here, which would overrule everything: if it's needed there, it might pay off for AMD to go with a 16c CCX, but it doesn't make sense to me on the consumer parts unless it just trickles down there.

I sure want more cores, but it's a luxury with regards to transistors, and a luxury before games or consumer applications demand it (although it would be fun as heck to have major headroom that creative people could come up with purposes for, which is a rare occasion in the hardware market).

DisEnchantment · Oct 2, 2022

soresu said:
IMHO we are likely to see a 2025 Zen5 / RDNA4 APU which probably will be fabbed on an N3 variant node.

Strix Point is Zen 5 on an "advanced node" in 2024.

Phoenix Point is Zen 4 on N4 in early 2023

During FAD 22 Q & A, Forrest was trying to be obscure when he said we can expect Zen CPU cores to be on multiple nodes going forward but then Lisa jumped in and straight up said we can expect Zen 5 on both N3 and N4 in 2024.
May not be Strix Point on N3 but some Zen 5 SoC will be on N3 in 2024.

Found it, timestamped video

"You should expect to see , you know, 4nm and 3nm versions of Zen 5 and you will see them in 2024" - Lisa Su

soresu · Oct 2, 2022

DisEnchantment said:
Strix Point is Zen 5 on an "advanced node" in 2024.

Phoenix Point is Zen 4 on N4 in early 2023

During FAD 22 Q & A, Forrest was trying to be obscure when he said we can expect Zen CPU cores to be on multiple nodes going forward but then Lisa jumped in and straight up said we can expect Zen 5 on both N3 and N4 in 2024.
May not be Strix Point on N3 but some Zen 5 SoC will be on N3 in 2024.

Found it, timestamped video

Oooooof 😨

Counting the rumour that N33 is on TSMC N6 that means RDNA3 is going to be on at least 4 separate node/node variants.

That's approaching ARM level IP fab versatility.

Phoenix should be a really nice boost from Rembrandt at this rate.

Panino Manino · Oct 2, 2022

"Greater than 6 Wide Decode"
I wanted to vote for exactly 6 wide. What do I do?

RTX · Oct 2, 2022

Since it's in 2024, wouldn't AMD be using N4X? They did use N5 HPC for Zen4.

TSMC Unveils N4X Node: Extreme High-Performance at High Voltages

www.anandtech.com

Exist50 · Oct 2, 2022

DisEnchantment said:
But those efficiency gains are very very significant -22% Power at Iso Perf if AMD were to go for N4P.

N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.

Does beg the question, however. With Zen 5 widely expected to bring a much "bigger" core, but only small improvements on the process side (pre-N3), then what're the implications for core counts and/or cost? Rumors seem to indicate that Turin is looking to be around 120 cores, give or take. If, for discussion purposes, we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gains) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, the high end chips will continue to get significantly more expensive.

Also, even if AM5 gives them some room to grow, those scalers above give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?
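
Spelling out the scaling arithmetic above as a quick sketch (the inputs are the same discussion-purposes assumptions: roughly 120 cores for Turin, a +50% bigger core, and a 1.06x N4 density gain; none of them are confirmed figures):

```python
# Sketch only: relative silicon area for Turin vs. Genoa under the
# discussion-purposes assumptions in the post above.

GENOA_CORES = 96
TURIN_CORES = 120          # rumored, give or take
CORE_GROWTH = 1.50         # assumed: Zen 5 core scaled +50% from Zen 4
N4_DENSITY_GAIN = 1.06     # assumed density gain from N5 to N4

# Area per core grows with the design but shrinks with density.
area_per_core = CORE_GROWTH / N4_DENSITY_GAIN

# Total core area also scales with the core count.
total_area = (TURIN_CORES / GENOA_CORES) * area_per_core

print(f"Area per core: {area_per_core:.2f}x (about +{area_per_core - 1:.0%})")
print(f"Total silicon area: {total_area:.2f}x")
```

This gives about 1.42x area per core and about 1.77x total, matching the figures quoted above.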

DisEnchantment · Oct 2, 2022

RTX said:
Since it's in 2024, wouldn't AMD be using N4X? They did use N5 HPC for Zen4.

TSMC Unveils N4X Node: Extreme High-Performance at High Voltages

www.anandtech.com

I don't suppose it will be, N4X is too leaky for chiplets being shared with server processors.

Exist50 said:
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.

Actually it is relatively speaking, because we will never really know what AMD used as baseline for DTCO for Zen 4 and what they have changed in the PDK.
But if they are moving to N4-based DTCO, there must be some gains behind that. So I will give them the benefit of having a precedent. They were able to optimize and extract more frequency at the same power on N7 for Zen 3 vs Zen 2, after all. It may not be the full -22% power, but let's see if TSMC (or AMD) is blowing hot air.

Exist50 said:
we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gains) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, the high end chips will continue to get significantly more expensive.

Also, even if AM5 gives them some room to grow, those scalers above give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?

Nah... 50% is way too much, AMD is not Apple. At best I expect Zen 5 to gain 20 to 25% more transistors per core (area * scaling), or CCDs of around 85 mm2 (assuming they still even have the CCD concept by then). Zen 3 is barely a 9% MTr gain over Zen 2. N5 --> N4 is hardly any density gain. Bean counters will not allow a CCD of around 110 mm2. Not sure what they are making with N3; I guess mobile, but I don't know.
Also, big chunks of silicon real estate already seem allocated beforehand for the second GMI and AVX-512 (unless Zen 5 adds a second AVX-512 port).
The most interesting thing for me is whether there is any core stacking in place.

What is unknown, or at least has no rumors thus far, is which product segments those N3 and N4 parts are for, since Lisa mentioned both for 2024.

How much N3/N5 wafers will cost is up in the air; TSMC risks underutilization (recession and loss of Chinese customers).
But I am not terribly interested in cost or market share comparisons.

Saylick · Oct 2, 2022

Exist50 said:
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.

Does beg the question, however. With Zen 5 widely expected to bring a much "bigger" core, but only small improvements on the process side (pre-N3), then what're the implications for core counts and/or cost? Rumors seem to indicate that Turin is looking to be around 120 cores, give or take. If, for discussion purposes, we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gains) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, the high end chips will continue to get significantly more expensive.

Also, even if AM5 gives them some room to grow, those scalers above give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?

Perhaps it's only DT and mobile Granite Ridge that we'll see on N4P. Strix Point, APUs, and Turin will be on N3E. Basically, only the markets that demand the best perf/W and best density get the best node. DT is not one of those markets, especially since it's a market where people are more perf/$ sensitive. N4P would give AMD the ability to keep costs in check while raising perf/$.
