Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

OneEng2 · Oct 31, 2024

gdansk said:
And what of ARL-H?

I still think it'll be better than Strix Point.

Possibly so. As I have said, Arrow Lake appears to be a very good laptop chip.

Hans Gruber said:
Nobody is talking about a process shrink. N4P is still made on the 5nm process. Going from N5 to N4P is a big generational jump in both performance and efficiency. If the 9800x3D is only 7-8% better than the 7800x3D. That would fall outside (11% performance increase) if AMD made a 7800x3D on N4P. There is a difference between real world and paper statistics. At the same time, TSMC cannot publish performance and power efficiency numbers that are not accurate. Customers would not buy their silicon if the numbers were not accurate.

Forum members should stop reading reviews or watching YouTube videos on how silicon processes work. "You either get an efficiency or a performance increase, not both" is BS. It depends on the node and the variant TSMC offers. The N3 3nm silicon only gives a significant efficiency increase over 5nm. There are clock regressions with N3 and no performance uplift. You cannot choose performance over efficiency. You have to wait for N3P to get performance and efficiency gains. When they say it's customized for AMD, that simply means they give them an option to lean towards power or efficiency or a mix of both depending on the process.

When AMD was set to use N3 with Zen 5 they mentioned clock regressions and suddenly that changed when they had to use N4P. No clock regressions but clock increases. People attribute increased clocks to the Zen 5 design. I attribute it to N4P. Zen 4 was originally supposed to be on N5P but they could not get N5P. They settled for the base N5 silicon. A 7800x3D on N4P would yield very good efficiency and performance gains based on the N4P silicon.

Sometimes core density matters more than anything. Take Nvidia's Blackwell. That chip is so advanced they really needed to be on TSMC 3nm but Apple bought all the N3 silicon. Instead, Blackwell is on N4P which provides increased silicon density over all other 5nm processes.

When TSMC says "performance" it isn't referring to benchmarks, and it is qualified as to what it does mean.

Your assertion that Zen 4 would perform the same as Zen 5 on the same process node is nonsensical. You surely see that, right?

Hans Gruber · Oct 31, 2024

OneEng2 said:
Possibly so. As I have said, Arrow Lake appears to be a very good laptop chip.

When TSMC says "performance" it isn't referring to benchmarks, and it is qualified as to what it does mean.

Your assertion that Zen 4 would perform the same as Zen 5 on the same process node is nonsensical. You surely see that, right?

I don't know. Maybe watch all the Hardware Unboxed Zen 5 reviews and get back to me.

adroc_thurston · Oct 31, 2024

Hans Gruber said:
Maybe watch all the Hardware Unboxed Zen 5 reviews and get back to me.

I can plug Phoronix in return. Don't be a lolcow.

fastandfurious6 · Oct 31, 2024

Det0x said:
especially when my next GFX card is expected to cost over 2 grands (5090)

this.... is so insane actually

where is the world going to?

dr1337 · Nov 1, 2024

So any rumors about dual stacked CCDs? Or was all the news about TSVs being on top of the die just people getting their analysis wrong?

Supposedly Zen2 already had TSVs that never got used, maybe a Zen5+ or Zen6 will have fully sandwiched dies? Maybe stacked logic is going to finally become a thing?

I'm still skeptical about them finding a way to join the two CCDs together on a single interposer without changing the IOD, let alone that being economical, but maybe that's what the top side vias are for?

itsmydamnation · Nov 1, 2024

anyone want to buy my 7800X3D...... joking not joking.....

adroc_thurston · Nov 1, 2024

dr1337 said:
Supposedly Zen2 already had TSVs that never got used

It didn't.

dr1337 said:
Maybe stacked logic is going to finally become a thing?

MI300 kinda is already? tons of shoreline I/O and d2d there.

dr1337 · Nov 1, 2024

adroc_thurston said:
MI300 kinda is already? tons of shoreline I/O and d2d there.

Not even remotely the same concept as stacked logic.

Hitman928 · Nov 1, 2024

dr1337 said:
So any rumors about dual stacked CCDs? Or was all the news about TSVs being on top of the die just people getting their analysis wrong?

Supposedly Zen2 already had TSVs that never got used, maybe a Zen5+ or Zen6 will have fully sandwiched dies? Maybe stacked logic is going to finally become a thing?

I'm still skeptical about them finding a way to join the two CCDs together on a single interposer without changing the IOD, let alone that being economical, but maybe that's what the top side vias are for?

No stacked logic. Not sure what you mean by TSVs being on top of the die.

dr1337 · Nov 1, 2024

Hitman928 said:
No stacked logic. Not sure what you mean by TSVs being on top of the die.

High Yield made a video about it, and there was another twitter tech poster (I can't remember) that did another zen 5 die analysis. All from the Fritzchens Fritz pictures. Pretty sure I found both of those from this thread.

So either these people were just making things up, or there would be something else going on with multiple stacks of silicon.

Hitman928 · Nov 1, 2024

dr1337 said:
High Yield made a video about it, and there was another twitter tech poster (I can't remember) that did another zen 5 die analysis. All from the Fritzchens Fritz pictures. Pretty sure I found both of those from this thread.

So either these people were just making things up, or there would be something else going on with multiple stacks of silicon.

I don’t remember them claiming TSVs on the top of the die. Did you mean TSVs on the top die?

dr1337 · Nov 1, 2024

Hitman928 said:
I don’t remember them claiming TSVs on the top of the die. Did you mean TSVs on the top die?

As far as I am aware, Fritzchens Fritz has never taken a picture that wasn't a top down view.

AMD put out a video today that showed the cache being under the die... so what else could I mean eh? Surely you're not getting into semantics about the position of flip chips right?

gaav87 · Nov 1, 2024

Hans Gruber said:
Nobody is talking about a process shrink. N4P is still made on the 5nm process. Going from N5 to N4P is a big generational jump in both performance and efficiency. If the 9800x3D is only 7-8% better than the 7800x3D. That would fall outside (11% performance increase) if AMD made a 7800x3D on N4P. There is a difference between real world and paper statistics. At the same time, TSMC cannot publish performance and power efficiency numbers that are not accurate. Customers would not buy their silicon if the numbers were not accurate.

Forum members should stop reading reviews or watching YouTube videos on how silicon processes work. "You either get an efficiency or a performance increase, not both" is BS. It depends on the node and the variant TSMC offers. The N3 3nm silicon only gives a significant efficiency increase over 5nm. There are clock regressions with N3 and no performance uplift. You cannot choose performance over efficiency. You have to wait for N3P to get performance and efficiency gains. When they say it's customized for AMD, that simply means they give them an option to lean towards power or efficiency or a mix of both depending on the process.

When AMD was set to use N3 with Zen 5 they mentioned clock regressions and suddenly that changed when they had to use N4P. No clock regressions but clock increases. People attribute increased clocks to the Zen 5 design. I attribute it to N4P. Zen 4 was originally supposed to be on N5P but they could not get N5P. They settled for the base N5 silicon. A 7800x3D on N4P would yield very good efficiency and performance gains based on the N4P silicon.

Sometimes core density matters more than anything. Take Nvidia's Blackwell. That chip is so advanced they really needed to be on TSMC 3nm but Apple bought all the N3 silicon. Instead, Blackwell is on N4P which provides increased silicon density over all other 5nm processes.

That's wrong, my friend. For TSMC, a 'performance' increase is a BASE clock speed increase at 1.2V. Not benchmark perf +x% !!
Boost clock has nothing to do with it except where they clearly specify an Fmax increase. Boost clock depends on many factors but cba educating. So in other words, +11% for your imaginary node jump for the 7800X3D would increase base clock speed by 11%, not the boost clock. Boost clock could still be the same.

adroc_thurston · Nov 1, 2024

gaav87 said:
For TSMC, a 'performance' increase is a BASE clock speed increase at 1.2V.

Not even directly applicable given the amount of DTCO involved in this day and age.

Hans Gruber · Nov 1, 2024

gaav87 said:
That's wrong, my friend. For TSMC, a 'performance' increase is a BASE clock speed increase at 1.2V. Not benchmark perf +x% !!
Boost clock has nothing to do with it except where they clearly specify an Fmax increase. Boost clock depends on many factors but cba educating. So in other words, +11% for your imaginary node jump for the 7800X3D would increase base clock speed by 11%, not the boost clock. Boost clock could still be the same.

That's wrong. If you run two identical processors on different processes on the 5nm node, clock for clock the performance on N4P will be 11% better on average than on N5. That means if you run a 7800X3D at 1.2V and 4GHz, the N4P part at the same clock and voltage would perform 11% better than on N5 with identical clocks and voltage. The efficiency gains mean that at the same voltage it would take e.g. 10% less power to produce the same result. The increased clocks are primarily due to silicon and/or voltage.

Fun fact. AMD actually wanted the Zen 4 CPU's on N5P but they had to settle for N5.

Abwx · Nov 1, 2024

Hans Gruber said:
Fun fact. AMD actually wanted the Zen 4 CPU's on N5P but they had to settle for N5.

It is on N5P.

Dr Su reinforced that technology roadmaps are all about making the right choices and the right junctures, and explicitly stated that our 5nm technology is highly optimized for high-performance computing – it’s not necessarily the same as some other 5nm technologies out there

AMD: We’re Using an Optimized TSMC 5nm Process

www.anandtech.com

StefanR5R · Nov 1, 2024

Hans Gruber said:
If the 9800x3D is only 7-8% better than the 7800x3D [...]

One link in your chain of errors is your interpretation that anybody claimed that 9800X3D was 8% faster than 7800X3D. AMD at least did not say that. Here is AMD's claim in their October 31 press release:

A video game computer with a Radeon RX 7900 XTX, DDR5-6000, and 9800X3D in it delivers 8% more video frames per second on average (in AMD's selection of game benchmarks) than a video game computer with a Radeon RX 7900 XTX, DDR5-6000, and 7800X3D in it.

So what does this say about how much faster 9800X3D's integer ALUs perform compared to 7800X3D's?

That's a rhetorical question.
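To put a rough shape on that point, here is a minimal sketch of a bottleneck model, assuming each frame takes as long as the slower of the CPU and the GPU. The per-game frame costs and the 20% CPU-side gain are made-up illustration values, not measurements of any real part or game.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when the slower of CPU and GPU sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs in milliseconds: (CPU time, GPU time) per game.
games = [(8.0, 5.0), (6.0, 7.0), (5.0, 9.0)]
CPU_SPEEDUP = 1.20  # assumed CPU-side gain, chosen only for illustration

# FPS ratio per game after making only the CPU portion faster.
uplifts = [fps(c / CPU_SPEEDUP, g) / fps(c, g) for c, g in games]
average = sum(uplifts) / len(uplifts)

for (c, g), u in zip(games, uplifts):
    print(f"cpu={c}ms gpu={g}ms -> FPS uplift {u - 1:.1%}")
print(f"average FPS uplift: {average - 1:.1%}")
```

In this toy setup only the first game is CPU-bound, so a 20% faster CPU shows up as a much smaller average FPS uplift. A system-level FPS average therefore says little about how much faster the cores themselves are.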

CouncilorIrissa · Nov 1, 2024

Hans Gruber said:
That's wrong. If you run two identical processors on different processes on the 5nm node, clock for clock the performance on N4P will be 11% better on average than on N5. That means if you run a 7800X3D at 1.2V and 4GHz, the N4P part at the same clock and voltage would perform 11% better than on N5 with identical clocks and voltage. The efficiency gains mean that at the same voltage it would take e.g. 10% less power to produce the same result. The increased clocks are primarily due to silicon and/or voltage.

Seek help.

yuri69 · Nov 1, 2024

techjunkie123 said:
AMD EPYC 9655 Benchmarks Show The Terrific Generational Gains With 5th Gen EPYC Review - Phoronix

www.phoronix.com

Aaaaannd we're back to +40% for Zen5. On server looks like the new SKUs did gain about 40% with the same number of cores.

Just like the RDNA3 CU figure... the 40% figure that got pushed was an IPC figure. The tested Turin SKU is a 2.6/4.5GHz part, compared to Genoa at 2.4/3.7GHz.
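A quick back-of-the-envelope sketch of what those clocks imply, assuming throughput scales linearly with clock speed (a simplification that overstates the clock's contribution for memory-bound work). The real all-core clocks under load are not given here, so the base and boost figures are only used as two bounds.

```python
def per_clock_gain(total_gain: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Per-clock throughput ratio implied by a total gain and a clock ratio."""
    return total_gain / (new_clock_ghz / old_clock_ghz)


TOTAL_GAIN = 1.40  # the ~40% generational gain quoted above

# Turin 2.6/4.5GHz vs Genoa 2.4/3.7GHz, as listed in the post.
at_boost = per_clock_gain(TOTAL_GAIN, 4.5, 3.7)
at_base = per_clock_gain(TOTAL_GAIN, 2.6, 2.4)

print(f"per-clock gain if clocks track boost: {at_boost - 1:.1%}")
print(f"per-clock gain if clocks track base:  {at_base - 1:.1%}")
```

Under that assumption the per-clock share of the 40% lands somewhere between roughly 15% and 29%, depending on which clock the SKUs actually sustain.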

Josh128 · Nov 1, 2024

Hans Gruber said:
That's wrong. If you run two identical processors on different processes on the 5nm node, clock for clock the performance on N4P will be 11% better on average than on N5. That means if you run a 7800X3D at 1.2V and 4GHz, the N4P part at the same clock and voltage would perform 11% better than on N5 with identical clocks and voltage.

No. That doesn't make any sense at all. If you run Proc A at 4 GHz and then you run Proc A on a newer, superior node, but identical design and logic, it will perform exactly the same at the same frequency with better efficiency.
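As a rough sketch of that argument, assume run time depends only on instruction count, IPC, and frequency, while the node only changes how much power the chip draws at that frequency. The IPC, wattages, and instruction count below are hypothetical, with the 10% power reduction borrowed from the "e.g. 10%" figure in the quoted post.

```python
def run(ipc: float, freq_ghz: float, power_w: float, instructions: float = 1e12):
    """Return (seconds, joules) for a fixed amount of work."""
    seconds = instructions / (ipc * freq_ghz * 1e9)
    joules = power_w * seconds
    return seconds, joules


IPC = 2.0        # hypothetical; identical design means identical IPC
FREQ_GHZ = 4.0   # same clock on both nodes

old_node = run(IPC, FREQ_GHZ, power_w=100.0)  # hypothetical power on the older node
new_node = run(IPC, FREQ_GHZ, power_w=90.0)   # assumed 10% lower power at iso-frequency

print(f"old node: {old_node[0]:.0f} s, {old_node[1]:.0f} J")
print(f"new node: {new_node[0]:.0f} s, {new_node[1]:.0f} J")
```

Both runs finish in the same time; only the energy used differs. Any performance gain has to come from spending the saved power on higher clocks, which is the caveat raised in the reply below.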

LightningZ71 · Nov 1, 2024

Josh128 said:
No. That doesn't make any sense at all. If you run Proc A at 4 GHz and then you run Proc A on a newer, superior node, but identical design and logic, it will perform exactly the same with better efficiency.

That's really going to depend on what was limiting Fmax on that logic. If it was power limited or thermal limited, then, no, you CAN get SOME limited frequency improvement.

But, that's not pertinent to the existing discussion.

TSMC's power/performance/density claims have, for at least the last decade or more, always been in reference to whatever representative test logic chip that they are demoing. I'll repeat, the claims only EXACTLY apply to that ONE, REPRESENTATIVE TEST CHIP, running at a strictly specified frequency, at a strictly specified power level, and implemented at their claimed target circuit density.

AMD does NOT run their chips at that certain power level, at least, not 100% of the time. It may cross that threshold during normal operations, but power draw is dynamic.

AMD does NOT run their chips at that one specific frequency, at least, not 100% of the time. It may cross that frequency threshold during normal operations, but frequency is dynamic.

AMD does not implement their chips at that one particular density level, EVER. AMD typically uses relaxed circuit density in an effort to boost Fmax and reduce leakage among other things. That relaxed density means that their chips have far different behavior than the TSMC test chip.

Now, all that being said, let's look at the 7700X and the 7800X3D. What would be a reasonable expectation of their BEHAVIOR on N4P instead of N5P? The reasonable expectation would be that boost frequencies MAY increase by about 200MHz. Frequency, not benchmark performance. Why would this be? The frequency would increase because N4P does allow slightly higher circuit frequencies at a given power level. It would also be possible because the power draw at those same frequencies would be slightly lower than it would be on N5P, meaning less heat is being generated. This means that any thermal limits aren't hit until a higher frequency, any power limits aren't hit until a higher frequency, and the core can run a little bit faster. But it isn't 11%!!! AMD already has relaxed circuit density on timing critical circuits, achieving a lot of what N4P was supposed to gain over N5P, so whatever improvements are given by the refinement over N5P are less effective. Because N4P offers a very modest improvement to circuit density, AMD could choose to implement a bit more padding on those same critical circuits and maybe get another slight frequency increase, but it's arguable that it wouldn't help.

Continuing, let's look at all-core frequencies. When the whole chip is running at its power and thermal limits, process refinements are more evident. Just a few % reduction in power draw closer to the given chip's efficiency sweet spot can have a larger impact on highest achievable frequencies for all cores. This means that they MIGHT be able to get 300MHz more all-core boost frequency at the same power draw and thermal limits as before. Again, 300MHz FREQUENCY, not performance.

Why do I specify frequency as separate from performance? Because, for the x86 chips that we are focusing on, their achievable performance per MHz of clock speed decreases as the processor increases its operating frequency. Why? Because the cores then spend more and more of their time waiting on the memory subsystem to serve them with data and work to do. And this is the rub. BASED ON PROCESS ALONE, cache density and performance barely move at all from N5P to N4P. You MIGHT get 100MHz more out of the L3. That's not making a big dent on performance.

In summary, you MIGHT get a couple hundred MHz more boost speed out of the 7700X and 7800X3D (perhaps another 100MHz on the X3D part because of their notable sensitivity to thermal load), but it isn't going to make up the whole difference between them and their 9XXX comparable parts. You'd expect maybe 2-3% performance improvement ON AVERAGE from that increase ON DESKTOP, ENTHUSIAST usage patterns.
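The frequency-versus-performance gap can be sketched with a simple two-part run-time model, assuming the core-bound share of run time shrinks with clock speed while the share spent waiting on memory does not. The 5.0GHz baseline and the 40% stall share are hypothetical numbers picked for illustration, not measured values for any Zen part.

```python
def speedup(freq_ratio: float, stall_fraction: float) -> float:
    """Speedup when only the core-bound share of run time scales with frequency.

    stall_fraction is the share of baseline run time spent waiting on memory,
    assumed not to shrink when the core clock rises.
    """
    core_time = (1.0 - stall_fraction) / freq_ratio
    return 1.0 / (core_time + stall_fraction)


BASE_GHZ = 5.0  # hypothetical baseline boost clock
BUMP_GHZ = 0.2  # the ~200MHz bump discussed above
STALL = 0.4     # hypothetical memory-stall share, not a measured value

ratio = (BASE_GHZ + BUMP_GHZ) / BASE_GHZ
gain = speedup(ratio, STALL)

print(f"clock: +{ratio - 1:.1%}, performance: +{gain - 1:.1%}")
```

With these assumed inputs a 4% clock bump yields a bit over 2% more performance, in the same ballpark as the 2-3% estimate. A larger stall share pushes the gain lower still.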

However, you are forgetting that Zen4/5 are server chips too. As we've just seen Michael post on Phoronix, Zen5 really is managing the rumored 40% improvement over Zen4 on like-for-like SKUs. Not the few % from above, not even 10% from the ideal achievable increase due to thermal/power load improvements at frequencies that are very near the sweet spot for the chips, but big improvements. That was achieved by improvements all over the chips. AMD doesn't get that by just an optical shrink. And, if AMD doesn't get that in servers, they're not going to waste the time and money to even attempt it.

OneEng2 · Nov 1, 2024

Wow. I have to admit, it has been a while since I have seen a forum post as wrongheaded as this...

Hans Gruber said:
That's wrong. If you run two identical processors on different processes on the 5nm node, clock for clock the performance on N4P will be 11% better on average than on N5. That means if you run a 7800X3D at 1.2V and 4GHz, the N4P part at the same clock and voltage would perform 11% better than on N5 with identical clocks and voltage. The efficiency gains mean that at the same voltage it would take e.g. 10% less power to produce the same result. The increased clocks are primarily due to silicon and/or voltage.

The non-bold parts are only true for the exact conditions and mix of logic that TSMC quotes the metrics for.

The belief that the clock speed of an entire superscalar processor pipeline depends ONLY on the performance of the transistors used in the pipeline shows a lack of understanding of how modern CPUs work.

The power assumption is also flawed (although perhaps not as off-base as the other assumptions), in that it depends on the mix of transistors in the CPU design.

eek2121 · Nov 1, 2024

Gideon said:
Considering what we got from the leaker and the strong hints for 2x X3D dies, I still think the most probable outcome is

5.3 GHz boost clock for the 9900X3D
5.4 GHz boost clock for the 9950X3D

Two dies would also make 9900X3D an actually viable product (in gaming)

I would almost bet money on 5.5GHz+ for the 9950X3D. I would absolutely not be shocked if the chip hit 5.7GHz, but 5.5 is plenty. The 9900X3D will be slightly lower.

The 7950X3D topped out at 5.25GHz for the chiplet with the cache.

I don’t think AMD will have cache on both chiplets. However, if they did and also hit 5.7GHz SC, they would have a banger of a chip.

AMD stated that the cache was never the thing holding clocks back to begin with; it was cooling for the cores.

Why is the 9800X3D only at 5.2GHz? Probably mostly so as not to step on the 9900X. Thermals are likely still a small factor as well, but these chips are hitting 5.4GHz+ with an overclock and are completely unlocked.

Hitman928 · Nov 1, 2024

dr1337 said:
As far as I am aware, Fritzchens Fritz has never taken a picture that wasn't a top down view.

AMD put out a video today that showed the cache being under the die... so what else could I mean eh? Surely you're not getting into semantics about the position of flip chips right?

I honestly am not sure what you meant so I guess I'll answer the question as asked, no, no one thought the TSVs were on the top of the CCD die.

moinmoin · Nov 1, 2024

Hitman928 said:
I honestly am not sure what you meant so I guess I'll answer the question as asked, no, no one thought the TSVs were on the top of the CCD die.

I think (among others) he's referring to this die shot analysis:

Hans de Vries said:
Zen 5 die: TSVs to the 64 MB X3D L3 cache:

It simply must have enough and well distributed TSVs...

It looks like all the TSVs providing the power to the X3D L3 die are using an MIM decoupling capacitor for each power and ground pair of TSVs, leading to something like an extra ~8500 power/ground TSVs. Under each black square should be two TSVs.

Original die photos: https://www.flickr.com/photos/130561288@N04/

View attachment 108838

Image in the quote:
