Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

MS_AT · Jan 21, 2025

StefanR5R said:
9950X3D is heterogeneous too, but hopefully not as bad as 7950X3D. (That is, with a clock frequency differential small enough such that cache insensitive workloads behave practically identically on both CCXs, hopefully.)

https://blog.hjc.im/spec-cpu-2017 Some time ago David Huang added SPECint results for a 9800X3D overclocked to 5.7 GHz [the vanilla score uses slower RAM, which might muddle things a bit; I'm not sure how much each subtest is affected by memory]. But stock 9950X vs. 9800X3D also shows that the 9800X3D is extremely close to the 9950X despite the clock deficit, which agrees with the C&C results. A slight clock bump on top of what the 9800X3D offers should make both CCDs almost equal.

Joe NYC · Jan 21, 2025

fastandfurious6 said:
for zen 6 they really need to make a super superchip dedicated 100% to real performance, max tdp max vcache everything maxed out, without wasting silicon for NPUs and AVX512 and whatnot

like Halo but 100% CPU

hold the indisputable crown even higher than the popular *800X3D models

If the main Zen 6 CCD is indeed 12 cores, and everything scales 1.5x (L2, L3, V-Cache), then a 24-thread processor would be a significant upgrade over the 9800X3D.
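
As a rough illustration of that 1.5x scaling, here is a minimal Python sketch. The baseline is the shipping 9800X3D configuration (8 cores, 8 MB total L2, 32 MB on-die L3, 64 MB V-Cache); the 1.5x factor and every resulting figure are only the rumor under discussion, and the names `ZEN5_CCD` and `scale_ccd` are made up for this example.

```python
from fractions import Fraction

# Baseline: one Zen 5 X3D CCD as found in the 9800X3D.
ZEN5_CCD = {"cores": 8, "l2_mb": 8, "l3_mb": 32, "vcache_mb": 64}


def scale_ccd(ccd, factor=Fraction(3, 2), smt_threads_per_core=2):
    """Scale every CCD resource by `factor` (speculative, not a confirmed spec)."""
    factor = Fraction(factor)
    scaled = {}
    for name, value in ccd.items():
        result = value * factor
        if result.denominator != 1:
            raise ValueError(f"{name} does not scale to a whole number")
        scaled[name] = int(result)
    # Derived figures: SMT thread count and combined L3 + V-Cache capacity.
    scaled["threads"] = scaled["cores"] * smt_threads_per_core
    scaled["total_l3_mb"] = scaled["l3_mb"] + scaled["vcache_mb"]
    return scaled


hypothetical = scale_ccd(ZEN5_CCD)
for name, value in hypothetical.items():
    print(f"{name}: {value}")
```

Under these assumptions a single CCD would carry 12 cores / 24 threads and 144 MB of combined L3, versus 8 cores / 16 threads and 96 MB on the 9800X3D.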

CakeMonster · Jan 21, 2025

Joe NYC said:
If the main Zen 6 CCD is indeed 12 cores, and everything scales 1.5x (L2, L3, V-Cache), then a 24-thread processor would be a significant upgrade over the 9800X3D.

Will we get 9c/18t x600 and 18c/36t x900 CPUs then? That would be a really nice floor for mainstream parts.

StefanR5R · Jan 22, 2025

adroc_thurston said:
Desktop and server do not share the CCD, just the design itself.

StefanR5R said:
More differences than just another stepping?

adroc_thurston said:
Yeah, very very different xtor-level optimizations.

I am guessing:
– Turin CCD: focus on low V-f curve, notably in power-limited load scenarios, of course without compromises to correctness?
– GNR CCD: focus on an ability to operate at high V?

adroc_thurston · Jan 22, 2025

StefanR5R said:
I am guessing:
– Turin CCD: focus on low V-f curve, notably in power-limited load scenarios, of course without compromises to correctness?
– GNR CCD: focus on an ability to operate at high V?

Bingo!
Different xtor skews, GNR trades leakage for perf at >1.2V.

tsamolotoff · Jan 23, 2025

Fairly sure that dual chiplet full x3d didn't go into sales because of optics (also why this new APO-like thing is introduced instead of Gamebar). Just imagine this CPU is released, lots of reviewers and users expect it to perform on par with one CCD model and it still has worse performance in case Gamebar isn't working (which is easy to achieve - just block ms spyware in /etc/hosts and it won't update at all >.< ), this will lead to another 'waste of sand' video by GN, HWU et al.

Thunder 57 · Jan 24, 2025

C&C has an article up testing Zen 5 with the opcache disabled to see how well the 2x4 decoder works. Short answer, in single thread not great. With SMT enabled surprisingly good. It does seem like an obvious place for AMD to work some magic in Zen 6 though, assuming they still care about client performance. The results vary quite a bit so if you are interested check it out:

Disabling Zen 5’s Op Cache and Exploring its Clustered Decoder

Zen 5 has an interesting frontend setup with a pair of fetch and decode clusters.

chipsandcheese.com

igor_kavinski · Jan 24, 2025

tsamolotoff said:
this will lead to another 'waste of sand' video by GN, HWU et al.

For once, AMD should just release a product without any rehearsed marketing speak. When asked what's great about it, they just say, "We will let the community find out for themselves" and everyone frantically scrambles and tests everything under the moon. One month later, people are still finding pros and cons. All this exposure will translate into product sales. Then AMD announces a free X3D2 CPU to every reviewer/community member who discovered a previously unknown perf benefit in any application/game with decent user base. It will be a tremendous red letter day for AMD Marketing.

fastandfurious6 · Jan 24, 2025

igor_kavinski said:
For once, AMD should just release a product without any rehearsed marketing speak. When asked what's great about it, they just say, "We will let the community find out for themselves" and everyone frantically scrambles and tests everything under the moon. One month later, people are still finding pros and cons. All this exposure will translate into product sales. Then AMD announces a free X3D2 CPU to every reviewer/community member who discovered a previously unknown perf benefit in any application/game with decent user base. It will be a tremendous red letter day for AMD Marketing.

yes! a perfect product doesn't need marketing

only discoverability, which already exists in this case

LightningZ71 · Jan 24, 2025

The issue is, we know with near certainty the things that it will be good at. Phoronix has done extensive testing on the Epyc X3D parts and has an excellent roadmap of where it shines. ServeTheHome also has a few good benches that show the X3D parts' strengths.

99% of their strengths are in semi-pro or full server level tasks. Sell it under the Epyc brand and watch them fly off the shelves.

yottabit · Jan 25, 2025

LightningZ71 said:
The issue is, we know with near certainty the things that it will be good at. Phoronix has done extensive testing on the Epyc X3D parts and has an excellent roadmap of where it shines. ServeTheHome also has a few good benches that show the X3D parts' strengths.

99% of their strengths are in semi-pro or full server level tasks. Sell it under the Epyc brand and watch them fly off the shelves.

The 16 core chips have always been sort of prosumer no? Who’s buying 16 core JUST for gaming?

Thibsie · Jan 25, 2025

yottabit said:
The 16 core chips have always been sort of prosumer no? Who’s buying 16 core JUST for gaming?

Lots of "I have a bigger one than yours"?
Same thing as guys playing CS:GO and buying a 5090.

MS_AT · Jan 25, 2025

Thunder 57 said:
It does seem like an obvious place for AMD to work some magic in Zen 6 though, assuming they still care about client performance.

To quote part of the article conclusion

Zen 5’s improved op cache focuses on maximizing single threaded performance, while the decoders step in for certain multithreaded workloads where the op cache isn’t big enough.

So if they ever let go of the op cache, they had better make the decoder wider, but with the op cache in place they should rather make it faster. Wider won't help, which lines up with other C&C pieces where they mention decode width is not a bottleneck.

Intel’s Lion Cove might not take the gaming crown, but Intel’s ability to run a plain 8-wide decoder at 5.7 GHz should turn some heads. For now, that advantage doesn’t seem to show up. I haven’t seen a high IPC, low-threaded workload with a giant instruction-side cache footprint. But who knows what the future will bring

And Intel's example shows that wider is not necessarily the answer.

yuri69 · Jan 25, 2025

MS_AT said:
So if they ever let go of the op cache, they had better make the decoder wider, but with the op cache in place they should rather make it faster. Wider won't help, which lines up with other C&C pieces where they mention decode width is not a bottleneck.

AMD has been beefing up the op cache since Zen 1. Zen 5 invested a lot in that cache. They are not about to drop it anytime soon, right?

Wider won't help Zen 5; all they need is to cut the frontend latency. Anyway, that hyped clustered frontend was purposely done to handle server/SMT.

StefanR5R · Jan 25, 2025

yuri69 said:
They are not about to drop [the op cache] anytime soon, right?

Never; unless they would, for some reason, need to build a special-purpose core which sacrifices perf and perf/W for area savings. [Edit: Or if they, for some reason, made a core for a simpler to decode ISA such as ARM.]

yuri69 said:
Wider won't help Zen 5; all they need is to cut the frontend latency. Anyway, that hyped clustered frontend was purposely done to handle server/SMT.

The clustered decoder might also help mobile (compared to a single decoder pipeline but with same total width [or even compared to any >4 wide single-pipeline decoder]): While there is only one thread running on a core, power off half of the decoder.

Thunder 57 · Jan 25, 2025

MS_AT said:
To quote part of the article conclusion

So if they ever let go of the op cache, they had better make the decoder wider, but with the op cache in place they should rather make it faster. Wider won't help, which lines up with other C&C pieces where they mention decode width is not a bottleneck.

And Intel's example shows that wider is not necessarily the answer.

I don't know why some people think uop caches are going away in the future. I think AMD should do something with the decoder though. Its worst case has essentially been 4 wide since Zen 1. Maybe they go to 2x5. That may be too big of a change for Zen 6 though.

StefanR5R · Jan 25, 2025

StefanR5R said:
The clustered decoder might also help mobile (compared to a single decoder pipeline but with same total width [or even compared to any >4 wide single-pipeline decoder]): While there is only one thread running on a core, power off half of the decoder.

As for the power consumption aspect, I just rediscovered this post on prices to pay with wide decoders (in ISAs with variable instruction lengths):

Tuna-Fish said:
Even after you know the lengths, you still have to pay for the massively wide mux tree to align the instruction starts with the decoders. (Technically, this usually happens after first stage of decode begins on every byte boundary, but the point is, you still have to do it at some point.) This structure is huge and high-latency, and grows quadratically with decode width. (x86 instructions are 1-15 bytes long. First instruction slot selects first byte. Second instruction slot selects any byte between the second and the 16th. Third slot selects any between the third and the 31st. You get where this is going.) And unless instructions are always the same width, all those transistors switch every cycle, so you pay a lot of power too.
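
The byte ranges in the quoted post can be tabulated with a small sketch. This is a toy counting model in Python, not a description of any real decoder: it assumes only the 1-15 byte instruction length from the quote, ignores any limit on the fetch window (so it is an upper bound), and the function names are invented for illustration.

```python
MIN_LEN = 1   # shortest x86 instruction, in bytes
MAX_LEN = 15  # longest x86 instruction, in bytes


def start_byte_range(slot):
    """Return (first, last) 1-indexed byte positions where the instruction
    in the given 1-indexed decode slot may begin."""
    first = 1 + (slot - 1) * MIN_LEN  # all earlier instructions were 1 byte
    last = 1 + (slot - 1) * MAX_LEN   # all earlier instructions were 15 bytes
    return first, last


def candidates(slot):
    """Number of possible start bytes the mux for this slot must choose from."""
    first, last = start_byte_range(slot)
    return last - first + 1


def total_mux_inputs(width):
    """Sum of candidate start bytes over all slots of a `width`-wide decoder."""
    return sum(candidates(slot) for slot in range(1, width + 1))


for width in (4, 6, 8):
    print(f"{width}-wide: {total_mux_inputs(width)} candidate start positions")
```

In this model slot k has 14k - 13 candidates, so the total over n slots is 7n² - 6n: doubling the width from 4 to 8 takes the count from 88 to 400, which is the quadratic growth the post describes.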

GTracing · Jan 25, 2025

Thunder 57 said:
I don't know why some people think uop caches are going away in the future. I think AMD should do something with the decoder though. Its worst case has essentially been 4 wide since Zen 1. Maybe they go to 2x5. That may be too big of a change for Zen 6 though.

I could see AMD or, more likely, Intel drop the micro op cache. Most of the ARM cores don't have one. Apple's M-series are the fastest and most efficient CPUs. It seems to me that wide decoders are more area efficient than a big micro op cache.

Thunder 57 · Jan 25, 2025

GTracing said:
I could see AMD or, more likely, Intel drop the micro op cache. Most of the ARM cores don't have one. Apple's M-series are the fastest and most efficient CPUs. It seems to me that wide decoders are more area efficient than a big micro op cache.

Decoding is power hungry. Decoding ARM is easier so less need for an opcache. It also allows them to use a longer pipeline with fewer drawbacks. Of course that's assuming they plan on keeping high frequencies in the future.

GTracing · Jan 25, 2025

Thunder 57 said:
Decoding is power hungry. Decoding ARM is easier so less need for an opcache. It also allows them to use a longer pipeline with fewer drawbacks. Of course that's assuming they plan on keeping high frequencies in the future.

I'm not convinced that a micro op cache lowers power usage by that much. If that were true, then why don't ARM cores have one? They are more power constrained. The fact that ARM is easier to decode might mean that ARM decoders are less power hungry, but that answer isn't sufficient imho. I don't see how the difference between ARM and x86 could be so big that AMD is worried about decoder power consumption, but Apple isn't.

The way I see it, the micro op cache is primarily a way to increase performance, and that die area might be better spent making other parts of the core wider.

Abwx · Jan 25, 2025

A detail that is not a coincidence:

Strix Halo does not support dGPUs and features only 12 PCIe Gen 4 lanes from the CPU.

"Our big Middle cores are better than their army of little cores" AMD's Ben Conrad chats about some of the design decisions behind Ryzen AI APUs and what makes Strix Halo tick

News, Reviews and other Informations about Laptops

www.notebookcheck.net

poke01 · Jan 25, 2025

Abwx said:
A detail that is not a coincidence:

"Our big Middle cores are better than their army of little cores" AMD's Ben Conrad chats about some of the design decisions behind Ryzen AI APUs and what makes Strix Halo tick

News, Reviews and other Informations about Laptops

www.notebookcheck.net

that's a big turn off. I was hoping we get dGPU support unlike M chips

poke01 · Jan 25, 2025

Oh well, at least we can use eGPUs when users want more GPU perf

techjunkie123 · Jan 25, 2025

poke01 said:
that's a big turn off. I was hoping we get dGPU support unlike M chips

Not at all. That's why Strix Point exists.

poke01 · Jan 25, 2025

techjunkie123 said:
Not at all. That's why Strix Point exists.

more like Fire Range but yeah I was hoping for dGPUs
