Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


MS_AT

Senior member
Jul 15, 2024
9950X3D is heterogeneous too, but hopefully not as bad as 7950X3D. (That is, with a clock frequency differential small enough such that cache insensitive workloads behave practically identical on both CCXs — hopefully.)
https://blog.hjc.im/spec-cpu-2017 A while ago, David Huang added SPECint results for a 9800X3D overclocked to 5.7 GHz [the fact that the vanilla score uses slower RAM might muddle things a bit; I'm not sure how each subtest is affected by memory]. His stock 9950X vs. 9800X3D results also show that the 9800X3D is extremely close to the 9950X despite the clock deficit, which agrees with C&C's results. A slight clock bump on top of what the 9800X3D offers should make both CCDs almost equal.
 

Joe NYC

Platinum Member
Jun 26, 2021
for zen 6 they really need to make a super superchip dedicated 100% to real performance: max tdp, max vcache, everything maxed out, without wasting silicon on NPUs and AVX512 and whatnot

like Halo but 100% CPU

hold the indisputable crown even higher than the popular *800X3D models

If the main Zen 6 CCD is indeed 12 cores and everything scales 1.5x (L2, L3, V-Cache), then a 24-thread processor would be a significant upgrade over the 9800X3D.
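To put numbers on the "everything scales 1.5x" premise, here is a rough sketch. It assumes the 9800X3D baseline of 8 cores/16 threads, 1 MB of L2 per core, 32 MB of on-die L3 and 64 MB of stacked V-Cache; the 1.5x factor and the resulting Zen 6 figures are hypothetical, taken straight from the speculation above rather than from any AMD data.

```python
# Hypothetical 1.5x scaling of a 9800X3D-class CCD.
# Baseline values are the 9800X3D's specs; the scaled result is speculation only.
BASELINE = {
    "cores": 8,
    "l2_mb": 8,       # 1 MB of L2 per core
    "l3_mb": 32,      # on-die L3
    "vcache_mb": 64,  # stacked V-Cache
}


def scale(spec, factor=1.5):
    """Return a new spec dict with every entry multiplied by factor."""
    return {key: round(value * factor) for key, value in spec.items()}


zen6 = scale(BASELINE)
threads = zen6["cores"] * 2               # assumes SMT2, as on Zen 5
total_l3 = zen6["l3_mb"] + zen6["vcache_mb"]
print(zen6, threads, total_l3)
```

Under that assumption, a single 12-core CCD would land at 24 threads, 12 MB of L2 and 144 MB of total L3.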
 

StefanR5R

Elite Member
Dec 10, 2016

tsamolotoff

Senior member
May 19, 2019
I'm fairly sure a dual-chiplet, full-X3D part didn't go on sale because of optics (which is also why this new APO-like thing is being introduced instead of relying on Game Bar). Just imagine: this CPU is released, lots of reviewers and users expect it to perform on par with the single-CCD model, and it still performs worse whenever Game Bar isn't working (which is easy to achieve: just block MS spyware in /etc/hosts and it won't update at all >.< ). That will lead to another 'waste of sand' video by GN, HWU et al.
 

Thunder 57

Diamond Member
Aug 19, 2007
C&C has an article up testing Zen 5 with the op cache disabled to see how well the 2x4 decoder works. Short answer: in single-threaded use, not great; with SMT enabled, surprisingly good. It does seem like an obvious place for AMD to work some magic in Zen 6 though, assuming they still care about client performance. The results vary quite a bit, so if you are interested, check it out:

 
Jul 27, 2020
this will lead to another 'waste of sand' video by GN, HWU et al.
For once, AMD should just release a product without any rehearsed marketing speak. When asked what's great about it, they just say, "We will let the community find out for themselves" and everyone frantically scrambles and tests everything under the moon. One month later, people are still finding pros and cons. All this exposure will translate into product sales. Then AMD announces a free X3D2 CPU to every reviewer/community member who discovered a previously unknown perf benefit in any application/game with a decent user base. It will be a tremendous red letter day for AMD Marketing.
 

fastandfurious6

Senior member
Jun 1, 2024
For once, AMD should just release a product without any rehearsed marketing speak. When asked what's great about it, they just say, "We will let the community find out for themselves" and everyone frantically scrambles and tests everything under the moon. One month later, people are still finding pros and cons. All this exposure will translate into product sales. Then AMD announces a free X3D2 CPU to every reviewer/community member who discovered a previously unknown perf benefit in any application/game with a decent user base. It will be a tremendous red letter day for AMD Marketing.



yes! a perfect product doesn't need marketing

only discoverability, which already exists in this case
 

LightningZ71

Platinum Member
Mar 10, 2017
The issue is, we know with near certainty the things it will be good at. Phoronix has done extensive testing on the Epyc X3D parts and has an excellent map of where they shine. ServeTheHome also has a few good benchmarks that show the X3D parts' strengths.

99% of their strengths are in semi-pro or full server-level tasks. Sell it under the Epyc brand and watch them fly off the shelves.
 

yottabit

Golden Member
Jun 5, 2008
The issue is, we know with near certainty the things it will be good at. Phoronix has done extensive testing on the Epyc X3D parts and has an excellent map of where they shine. ServeTheHome also has a few good benchmarks that show the X3D parts' strengths.

99% of their strengths are in semi-pro or full server-level tasks. Sell it under the Epyc brand and watch them fly off the shelves.
The 16-core chips have always been sort of prosumer, no? Who's buying 16 cores JUST for gaming?
 

MS_AT

Senior member
Jul 15, 2024
It does seem like an obvious place for AMD to work some magic in Zen 6 though, assuming they still care about client performance.
To quote part of the article conclusion
Zen 5’s improved op cache focuses on maximizing single threaded performance, while the decoders step in for certain multithreaded workloads where the op cache isn’t big enough.
So if they ever let go of the op cache, they had better make the decoder wider; but with the op cache in place, they should make it faster instead. Wider won't help, which lines up with other C&C pieces where they mention that decode width is not a bottleneck.
Intel's Lion Cove might not take the gaming crown, but Intel's ability to run a plain 8-wide decoder at 5.7 GHz should turn some heads. For now, that advantage doesn't seem to show up. I haven't seen a high IPC, low-threaded workload with a giant instruction-side cache footprint. But who knows what the future will bring.
And Intel's example shows that wider is not necessarily the answer.
 

yuri69

Senior member
Jul 16, 2013
So if they ever let go of the op cache, they had better make the decoder wider; but with the op cache in place, they should make it faster instead. Wider won't help, which lines up with other C&C pieces where they mention that decode width is not a bottleneck.
AMD has been beefing up the op cache since Zen 1, and Zen 5 invested a lot in it. They are not about to drop it anytime soon, right?

Wider won't help Zen 5; all they need is to cut the frontend latency. Anyway, that hyped clustered frontend was purposely done to handle server/SMT.
 

StefanR5R

Elite Member
Dec 10, 2016
They are not about to drop [the op cache] anytime soon, right?
Never, unless for some reason they needed to build a special-purpose core that sacrifices perf and perf/W for area savings. [Edit: Or if, for some reason, they made a core for an ISA that is simpler to decode, such as ARM.]

Wider won't help Zen 5; all they need is to cut the frontend latency. Anyway, that hyped clustered frontend was purposely done to handle server/SMT.
The clustered decoder might also help mobile (compared to a single decoder pipeline but with same total width [or even compared to any >4 wide single-pipeline decoder]): While there is only one thread running on a core, power off half of the decoder.
 

Thunder 57

Diamond Member
Aug 19, 2007
To quote part of the article conclusion

So if they ever let go of the op cache, they had better make the decoder wider; but with the op cache in place, they should make it faster instead. Wider won't help, which lines up with other C&C pieces where they mention that decode width is not a bottleneck.

And Intel's example shows that wider is not necessarily the answer.

I don't know why some people think uop caches are going away in the future. I think AMD should do something with the decoder, though. Its worst case has essentially been 4-wide since Zen 1. Maybe they go to 2x5. That may be too big of a change for Zen 6, though.
 

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
The clustered decoder might also help mobile (compared to a single decoder pipeline but with same total width [or even compared to any >4 wide single-pipeline decoder]): While there is only one thread running on a core, power off half of the decoder.
As for the power consumption aspect, I just rediscovered this post on the price you pay for wide decoders (in ISAs with variable instruction lengths):
Even after you know the lengths, you still have to pay for the massively wide mux tree to align the instruction starts with the decoders. (Technically, this usually happens after the first stage of decode begins on every byte boundary, but the point is, you still have to do it at some point.) This structure is huge and high-latency, and grows quadratically with decode width. (x86 instructions are 1-15 bytes long. The first instruction slot selects the first byte. The second slot selects any byte between the second and the 16th. The third slot selects any byte between the third and the 31st. You get where this is going.) And unless instructions are always the same width, all those transistors switch every cycle, so you pay a lot of power too.
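The slot ranges in the quoted post can be turned into a rough count of mux inputs. This is a simplified sketch, assuming an idealized aligner in which slot k must be able to select any start byte from k (all earlier instructions 1 byte long) through (k - 1) * 15 + 1 (all earlier instructions 15 bytes long); it ignores predecode marking, fetch-block boundaries and real circuit layout. The 2x4 figure further assumes that each cluster aligns its own independent instruction stream, which is how a clustered decoder is meant to avoid one wide mux tree, not a description of Zen 5's actual circuitry.

```python
def mux_inputs(width, max_len=15):
    """Count candidate start bytes across all slots of an idealized x86 aligner.

    Slot k (1-indexed) can start anywhere from byte k (all earlier
    instructions 1 byte long) to byte (k - 1) * max_len + 1 (all earlier
    instructions max_len bytes long), so it needs (k - 1) * (max_len - 1) + 1
    mux inputs. Summing over slots gives a total that grows with width**2.
    """
    return sum((k - 1) * (max_len - 1) + 1 for k in range(1, width + 1))


single_4 = mux_inputs(4)            # one 4-wide decoder
single_8 = mux_inputs(8)            # one 8-wide decoder
clustered_2x4 = 2 * mux_inputs(4)   # two independent 4-wide clusters (assumed)
print(single_4, single_8, clustered_2x4)  # 88 400 176
```

In this model, going from one 4-wide to one 8-wide decoder multiplies the alignment inputs by about 4.5x, while two 4-wide clusters only double them, which is one way to see why a 2x4 layout can be cheaper than a single 8-wide pipeline.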
 

GTracing

Senior member
Aug 6, 2021
I don't know why some people think uop caches are going away in the future. I think AMD should do something with the decoder, though. Its worst case has essentially been 4-wide since Zen 1. Maybe they go to 2x5. That may be too big of a change for Zen 6, though.
I could see AMD, or more likely Intel, drop the micro-op cache. Most of the ARM cores don't have one. Apple's M-series are the fastest and most efficient CPUs. It seems to me that wide decoders are more area-efficient than a big micro-op cache.
 

Thunder 57

Diamond Member
Aug 19, 2007
I could see AMD, or more likely Intel, drop the micro-op cache. Most of the ARM cores don't have one. Apple's M-series are the fastest and most efficient CPUs. It seems to me that wide decoders are more area-efficient than a big micro-op cache.

Decoding is power hungry. Decoding ARM is easier, so there's less need for an op cache. It also allows them to use a longer pipeline with fewer drawbacks. Of course, that's assuming they plan on keeping high frequencies in the future.
 

GTracing

Senior member
Aug 6, 2021
Decoding is power hungry. Decoding ARM is easier, so there's less need for an op cache. It also allows them to use a longer pipeline with fewer drawbacks. Of course, that's assuming they plan on keeping high frequencies in the future.
I'm not convinced that a micro op cache lowers power usage by that much. If that were true, then why don't ARM cores have one? They are more power constrained. The fact that ARM is easier to decode might mean that ARM decoders are less power hungry, but that answer isn't sufficient imho. I don't see how the difference between ARM and x86 could be so big that AMD is worried about decoder power consumption, but Apple isn't.

The way I see it, the micro-op cache is primarily a way to increase performance, and that die area might be better spent making other parts of the core wider.
 

Abwx

Lifer
Apr 2, 2011