Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

StefanR5R · Jul 15, 2024

branch_suggestion said:
Will be interesting to see iso-clock benches with neutered FPU Strix vs full FPU GNR.

Have AMD themselves mentioned that the FPUs differ?

Mopetar · Jul 15, 2024

Philste said:
At Computex they claimed ZEN5 trashes 14900K. If these Numbers are right, 14900K will be slightly faster instead

All existing 14900K numbers are worthless. I don't care how good they look if the settings required to attain them mean I'll be on my 4th RMA sometime later this year.

Jan Olšan · Jul 15, 2024

DavidC1 said:
I'm not convinced the clustered decode on Zen 5 works well on ST. David Huang got zero from his ST tests

He used a sequence of NOPs specially crafted to measure it, not realistic code. The explanation could be that the sequence had no branches. Different microbenchmarking code would be needed to catch the effect of both decoder clusters getting used.

igor_kavinski · Jul 15, 2024

Hail The Brain Slug said:
If this is the case, 9950X will be priced at or below $600. I am going extremely optimistically with $499 and know that will be unlikely unless I can manifest it hard enough. I've got my copy of The Secret on my desk and I try to re-read parts of it every day.

You are too noble. I would manifest for it to fall into my lap

itsmydamnation · Jul 15, 2024

Jan Olšan said:
He used a sequence of NOPs specially crafted to measure it, not realistic code. The explanation could be that the sequence had no branches. Different microbenchmarking code would be needed to catch the effect of both decoder clusters getting used.

Yes, as I said in the other thread, the fact that they are trying to look further ahead for branches and that the uop cache is multi-ported says they are trying to run ahead and stick stuff in the op cache.

HurleyBird · Jul 15, 2024

Jan Olšan said:
He used a sequence of NOPs specially crafted to measure it, not realistic code. The explanation could be that the sequence had no branches. Different microbenchmarking code would be needed to catch the effect of both decoder clusters getting used.

I'm not sure whether this is pertinent to the benchmark results, but it's worth mentioning Clark stated no op (NOP) fusion was removed in Zen 5 during the Chips & Cheese interview. Reading between the lines a bit, it sounds like AMD sacrificed a number of optimizations that would have needed to be rebuilt for dual decode on the altar of cadence.

Jan Olšan · Jul 15, 2024

There were similar cases of features in Zen 2 not making it to Zen 3 but then reappearing in Zen 4. It probably shows that they are not lying when they say they basically re-architect the core anew in the odd-numbered generations (and refine it in the even ones). Sometimes they don't invest the effort to bring in all the features from the prior core, probably calculating that they will hit nice IPC gains even without them and preferring to focus on other parts that are perhaps more critical for the new uarch. Sometimes it could be due to not branching the development from the finished n-1 core, but rather from some point after n-2 with some bits of n-1. But then the stuff can get back in later.

itsmydamnation · Jul 15, 2024

HurleyBird said:
I'm not sure whether this is pertinent to the benchmark results, but it's worth mentioning Clark stated no op fusion was removed in Zen 5 during the Chips & Cheese interview. Reading between the lines a bit, it sounds like AMD sacrificed a number of optimizations that would have needed to be rebuilt for dual decode on the altar of cadence.

Except for all the op fusion they kept. Why would dual decode affect op fusion? It's post-decode, at dispatch, AFAIK.

It's probably a width of dispatch vs complexity vs timing/power thing.

branch_suggestion · Jul 16, 2024

Cores do usually gain performance over time as codebases get updated, but Z5 does seem to be an outsized FineWine candidate based on what Clark said.

HurleyBird · Jul 16, 2024

itsmydamnation said:
except for all the op fusion they kept. why would dual decode affect op fusion , its post decode at dispatch AFAIK.

From Mike Clark:

Part of the reason I would say we didn’t put let’s say no op fusion into Zen 5 is that we had that wider dispatch. Zen 1 to Zen 4 had that 6 wide dispatch and 4 ALUs, so getting the most out of that 6-wide dispatch was important and it drove some complexity into the dispatch interface to be able to do that. When looking at having the capability of an 8-wide dispatch and putting no op fusion on top of it, it didn’t really seem to pay off for the complexity because we had that wider dispatch natively. But you may see it come back. Zen 5 is sort of a foundational change to get to that 8-wide dispatch and 6 ALUs. We’re now going to try to optimize that pinch point of the architecture to get more and more out of it and so you know as we move forward, no op fusion is likely to come back as a good leverage of that eight wide dispatch. But for the first generation, we didn’t want to bite off the complexity.

There are two micro-op caches... are there two micro-op queues also? If the two paths converge at dispatch, it's plausible there's complexity in fusing ops that arrive from different paths. If they converge prior to dispatch, maybe fusion is taking place earlier?

itsmydamnation · Jul 16, 2024

HurleyBird said:
From Mike Clark:

There are two micro-op caches... are there two micro-op queues also? If the two paths converge at dispatch, it's plausible there's complexity in fusing ops that arrive from different paths. If they converge prior to dispatch, maybe fusion is taking place earlier?

why lie by omission.............

Mike Clark: We don’t support no op (NOP) fusion. We do have a lot of op fusion that’s similar, we still fuse branches and there’s some other cases that we fuse.

Part of the reason I would say we didn’t put let’s say no op fusion into Zen 5 is that we had that wider dispatch. Zen 1 to Zen 4 had that 6 wide dispatch and 4 ALUs, so getting the most out of that 6-wide dispatch was important and it drove some complexity into the dispatch interface to be able to do that. When looking at having the capability of an 8-wide dispatch and putting no op fusion on top of it, it didn’t really seem to pay off for the complexity because we had that wider dispatch natively. But you may see it come back. Zen 5 is sort of a foundational change to get to that 8-wide dispatch and 6 ALUs. We’re now going to try to optimize that pinch point of the architecture to get more and more out of it and so you know as we move forward, no op fusion is likely to come back as a good leverage of that eight wide dispatch. But for the first generation, we didn’t want to bite off the complexity.

HurleyBird · Jul 16, 2024

itsmydamnation said:
why lie by omission.............

Think we had a miscommunication. Not disagreeing over "except all the op fusion they kept," and didn't mean to imply all op fusion was removed.

MS_AT · Jul 16, 2024

StefanR5R said:
Have AMD themselves mentioned that the FPUs differ?

Nothing I can find. And if what David found is true, meaning the silicon is different and it wasn't an ES effect [not fully working microcode etc.], then it's a big lie by omission on the part of AMD's marketing dept., seeing as the press materials never differentiate the Strix Point Zen5 core from the Granite Ridge Zen5 core; they only mention the distinction between Zen5 and Zen5c.

gaav87 · Jul 16, 2024

Saylick said:
RTG: Ladies and Gentlemen, I was told from sources that Zen 6 will have mid-double digits IPC gain. Well, the middle between 10% and 99% is roughly 60%. Zen60% confirmed.

Source (probably): Expect modest gains for Zen 6, like no more than 15% IPC gain.

Watch out!
Red Gaming Tech will make a video out of your post

gaav87 · Jul 16, 2024

igor_kavinski said:
You are too noble. I would manifest for it to fall into my lap

Will the leaker increase the screenshot dosage now that R23 scores are out anyway?

therealmongo · Jul 16, 2024

Bring on all the new user accounts. Blue team in disaster recovery mode

poke01 · Jul 16, 2024

branch_suggestion said:
Cores do usually gain performance over time as codebases get updated, but Z5 does seem to be an outsized FineWine candidate based on what Clark said.

You can get fine wine already with CachyOS and their latest Zen4/5 optimised release.

Excluding the AVX-512 datasets it’s about a 14.5% gain in IPC. It’s clear that Zen 5 is a server-first architecture, more so than any other Zen.

igor_kavinski · Jul 16, 2024

40W: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41252027
60W: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41252719

Goop_reformed · Jul 16, 2024

igor_kavinski said:
40W: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41252027
60W: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41252719

Stop teasing and post the juicy stuff already

tsamolotoff · Jul 16, 2024

igor_kavinski said:
40W: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41252027
60W: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41252719

So at 80W Z5 is roughly equivalent to stock 5950x (140w?)

tsamolotoff · Jul 16, 2024

Geddagod said:
The FPU had large changes.
2x vector register file

I was talking about ALU count, looks like it's the same six as Zen4? Did they just make them wider then? I was under the impression that the ALU count was substantially increased, hence all this 'jebaited expectations' spiel of the last few weeks.

Abwx · Jul 16, 2024

tsamolotoff said:
So at 80W Z5 is roughly equivalent to stock 5950x (140w?)

It's 12% faster than a stock 5950X. Going by the previous pic at 60W, where it is 8.45% below, it should match it at about 68W on this platform.
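The "about 68W" figure can be reproduced with a straight-line interpolation between the two data points. This is only a sketch: it assumes the 12% result belongs to the 80W run mentioned in the quoted post, and it treats performance as linear in power between 60W and 80W, which a real power/performance curve is not. The function name `crossover_power` is made up for illustration.

```python
def crossover_power(p_lo, perf_lo, p_hi, perf_hi, target=1.0):
    """Linearly interpolate the power at which relative performance reaches target."""
    slope = (perf_hi - perf_lo) / (p_hi - p_lo)  # relative performance per watt
    return p_lo + (target - perf_lo) / slope

# Stock 5950X = 1.0. Assumed data points from the thread:
# 8.45% below at 60 W, 12% above at 80 W.
match_w = crossover_power(60, 1 - 0.0845, 80, 1 + 0.12)
print(f"Estimated parity with a stock 5950X at about {match_w:.1f} W")
```

Since the real curve flattens as power rises, a linear estimate between two points will tend to land slightly above the true crossover.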

MS_AT · Jul 16, 2024

tsamolotoff said:
I was talking about ALU count, looks like it's the same six as Zen4? Did they just make them wider then? I was under the impression that the ALU count was substantially increased, hence all this 'jebaited expectations' spiel of the last few weeks.

Why would they increase the execution resources on the FP side if they cannot sustain more than 2x512b loads per cycle? [It's still a great improvement over Zen4 btw, which could do only 1x512b.] Not sure what the story is with FP stores, whether they can do 2x512b or 1x512b, but either of them is also a nice improvement over Zen4, which could only store 0.5x512b per cycle.

What I want to say is that to increase FPU resources even further they would have needed to provide more bandwidth, otherwise it would be wasted silicon. [They can already do 2x512b ADDs and 2x512b FMAs per cycle, and I am not sure Intel ever had that on any core. I remember 2x512b FMA but am not sure if concurrent adds were possible.]
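To put rough numbers on that argument, here is a back-of-the-envelope sketch using only the per-cycle figures quoted above. The operand counts (3 sources per FMA, 2 per ADD) are standard, but treating every source as a potential memory operand is a deliberately pessimistic assumption; real code reuses registers heavily.

```python
VECTOR_BYTES = 512 // 8  # one 512-bit vector is 64 bytes

def bytes_per_cycle(vectors_per_cycle):
    """Convert a '<n>x512b per cycle' figure into bytes per cycle."""
    return vectors_per_cycle * VECTOR_BYTES

# Load/store figures as stated in the post (Zen 5 store rate left out: unknown there).
load_bw = {"Zen 4": bytes_per_cycle(1), "Zen 5": bytes_per_cycle(2)}
store_bw_zen4 = bytes_per_cycle(0.5)

# Peak FP issue per cycle as stated in the post: 2 FMAs plus 2 ADDs, all 512-bit.
fma_per_cycle, add_per_cycle = 2, 2
source_operands = fma_per_cycle * 3 + add_per_cycle * 2  # 512-bit inputs consumed per cycle

print(load_bw, store_bw_zen4)
print(f"{source_operands} source vectors/cycle vs {load_bw['Zen 5'] // VECTOR_BYTES} loads/cycle")
```

Even at peak the existing units can consume several times more source vectors than the load ports can deliver, which is the sense in which more FP execution width without more bandwidth would go unused.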

Det0x · Jul 16, 2024

Dunno if real numbers, but this was posted over at WCCTech "forum" by a 13900KF user

13900KF @ 150W package power = 34.9k points in Cinebench R23

13900KF @ 170W package power = 36.3k points in Cinebench R23

13900KF @ 190W package power = 37.2k points in Cinebench R23

13900KF @ 270W package power = 40.4k points in Cinebench R23

I'm not too familiar with Raptor Lake's power/performance curve, are these normal/average numbers for the 13900K SKU?
Can maybe be used as a comparison for the higher ES PPT numbers 🧐
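For comparing against other power-limited results, the four data points can be turned into points per watt. This is a minimal sketch that takes the quoted scores at face value (they are unverified forum numbers); the helper name `points_per_watt` is made up for illustration.

```python
# Package power (W) -> Cinebench R23 score, as quoted above (unverified).
results_13900kf = {150: 34900, 170: 36300, 190: 37200, 270: 40400}

def points_per_watt(results):
    """Return score divided by package power for each power limit."""
    return {watts: score / watts for watts, score in results.items()}

eff = points_per_watt(results_13900kf)
for watts, ppw in eff.items():
    print(f"{watts} W: {ppw:.1f} points/W")

# Scaling from the lowest to the highest power limit.
power_ratio = 270 / 150
score_ratio = results_13900kf[270] / results_13900kf[150]
print(f"{power_ratio:.2f}x power for {score_ratio:.2f}x score")
```

Any other chip's score and power figure can be dropped into the same dictionary shape to compare efficiency at a given limit.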

Abwx · Jul 16, 2024

Det0x said:
Dunno if real numbers, but this was posted over at WCCTech "forum" by a 13900KF user

13900KF @ 150W package power = 34.9k points in Cinebench R23

13900KF @ 170W package power = 36.3k points in Cinebench R23

13900KF @ 190W package power = 37.2k points in Cinebench R23

There's nothing exceptional; the guy is just unaware that his chip still consumes much more than Zen 4 in this very bench, which is a best case for Intel.

For instance, he boasts 36.3k at 170W. Just imagine that it's about the score of a stock 7950X3D that uses barely 130W to do so, and with some UV and tweaking like this one you can get it at 110W.

FTR a 14900KS does 41k at stock while using 330W, so his score at 270W is not even much better overall than a stock chip.

Guess that's telling of the point to which some people are in denial, seeing as great what is actually very mediocre, but hey, that's my preferred brand.
