Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 695

gdansk

Platinum Member
Feb 8, 2011
2,843
4,228
136
Strix Point is a small step forward. It is more competitive with SDXE in many areas. And it is shipping before M4 shows up in laptops.

But it is almost exactly what I didn't want. 9-10% higher INT 1T with a bigger FP and MT performance increase. I would prefer the opposite. The little cores have adequate performance but cause scheduling issues. Even on laptops that is a problem. The NPU only has a few uses because... Microsoft? Even Lisa Su asked what Microsoft was going to do with it. Apparently nothing. So it is perhaps a waste of die space. The bigger iGPU doesn't have an LLC and the slow memory controller limits its performance, so adding 4 extra CUs was perhaps also a waste of die space.
 

CouncilorIrissa

Senior member
Jul 28, 2023
520
1,995
96
Strix Point is a small step forward. It is more competitive with SDXE in many areas. And it is shipping before M4 shows up in laptops.

But it is almost exactly what I didn't want. 9-10% higher INT 1T with a bigger FP and MT performance increase. I would prefer the opposite.
Yeah, I had some questions when I saw how much they beefed up the L1 -> FP PRF bandwidth and FP register file size compared to INT, so larger FP gains were expected, but not to this extent perhaps? I'm very interested in learning why they made the choices they did.

Looks bizarre on the surface, but there must be a good reason to do that.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Time to rewrite software to use FLOATs instead of INTs !!!

Well, Burroughs Large Systems (now sold as the Unisys Libra) use only FP (a strange 48-bit format), and have a lot of other things to recommend them (no buffer overflows! no assembly programming! high-level-language-centric from the start!)

Also, JavaScript and Lua (prior to the int types introduced in... 5.3?) use FP as their primary scalar data type, though implementations generally try to avoid actually executing FP code for them when possible.

(Note: None of the above is me genuinely suggesting software be rewritten to use FP types. That sounds horrifying.)
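To make the "FP as the primary scalar type" point concrete, here is a minimal sketch in Python, whose `float` is an IEEE 754 double like a JavaScript number. It shows that integers survive a trip through a double only up to 2^53, which is one reason 64-bit integer code such as hashing or crypto does not map onto FP. The helper name `survives_double` and the sample 64-bit constant are illustrative assumptions, not anything taken from a real engine.

```python
# Every integer with magnitude up to 2**53 has an exact IEEE 754 double
# representation; above that, gaps appear between representable values.
EXACT_LIMIT = 2 ** 53


def survives_double(n: int) -> bool:
    """Return True if integer n round-trips through a 64-bit float unchanged."""
    return int(float(n)) == n


# An arbitrary odd 64-bit constant of the kind integer hash code uses
# (hypothetical example value, chosen only because it needs all 64 bits).
SAMPLE_U64 = 0x9E3779B97F4A7C15

if __name__ == "__main__":
    for value in (EXACT_LIMIT - 1, EXACT_LIMIT, EXACT_LIMIT + 1, SAMPLE_U64):
        print(f"{value:>22d} exact as double: {survives_double(value)}")
```

Running it shows the round trip holding up to 2^53 and failing for 2^53 + 1 and for the full-width 64-bit constant.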
 
Jul 27, 2020
19,613
13,479
146
(Note: None of the above is me genuinely suggesting software be rewritten to use FP types. That sounds horrifying.)
Why? Let's do it and deal with the fallout later


Maybe Kamala's new initiative upon getting elected should be, Everybody learn to rewrite software in FLOATs!
 
Reactions: Joe NYC

Timorous

Golden Member
Oct 27, 2008
1,748
3,239
136

tsamolotoff

Member
May 19, 2019
174
305
136
Time to rewrite software to use FLOATs instead of INTs !!!
If only they had bothered to do GMI-Wide, then Zen 5 would probably interest me a bit; otherwise the calc speed is limited by RAM bandwidth. Hopefully the X3D SKUs won't be as locked as they are now and will be more coolable, otherwise it's not something that stirs my upgrade-ido.
 

DavidC1

Senior member
Dec 29, 2023
780
1,240
96
The thing is, in Strix Point there is really little that could boost the FP performance. The only things worthy of note are the updated scheduler layout, the bigger register file and the lower add latency, whereas the int side was boosted with improvements to load, store, and more execution units. I mean the int scalar side has seen many more changes, comparatively speaking.
Improved load/store is also for FP. You can't be as granular so both 512-bit and 256-bit AVX cores have the same thing. Every time Intel doubled FP execution, they doubled Load/Store.

Integer is that much HARDER than FP to improve. So things such as increased core-to-core latency impact INT much more than FP.

Also, this is another indication to me that both P-core teams (Intel and AMD) are struggling.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Why? Let's do it and deal with the fallout later


Maybe Kamala's new initiative upon getting elected should be, Everybody learn to rewrite software in FLOATs!

I'll be really specific here. I would rather jump off a cliff than rewrite any amount of crypto or int-oriented media/image code to use FP.

And yes, FP should still be avoided by JS/Lua engines when possible.
 

DavidC1

Senior member
Dec 29, 2023
780
1,240
96
Where is the proof?

For me, 9950X is the proof for Zen 5's frequency scaling. Where is Apple's proof of their silicon's frequency scaling superiority?
Who cares?

This is proof that both Intel and AMD are smoking the high frequency drug again ala Netburst/Bulldozer, for dubious marketing benefits when a better core would do better for sales.

They need a complete rejig of the architecture with clocks aimed at under 5GHz. A mini Conroe/Zen transition.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Who cares?

This is proof that both Intel and AMD are smoking the high frequency drug again ala Netburst/Bulldozer, for dubious marketing benefits when a better core would do better for sales.

They need a complete rejig of the architecture with clocks aimed at under 5GHz. A mini Conroe/Zen transition.

Yeah, "muh 6GHz" only really matters to those looking for something to latch on to.

Absolute single-thread performance matters. Apple does very well on it.
 

MS_AT

Senior member
Jul 15, 2024
209
497
96
Improved load/store is also for FP. You can't be as granular so both 512-bit and 256-bit AVX cores have the same thing. Every time Intel doubled FP execution, they doubled Load/Store.

Integer is that much HARDER than FP to improve. So things such as increased core-to-core latency impact INT much more than FP.

Also, this is another indication to me that both P-core teams (Intel and AMD) are struggling.
It's not me:
Yeah, so I’ll try, I mean I know it is a little bit complex. So, we like to think about it of the data cache can handle 4 memory operations per cycle and so starting from that baseline on the load side, they can all be loads, 4 loads, we can do. Now based on the size of the load, because we only have a data path to the floating-point unit that’s you know 512 bits on two of the ports. You can only do 2 loads that are floating point.
That is from the C&C Mike Clark interview. Since Strix doesn't have 512-bit data paths, it did not get improved FP bandwidth vs Zen 4, but it can do 4 loads on the scalar side. At least this is how I understand this quote.
 

naukkis

Senior member
Jun 5, 2002
878
747
136
It's not me:

That is from the C&C Mike Clark interview. Since Strix doesn't have 512-bit data paths, it did not get improved FP bandwidth vs Zen 4, but it can do 4 loads on the scalar side. At least this is how I understand this quote.

This seems to verify only 1 512b load/store per cycle for Strix Point. But store bandwidth is doubled compared to Zen 4. https://blog.hjc.im/zen-5-more-details-1.html
 
Reactions: igor_kavinski

Geddagod

Golden Member
Dec 28, 2021
1,295
1,368
106
OK!! The fabled 35% has finally reared its head! lol


I think Geekerwan's FP testing for Zen 4 is unrealistic. Even in their original Zen 4 testing, they found a <5% increase in IPC in SPEC2017 FP vs Zen 3, which doesn't make much sense. They also have RWC in MTL as having 12% higher FP IPC than Zen 4 in Phoenix, which I find hard to believe.
Raichu's testing found them to be extremely similar, within 5% of each other (though Intel does have the lead).
 
Reactions: Tlh97 and inf64

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Please don't cry when AMD hits 6 GHz with one hand tied behind their back using TSMC A16.

Yeah, but like, I don't care. Only absolute ST perf (and its cousins, perf/mm2 and perf/W) really matters. It makes no difference to me if that's achieved at 6GHz or 4.5GHz or 1GHz. Right now, 5.7GHz Zen4 (I'll withhold judgment on Zen5 until GNR launches) looks just okay against 4.4GHz M4. It doesn't get treated as magically superior because it's running at a higher clock.
 

MS_AT

Senior member
Jul 15, 2024
209
497
96
I am not sure why it is like that, perhaps for historical reasons, but people generally treat SIMD performance as FP performance. There are INT SIMD instructions too, and they are executed by the "FP" part of the core, so INT code could also use these 512b-wide registers on Granite Ridge. The thing is that legacy software, or most of the software we have today, is not written with SIMD in mind. For a long time the dominant programming model has been object-oriented programming, and it doesn't lend itself well to SIMD. (IMO at least, but this is not the place for that discussion.) So what matters more, because you don't have to rewrite or recompile code to use it, is the scalar part of the core and the front-end.
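As a rough illustration of the integer SIMD point, the sketch below models in plain Python how many integer lanes fit in a vector register and what a packed integer add with per-lane wraparound does (the behaviour of an instruction such as VPADDD). The function names `lanes` and `packed_add` are made up for this example; real code would use compiler intrinsics or auto-vectorization rather than anything like this.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def packed_add(a: list[int], b: list[int], element_bits: int) -> list[int]:
    """Lane-wise unsigned add where each lane wraps independently."""
    if len(a) != len(b):
        raise ValueError("both operands need the same number of lanes")
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


if __name__ == "__main__":
    # A 512-bit register viewed as 32-bit or 8-bit integer lanes.
    print(lanes(512, 32), lanes(512, 8))
    # One lane overflowing does not carry into its neighbour.
    print(packed_add([0xFFFFFFFF, 1], [1, 1], 32))
```

One vector instruction therefore replaces 16 scalar 32-bit adds on a 512-bit register, but only if the data is laid out so that those 16 values sit next to each other.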
This seems to verify only 1 512b load/store per cycle for Strix Point. But store bandwidth is doubled compared to Zen 4. https://blog.hjc.im/zen-5-more-details-1.html
Zen 4 could only do 0.5 512b store per cycle. Or rather it could only do 256b store per cycle. Now Zen5 can do 2x256 stores per cycle [or 1x512] according to the same interview. So Strix Point got the improvement to Store bandwidth not to the load bandwidth.
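The per-cycle numbers in this exchange can be put side by side with a small sketch. The store figures (1x256b for Zen 4, 2x256b or 1x512b for Zen 5) and the 2x512b FP load figure come from the posts and the interview quote above; the 2x256b FP load entry for Strix Point is an assumption that follows the reading that Strix got no FP load improvement over Zen 4. The `PortConfig` class and the table names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Peak memory operations per cycle and the width of each operation."""
    ops_per_cycle: int
    bits_per_op: int

    def bytes_per_cycle(self) -> int:
        return self.ops_per_cycle * self.bits_per_op // 8


# Store throughput as described in the thread.
STORE = {
    "Zen 4": PortConfig(ops_per_cycle=1, bits_per_op=256),
    "Zen 5": PortConfig(ops_per_cycle=2, bits_per_op=256),  # or 1 x 512b
}

# FP/vector load throughput. The Strix Point row is an assumption (256b paths).
FP_LOAD = {
    "Zen 5, 512b data path": PortConfig(ops_per_cycle=2, bits_per_op=512),
    "Strix Point, 256b data path (assumed)": PortConfig(ops_per_cycle=2, bits_per_op=256),
}

if __name__ == "__main__":
    for title, table in (("store", STORE), ("FP load", FP_LOAD)):
        for name, cfg in table.items():
            print(f"{title:8s} {name:40s} {cfg.bytes_per_cycle():4d} B/cycle")
```

These are peak figures per core per cycle; sustained bandwidth depends on the cache level, the clock and the access pattern.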
 
Jul 27, 2020
19,613
13,479
146
Right now, 5.7GHz Zen4 (I'll withhold judgment on Zen5 until GNR launches) looks just okay against 4.4GHz M4. It doesn't get treated as magically superior because it's running at a higher clock.
The thing with these desktop designs is that they are extremely bandwidth starved. If Zen 5 had the kind of membw that M4 has, its IPC would really fly. Apple has overbuilt their memory subsystem so much that even all processing blocks running at 100% usage can't saturate their available membw because they are running too slow to crunch more data. Zen 5/6/7 has greater chances of scaling with better performing memory in future, especially DDR6. Just look at how much Zen 4 flies with X3D's cache bandwidth. Zen5X3D will be an INT/FP data devouring monster.
 