Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

naukkis · Jul 30, 2024

soresu said:
I've heard exactly the same argument about AMD FSA/HSA years back and it never amounted to anything.

It's still just GPGPU compute when you ignore the fluff.

The x86 ecosystem won't have a working solution. Apple does: every Apple CPU has a GPU attached and a working driver model to use it. AVX-512, on the other hand, however well it's implemented in Zen 5, brings absolutely zero performance increase to desktop/mobile use cases, because it isn't used for anything, since most of the installed CPU base doesn't support it.

desrever · Jul 30, 2024

SarahKerrigan said:
This is a dubious argument. SPEC cannot be made magically faster by an OS.

It can be optimized down to the individual tasks pretty easily. Why would you even think this?

Even SPEC itself says to test with multiple OS and compilers.

FlameTail · Jul 30, 2024

naukkis said:
x86 is heading toward wider SIMD units. The ARM side has more 128-bit execution pipes and load/store ports. Apple's cores, and the Cortex-X925 with its six 128-bit NEON pipes, are anything but weak in SIMD execution. x86 targets high clocks, and high clocks, many SIMD units, and many load/store ports are hard to make work together, but Intel's E-cores take the same approach as ARM (more 128-bit execution resources), and the performance seems to be there for x86 too.

There are certainly some interesting discussions to be had about design choices in modern CPUs.

Wider SIMD units vs More SIMD units

Monolithic Decoder vs Clustered Decoders

uOP caches vs No uOP cache

L1/L2 cache vs L0/L1/L2/L3 cache

SarahKerrigan · Jul 30, 2024

desrever said:
It can be optimized down to the individual tasks pretty easily. Why would you even think this?

Even SPEC itself says to test with multiple OS and compilers.

Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?

DavidC1 · Jul 30, 2024

SarahKerrigan said:
To the extent it failed at all (which it didn't, really, but it obviously isn't with us anymore), there are reasons way beyond that: every generation late, most of them missing clock targets, IBM having damn near miraculous execution in the early 2000s, the general decline of the RISC/UNIX market after K8, the perception by potential Itanium OEMs that HP was structurally privileged in IPF, etc., etc., etc., etc.

Intel probably didn't really believe in the project as much as they claimed, because the money never flowed from Itanium in amounts that mattered, and the goal of a business is to make money.

There are many projects that continued despite their creator's reluctance because they turned out to be massively successful financially; even the creator did not foresee what the market saw. Intel has demonstrated time and again that when money is at stake, they eventually come around.

And successful projects also require feedback, which you receive in abundance with x86 and almost none with Itanium.

SarahKerrigan · Jul 30, 2024

DavidC1 said:
Intel probably didn't really believe in the project as much as they claimed, because the money never flowed from Itanium in amounts that mattered, and the goal of a business is to make money.

There are many projects that continued despite their creator's reluctance because they turned out to be massively successful financially; even the creator did not foresee what the market saw. Intel has demonstrated time and again that when money is at stake, they eventually come around.

And successful projects also require feedback, which you receive in abundance with x86 and almost none with Itanium.

Intel wasn't the creator.

igor_kavinski · Jul 30, 2024

SarahKerrigan said:
Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?

Nah, since he showed M2 failing miserably in those Linux benchmarks, macOS must be cheating and "detecting" SPEC runs!

Their black optimization art is really just a SPEC detector built into their silicon!

And should Apple wish to defend their honor, they better fix whatever is causing their SoCs to tank under Linux!

deathBOB · Jul 30, 2024

I see some people are following Intel’s lead and having their own processor meltdowns.

desrever · Jul 30, 2024

igor_kavinski said:
Nah, since he showed M2 failing miserably in those Linux benchmarks, macOS must be cheating and "detecting" SPEC runs!

Their black optimization art is really just a SPEC detector built into their silicon!

SarahKerrigan said:
Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?

Then explain in depth why there are differences between operating systems in SPEC.

DavidC1 · Jul 30, 2024

FlameTail said:
Could you elaborate on why that is the case?

This is the SpecCPU suite: https://www.spec.org/cpu2017/Docs/overview.html#benchmarks

Integer: Compiler, Artificial Intelligence, Route Planning, Compression

All of them require "smarts" to go faster, and they are latency sensitive. An easy example is games: when you are playing, the CPU needs to respond to you. Does bandwidth matter? Only to a point. Does latency? Always. Branch prediction accuracy? Absolutely. More units? Yes. More cache? Yes.

FP: A lot of them are 3D modelers, which are highly parallelizable. FP also likes lower latency, good branch prediction, and more units, but not as much. The first widely used consumer FP workload was 3D acceleration in games (before Voodoo 3D cards).

The thing is, improving Integer performance gives gains in all areas. AI, Cryptography, web browsing, encoding, games, everything.

Despite the simplicity of this explanation, the fact is that before the FPU-equipped variants of the Intel 80486, FP units were discrete chips (the x87 coprocessors): accelerators. And why did Skymont get an extra 30% in FP over integer simply by doubling the number of vector units?

SarahKerrigan said:
Intel wasn't the creator.

Well, they were co-creators, were they not? They were eventually saddled with it anyway. Even if that's the case, HP themselves pretty much gave it away, which kinda speaks for itself.

SarahKerrigan · Jul 30, 2024

desrever said:
Then explain in depth why there are differences between operating systems in SPEC.

There isn't a significant difference between OSes in SPEC, in my experience, assuming both have "normal" support for the hardware (i.e., not broken power management). There's some difference between compilers.

Nothingness · Jul 30, 2024

soresu said:
Try that again in FP/SIMD and M4 dies a horrible death 🤣

4x 128 bit vs 4x 512 bit isn't even a competition.

Apple's units are symmetric, so you get 4x128-bit FMA. That's not the case with Zen 5, where you get 2x256/512-bit FMA. That's still twice as much, but only for code that can vectorize to 512-bit. I wouldn't bet on who will win in the end for general FP code.
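To put rough numbers on that FMA comparison, here is a per-cycle peak-throughput calculation. It is a sketch under stated assumptions, not a measurement: it counts one FMA as 2 FLOPs on FP32 lanes, takes 4 symmetric 128-bit FMA pipes for Apple and 2 FMA-capable pipes for Zen 5, assumes Zen 5's full 512-bit datapath (as on the desktop and server parts), and ignores clock speed, memory bandwidth, and everything else that decides real FP performance. The helper name is made up for illustration.

```python
# Back-of-the-envelope peak FP32 FMA throughput per core per cycle.
# Assumptions (not measurements): an FMA counts as 2 FLOPs, FP32 lanes are
# 32 bits wide, Apple has 4 symmetric 128-bit FMA pipes, and Zen 5 has
# 2 FMA-capable pipes whose usable width is set by the instruction width
# the code was compiled for (full 512-bit datapath assumed).

FP32_BITS = 32
FLOPS_PER_FMA = 2  # one multiply + one add


def peak_fp32_flops_per_cycle(fma_pipes: int, vector_bits: int) -> int:
    """Peak FP32 FLOPs/cycle if every FMA pipe issues one full-width FMA each cycle."""
    lanes = vector_bits // FP32_BITS
    return fma_pipes * lanes * FLOPS_PER_FMA


apple = peak_fp32_flops_per_cycle(fma_pipes=4, vector_bits=128)

# Zen 5's two FMA pipes only reach their full width if the code uses that width.
zen5 = {width: peak_fp32_flops_per_cycle(fma_pipes=2, vector_bits=width)
        for width in (128, 256, 512)}

print(f"Apple 4x128-bit: {apple} FLOPs/cycle")
for width, flops in zen5.items():
    print(f"Zen 5 2x{width}-bit: {flops} FLOPs/cycle ({flops / apple:.1f}x Apple)")
```

Under these assumptions the per-cycle picture matches the post: Zen 5 is 2x ahead only with 512-bit code, ties at 256-bit, and 128-bit SSE code gets half of Apple's peak.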

StefanR5R · Jul 30, 2024

SarahKerrigan said:
SPEC takes maybe two hours to set up, including installing and writing up a config.

And if somebody is on a toy platform without 1st party compiler support for one of the three languages in SPEC, they can omit 1 int and 7 fp subtests to keep it easy and quick.

SarahKerrigan · Jul 30, 2024

DavidC1 said:
Well, they were co-creators, were they not? They were eventually saddled with it anyway. Even if that's the case, HP themselves pretty much gave it away, which kinda speaks for itself.

HP developed essentially the whole thing as PA-WideWord before Intel joined the project in 1995ish and canceled their own RISC project. The Itanium2 core came out of HP's design group in Fort Collins; the Poulson core that came out in 2012 (and proved to be the last Itanium) came from Intel Hudson.

HP didn't "give it away." There was a calculated move to transfer ownership of the Itanium product line to Intel, even though design work was continuing to be done in Fort Collins, to make it a "neutral" merchant platform where OEMs would not be competing with their CPU vendor.

DavidC1 · Jul 30, 2024

Nothingness said:
Apple's units are symmetric, so you get 4x128-bit FMA. That's not the case with Zen 5, where you get 2x256/512-bit FMA. That's still twice as much, but only for code that can vectorize to 512-bit. I wouldn't bet on who will win in the end for general FP code.

Doubling FP units versus going to wider SIMD is an art of compromise between area, total available market, and power use.

SarahKerrigan said:
HP developed essentially the whole thing as PA-WideWord before Intel joined the project in 1995ish and canceled their own RISC project. The Itanium2 core came out of HP's design group in Fort Collins; the Poulson core that came out in 2012 (and proved to be the last Itanium) came from Intel Hudson.

OK, but doesn't that strengthen the idea that Intel had no motivation, financial or otherwise, to make it a really good chip? When Intel first got into the market, the PC market was nonexistent, so deciding which route to go was a much easier decision, and it came to be x86. If they had had EPIC back then, they could have gone that way.

Same with the iAPX 432 that Intel tried to push.

Only in benchmarks can we do apples-to-apples comparisons. In the real world we can't always normalize them.

naukkis · Jul 30, 2024

Nothingness said:
Apple's units are symmetric, so you get 4x128-bit FMA. That's not the case with Zen 5, where you get 2x256/512-bit FMA. That's still twice as much, but only for code that can vectorize to 512-bit. I wouldn't bet on who will win in the end for general FP code.

That's easy. All Apple code uses the same 128-bit NEON instructions and gets the full FP SIMD power. For Zen 5 the situation is different: 99% of the codebase uses 128-bit SSEx and 1% is 256-bit AVX2. Zero percent uses AVX-512. For general code, Apple's solution is so much better that it ain't funny. The x86 codebase is badly fragmented and ain't moving forward because Intel doesn't want to support the newest instructions on all models. In the next generations, Intel will focus on bringing up the performance of those 128-bit units, which actually have software support. Extensions without proper top-to-bottom support across the installed CPU base are useless.
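The usual way x86 software copes with this fragmentation is runtime dispatch: ship several builds of a hot loop and pick the widest one the running CPU supports. The Python sketch below shows only the selection logic. The kernel names are hypothetical placeholders, and the flag names assume the Linux /proc/cpuinfo spelling (sse2, avx2, avx512f). It says nothing about how much real software actually bothers to do this.

```python
# Sketch of runtime dispatch: pick the widest SIMD code path the CPU
# advertises, falling back to the SSE2 baseline every x86-64 CPU has.
# Kernel names are hypothetical placeholders; flag names follow the
# Linux /proc/cpuinfo spelling.
from pathlib import Path

# Ordered widest-first: (required cpuinfo flag, hypothetical kernel name)
DISPATCH_TABLE = [
    ("avx512f", "kernel_avx512"),
    ("avx2", "kernel_avx2"),
    ("sse2", "kernel_sse2"),
]


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def select_kernel(flags: set[str]) -> str:
    """Return the widest kernel whose required flag is present."""
    for flag, kernel in DISPATCH_TABLE:
        if flag in flags:
            return kernel
    # x86-64 guarantees SSE2, so this is only reached for non-x86 input
    # (e.g. Arm Linux, whose cpuinfo uses a "Features" line instead).
    return "kernel_scalar"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(select_kernel(parse_flags(text)))
```

In C/C++ the same idea is typically handled by the compiler (for example GCC's target_clones attribute) or by hand-written CPUID checks. Either way, each extra path is code someone has to write, test, and maintain, which is the cost the post is pointing at.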

Doug S · Jul 30, 2024

igor_kavinski said:
The longer you stay there, the more screwed you will get. You'll need to buy again in 7 years at most, or maybe earlier when RAM/SSD limitations start troubling you or when the SSD's write endurance is exhausted. No incremental upgrades. There's no guarantee that your laptop will even work flawlessly that long unless you are "subscribed" to AppleCare. RAID 1? What's that, asks Apple? Cheaper workstation laptops give you that option. ECC? Again, what's that, Apple asks? You use what we give you, now shut up and go back to saving money for your next whole laptop upgrade, Apple tells you. Yes, I'm sure it's paradise on your side of the world.

Show me the x86 laptop that takes ECC RAM. And I mean a REAL laptop, not some 7 lb "desktop replacement" brick with multiple fans spinning so fast under anything beyond idle that they sound like dentist drills!

igor_kavinski · Jul 30, 2024

Doug S said:
Show me the x86 laptop that takes ECC RAM. And I mean a REAL laptop, not some 7 lb "desktop replacement" brick with multiple fans spinning so fast under anything beyond idle that they sound like dentist drills!

Not in the mood to search for one (there are a LOT more x86 laptop makers, including niche ones).

Pity you don't have an Apple 7 pounder that doesn't make drill sounds. I feel sorry for you.

Nothingness · Jul 30, 2024

desrever said:
Apple can optimize their whole stack from the OS level down, including their browser. And for some benchmarks, they can optimize the entire benchmark too. There just won't be any other CPU that beats them at performance/watt, simply due to this. Nobody really knows how much black-magic optimization is happening on Apple devices running Apple silicon. Here is Apple silicon on Linux:

[Attached chart: Phoronix Linux benchmark results, Apple M2 vs. Zen 4]
M2 gets absolutely trashed by Zen 4. M4 vs. Strix would probably be similar. Of course, Linux optimization for Apple silicon isn't perfect, so there are gains to be had there, but my point is that the hardware isn't the whole story. The software matters, probably way more than people give it credit for.

This just shows the issue with Phoronix when doing cross architecture comparisons: the software they use will surely have much heavier hand tuning on x86 than on Arm.

Also, Linux on Apple silicon likely wasn't properly tuned back then (I'm not sure it is now).

Add to that the fact that Phoronix mixes ST and MT workloads into a single score, and you have the perfect tool for producing useless comparisons.

Meanwhile, everyone who properly compiles and runs an OS/ISA-agnostic benchmark such as SPEC shows the same thing: Apple is at the top of the ST charts.
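A small illustration of why folding ST and MT results into one composite hides what each class of test says. It assumes the composite is a geometric mean of per-test ratios (which is how SPEC aggregates its per-benchmark ratios); the ratios are made-up numbers, not results from any real chip.

```python
# Why one geometric-mean score over mixed ST and MT tests is hard to read:
# a chip's lead in one class of test can cancel its deficit in the other.
# The ratios below are made-up illustrative numbers, not benchmark results.
import math


def geomean(values: list[float]) -> float:
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-test performance ratios of chip A relative to chip B.
st_ratios = [1.25, 1.25]  # A is 25% faster in single-threaded tests
mt_ratios = [0.8, 0.8]    # A is 20% slower in multi-threaded tests

print(f"ST-only score: {geomean(st_ratios):.2f}")
print(f"MT-only score: {geomean(mt_ratios):.2f}")
print(f"Mixed score:   {geomean(st_ratios + mt_ratios):.2f}")
```

A chip 25% ahead in ST and 20% behind in MT comes out at 1.00, i.e. "equal", in the mixed score, which is why separate ST and MT figures are more informative.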

Doug S · Jul 30, 2024

Abwx said:
ComputerBase's tests show that the Apple core doesn't manage to hold its max frequency for more than 10 s, after which it throttles to half the power, so the ST performance is actually 25-30% lower than what is displayed. So much for the 9950X delivering less performance.

Edit: That's why Geekbench pauses between tests; that's blatant help for Apple, as their core wouldn't yield good numbers if the tests were run without delays.

Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint. What's your excuse going to be then? Are you going to say that integer benchmarks are useless, and we should only look at FP? I'm sure if you look hard enough you can find a benchmark that tells you what you want to hear, and can come up with an explanation why that's the One True Benchmark and all the rest have been paid off by Apple to favor them.

igor_kavinski · Jul 30, 2024

Doug S said:
Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint.

Oh yay! People with SPEC fetish have M4 to play with and grin about! The world is finally a bed of roses now.

SarahKerrigan · Jul 30, 2024

Doug S said:
Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint. What's your excuse going to be then? Are you going to say that integer benchmarks are useless, and we should only look at FP? I'm sure if you look hard enough you can find a benchmark that tells you what you want to hear, and can come up with an explanation why that's the One True Benchmark and all the rest have been paid off by Apple to favor them.

Yeah, I have no use for Apple products myself but the constant drumbeat of excuses that have been made for the last ten years about how Apple cores aren't really as fast as they appear is starting to get irritating, especially since the same dumb arguments are increasingly being applied to Cortex.

SarahKerrigan · Jul 30, 2024

igor_kavinski said:
Oh yay! People with SPEC fetish have M4 to play with and grin about! The world is finally a bed of roses now.

Get a hobby.

igor_kavinski · Jul 30, 2024

SarahKerrigan said:
Get a hobby.

AT isn't a hobby???

FlameTail · Jul 30, 2024

SarahKerrigan said:
At this point, I genuinely cannot tell if you are this clueless or if you're doing some kind of bizarre performance art.

So I am not the only one who's baffled. It seems Igor has turned into a different person starting this month!
