Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

SarahKerrigan · Jul 30, 2024

igor_kavinski said:
The thing with these desktop designs is that they are extremely bandwidth starved. If Zen 5 had the kind of membw that M4 has, its IPC would really fly.

On single-thread? No it wouldn't. SPECint is barely sensitive to DRAM bandwidth at all. Essentially no single-threaded load that isn't a vector microbenchmark is going to saturate a modern CPU's memory bandwidth.

This is crap you just hallucinated.

igor_kavinski · Jul 30, 2024

MS_AT said:
I am not sure why it is like that, maybe for historical reasons, but people generally treat SIMD performance as FP performance. There are INT SIMD instructions too, and they are executed by the "FP" part of the core. So INT code could also use these 512b wide registers on Granite Ridge. The thing is, legacy software, or most of the software we have today, is not written with SIMD in mind. For a long time the dominant programming model has been object-oriented programming, and it doesn't lend itself well to SIMD. (IMO at least, but it's not a place for this discussion). So what matters more, because you don't have to rewrite or recompile code to use it, is the scalar part of the core and the front-end.

Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!

SarahKerrigan · Jul 30, 2024

igor_kavinski said:
Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!

At this point, I genuinely cannot tell if you are this clueless or if you're doing some kind of bizarre performance art.

CouncilorIrissa · Jul 30, 2024

igor_kavinski said:
Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!

That worked a total of zero times in the history of computing.

igor_kavinski · Jul 30, 2024

SarahKerrigan said:
On single-thread? No it wouldn't. SPECint is barely sensitive to DRAM bandwidth at all. Essentially no single-threaded load that isn't a vector microbenchmark is going to saturate a modern CPU's memory bandwidth.

This is crap you just hallucinated.

I didn't say ST anywhere in my post. Of course I was referring to MT. The combined IPC of all the Zen 5 cores with access to same membw as M4 would be a lot higher.

SarahKerrigan · Jul 30, 2024

igor_kavinski said:
I didn't say ST anywhere in my post. Of course I was referring to MT. The combined IPC of all the Zen 5 cores with access to same membw as M4 would be a lot higher.

"16 cores DESTROY four heavy and four light Apple cores!"

A truly stirring defense of x86.

(Nobody in the history of ever has measured "IPC across all cores.")

DavidC1 · Jul 30, 2024

MS_AT said:
So what matters more, because you don't have to rewrite or recompile code to use it, is the scalar part of the core and the front-end.

These are called general purpose chips for a reason.

uarch gains have always been scalar code. Improving scalar code has the side benefit of improving FP, and AIeee!! as well.

MS_AT · Jul 30, 2024

igor_kavinski said:
Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!

It wasn't meant to be this way. It's as much true for Intel. The x64 crowd for some reason is going towards SIMD, while ARM (including Apple) in comparison has a rather weak SIMD execution side. Might be because x64 guys want to extract as much perf as they can per instruction, while ARM guys care less because ARM decoders are easier to implement, so they spam decoders and scalar execution units. But as Apple shows, for general purpose software this seems to be a better strategy. [I mean no disrespect to Apple engineers when I say they are spamming something; it's just an observation that they have considerably more resources on that side]. To boot, the x64 side is penalized with only 16 GPRs, so it will spill to cache more often vs 32 GPRs on ARM, and since they are clocking higher, each spill costs relatively more [I mean access latency is usually > 4 cycles on the x64 side and I think it's less on the ARM side, but I haven't checked the docs, I might be wrong]

igor_kavinski · Jul 30, 2024

SarahKerrigan said:
At this point, I genuinely cannot tell if you are this clueless or if you're doing some kind of bizarre performance art.

Think about it. AMD is becoming a software company. What if they get AI to write their compilers and some other widely used open source libraries to make the maximum use of their architectures?

SarahKerrigan · Jul 30, 2024

igor_kavinski said:
Think about it. AMD is becoming a software company. What if they get AI to write their compilers and some other widely used open source libraries to make the maximum use of their architectures?

wow.

DavidC1 · Jul 30, 2024

MS_AT said:
It wasn't meant to be this way. It's as much true for Intel. The x64 crowd for some reason is going towards SIMD, while ARM (including Apple) in comparison has a rather weak SIMD execution side. Might be because x64 guys want to extract as much perf as they can per instruction, while ARM guys care less because ARM decoders are easier to implement, so they spam decoders and scalar execution units.

That's why the saner side for x86 is the Intel E core team ignoring AVX-512 and straight up doubling the number of vector units like ARM has been doing.

CouncilorIrissa · Jul 30, 2024

igor_kavinski said:
Think about it. AMD is becoming a software company. What if they get AI to write their compilers and some other widely used open source libraries to make the maximum use of their architectures?

Mind sharing some of the stuff you're on?

igor_kavinski · Jul 30, 2024

SarahKerrigan said:
"16 cores DESTROY four heavy and four light Apple cores!"

A truly stirring defense of x86.

By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.

igor_kavinski · Jul 30, 2024

CouncilorIrissa said:
Mind sharing some of the stuff you're on?

No banned substances. Just one Black teabag, less than half teaspoon Nescafe instant coffee and one Green teabag. So far.

SarahKerrigan · Jul 30, 2024

CouncilorIrissa said:
Mind sharing some of the stuff you're on?

I'm really not sure you want that.

DavidC1 · Jul 30, 2024

igor_kavinski said:
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.

Yes, we must laud AMD with all the praise for a 5.7GHz core consuming 30W+ at lower performance level than a 4.4GHz one using less than 8W.

A monster truck with a V16 engine with fuel consumption measured in Gallons per mile performs same in off road conditions as a V4 subcompact hybrid. It's simply awesome!

CouncilorIrissa · Jul 30, 2024

igor_kavinski said:
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.

about 300 pages ago this literal thread was raving about how 1T performance was the only thing that mattered and now we're back to "muh core count" mantra? lmao

naukkis · Jul 30, 2024

MS_AT said:
It wasn't meant to be this way. It's as much true for Intel. The x64 crowd for some reason is going towards SIMD, while ARM (including Apple) in comparison has a rather weak SIMD execution side. Might be because x64 guys want to extract as much perf as they can per instruction, while ARM guys care less because ARM decoders are easier to implement, so they spam decoders and scalar execution units. But as Apple shows, for general purpose software this seems to be a better strategy. [I mean no disrespect to Apple engineers when I say they are spamming something; it's just an observation that they have considerably more resources on that side]. To boot, the x64 side is penalized with only 16 GPRs, so it will spill to cache more often vs 32 GPRs on ARM, and since they are clocking higher, each spill costs relatively more [I mean access latency is usually > 4 cycles on the x64 side and I think it's less on the ARM side, but I haven't checked the docs, I might be wrong]

x86 is going towards wider SIMD units. The ARM side has more 128-bit execution pipes and load/store ports. Apple, and Cortex-X925 with its six 128-bit NEON pipes, are anything but weak in SIMD execution. x86 targets high clocks, and many SIMD units and load/store ports are hard to make work together, but Intel E-cores take the same approach as ARM (more 128-bit execution ability), and the performance seems to be there for x86 too.

DavidC1 · Jul 30, 2024

igor_kavinski said:
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.

The reason that we don't need to compare to Bulldozer is that, compared to the best cores, the x86 cores kinda do feel like Bulldozer. The reason people can't make the connection is that x86 cores are insulated by the ISA bubble.

Someone said Intel had a 50% 1T lead over Bulldozer right?

How much faster is M4 over Zen 5 per clock again? Oh right, 59% in Int.
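A per-clock lead and an absolute lead are different numbers, so here is a rough sketch of how the two relate. It takes the 59% per-clock figure above and the 4.4GHz and 5.7GHz clocks quoted earlier in the thread at face value; these are claims from posts, not measurements, and the function name is just illustrative.

```python
# Figures taken from posts in this thread; treat them as claims, not measurements.
PER_CLOCK_ADVANTAGE = 1.59  # M4 vs Zen 5 per clock in integer, as claimed above
M4_CLOCK_GHZ = 4.4          # clock quoted earlier in the thread (assumed to be M4)
ZEN5_CLOCK_GHZ = 5.7        # clock quoted earlier in the thread (assumed to be Zen 5)


def absolute_ratio(per_clock_ratio, clock_a_ghz, clock_b_ghz):
    """Performance of A relative to B, given A's per-clock ratio and both clocks."""
    return per_clock_ratio * clock_a_ghz / clock_b_ghz


ratio = absolute_ratio(PER_CLOCK_ADVANTAGE, M4_CLOCK_GHZ, ZEN5_CLOCK_GHZ)
print(f"M4 vs Zen 5 at the quoted clocks: {ratio:.2f}x")
```

Under those assumptions the per-clock gap shrinks to roughly a 1.23x absolute gap once the clock difference is factored in.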

igor_kavinski · Jul 30, 2024

DavidC1 said:
Yes, we must laud AMD with all the praise for a 5.7GHz core consuming 30W+ at lower performance level than a 4.4GHz one using less than 8W.

That monster "electricity eating" core opens way more computing possibilities for the world than Apple's closed everything ecosystem. Apple's core might be general purpose but it's not "general user". Restricted to the elite.

I would have no issue declaring Apple the winner if people could straight away ditch x86 and move to ARM, without taking a hit to their pockets and bank accounts.

DavidC1 · Jul 30, 2024

igor_kavinski said:
I would have no issue declaring Apple the winner if people could straight away ditch x86 and move to ARM, without taking a hit to their pockets and bank accounts.

Hence...

DavidC1 said:
because x86 cores are insulated by the ISA bubble.

Might I say the government consistently giving Intel a win vs outsiders has something to do with it?

If let's say Nvidia was allowed x86 license with Denver?

soresu · Jul 30, 2024

DavidC1 said:
How much faster is M4 over Zen 5 per clock again? Oh right, 59% in Int.

Try that again in FP/SIMD and M4 dies a horrible death 🤣

4x 128 bit vs 4x 512 bit isn't even a competition.

SME improves some specific things for them in SIMD - but not everything.
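For anyone who wants the "4x 128 bit vs 4x 512 bit" comparison as numbers, here is a small sketch of the peak vector width per cycle. The pipe counts and widths are the ones stated in this post, not verified figures, and the helper names are made up for illustration. This is peak width only: it ignores clocks, load/store bandwidth, and whether the code is vectorized at all.

```python
def peak_vector_bits_per_cycle(pipes, width_bits):
    """Peak bits of vector work per cycle if every pipe is busy every cycle."""
    return pipes * width_bits


def lanes(width_bits, element_bits):
    """How many elements of a given size fit in one vector register."""
    return width_bits // element_bits


# Pipe counts and widths as stated in the post above (unverified).
narrow = peak_vector_bits_per_cycle(pipes=4, width_bits=128)
wide = peak_vector_bits_per_cycle(pipes=4, width_bits=512)
print(f"4x128b: {narrow} bits/cycle, 4x512b: {wide} bits/cycle, ratio {wide // narrow}x")

# Lanes per register for common element sizes (integer or FP alike).
for element_bits in (8, 16, 32, 64):
    print(
        f"{element_bits}-bit elements: "
        f"{lanes(128, element_bits)} lanes in 128b, "
        f"{lanes(512, element_bits)} lanes in 512b"
    )
```

So on paper it is a 4x difference in peak width, and the lane counts are the same whether the elements are INT or FP.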

igor_kavinski · Jul 30, 2024

DavidC1 said:
How much faster is M4 over Zen 5 per clock again? Oh right, 59% in Int.

That's short-lived. They won't be able to sustain their IPC/frequency increases for long. Haven't you read reports of Mx chips working at up to 117C? Howz that gonna be for some nice cooked silicon degradation?

CouncilorIrissa · Jul 30, 2024

igor_kavinski said:
That monster "electricity eating" core opens way more computing possibilities for the world than Apple's closed everything ecosystem. Apple's core might be general purpose but it's not "general user". Restricted to the elite.

I would have no issue declaring Apple the winner if people could straight away ditch x86 and move to ARM, without taking a hit to their pockets and bank accounts.

Aight, I'm done, that's it for me for today, have fun, folks.

DavidC1 · Jul 30, 2024

soresu said:
Try that again in FP/SIMD and M4 dies a horrible death 🤣

4x 128 bit vs 4x 512 bit isn't even a competition.

SME improves some specific things for them in SIMD - but not everything.

You do know FP is much easier to boost right?

If it was so important why is Sapphire Rapids regarded as almost a worthless chip?
