The thing with these desktop designs is that they are extremely bandwidth starved. If Zen 5 had the kind of membw that M4 has, its IPC would really fly.
Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!
I am not sure why it is like that, maybe for historical reasons, but people generally treat SIMD performance as FP performance. There are INT SIMD instructions too, and they are executed by the "FP" part of the core, so INT code could also use these 512b-wide registers on Granite Ridge. The thing is, legacy software, or most of the software we have today, is not written with SIMD in mind. For a long time the dominant programming model has been object-oriented programming, and it doesn't lend itself well to SIMD (IMO at least, but this isn't the place for that discussion). So what matters more, because you don't have to rewrite or recompile code to use it, is the scalar part of the core and the front-end.
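To put rough numbers on the integer SIMD point: a vector register holds register width divided by element width lanes, whether those lanes are floats or integers. Below is a minimal Python sketch of that arithmetic. The `lanes` helper is a made-up name for illustration, and the 128b row is only there as a narrower width for comparison, not a claim about any particular core.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


# Common integer element widths in bits (int8, int16, int32, int64).
INT_WIDTHS = (8, 16, 32, 64)

for reg in (128, 512):
    per_width = {w: lanes(reg, w) for w in INT_WIDTHS}
    print(f"{reg}b register, lanes per integer width: {per_width}")
```

This only counts lanes per register. Whether real code gets that benefit still depends on it being written or compiled to use those instructions, which is the point being argued above.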
That worked a total of zero times in the history of computing.
Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!
I didn't say ST anywhere in my post. Of course I was referring to MT. The combined IPC of all the Zen 5 cores with access to same membw as M4 would be a lot higher.
On single-thread? No it wouldn't. SPECint is barely sensitive to DRAM bandwidth at all. Essentially no single-threaded load that isn't a vector microbenchmark is going to saturate a modern CPU's memory bandwidth.
This is crap you just hallucinated.
These are called general purpose chips for a reason.
So what matters more, because you don't have to rewrite or recompile code to use it, is the scalar part of the core and the front-end.
It wasn't meant to be this way. It's as much true for Intel. The x64 crowd for some reason is going towards SIMD, while ARM (including Apple) has a rather weak SIMD execution side in comparison. Might be because the x64 guys want to extract as much perf as they can per instruction, while the ARM guys care less because ARM decoders are easier to implement, so they spam decoders and scalar execution units. But as Apple shows, for general purpose software this seems to be a better strategy. [I mean no disrespect to Apple engineers when I say they are spamming something, it's just an observation that they have considerably more resources on that side]. To boot, the x64 side is penalized with only 16 GPRs, so it will spill to cache more often vs 32 GPRs on ARM, and since they are clocking higher each spill costs relatively more [I mean access latency is usually > 4 cycles on the x64 side and I think it's less on the ARM side, but I haven't checked the docs, I might be wrong]
Yes! That's the argument I needed to support my idea that rewritten software would benefit from Zen 5 more!
Think about it. AMD is becoming a software company. What if they get AI to write their compilers and some other widely used open source libraries to make the maximum use of their architectures?
At this point, I genuinely cannot tell if you are this clueless or if you're doing some kind of bizarre performance art.
That's why the saner side for x86 is the Intel E core team ignoring AVX-512 and straight up doubling the number of vector units like ARM has been doing.
It wasn't meant to be this way. It's as much true for Intel. The x64 crowd for some reason is going towards SIMD when ARM (including Apple) in comparison have rather weak SIMD execution side. Might be because x64 guys want to extract as much perf as they can per instruction while ARM guys care less because ARM decoders are easier to implement so they spam decoders and scalar execution units.
Mind sharing some of the stuff you're on?
Think about it. AMD is becoming a software company. What if they get AI to write their compilers and some other widely used open source libraries to make the maximum use of their architectures?
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.
"16 cores DESTROY four heavy and four light Apple cores!"
A truly stirring defense of x86.
No banned substances. Just one black teabag, less than half a teaspoon of Nescafe instant coffee and one green teabag. So far.
Mind sharing some of the stuff you're on?
Yes, we must laud AMD with all the praise for a 5.7GHz core consuming 30W+ at a lower performance level than a 4.4GHz one using less than 8W.
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.
about 300 pages ago this literal thread was raving about how 1T performance was the only thing that mattered and now we're back to "muh core count" mantra? lmao
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.
The reason that we don't need to compare to Bulldozer is because, compared to the best cores, the x86 cores kinda do feel like Bulldozer. The reason people can't make the connection is because x86 cores are insulated by the ISA bubble.
By the time Apple gives people the same number of cores, AMD's design and core counts will have scaled to new heights.
That monster "electricity eating" core opens way more computing possibilities for the world than Apple's closed everything ecosystem. Apple's core might be general purpose but it's not "general user". Restricted to the elite.
Yes, we must laud AMD with all the praise for a 5.7GHz core consuming 30W+ at lower performance level than a 4.4GHz one using less than 8W.
Hence...
I would have no issue declaring Apple the winner if people could straight away ditch x86 and move to ARM, without taking a hit to their pockets and bank accounts.
Might I say the government consistently giving wins to Intel over outsiders has something to do with it?
because x86 cores are insulated by the ISA bubble.
Try that again in FP/SIMD and M4 dies a horrible death 🤣
How much faster is M4 over Zen 5 per clock again? Oh right, 59% in Int.
That's short-lived. They won't be able to sustain their IPC/frequency increases for long. Haven't you read reports of Mx chips working at up to 117C? Howz that gonna be for some nice cooked silicon degradation?
How much faster is M4 over Zen 5 per clock again? Oh right, 59% in Int.
Aight, I'm done, that's it for me for today, have fun, folks.
That monster "electricity eating" core opens way more computing possibilities for the world than Apple's closed everything ecosystem. Apple's core might be general purpose but it's not "general user". Restricted to the elite.
You do know FP is much easier to boost right?
Try that again in FP/SIMD and M4 dies a horrible death 🤣
4x 128 bit vs 4x 512 bit isn't even a competition.
SME improves some specific things for them in SIMD - but not everything.
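For what it's worth, the "4x 128 bit vs 4x 512 bit" comparison works out as below. This is a back-of-envelope sketch of peak vector width per cycle only. It takes the pipe counts and widths from the post as given and assumes every pipe can issue a full-width op each cycle. `peak_bits_per_cycle` is a hypothetical helper name, and the result says nothing about clocks, memory bandwidth, or what SME adds.

```python
def peak_bits_per_cycle(pipes: int, width_bits: int) -> int:
    """Peak vector bits processed per cycle if every pipe issues a full-width op."""
    return pipes * width_bits


# Figures taken from the post: 4 pipes of 128 bits vs 4 pipes of 512 bits.
narrow = peak_bits_per_cycle(4, 128)
wide = peak_bits_per_cycle(4, 512)
ratio = wide // narrow

print(f"4x128b: {narrow} bits/cycle")
print(f"4x512b: {wide} bits/cycle")
print(f"peak width ratio: {ratio}x")
```

Per clock that is a 4x gap in peak vector width. How much of it shows up in real workloads depends on how well the code vectorizes and whether memory can keep the units fed.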