Question Qualcomm's first Nuvia based SoC - Hamoa

Page 16 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

H433x0n

Golden Member
Mar 15, 2023
1,068
1,273
96
ELI5 - Is there something inherent to ARM designs that prevents them from clocking to 5 GHz? I don’t think I’ve ever seen a modern ARM core running at the clocks seen in Intel / AMD cores.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,864
3,417
136
ELI5 - Is there something inherent to ARM designs that prevents them from clocking to 5 GHz? I don’t think I’ve ever seen a modern ARM core running at the clocks seen in Intel / AMD cores.
No. If anything it might be easier, given that decode and load/store should be simpler.

But high speed design is also an area that needs experience and expertise.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I think the main benefit for the industry at large is that we will get to buy a properly designed ARM chip, and full performance will no longer be locked to Apple's walled garden.
I don't really care about exact layout of cache, it is obvious from performance and memory bandwidth that this is serious effort and not another "we took vanilla flavoured ARM's design with least cache possible and combined it with our secret sauce (served in our company canteen)".

Performance and efficiency at this level would eventually open doors in the non-cloud server market. I hope they can iterate fast and increase the cluster size to at least 8C (where the industry already has experience with AMD stuff). They should not have trouble competing there with secret-sauce-laced "designs" for at least 1-2 years?

My main worry now is Qualcomm. Without even following the ARM industry, they somehow register to me as a company full of byzantine corporate politics and marketing morons. That can end any engineer's aspirations in no time. I can easily see the "secret sauce applier" department not exactly loving the situation.
 

Gideon

Golden Member
Nov 27, 2007
1,712
3,931
136
Most exciting CPU launch since the M1. Really looking forward to seeing the Nuvia core get benchmarked.
I agree.

The hardware seems quite solid, but it will face tough competition next year from Meteor Lake, Zen 5, M3 ...

If it had arrived this year (as was originally scheduled) it would have been the best all-around chip for laptops (hw wise), but the jury is still out for next year.

But never mind the hardware. All of this will be useless if it's hampered by poor OS support, even if it were 30% better. So WARM (Windows on ARM) really needs to deliver.

Anyway, today should be another, more technical, keynote (probably this one?). Hopefully we get more architectural data on Oryon there.
 

FlameTail

Diamond Member
Dec 15, 2021
3,161
1,805
106
One thing I find bizarre is that the SDXE (Snapdragon X Elite) is only 50% faster than the Apple M2 in multi-core. That would put it on par with the M2 Max in multi-core performance, which has an 8P+4E CPU.

Whereas the SDXE has a 12P CPU. So in theory it should be faster, right?
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I see the M2 Pro/Max are only about 50% faster than the M2 in Geekbench 6, which appears to be the tool Qualcomm use for their comparisons. Which makes their new chip about as fast as an M2 Pro/Max in multicore tasks
GB 6 "MT" is a complete joke, and comparisons like these show why. Using it you simply can't compare differently sized chips and infer their potential performance in multicore tasks. You only get a view on how performance is in a few specific tamely multithreaded tasks, disadvantaging chips that have more cores compared to chips that have fewer cores but at higher frequency (which is already covered by ST as well). So the second sentence in the quote is exactly the kind of conclusion one can't make with GB 6 "MT".
 

Gideon

Golden Member
Nov 27, 2007
1,712
3,931
136
GB 6 "MT" is a complete joke, and comparisons like these show why. Using it you simply can't compare differently sized chips and infer their potential performance in multicore tasks. You only get a view on how performance is in a few specific tamely multithreaded tasks, disadvantaging chips that have more cores compared to chips that have fewer cores but at higher frequency (which is already covered by ST as well). So the second sentence in the quote is exactly the kind of conclusion one can't make with GB 6 "MT".
Yeah, it surely isn't scaling well at all past ~8 cores.

What's worse, it vastly prefers higher frequency and more unified cache over more cores.

Just look at the official processor comparison list (and select the multi-core tab):
https://browser.geekbench.com/processor-benchmarks/


Some examples:


The 16-core Ryzen 9 7950X and the 96-core EPYC 9654 get nearly the same score (with the latter losing in MT!). Compare this even to Geekbench 5, where the difference between the two is at least 5x in MT score.
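
To make the scaling argument concrete, here is a rough sketch using Amdahl's law. This is not how Geekbench computes its scores; the parallel fractions are hypothetical values picked only to illustrate the point, and clock speed and cache differences between the two chips are ignored entirely.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions (assumptions, not measured from any benchmark).
for p in (0.80, 0.99):
    s16 = amdahl_speedup(p, 16)
    s96 = amdahl_speedup(p, 96)
    print(f"p={p:.2f}: 16 cores -> {s16:.1f}x, "
          f"96 cores -> {s96:.1f}x, ratio {s96 / s16:.2f}")
```

With a tamely threaded workload (p = 0.80) the 96-core part gains very little over the 16-core one, so a clock speed advantage on the smaller chip could plausibly be enough to flip the result. With a workload that scales almost ideally (p = 0.99) the core count dominates.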

Then again, if the chip is big cores only and those clock to a 3.8 GHz all-core turbo, it should probably still outperform an M2 Pro a bit, instead of pretty much tying it (when comparing M2 to M2 Pro).

Perhaps the cache layout has something to do with it? The 3 4-core clusters hurting performance vs the unified 8 big cores on the M2:



All in all, let's not forget it's Qualcomm's first iteration of the chip. Compromises were surely made (as with, say, Zen 1). Based on the info they've shared so far, it looks pretty darn good, all things considered.
(Obviously some caution is advised in case it's marketing.)
 

soresu

Platinum Member
Dec 19, 2014
2,959
2,182
136
One thing I find bizarre is that the SDXE (Snapdragon X Elite) is only 50% faster than the Apple M2 in multi-core. That would put it on par with the M2 Max in multi-core performance, which has an 8P+4E CPU.
Didn't they say that they achieve that >M2 1T GB result by overclocking just 2 cores?

Presumably that means that matching that perf for all 12C would raise the TDP ceiling significantly.

I suspect that those 3x 4C clusters are running at lower clocks when there is a highly threaded task being run.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Yeah, it surely isn't scaling well at all past ~8 cores.

What's worse, it vastly prefers higher frequency and more unified cache over more cores.

Sounds like >99% of laptop and non-HEDT desktop use cases to me. The remaining 1% come from AMD/Intel marketing, running ideally scaling benchmarks like CB23?

I think their MT score is held back by "SoC"-level limitations, like an L3 cache that is too small and the IMC getting swamped by too many requests. The 4C cluster size is not helping either; it pretty much means inter-cluster core comms take a huge hit and impact performance.

Still, for a 1st generation, 3250-ish is a GREAT SC GB6 score to be at. MT scaling being less than optimal is pretty much expected from a chip with a TOTAL cache budget of 42MB?
 

Gideon

Golden Member
Nov 27, 2007
1,712
3,931
136
Didn't they say that they achieve that >M2 1T GB result by overclocking just 2 cores?

Presumably that means that matching that perf for all 12C would raise the TDP ceiling significantly.

I suspect that those 3x 4C clusters are running at lower clocks when there is a highly threaded task being run.
It's not an overclock, it's a boost algorithm. Based on the slides, ALL the cores should be able to hit 3.8 GHz if there is thermal headroom (and it would be really odd to test in thermally limited conditions, considering the M2 Max is also available in desktop versions).

Oh, and that >M2 Max 1T score means that it takes 30% less power when matching the score. Why would that change drastically with all cores?

Timestamped:



 

Tigerick

Senior member
Apr 1, 2022
686
576
106
Sounds like >99% of laptop and non-HEDT desktop use cases to me. The remaining 1% come from AMD/Intel marketing, running ideally scaling benchmarks like CB23?

I think their MT score is held back by "SoC"-level limitations, like an L3 cache that is too small and the IMC getting swamped by too many requests. The 4C cluster size is not helping either; it pretty much means inter-cluster core comms take a huge hit and impact performance.

Still, for a 1st generation, 3250-ish is a GREAT SC GB6 score to be at. MT scaling being less than optimal is pretty much expected from a chip with a TOTAL cache budget of 42MB?
According to AT, each CPU core should have 3MB of L2 cache, for a total of 36MB, plus 6MB of L3 cache; that's slightly less than the M2's 8MB.
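
As a quick sanity check on those figures, the arithmetic can be spelled out as below. The per-core L2 and L3 sizes are simply the numbers quoted in this post, and the 12-core, 4-cores-per-cluster layout is taken from earlier in the thread; none of it is independently verified here.

```python
# Figures as quoted in this thread (assumptions, not verified specifications).
CORES = 12
CORES_PER_CLUSTER = 4
L2_PER_CORE_MB = 3
L3_MB = 6

l2_total_mb = CORES * L2_PER_CORE_MB                  # L2 across the whole chip
l2_per_cluster_mb = CORES_PER_CLUSTER * L2_PER_CORE_MB  # L2 attributed to one 4C cluster
total_cache_mb = l2_total_mb + L3_MB                  # L2 + L3 combined

print(f"L2 total: {l2_total_mb} MB")
print(f"L2 per 4C cluster: {l2_per_cluster_mb} MB")
print(f"L2 + L3: {total_cache_mb} MB")
```

That L2 + L3 sum lines up with the 42MB total cache budget mentioned earlier in the thread.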
 

Gideon

Golden Member
Nov 27, 2007
1,712
3,931
136
According to AT, each CPU core should have 3MB of L2 cache, for a total of 36MB, plus 6MB of L3 cache; that's slightly less than the M2's 8MB.
12MB of L2 (per complex) with only 6MB L3?

Gives me some AMD Phenom vibes, with its 2MB L3 barely matching the total L2 cache. Bumping it to 6MB did wonders for Phenom II, even for SKUs with similar core counts / clocks:

 