Question Qualcomm's first Nuvia based SoC - Hamoa

poke01 · Nov 8, 2022

Qualcomm's working on a 2024 PC chip codenamed "Hamoa" with up to 12 cores (8P+4E).

Said to have the same cache layout as M1: large private L1$, per-cluster L2$ (cluster = 4 cores, 12MB for every cluster) and a lot of LLC.

source:

https://twitter.com/x/status/1589405172979339264

HurleyBird · Oct 24, 2023

StinkyPinky said:
I think Arm will win. It is just more efficient.

Or maybe optimizing for power first, IPC second-ish, and frequency last-ish just tends to get you to a better place than when you try to do it the other way around.

adroc_thurston · Oct 24, 2023

HurleyBird said:
Or maybe optimizing for power first, IPC second-ish, and frequency last-ish just tends to get you to a better place than when you try to do it the other way around.

tell that to A17 P-core kek

poke01 · Oct 24, 2023

adroc_thurston said:
tell that to A17 P-core kek

A17 P core is good after the iOS 17.0.3 update.

Also, testing it in a phone might not be the best way to judge the core. Thermals and all. Let's see the M3 Macs.

adroc_thurston · Oct 25, 2023

poke01 said:
A17 P core is good after the iOS 17.0.3 update.

No, it has ughhhh let's say worse fundamentals than Everest.

Thibsie · Oct 25, 2023

adroc_thurston said:
No, it has ughhhh let's say worse fundamentals than Everest.

You're not really helping, here 🤔

poke01 · Oct 25, 2023

adroc_thurston said:
No, it has ughhhh let's say worse fundamentals than Everest.

If you're basing this analysis on the Geekerwan video, that's not a good assumption to make.

If not from there, please share your thoughts.

Henry swagger · Oct 25, 2023

We need a technical breakdown of the core architecture next

igor_kavinski · Oct 25, 2023

Henry swagger said:
We need a technical breakdown of the core architecture next

So you can tell Intel how to design better CPUs?

H433x0n · Oct 25, 2023

ELI5 - Is there something inherent to ARM designs that prevents them from clocking to 5 GHz? I don’t think I’ve ever seen a modern ARM core running at clocks seen in Intel / AMD cores.

itsmydamnation · Oct 25, 2023

H433x0n said:
ELI5 - Is there something inherent to ARM designs that prevents them from clocking to 5 GHz? I don’t think I’ve ever seen a modern ARM core running at clocks seen in Intel / AMD cores.

No, if anything it might be easier, given that decode and load/store should be simpler.

But high speed design is also an area that needs experience and expertise.

JoeRambo · Oct 25, 2023

I think the main benefit for the industry at large is that we will get to buy a properly designed ARM chip, and full performance will no longer be locked to Apple's walled garden.
I don't really care about the exact layout of the cache; it is obvious from performance and memory bandwidth that this is a serious effort and not another "we took vanilla flavoured ARM's design with least cache possible and combined it with our secret sauce (served in our company canteen)".

Performance and efficiency at this level would eventually open doors in the non-cloud server market. I hope they can iterate fast and increase the cluster size to at least 8C (where the industry already has experience with AMD stuff). They should not have trouble competing there with secret-sauce-laced "designs" for at least 1-2 years?

My main worry now is Qualcomm. Without even following the ARM industry, they somehow register to me as a company full of byzantine corporate politics and marketing morons. That can end any engineer's aspirations in no time. I can easily see the "secret sauce applier" department not exactly loving the situation.

Gideon · Oct 25, 2023

NTMBK said:
Most exciting CPU launch since the M1. Really looking forward to seeing the Nuvia core get benchmarked.

I agree.

The hardware seems quite solid, but it will face tough competition next year from Meteor Lake, Zen 5, M3 ...

If it had arrived this year (as was originally scheduled) it would have been the best all-around chip for laptops (hw wise), but the jury is still out for next year.

But never mind the hardware. All of this will be useless if it's hampered by poor OS support, even if it were 30% better. So WARM (Windows on ARM) really needs to deliver.

Anyway, today should be another, more technical, keynote (probably this one?). Hopefully we get more architectural data on Oryon there.

FlameTail · Oct 25, 2023

One thing I find bizarre is that the SDXE (Snapdragon X Elite) is only 50% faster than the Apple M2 in multi-core. That would put it on par with M2 Max in multi-core performance, which has an 8P+4E CPU.

Whereas the SDXE has a 12P CPU. So in theory it should be faster, right?

moinmoin · Oct 25, 2023

jeanlain said:
I see the M2 Pro/Max are only about 50% faster than the M2 in Geekbench 6, which appears to be the tool Qualcomm use for their comparisons. Which makes their new chip about as fast as an M2 Pro/Max in multicore tasks

GB 6 "MT" is a complete joke, and comparisons like these show why. Using it you simply can't compare differently sized chips and infer their potential performance in multicore tasks. You only get a view on how performance is in a few specific tamely multithreaded tasks, disadvantaging chips that have more cores compared to chips that have fewer cores but at higher frequency (which is already covered by ST as well). So the second sentence in the quote is exactly the kind of conclusion one can't make with GB 6 "MT".
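A rough way to picture the "tamely multithreaded" point is Amdahl's law. This is only a sketch: Amdahl's law is a textbook model brought in here for illustration, not how Geekbench computes anything, and the 0.80 parallel fraction is a made-up value, not a measured property of any GB 6 subtest.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Ideal speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# ASSUMPTION: 0.80 is an illustrative parallel fraction, not a measurement.
PARALLEL_FRACTION = 0.80

for cores in (4, 8, 12, 16, 64, 96):
    speedup = amdahl_speedup(cores, PARALLEL_FRACTION)
    print(f"{cores:>3} cores: {speedup:.2f}x over one core")
```

Under that assumption the curve flattens quickly after the first handful of cores, so per-core speed ends up mattering more than core count.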

Gideon · Oct 25, 2023

moinmoin said:
GB 6 "MT" is a complete joke, and comparisons like these show why. Using it you simply can't compare differently sized chips and infer their potential performance in multicore tasks. You only get a view on how performance is in a few specific tamely multithreaded tasks, disadvantaging chips that have more cores compared to chips that have fewer cores but at higher frequency (which is already covered by ST as well). So the second sentence in the quote is exactly the kind of conclusion one can't make with GB 6 "MT".

Yeah, it surely isn't scaling well at all past ~8 cores.

What's worse, it vastly prefers higher frequency and more unified cache to more cores.

Just look at the official processor comparison list (and select the multi-core tab):
https://browser.geekbench.com/processor-benchmarks/

Some examples:

CPU	Cores	Geekbench 6 score (MT)
Intel Core i9-13900KS	24	21744
AMD EPYC 9554	64	20172
Intel Xeon w9-3495X	56	19738
AMD Ryzen 9 7950X3D	16	19704
AMD EPYC 9654	96	19153
AMD Ryzen Threadripper PRO 5995WX	64	18205
AMD Ryzen 9 7900X	12	16840
AMD Ryzen 7 7700X	8	15163
AMD Ryzen 5 7600X	6	12664
AMD EPYC 7742	64	11782

The 16-core Ryzen 9 7950X3D and 96-core EPYC 9654 get nearly the same score (with the latter losing in MT!). Compare this to Geekbench 5, where the difference between the two is at least 5x in MT score.

Then again, if the chip is big cores only and those clock to 3.8 GHz all-core turbo, it should probably still outperform an M2 Pro a bit, instead of pretty much tying it (when comparing M2 to M2 Pro).
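To put numbers on that, here is a quick Python sketch that uses only the scores from the list above and divides each MT score by its core count. The `score_per_core` and `mt_ratio` helpers are names made up for this example.

```python
# (name, cores, Geekbench 6 MT score), copied from the list above.
RESULTS = [
    ("Intel Core i9-13900KS", 24, 21744),
    ("AMD EPYC 9554", 64, 20172),
    ("Intel Xeon w9-3495X", 56, 19738),
    ("AMD Ryzen 9 7950X3D", 16, 19704),
    ("AMD EPYC 9654", 96, 19153),
    ("AMD Ryzen Threadripper PRO 5995WX", 64, 18205),
    ("AMD Ryzen 9 7900X", 12, 16840),
    ("AMD Ryzen 7 7700X", 8, 15163),
    ("AMD Ryzen 5 7600X", 6, 12664),
    ("AMD EPYC 7742", 64, 11782),
]

SCORES = {name: (cores, score) for name, cores, score in RESULTS}


def score_per_core(name: str) -> float:
    """MT score divided by core count for one listed CPU."""
    cores, score = SCORES[name]
    return score / cores


def mt_ratio(name_a: str, name_b: str) -> float:
    """MT score of name_a relative to name_b."""
    return SCORES[name_a][1] / SCORES[name_b][1]


for name, cores, _score in RESULTS:
    print(f"{name:<36}{cores:>3} cores {score_per_core(name):>7.0f} per core")

# 6x the cores, yet a slightly lower MT score.
print(f"EPYC 9654 vs 7950X3D: {mt_ratio('AMD EPYC 9654', 'AMD Ryzen 9 7950X3D'):.2f}x")
```

The per-core column drops by roughly an order of magnitude between the 6-core and 96-core parts, which is the poor scaling in a nutshell.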

Perhaps the cache layout has something to do with it? The 3 4-core clusters hurting performance vs the unified 8 big cores on the M2:

All in all, let's not forget it's Qualcomm's first iteration of the chip. Compromises were surely made (as with, say, Zen 1). Based on the info they've shared so far, it looks pretty darn good all considering.
(obviously some caution is advised, in case it's marketing)

DrMrLordX · Oct 25, 2023

moinmoin said:
So the second sentence in the quote is exactly the kind of conclusion one can't make with GB 6 "MT".

Need third-party benchmarks.

soresu · Oct 25, 2023

FlameTail said:
One thing I find bizarre is that the SDXE (Snapdragon X Elite) is only 50% faster than the Apple M2 in multi-core. That would put it on par with M2 Max in multi-core performance, which has an 8P+4E CPU.

Didn't they say that they achieve that >M2 1T GB result by overclocking just 2 cores?

Presumably that means that matching that perf for all 12C would raise the TDP ceiling significantly.

I suspect that those 3x 4C clusters are running at lower clocks when there is a highly threaded task being run.

JoeRambo · Oct 25, 2023

Gideon said:
Yeah, it surely isn't scaling well at all past ~8 cores.

What's worse, it vastly prefers higher frequency and more unified cache to more cores.

Sounds like >99% of laptop and non-HEDT desktop use cases to me. The remaining 1% are from AMD/Intel marketing, running ideally scaling benchmarks like CB23?

I think their MT score is held back by "SoC" level limitations, like too small an L3 cache and the IMC getting swamped by too many requests. The 4C cluster size is not helping either; it pretty much means inter-cluster core comms take a huge hit and impact performance.

Still, for a 1st generation, 3250'ish is a GREAT SC GB6 score to be at. MT scaling being not optimal is pretty much expected from a chip with a TOTAL cache budget of 42MB?

Gideon · Oct 25, 2023

soresu said:
Didn't they say that they achieve that >M2 1T GB result by overclocking just 2 cores?

Presumably that means that matching that perf for all 12C would raise the TDP ceiling significantly.

I suspect that those 3x 4C clusters are running at lower clocks when there is a highly threaded task being run.

It's not an overclock, it's a boost algorithm. Based on the slides, ALL the cores should be able to hit 3.8 GHz if there is thermal headroom (and it would be really odd to test in thermally limited conditions, considering the M2 Max is also available in desktop versions).

Oh, and that >M2 Max 1T score means that when equaling the score it takes 30% less power. Why would that drastically change with all cores?

Timestamped:

soresu · Oct 25, 2023

Gideon said:
It's not an overclock, it's a boost algorithm

Meh - to me it's all semantics about clocking cores beyond their ideal perf/watt window.

Gideon · Oct 25, 2023

The ISA seems to be stuck at Armv8.7 though:

https://x.com/felixclc_/status/1716901096856400155?s=46&t=tNnMS9VB0KADxuy7OaVjLQ

Understandable, considering the need to get it out of the door, but still slightly disappointing.

I'd also prefer even a microcoded implementation of SVE to no SVE at all, especially for a 2024 product.

Tigerick · Oct 25, 2023

JoeRambo said:
Sounds like >99% of laptop and non-HEDT desktop use cases to me. The remaining 1% are from AMD/Intel marketing, running ideally scaling benchmarks like CB23?

I think their MT score is held back by "SoC" level limitations, like too small an L3 cache and the IMC getting swamped by too many requests. The 4C cluster size is not helping either; it pretty much means inter-cluster core comms take a huge hit and impact performance.

Still, for a 1st generation, 3250'ish is a GREAT SC GB6 score to be at. MT scaling being not optimal is pretty much expected from a chip with a TOTAL cache budget of 42MB?

According to AT, each CPU core should have 3MB of L2 cache, for a total of 36MB, plus 6MB of L3 cache; that's slightly less than the M2's 8MB.
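For anyone checking the math, a small sketch of how these figures line up with the 12MB per cluster and 42MB total mentioned earlier in the thread. It assumes the L2 can be counted as an even 3MB share per core, which is just a bookkeeping convenience since the L2 is described as per-cluster.

```python
CORES = 12
CLUSTER_SIZE = 4      # cores per cluster, per the first post
L2_PER_CORE_MB = 3    # AT figure quoted above, counted as an even per-core share
L3_MB = 6             # AT figure quoted above

l2_per_cluster_mb = CLUSTER_SIZE * L2_PER_CORE_MB
l2_total_mb = CORES * L2_PER_CORE_MB
total_cache_mb = l2_total_mb + L3_MB

print(f"L2 per cluster: {l2_per_cluster_mb}MB")
print(f"L2 total:       {l2_total_mb}MB")
print(f"L2 + L3:        {total_cache_mb}MB")
```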

poke01 · Oct 25, 2023

Geekbench is excellent for ST, not so much MT

Gideon · Oct 25, 2023

Tigerick said:
According to AT, each CPU core should have 3MB of L2 cache, for a total of 36MB, plus 6MB of L3 cache; that's slightly less than the M2's 8MB.

12MB of L2 (per complex) with only 6MB L3?

Gives me some AMD Phenom vibes, with its 2MB L3 barely matching the total L2 cache. Bumping it to 6MB did wonders for Phenom II, even for SKUs with similar core counts/clocks:

AMD Phenom II X4 940 & 920: A True Return to Competition

www.anandtech.com

coercitiv · Oct 25, 2023

What a breath of fresh air. Even if it's just competitive with Apple by next year, it's still a massive step in the right direction. The rest of the world was put on notice, they better have very good output in years to come.
