Discussion Qualcomm Snapdragon Thread

Ghostsonplanets · Oct 24, 2024

FlameTail said:
This confirms that the Fastconnect 7900 isn't integrated into the 8 Elite SoC (which is made on 3nm), but that it's a discrete chip.

It isn't integrated anymore? Interesting decision I guess. I wonder if FC7900 is bundled with Snap 8E or OEMs can choose one but not the other.

FlameTail said:
Does Adreno 830 exclusively use Binned Direct mode?
Vince suggests that Adreno 830 could be using TBIM (Tile Based Immediate Mode Rendering), which is a hybrid of the traditional mobile TBDR (Tile Based Deferred Rendering) and traditional desktop IMR (Immediate Mode Rendering).

https://www.reddit.com/r/hardware/comments/1ga2nhf/comment/ltb8h5t

That makes sense, because IMR is very bandwidth demanding and was abandoned more than a decade ago by Nvidia and AMD.

FlameTail · Oct 24, 2024

Ghostsonplanets said:
It isn't integrated anymore? Interesting decision I guess. I wonder if FC7900 is bundled with Snap 8E or OEMs can choose one but not the other.

Fastconnect chips were always discrete;

The Qualcomm® FastConnect™ 7800 Mobile Connectivity System is an advanced 14nm Wi-Fi and Bluetooth®....

FastConnect 7800 | Qualcomm

Wi-Fi 7, premium Wi-Fi 6/6E, and leading Bluetooth audio come together in this powerful and versatile connectivity system to unlock extreme performance for mobile, compute, and XR experiences.

www.qualcomm.com

The FastConnect 6900 is an advanced 14nm 2x2 Mobile Connectivity system

Qualcomm FastConnect 6900 System | Integrated Mobile Wi-Fi Chipset with DBS technology | Qualcomm

Stay connected with the Qualcomm FastConnect 6900 System. This integrated mobile Wi-Fi chipset comes with DBS technology for reliable performance.

www.qualcomm.com

FlameTail · Oct 24, 2024

I want to revisit the topic of Apple SoCs performing disproportionately better in Geekbench 6 Multi Core test.

Now that Geekerwan has published his review of Snapdragon 8 Elite.

We can compare A18 Pro and 8 Elite.

FlameTail said:
SPEC numbers extracted from Geekerwan's graphs.

              A18-P     A18-E     Oryon-L   Oryon-M
Clock speed   4.04 GHz  2.2 GHz   4.32 GHz  3.53 GHz
SPEC INT      10.7      3.3       8.9       5.2
SPEC FP       16.0      5.0       14.0      8.0
Core size     3.0 mm²   0.8 mm²   2.2 mm²   0.9 mm²

A18-P is ~18% faster than Oryon-L.

Oryon-M is ~60% faster than A18-E.

Oryon-M has ~60% of the performance of Oryon-L. A18-E has ~27% of the performance of A18-P.

I will take Oryon-L as the 100% baseline.

A18-P = 118%
A18-E = (118 × 27%) = 32%
Oryon-L = 100%
Oryon-M = (100 × 60%) = 60%

A18 Pro
= 2P + 4E
= 2 (118%) + 4 (32%)
= 364%

8 Elite
= 2L + 6M
= 2 (100%) + 6 (60%)
= 560%

So theoretically MT performance of 8 Elite should be 53% higher.

Geekbench 6.3 Multi
A18 Pro = 9000 points
8 Elite = 10500 points

8 Elite is only about 15% faster on average than A18 Pro in GB6 MT.
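For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch. It uses only the relative per-core figures, core counts, and GB6 scores given in this post; the names (`REL_PERF`, `naive_mt`, `lead_pct`) are made up for illustration, and the "naive" sum assumes every core runs at its single-core peak at the same time, which is exactly the assumption under discussion.

```python
# Per-core throughput relative to Oryon-L = 100, as stated in the post.
REL_PERF = {"A18-P": 118, "A18-E": 32, "Oryon-L": 100, "Oryon-M": 60}

# Core counts per SoC (2P + 4E for A18 Pro, 2L + 6M for 8 Elite).
SOCS = {
    "A18 Pro": {"A18-P": 2, "A18-E": 4},
    "8 Elite": {"Oryon-L": 2, "Oryon-M": 6},
}

# Geekbench 6.3 multi-core scores quoted in the post.
GB6_MT = {"A18 Pro": 9000, "8 Elite": 10500}


def naive_mt(soc: str) -> int:
    """Sum per-core relative performance, assuming perfect scaling."""
    return sum(REL_PERF[core] * count for core, count in SOCS[soc].items())


def lead_pct(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100


theory = lead_pct(naive_mt("8 Elite"), naive_mt("A18 Pro"))
measured = lead_pct(GB6_MT["8 Elite"], GB6_MT["A18 Pro"])

print(f"naive sums: A18 Pro {naive_mt('A18 Pro')}, 8 Elite {naive_mt('8 Elite')}")
print(f"theoretical lead: {theory:.1f}%  GB6 MT lead: {measured:.1f}%")
```

Swapping in different per-core percentages only requires editing `REL_PERF`, so the sensitivity of the gap to those estimates is easy to check.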

Why? How? What's the reason for this disparity?

A18 Pro does have SME, and Geekbench 6.3 boosts the score for SME. But this boost is only about 10% in ST. And consider that there is one SME block per cluster, so SME performance doesn't linearly scale up with core count. That means the MT boost by SME of A18 Pro to Geekbench 6 Multicore test would be <10%.

This is nothing new. Ever since Geekbench 6 was released, Apple A-series CPUs have been performing disproportionately better in it than their Android counterparts. So much so that there is a running joke that Geekbench 6 = Applebench.

But we are hardware enthusiasts, and I want to get to the truth of this.

Back in the day, I thought that one reason for the disparity might be Apple's superior 2-tier (L1/L2) cache hierarchy, whereas Qualcomm used ARM's inferior 3-tier (L1/L2/L3) hierarchy. But now Snapdragon 8 Elite also employs a cache subsystem similar to Apple's. Snapdragon 8 Elite even has more cache than Apple A18 Pro (24 MB L2 vs 20 MB L2).

I want answers.

Edit: The figure I used for Oryon-M was wrong. It's 60% of Oryon-L, not 40% of Oryon-L. This changes my conclusion.

Edit2 : Grammar and Formatting.

gdansk · Oct 24, 2024

FlameTail said:
Conclusion: Apple CPUs are not disproportionately faster in Geekbench 6 Multi-core.

Weird. Your own line of reasoning shows it performing disproportionately better. Not a long enough context window?

naukkis · Oct 24, 2024

FlameTail said:
Snapdragon 8 Elite
2 × Phoenix-L @ 4.32 GHz [12 MB sL2]
6 × Phoenix-M @ 3.53 GHz [12 MB sL2]

Snapdragon 7 Gen 4 (Speculation)
2 × Phoenix-L @ 3.9 GHz [8 MB sL2]
2 × Phoenix-M @ 3.2 GHz [4 MB sL2]

Will a quad core CPU be unsuitable for Android today?

Even $150 phones have '8-core CPU' nowadays.

The original big.LITTLE concept had 4+4 cores which could all be used together, so Android software was built to use 8 threads. There might be really odd performance problems and even deadlocks in the Android ecosystem if a CPU has fewer than 8 threads. That many threads/CPU cores aren't needed for performance, but Android pretty much won't work with fewer than 8 threads.

FlameTail · Oct 24, 2024

gdansk said:
Weird. Your own line of reasoning shows it performing disproportionately better. Not a long enough context window?

Edit: The figure I used for Oryon-M was wrong. It's 60% of Oryon-L, not 40% of Oryon-L. This changes my conclusion.

You can read my analysis again. Sorry for the error.

gdansk · Oct 24, 2024

FlameTail said:
Edit: The figure I used for Oryon-M was wrong. It's 60% of Oryon-L, not 40% of Oryon-L. This changes my conclusion.

You can read my analysis again. Sorry for the error.

Hmm, I wonder if you should compare to SPEC nt instead of extrapolating from 1t. Does anyone test that?

DZero · Oct 24, 2024

FlameTail said:
Snapdragon 8 Elite
2 × Phoenix-L @ 4.32 GHz [12 MB sL2]
6 × Phoenix-M @ 3.53 GHz [12 MB sL2]

Snapdragon 7 Gen 4 (Speculation)
2 × Phoenix-L @ 3.9 GHz [8 MB sL2]
2 × Phoenix-M @ 3.2 GHz [4 MB sL2]

Will a quad core CPU be unsuitable for Android today?

Even $150 phones have '8-core CPU' nowadays.

Unless they have a Small Phoenix core, that does not make sense. I expect 2 Phoenix L and 6 M but at much lower clock and with the name of Snapdragon 7 Plus.

Raqia · Oct 24, 2024

poke01 said:
Yeah Qualcomm isn’t getting an x86 license unless something very very crazy happens

I think AMD would be happy to negotiate a better deal than ARM in the event of an Intel acquisition, to keep their bread and butter going. I doubt they'll want to pivot to ARM given ARM's newfound zeal for licensing revenues at any cost. ARM's architectural license is already in place with Qualcomm, so this current, rather weak lawsuit was their best bet out of few to renegotiate pricing.

Another scenario, Qualcomm joins the x86 consortium as a third member: it's not out of the question that they bring forward an x86 implementation (probably for a streamlined version of the ISA like x86S rather than the full warts x86-64...) based on their current u-arch. Intel and AMD are feeling the heat from ARM from below, hence the formation of the consortium which would benefit from additional implementers joining. This is complicated by the fact that Qualcomm's implementations may well be much better initially than Intel or AMD's...

Not sure how Qualcomm's RISC-V efforts for big, user application facing cores are going, but some of the changes they're proposing seem to align well to slapping a RISC-V frontend to their u-arch so they seem to be exploring all of these options to squirm free from ARM or as leverage in negotiations...

moinmoin · Oct 24, 2024

FlameTail said:
8 Elite is only about 15% faster on average than A18 Pro in GB6 MT.

Why? How? What's the reason for this disparity?

GB6's "MT" is not MT and as such shouldn't be used for multi core performance comparisons like you attempted there.

FlameTail · Oct 24, 2024

moinmoin said:
GB6's "MT" is not MT and as such shouldn't be used for multi core performance comparisons like you attempted there.

My question wasn't "How much faster actually is 8 Elite than A18 Pro in MT?"

I am asking "Why does A18 Pro (and previous Apple CPUs) perform so well in the Geekbench 6 Multicore test?"

What quality is there in those CPUs that makes them perform disproportionately better? Or is Apple handing money under the table to Primate Labs, to boost the scores of their SoCs in Geekbench 6? Thus making the 'Geekbench 6 = Applebench' theory a reality (I wasn't a fan of the theory in the first place).

FlameTail · Oct 24, 2024

DZero said:
Unless they have a Small Phoenix core, that does not make sense. I expect 2 Phoenix L and 6 M but at much lower clock and with the name of Snapdragon 7 Plus.

Snapdragon 8 Elite
2 × Phoenix-L @ 4.32 GHz [12 MB L2]
6 × Phoenix-M @ 3.53 GHz [12 MB L2]

Snapdragon 8s Gen 4 / 7+ Gen 4 (Speculation)
2 × Phoenix-L @ ~4.0 GHz [8 MB L2]
6 × Phoenix-M @ ~2.5 GHz [8 MB L2]

Snapdragon 7 Gen 4 (Speculation)
2 × Phoenix-M @ ~3.5 GHz [6 MB L2]
6 × Phoenix-M @ ~2.5 GHz [6 MB L2]

Using Phoenix-M as the prime core in 7 Gen 4 wouldn't be bad.

We know from Geekerwan's 8 Elite review that Phoenix-M has 60% of the peak performance of Phoenix-L.

Phoenix-L @ 4.3 GHz scores ~3250 in Geekbench 6 ST. So Phoenix-M @ 3.5 GHz would score about 2000 points.

Now the 7 Gen 4 configuration that I speculated above has half the L2 cache, so let's deduct 10%.

That gives 1800 points in GB6 ST for Snapdragon 7 Gen 4 (Speculated).

That's quite respectable performance for a midrange SoC, and a large uplift from the Snapdragon 7 Gen 3 (~1200 points).
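The estimate above can be reproduced in a few lines of Python. This is only a sketch of the post's own back-of-envelope steps: the 60% ratio, the ~3250 baseline, the 10% cache deduction, and the ~1200 figure for 7 Gen 3 all come from the post, while the constant names are made up. The unrounded results (1950 and 1755) land slightly below the rounded 2000 and 1800 used in the text.

```python
PHOENIX_L_GB6_ST = 3250   # ~GB6 ST score of Phoenix-L @ 4.3 GHz (from the post)
M_TO_L_RATIO = 0.60       # Phoenix-M peak relative to Phoenix-L (Geekerwan)
CACHE_PENALTY = 0.10      # the post's guess for halving the L2 cache
SD7_GEN3_GB6_ST = 1200    # ~GB6 ST score of Snapdragon 7 Gen 3 (from the post)

# Phoenix-M as a prime core, before any cache penalty.
phoenix_m = PHOENIX_L_GB6_ST * M_TO_L_RATIO

# Speculated Snapdragon 7 Gen 4 after deducting the cache penalty.
sd7_gen4 = phoenix_m * (1 - CACHE_PENALTY)

# Generational uplift over Snapdragon 7 Gen 3.
uplift = sd7_gen4 / SD7_GEN3_GB6_ST - 1

print(f"Phoenix-M @ 3.5 GHz: ~{phoenix_m:.0f} points")
print(f"Speculated 7 Gen 4:  ~{sd7_gen4:.0f} points ({uplift:.0%} over 7 Gen 3)")
```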

GTracing · Oct 24, 2024

FlameTail said:
My question wasn't "How much faster actually is 8 Elite than A18 Pro in MT?"

I am asking "Why does A18 Pro (and previous Apple CPUs) perform so well in the Geekbench 6 Multicore test?"

What quality is there in those CPUs that makes them perform disproportionately better? Or is Apple handing money under the table to Primate Labs, to boost the scores of their SoCs in Geekbench 6? Thus making the 'Geekbench 6 = Applebench' theory a reality (I wasn't a fan of the theory in the first place).

GB6 doesn't scale linearly with more cores. You can compare other CPUs and see similar results. For example there's only a 20% difference between the 8 core 16 thread 1700X vs 4 core 8 thread 1500X. https://browser.geekbench.com/v6/cpu/compare/8462630?baseline=8458189
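One way to put a number on that non-linear scaling is Amdahl's law. To be clear, this is a simplifying model chosen for illustration, not how Geekbench 6 actually computes its score, and it ignores the clock-speed difference between the two chips, SMT, and memory effects. Under those assumptions, the sketch below solves for the parallel fraction that would turn a doubling from 4 to 8 cores into only a 20% gain; the function names are hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def implied_parallel_fraction(gain: float, n_small: int, n_big: int) -> float:
    """Solve amdahl_speedup(p, n_big) / amdahl_speedup(p, n_small) == gain for p.

    Rearranging the ratio gives a closed form:
        p = (gain - 1) / (gain * (1 - 1/n_big) - (1 - 1/n_small))
    """
    return (gain - 1.0) / (gain * (1.0 - 1.0 / n_big) - (1.0 - 1.0 / n_small))


# 8-core 1700X vs 4-core 1500X: roughly a 20% difference in GB6 multi-core.
p = implied_parallel_fraction(gain=1.20, n_small=4, n_big=8)
s4 = amdahl_speedup(p, 4)
s8 = amdahl_speedup(p, 8)

print(f"implied parallel fraction: {p:.3f}")
print(f"speedup on 4 cores: {s4:.2f}x, on 8 cores: {s8:.2f}x")
```

With a parallel fraction around two thirds, each extra core adds less and less, which is the same shape of result as the comparison linked above.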

Nothingness · Oct 24, 2024

GTracing said:
GB6 doesn't scale linearly with more cores. You can compare other CPUs and see similar results. For example there's only a 20% difference between the 8 core 16 thread 1700X vs 4 core 8 thread 1500X. https://browser.geekbench.com/v6/cpu/compare/8462630?baseline=8458189

Yeah but some of the tests almost do scale linearly (in particular the rendering one). Not all, far from it, and that's why the global score can be misleading.

Meteor Late · Oct 24, 2024

FlameTail said:
I want to revisit the topic of Apple SoCs performing disproportionately better in Geekbench 6 Multi Core test.

Now that Geekerwan has published his review of Snapdragon 8 Elite.

We can compare A18 Pro and 8 Elite.

A18-P is ~18% faster than Oryon-L.

Oryon-M is ~60% faster than A18-E.

Oryon-M has ~60% of the performance of Oryon-L. A18-E has ~27% of the performance of A18-P.

I will take Oryon-L as the 100% baseline.

A18-P = 118%
A18-E = (118 × 27%) = 32%
Oryon-L = 100%
Oryon-M = (100 × 60%) = 60%

A18 Pro
= 2P + 4E
= 2 (118%) + 4 (32%)
= 364%

8 Elite
= 2L + 6M
= 2 (100%) + 6 (60%)
= 560%

So theoretically MT performance of 8 Elite should be 53% higher.

Geekbench 6.3 Multi
A18 Pro = 9000 points
8 Elite = 10500 points

8 Elite is only about 15% faster on average than A18 Pro in GB6 MT.

Why? How? What's the reason for this disparity?

A18 Pro does have SME, and Geekbench 6.3 boosts the score for SME. But this boost is only about 10% in ST. And consider that there is one SME block per cluster, so SME performance doesn't linearly scale up with core count. That means the MT boost by SME of A18 Pro to Geekbench 6 Multicore test would be <10%.

This is nothing new. Ever since Geekbench 6 was released, Apple A-series CPUs have been performing disproportionately better in it than their Android counterparts. So much so that there is a running joke that Geekbench 6 = Applebench.

But we are hardware enthusiasts, and I want to get to the truth of this.

Back in the day, I thought that one reason for the disparity might be Apple's superior 2-tier (L1/L2) cache hierarchy, whereas Qualcomm used ARM's inferior 3-tier (L1/L2/L3) hierarchy. But now Snapdragon 8 Elite also employs a cache subsystem similar to Apple's. Snapdragon 8 Elite even has more cache than Apple A18 Pro (24 MB L2 vs 20 MB L2).

I want answers.

Edit: The figure I used for Oryon-M was wrong. It's 60% of Oryon-L, not 40% of Oryon-L. This changes my conclusion.

Edit2 : Grammar and Formatting.

Well, it should be pretty simple: it's just not possible for Oryon to run all 8 cores at or near their maximum frequency, because power consumption would be insane. While Oryon M cores are much more powerful than Apple E cores, they also consume much, MUCH more power. Even when you downclock them to match Apple E cores, they are not more efficient; they are less efficient.

On the contrary, Apple E cores can run pretty much at their maximum frequency or very close in MT, because their power consumption is already very low.

As a very rough estimate extrapolating from SC power draw, the 8 Elite would consume around 32W if all cores ran at maximum frequency (7W x 2 + 3W x 6). Of course it depends on the workload, and GB power draw != SPEC power draw, but very roughly let's say 30W. So how far do frequencies have to drop to reach the roughly half of that power we actually see in GB MT? The A18 Pro doesn't need to lower frequencies much, because summing the SC power consumption of all its cores doesn't give much more than its MT power consumption: say each P core draws 7W and each E core draws 0.8W, and we are barely above 16W.
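Spelled out as a minimal Python sketch, the power estimate looks like this. The per-core wattages are the rough single-core guesses from this post (the Oryon values are read off the "7 x 2 + 3 x 6" expression), not measurements, and the names `CORE_W`, `LAYOUT`, and `all_core_power` are made up for illustration.

```python
# Rough single-core power draw per core type, in watts (guesses from the post).
CORE_W = {"Oryon-L": 7.0, "Oryon-M": 3.0, "A18-P": 7.0, "A18-E": 0.8}

# Core counts per SoC.
LAYOUT = {
    "8 Elite": {"Oryon-L": 2, "Oryon-M": 6},
    "A18 Pro": {"A18-P": 2, "A18-E": 4},
}


def all_core_power(soc: str) -> float:
    """Power if every core ran at its single-core peak simultaneously."""
    return sum(CORE_W[core] * count for core, count in LAYOUT[soc].items())


for soc in LAYOUT:
    print(f"{soc}: ~{all_core_power(soc):.1f} W with all cores at max frequency")
```

The 8 Elite total comes out at about twice the A18 Pro total, which is the gap that would have to be closed by lowering clocks under a phone-sized power budget.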

DZero · Oct 24, 2024

FlameTail said:
Snapdragon 8 Elite
2 × Phoenix-L @ 4.32 GHz [12 MB L2]
6 × Phoenix-M @ 3.53 GHz [12 MB L2]

Snapdragon 8s Gen 4 / 7+ Gen 4 (Speculation)
2 × Phoenix-L @ ~4.0 GHz [8 MB L2]
6 × Phoenix-M @ ~2.5 GHz [8 MB L2]

Snapdragon 7 Gen 4 (Speculation)
2 × Phoenix-M @ ~3.5 GHz [6 MB L2]
6 × Phoenix-M @ ~2.5 GHz [6 MB L2]

Using Phoenix-M as the prime core in 7 Gen 4 wouldn't be bad.

We know from Geekerwan's 8 Elite review that Phoenix-M has 60% of the peak performance of Phoenix-L.

Phoenix-L @ 4.3 GHz scores ~3250 in Geekbench 6 ST. So Phoenix-M @ 3.5 GHz would score about 2000 points.

Now the 7 Gen 4 configuration that I speculated above has half the L2 cache, so let's deduct 10%.

That gives 1800 points in GB6 ST for Snapdragon 7 Gen 4 (Speculated).

That's quite respectable performance for a midrange SoC, and a large uplift from the Snapdragon 7 Gen 3 (~1200 points).

Since Qualcomm will have problems with ARM, I expect this...

Snapdragon 8 Elite
2 × Phoenix-L @ 4.32 GHz [12 MB L2]
6 × Phoenix-M @ 3.53 GHz [12 MB L2]

Snapdragon 7 Gen 4 (Speculation) AKA: 7 Plus
2 × Phoenix-L @ ~3.7 GHz [12 MB L2]
6 × Phoenix-M @ ~2.8 GHz [8 MB L2]

For the lower tier it can go with this:

Snapdragon 6 Gen 4 (Speculation) AKA: 6
2 × Phoenix-M @ ~3.2 GHz [4 MB L2]
6 × Phoenix-M @ ~2.5 GHz [4 MB L2]

Snapdragon 4 Gen 4 (Speculation) AKA: 4s
2 × Phoenix-M @ ~2.5 GHz [4 MB L2]
6 × Phoenix-M @ ~2.0 GHz [2 MB L2]

An alternative could be:

Snapdragon 6 Gen 4 (Speculation) AKA: 6
4 × Phoenix-M @ ~3.0 GHz [4 MB L2]
4 × Phoenix-M @ ~2.4 GHz [4 MB L2]

Snapdragon 4 Gen 4 (Speculation) AKA: 4s
4 × Phoenix-M @ ~2.4 GHz [4 MB L2]
4 × Phoenix-M @ ~2.0 GHz [2 MB L2]

Why? Saving cost on ARM licences and using all the dies; failed ones become 7 series.

Meanwhile, the 6 and 4 series use another uarch, which is cheaper and can go with TSMC 4 or even 5 nm.

Doug S · Oct 24, 2024

MS_AT said:
What Andrei said raises an interesting dilemma: does the end user care about the uarch comparison or the actual performance, in the context of Geekbench version (6.2 vs 6.3)? I mean, I understand that introducing SME will have less impact in the Android ecosystem due to fragmentation, but I guess that vertically integrated Apple is already making use of it thanks to having much better control over the software stack.

At least this gives some context to the Qualcomm slides where 6.2 was used, as from a uarch vs uarch standpoint it's more fair, seeing that SME, unlike Neon, is not part of the core.

On the other hand, if Andrei wanted to do accurate uarch vs uarch comparisons, he should have enabled AVX for x64 when doing comparisons on the Qualcomm slides, as the AVX units belong to the core the same way Neon units do (I guess he is somewhat involved with those if the rumours are accurate). But yeah, that is a digression.

Benchmark results are always a snapshot in time. Remember the differences in GB results for Apple depending on which OS version is used, i.e. the regression with iOS 18.0 compared to developer betas during the summer, and the improvement people running 18.1 betas reported? I imagine you see the same sort of stuff with different Android versions. That kind of stuff swamps out small things like using jemalloc to get a better SPEC result.

You'll never get a "fair" comparison because next week an OS update could increase or decrease performance levels for one, a new compiler version might provide a boost in SPEC, and so forth.

Nothingness · Oct 24, 2024

Doug S said:
Benchmark results are always a snapshot in time. Remember the differences in GB results for Apple depending on which OS version is used, i.e. the regression with iOS 18.0 compared to developer betas during the summer, and the improvement people running 18.1 betas reported? I imagine you see the same sort of stuff with different Android versions. That kind of stuff swamps out small things like using jemalloc to get a better SPEC result.

You'll never get a "fair" comparison because next week an OS update could increase or decrease performance levels for one, a new compiler version might provide a boost in SPEC, and so forth.

And yet people draw conclusions after seeing variations of a few percents.

I think I already explained how this is done during CPU development: extracts of benchmarks/apps or instruction traces are run through simulation (accurate model, RTL simulation, FPGA, emulator) in a completely controlled platform. That's the only way to measure things accurately when you're expecting improvements from a change in the range of 0.1 or 0.2% (yes, some tweaks only bring that).

MS_AT · Oct 24, 2024

Doug S said:
Benchmark results are always a snapshot in time. Remember the differences in GB results for Apple depending on which OS version is used, i.e. the regression with iOS 18.0 compared to developer betas during the summer, and the improvement people running 18.1 betas reported? I imagine you see the same sort of stuff with different Android versions. That kind of stuff swamps out small things like using jemalloc to get a better SPEC result.

You'll never get a "fair" comparison because next week an OS update could increase or decrease performance levels for one, a new compiler version might provide a boost in SPEC, and so forth.

Sure, that is why you won't be able to provide a 100% fair comparison, but it doesn't mean you cannot strive for perfection. I don't expect everyone to suddenly start using the same OS version and compiler settings to run SPEC comparisons, but it would definitely be nice if they could provide information about the environments they used. Bonus points for being consistent with what they do.

Nothingness said:
I think I already explained how this is done during CPU development: extracts of benchmarks/apps or instruction traces are run through simulation (accurate model, RTL simulation, FPGA, emulator) in a completely controlled platform. That's the only way to measure things accurately when you're expecting improvements from a change in the range of 0.1 or 0.2% (yes, some tweaks only bring that).

Side question: what is the threshold at which an improvement is deemed meaningful? 0.01%, 0.05%? Or is everything weighed against the cost, so that if a 0.01% improvement is "free" it will go in anyway?

poke01 · Oct 24, 2024

GB6 isn’t the best to compare MT, plus in a phone you have a lot of variables.

gdansk · Oct 24, 2024

poke01 said:
GB6 isn’t the best to compare MT, plus in a phone you have a lot of variables.

Hmm. I think GB6 MT is best fit for phones and thin laptops. Where you're not running embarrassingly parallel workloads.

It is less relevant for high end desktops and is entirely meaningless for servers.

poke01 · Oct 24, 2024

gdansk said:
Hmm. I think GB6 MT is best fit for phones and thin laptops. Where you're not running embarrassingly parallel workloads.

It is less relevant for high end desktops and is entirely meaningless for servers.

Flame is talking about why it’s not scaling well, one reason could be not maintaining clocks and GB6 not meant for parallel MT.

Nothingness · Oct 24, 2024

MS_AT said:
Side question: what is the threshold at which an improvement is deemed meaningful? 0.01%, 0.05%? Or is everything weighed against the cost, so that if a 0.01% improvement is "free" it will go in anyway?

After a few iterations of the same microarchitecture, everything that brings performance with acceptable power and area is picked, because the uarch has usually reached a balanced point where more performance requires larger changes, and these are done earlier in the design cycle. So, yes, if the cost is low, any improvement goes in.

Of course it's more complex. Very often a small change can bring a small benefit in one place, and some regression in another workload. At this point one has to make a decision, which boils down to what workload is considered the more important, to decide if the change should be kept.

Meteor Late · Oct 24, 2024

I don't understand how you guys would expect so much higher MT scores from Oryon, like >30% higher?

Apple cores are more energy efficient. Apple P core is more efficient than Oryon P core, and Apple E core is more efficient than Oryon M core, when comparing both power at the same performance level and performance at the same power level. Yes, Oryon M core can be much more performant, but that is irrelevant in the Smartphone form factor if the power needed is so much higher.

We have a CPU with 33% more cores, however, these are less efficient cores than their counterpart. If they were as efficient, let's say you could downclock them a bit and get 25% more MT at the same power. But since they are less efficient than their counterpart, you cannot get that much MT difference, you get less difference so that the core energy efficiency deficit is alleviated.

Why then does Oryon beat the D9400 in MT efficiency by a decent margin, if the difference between the P cores is seemingly very small and the D9400 E core is slightly more efficient? Because the D9400's X4 core sucks, especially in floating point. And there are 3 of them.

FlameTail · Oct 24, 2024

FlameTail said:
'Geekbench 6 = Applebench'

ROFL.

Source

