Discussion Qualcomm Snapdragon Thread

Jul 27, 2020
19,613
13,477
146
I'm a bit confused.

Are pipeline stages == clock cycles?

That's what DavidC1 seems to be implying with that post.
 

GTracing

Member
Aug 6, 2021
78
192
76
I'm a bit confused.

Are pipeline stages == clock cycles?

That's what DavidC1 seems to be implying with that post.
No, but adding pipeline stages allows designers to increase clock speed.

In a CPU with no pipelining, the core has to decode and execute the whole instruction in a single cycle. That is a lot of work to fit into one cycle, so the cycle has to be long and the clock slow.

Pipelining is splitting the work into multiple stages. By splitting the work into 10 stages, each stage does about 1/10th of the work to decode and execute an instruction. And then you can clock the processor much faster. The downside is that it takes 10 cycles to complete a single instruction. But while the first instruction is on stage 6, the second one can be on stage 5, the third instruction can be on stage 4 etc. So overall it's still a huge speedup.
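
To put rough numbers on the 10-stage example above, here is a minimal sketch of an idealized pipeline. It assumes the work splits perfectly evenly across stages and ignores latch overhead, hazards, and stalls; the function names and the 10 ns figure are made up for illustration.

```python
def unpipelined_time_ns(n_instructions, work_per_instruction_ns):
    # Every instruction occupies the whole core for one long cycle.
    return n_instructions * work_per_instruction_ns


def pipelined_time_ns(n_instructions, work_per_instruction_ns, stages):
    # Idealized assumption: each stage does 1/stages of the work,
    # so the cycle time shrinks by the same factor.
    cycle_ns = work_per_instruction_ns / stages
    # The first instruction takes `stages` cycles to come out the end;
    # every later instruction finishes one cycle after the previous one.
    total_cycles = stages + (n_instructions - 1)
    return total_cycles * cycle_ns


if __name__ == "__main__":
    slow = unpipelined_time_ns(1000, 10.0)
    fast = pipelined_time_ns(1000, 10.0, 10)
    print(f"unpipelined: {slow} ns, pipelined: {fast} ns, speedup: {slow / fast:.2f}x")
```

A single instruction takes just as long either way (10 cycles of 1 ns versus 1 cycle of 10 ns); the gain comes from overlapping many instructions, which is why the speedup approaches the stage count only for long instruction streams.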

Where this gets hairy is when there's a branch in the code. The CPU doesn't know what instruction comes next until the branch finishes executing, so it just guesses. CPUs are pretty good at guessing correctly, but when they get it wrong they have to discard all the work they did on the "guess". The time lost working on the wrong guess is known as the branch misprediction penalty.
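
As a back-of-the-envelope illustration of how much the misprediction penalty costs on average, here is a sketch using the usual textbook cycles-per-instruction model. All the inputs (branch fraction, mispredict rate, penalty) are hypothetical values, not measurements of any real core.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    # Average cycles per instruction once wrong guesses are charged:
    # only branches can mispredict, and only a fraction of them do.
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted,
    # 10 cycles of work thrown away per wrong guess.
    cpi = effective_cpi(1.0, 0.20, 0.05, 10)
    print(f"effective CPI: {cpi:.2f}")
```

The penalty scales with how deep the pipeline is in front of the branch, which is why a longer pipeline needs a better predictor to come out ahead.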
 

Nothingness

Diamond Member
Jul 3, 2013
3,030
1,971
136
@GTracing Good summary, thanks

I would just add that even after dispatching instructions the penalty is variable due to instructions not needing the same number of stages/cycles to execute (int add vs FP64 fma for instance).

Also as @DavidC1 said in his post there are several other variables that change latency. That’s why I don’t understand what the single 11 figure for X925 represents. It should be a range or flagged as a max latency. I looked at X925 Software Optimization Guide and couldn’t find any information about that.
 

FlameTail

Diamond Member
Dec 15, 2021
3,759
2,206
106
Cheaper Snapdragon laptops are coming.... Next year!?

“We expect PC to be the next biggest driver of diversification for the company,” says Amon. Business will be “slow and steady as the market transitions,” he admitted, but we already see some good signs coming out of the woodwork. Amon said that some Snapdragon X PCs have already sold out, while Geekbench also posted on X that 6.5% of Geekbench 6 benchmarks from June 15 to July 15, 2024, were also run on Snapdragon X devices — good signs for Qualcomm, especially as it had launched less than a month before that.
 

FlameTail

Diamond Member
Dec 15, 2021
3,759
2,206
106
Sebastian Aaltonen comments again about GPU architecture of Qualcomm.


And now you understand why GPU-driven games such as Rainbow Six Siege run very poorly on Qualcomm Snapdragon X. Same for Nanite.
Other Android GPU vendors have similar bottlenecks. This is not just Qualcomm. These GPUs are optimized for traditional VS+PS workloads. All big data (matrices, material) should be loaded from uniform buffer using fixed address. This hits fast paths.
If you want to know how HypeHype’s new renderer optimizes around these bottlenecks, check our new SIGGRAPH 2024 presentation (in Moving Mobile Graphics track). Slide deck will be public next week.
Most people don’t know that the biggest improvement in Turing wasn’t ray-tracing or tensor cores. It was 28 cycle L1$ latency (vs 85 cycles) and 2x L1$ bandwidth (source: CUDA paper). This is great for GPU-driven render + V-buffer. And great for ray-tracing too.
Nvidia wasn’t great in GPU compute before Turing. They also lacked async compute. Kepler also emulated LDS atomics. Turing was a massive enabler for Nvidia -> $3 trillion company now. But Pascal had such fast geometry processing that gamers didn’t notice GCN’s compute advantage.
Back in the Pascal days Nvidia made a series of blog posts advising devs to use uniform buffers for their new deferred lighting shaders. IIRC Pascal was suffering 30% from modern raw buffer compute code. Now Qualcomm is in the same position when you run modern workloads.
 
Reactions: igor_kavinski

soresu

Diamond Member
Dec 19, 2014
3,190
2,463
136
Nvidia wasn’t great in GPU compute before Turing. They also lacked async compute. Kepler also emulated LDS atomics. Turing was a massive enabler for Nvidia -> $3 trillion company now. But Pascal had such fast geometry processing that gamers didn’t notice GCN’s compute advantage.

This is a big understatement.

If not for their CUDA walled garden they would not have the market dominance in GPU general compute they enjoy at the moment.

Case in point: CDNAx is still based on Vega/GFX9.

GFX9 remains good enough for straight compute, it just was poorly designed to scale with games while filling the CUs with work.

I have heard rumblings though that a near future iteration of CDNA could be based on GFX13 or 14 (RDNA5 onward).
 

FlameTail

Diamond Member
Dec 15, 2021
3,759
2,206
106
Snapdragon 8 Gen 4 might match the Single Core performance of Apple A18 in Geekbench 5, where SME does not add to the score.
 

jdubs03

Senior member
Oct 1, 2013
683
307
136
Snapdragon 8 Gen 4 might match the Single Core performance of Apple A18 in Geekbench 5, where SME does not add to the score.
Do we know the X Elite GB5 score? I can’t find anything on it. That’d be a decent proxy.
Would have to get around 2500-2600 to challenge.
 

jdubs03

Senior member
Oct 1, 2013
683
307
136
Awesome thank you.

Yeah so based on that the X1E-84-100, the score is around 1940 (median of 5 results). The A17 Pro is already at 2130ish on average.

The A18 Pro will probably be around 2400* EDIT. Doesn’t seem possible that the Snapdragon 8 Gen 4 will be anywhere near that. I don’t even know if we can expect it to be at 2000.
 
Reactions: Nothingness

jdubs03

Senior member
Oct 1, 2013
683
307
136
That score I think is for the 80W reference design.
The 23W reference was 124.
And I’ve seen scores around for X1E-84 of 125/126 and 129 (in a Best Buy review) from the Galaxy Book 4 Edge, which seems most representative of real world results as of now.
 

hemedans

Senior member
Jan 31, 2015
223
113
116
Awesome thank you.

Yeah so based on that the X1E-84-100, the score is around 1940 (median of 5 results). The A17 Pro is already at 2130ish on average.

The A18 Pro will probably be around 2400* EDIT. Doesn’t seem possible that the Snapdragon 8 Gen 4 will be anywhere near that. I don’t even know if we can expect it to be at 2000.
4.2 GHz is a massive jump in frequency. With that plus a minor IPC improvement, it's possible for 8 Gen 4 to reach ~3500 in GB 6, which is A18 level.
 

FlameTail

Diamond Member
Dec 15, 2021
3,759
2,206
106
4.2 GHz is a massive jump in frequency. With that plus a minor IPC improvement, it's possible for 8 Gen 4 to reach ~3500 in GB 6, which is A18 level.
The rumour says 4.37 GHz ST boost for 8G4.


X Elite @ 4.3 GHz can hit 3200 in Linux (3000 in Windows). If we assume that carries over to Android (because it uses the Linux kernel), we might be looking at about 3250 points for 8G4 at the rumoured 4.37 GHz.

I also see a possibility that Phoenix-L in 8G4 might have some single digit (<5%) IPC gains compared to Phoenix in X Elite, due to a more robust memory subsystem etc...
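
For anyone checking the arithmetic behind the ~3250 figure, here is a quick sketch. It assumes the score scales linearly with clock speed, which is an optimistic simplification (memory latency does not speed up with the core clock); the function name is just for illustration.

```python
def scale_score_by_clock(score, from_ghz, to_ghz):
    # Assumption: score is directly proportional to frequency.
    return score * to_ghz / from_ghz


if __name__ == "__main__":
    # 3200 is the Linux figure at 4.3 GHz quoted above; 4.37 GHz is the rumour.
    print(round(scale_score_by_clock(3200, 4.3, 4.37)))
```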
 

jdubs03

Senior member
Oct 1, 2013
683
307
136
Tbh I’d be very surprised if a mobile derivative of the X Elite performs as well as their laptop part at its highest wattage.
Keep in mind their GB6 score of ~2965 was for the 80W reference part. Their 23W reference design achieved ~2765.

If I had to guess, it’d be under 3000.
3250 seems too high. That’s almost 10% faster than the 80W reference.
Why wouldn’t they want to use that same performance core in their flagship laptops?
 

FlameTail

Diamond Member
Dec 15, 2021
3,759
2,206
106
Tbh I’d be very surprised if a mobile derivative of the X Elite performs as well as their laptop part at its highest wattage.
Keep in mind their GB6 score of ~2965 was for the 80W reference part. Their 23W reference design achieved ~2765.

If I had to guess, it’d be under 3000.
3250 seems too high. That’s almost 10% faster than the 80W reference.
Why wouldn’t they want to use that same performance core in their flagship laptops?
A single core is definitely not guzzling 80W.

Also that 3200 is for Linux. In Windows it does 3000 (for reasons that I cannot explain -_-)
 

jdubs03

Senior member
Oct 1, 2013
683
307
136
A single core is definitely not guzzling 80W.

Also that 3200 is for Linux. In Windows it does 3000 (for reasons that I cannot explain -_-)
Surely not. But the 23W design is more representative of a mobile SKU.
Using the 2765 baseline from the 23W design, applying the Linux adjustment (3200/3000) gets to 2950. Being generous with another 5% for IPC improvements gives 3100. And another adjustment for 4.37 GHz vs 4.3 gets to 3150.

It’s tough to assume that an 8W phone would score that high. Just due to the form factor. If they can hit 3000 that would be a surprise.
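
The chain of adjustments above can be reproduced with a short sketch. Every factor here is an assumption taken from this thread (the Linux-vs-Windows ratio, a generous 5% IPC gain, linear clock scaling), not a measured result, and the function name is hypothetical.

```python
def estimate_score(baseline, os_factor, ipc_gain, from_ghz, to_ghz):
    # Apply each adjustment in turn and keep the intermediate values.
    after_os = baseline * os_factor                # Windows -> Linux uplift
    after_ipc = after_os * (1 + ipc_gain)          # assumed IPC improvement
    after_clock = after_ipc * to_ghz / from_ghz    # assumed linear clock scaling
    return after_os, after_ipc, after_clock


if __name__ == "__main__":
    steps = estimate_score(2765, 3200 / 3000, 0.05, 4.3, 4.37)
    for label, value in zip(("Linux", "+5% IPC", "4.37 GHz"), steps):
        print(f"{label}: {value:.0f}")
```

None of this accounts for power or thermal limits, which is the real question for a phone.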
 
Reactions: coercitiv