Future ARM Cortex + Neoverse µArchs Discussion

AnandTech forums, page 5
Jul 27, 2020
17,967
11,709
116
Nvidia is reportedly working with MediaTek on a PC chip based on MediaTek’s existing Chromebook chip, Kompanio, which, according to media reports, will use TSMC’s CoWoS 2.5D packaging to package the CPU with the GPU.
nGreedia is preparing to get kicked in the nuts and out of future x86 laptop designs, as both Intel and AMD increasingly offer more powerful iGPUs and their own dGPUs to laptop OEMs.
 
Jul 27, 2020
17,967
11,709
116
Yeah. At this rate, more and more consumers have less need for a dGPU.
It's also much easier to optimize for the iGPU, since it will always have a higher market share and developers can satisfy a larger subset of the market with an acceptable baseline level of performance. They are beginning to give up on dGPU optimization, forcing users to lower graphics settings or use DLSS/FSR/XeSS to get 60 fps or more.
 

poke01

Golden Member
Mar 8, 2022
1,413
1,621
106
Cortex-X4 has an even wider decode stage than A17 (10 vs 9). A17 has a 670-entry ROB.
Cortex-X4 has an 18% Int increase over X3 but at a 39% power increase. FP is +7% for a +24% power increase!!


The A17 P core has a 28% Int lead over the X4 while using just 2% more power.
The A17 P core has a 50% FP lead over the X4 while using 5% more power.
The X4 doesn’t match the A15 in either Int or FP performance.

Source: Geekerwan, but take it with a grain of salt unless others get similar results.
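The efficiency implication of those percentages can be checked with a little arithmetic. This is a minimal sketch that assumes the quoted power deltas are average power during each run, measured at the same operating point as the score, so perf/W is just the performance ratio divided by the power ratio. The inputs are Geekerwan's figures as quoted above, not new measurements.

```python
def perf_per_watt_ratio(perf_gain: float, power_gain: float) -> float:
    """Relative perf/W of core B vs core A.

    perf_gain and power_gain are B's fractional changes over A
    (e.g. 0.18 for +18%). Assumes both figures come from the same
    operating point.
    """
    return (1 + perf_gain) / (1 + power_gain)


# (perf gain, power gain) as quoted from Geekerwan's results
comparisons = {
    "X4 vs X3 (Int)": (0.18, 0.39),
    "X4 vs X3 (FP)": (0.07, 0.24),
    "A17 P vs X4 (Int)": (0.28, 0.02),
    "A17 P vs X4 (FP)": (0.50, 0.05),
}

for name, (perf, power) in comparisons.items():
    ratio = perf_per_watt_ratio(perf, power)
    print(f"{name}: perf/W x{ratio:.3f} ({(ratio - 1) * 100:+.1f}%)")
```

On these numbers the X4 lands roughly 14-15% behind the X3 in perf/W, while the A17 P core is roughly 25% (Int) to 43% (FP) ahead of the X4.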
 

ikjadoon

Member
Sep 4, 2006
150
258
146
Cortex-X4 has an even wider decode stage than A17 (10 vs 9) and a bigger ROB than A17 (384 vs 321). I think this was already known.
Cortex-X4 has an 18% Int increase over X3 but at a 39% power increase. FP is +7% for a +24% power increase!!


The A17 P core has a 28% Int lead over the X4 while using just 2% more power.
The A17 P core has a 50% FP lead over the X4 while using 5% more power.
The X4 doesn’t match the A15 in either Int or FP performance.

Source: Geekerwan, but take it with a grain of salt unless others get similar results.

Is X4 a 10-wide decode or an 8-wide decode + 10-wide dispatch?

HWCooling:
But ARM is also betting on core widening in other pipeline stages, especially in the frontend. The processor has eight parallel instruction decoders, which is also a record number for ARM’s cores (Apple also has eight decoders). These decoders can deliver eight instructions per cycle to the following processing stages. Dispatch then supports up to 10 micro-ops per cycle (not all instructions are decoded to one micro-op).

HWCooling lists the Cortex-X4 as 8-wide instruction decode → 10-wide dispatch per cycle (the 8 vs 10 discrepancy being that some instructions decode into more than one micro-op). Arm's slides aren't specific on the number of decoders, but journalists were able to ask questions at the briefing, so I lean towards HWCooling here.
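To see how 8-wide decode can still feed 10-wide dispatch, here is a toy front-end model. It is hypothetical and deliberately simplified (no fetch bubbles, branch redirects or back-end stalls, and an unbounded queue between decode and dispatch), so it only illustrates the arithmetic, not the X4's actual pipeline. The function name and the 8/10 defaults are assumptions taken from the HWCooling description.

```python
from collections import deque


def simulate_frontend(uops_per_instr, decode_width=8, dispatch_width=10):
    """Hypothetical toy front-end, not Arm's real pipeline.

    Each cycle: decode up to `decode_width` instructions, adding their
    micro-ops to a queue, then dispatch up to `dispatch_width` micro-ops.
    Returns (cycles, total_uops).
    """
    to_decode = deque(uops_per_instr)
    queued_uops = 0
    cycles = 0
    while to_decode or queued_uops:
        cycles += 1
        # Decode stage: limited by instruction count, not micro-op count
        for _ in range(min(decode_width, len(to_decode))):
            queued_uops += to_decode.popleft()
        # Dispatch stage: limited by micro-op count
        queued_uops -= min(queued_uops, dispatch_width)
    return cycles, sum(uops_per_instr)


streams = [
    ("all single-uop", [1] * 800),
    ("every 4th cracked into 2 uops", [2, 1, 1, 1] * 200),
]
for label, stream in streams:
    cycles, uops = simulate_frontend(stream)
    print(f"{label}: {uops / cycles:.1f} uops/cycle")
```

With only single-uop instructions the model is decode-bound at 8 uops/cycle; when a quarter of instructions crack into two micro-ops, 8 decoded instructions produce 10 micro-ops and dispatch is saturated.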

That said, HWCooling (and others) reported that the X4's micro-op cache was removed entirely, which is only partially true. Gary Explains asked Arm, and Arm said the dedicated micro-op cache was removed; instead, the instruction cache now holds both regular instructions and micro-ops (with identical dispatch and pipeline length for each).
 

FlameTail

Diamond Member
Dec 15, 2021
3,182
1,810
106
Cortex-X4 has an even wider decode stage than A17 (10 vs 9) and a bigger ROB than A17 (384 vs 321). I think this was already known.
Cortex-X4 has an 18% Int increase over X3 but at a 39% power increase. FP is +7% for a +24% power increase!!


The A17 P core has a 28% Int lead over the X4 while using just 2% more power.
The A17 P core has a 50% FP lead over the X4 while using 5% more power.
The X4 doesn’t match the A15 in either Int or FP performance.

Source: Geekerwan, but take it with a grain of salt unless others get similar results.
A17 P's ROB is 321? Source?

If that's true, it means Apple has shrunk the ROB.

The A15 P core had a 570-entry ROB, IIRC.
 

FlameTail

Diamond Member
Dec 15, 2021
3,182
1,810
106
Do you guys think the Meteor Lake NPU will be stronger than the Hexagon NPU in the X Elite?
 

SpudLobby

Senior member
May 18, 2022
961
655
106
Considering that the Apple A17 has TWO P cores running at 3.78 GHz and 4 E cores running at 2.11 GHz

And this SoC has 1 prime core running at 3.3 GHz and 2 cores running at 3.15 GHz

Not impressive.
Did you leave off the 3 other cores running at 2.97 GHz?
 

SpudLobby

Senior member
May 18, 2022
961
655
106
@ikjadoon it seems that although ARM has closed the IPC gap with Apple's P cores and Oryon, they still have a long way to go on energy efficiency.
This was always the predictable result. IPC (well, perf/GHz) is of course a directional indicator: it probably means you can lower frequencies and get similar performance on a lot of code for less power, since dynamic power scales with the square of voltage and voltage rises with frequency. But how you get there can still differ in ways that mean the X4 might still be less efficient than competitors of a similar class.
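The voltage/frequency argument can be made concrete with the standard dynamic-power relation P ≈ C·V²·f (activity factor folded into C). This sketch is a simplification: it assumes voltage scales linearly with frequency over the range of interest and ignores leakage, and the 10% IPC gain and 30% extra switched capacitance are hypothetical numbers, not measurements of any core.

```python
def relative_dynamic_power(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """Ratio of dynamic power, core B / core A, from P = C * V^2 * f."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


def iso_performance_power(ipc_ratio: float, cap_ratio: float) -> float:
    """Power of core B relative to core A when B clocks just high enough to match A.

    Performance = IPC * f, so B needs f_b / f_a = 1 / ipc_ratio.
    Simplifying assumption (hypothetical): V scales linearly with f here.
    """
    freq_ratio = 1.0 / ipc_ratio
    volt_ratio = freq_ratio
    return relative_dynamic_power(cap_ratio, volt_ratio, freq_ratio)


# Hypothetical: B has 10% higher IPC, with equal or 30% higher switched capacitance
print(f"same capacitance: {iso_performance_power(1.10, 1.0):.3f}")
print(f"30% more capacitance: {iso_performance_power(1.10, 1.3):.3f}")
```

With equal capacitance the higher-IPC core needs about 75% of the power at iso-performance; if the wider design switches 30% more capacitance, that rises to roughly 98% and most of the advantage is gone, which is why the route to the IPC gain matters.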

One thing in that vein is the cache.

Apple have 2.5x the X4’s total L1, and their L2 is 16MB shared between two cores; even if split evenly, that’s 8MB per core (and one core can access more than that).

They’ve also had a 16MB SLC since the A13; the A15’s was 32MB, and the A16 and A17 Pro are down to 24MB. The CPU can access the SLC, which makes trips out to DRAM less frequent.
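Why a large last-level cache helps can be sketched with the usual AMAT-style recursion, where each level costs its hit cost plus its local miss rate times the cost of going further out. All hit costs and miss rates below are hypothetical placeholders (the "cost" could stand for latency or energy per access); only the structure of the calculation is the point.

```python
from math import prod


def average_access_cost(levels, memory_cost):
    """AMAT-style recursion: cost = h1 + m1 * (h2 + m2 * (... + memory_cost)).

    `levels` lists (hit_cost, local_miss_rate) pairs from L1 outward.
    """
    cost = memory_cost
    for hit_cost, miss_rate in reversed(levels):
        cost = hit_cost + miss_rate * cost
    return cost


def dram_access_fraction(levels):
    """Fraction of accesses that miss every cache level and go to DRAM."""
    return prod(miss_rate for _, miss_rate in levels)


# Hypothetical numbers: L1 and L2, optionally followed by an SLC in front of DRAM
no_slc = [(1, 0.05), (10, 0.30)]
with_slc = no_slc + [(30, 0.50)]
DRAM_COST = 100

for label, levels in [("no SLC", no_slc), ("with SLC", with_slc)]:
    print(
        f"{label}: avg cost {average_access_cost(levels, DRAM_COST):.2f}, "
        f"DRAM fraction {dram_access_fraction(levels):.4f}"
    )
```

Under these made-up numbers the SLC halves the share of accesses that reach DRAM, and DRAM accesses are far more expensive in energy than on-chip cache hits.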

QC & Arm made a step forward with the L2 & L3 on the 8 Gen 3, but they’re going to need a bigger punch to catch Apple.

This is also a good demonstration of why you shouldn’t expect Zen 5 mobile to match Qualcomm or Apple at very low power (in ST, when you actually measure dynamic power draw at the motherboard like these videos do). Even Arm can’t do that, and they specialize in this.
 

SpudLobby

Senior member
May 18, 2022
961
655
106
Arm's earlier ~7% GB6 uplift seems to have been measured with GB6.0. The latest testing uses GB6.1+, and it shows Arm had a major IPC uplift this year: +10%. That's a stellar cadence.

Percentages are normalized to the X3.
  1. Apple A17 Pro - 774 pts / GHz (3.78 GHz / 2,926 pts) - 121.1%
  2. Qualcomm SDXE / Oryon - 750 pts / GHz (4.30 GHz / 3,227 pts) - 117.4%
  3. Qualcomm SD8G3 / Arm Cortex-X4 - 706 pts / GHz (3.30 GHz / 2,329 pts) - 110.5%
  4. Qualcomm SD8G2 / Arm Cortex-X3 - 639 pts / GHz (3.36 GHz / 2,146 pts) - 100.0%
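The pts/GHz figures and the X3-normalized percentages above can be reproduced from the listed scores and clocks. This sketch assumes each chip held the listed clock for the whole single-thread run, which is what makes pts/GHz a rough IPC proxy; because it normalizes unrounded values, a couple of percentages may differ from the list in the last digit.

```python
# GB6 single-thread scores and clocks as listed in the post: name -> (GHz, points)
chips = {
    "Apple A17 Pro": (3.78, 2926),
    "Qualcomm SDXE / Oryon": (4.30, 3227),
    "Qualcomm SD8G3 / Cortex-X4": (3.30, 2329),
    "Qualcomm SD8G2 / Cortex-X3": (3.36, 2146),
}


def pts_per_ghz(ghz: float, score: int) -> float:
    """Score per GHz, a rough perf/clock proxy (assumes the clock was sustained)."""
    return score / ghz


baseline = pts_per_ghz(*chips["Qualcomm SD8G2 / Cortex-X3"])
for name, (ghz, score) in chips.items():
    ppg = pts_per_ghz(ghz, score)
    print(f"{name}: {ppg:.0f} pts/GHz, {ppg / baseline:.1%} of X3")

x4 = pts_per_ghz(*chips["Qualcomm SD8G3 / Cortex-X4"])
oryon = pts_per_ghz(*chips["Qualcomm SDXE / Oryon"])
print(f"X4 vs Oryon: {x4 / oryon:.1%}")
```

The last line gives the roughly 94% X4-to-Oryon ratio mentioned below.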
Arm's Cortex-X4, in GB6 IPC, is really closing the gap with Apple's P-cores, which on average have achieved very low IPC uplifts since 2020.

As we talked about in the Qualcomm Oryon threads, Oryon was delayed so long, and Arm executed so consistently, that the Cortex-X4 (in devices launching 7+ months before Oryon devices) achieves 94% of Oryon's IPC. We've been waiting 3+ years for NUVIA's Phoenix uArch, and it now turns out that Arm's IPC was right behind it all along.

That is incredible.
Yes, indeed, but remember something.

IPC is only a means to an end. What we care about is efficiency and performance, specifically performance at lower power (energy efficiency) while maintaining great peak performance.

As these results show, Arm still uses ~25% more power on a similar process node and still loses to an A16 or A15 by 8-14%. If they push clocks higher from where they already are, it could get worse. Alternatively, I think if you clocked the A16 down a bit to match the X4, it’d win on power efficiency given where it sits on the curve.

Still, it’s a great core given the area constraints with respect to cache and so on.


I also want to note this isn’t insanely surprising: X1 to X3 was already roughly a 20-25% IPC boost, so people just weren’t paying attention. The 8 Gen 2 scores around 1400-1500 in GB5 even in the 3.2GHz models (i.e. non-Samsung), and the power use at that performance is great. It’s been a slow but steady climb that motivated morons keep denying.

Hopefully we’ll see X4/5’s in laptops soon.
 

soresu

Platinum Member
Dec 19, 2014
2,968
2,192
136
Hopefully we’ll see X4/5’s in laptops soon.
I wouldn't expect MediaTek or Samsung to miss out on the feeding frenzy once WoA becomes a free-for-all; even Rockchip might take a bite at some point.

Though I think Nvidia is likely to be the first at that particular trough from the non-QC crowd.

I just hope that they have something like Arm SystemReady to keep it all standardised; the age of non-standard ARM SoCs really needs to die now.
 

SpudLobby

Senior member
May 18, 2022
961
655
106
I wouldn't expect MediaTek or Samsung to miss out on the feeding frenzy once WoA becomes a free-for-all; even Rockchip might take a bite at some point.

Though I think Nvidia is likely to be the first at that particular trough from the non-QC crowd.

I just hope that they have something like Arm SystemReady to keep it all standardised; the age of non-standard ARM SoCs really needs to die now.
I'm mostly talking about Nvidia, yeah. Though I could see MediaTek in 2026/27.
 

FlameTail

Diamond Member
Dec 15, 2021
3,182
1,810
106
So now that so many players are jumping on the ARM PC bandwagon in a new ARMs race for CPUs, who do you think will win?

Or to phrase it more appropriately, who do you think will be most successful?
 

FlameTail

Diamond Member
Dec 15, 2021
3,182
1,810
106
Nvidia's strength is their excellent GPU IP, but other than that, do they have anything going for them?

Qualcomm's strength is their experience making ARM SoCs, as well as being early to the WoA game. Their extensive IP portfolio (the Oryon CPU, Adreno GPU, Hexagon NPU, Spectra ISP, Sensing Hub and 5G modems) combines into a very strong solution.

Samsung and MediaTek, if they enter this game, will probably make SoCs with stock ARM cores. MediaTek reportedly will license Nvidia GPU IP, and Samsung already uses AMD's RDNA GPU IP in its Exynos mobile SoCs.

Which puts Nvidia and AMD in a weird position if they are also entering the space themselves (!?).
 

FlameTail

Diamond Member
Dec 15, 2021
3,182
1,810
106
Samsung and MediaTek, if they enter this game, will probably make SoCs with stock ARM cores.
I say probably because the RGCloudS guy on Twitter has been saying Samsung is cooking up some exotic ARM cores. Apparently they are going to use specially modified Cortex-X5s in their 2025 Exynos 'Dream Chip'.
 