Could someone help explain the A12 Vortex core's execution-unit imbalance, please?
We know that roughly 50% of instructions are loads/stores, and of those most are loads. Given that, the Vortex core appears to have a significant imbalance between its 6 ALUs and 2 LSUs.
The second thing is performance. The two additional ALUs are the simple/branch shared type, which could theoretically bring approximately +20-30% IPC. However, Vortex delivers +58% IPC over Skylake, roughly 3x that expectation. The combination of imbalance and high performance is a mystery. There must be something smart inside.
Did Apple engineers develop some new advanced technique in the reorder buffer? Something like the load ROB predictor on Conroe? Or are they using such a large instruction window that they can extract very high ILP and absorb these costly load/store instructions at the same time?
To answer your questions somewhat:
(a) your load/store characterization is not quite correct. Obviously it depends on the workload, compiler, and ISA, but 25% loads and 10% stores are better approximations.
(ARMv8 can require fewer load/stores because of pairing, but then the rich ISA [things like short shifts, and the fancy MOVs and CSELs that can modify data in easy ways] means there are also fewer logic instructions. So overall you get about the same numbers, maybe 24% loads and 9% stores for ARMv8 vs the x86-64 numbers above. See eg
https://arxiv.org/pdf/1607.02318.pdf )
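(If you want to sanity-check those ratios yourself, here's a minimal sketch of my own, not from the paper: pipe disassembly text through a crude mnemonic classifier. The load/store mnemonic lists are small illustrative ARMv8 subsets, nothing exhaustive.

/* Crude instruction-mix counter: objdump -d ./a.out | ./mixcount
   Mnemonic lists are illustrative ARMv8 subsets, not exhaustive. */
#include <stdio.h>
#include <string.h>

static int is_in(const char *m, const char *const *list) {
    for (; *list; list++)
        if (strcmp(m, *list) == 0) return 1;
    return 0;
}

int main(void) {
    static const char *const loads[]  = { "ldr", "ldrb", "ldrh", "ldp", "ldur", NULL };
    static const char *const stores[] = { "str", "strb", "strh", "stp", "stur", NULL };
    long n = 0, nload = 0, nstore = 0;
    char line[512];

    while (fgets(line, sizeof line, stdin)) {
        /* objdump lines look like: " 4005c4:<tab>52800000 <tab>mov<tab>w0, #0x0"
           so the mnemonic is the token after the second tab; lines without
           two tabs (labels, section headers) are skipped.                    */
        char *tab = strchr(line, '\t');
        if (!tab) continue;
        tab = strchr(tab + 1, '\t');
        if (!tab) continue;
        char mnem[16] = {0};
        if (sscanf(tab + 1, "%15s", mnem) != 1) continue;
        n++;
        if (is_in(mnem, loads))       nload++;
        else if (is_in(mnem, stores)) nstore++;
    }
    if (n)
        printf("loads %.1f%%  stores %.1f%%  (of %ld instructions)\n",
               100.0 * nload / n, 100.0 * nstore / n, n);
    return 0;
}

Crude, but good enough to ballpark the mix and see it is nowhere near 50% memory ops.)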
(b) so the balance is reasonably OK, ie 1/3. The next thing you have to remember is that numbers like the above give global averages, but performance happens over a window of a few hundred instructions. What matters as much as how many units of various types you have is how much flexibility you have (in queue depth and reordering capabilities) to cope with temporary deviations from these averages. You may have a long stretch (think copying a large data structure) that's mainly load/stores. Or a long stretch that's primarily ALU or FP instructions. If you're dominated by load/stores for a run longer than the OoO window, obviously you're throttled by the number of LS units; likewise if you're dominated by ALU work then you're limited by the number of ALUs.
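To make that concrete, here's a toy issue-bandwidth model I put together (my own construction, nothing Apple-specific): no dependencies or latencies, just structural hazards plus a finite window, fed a bursty trace. The unit counts, window size, and burst lengths are all illustrative knobs.

/* Toy issue model: each cycle, issue as many instructions as unit
   counts allow from the WINDOW oldest unfinished instructions.
   No dependencies or latencies are modeled; instructions complete
   the cycle they issue.  Purely illustrative numbers throughout.  */
#include <stdio.h>

#define N      4096   /* instructions in the trace                   */
#define WINDOW  128   /* how far past the oldest unfinished insn
                         the scheduler can look                       */

enum { ALU, LS };

int main(void) {
    static int cls[N], done[N];

    /* Bursty trace: 25% load/store overall, but packed into runs of
       300 consecutive LS ops (think: copying a large structure)
       followed by 900 ALU ops.                                       */
    for (int i = 0; i < N; i++)
        cls[i] = (i % 1200 < 300) ? LS : ALU;

    const int n_alu = 6, n_lsu = 2;
    long cycles = 0;
    int head = 0;                      /* oldest unfinished insn      */

    while (head < N) {
        int alu_left = n_alu, ls_left = n_lsu;
        int limit = (head + WINDOW < N) ? head + WINDOW : N;
        for (int i = head; i < limit; i++) {
            if (done[i]) continue;
            if (cls[i] == ALU && alu_left)    { done[i] = 1; alu_left--; }
            else if (cls[i] == LS && ls_left) { done[i] = 1; ls_left--;  }
        }
        while (head < N && done[head]) head++;
        cycles++;
    }
    printf("IPC = %.2f\n", (double)N / cycles);
    return 0;
}

With the burst (300 LS ops) longer than the window (128), the model crawls at 2 IPC through most of each copy phase because no ALU work is visible to the scheduler. Shrink the burst below the window size, or grow the window, and the ALU work hiding behind the copy becomes visible and overlaps with it, so IPC jumps. That's the flexibility point in (b).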
(c) so what to do? Yes, in an ideal world you provide 50 LS units and 100 ALU units and you're never throttled by anything. In a non-ideal world, you have to make tradeoffs.
LS units are ferociously complicated. ALU units are basically simple.
SO
it makes sense to provide more ALU units than the naive averages suggest...
Yes, sure, much of the time you won't use those extra ALUs (certainly not when copying data, or in "balanced" loops that read/write data and perform a fairly trivial manipulation).
BUT there will be some loops that are dominated by ALU operations, loops where every value you read in gets manipulated a lot before the next value is read. And for loops like THAT, the extra ALUs will kick in and speed things up.
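To illustrate with a pair of hypothetical C loops (function names and constants are mine, purely for illustration): in the first, the two LS units saturate and the extra ALUs sit idle; in the second, every loaded value feeds a chain of cheap integer ops, and the extra ALUs are exactly what keeps things moving.

#include <stddef.h>
#include <stdint.h>

/* "Balanced" loop: one load + one store per element and almost no
   ALU work, so throughput is set by the load/store units.          */
void copy_scale(uint32_t *dst, const uint32_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2;
}

/* ALU-rich loop: one load per element, then a pile of shifts, xors
   and multiplies (a mixing-hash-style chain) before the next value
   is consumed.  Here extra ALUs are what speed things up.          */
uint32_t hash_sum(const uint32_t *src, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = src[i];
        x ^= x >> 16;  x *= 0x7feb352dU;
        x ^= x >> 15;  x *= 0x846ca68bU;
        x ^= x >> 16;
        acc += x;
    }
    return acc;
}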
CPU design is always a balancing act. The things you have to balance include the fact that code comes in phases. Looking at windows of 1000 instructions or so, there will be phases that are LS rich, ALU rich, FPU rich, branch rich... Given that, you want extra backup capability along every dimension! This isn't practical, but it IS practical to add backup capability along the easiest dimensions.
ALU is easiest, FPU second easiest, branch seems trivial but to be useful needs a lot of extra support in the fetch front-end, so that's probably third in line to get backup, and LS is definitely the hardest to grow.