Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
A first rumor about an Intel replacement in Apple products has appeared:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with the new-gen MacBook Air in mid-2020 (MacBook Pro and iMac also under consideration)
  • massive AI accelerator

Source Coreteks:
 
Reactions: vspalanki
Solution
What an understatement. And it looks like it doesn't want to die. Yet.


Yes, the A13 is competitive against Intel chips, but the emulation tax is about 2x. So given that A13 ~= Intel, emulated x86 programs would run at about half the speed of an equivalent x86 machine. This is one of the reasons they haven't switched yet.
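
To make that arithmetic explicit, a minimal sketch; the 2x overhead is the estimate above, not a measured constant:

```python
# Toy model of the claimed emulation tax. Assumption from the post above:
# a native Apple core ~= a native Intel core, and x86 emulation costs ~2x.
def emulated_speed(native_ratio: float, emulation_tax: float) -> float:
    """Speed of emulated x86 code relative to the same code on real x86."""
    return native_ratio / emulation_tax

print(emulated_speed(1.0, 2.0))  # 0.5 -> roughly half the speed of an x86 machine
```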

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
The A78 doesn't have +20% IPC over the A77. What the article says is that the A78 has "20% sustained performance" over the A77, but that figure also includes the process-dependent gains you get going from 7nm to 5nm.

From the figures on Page 4, a 3 GHz A78 (5nm) has +20% performance compared to a 2.6 GHz A77 (7nm), while a 2.1 GHz A78 has the same performance as a 2.3 GHz A77 at half the energy. So the IPC gain is actually somewhere in the 4-9% range.
Yeah, I read it wrong in my rush to get to the X1 part.
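
For anyone who wants to check the quoted 4-9% range, a quick sanity check using perf = IPC x frequency on the two data points above:

```python
# Back out IPC gains from the quoted performance/frequency pairs,
# using perf = IPC * frequency.

# Case 1: a 3.0 GHz A78 (5nm) delivers 1.20x the perf of a 2.6 GHz A77 (7nm).
ipc_gain_peak = 1.20 * (2.6 / 3.0) - 1
print(f"{ipc_gain_peak:.1%}")  # ~4.0%

# Case 2: a 2.1 GHz A78 matches a 2.3 GHz A77 at half the energy.
ipc_gain_iso = (2.3 / 2.1) - 1
print(f"{ipc_gain_iso:.1%}")   # ~9.5%
```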
 
Reactions: spursindonesia

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
The real argument here isn't ARM vs x86 ISA.

It's ARM ecosystem versus x86 ecosystem.

Back 25 years ago, they called it "RISC vs CISC", or "RISC vs x86".

Once you realize that the real differences lie in the ecosystems rather than the ISAs, you can include CPUs such as IBM's Power in the comparison.

Ecosystem differences easily dwarf any technical differences in the ISA. When you have, say, 3-4x more resources and research being put in and 10x more manpower being invested, of course it will do well!
 

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
Yeah, according to their slide on page 4 of the article, in Spec2006 it's 7% IPC improvement over A77 for A78 and 30% for X1.
Yeah, oddly a balanced improvement across integer and FP performance for once in X1.

A shame it will probably be sometime in the next century before we see an SBC implementation of X1 - though I live in hope of being proven wrong here.

The fact that they chose the wife of Zeus (the next Neoverse core) as the codename for X1 seems significant; I think this means N2 will be Hera-based now, given that A76 was codenamed Enyo, the wife of Ares (N1 codename).

If so, that would give a tidy 1.56x IPC boost for N1 -> N2 from the core alone (1.20x for A76 -> A77 stacked with 1.30x for A77 -> X1), albeit at a power penalty to match.

Given we already have at least one codename for next year with Matterhorn, it would be weird to match that with Poseidon for N3 - so I'm putting a guess on a final Greek core codenamed Amphitrite/Thetis, which may be the supersize me variant of Matterhorn, as Hera is to Hercules.

Though I can't quite see from the article whether Austin made both the A78 and X1, or just the A78 and left the X1 turbocharging to another design center.
 
Last edited:
Reactions: Tlh97

MrTeal

Diamond Member
Dec 7, 2003
3,584
1,743
136
Yeah, oddly a balanced improvement across integer and FP performance for once in X1.

A shame it will probably be sometime in the next century before we see an SBC implementation of X1 - though I live in hope of being proven wrong here.

The fact that they chose the wife of Zeus (the next Neoverse core) as the codename for X1 seems significant; I think this means N2 will be Hera-based now, given that A76 was codenamed Enyo, the wife of Ares (N1 codename).

Given we already have at least one codename for next year with Matterhorn, it would be weird to match that with Poseidon for N3 - so I'm putting a guess on a final Greek core codenamed Amphitrite/Thetis, which may be the supersize me variant of Matterhorn, as Hera is to Hercules.

Though I can't quite see from the article whether Austin made both the A78 and X1, or just the A78 and left the X1 turbocharging to another design center.

From the AT article on the first page
The X1 is much alike the A78 in its fundamental design – in fact both CPUs were created by the same Austin CPU design team in tandem, but with the big difference that the X1 breaks the chains on its power and area constraints, focusing to get the very best performance with very little regard to the other two metrics of the PPA triangle.
 

awesomedeluxe

Member
Feb 12, 2020
69
23
41
Server rooms are always thermally constrained at some level. The question is whether or not your application(s) can benefit from adding more cores. If Graviton2 is indeed a 110w SoC or less, I could deploy twice as many sockets for Graviton2 as EPYC 7742 assuming scaling is there. There are also issues like VM response time to consider. Sometimes your application requires higher-frequency cores at the expense of efficiency, and that's why both AMD and Intel provide server CPUs to fill that niche.
Oh, for sure. When I say "that might matter less if you are designing a machine that's thermal constrained such that you can literally fit twice as many ARM cores" what I mean is "that might matter less if you are designing a laptop."

Though I'm not sure how much more Apple's laptops can benefit from "moar cores" either. I definitely see the upside in small machines that can now run 8+4 ARM cores where they previously struggled to accommodate four x86 cores. But it's not clear to me whether a 16" macOS laptop would be able to scale up the A14 effectively, much less an iMac. Sure, they can fit 16 or 32 high-performance cores, but can they use them?
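
For the socket arithmetic in the quote above, a trivial sketch; the 110 W Graviton2 figure is the quoted rumor, the 225 W EPYC 7742 TDP is AMD's spec, and the rack budget is a made-up number for illustration:

```python
# Socket count within a fixed power budget, using the quote's figures.
def sockets_in_budget(budget_w: float, soc_w: float) -> int:
    """How many SoCs fit in a given power envelope."""
    return int(budget_w // soc_w)

RACK_BUDGET_W = 11_000  # hypothetical per-rack power budget, for illustration
print(sockets_in_budget(RACK_BUDGET_W, 110))  # ~100 Graviton2 sockets (rumored <=110 W)
print(sockets_in_budget(RACK_BUDGET_W, 225))  # ~48 EPYC 7742 sockets (225 W TDP)
```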
 

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
Ooof, Samsung's unreleased M6 was going to be an 8-wide uArch, according to new details released just recently at ISCA 2020. Link here.

Ambitious, if foolish - if they couldn't do 6-wide right, then 8-wide would likely have been throwing a lot of money down the toilet.

They have a table on the main Anandtech page showing uArch details from M1 to the unreleased M6:

 
Last edited:
Reactions: kurosaki and Tlh97

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
Near full support of OGL ES 2.0 for the Bifrost uArch/compiler in the Panfrost open GPU driver.

This is a huge win for fully open use of ARM SoCs with Mali Bifrost GPUs, like the Rockchip family (RK3588 wink wink).

Link here.

Obviously there's still quite a lot to do to get from there to OGL ES 3.0 or Vulkan, but it's a decent start and amazing progress for an open effort targeting a decidedly openness-averse GPU vendor.
 
Reactions: Tlh97 and Schmide

Tabalan

Member
Feb 23, 2020
41
25
91
Ooof, Samsung's unreleased M6 was going to be an 8-wide uArch, according to new details released just recently at ISCA 2020. Link here.

Ambitious, if foolish - if they couldn't do 6-wide right, then 8-wide would likely have been throwing a lot of money down the toilet.

They have a table on the main Anandtech page showing uArch details from M1 to the unreleased M6
Well, in same article you have:
During the Q&A of the session, the paper's presenter, Brian Grayson, answered questions about the program's cancellation. He disclosed that the team had always been on target and on schedule with performance and efficiency improvements in each generation. The team's biggest difficulty was having to be extremely careful with future design changes, as it never had the resources to start completely from scratch or completely rewrite a block. With hindsight, the team would have made different choices on some of the design directions. This serial design methodology stands in contrast to Arm's position of having multiple leapfrogging design centres and CPU teams, allowing them to do ground-up redesigns such as the Cortex-A76.

Maybe if Samsung had allocated more resources to SARC, the results would have been better. Also, this graph indicates a bigger jump in performance for the M6 core.

 

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
Maybe if Samsung had allocated more resources to SARC, the results would have been better. Also, this graph indicates a bigger jump in performance for the M6 core.
To be fair, the planning, prep and efficiency of ARM's R&D roadmap were incredibly effective even before SoftBank.

That's what comes from having less to work with: you adapt and get more efficient and less wasteful - as opposed to a certain green GPU company and its brand-name *Works software initiative, which is pathetically sloppy optimisation-wise for software written by a company with such huge resources.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
So it's here, finally. Apple has introduced the transition to ARM:


"The iPad core is faster than the vast majority of PC laptops"



  • "Will give a whole new level of performance"
  • Apple is making a series of SoCs specifically for the Mac which will have a common architecture with the other Apple SoCs.
  • Apple has introduced a new binary format, Universal 2, which includes both x86 and Arm binaries (fat binary's revenge; see the sketch after this list)
  • The Apple Development Platform is using the A12Z
  • Basically confirming that Apple's most critical application developers are on-board with the ISA change
  • Now showing an Arm build (native!) of Microsoft Word and Excel, Photoshop, Final Cut Pro
  • Rosetta emulation with JIT


  • First system by end of year
  • Two-year transition timeline
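
Since a universal ("fat") binary is just several single-architecture Mach-O slices behind a small header, here is a minimal Python sketch that lists the slices in one, assuming the standard Mach-O fat-header layout; it reports roughly what `lipo -archs` does:

```python
import struct
import sys

# Constants from <mach-o/fat.h> and <mach/machine.h>:
FAT_MAGIC = 0xCAFEBABE  # big-endian fat header magic (a rare 64-bit variant also exists)
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}

def list_slices(path: str) -> list[str]:
    """Return the architectures inside a universal binary (empty if thin)."""
    with open(path, "rb") as f:
        magic, nfat_arch = struct.unpack(">II", f.read(8))
        if magic != FAT_MAGIC:
            return []  # thin (single-architecture) Mach-O, or not Mach-O at all
        archs = []
        for _ in range(nfat_arch):
            # Each fat_arch entry: cputype, cpusubtype, offset, size, align.
            cputype, _sub, _off, _size, _align = struct.unpack(">iiIII", f.read(20))
            archs.append(CPU_NAMES.get(cputype, hex(cputype)))
        return archs

if __name__ == "__main__":
    print(list_slices(sys.argv[1]))  # e.g. ['x86_64', 'arm64'] for a Universal 2 app
```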
 

Doug S

Platinum Member
Feb 8, 2020
2,459
4,014
136
So it's only the dev platform available in Q4 2020?

No, the dev platform is available now or very soon to developers. The real machines, based on an A14 design and a 5nm process, will be available before the end of the year.

I look forward to Anandtech doing a big benchmark article comparing those Macs with Intel and putting to rest all the arguments in these forums about GB5 and SPEC2006 somehow being rigged in Apple's favor.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
My comments:
  • The Universal 2 fat binary including both x86 and ARM code isn't new. Apple already did the same with the PPC -> x86 transition.
  • The transition to ARM is a very conservative and late move from Apple. From a technical point of view, they could have moved to ARM as far back as the A11 Monsoon (6xALU core) three years ago.
  • Apple's ARM transition was driven mainly by ARM's big push with its Cortex cores. Look back to 2016, when SoftBank bought ARM Holdings: that was the beginning of Cortex-X1 (AKA Hera) development, and maybe also of Matterhorn with the ARMv9 and SVE2 architecture. Those are huge changes and leapfrog performance for every Snapdragon and Chinese MediaTek/HiSilicon manufacturer. Apple had to move to ARM because staying with x86 would have been a disaster from a performance point of view, and very embarrassing, since Apple has the best ARM cores of its own.
  • What surprises me a lot, though, is MS Word and Excel as native ARM macOS binaries, while MS's own Windows on ARM uses emulation for the Surface Pro X.
  • All the other native ARM software from Adobe etc. shows that we can expect a lot of software for Win10-on-ARM as well. Normally it would take 5 years to adopt the ARM ISA, but with Apple it will take 1 or 2 years. This is huge.
  • ARM can take the majority of servers and PCs within 2-5 years. And this is not a joke but very hard reality. Just look at a PPC/IPC table to see who the best performers are right now.
 

Doug S

Platinum Member
Feb 8, 2020
2,459
4,014
136
Seconded. I would even like to see AT do some benchmarks using the dev machine, if possible. No more Geekbench, no more SPEC. Show us some actual application performance.

The dev machine license prohibits benchmarking, which is pretty common for prerelease hardware. Apple doesn't want people getting the wrong idea of how the machines you can actually buy will perform. We'll just have to wait a bit longer.
 

LightningZ71

Golden Member
Mar 10, 2017
1,652
1,938
136
I suspect the dev machine is quite I/O hobbled. While I have no factual evidence to back this up, it would strike me as odd for Apple to have built into the A12Z the kind of storage and RAM throughput that a desktop system would normally have, so it is likely limited in several metrics compared to the production systems shipping later this year. I suspect we might see low-end laptops ship with dual-channel LPDDR4X (about half the width of an equivalent dual-channel DDR4 volume laptop from AMD or Intel, as LPDDR4X has half the bits per channel) and perhaps only two PCIe lanes for the NVMe SSD. Higher-end MacBook Pro systems might come with a higher-end SoC that has twice the memory channels and perhaps 16 external PCIe lanes: 8 for a dGPU and 2 x 4 for NVMe SSDs.

The dev platform, being based on the A12Z, is likely memory-bandwidth and storage-I/O constrained in a way that a production system won't be, meaning it may not be a good measuring stick for the achievable performance of shipping systems while still giving usable performance for dev work.
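
To put rough numbers on the memory and SSD points above, a back-of-the-envelope sketch; the channel widths and the PCIe 3.0 per-lane figure are my illustrative assumptions, not anything Apple has announced:

```python
# Peak bandwidth in GB/s: channels * channel width * transfer rate / 8 bits.
# Assumed widths: DDR4 channel = 64 bits, LPDDR4X channel = 32 bits.
def mem_bandwidth_gbs(channels: int, width_bits: int, mt_per_s: int) -> float:
    return channels * width_bits * mt_per_s / 8 / 1000

print(mem_bandwidth_gbs(2, 32, 4266))  # dual-channel LPDDR4X-4266: ~34.1 GB/s
print(mem_bandwidth_gbs(2, 64, 3200))  # dual-channel DDR4-3200:    ~51.2 GB/s

# PCIe 3.0 moves ~0.985 GB/s per lane per direction after 128b/130b encoding.
PCIE3_LANE_GBS = 0.985
print(2 * PCIE3_LANE_GBS)  # x2 NVMe link: ~2.0 GB/s
print(4 * PCIE3_LANE_GBS)  # x4 NVMe link: ~3.9 GB/s
```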
 
Reactions: scannall

Doug S

Platinum Member
Feb 8, 2020
2,459
4,014
136
Making a dev machine that's slower than the ones you'll sell to consumers is a smart strategy (though perhaps not deliberate as such).

If something runs well on the A12Z dev platform, you know it will run better on the much faster A14-based retail machines. If those have faster memory bandwidth and I/O than the dev machines, all the better. We'd have better software all around if developers used (or at least did most of their testing on) lower-end hardware.
 

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
What surprises me a lot, though, is MS Word and Excel as native ARM macOS binaries, while MS's own Windows on ARM uses emulation for the Surface Pro X.
You assume that they are using identical UI code on Windows and Mac, which they likely are not, given the difference in approach you just described.

Microsoft are far more touchy about retaining backwards compatibility on Windows because of enterprise customers, who are themselves extremely gun-shy about changes for good reasons of their own, so I imagine that until MS have the Windows x86 Office codebase running 1:1 with a native ARM codebase, they will not transition to it.

A similar problem existed for the Arnold path tracing renderer during development of the new OptiX GPU backend.

They did not announce it as 'production ready' until the output from the GPU and CPU render backends was 1:1 identical, or at least visually so - not every project can be rendered entirely on one system, and million-dollar projects should not have mismatched render output.

Is it possible that the 'native' version on Mac is in fact just the iOS/iPadOS version of Office?
 

soresu

Platinum Member
Dec 19, 2014
2,921
2,142
136
All the other native ARM software from Adobe etc. shows that we can expect a lot of software for Win10-on-ARM as well. Normally it would take 5 years to adopt the ARM ISA, but with Apple it will take 1 or 2 years. This is huge.
This depends on APIs and the IDE as much as on CPU ISA recompilation and any necessary assembly rewrites.

Moving from win32 to UWP is still not a trivial change even within a purely x86 app - a fact demonstrated by the length of time it took to get a port of the Kodi media center on the XB1 store.

What interests me more is the possibility of native game engine releases on ARM platforms, meaning games would no longer require emulation to run - which currently covers almost the entire Windows game catalog.

The rise of DXVK shows that it is far faster computationally to translate API calls from Direct3D to Vulkan than it is to dynamically recompile x86 to ARM code.
 
Last edited:

Doug S

Platinum Member
Feb 8, 2020
2,459
4,014
136
Bah! So much for proper benching of A12x/A12z.

Who cares? That's a nearly two-year-old core, and we already have performance information on it from Anandtech's A12X article. I'm willing to wait another six months or so to get information on a brand-new 5nm core in hardware designed around it, rather than stale data on a two-year-old core on a hacked iPad board shoehorned into a Mac Mini case.
 