Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

Here is the first rumor about an Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with the new-gen MacBook Air in mid 2020 (MacBook Pro and iMac also being considered)
massive AI accelerator

Source Coreteks:

RetroZombie · Mar 30, 2020

Richie Rich said:
It's actually 60%, and the prediction is for it to grow. I'm sorry I put only 55% - this might let you think x86 has a chance, but it doesn't

For those playing games like Hearts and Spades, sure.

With games more demanding than that, you hit the PC wall, where a device must meet requirements like these to run the game:

- OS version

- RAM amount

- CPU # cores/performance

- Screen resolution

- Power of the GPU

- Specs of the GPU

- Storage space

- ....

A much worse scenario than what already exists in the PC space.

Doug S · Mar 30, 2020

Carfax83 said:
My argument has always been that while the A13 could be scaled up, it would need to undergo significant changes to the cache hierarchy and the microarchitecture itself, which would result in reducing the very high single threaded IPC you've been raving about this entire thread.

You keep saying this, but offer ZERO support for such a fantastic claim. What changes did Intel make to the "microarchitecture itself" for the server line vs. the consumer line? There are NONE, unless you count the addition of wider AVX instructions, which have zero impact on overall IPC.

NostaSeronx · Mar 30, 2020

Doug S said:
You keep saying this, but offer ZERO support for such a fantastic claim. What changes did Intel make to the "microarchitecture itself" for the server line vs. the consumer line? There are NONE, unless you count the addition of wider AVX instructions, which have zero impact on overall IPC.

SkylakeX has 2x 512-bit FMA support, Skylake has 2x 256-bit FMA support
GoldmontX has 4 MB L2 per four cores, Goldmont normal is 2 MB per four cores and Goldmont plus is 4 MB per four cores and architecture overhaul.
TremontX and Tremont also have similar configs, 2x Front-end to 1x Front-end, multi-clustered execution or memory configs, yadda yadda.

Willowcove is supposed to be redesigned in a couple of configs as well.
Mobile WLC for Field-class processors (Intel calls this one the Apple-killer)
Normal WLC for Lake-class processors (Intel employees who have left or are leaving consider this to be the best core in existence)
Server WLC for Rapids-class processors (Ha ha no one is more server than me - that Intel employee)

Sunnycove was also supposed to have split implementations: Sunnycove at the highest-density track height and SunnycoveX at the lowest-density track height. SunnycoveX was supposed to be the first 10nm 5 GHz processor.

Apple being the leadingest edge of ARM can do multiple configs of their cores. One for mobile, one for normal, one for HEDT with ease.

Where would Apple get high-end experience from?

That's right, from their pal Intel. They yoinked the modems, might as well yoink the core team too.

Richie Rich · Mar 30, 2020

DrMrLordX said:
It appears that Amazon has failed to do just that. Care to explain why?

Amazon failed to scale? Graviton2 offers the same performance at half the price (64 vCPU instances are 40% cheaper, 16 vCPU instances 53% cheaper!). This is an x86 chainsaw massacre and a huge win for Amazon's customers (delivered by an old and outdated A76). Imagine a new A77 in there instead....
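To put those discounts in performance-per-dollar terms, here is a minimal sketch. It assumes performance really is equal at each instance size, as the post claims; the function name `perf_per_dollar_gain` is made up for illustration.

```python
def perf_per_dollar_gain(price_discount: float, relative_perf: float = 1.0) -> float:
    """Return perf-per-dollar relative to the baseline instance.

    price_discount: fraction cheaper than the baseline (0.40 means 40% cheaper).
    relative_perf:  performance relative to the baseline (1.0 means equal).
    """
    if not 0.0 <= price_discount < 1.0:
        raise ValueError("price_discount must be in [0, 1)")
    return relative_perf / (1.0 - price_discount)


# Discounts quoted in the post; equal performance is the post's assumption.
for label, discount in [("64 vCPU", 0.40), ("16 vCPU", 0.53)]:
    print(f"{label}: {perf_per_dollar_gain(discount):.2f}x perf per dollar")
```

Under that assumption the discounts work out to roughly 1.67x and 2.13x the work per dollar. Passing a `relative_perf` below 1.0 shows how quickly a performance deficit eats into that.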

Also, we don't know what the typical demand from an average customer is. I'm pretty sure Amazon and ARM ran typical load simulations for G2 and found that 32MB of L3$ is just fine for most of their customers. The estimated G2 die size, 350mm2 with 32MB L3$ or 384mm2 with 64MB L3$, shows that the 64MB version would also be pretty easy to manufacture. Also, the 80-core Ampere Altra has only 32MB. IMHO they know very well what they're doing.
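As a quick check on what those die-size estimates imply, the arithmetic can be sketched like this. The two areas are the estimates quoted in the post, not measured figures, and the variable names are only for illustration.

```python
# Die-size estimates quoted in the post (mm^2), keyed by L3 capacity in MB.
die_mm2 = {32: 350.0, 64: 384.0}

extra_area = die_mm2[64] - die_mm2[32]          # area added by doubling L3
extra_cache_mb = 64 - 32
area_per_mb = extra_area / extra_cache_mb       # implied mm^2 per MB of L3
growth = extra_area / die_mm2[32]               # fractional die growth

print(f"extra area: {extra_area:.0f} mm^2")
print(f"implied L3 cost: {area_per_mb:.2f} mm^2 per MB")
print(f"die growth: {growth:.1%}")
```

If the estimates hold, doubling the L3 costs about 34 mm2, a bit under 10% more die area.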

Richie Rich · Mar 30, 2020

soresu said:
It depends what you mean by gaming - most games on mobile platforms are either casual or ports of very old games.

They stopped porting GTA games at GTA3 and went no further, whether because of space issues or bandwidth, who knows - but there are certainly problems in that market.

You can't really call it competitive until a significant portion of AAA PC/console games are making it to mobile within less than a year, and right now that is not even close to true. The closest comparison is Switch with some AAA wide market releases, but even then that is on gimped hardware from 2015 that does not even match XB1 for oomph - leaving graphics often less than impressive by comparison.

What is needed is one of the big smartphone makers to make their own dedicated console with a state of the art ARM SoC - Samsung would be a good choice given their choice of RDNA IP would allow them some code parity with the coming console generation.

I'm honestly surprised that Apple never made this a true priority - perhaps afraid of the PR embarrassment that might arise from matching wills to Sony or Nintendo and getting utterly annihilated in sales.

I see your point, but I wasn't concerned with games, consoles, game quality etc. I wanted to show where the money flows, and most of the money flows into ARM platforms. When you see that 80% (and increasing) of worldwide revenue is made on ARM platforms, then x86 could soon lose its development money.

Good idea about an Apple/Samsung console. IMHO this is going to become reality soon, though by a slightly different route. Look at ARM's new Mali G77: that thing is awesome. It has 1.3 TFLOPS FP32, similar to the Xbox One. Technically it is possible today to connect a phone like the Galaxy S20 to a TV via a USB-C -> HDMI cable and play AAA Xbox games on it. Technically. I think this is where Apple is aiming with Arcade. I think they abandoned the idea of a console because they already have a quite powerful gaming device in the iPhone. They just need to find a way to connect it to a TV (and the Nintendo Switch will be replaced). Can you imagine multiple smartphone manufacturers bringing out their own game consoles based on Android and ARM? A new generation every year, the same way as smartphones? No more waiting almost a decade for a new console like with PS and Xbox. Those dinosaurs at MS, Sony and AMD have no clue what storm is coming from the smartphone market. Apple and Google know...

soresu · Mar 30, 2020

Richie Rich said:
When you see that 80% (and increasing) of worldwide revenue is made on ARM platforms, then x86 could soon lose its development money.

Won't happen while the big consoles are making any money - and the hype around the next generation coming later this year suggests they won't have any problem in that regard, excepting the financial blowback caused by the current situation of course.

Richie Rich said:
Can you imagine multiple smartphone manufacturers bringing out their own game consoles based on Android and ARM? A new generation every year, the same way as smartphones? No more waiting almost a decade for a new console like with PS and Xbox.

The current trend of "mid term" upgrades has already started a change in the console industry - but bear in mind that developing for the PC is frustrating because of the multitude of platform configurations that need to be tested.

When I mentioned a smartphone manufacturer based console, I meant it in the same way that the big boys do it already - game devs find working to singular platform specs to be a much easier deal overall (and even then mistakes are still made).

DrMrLordX · Mar 30, 2020

Nothingness said:
Not sure where you get info from, but SVE is not deprecated. SVE2 is an extension of SVE.

Fair enough. Regardless, only one production chip will ever use SVE. SVE2 or bust.

DrMrLordX · Mar 30, 2020

Richie Rich said:
Amazon failed to scale?

Absolutely. Look at the SpecINT scores, and Anandtech's own review. Try to commit the entire CPU to one working set and performance tanks. Run a bunch of small VMs with low core counts and performance is much better. Either cache is too small (half the reference Neoverse size) or the interconnect is problematic. Possibly both.

Also, I can imagine game consoles based on ARM hardware with Android. It was called the Ouya:

Ouya - Wikipedia

en.wikipedia.org

A cautionary tale if ever there was one.

Nintendo is using ARM, so ARM is already pretty well-represented in the console world, even if it is an nVidia solution that is really not the best that ARM has to offer. That being said, Nintendo is the only player messing with ARM right now. The entrenched console titans are on x86, and will probably stay there for however long PS5/Xboxwhatever remain current-gen.

soresu · Mar 30, 2020

DrMrLordX said:
Nintendo is using ARM, so ARM is already pretty well-represented in the console world, even if it is an nVidia solution that is really not the best that ARM has to offer. That being said, Nintendo is the only player messing with ARM right now. The entrenched console titans are on x86, and will probably stay there for however long PS5/Xboxwhatever remain current-gen.

I have to admit that while I admire their business acumen, Nintendo have been coming across as increasingly lazy at the hardware design stage of their consoles, a strategy that bit them on their less than secure posterior when nVidia's TX1 was revealed to have hardware level exploits.

Even having said that the Switch is a monster for sales, and likely profit margins too, considering the age of the chip and process + their very slow movement to higher density NAND flash for game cards.

It's a shame that Nintendo won't take that leap anymore.

I would be surprised if there was no market for a near state of the art Switch console - ie at least PS4 lvl gfx power and A76+ CPU, basically a SD 8cx or better.

Sure from a 7 inch screen you won't get much benefit, but in docked mode it would make a clear difference for console and PC ports.

soresu · Mar 30, 2020

Moderators please delete this duplicate post.

soresu · Mar 30, 2020

Moderators please delete this duplicate post.

DrMrLordX · Mar 30, 2020

@soresu

Epic triple post! I'm sure you're having problems with the forums being slow. So am I.

There are some gaming phones out there, and there was the infamous Ouya. Otherwise it's all about software, software, software. If you don't have the titles, what is the point? Switch sales are driven by software. It was interesting to see the Switch get updated hardware with an overclocked, undervolted TX1 that performs better and has better battery life. With an Apple chip the thing could scream. But Apple would make them sell their souls for access to those chips. Not sure how much money Nintendo is saving by using nVidia hardware that is otherwise hard for nVidia to move. NV can offer considerable developer support, which is something you don't get picking a generic ARM solution.

soresu · Mar 31, 2020

DrMrLordX said:
@soresu

Epic triple post! I'm sure you're having problems with the forums being slow. So am I.

There are some gaming phones out there, and there was the infamous Ouya. Otherwise it's all about software, software, software. If you don't have the titles, what is the point? Switch sales are driven by software. It was interesting to see the Switch get updated hardware with an overclocked, undervolted TX1 that performs better and has better battery life. With an Apple chip the thing could scream. But Apple would make them sell their souls for access to those chips. Not sure how much money Nintendo is saving by using nVidia hardware that is otherwise hard for nVidia to move. NV can offer considerable developer support, which is something you don't get picking a generic ARM solution.

Yh definitely some problems atm, probably more people accessing during the lockdowns everywhere.

As to a possible ARM solution for Nintendo - I'd say Qualcomm would have no problem doing so, they already made a custom 8cx for Microsoft (SG1), so it stands to reason that they can probably fill out Nintendo's needs.

Although for my money I would push for Samsung who are already in partnership with AMD for RDNA, plus finally pivoting back to OTS ARM big cores - either A77 or A78 would do nicely, at least 2.5x the performance of A57 at same power on 7nm.

With A78 and RDNA2 gfx feature set they could really make a decent jump beyond the current Switch - especially if they can field an eye tracking VR headset to plug in to it. With foveated rendering, VRS, texture space shading and all the assorted warp techniques that exist, I reckon a decent 2 TFLOP RDNA2 GPU could do quite a lot (memory provided of course).

Nothingness · Mar 31, 2020

DrMrLordX said:
Fair enough. Regardless, only one production chip will ever use SVE. SVE2 or bust.

What makes you think that? Are you sure none of the chips on ARM roadmap is SVE only because SVE2 wasn't finalized? As far as I know ARM hasn't communicated about that.

Nothingness · Mar 31, 2020

DrMrLordX said:
Absolutely. Look at the SpecINT scores, and Anandtech's own review. Try to commit the entire CPU to one working set and performance tanks. Run a bunch of small VMs with low core counts and performance is much better. Either cache is too small (half the reference Neoverse size) or the interconnect is problematic. Possibly both.

"Performance tanks" is a bit exaggerated when in the end the chip still beats the competition on most of the subtests of SPEC, don't you think? And did you see any scaling study of SPEC on x86?

Richie Rich · Mar 31, 2020

DrMrLordX said:
Absolutely. Look at the SpecINT scores, and Anandtech's own review. Try to commit the entire CPU to one working set and performance tanks. Run a bunch of small VMs with low core counts and performance is much better. Either cache is too small (half the reference Neoverse size) or the interconnect is problematic. Possibly both.

Yes, for large instances there is about a 30% performance hit. But from an economic point of view, that just means you don't enjoy the 53% cost savings of small VM instances, only 40% cost savings. So it's like a super huge win vs. a huge win. Still a huge economic win for Graviton2. And a pretty nice win in per-thread performance for G2 too. Not to mention 1/10th the CPU cost and half the power consumption.

Regarding SVE:

vector length of 128-2048 bits
sizeless vector type
optional functions include BFloat16 for ML and matrix multiplication

SVE2

is an instruction extension; vector length stays 128-2048 bits
a lot of DSP functions
optional functions include cryptography functions

I think the Fujitsu ARM CPU will be fine with SVE1 and the optional functions for matrix multiplication.
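To illustrate what a "sizeless" vector type buys you, here is a small Python simulation of a vector-length-agnostic loop. It is only a sketch of the idea, not real SVE code: `vla_add` is a made-up helper, the lane count stands in for what SVE's CNTW instruction reports, and the boolean list stands in for a WHILELT predicate.

```python
def vla_add(a, b, vector_bits):
    """Add two equal-length lists of 32-bit values, SVE style.

    The loop never hard-codes a vector width: it asks how many lanes fit
    (like SVE's CNTW) and masks off the tail with a predicate (like WHILELT).
    """
    if vector_bits % 128 or not 128 <= vector_bits <= 2048:
        raise ValueError("SVE vector length is a multiple of 128, from 128 to 2048 bits")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    lanes = vector_bits // 32          # 32-bit elements per vector
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Predicate: lane k is active only if element i + k exists.
        pred = [i + k < len(a) for k in range(lanes)]
        for k, active in enumerate(pred):
            if active:
                out[i + k] = a[i + k] + b[i + k]
        i += lanes
    return out


a = list(range(10))
b = [10 * x for x in a]
# Same code, same answer, whether the hardware is 128 or 2048 bits wide.
results = {bits: vla_add(a, b, bits) for bits in (128, 512, 2048)}
print(results[128] == results[512] == results[2048])
```

The point is that the same loop runs unchanged on a 128-bit phone core or a 512-bit HPC core, which is why SVE2 can add instructions without touching the vector-length model.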

Take a look at the table of contents:

6. List of base SVE functions

6.2. Loads

6.3. Stores

6.4. Prefetches

6.5. Address calculations

6.6. Scalar to vector operations

6.7. Integer arithmetic

6.8. Logical operations

6.9. Shifts

6.10. Integer reductions

6.11. Integer comparisons

6.12. While comparisons

6.13. Counting bits

6.14. Conversion

6.15. Reversal

6.16. Floating-point arithmetic

6.17. Floating-point reductions

6.18. Floating-point comparisons

6.19. Floating-point conversions

6.20. Permutation and selection

6.21. Vector creation

6.22. Vector insertion and extraction

6.23. Predicate creation

6.24. Predicate operations

6.25. Testing predicates

6.26. FFR manipulation

6.27. Counting elements

6.28. Saturating scalar arithmetic

6.29. Reinterpreting data

7. List of optional SVE functions

7.2. BFloat16 extensions

7.3. INT8 matrix multiply extensions

7.4. FP32 matrix multiply extensions

7.5. FP64 matrix multiply extensions

8. List of base SVE2 functions

8.2. While greater comparisons

8.3. Uniform DSP operations

8.4. Widening DSP operations

8.5. Narrowing DSP operations

8.6. Unary narrowing operations

8.7. Non-widening pairwise arithmetic

8.8. Widening pairwise arithmetic

8.9. Bitwise ternary logical instructions

8.10. Large integer arithmetic

8.11. Multiplication by indexed elements

8.12. Uniform complex integer arithmetic

8.13. Widening complex integer arithmetic

8.14. Complex integer dot product

8.15. Extra floating-point conversions

8.16. Floating-point widening multiply-accumulate

8.17. Floating-point integer binary logarithm

8.18. Vector histogram count

8.19. Character match

8.20. Contiguous conflict detection

8.21. Polynomial arithmetic

8.22. Extended table lookup/permute

8.23. Non-temporal gather/scatter

9. List of optional SVE2 functions

9.2. Bit permutation

9.3. AES-128 functions

9.4. SHA-3 functions

9.5. SM4 functions

https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_04_en.pdf?_ga=2.219834265.1849955705.1585569385-435973099.1569595370

RetroZombie · Mar 31, 2020

DrMrLordX said:
Try to commit the entire CPU to one working set and performance tanks. Run a bunch of small VMs with low core counts and performance is much better. Either cache is too small (half the reference Neoverse size) or the interconnect is problematic. Possibly both.

That's one of the AMD EPYC disadvantages that can also be an advantage.
Having that huge amount of total L3 cache that is not shared can give it an edge on lightly threaded software when many instances of that kind of software are running simultaneously on a server. One CPU core never runs out of cache the way it can on the other CPUs.
Too bad it's not something that can be verified or benchmarked, but I bet EPYC runs circles around those kinds of CPUs in those circumstances.
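Both sides of that trade-off can be put into numbers with a rough sketch. The Graviton2 and Altra cache sizes are the ones quoted earlier in the thread; the EPYC 7742 figures (64 cores, 256 MB of L3 in 16 MB slices shared by 4 cores each) are public spec-sheet numbers rather than something stated here. Treating the ARM chips' L3 as one fully shared pool is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int
    l3_mb: float
    cores_per_l3_slice: int  # how many cores share one L3 slice

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_mb / self.cores

    @property
    def l3_visible_to_one_core_mb(self) -> float:
        # L3 a single thread can actually allocate into.
        return self.l3_mb * self.cores_per_l3_slice / self.cores


chips = [
    Chip("EPYC 7742", cores=64, l3_mb=256, cores_per_l3_slice=4),
    Chip("Graviton2", cores=64, l3_mb=32, cores_per_l3_slice=64),     # assumed fully shared
    Chip("Ampere Altra", cores=80, l3_mb=32, cores_per_l3_slice=80),  # assumed fully shared
]
for c in chips:
    print(f"{c.name}: {c.l3_per_core_mb:.2f} MB/core, "
          f"{c.l3_visible_to_one_core_mb:.0f} MB reachable by one core")
```

With every core busy, EPYC has far more L3 per core; with a single big working set, one EPYC core can only reach its own 16 MB slice.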

DrMrLordX · Mar 31, 2020

Nothingness said:
"Performance tanks" is a bit exaggerated when in the end the chip still beats the competition on most of the subtests of SPEC, don't you think? And did you see any scaling study of SPEC on x86?

Eh?

@amrnuke posted comparison numbers between Graviton2, Naples, and Rome, normalized for core count. In SPECInt2006 ST, Graviton2 won. In SPECInt2006 MT, it lost to even Naples. Yes, performance tanked.

Richie Rich said:
Yes, for large instances there is about a 30% performance hit.

Yep! Actually the SPEC numbers from ST to MT made it look significantly worse than that.

But from an economic point of view

Stop deflecting. Graviton2's scaling kind of sucks.

RetroZombie · Mar 31, 2020

soresu said:
Nintendo have been coming across as increasingly lazy at the hardware design stage of their consoles, a strategy that bit them on their less than secure posterior when nVidia's TX1 was revealed to have hardware level exploits.

Yeah, they have gone from the great Wii to the meh Wii U, and in their next gen (Switch) they actually stuck with the worst part of the Wii U, the tablet.
They should have gone with a more powerful Wii with revamped controls.

Nothingness · Mar 31, 2020

DrMrLordX said:
Eh?

@amrnuke posted comparison numbers between Graviton2, Naples, and Rome, normalized for core count. In SPECInt2006 ST, Graviton2 won. In SPECInt2006 MT, it lost to even Naples. Yes, performance tanked.

I was referring to @Andrei.'s article.

And @amrnuke "showed" that for ST G2 is 83.8% of 7742 and for MT G2 is 79.8% which is very similar to the ST difference. So by your own logic, both being 64-core chips, I have to assume 7742 performance tanks.
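The comparison behind that argument is a single division. A minimal sketch, using only the two percentages quoted above:

```python
# Graviton2 score as a fraction of the EPYC 7742 score, as quoted above.
g2_vs_7742_st = 0.838   # single-threaded
g2_vs_7742_mt = 0.798   # multi-threaded

# If both chips scaled identically from 1 thread to all cores, the two
# fractions would be equal. Their ratio is Graviton2's scaling relative
# to the 7742's scaling.
relative_scaling = g2_vs_7742_mt / g2_vs_7742_st
print(f"Graviton2 scales {relative_scaling:.1%} as well as the 7742")
```

On those numbers Graviton2's ST-to-MT scaling comes out at about 95% of the 7742's, so whatever one calls G2's scaling applies almost equally to Rome.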

Richie Rich · Mar 31, 2020

soresu said:
Yh definitely some problems atm, probably more people accessing during the lockdowns everywhere.

As to a possible ARM solution for Nintendo - I'd say Qualcomm would have no problem doing so, they already made a custom 8cx for Microsoft (SG1), so it stands to reason that they can probably fill out Nintendo's needs.

Although for my money I would push for Samsung who are already in partnership with AMD for RDNA, plus finally pivoting back to OTS ARM big cores - either A77 or A78 would do nicely, at least 2.5x the performance of A57 at same power on 7nm.

With A78 and RDNA2 gfx feature set they could really make a decent jump beyond the current Switch - especially if they can field an eye tracking VR headset to plug in to it. With foveated rendering, VRS, texture space shading and all the assorted warp techniques that exist, I reckon a decent 2 TFLOP RDNA2 GPU could do quite a lot (memory provided of course).

A console based on A78 would be a beast from a CPU point of view, delivering much more performance than PS5 and Xbox.

But on the GPU side, I'm not sure where the limit is for scaling those mobile GPUs:

Nintendo Switch: 0.2 TFLOPS FP32
Snapdragon 865 with Adreno 650: 1.3 TFLOPS FP32

Gaming is about the fun factor, where Nintendo excels. Most Android phones could serve as a gaming console from a performance point of view. I think a combination of 8x A78 with the second-generation Valhall GPU, Mali G78, could deliver pretty decent performance, around 1.5 TFLOPS. I'm not sure how far they can scale it up. Maybe 6 TFLOPS max?
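For anyone wondering where TFLOPS figures like these come from, peak FP32 throughput is usually quoted as 2 FLOPs (one fused multiply-add) per lane per clock. A rough sketch follows; the lane counts and clocks are hypothetical round numbers picked to land near the figures above, NOT vendor specs, and `peak_fp32_tflops` is a made-up helper.

```python
def peak_fp32_tflops(fp32_lanes: int, clock_ghz: float) -> float:
    """Theoretical peak: each lane retires one FMA (2 FLOPs) per clock."""
    return 2 * fp32_lanes * clock_ghz / 1000.0


# Hypothetical configurations, NOT vendor specs.
small = peak_fp32_tflops(fp32_lanes=256, clock_ghz=0.4)
large = peak_fp32_tflops(fp32_lanes=1024, clock_ghz=0.65)
print(f"{small:.2f} TFLOPS vs {large:.2f} TFLOPS")

# Lanes needed to hit a 6 TFLOPS target at a given clock.
target_tflops, clock_ghz = 6.0, 1.0
print(int(target_tflops * 1000 / (2 * clock_ghz)), "lanes at 1 GHz")
```

So scaling to 6 TFLOPS is mostly a question of how many lanes and how much clock the power and memory bandwidth budget allows, which is where a phone-class SoC runs out of room.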

NTMBK · Mar 31, 2020

Richie Rich said:
A console based on A78 would be a beast from a CPU point of view, delivering much more performance than PS5 and Xbox.

But on the GPU side, I'm not sure where the limit is for scaling those mobile GPUs:

Nintendo Switch: 0.2 TFLOPS FP32
Snapdragon 865 with Adreno 650: 1.3 TFLOPS FP32

Gaming is about the fun factor, where Nintendo excels. Most Android phones could serve as a gaming console from a performance point of view. I think a combination of 8x A78 with the second-generation Valhall GPU, Mali G78, could deliver pretty decent performance, around 1.5 TFLOPS. I'm not sure how far they can scale it up. Maybe 6 TFLOPS max?

Most phones also run their SoC with much higher power limits than the Switch. They prioritise benchmark wins over long term thermal performance and battery life... whereas the Switch deliberately aims to provide consistent performance for multiple hours. There's a reason why they drastically underclocked the Tegra X1.

Anyway, I doubt Nintendo would ever swap to a non-NVidia vendor for a Switch follow up. They have a custom low level API written by NVidia, NVN, so they can't just swap in any old ARM SoC.

DrMrLordX · Mar 31, 2020

Nothingness said:
I was referring to @Andrei.'s article.

And @amrnuke "showed" that for ST G2 is 83.8% of 7742 and for MT G2 is 79.8% which is very similar to the ST difference. So by your own logic, both being 64-core chips, I have to assume 7742 performance tanks.

He must have changed his numbers? The MT difference I saw was huge for Rome. For Naples, on the other hand, it wasn't as big, though there was still a swing. Naples lost in ST but won in MT.

coercitiv · Mar 31, 2020

NTMBK said:
They have a custom low level API written by NVidia, NVN, so they can't just swap in any old ARM SoC.

So young and compatibility is already a priority...

RetroZombie · Mar 31, 2020

NTMBK said:
There's a reason why they drastically underclocked the Tegra X1.

They always underdelivered on ARM CPU performance and efficiency, and the same goes for their GPU/APU combos. I think some models never got to market because they hit 20 watts, which was just a no-go for something meant to be used in a tablet.
