It was being disputed; that's the whole point.
But the score from a benchmark like SPEC or GB5, which performs a variety of functions, is much more useful than the score from Blender, which performs a single function.
Agreed, but again, that wasn't the point.
Any benchmark, whether it runs a single function, a mix of functions, or averages a bunch of different benchmarks, is not a good way to determine how something will perform for you; you want to run the actual application mix your server will be running to determine that. If you are running Blender 100% of the time, then that's a great benchmark for you. If you are running Blender 5% of the time, Linux kernel compiles 5% of the time, and so forth for a bunch of other stuff, then a suite that works like SPEC or GB but uses your particular application mix would be your ideal benchmark (see the sketch below). But such a thing will never exist unless you write it for yourself (and good luck getting others to care).
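Just to make the idea concrete, here's a minimal sketch of what such a personalized score could look like: a usage-weighted geometric mean of per-application results, the same style of aggregation SPEC uses, just weighted by your own mix. The workload names, scores, and weights are all made up for illustration.

```python
import math

# Hypothetical per-application scores (higher = better), e.g. speedup
# relative to some baseline machine. All names and numbers are made up.
scores = {
    "blender_render": 1.40,
    "kernel_compile": 1.10,
    "postgres_oltp":  0.95,
}

# Fraction of time the server actually spends in each workload.
usage = {
    "blender_render": 0.05,
    "kernel_compile": 0.05,
    "postgres_oltp":  0.90,
}

def weighted_geomean(scores, weights):
    """Usage-weighted geometric mean of the per-workload scores."""
    total = sum(weights.values())
    return math.exp(sum(
        (weights[name] / total) * math.log(score)
        for name, score in scores.items()
    ))

print(f"personalized score: {weighted_geomean(scores, usage):.3f}")
```

A geometric mean (rather than a plain average) keeps the comparison consistent no matter which machine you pick as the baseline, which is why SPEC aggregates its ratios that way in the first place.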
That's why others (and myself) have said that they want to see independent tests across a large swath of applications. Obviously no review is going to hit every single use case for every single person, but if you have a few database tests under various conditions + a few rendering tests with different scenes and applications + some compile tests with different source trees and compilers, etc., you start to get a pretty good idea of performance in each area. Then people can look at the area that fits their use case. Now do this across a few different review sites and you get a really good understanding of performance. I mean, isn't this how we've done it for years and years with competing x86 CPUs? Why is it all of a sudden pointless to do for ARM CPUs? Obviously there are challenges, since the available production/gaming software on ARM is limited compared to x86, but there are still lots of examples you could use.
I will say that part of my problem with SPEC and GB is that even though they touch many different workloads, they do so at a very superficial level for most tests. Just look at the transcoding test in SPEC: it processes 30 seconds of a single video, converting from one format to another. No filtering, no resizing, no color correcting, nothing. While that is a nice brief look at performance, I'd much rather see a more thorough test. I mean, Geekbench goes through, I think, 20 or so different tests within a few minutes on a modern CPU. That's a very superficial look at each of the workloads. I'm not saying it's not a valid place to start, but I'm not going to hang my hat on the results either.
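For what "more thorough" could mean in practice, here's a rough sketch of a heavier transcode test that puts scaling and filtering in the pipeline instead of a bare format conversion. The input file, filter choices, and encoder settings are illustrative assumptions on my part, not taken from SPEC or any real suite.

```python
import subprocess
import time

# Hypothetical transcode benchmark: resize, denoise, and re-encode,
# which is closer to a real production pipeline than a plain format
# conversion. Input path and settings are made up for illustration.
cmd = [
    "ffmpeg", "-y", "-i", "input.mp4",
    "-vf", "scale=1920:1080,hqdn3d",       # resize + denoise filter
    "-c:v", "libx264", "-preset", "slow",  # heavier, realistic encode
    "out.mp4",
]

start = time.perf_counter()
subprocess.run(cmd, check=True, capture_output=True)
print(f"transcode took {time.perf_counter() - start:.1f}s")
```

The point isn't these particular filters; it's that adding even one or two real processing steps changes the instruction mix the test exercises, so the score tells you more than a 30-second straight conversion does.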
What we seem to be seeing is "ARM (especially Apple's) scores too high compared with Intel on Geekbench, therefore Geekbench is a bad benchmark". Lather, rinse, repeat with SPEC. If someone compiles Blender to run on an iPhone and it compares too well with Intel, then the goalposts will be moved again.
I have never once said this or even implied it, and I don't think most others have either. There's a difference between saying ARM scored too high and saying that this is a limited set of tests and we'd like to see a fuller test suite.
The argument has been used several times here that "If Geekbench and SPEC are good enough, why does Anandtech run all these other benchmarks?" They do it because those benchmarks all represent something that some people do. I may not care about a gaming benchmark, you may not care about a database benchmark, another guy may not care about an Office task benchmark, but they all provide valuable info for the people who do care about those things. That's why Anandtech runs that variety of benchmarks. It isn't about improving the "big picture" of performance; no one is trying to add up all those numbers from the various benchmarks into a single performance figure. They can't; it's not possible. But it lets people look more closely at the benchmarks that represent what they do.
I'm confused by your point: if SPEC/GB gives you the info you need to know how CPUs will perform in comparison to each other, why are we wasting time looking at all these other benchmarks? Why can't we just look at SPEC and/or GB and pick our CPUs? You seem to be arguing here that we need a variety of tests to know how CPUs will perform in different workloads that might not be reflected by SPEC/GB, which is exactly what people have been asking for.