Question Incredible Apple M4 benchmarks...

mikegg · May 14, 2024

Taken from macrumors.com forum poster.

Zen3 to Zen4 had similar Object Detection gains.

Are Zen4 GB6 ST speeds valid?

gdansk · May 14, 2024

mikegg said:
Are Zen4 GB6 ST speeds valid?

Read the thread again it's all been said.
Zen 4 is still only on par with its competition, Raptor Lake, in that subtest. Both using 256 bit vector operations.
SVL for M4 is 128 bits and the ZA size is 128x128 bits and it can do that far fewer cycles. So it ends up twice the score of its competition overall. Cool, but much more niche.

mikegg · May 14, 2024

gdansk said:
Read the thread again it's all been said.
Zen 4 is still only on par with its competition, Raptor Lake, in that subtest. Both using 256 bit vector operations.
SVL for M4 is 128 bits and the ZA size is 128x128 bits and it can do that far fewer cycles. So it ends up twice the score of its competition overall. Cool, but much more niche.

Um... so what's your point? x86 Object Detecton is valid but not M4?

gdansk · May 14, 2024

mikegg said:
Um... so what's your point? x86 Object Detecton is valid but not M4?

Object detection makes almost no difference in the comparison between Raptor Lake / Zen 4 / M3. They're all on par.
M4 object detection is twice the score as all of them. So now it's an issue because now it makes a difference and it is implemented using an even less applicable extension (where 80% of the instructions are for workloads most software will run on the NPU anyway).

And people have been complaining about GB6 since its release for numerous reasons. But I'm sure this is all lost on you. GB6 says M4 is 50% faster than M2 in ST. This is good, Apple is good, so GB6 must be good. Nevermind that it doesn't agree even with Apple's own claims.

mikegg · May 14, 2024

gdansk said:
Object detection makes almost no difference in the comparison between Raptor Lake / Zen 4 / M3. They're all on par.
M4 object detection is twice the score as all of them. So now it's an issue because now it makes a difference and it is implemented using an even less applicable extension (where 80% of the instructions are for workloads most software will run on the NPU anyway).

And people have been complaining about GB6 since its release for numerous reasons. But I'm sure this is all lost on you. GB6 says M4 is 50% faster than M2 in ST. This is good, Apple is good, so GB6 must be good. Nevermind that it doesn't agree even with Apple's own claims.

So basically, M4's score is so high that GB6 is no longer valid.

What do you suggest? Go back to Cinebench R23?

igor_kavinski · May 14, 2024

mikegg said:
So basically, M4's score is so high that GB6 is no longer valid.

Maybe GB6 should properly optimize their tests for Intel/AMD SoCs too, since those have their dedicated NPUs?

poke01 · May 14, 2024

mikegg said:
Cinebench R23?

Cinebench 2024 is better than GB6 even though it’s a renderer benchmark. Just avoid GB6. Looking at SPEC, M4 is not a major leap that GB6 indicates.

mikegg · May 14, 2024

poke01 said:
Cinebench 2024 is better than GB6 even though it’s a renderer benchmark. Just avoid GB6. Looking at SPEC, M4 is not a major leap that GB6 indicates.

That's a weird statement since Cinebench has never known to be correlated with anything. Even in renderer tasks, it's a niche. Use Blender benchmarks instead.

mikegg · May 14, 2024

igor_kavinski said:
Maybe GB6 should properly optimize their tests for Intel/AMD SoCs too, since those have their dedicated NPUs?

GB has a separate NPU benchmark.

mikegg · May 14, 2024

Loving the denials here.

Apple's M series CPUs have always had higher perf/watt, raw performance, and IPC than AMD & Intel. They're generations ahead regardless of the node. But since their scores are so high, people simply don't believe it even though it's been proven over and over again that they are real.

Meanwhile, people still cling to Cinebench (usually R23) to try to make their x86 CPUs look better than they really are.

Even ex-Anandtech Andrei said Cinebench is a generally poor CPU benchmark to use for CPU performance measurements. That's when he was still working at Anandtech and not Qualcomm.

https://www.reddit.com/r/hardware/comments/pitid6/eli5_why_does_it_seem_like_cinebench_is_now_the

gdansk · May 14, 2024

M2 was 7-8 months ahead of Phoenix, both on improved TSMC 5nm processes. Similar GB6 ST composite scores. M2 with lower GB6 MT composite score. Yet when you look into the subtests of that composite it's quite flawed. It's hard to explain how Apple is "generations ahead regardless of the node" in anything except power draw when everything since then is a node ahead.

I understand some Apple enthusiasts are upset that they're following x64 at pursuing clock rates and also following x64 in niche instruction set proliferation but so it is. But be cautious of going the route of saying whatever results GB6 spits out is a "indicator of performance increase" when there's even more subtests that could benefit from AVX-512 and it's only at half-rate or not present at all right now.

As with most my posts - sent from my Apple Silicon MacBook.

okoroezenwa · May 14, 2024

igor_kavinski said:
Maybe GB6 should properly optimize their tests for Intel/AMD SoCs too, since those have their dedicated NPUs?

How is that proper optimisation? The NPUs of the ARM world have never been used for the GB6 CPU test so why should Intel/AMD’s?

igor_kavinski · May 14, 2024

Doug S said:
Are you complaining that they didn't, at their own expense, develop DX12 drivers for their GPU?

YES! That's not really an expense for a company worth trillions. They could get their interns to do that, WITHOUT PAY, as a project to prove their worth for a permanent position.

Doug S said:
I'm not sure if it is still in force, but Microsoft had an exclusive deal with Qualcomm for Windows on ARM. So both of those companies share the blame you can't buy a Macbook and natively boot Windows on it.

And Apple takes no blame? Why do you think Microsoft went with Qualcomm? Coz Apple won't give them ARM chips for Surface devices and especially not at prices that Qualcomm is providing. Do you forget that it was an Apple engineer who in his spare time, got MacOS working on x86 hardware, Steve Jobs saw it, flew to Japan to meet with Sony top dog and pitched an idea of MacOS running on Sony VAIO laptops? Shouldn't be a big task for a few of their engineers to get Windows on ARM running on a Mx device and then just announce that Macbooks are versatile enough to run Windows and literally every useful application in existence. Wouldn't that be a big selling point?

Doug S said:
If Apple really wanted to go walled garden on the Mac, they would locked the bootloader to prevent booting other operating systems. Something that some Windows PCs have done - do you complain about them, or do you not consider that a "walled garden" because all you care about is running Windows?

If Apple isn't promoting the fact that their bootloader is unlocked, there's no guarantee that it will stay unlocked. Maybe they are curious to see how far hackers get with running a functional Linux on their hardware. If they get too close for comfort, there is nothing stopping Apple from locking everything down coz we all know how much Apple fears competition and they do literally everything under the sun to prevent anyone from getting in on their side of the fence. An unlocked bootloader may also be a talent scouting tactic on Apple's part. Search on Github for projects that hack away on Apple hardware and then scoop them up coz if they are doing that much for free, imagine what they could do with proper guidance and creature comforts.

igor_kavinski · May 14, 2024

okoroezenwa said:
How is that proper optimisation? The NPUs of the ARM world have never been used for the GB6 CPU test so why should Intel/AMD’s?

How can we be sure that Apple's SME implementation isn't tapping into its NPU?

igor_kavinski · May 14, 2024

mikegg said:
Loving the denials here.

Apple's M series CPUs have always had higher perf/watt, raw performance, and IPC than AMD & Intel. They're generations ahead regardless of the node.

Don't disagree with all that. But it's mostly MEH for me since I'm not a data scientist, not an artist, not an animator, not an AI junkie, not a musician, not someone trying to look cool in public etc. I'm just a computer enthusiast who does a lot of browsing and gaming and media consumption and for those use cases, despite all the advantages of power efficiency on its side, Apple devices make no money sense for me.

Only reason I'm in this thread is the sensational title of the topic. Incredible? Not for everyone. For the minority of the global population, sure. Doesn't do anything for the little guy.

mikegg · May 14, 2024

igor_kavinski said:
How can we be sure that Apple's SME implementation isn't tapping into its NPU?

Probably because that'd be much slower because you'd be moving data off the CPU caches?

Quite easy to tell when M4 makes it to Macs. Just run powermetrics and see if the NPU uses more power during the test.

It's almost certain that Apple repurposed their AMX coprocessor into SME.

okoroezenwa · May 14, 2024

igor_kavinski said:
How can we be sure that Apple's SME implementation isn't tapping into its NPU?

Is that your actual concern here? That the NPU is taking over? If that’s the case maybe Intel and AMD should take notes. In any case I’m not sure what proper optimisation could be done on Primate Labs’ end if that’s just how things are set up to work on a given SoC.

roger_k · May 14, 2024

gdansk said:
SVL for M4 is 128 bits and the ZA size is 128x128 bits and it can do that far fewer cycles. So it ends up twice the score of its competition overall. Cool, but much more niche.

Has Apple released this information? AMX was 512-bit, it would be very interesting if they went with a smaller size for their SSVE implementation.

gdansk said:
GB6 says M4 is 50% faster than M2 in ST. This is good, Apple is good, so GB6 must be good. Nevermind that it doesn't agree even with Apple's own claims.

GB6 ST improvements from M2 to M4 are pretty much consistent with improvements in individual subtests. It's roundly 40% higher ST score, and individual subtests are between 20% and 50% faster (with Object Detection being an obvious outlier). The aggregate ST score is not as sensitive to individual subtests as some claim.

iPad16,4 vs iPad Pro (12.9-inch, 6th generation) - Geekbench

igor_kavinski said:
How can we be sure that Apple's SME implementation isn't tapping into its NPU?

I don't know, by using common sense and understanding of how CPUs work? I mean, how do you imagine them doing something like that? If they have figured out how to hand off CPU work to a completely separate IP block with ultra-low latency, then their design team is truly unmatched. I just don't get it why you worry about Apple running some CPU code on an NPU but disregard the possibility of, say, Intel running some CPU code on the GPU. It's the same thing after all.

igor_kavinski · May 14, 2024

roger_k said:
I just don't get it why you worry about Apple running some CPU code on an NPU but disregard the possibility of, say, Intel running some CPU code on the GPU. It's the same thing after all.

Is Intel running some GB CPU benchmark code on GPU? Any link/URL? I just want to know the real reason for the Object Detection test's "anomalous" result. Apple didn't announce SME support. GB6 updated their test suite with SME support just before the M4 reveal. So everyone is assuming that Apple is using SME. If they are, what kind of acceleration is GB using for Intel/AMD SoCs? If the test is properly accelerated for one CPU and not others, is that fair?

igor_kavinski · May 14, 2024

okoroezenwa said:
In any case I’m not sure what proper optimisation could be done on Primate Labs’ end if that’s just how things are set up to work on a given SoC.

Shouldn't it be their responsibility to ensure that all CPUs are being used to the best of their capabilities? Or should they just take "under the table" gifts to include acceleration for one CPU and then wait for gifts to arrive from the other CPU vendors before bothering to include relevant acceleration for their CPUs?

roger_k · May 14, 2024

igor_kavinski said:
Is Intel running some GB CPU benchmark code on GPU? Any link/URL?

I do not understand the question. Do you have any evidence that Apple is using the NPU to accelerate Object Detection? Or at least an explanation how this would be achieved technically?

igor_kavinski said:
I just want to know the real reason for the Object Detection test's "anomalous" result. Apple didn't announce SME. GB6 updated their test suite with SME support just before the M4 reveal. So everyone is assuming that Apple is using SME.

Apple did not announce SME, but the GB6 code contains an SME path and uses run-time SME feature detection on Apple platforms. Besides, OS experts have found evidence of SME in the iPadOS images for the new iPad. I think at this point it is fairly safe to assume that SME is there.

igor_kavinski said:
If they are, what kind of acceleration is GB using for Intel/AMD SoCs? If the test is properly accelerated for one CPU and not others, is that fair?

GB6 is using AVX512 with ML-focused extensions, as well as Intel AMX on supported hardware. It is up to the manufacturer to decide which of their products support their own technologies.

And regarding "fair": what is it that you want to measure? Performance of specific subsystem (e.g. integer, floating-point, vector, cache, memory), or performance of a specific application type? It's not like GB or Apple are cheating in a test. There is no magical "accelerate Object Detection" hardware feature. If Apple has implemented new HPC features that can among other things make this subtest faster, isn't that indicative of the processor performance capability, and isn't that what you are interested in knowing? Or do you only care about old/legacy code that cannot use the new IP blocks (but even than old code might use a framework that can benefit from new features). And so on. I think if one is clear about all these things, the problem of "fairness" disappears, and you are only left with the problem of choosing what is relevant for you. And this is not just about SME or Apple. Would you dismiss technologies like AVX512 or crypto instructions because they are implemented in an asymmetrical fashion by different vendors? Or would you dismiss raytracing GPU tests because some vendors have better implementations? That's up to you to decide.

igor_kavinski said:
Shouldn't it be their responsibility to ensure that all CPUs are being used to the best of their capabilities? Or should they just take "under the table" gifts to include acceleration for one CPU and then wait for gifts to arrive from the other CPU vendors before bothering to include relevant acceleration for their CPUs?

They absolutely do that. GB6 supports newest acceleration technologies for x86 platforms as well. I also don't really see why it is fair to penalize Apple who give you a state of the art matrix coprocessor in a tablet just because Intel has decided to cut AVX-512 from their consumer CPUs.

Nothingness · May 14, 2024

igor_kavinski said:
Is Intel running some GB CPU benchmark code on GPU? Any link/URL? I just want to know the real reason for the Object Detection test's "anomalous" result. Apple didn't announce SME support. GB6 updated their test suite with SME support just before the M4 reveal. So everyone is assuming that Apple is using SME. If they are, what kind of acceleration is GB using for Intel/AMD SoCs? If the test is properly accelerated for one CPU and not others, is that fair?

I previously posted a link comparing two Intel chips using the same generation, one with AMX, one without it. And Object Detection speedup was already present. So in a way one could argue GB favored Intel back then, especially given that it's guaranteed much fewer people will be using Intel chips with AMX than Apple chips with AMX/SME.

igor_kavinski · May 14, 2024

roger_k said:
I also don't really see why it is fair to penalize Apple who give you a state of the art matrix coprocessor in a tables just because Intel has decided to cut AVX-512 from their consumer CPUs.

I'm not saying Apple should be penalized. I'm just interested in the technical details of how they are achieving a better score. AMD still has AVX-512 enabled. The GB6 internals document has not been updated if SME is being used in it. It also says that MobileNetV1 is being used which I suppose is an outdated CNN based detector? And that brings me to the problem. It's too early to say the M4 is incredible based on one possibly outdated test result. Come on back down to reality, folks, from whatever plane of existence you guys have transcended to in your euphoria induced excitement.

igor_kavinski · May 14, 2024

Nothingness said:
So in a way one could argue GB favored Intel back then

Excellent point. So if there is evidence that GB is favoring or has favored certain vendors in certain tests, is it a relevant benchmark and should people be excited over the score of a version bump just released not too long ago?

roger_k · May 14, 2024

igor_kavinski said:
I'm not saying Apple should be penalized. I'm just interested in the technical details of how they are achieving a better score. AMD still has AVX-512 enabled. The GB6 internals document has not been updated if SME is being used in it. It also says that MobileNetV1 is being used which I suppose is an outdated CNN based detector? And that brings me to the problem. It's too early to say the M4 is incredible based on one possibly outdated test result. Come on back down to reality, folks, from whatever plane of existence you guys have transcended to in your euphoria induced excitement.

Then don't look at that particular subtest. I really don't understand what this conversation is about. You are concerned about ML-focused workloads? That's fair! Then why not look at other workloads like code compilation, raytracing, or PDF rendering? That uses pretty much the same code on all platforms and there are no accelerators involved. And passively cooled iPad is confidently outperforming the fastest desktop CPU cores here. If you don't think that this is incredible, I don't know what is.

BTW, MobileNetV1 being outdated has very little relevance. It's still matrix multiplication. I don't care much for AI acceleration. I am a scientist, I care about fast vector processing and matrix multiplication for doing stats. And I can tell you that my matrix-heavy Python and R scripts get a 2x performance improvements just from linking to Apple BLAS library that uses the vector coprocessor. So yes, I am excited by SME, because it means I can use these features in my own high-performance numerical code.

Question Incredible Apple M4 benchmarks...

Golden Member

Platinum Member

Golden Member

Platinum Member

Golden Member

Lifer

Golden Member

Golden Member

Golden Member

Golden Member

Platinum Member

Member

Lifer

Lifer

Lifer

Golden Member

Member

Member

Lifer

Lifer

Member

Platinum Member

Lifer

Lifer

Member