Question Incredible Apple M4 benchmarks...


mikegg

Golden Member
Jan 30, 2010
1,815
445
136


Taken from a macrumors.com forum poster.

Zen3 to Zen4 had similar Object Detection gains.

Are Zen4 GB6 ST speeds valid?
 
Reactions: Orfosaurio

gdansk

Platinum Member
Feb 8, 2011
2,478
3,373
136
Are Zen4 GB6 ST speeds valid?
Read the thread again; it's all been said.
Zen 4 is still only on par with its competition, Raptor Lake, in that subtest. Both are using 256-bit vector operations.
SVL for M4 is 128 bits and the ZA size is 128x128 bits, and it can do that in far fewer cycles. So it ends up with twice the score of its competition overall. Cool, but much more niche.
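If you want to sanity-check the SVL claim once hardware is in hand, something like this rough sketch would do it (my example, not from any benchmark; it assumes an AArch64 toolchain that accepts SME mnemonics, e.g. clang with -march=armv9-a+sme, and RDSVL reports the streaming vector length in bytes):

```c
// Rough sketch: read the Streaming SVE vector length (SVL) on an SME-capable
// AArch64 CPU. RDSVL returns SVL in bytes scaled by the immediate and can be
// executed without entering streaming mode. The ZA array is SVL x SVL bits.
#include <stdio.h>
#include <stdint.h>

static uint64_t svl_bytes(void) {
    uint64_t n;
    __asm__ volatile("rdsvl %0, #1" : "=r"(n));
    return n;
}

int main(void) {
    uint64_t bits = svl_bytes() * 8;
    printf("SVL = %llu bits, ZA storage = %llu x %llu bits\n",
           (unsigned long long)bits,
           (unsigned long long)bits,
           (unsigned long long)bits);
    return 0;
}
```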
 

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Read the thread again; it's all been said.
Zen 4 is still only on par with its competition, Raptor Lake, in that subtest. Both are using 256-bit vector operations.
SVL for M4 is 128 bits and the ZA size is 128x128 bits, and it can do that in far fewer cycles. So it ends up with twice the score of its competition overall. Cool, but much more niche.
Um... so what's your point? x86 Object Detection is valid but not M4?
 
Reactions: Orfosaurio

gdansk

Platinum Member
Feb 8, 2011
2,478
3,373
136
Um... so what's your point? x86 Object Detection is valid but not M4?
Object detection makes almost no difference in the comparison between Raptor Lake / Zen 4 / M3. They're all on par.
M4 object detection is twice the score of all of them. So now it's an issue, because now it makes a difference, and it is implemented using an even less applicable extension (where 80% of the instructions are for workloads most software will run on the NPU anyway).

And people have been complaining about GB6 since its release for numerous reasons. But I'm sure this is all lost on you. GB6 says M4 is 50% faster than M2 in ST. This is good, Apple is good, so GB6 must be good. Nevermind that it doesn't agree even with Apple's own claims.
 
Last edited:

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Object detection makes almost no difference in the comparison between Raptor Lake / Zen 4 / M3. They're all on par.
M4 object detection is twice the score of all of them. So now it's an issue, because now it makes a difference, and it is implemented using an even less applicable extension (where 80% of the instructions are for workloads most software will run on the NPU anyway).

And people have been complaining about GB6 since its release for numerous reasons. But I'm sure this is all lost on you. GB6 says M4 is 50% faster than M2 in ST. This is good, Apple is good, so GB6 must be good. Nevermind that it doesn't agree even with Apple's own claims.
So basically, M4's score is so high that GB6 is no longer valid.

What do you suggest? Go back to Cinebench R23?
 
Reactions: Orfosaurio

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Cinebench 2024 is better than GB6 even though it's a renderer benchmark. Just avoid GB6. Looking at SPEC, M4 is not the major leap that GB6 indicates.
That's a weird statement, since Cinebench has never been known to correlate with anything. Even in renderer tasks, it's a niche. Use Blender benchmarks instead.
 
Reactions: Orfosaurio

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Loving the denials here.

Apple's M series CPUs have always had higher perf/watt, raw performance, and IPC than AMD & Intel. They're generations ahead regardless of the node. But since their scores are so high, people simply don't believe it even though it's been proven over and over again that they are real.

Meanwhile, people still cling to Cinebench (usually R23) to try to make their x86 CPUs look better than they really are.

Even ex-AnandTech Andrei said Cinebench is a generally poor benchmark to use for CPU performance measurements. That was when he was still working at AnandTech, not Qualcomm.

 
Reactions: Orfosaurio

gdansk

Platinum Member
Feb 8, 2011
2,478
3,373
136
M2 was 7-8 months ahead of Phoenix, both on improved TSMC 5nm processes. Similar GB6 ST composite scores. M2 had a lower GB6 MT composite score. Yet when you look into the subtests of that composite, it's quite flawed. It's hard to explain how Apple is "generations ahead regardless of the node" in anything except power draw when everything since then is a node ahead.

I understand some Apple enthusiasts are upset that Apple is following x64 in pursuing clock rates and also following x64 in niche instruction set proliferation, but so it is. Be cautious, though, about saying whatever results GB6 spits out are an "indicator of performance increase" when there are even more subtests that could benefit from AVX-512, and it's only at half-rate or not present at all right now.

As with most of my posts - sent from my Apple Silicon MacBook.
 
Last edited:
Jul 27, 2020
17,673
11,405
106
Are you complaining that they didn't, at their own expense, develop DX12 drivers for their GPU?

YES! That's not really an expense for a company worth trillions. They could get their interns to do that, WITHOUT PAY, as a project to prove their worth for a permanent position.

I'm not sure if it is still in force, but Microsoft had an exclusive deal with Qualcomm for Windows on ARM. So both of those companies share the blame that you can't buy a MacBook and natively boot Windows on it.

And Apple takes no blame? Why do you think Microsoft went with Qualcomm? Coz Apple won't give them ARM chips for Surface devices, and especially not at the prices Qualcomm is providing. Have you forgotten that it was an Apple engineer who, in his spare time, got macOS working on x86 hardware? Steve Jobs saw it, flew to Japan to meet Sony's top dog, and pitched the idea of macOS running on Sony VAIO laptops. It shouldn't be a big task for a few of their engineers to get Windows on ARM running on an Mx device and then announce that MacBooks are versatile enough to run Windows and literally every useful application in existence. Wouldn't that be a big selling point?

If Apple really wanted to go walled garden on the Mac, they would have locked the bootloader to prevent booting other operating systems. Something that some Windows PCs have done - do you complain about them, or do you not consider that a "walled garden" because all you care about is running Windows?

If Apple isn't promoting the fact that their bootloader is unlocked, there's no guarantee that it will stay unlocked. Maybe they are curious to see how far hackers get with running a functional Linux on their hardware. If they get too close for comfort, there is nothing stopping Apple from locking everything down, coz we all know how much Apple fears competition and does literally everything under the sun to prevent anyone from getting in on their side of the fence. An unlocked bootloader may also be a talent scouting tactic on Apple's part: search GitHub for projects that hack away on Apple hardware, then scoop those people up, coz if they are doing that much for free, imagine what they could do with proper guidance and creature comforts.
 
Jul 27, 2020
17,673
11,405
106
Loving the denials here.

Apple's M series CPUs have always had higher perf/watt, raw performance, and IPC than AMD & Intel. They're generations ahead regardless of the node.
Don't disagree with all that. But it's mostly MEH for me since I'm not a data scientist, not an artist, not an animator, not an AI junkie, not a musician, not someone trying to look cool in public etc. I'm just a computer enthusiast who does a lot of browsing and gaming and media consumption and for those use cases, despite all the advantages of power efficiency on its side, Apple devices make no money sense for me.

Only reason I'm in this thread is the sensational title of the topic. Incredible? Not for everyone. For the minority of the global population, sure. Doesn't do anything for the little guy.
 
Reactions: DeathReborn

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
How can we be sure that Apple's SME implementation isn't tapping into its NPU?
Probably because that'd be much slower, since you'd be moving data off the CPU caches?

Quite easy to tell when M4 makes it to Macs. Just run powermetrics and see if the NPU uses more power during the test.
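Something along these lines would do it (a rough sketch of mine; the cpu_power sampler and the -i/-n flags exist, but whether the output includes an "ANE Power" line depends on the Apple Silicon model and macOS version):

```c
// Rough sketch: sample powermetrics while a benchmark runs and watch whether
// the ANE (NPU) draws meaningful power. Needs root, so run the compiled
// binary with sudo. Output field names are an assumption and may vary.
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *p = popen("powermetrics --samplers cpu_power -i 1000 -n 10", "r");
    if (!p) {
        perror("popen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof line, p)) {
        // Keep only the power summary lines of interest.
        if (strstr(line, "ANE Power") || strstr(line, "CPU Power"))
            fputs(line, stdout);
    }
    pclose(p);
    return 0;
}
```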

It's almost certain that Apple repurposed their AMX coprocessor into SME.
 
Reactions: Orfosaurio

okoroezenwa

Member
Dec 22, 2020
50
49
61
How can we be sure that Apple's SME implementation isn't tapping into its NPU?
Is that your actual concern here? That the NPU is taking over? If that’s the case maybe Intel and AMD should take notes. In any case I’m not sure what proper optimisation could be done on Primate Labs’ end if that’s just how things are set up to work on a given SoC.
 

roger_k

Member
Sep 23, 2021
102
215
86
SVL for M4 is 128 bits and the ZA size is 128x128 bits, and it can do that in far fewer cycles. So it ends up with twice the score of its competition overall. Cool, but much more niche.

Has Apple released this information? AMX was 512-bit; it would be very interesting if they went with a smaller size for their SSVE implementation.

GB6 says M4 is 50% faster than M2 in ST. This is good, Apple is good, so GB6 must be good. Nevermind that it doesn't agree even with Apple's own claims.

GB6 ST improvements from M2 to M4 are pretty much consistent with improvements in individual subtests. It's roughly a 40% higher ST score, and individual subtests are between 20% and 50% faster (with Object Detection being an obvious outlier). The aggregate ST score is not as sensitive to individual subtests as some claim.
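Back-of-the-envelope, and assuming the composite is a geometric mean of the subtest scores with roughly a dozen ST subtests (my assumption, not something stated here): even a 2x outlier in a single subtest only moves the composite by about 6%:

\[
2^{1/12} \approx 1.059
\]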


How can we be sure that Apple's SME implementation isn't tapping into its NPU?

I don't know, by using common sense and an understanding of how CPUs work? I mean, how do you imagine them doing something like that? If they have figured out how to hand off CPU work to a completely separate IP block with ultra-low latency, then their design team is truly unmatched. I just don't get why you worry about Apple running some CPU code on an NPU but disregard the possibility of, say, Intel running some CPU code on the GPU. It's the same thing after all.
 
Jul 27, 2020
17,673
11,405
106
I just don't get why you worry about Apple running some CPU code on an NPU but disregard the possibility of, say, Intel running some CPU code on the GPU. It's the same thing after all.
Is Intel running some GB CPU benchmark code on GPU? Any link/URL? I just want to know the real reason for the Object Detection test's "anomalous" result. Apple didn't announce SME support. GB6 updated their test suite with SME support just before the M4 reveal. So everyone is assuming that Apple is using SME. If they are, what kind of acceleration is GB using for Intel/AMD SoCs? If the test is properly accelerated for one CPU and not others, is that fair?
 
Jul 27, 2020
17,673
11,405
106
In any case I’m not sure what proper optimisation could be done on Primate Labs’ end if that’s just how things are set up to work on a given SoC.
Shouldn't it be their responsibility to ensure that all CPUs are being used to the best of their capabilities? Or should they just take "under the table" gifts to include acceleration for one CPU and then wait for gifts to arrive from the other CPU vendors before bothering to include relevant acceleration for their CPUs?
 

roger_k

Member
Sep 23, 2021
102
215
86
Is Intel running some GB CPU benchmark code on GPU? Any link/URL?

I do not understand the question. Do you have any evidence that Apple is using the NPU to accelerate Object Detection? Or at least an explanation how this would be achieved technically?

I just want to know the real reason for the Object Detection test's "anomalous" result. Apple didn't announce SME. GB6 updated their test suite with SME support just before the M4 reveal. So everyone is assuming that Apple is using SME.

Apple did not announce SME, but the GB6 code contains an SME path and uses run-time SME feature detection on Apple platforms. Besides, OS experts have found evidence of SME in the iPadOS images for the new iPad. I think at this point it is fairly safe to assume that SME is there.
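For illustration only (my sketch, not anything from the GB6 source), run-time detection on Apple platforms can be as simple as a sysctl query; the hw.optional.arm.FEAT_SME key name is an assumption based on Apple's usual hw.optional.arm.FEAT_* convention:

```c
// Hypothetical sketch: check for SME on macOS/iPadOS via sysctl.
// The key name "hw.optional.arm.FEAT_SME" is assumed, following the
// hw.optional.arm.FEAT_* convention Apple uses for other ISA features.
#include <stdio.h>
#include <sys/sysctl.h>

int main(void) {
    int has_sme = 0;
    size_t len = sizeof has_sme;
    if (sysctlbyname("hw.optional.arm.FEAT_SME", &has_sme, &len, NULL, 0) != 0)
        has_sme = 0;  // key absent on older hardware/OS versions
    printf("SME %s\n", has_sme ? "available" : "not reported");
    return 0;
}
```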

If they are, what kind of acceleration is GB using for Intel/AMD SoCs? If the test is properly accelerated for one CPU and not others, is that fair?

GB6 is using AVX512 with ML-focused extensions, as well as Intel AMX on supported hardware. It is up to the manufacturer to decide which of their products support their own technologies.
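Again for illustration (my sketch, not GB6's actual code), deciding at run time whether the AVX-512 VNNI or AMX paths are usable on x86 boils down to a CPUID query:

```c
// Rough sketch: run-time detection of AVX-512 VNNI and Intel AMX on x86-64
// via CPUID leaf 7, subleaf 0 (bit positions per Intel's documentation).
// A complete check would also verify OS-enabled state via XGETBV/XCR0.
#include <stdio.h>
#include <cpuid.h>

int main(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 7 not supported");
        return 1;
    }
    int avx512f    = (ebx >> 16) & 1;  // AVX-512 Foundation
    int avx512vnni = (ecx >> 11) & 1;  // AVX-512 VNNI (ML dot products)
    int amx_tile   = (edx >> 24) & 1;  // AMX-TILE
    int amx_int8   = (edx >> 25) & 1;  // AMX-INT8

    printf("AVX-512F: %d, AVX-512 VNNI: %d, AMX-TILE: %d, AMX-INT8: %d\n",
           avx512f, avx512vnni, amx_tile, amx_int8);
    return 0;
}
```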

And regarding "fair": what is it that you want to measure? Performance of specific subsystem (e.g. integer, floating-point, vector, cache, memory), or performance of a specific application type? It's not like GB or Apple are cheating in a test. There is no magical "accelerate Object Detection" hardware feature. If Apple has implemented new HPC features that can among other things make this subtest faster, isn't that indicative of the processor performance capability, and isn't that what you are interested in knowing? Or do you only care about old/legacy code that cannot use the new IP blocks (but even than old code might use a framework that can benefit from new features). And so on. I think if one is clear about all these things, the problem of "fairness" disappears, and you are only left with the problem of choosing what is relevant for you. And this is not just about SME or Apple. Would you dismiss technologies like AVX512 or crypto instructions because they are implemented in an asymmetrical fashion by different vendors? Or would you dismiss raytracing GPU tests because some vendors have better implementations? That's up to you to decide.


Shouldn't it be their responsibility to ensure that all CPUs are being used to the best of their capabilities? Or should they just take "under the table" gifts to include acceleration for one CPU and then wait for gifts to arrive from the other CPU vendors before bothering to include relevant acceleration for their CPUs?

They absolutely do that. GB6 supports the newest acceleration technologies for x86 platforms as well. I also don't really see why it is fair to penalize Apple, who give you a state-of-the-art matrix coprocessor in a tablet, just because Intel has decided to cut AVX-512 from their consumer CPUs.
 
Last edited:

Nothingness

Platinum Member
Jul 3, 2013
2,704
1,308
136
Is Intel running some GB CPU benchmark code on GPU? Any link/URL? I just want to know the real reason for the Object Detection test's "anomalous" result. Apple didn't announce SME support. GB6 updated their test suite with SME support just before the M4 reveal. So everyone is assuming that Apple is using SME. If they are, what kind of acceleration is GB using for Intel/AMD SoCs? If the test is properly accelerated for one CPU and not others, is that fair?
I previously posted a link comparing two Intel chips of the same generation, one with AMX, one without. The Object Detection speedup was already present there. So in a way one could argue GB favored Intel back then, especially given that far fewer people will be using Intel chips with AMX than Apple chips with AMX/SME.
 
Jul 27, 2020
17,673
11,405
106
I also don't really see why it is fair to penalize Apple, who give you a state-of-the-art matrix coprocessor in a tablet, just because Intel has decided to cut AVX-512 from their consumer CPUs.
I'm not saying Apple should be penalized. I'm just interested in the technical details of how they are achieving a better score. AMD still has AVX-512 enabled. The GB6 internals document has not been updated to say whether SME is being used. It also says that MobileNetV1 is being used, which I suppose is an outdated CNN-based detector? And that brings me to the problem. It's too early to say the M4 is incredible based on one possibly outdated test result. Come on back down to reality, folks, from whatever plane of existence you have transcended to in your euphoria-induced excitement.
 
Jul 27, 2020
17,673
11,405
106
So in a way one could argue GB favored Intel back then
Excellent point. So if there is evidence that GB is favoring or has favored certain vendors in certain tests, is it a relevant benchmark, and should people be excited over scores from a version bump released only recently?
 

roger_k

Member
Sep 23, 2021
102
215
86
I'm not saying Apple should be penalized. I'm just interested in the technical details of how they are achieving a better score. AMD still has AVX-512 enabled. The GB6 internals document has not been updated to say whether SME is being used. It also says that MobileNetV1 is being used, which I suppose is an outdated CNN-based detector? And that brings me to the problem. It's too early to say the M4 is incredible based on one possibly outdated test result. Come on back down to reality, folks, from whatever plane of existence you have transcended to in your euphoria-induced excitement.

Then don't look at that particular subtest. I really don't understand what this conversation is about. You are concerned about ML-focused workloads? That's fair! Then why not look at other workloads like code compilation, raytracing, or PDF rendering? Those use pretty much the same code on all platforms and there are no accelerators involved. And a passively cooled iPad is confidently outperforming the fastest desktop CPU cores here. If you don't think that is incredible, I don't know what is.

BTW, MobileNetV1 being outdated has very little relevance. It's still matrix multiplication. I don't care much for AI acceleration. I am a scientist; I care about fast vector processing and matrix multiplication for doing stats. And I can tell you that my matrix-heavy Python and R scripts get a 2x performance improvement just from linking to Apple's BLAS library, which uses the vector coprocessor. So yes, I am excited by SME, because it means I can use these features in my own high-performance numerical code.
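(If anyone wants to try the same from native code, Apple's BLAS is just the Accelerate framework; a minimal C sketch of the kind of matrix multiply that benefits:)

```c
// Minimal sketch: a double-precision matrix multiply through Apple's BLAS
// (Accelerate framework), the same library NumPy/R can be linked against.
// Build on macOS with: clang gemm.c -o gemm -framework Accelerate
#include <stdio.h>
#include <stdlib.h>
#include <Accelerate/Accelerate.h>

int main(void) {
    const int n = 1024;
    double *a = malloc(sizeof(double) * n * n);
    double *b = malloc(sizeof(double) * n * n);
    double *c = malloc(sizeof(double) * n * n);
    for (int i = 0; i < n * n; i++) {
        a[i] = (double)rand() / RAND_MAX;
        b[i] = (double)rand() / RAND_MAX;
    }
    // C = 1.0 * A * B + 0.0 * C, row-major, all matrices n x n.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, a, n, b, n, 0.0, c, n);
    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}
```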
 