Question Incredible Apple M4 benchmarks...


poke01

Platinum Member
Mar 8, 2022
2,583
3,409
106
There were rumours that the M4 is going to be more AI-focused. I guess SME was part of it. So right now Apple has faster AI accelerators than a Xeon in an iPad.

Yeah, those rumours that Apple is building their own AI servers might be true.
Here is 12700T vs w5-3435X, both based on Golden Cove, the latter having AMX: https://browser.geekbench.com/v6/cpu/compare/5330257?baseline=4656765
I didn't pick the fastest scores, but checked both are using the same GB version.

The effect of AMX can be seen on Object Detection. It also looks like Background Blur is affected.
Thanks for this.
 
Reactions: Orfosaurio

poke01

Platinum Member
Mar 8, 2022
2,583
3,409
106
Looks like AMD also benefits from AVX512-VNNI?

Zen 4:
Zen 3:

Huge uplift in the Object Detection section. So is anyone going to calculate IPC without Object Detection now for Zen 4, because of AI shenanigans?
 

poke01

Platinum Member
Mar 8, 2022
2,583
3,409
106
Looks like Intel's adding more neural extensions to Lunar Lake and Arrow Lake?


I guess we gotta check the IPC improvement for Intel's upcoming CPUs without all these extensions now…
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Looks like Intel's adding more neural extensions to Lunar Lake and Arrow Lake?


I guess we gotta check the IPC improvement for Intel's upcoming CPUs without all these extensions now…

Fine by me. I know you're trying to make some kind of point about people being unfair to Apple, and maybe there is some of that going on; I think my posting history makes clear I'm not one of those - but there absolutely is value, a lot of it, in knowing what a core is likely to do with a random existing application that isn't implementing the buzzword of the week and isn't affected by specialized acceleration for it.

This applies to everyone - Intel, AMD, Apple, IBM, anyone else.
 

SpudLobby

Senior member
May 18, 2022
991
684
106
Fine by me. I know you're trying to make some kind of point about people being unfair to Apple, and maybe there is some of that going on; I think my posting history makes clear I'm not one of those - but there absolutely is value, a lot of it, in knowing what a core is likely to do with a random existing application that isn't implementing the buzzword of the week and isn't affected by specialized acceleration for it.

This applies to everyone - Intel, AMD, Apple, IBM, anyone else.
Agreed
 
Reactions: SarahKerrigan

poke01

Platinum Member
Mar 8, 2022
2,583
3,409
106
Fine by me. I know you're trying to make some kind of point about people being unfair to Apple, and maybe there is some of that going on; I think my posting history makes clear I'm not one of those - but there absolutely is value, a lot of it, in knowing what a core is likely to do with a random existing application that isn't implementing the buzzword of the week and isn't affected by specialized acceleration for it.

This applies to everyone - Intel, AMD, Apple, IBM, anyone else.
Completely agree. Just having some fun with the GB6 charts I've been seeing on Weibo and Twitter. It’s been real fun learning all these extensions.

Let’s be honest, there is some bias against Apple and Intel on the internet, and that’s to be expected.
 

Hitman928

Diamond Member
Apr 15, 2012
6,391
11,392
136
SME is not only for matrix operations, it also includes a subset of HPC-oriented SVE instructions. Apple's implementation uses 512-bit vectors, so comparisons to AVX-512 are appropriate as long as one stays within the HPC domain.

It's not a 1:1 comparison, as AVX-512 is a mess because Intel couldn't decide what they wanted from it. It does include some matrix math acceleration but is also a much more broadly based math instruction set. Intel then decided to include a whole separate accelerator unit specifically for matrix math, which is when they introduced Advanced Matrix Extensions (AMX), but they only include it (at least for now) in their server offerings. On the other hand, ARM's SVE/SVE2 also have some matrix math instructions, but ARM decided to also include a new instruction set that specifically targets matrix math acceleration, the Scalable Matrix Extension (SME). SME's main purpose is to greatly increase matrix math processing speed, but ARM also allows it to extend a limited number of SVE(2) instructions with the streaming SVE mode to, as far as I can tell, allow the SME tiles to enable larger-width operations than the SVE units can natively support (but again, only on a limited set of operations).

So basically, GB6 has been using Intel's AMX all along. While Apple had AMX, it wasn't utilized by GB6 because you have to use Apple's CoreML library to even target it. CoreML will automatically run your code through NPU, AMX, or GPU so there was no way to guarantee it. Hence, it was left out of GB6.

This is just Apple/ARM having more feature parity with x86 instructions. x86 vs M4 scores for GB6 are valid then.

AMX isn't supported on client CPUs at this time, which is where all the comparisons have been.

Looks like AMD also benefits from AVX512-VNNI?

Zen 4:
Zen 3:

Huge uplift in the Object Detection section. So is anyone going to calculate IPC without Object Detection now for Zen 4, because of AI shenanigans?

Zen 4 IPC wasn't calculated from GB6, so no need to recalculate it.

Zen 4 does seem to have a large uplift in object detection against Zen 3, but it doesn't see the same uplift compared to ADL, which doesn't support AVX512-VNNI. So it seems that either GB will even use AVX-VNNI (which ADL does support, as far as I know) for object detection, or the increase is due to another reason from the improved architecture. Comparing them to M3, Zen 4 doesn't seem to have an advantage in this sub test either, so kind of the same conclusion: either the sub test was already getting some support across all of the latest CPUs' instruction sets, or there's another reason they all seemed to perform fairly close to each other.

Whatever the case, it's clear that once AMX/SME come into the picture, there's a much larger acceleration occurring, as can be seen in the ADL vs. Zen 4 vs. SPR vs. M4 comparisons. M4 with SME seems to have the biggest advantage in this sub test for the single core run, even more so than SPR with AMX. Overall, it really shouldn't matter. No one should be taking a GB overall score as a standard for "IPC" anyway. We will (hopefully) get SPEC and other actual app benchmarks soon enough to get a clearer picture.
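
To put a rough number on how much one accelerated sub test can move a composite, here is a minimal sketch. It assumes the composite is a plain geometric mean of sub test scores, which may not match Geekbench's actual weighting, and every score below is made up for illustration rather than taken from a real run. The helper names (`geomean`, `composite_without`) are hypothetical.

```python
import math

def geomean(scores):
    """Geometric mean of a collection of positive scores."""
    values = list(scores)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def composite_without(subtests, excluded):
    """Composite recomputed with one named sub test left out."""
    return geomean(v for name, v in subtests.items() if name != excluded)

# Hypothetical sub test scores (illustrative only, not measured data).
baseline = {"Clang": 3000, "File Compression": 3000, "Object Detection": 3000}
accelerated = {"Clang": 3300, "File Compression": 3300, "Object Detection": 6600}

# Generational gain with every sub test included.
full_gain = geomean(accelerated.values()) / geomean(baseline.values())

# Same comparison with the accelerated sub test removed from both sides.
trimmed_gain = (composite_without(accelerated, "Object Detection")
                / composite_without(baseline, "Object Detection"))

print(f"gain with all sub tests:       {full_gain:.3f}x")
print(f"gain without Object Detection: {trimmed_gain:.3f}x")
```

In this toy case the other sub tests improve by the same factor, so the trimmed figure is simply that factor, while the full figure is pulled upward by the one outlier.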
 
Jul 27, 2020
20,917
14,493
146
No one should be taking a GB overall score as a standard for "IPC" anyway.
GB is getting more and more useless with each update. They need to stop with the ST/MT crap scores since the MT score is moot anyway, divide the tests into categories and derive a single composite score from the category scores. Composite score seems too high? Dive into the category score comparison to see which one is the highest. Category score seems too high? Compare the subtests in that category.
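
The drill-down described here could look something like the sketch below. The category names, the grouping of sub tests into categories, and the use of geometric means at both levels are all assumptions for illustration, not how Geekbench actually scores, and the numbers are invented.

```python
import math

def geomean(values):
    """Geometric mean of positive values."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def category_scores(results):
    """One score per category: geometric mean of that category's sub tests."""
    return {cat: geomean(subs.values()) for cat, subs in results.items()}

def composite(results):
    """Single composite: geometric mean of the category scores."""
    return geomean(category_scores(results).values())

def drill_down(results, baseline):
    """Find the category, then the sub test inside it, with the largest gain."""
    new_cats = category_scores(results)
    old_cats = category_scores(baseline)
    cat_ratio = {c: new_cats[c] / old_cats[c] for c in results}
    top_cat = max(cat_ratio, key=cat_ratio.get)
    sub_ratio = {s: results[top_cat][s] / baseline[top_cat][s]
                 for s in results[top_cat]}
    top_sub = max(sub_ratio, key=sub_ratio.get)
    return top_cat, top_sub

# Hypothetical categories and scores (illustrative only).
old_chip = {
    "Developer": {"Clang": 3000, "Navigation": 3000},
    "Machine Learning": {"Object Detection": 3000, "Background Blur": 3000},
}
new_chip = {
    "Developer": {"Clang": 3300, "Navigation": 3300},
    "Machine Learning": {"Object Detection": 6600, "Background Blur": 3600},
}

print("composite gain:", round(composite(new_chip) / composite(old_chip), 3))
print("largest outlier:", drill_down(new_chip, old_chip))
```

The point of the two-level layout is that a suspicious composite can be traced first to a category and then to the sub test responsible, instead of being hidden inside one ST or MT number.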
 
Reactions: Tlh97

Boland

Junior Member
May 13, 2024
2
2
16

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
License disclosure indicates they are in fact using some of DXVK's code. They could open source it to show to what extent they are using it but they won't because Apple.

Edit: It could be that it's entirely in the (L)GPL Wine components, but since they bundle it all together, who knows. But even still, it shows that as usual Apple will build proprietary solutions on top of open ones and refuse to open that. At least in this case CodeWeavers got a nice paycheck (presumably) for doing the dirty work.
 
Reactions: KompuKare

Nothingness

Diamond Member
Jul 3, 2013
3,137
2,153
136
GB is getting more and more useless with each update. They need to stop with the ST/MT crap scores since the MT score is moot anyway, divide the tests into categories and derive a single composite score from the category scores. Composite score seems too high? Dive into the category score comparison to see which one is the highest. Category score seems too high? Compare the subtests in that category.
*All* benchmarks need digging into subtests. The most stupid one is AnTuTu, which aggregates CPU/GPU/MEM/UX scores as a single figure.

What I find annoying with Geekbench is that now you don't get the INT and FP aggregated scores anymore; it's either individual test or the global INT+FP score. But even that separation would not have been enough to understand why M4 was getting such a high score. This issue already happened with SPEC results where some compilers from Sun and Intel were "breaking" some of the tests; you had to isolate the outliers to get a better picture of the performance of a CPU.
 

jeanlain

Member
Oct 26, 2020
159
136
116
At least in this case CodeWeavers got a nice paycheck (presumably) for doing the dirty work.
They also apparently got permission to include Apple's game porting toolkit in CrossOver, despite licensing terms suggesting this should not be allowed.
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
They also apparently got permission to include Apple's game porting toolkit in CrossOver, despite licensing terms suggesting this should not be allowed.
No, that's definitely allowed - all the LGPL and GPL components have their source available.
 

poke01

Platinum Member
Mar 8, 2022
2,583
3,409
106
Zen 4 IPC wasn't calculated from GB6, so no need to recalculate it.
It’s clear IPC shouldn’t be derived from one benchmark.
M4 with SME seems to have the biggest advantage in this sub test for the single core run, even more so than SPR with AMX.
Since M4 is the first ARM SoC with SME, I do think it will take some time for real-world apps to implement SME. I’m not aware of any application (apart from GB) right now that supports SME, so we can’t test the uplift from M3 to M4 if an application uses SME.

Hopefully PyTorch, MATLAB, etc. support it soon.
 

Doug S

Platinum Member
Feb 8, 2020
2,889
4,912
136
No, what annoys us is Apple's walled garden. If they would let us run Windows and our current games library with proper DX12/Vulkan drivers, why would we be angry?

What is Apple doing to prevent you from running Windows? Are you complaining that they didn't, at their own expense, develop DX12 drivers for their GPU?

I'm not sure if it is still in force, but Microsoft had an exclusive deal with Qualcomm for Windows on ARM. So both of those companies share the blame for the fact that you can't buy a MacBook and natively boot Windows on it.

If Apple really wanted to go walled garden on the Mac, they would have locked the bootloader to prevent booting other operating systems. Something that some Windows PCs have done - do you complain about them, or do you not consider that a "walled garden" because all you care about is running Windows?
 

Doug S

Platinum Member
Feb 8, 2020
2,889
4,912
136
Fine by me. I know you're trying to make some kind of point about people being unfair to Apple, and maybe there is some of that going on; I think my posting history makes clear I'm not one of those - but there absolutely is value, a lot of it, in knowing what a core is likely to do with a random existing application that isn't implementing the buzzword of the week and isn't affected by specialized acceleration for it.

This applies to everyone - Intel, AMD, Apple, IBM, anyone else.

I agree, and that's why I always pay most of my attention to gcc on SPEC and LLVM/clang on Geekbench. Linus is in the same camp on this. Those benchmarks can't be "broken" or gamed like some SPEC subtests have been, they aren't affected by SIMD or AMX/SVE type instructions that get slipped into Geekbench's dot versions, and they have a large enough footprint and enough random branchiness that they don't unfairly favor either large caches or high clock rates.

I wish Geekbench's web site would offer an easy way to view scores by subtest. If they did, I'd just compare everything by clang score and ignore the rest. Not saying the results for other stuff don't matter, but if you can get a 10% improvement on clang results you know the CPU really is 10% faster. Maybe you get a bigger benefit on something else, but if you do it is because that "something else" is benefiting from something beyond the CPU itself getting 10% faster.
 

Doug S

Platinum Member
Feb 8, 2020
2,889
4,912
136
It’s clear IPC shouldn’t be derived from one benchmark.

Since M4 is the first ARM SoC with SME, I do think it will take some time for real-world apps to implement SME. I’m not aware of any application (apart from GB) right now that supports SME, so we can’t test the uplift from M3 to M4 if an application uses SME.

Hopefully PyTorch, MATLAB, etc. support it soon.

Something that could easily benefit from it, like MATLAB, will already be using Accelerate library calls that leverage the AMX instructions in existing Apple Silicon, so it won't get any faster from SME. In fact, on an M4 device you wouldn't even need to update the app, because the Accelerate library will simply call SME instructions instead of AMX instructions to perform its functions.
 

jeanlain

Member
Oct 26, 2020
159
136
116
No, that's definitely allowed - all the LGPL and GPL components have their source available.
I may be mistaken, but IIRC Apple's license said that one must not embed the game porting toolkit in a product, and/or that the toolkit was for development only.
But the terms may have changed.
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
I may be mistaken, but IIRC Apple license said that one must not embed the game porting toolkit in a product, and/or that the toolkit was for development only.
But the terms may have changed.
Oh, I misunderstood you. In any case, Apple too can offer different license terms to different people. We can't, but CodeWeavers can integrate it.
 

Boland

Junior Member
May 13, 2024
2
2
16
License disclosure indicates they are in fact using some of DXVK's code. They could open source it to show to what extent they are using it but they won't because Apple.

Edit: It could be that it's entirely in the (L)GPL Wine components, but since they bundle it all together, who knows. But even still, it shows that as usual Apple will build proprietary solutions on top of open ones and refuse to open that. At least in this case CodeWeavers got a nice paycheck (presumably) for doing the dirty work.
The licence disclosure lists DXVK because GPTK (as a whole) uses Wine/CrossOver. D3DMetal has nothing to do with that, as the engineer behind it plainly stated. Besides, DXVK is a DX11 to Vulkan translator; it doesn’t do DX12. Apple DID give CodeWeavers permission to bundle D3DMetal with CrossOver though.
 
Reactions: Viknet and jeanlain

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
Apple DID give CodeWeavers permission to bundle D3DMetal with CrossOver though.
In closed form & in exchange for access to their patches over Wine. Meanwhile Mesa has Microsoft contributors and has for years. Apple's not really changing at all. Use wherever possible, give back as little as possible and avoid GPL like the plague. The other American megacorps at least do play the open source game for non-core components. And can you think of anything less core to Apple's success than keeping D3DMetal closed? But they do it anyway.
 
Reactions: igor_kavinski

mikegg

Golden Member
Jan 30, 2010
1,849
471
136
AMX isn't supported on client CPUs at this time, which is where all the comparisons have been.
But it's supported on GB6, which is what these comparisons have been. See these posts below:



Here is 12700T vs w5-3435X, both based on Golden Cove, the latter having AMX: https://browser.geekbench.com/v6/cpu/compare/5330257?baseline=4656765
I didn't pick the fastest scores, but checked both are using the same GB version.

The effect of AMX can be seen on Object Detection. It also looks like Background Blur is affected.
 
Reactions: Orfosaurio