SME is not only for matrix operations; it also includes a subset of HPC-oriented SVE instructions. Apple's implementation uses 512-bit vectors, so comparisons to AVX-512 are appropriate as long as one stays within the HPC domain.
It's not a 1:1 comparison, as AVX-512 is a mess because Intel couldn't decide what they wanted from it. It does include some matrix math acceleration, but it's also a much more broadly scoped math ISA. Intel then decided to include a whole separate accelerator unit specifically for matrix math, which is when they introduced Advanced Matrix Extensions (AMX), but they only include it (at least for now) in their server offerings. On the other hand, ARM's SVE/SVE2 also has some matrix math instructions, but ARM decided to additionally introduce a new ISA that specifically targets matrix math acceleration: the Scalable Matrix Extension (SME). SME's main purpose is to greatly increase matrix math processing speed, but ARM also allows it to extend a limited number of SVE(2) instructions via the streaming SVE mode to, as far as I can tell, let the SME unit enable wider operations than the SVE units can natively support (but again, only on a limited set of operations).
So basically, GB6 has been using Intel's AMX all along. While Apple had its own AMX, it wasn't utilized by GB6 because you have to use Apple's CoreML library to even target it, and CoreML will automatically dispatch your code to the NPU, AMX, or GPU, so there was no way to guarantee which one actually ran it. Hence, it was left out of GB6.
This is just Apple/ARM having more feature parity with x86 instructions. x86 vs M4 scores for GB6 are valid then.
AMX isn't supported on client CPUs at this time, which is where all the comparisons have been.
Looks like AMD also benefits from AVX512-VNNI?
Zen 4:
Benchmark results for a Micro-Star International Co., Ltd. MS-7E16 with an AMD Ryzen 7 7700X processor.
Zen 3:
Benchmark results for a Gigabyte Technology Co., Ltd. B550M DS3H with an AMD Ryzen 7 5700X processor.
Huge uplift in the Object Detection section. So is anyone going to calculate IPC for Zen 4 without Object Detection now, because of the AI shenanigans?
Zen 4 IPC wasn't calculated from GB6, so no need to recalculate it.
Zen 4 does seem to have a large uplift in object detection over Zen 3, but it doesn't see the same uplift compared to ADL, which doesn't support AVX512-VNNI. So it seems that either GB will use even AVX-VNNI (which ADL does support, as far as I know) for object detection, or the increase comes from some other improvement in the architecture.
Comparing them to the M3, Zen 4 doesn't seem to have an advantage in this sub-test either, so it's much the same conclusion: either the sub-test was already getting some instruction-set support across all of the latest CPUs, or there's another reason they all perform fairly close to each other.
Whatever the case, it's clear that once AMX/SME come into the picture, there's a much larger acceleration occurring, as can be seen in the ADL vs. Zen 4 vs. SPR vs. M4 comparisons. The M4 with SME seems to have the biggest advantage in this sub-test for the single-core run, even more so than SPR with AMX. Overall, it really shouldn't matter: no one should be taking a GB overall score as a standard for "IPC" anyway. We will (hopefully) get SPEC and other actual app benchmarks soon enough to get a clearer picture.