First Gracemont laptop available for sale.
Anybody got a disposable $500 to buy and test this laptop?
> The problem with IPC is that it heavily depends on the test suite. And Geekbench 6.3 is not the best one for comparing different platforms. Actually, in some cases, you can't compare results within one platform (SME). The only viable metric is the actual performance of the apps you're using at a fixed power limit.

This is why SPEC exists, and we have SPEC results for M4. It is the industry standard, which is why Intel and ARM use it.
> Also why do you always harp on SME?

It's a cluster-level accelerator, so kinda cheese.
The P-Core SME unit can provide up to 16 INT8 TOPS, beating out Intel's 5 INT8 TOPS on Lunar Lake provided by its extensions.
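One hedged way to sanity-check a figure like that is to decompose TOPS into per-cycle work. In the sketch below, only the 16 TOPS claim comes from the post above; the 4 GHz clock is purely an assumption, not a published spec:

```c
#include <stdio.h>

/* Back-of-envelope decomposition of an INT8 TOPS figure.
   Only the 16 TOPS claim comes from the discussion above;
   the clock is an ASSUMPTION for illustration. */
int main(void) {
    double tops = 16.0;  /* claimed INT8 TOPS of the SME unit */
    double ghz  = 4.0;   /* ASSUMED accelerator clock, in GHz */

    double ops_per_cycle  = tops * 1e12 / (ghz * 1e9);
    double macs_per_cycle = ops_per_cycle / 2.0;  /* 1 MAC = mul + add */

    printf("%.0f INT8 ops/cycle, i.e. %.0f MACs/cycle\n",
           ops_per_cycle, macs_per_cycle);  /* 4000 ops, 2000 MACs */
    return 0;
}
```

At an assumed 4 GHz, 16 TOPS implies roughly 2000 INT8 MACs per cycle, which is the scale of a wide matrix-outer-product unit rather than ordinary SIMD pipes.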
> This is why SPEC exists, and we have SPEC results for M4. It is the industry standard, which is why Intel and ARM use it.

When you compare two platforms and want to be objective, you need to ensure that both platforms are in the same conditions. SME gives one platform a benefit over the other, and it is used in three tests.
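To make the "used in three tests" point concrete, here is a toy calculation (all scores invented for illustration, and a 10-subtest suite assumed): Geekbench-style composites are geometric means, so tripling 3 of 10 subtest scores lifts the composite by about 39%.

```c
/* compile with: cc geomean.c -lm */
#include <math.h>
#include <stdio.h>

#define N 10  /* hypothetical subtest count */

/* Geometric mean, as used for benchmark composites. */
static double geomean(const double *s, int n) {
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(s[i]);
    return exp(log_sum / n);
}

int main(void) {
    double scores[N];
    for (int i = 0; i < N; i++)
        scores[i] = 1.0;                 /* baseline composite = 1.000 */

    /* Suppose a matrix extension triples 3 of the 10 subtests. */
    scores[0] = scores[1] = scores[2] = 3.0;

    printf("composite: %.3f\n", geomean(scores, N));  /* ~1.390 */
    return 0;
}
```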
Also why do you always harp on SME? AVX-512 also bumps the score in every Geekbench version past 5.1, so one can say you can't compare any Zen 4/5 score using Geekbench to any Intel/Apple CPU.
Even AMD cherry-picked the Geekbench subtests. Oh, and guess what: Intel CPUs also support AVX-VNNI, which Geekbench 6 supports and which also bumps up the score.
With Lunar Lake supporting AVX-VNNI, its Geekbench score will also increase.
It looks like you just want to discredit the lead Apple has in IPC, which is a big one.
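For context on what an AVX-VNNI "bump" means in practice, here is a minimal sketch of the kind of INT8 dot-product kernel it accelerates. It uses the VPDPBUSD intrinsic, spelled `_mm256_dpbusd_avx_epi32` for the VEX-encoded AVX-VNNI form in Intel's intrinsics guide and gcc's `-mavxvnni`; the AVX-512 VNNI form is `_mm256_dpbusd_epi32`:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of n u8 x s8 pairs, n a multiple of 32.
   One VPDPBUSD fuses 32 multiplies plus the adds into i32 lanes,
   replacing a long scalar multiply/accumulate chain. */
int32_t dot_u8s8(const uint8_t *a, const int8_t *b, size_t n) {
    __m256i acc = _mm256_setzero_si256();
    for (size_t i = 0; i < n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        /* Each i32 lane += sum of 4 adjacent u8*s8 products. */
        acc = _mm256_dpbusd_avx_epi32(acc, va, vb);
    }
    /* Horizontal sum of the 8 i32 lanes. */
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc),
                              _mm256_extracti128_si256(acc, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}
```

Any benchmark subtest built around kernels like this one scores very differently depending on whether the CPU exposes the instruction, which is exactly the comparability problem being argued about here.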
> When you compare two platforms and want to be objective, you need to ensure that both platforms are in the same conditions. SME gives one platform a benefit over the other, and it is used in three tests.

And that test will show you how much SME can bring. You just have to be sure your application can benefit from it. And it helped show that SME seems to bring more performance than AMX (which matters for people who care about matrix computations).
> AVX512-VNNI, as well as AMX, is supported only on some server Intel CPUs and is used in two tests. The more generic AVX-512 set is available on some server Intel processors and recent AMD ones, but not on consumer Intel products (except Tiger Lake). That also does not make the comparison more objective.

Same here. I want a CPU with AVX-512 because I've seen what it brings in some benchmarks that I know have similarities to code I develop as a hobby. So my next CPU will be an AMD. I thank benchmarks for showing me this.
> All those cases make the GB6 results not comparable, not only across different platforms (ARM and x86) but within each platform as well. You simply can't compare them objectively. And this is what the Geekbench team wrote in the release notes for 6.3.

Some subtests of GB are definitely useful and can be compared across OSes, ISAs, and platforms (clang, for instance).
> How often do you use Perl in 2024? How frequently do you compile C? Also, I don't think you model explosions, atmosphere, fluids, weather, etc.

SPEC's Perl test shows things about branch prediction that apply to most end-user code.
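As a minimal sketch of why interpreter workloads generalize: perlbench-style code spends its time in dispatch loops like the toy one below (not SPEC code), where the data-dependent switch is what exercises the branch predictor, the same structure found in scripting runtimes, parsers, and plenty of ordinary application code.

```c
#include <stdio.h>

/* Toy interpreter dispatch loop: the switch target depends on data,
   so the branch predictor, not the ALUs, dominates performance. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

int main(void) {
    unsigned seed = 12345u;  /* cheap LCG makes the opcode stream hard to predict */
    long acc = 1;
    for (int i = 0; i < 10000000; i++) {
        seed = seed * 1103515245u + 12345u;
        switch ((seed >> 16) % OP_COUNT) {
        case OP_ADD: acc += 3; break;
        case OP_SUB: acc -= 1; break;
        case OP_MUL: acc = (acc * 2) % 1000003; break;
        }
    }
    printf("%ld\n", acc);  /* keep the loop from being optimized away */
    return 0;
}
```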
> Another group of tests is related to AI, media processing, rendering, etc. Those tasks, in most cases, are handled by the GPU, media accelerators, and hardware NPU units. There's no reason to use a CPU for that, because a GPU is tens of times faster.

And when you end up with a HW media accelerator that doesn't support the shiny new compression format, you'll fall back to the CPU, where the profile of SW encoders/decoders measured by benchmarks starts being relevant.
> So we end up with the conclusion that the only viable metric is the actual performance of the apps you're using at a fixed power limit.

Yes, agreed. And that's the most important remark. It applies perfectly well to GPU reviews: pick the game you're interested in; the rest most likely doesn't show you much.
> This is a good question, considering that SME support is limited to the Apple M4. It already has a 38 TOPS NPU, which can do all those operations a few times faster while consuming less power than a CPU with all cores fully loaded. Actually, that's why other companies like AMD, Qualcomm, ARM, etc. do not add matrix extension support.

AMD added AVX-VNNI in Zen 5, and Intel extended the AVX-VNNI extensions with INT8 support in Lunar Lake. These extensions ultimately help with AI workloads.
If you listened to the ARM Computex keynote/Q&A, they explained SME/SVE/KleidiAI. In short, the virtue of adding SME to the CPU is that the CPU is the most general-purpose processor, and unless an application is specifically programmed to be accelerated by the NPU, it will run on the CPU. That's where the benefit of adding SME to the CPU lies.
Also, there is a benefit to doing matrix operations on the SME unit instead of the NPU: the SME unit has lower latency than going out to the NPU. This is great for matrix operations that are low-throughput but latency-sensitive.
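A hedged back-of-envelope model of that trade-off: the 16 and 38 TOPS figures below are the ones quoted in this thread, while the 100 µs NPU handoff latency is purely an assumed placeholder. Offload only wins once the task is big enough to amortize the handoff.

```c
/* compile with: cc model.c -lm */
#include <math.h>
#include <stdio.h>

/* Toy model: time = dispatch latency + ops / throughput.
   The 100 us NPU handoff latency is an ASSUMPTION for illustration;
   the TOPS figures are the ones quoted in this thread. */
int main(void) {
    const double sme_tops      = 16.0;    /* CPU-side SME throughput  */
    const double npu_tops      = 38.0;    /* NPU throughput           */
    const double npu_latency_s = 100e-6;  /* assumed off-CPU handoff  */

    for (int e = -3; e <= 2; e++) {
        double ops   = pow(10.0, e) * 1e9;  /* 1 MOP .. 100 GOP */
        double t_sme = ops / (sme_tops * 1e12);
        double t_npu = npu_latency_s + ops / (npu_tops * 1e12);
        printf("%10.0f MOPs: SME %9.1f us, NPU %9.1f us -> %s\n",
               ops / 1e6, t_sme * 1e6, t_npu * 1e6,
               t_sme < t_npu ? "SME wins" : "NPU wins");
    }
    return 0;
}
```

With these placeholder numbers the crossover lands near 2-3 GOPs of work, so small, latency-sensitive matrix operations stay on the CPU, exactly as described above.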
> The ARM X925 will not support SME, according to the technical specs.

They don't, but they should. They seem to love writing articles and talking about it. Anyway, the SME stuff should go to the ARM thread.
> Also, it looks strange to add a lot of silicon for a powerful NPU and then promote running matrix operations on the CPU. I'm sure there will be some kind of legacy mode where some simple models run on the CPU, but that will only happen when there is no NPU.

But Intel added these extensions to Lunar Lake as well, where an NPU will always be present. There might be some benefit to running AI workloads on a CPU.
As for Intel Skymont, it will support AVX-VNNI. I don't think we will see extensive use of that in actual apps, because most developers will be focused on NPUs and the appropriate APIs (like DirectML).
Also, it looks strange to add a lot of silicon for a powerful NPU and then promote running matrix operations on the CPU. I'm sure there will be some kind of legacy mode where some simple models run on the CPU, but that will only happen when there is no NPU.
CPU matrix operations will still be used even with an NPU present. There are latency and power penalties for going off-CPU, so simple, small tasks will stay on the CPU. The NPU opens up the possibility of handling somewhat larger tasks for a little more power.
GPUs will still handle the biggest tasks, though with a much bigger power penalty.
When an app uses them a lot, it makes sense to offload those tasks to the NPU or GPU and get a significant improvement in speed and power consumption. Usually, the actual models are pretty big, and it makes sense to use the NPU or GPU anyway.
> GPUs will still handle the biggest tasks, though with a much bigger power penalty.

I am not sure of that. For instance, in the Snapdragon X Elite, the NPU is more powerful than the GPU (45 TOPS vs 30 TOPS). It's a similar situation in Strix Point, I think.
Not all matrix operations are LLMs.
Skymont is back in the 1mm2 range. It's 0.95-1mm2, about the same as Crestmont. Gracemont was 1.6mm2.
I am waiting till someone confirms your measurement (see the Lunar Lake thread).
> I don't know how Intel managed to pull a rabbit out of their dingy hat and deliver the lowest latency in a PGSQL workload.

Mono-die Xeons have always been useful for that. The chart basically needs SPR-MCC.
It's scary. And if Intel somehow gave the 6780E a sizeable cache (~1 GB), they could have a monster on their hands.