Question Geekbench 6 released and calibrated against Core i7-12700


TwistedAndy

Member
May 23, 2024
139
104
71
I've found interesting results illustrating the impact of SME on the Apple M4.

Here is a comparison of two runs on the same Apple M4 device by the same user:

View attachment 101848
The only difference is the test version: Geekbench 6.3 supports SME; Geekbench 6.2 does not.

Also, it's pretty obvious which tests are affected:

View attachment 101850
And there's no way to filter Geekbench results by the benchmark version.

Garbage benchmark.
 

Eug

Lifer
Mar 11, 2000
23,753
1,309
126
I've found interesting results illustrating the impact of SME on the Apple M4.

Here is a comparison of two runs on the same Apple M4 device by the same user:

View attachment 101848

The only difference is the test version: Geekbench 6.3 supports SME; Geekbench 6.2 does not.

Also, it's pretty obvious which tests are affected:

View attachment 101850

And there's no way to filter Geekbench results by the benchmark version.

Garbage benchmark.
We had this discussion weeks ago. We also calculated the impact of SME. In addition, Geekbench 5 results are available for M4.
 

TwistedAndy

Member
May 23, 2024
139
104
71
We had this discussion weeks ago. We also calculated the impact of SME.

Yep. Now we have actual results from the same device and OS version.

There are some other results run on iOS 18 with GB 6.2 (3476), but they are not much different.

Unfortunately, Geekbench does not allow us to filter results by the benchmark version.

Luckily, the results for Geekbench 5 are still available. They are way more accurate and closer to SPEC.
 
Jul 27, 2020
18,002
11,728
116
Geekbench does use many libraries that are used in both commercial and open source applications.
For compression, LZ4/ZSTD: Mostly used in Linux?

Navigation: How many self-taught developers, who mostly learned by shooting themselves in the foot and figuring out what NOT to do just to get software to compile, know about Dijkstra's algorithm? I bet most non-CS-background programmers think the problem is easy and would just brute-force such routes.
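For anyone unfamiliar with it, here is a minimal sketch of Dijkstra's algorithm with a binary heap, which is why route-finding doesn't need brute-force enumeration. This is purely illustrative; Geekbench's Navigation workload is closed source, so this is not its code.

```cpp
// Minimal sketch of Dijkstra's algorithm with a binary heap.
// Illustrative only -- not Geekbench's Navigation code.
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, uint64_t>;  // (neighbour, edge weight)

std::vector<uint64_t> dijkstra(const std::vector<std::vector<Edge>>& adj, int src) {
    const uint64_t INF = std::numeric_limits<uint64_t>::max();
    std::vector<uint64_t> dist(adj.size(), INF);
    // Min-heap ordered by tentative distance.
    using Item = std::pair<uint64_t, int>;  // (distance so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<>> heap;
    dist[src] = 0;
    heap.push({0, src});
    while (!heap.empty()) {
        auto [d, u] = heap.top();
        heap.pop();
        if (d != dist[u]) continue;      // stale heap entry, skip it
        for (auto [v, w] : adj[u]) {     // relax each outgoing edge
            if (d + w < dist[v]) {
                dist[v] = d + w;
                heap.push({dist[v], v});
            }
        }
    }
    return dist;
}
```

With a heap this runs in roughly O((V + E) log V) time, versus the exponential blow-up of enumerating every possible route.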

HTML5 browser: Why doesn't GB just use a full open source browser for this workload instead of trying to mimic one?

PDF Render: Again, it's using only the PDF component in isolation and not simulating how a browser with other tabs open would generate a workload or how a browser would first launch, load its engine into RAM and then display the PDF.

Photo Library: Mimicry instead of using real downloadable software for such user tasks.

Clang: This is a more real world workload but is it testing compilation only or JIT compilation too? Not clear if it's doing both. Is it running some workload in Lua after compiling it?

Text processing: What kind of convoluted mess is this? Why is there an encrypted in-memory file system being used? Which real text processing software does this? And using Python that can never achieve true parallelism due to its GIL issue? How many normal non-developer folks use Python for multicore workloads in their daily lives?

Asset compression: This is a workload that is even more limited in scope as it targets a specific species of developer: the Game Developer, renowned for the ability to CRUNCH, CRUNCH, CRUNCH with minimum sleep and maximum caffeine. What is this workload going to tell the average user?

Object Detection: Why not use a real open source software for this purpose?

Background Blur: Not a bad workload but only 10 frames? How can those 10 frames capture the rapid head movement of a typical user on cam?

Object Remover: Again, why not use Gimp?

Horizon Detection: Why no Gimp?

Photo filter: Doesn't mention a library so maybe written from scratch?

HDR: This is also not using a library and written from scratch meaning it may not be reflective of HDR operations done in real applications.

Raytracer: No issues with this since it's using Embree.

Structure from Motion: No mention of library. Custom code.

And the remaining tests don't mention any libraries either.

This benchmark is mostly a mishmash of custom code with the occasional widely used library. How can it possibly reflect real world usage?

I pray that your much-awaited eekBench does not suffer from the above mentioned pitfalls.
 

Nothingness

Platinum Member
Jul 3, 2013
2,769
1,429
136
For compression, LZ4/ZSTD: Mostly used in Linux?

Navigation: How many self-taught developers, who mostly learned by shooting themselves in the foot and figuring out what NOT to do just to get software to compile, know about Dijkstra's algorithm? I bet most non-CS-background programmers think the problem is easy and would just brute-force such routes.

HTML5 browser: Why doesn't GB just use a full open source browser for this workload instead of trying to mimic one?

PDF Render: Again, it's using only the PDF component in isolation and not simulating how a browser with other tabs open would generate a workload or how a browser would first launch, load its engine into RAM and then display the PDF.

Photo Library: Mimicry instead of using real downloadable software for such user tasks.

Clang: This is a more real world workload but is it testing compilation only or JIT compilation too? Not clear if it's doing both. Is it running some workload in Lua after compiling it?

Text processing: What kind of convoluted mess is this? Why is there an encrypted in-memory file system being used? Which real text processing software does this? And using Python that can never achieve true parallelism due to its GIL issue? How many normal non-developer folks use Python for multicore workloads in their daily lives?

Asset compression: This is a workload that is even more limited in scope as it targets a specific species of developer: the Game Developer, renowned for the ability to CRUNCH, CRUNCH, CRUNCH with minimum sleep and maximum caffeine. What is this workload going to tell the average user?

Object Detection: Why not use a real open source software for this purpose?

Background Blur: Not a bad workload but only 10 frames? How can those 10 frames capture the rapid head movement of a typical user on cam?

Object Remover: Again, why not use Gimp?

Horizon Detection: Why no Gimp?

Photo filter: Doesn't mention a library so maybe written from scratch?

HDR: This is also not using a library and written from scratch meaning it may not be reflective of HDR operations done in real applications.

Raytracer: No issues with this since it's using Embree.

Structure from Motion: No mention of library. Custom code.

And the remaining tests don't mention any libraries either.

This benchmark is mostly a mishmash of custom code with the occasional widely used library. How can it possibly reflect real world usage?

I pray that your much-awaited eekBench does not suffer from the above mentioned pitfalls.
Can't wait to see Geekbench become a 1 GB bundle of real application code, and then to see people complain that it's not using the very latest version of their preferred app.

Benchmarks such as Geekbench are proxies for existing applications. If you want to measure the speed of GIMP, then ask the GIMP community to design a specific benchmark for GIMP. It's the same as people saying, "Oh look, that SPEC x264 is stupid, it doesn't use AVX-4096!!!!!!!1111111111".
 
Reactions: Orfosaurio

Doug S

Platinum Member
Feb 8, 2020
2,508
4,113
136
For compression, LZ4/ZSTD: Mostly used in Linux?

Navigation: How many self-taught developers, who mostly learned by shooting themselves in the foot and figuring out what NOT to do just to get software to compile, know about Dijkstra's algorithm? I bet most non-CS-background programmers think the problem is easy and would just brute-force such routes.

HTML5 browser: Why doesn't GB just use a full open source browser for this workload instead of trying to mimic one?

PDF Render: Again, it's using only the PDF component in isolation and not simulating how a browser with other tabs open would generate a workload or how a browser would first launch, load its engine into RAM and then display the PDF.

Photo Library: Mimicry instead of using real downloadable software for such user tasks.

Clang: This is a more real world workload but is it testing compilation only or JIT compilation too? Not clear if it's doing both. Is it running some workload in Lua after compiling it?

Text processing: What kind of convoluted mess is this? Why is there an encrypted in-memory file system being used? Which real text processing software does this? And using Python that can never achieve true parallelism due to its GIL issue? How many normal non-developer folks use Python for multicore workloads in their daily lives?

Asset compression: This is a workload that is even more limited in scope as it targets a specific species of developer: the Game Developer, renowned for the ability to CRUNCH, CRUNCH, CRUNCH with minimum sleep and maximum caffeine. What is this workload going to tell the average user?

Object Detection: Why not use a real open source software for this purpose?

Background Blur: Not a bad workload but only 10 frames? How can those 10 frames capture the rapid head movement of a typical user on cam?

Object Remover: Again, why not use Gimp?

Horizon Detection: Why no Gimp?

Photo filter: Doesn't mention a library so maybe written from scratch?

HDR: This is also not using a library and written from scratch meaning it may not be reflective of HDR operations done in real applications.

Raytracer: No issues with this since it's using Embree.

Structure from Motion: No mention of library. Custom code.

And the remaining tests don't mention any libraries either.

This benchmark is mostly a mishmash of custom code with the occasional widely used library. How can it possibly reflect real world usage?

I pray that your much-awaited eekBench does not suffer from the above mentioned pitfalls.


Maybe you should read the rather detailed PDF John Poole provides about Geekbench before compiling this list, since a lot of it is irrelevant. For example, he tells you exactly what the Clang benchmark does (it does not include JIT compilation or running Lua), the object detection does use open-source code, etc.

Regarding HTML5, I don't think you've considered how useless a cross-platform benchmark would be that actually RAN a browser. If it ran Firefox, for example, it would be running a ton of OS-specific code (and using the GPU) to display the GUI, which would invalidate its numbers between Windows and Mac, between Android and iPhone, etc. It wouldn't be an "HTML5 test"; it would be a test of how efficient the operating system's memory management was as the browser started up and how fast it could draw the UI, and any actual HTML5 interpretation would be completely drowned out.

The same applies for "why not use GIMP". GIMP probably supports an option to run without the UI, but that doesn't mean it isn't going to be doing a ton of various startup tasks just to get to the point where it can do "object remover" etc. All that startup is going to make the results much more dependent on the operating system and be a poor test of the CPU.

Hopefully eek understands the "pitfalls" of benchmarking far far better than you do! I imagine @SarahKerrigan could expand greatly on my list of objections to your list of "pitfalls".
 

whoshere

Junior Member
Feb 28, 2020
21
46
91
For compression, LZ4/ZSTD: Mostly used in Linux?

GB is closed source and proprietary for one very important reason: the company behind it doesn't want CPU OEMs to optimize specifically for its workloads and win unfairly, and if GB started using open-source components, that's exactly what would happen. That's why almost all popular benchmarks are closed source.
 
Jul 27, 2020
18,002
11,728
116
GB is closed source and proprietary for one very important reason: the company behind it doesn't want CPU OEMs to optimize specifically for its workloads and win unfairly, and if GB started using open-source components, that's exactly what would happen. That's why almost all popular benchmarks are closed source.
Then I guess every win on Phoronix Test Suite must be a cheat.
 

Nothingness

Platinum Member
Jul 3, 2013
2,769
1,429
136
GB is closed source and proprietary for one very important reason: the company behind it doesn't want CPU OEMs to optimize specifically for its workloads and win unfairly, and if GB started using open-source components, that's exactly what would happen. That's why almost all popular benchmarks are closed source.
You don't need sources to tune a CPU for a benchmark.
 

TwistedAndy

Member
May 23, 2024
139
104
71
GB is closed source and proprietary for one very important reason: the company behind it doesn't want CPU OEMs to optimize specifically for its workloads and win unfairly, and if GB started using open-source components, that's exactly what would happen. That's why almost all popular benchmarks are closed source.
We already have that with Apple SME support (+10% boost).
 
Jul 27, 2020
18,002
11,728
116
Regarding HTML5, I don't think you've considered how useless a cross-platform benchmark would be that actually RAN a browser. If it ran Firefox, for example, it would be running a ton of OS-specific code (and using the GPU) to display the GUI, which would invalidate its numbers between Windows and Mac, between Android and iPhone, etc. It wouldn't be an "HTML5 test"; it would be a test of how efficient the operating system's memory management was as the browser started up and how fast it could draw the UI, and any actual HTML5 interpretation would be completely drowned out.
It's not like current GB scores are OS agnostic.
 
Jul 27, 2020
18,002
11,728
116
But hey only Apple has additional help right....
What we don't know is how easy it is to write good Apple SME code vs. good AVX-512 code. If AVX-512 is just harder to write for by its very nature, x86 would be at a disadvantage in this benchmark because its AVX-512 code would end up nearly useless. I'm not even sure the speed-up offered by these enhanced image functions is worth engaging the AVX-512 units.

TRANSPARENCY is important and the GB developer doesn't reveal such important details.
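For a sense of scale on the "how hard is AVX-512 to write" question, here is a toy hand-written AVX-512 loop using the standard immintrin.h intrinsics. It is only a sketch of the style; Geekbench's actual image-processing code is not public, and nothing here is taken from it.

```cpp
// Toy AVX-512 loop: add a brightness offset to float pixels and clamp to 1.0,
// processing 16 lanes per iteration. Illustrative only -- not Geekbench code.
#include <immintrin.h>
#include <cstddef>

void brighten_avx512(float* px, std::size_t n, float delta) {
    const __m512 vdelta = _mm512_set1_ps(delta);
    const __m512 vone   = _mm512_set1_ps(1.0f);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(px + i);   // load 16 floats (unaligned ok)
        v = _mm512_add_ps(v, vdelta);         // add brightness offset
        v = _mm512_min_ps(v, vone);           // clamp to 1.0
        _mm512_storeu_ps(px + i, v);          // store back
    }
    for (; i < n; ++i) {                      // scalar tail for the leftovers
        float v = px[i] + delta;
        px[i] = v > 1.0f ? 1.0f : v;
    }
}
```

Built with -mavx512f (GCC/Clang) this handles 16 floats per iteration; whether Geekbench's own AVX-512 paths look anything like this is exactly the transparency question being raised.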
 

Doug S

Platinum Member
Feb 8, 2020
2,508
4,113
136
What we don't know is how easy it is to write good Apple SME code vs. good AVX-512 code. If AVX-512 is just harder to write for by its very nature, x86 would be at a disadvantage in this benchmark because its AVX-512 code would end up nearly useless. I'm not even sure the speed-up offered by these enhanced image functions is worth engaging the AVX-512 units.

TRANSPARENCY is important and the GB developer doesn't reveal such important details.

The idea that a benchmark is biased against x86 is just laughably absurd. When provided proof that there are plenty of optimizations added for x86, you're now trying to hang your hat on an even more absurd argument: that it is "hard to write good AVX-512 code". Never mind that AVX-512 has been around for years and there are plenty of code examples on the net, so you don't even have to write it yourself for common functions.

If it actually was true that writing SME code was much easier than writing AVX-512 code, wouldn't that be an indication of strength for ARM - that real world code would be more likely to realize those benefits?
 