Question Geekbench 6 released and calibrated against Core i7-12700

igor_kavinski · Feb 14, 2023

Geekbench Blog (www.geekbench.com)

Weird choice of baseline CPU, and even weirder is that the baseline score is 2500.

The i7-12700 barely reaches 2000 in GB5, even with the fastest DDR5.
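For readers unfamiliar with calibrated scores, here is a minimal sketch of what "baseline score of 2500" implies. It assumes, as Primate Labs describes its scale, that scores are proportional to performance: halving the run time doubles the score, and the i7-12700 lands on 2500 by definition. The function name and timings are hypothetical, not Geekbench code.

```python
BASELINE_SCORE = 2500  # GB6 baseline: the Core i7-12700 scores 2500 by definition


def calibrated_score(baseline_seconds: float, candidate_seconds: float) -> float:
    """Score for one workload, assuming score is proportional to throughput
    (i.e. inversely proportional to run time) relative to the baseline CPU."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("run times must be positive")
    return BASELINE_SCORE * baseline_seconds / candidate_seconds


# Hypothetical timings: finishing in 80% of the baseline's time gives 2500 / 0.8.
print(calibrated_score(10.0, 8.0))  # 3125.0
```

GB5 was calibrated against its own baseline machine and workloads, so a GB5 score near 2000 and a GB6 score of 2500 for the same CPU sit on different scales rather than contradicting each other.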

The Hardcard · Jun 5, 2024

TwistedAndy said:
According to Geekbench, the 16-core AMD Ryzen 9 7950X is 32% faster in the multi-core test than the AMD EPYC 9754 with 128 cores:


Again: 16 cores are faster than 128 cores using the same architecture. Does that match reality?

Has anyone seen the Task Manager during a Geekbench run? Are all the 256 boxes pegged?

igor_kavinski · Jun 5, 2024

ST: riding a bicycle. Even with unlimited endurance, there's only so fast a bicycle can go.

GB5-style MT: driving a race car. More cylinders, more horsepower.

GB6-style MT: driving a race car that is prone to getting stuck in first gear.

Hitman928 · Jun 5, 2024

FlameTail said:
Let's say hypothetically Primate Labs adds back the GB5 style MT test to GB6...

Then there are 2 MT tests in GB6. How will they be named?

"Professional Multi-Score" and "Consumer Multi-Score"?

Or maybe:

"Embarrassingly Parallel Multi Score" and "Dog's Multi Core"

When GB6 was first released, I suggested they rework it to have 2 different multi-core scores (but they'd probably have to add some tests as well, since only 1 actually scales). It could easily be something like single-thread, client multithread, and professional workstation (or enterprise) multithread tests. Then the typical end user would know their performance for typical client workloads from the single-thread and client multithread scores, and if their CPU doesn't score high in the workstation section, it's not a big deal, as they didn't buy a professional workstation.
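The proposal is easy to express: keep the per-workload scores and summarize each group separately. A minimal sketch, assuming each group is summarized with a geometric mean (a common way to combine ratio-style scores, not necessarily Geekbench's exact weighting); the workload names, groupings, and numbers are hypothetical.

```python
from math import prod


def geomean(values) -> float:
    """Geometric mean of positive benchmark scores."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive scores")
    return prod(values) ** (1.0 / len(values))


# Hypothetical per-workload multi-core scores, split into two groups.
client_mt = {"browser": 9000, "photo_filter": 11000, "text_processing": 8000}
workstation_mt = {"ray_tracer": 30000, "compile": 22000, "encode": 25000}

scores = {
    "Client multithread": geomean(client_mt.values()),
    "Workstation multithread": geomean(workstation_mt.values()),
}
for name, value in scores.items():
    print(f"{name}: {value:.0f}")
```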

Nothingness · Jun 5, 2024

gdansk said:
I agree, but there's no harm in telling everyone that comparing GB composite MT scores is of very limited use.

This also applies to the GB ST composite score, though to a lesser extent.

roger_k · Jun 6, 2024

TwistedAndy said:
According to Geekbench, the 16-core AMD Ryzen 9 7950X is 32% faster in the multi-core test than the AMD EPYC 9754 with 128 cores:

Again: 16 cores are faster than 128 cores using the same architecture. Does that match reality?

Of course this is a nonsensical result. It is equally nonsensical to cherry-pick a known (and well understood!) limitation of a test and conclude that all results must be void as a consequence. Benchmarks are not mystical things; they are well-studied and mostly understood software. Any results produced by a benchmark need to be framed in the context of this understanding.

coercitiv · Jun 6, 2024

FlameTail said:
"Professional Multi-Score" and "Consumer Multi-Score"?

"How many cores do I want" and "How many cores do I need", respectively.

On a more serious note, one could call them ST Score, MT Score, and Throughput Score. The third score would still not be useful for professionals by itself, because a professional would know, even better than a consumer, to test particular workloads instead of relying on aggregate ratings. It would also have to rely heavily on workloads that scale well, and not even all professional workloads do. However, it could help people gauge the robustness of the product and understand how it scales.
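One way to make "understand how it scales" concrete for a Throughput Score is to compare a multi-thread result against ideal linear scaling from the single-thread result. A sketch, assuming both scores are throughput-proportional and come from the same workload; the function name and numbers are hypothetical.

```python
def scaling_efficiency(st_score: float, mt_score: float, threads: int) -> float:
    """Fraction of ideal linear scaling reached: 1.0 means the MT score is
    exactly `threads` times the ST score."""
    if threads < 1 or st_score <= 0:
        raise ValueError("need threads >= 1 and a positive ST score")
    return mt_score / (st_score * threads)


# Hypothetical 16-thread result: MT is 6.5x ST, i.e. about 41% of linear scaling.
print(f"{scaling_efficiency(3000, 19500, 16):.2f}")
```

A well-scaling throughput workload would push this ratio toward 1.0 on a many-core part, while a workload dominated by serial or synchronized work would sit well below it.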

TwistedAndy · Jun 6, 2024

roger_k said:
Of course this is a nonsensical result. It is equally nonsensical to cherry-pick a known (and well understood!) limitation of a test and conclude that all results must be void as a consequence. Benchmarks are not mystical things; they are well-studied and mostly understood software. Any results produced by a benchmark need to be framed in the context of this understanding.

It's not a limitation; it's a design issue.

When you look at the Multi-Core score, you expect that it represents the multi-core performance. In the case of Geekbench, it does not. It measures "something", but it's not the actual performance levels.

The second big problem is the lack of consistency across platforms. For example, the SME extension is used in three tests and is supported only on Apple M4, AVX-VNNI is used in only one test, AVX-512 is also used in one test, etc., which makes the comparison extremely platform- and app-dependent. There is also the question of how actively those extensions are used and how much they affect the final score.
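How much a few extension-accelerated tests move the composite depends on how subtests are combined. Assuming an unweighted geometric mean over n subtests (Geekbench's actual weighting may differ), speeding up k of them by a factor s multiplies the composite by s^(k/n). The subtest counts below are hypothetical.

```python
def composite_gain(speedup: float, accelerated: int, total: int) -> float:
    """Factor by which an unweighted geometric mean of `total` subtest scores
    grows when `accelerated` of them each become `speedup` times faster."""
    if total <= 0 or not 0 <= accelerated <= total:
        raise ValueError("need total > 0 and 0 <= accelerated <= total")
    return speedup ** (accelerated / total)


# Hypothetical: 3 of 15 equally weighted subtests run 2x faster with an extension.
print(f"{composite_gain(2.0, 3, 15):.3f}")  # 1.149
```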

Another problem is the short duration of the test, which makes it way less consistent and less representative.
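Whether a short benchmark is less consistent is something one can check directly by repeating the run and measuring the spread. A minimal sketch using the sample coefficient of variation (standard deviation divided by mean); the scores below are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(runs) -> float:
    """Run-to-run spread as a fraction of the mean (sample stdev / mean)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    return stdev(runs) / mean(runs)


# Hypothetical repeated scores from the same machine.
short_runs = [2410, 2530, 2380, 2495, 2550]
long_runs = [2470, 2480, 2465, 2490, 2475]
print(f"short: {coefficient_of_variation(short_runs):.2%}")
print(f"long:  {coefficient_of_variation(long_runs):.2%}")
```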

As a result, Geekbench can be used for comparison only for very limited cases, like comparing the ST performance of CPUs from the same generation.

Using it to compare different platforms is a joke.

Nothingness · Jun 6, 2024

TwistedAndy said:
It's not a limitation; it's a design issue.

When you look at the Multi-Core score, you expect that it represents the multi-core performance. In the case of Geekbench, it does not. It measures "something", but it's not the actual performance levels.

The second big problem is the lack of consistency across platforms. For example, the SME extension is used in three tests and is supported only on Apple M4, AVX-VNNI is used in only one test, AVX-512 is also used in one test, etc., which makes the comparison extremely platform- and app-dependent. There is also the question of how actively those extensions are used and how much they affect the final score.

Another problem is the short duration of the test, which makes it way less consistent and less representative.

As a result, Geekbench can be used for comparison only for very limited cases, like comparing the ST performance of CPUs from the same generation.

Using it to compare different platforms is a joke.

You, like many others, get this feeling because you are misusing Geekbench, and because Primate Labs, along with the media, makes you think the global score is what one should look at.

If you want to know if your game will run fast on a GPU, do you look at some average or geomean that some reviews report?
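The advice to look at the subtests resembling your own workload, rather than the composite, can be applied mechanically by comparing two results subtest by subtest. A sketch with hypothetical subtest names and scores (not real Geekbench results):

```python
from math import prod


def subtest_ratios(a: dict, b: dict) -> dict:
    """Score ratio a/b for every subtest present in both results."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two results share no subtests")
    return {name: a[name] / b[name] for name in shared}


def summarize(ratios: dict) -> float:
    """Print per-subtest ratios, lowest first, and return their geometric mean."""
    for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:>18}: {r:.2f}x")
    geo = prod(ratios.values()) ** (1 / len(ratios))
    print(f"{'geomean':>18}: {geo:.2f}x")
    return geo


# Hypothetical subtest scores for two CPUs.
cpu_a = {"compile": 2600, "text_processing": 2400, "ml_inference": 4200}
cpu_b = {"compile": 2500, "text_processing": 2500, "ml_inference": 2100}
summarize(subtest_ratios(cpu_a, cpu_b))
```

Here the composite says CPU A is about 26% faster, but that comes almost entirely from one subtest; on the other two, the CPUs are within 4% of each other.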

TwistedAndy · Jun 6, 2024

Nothingness said:
You, like many others, get this feeling because you are misusing Geekbench, and because Primate Labs, along with the media, makes you think the global score is what one should look at.

I'm not using Geekbench 6

Nothingness · Jun 6, 2024

TwistedAndy said:
I'm not using Geekbench 6

So are you just here to argue without first hand knowledge?

TwistedAndy · Jun 6, 2024

Nothingness said:
So are you just here to argue without first hand knowledge?

Because I can read the docs and have some CS background to do that properly

Nothingness · Jun 6, 2024

TwistedAndy said:
Because I can read the docs and have some CS background to do that properly

But you obviously lack the background to properly make use of a benchmark. Which doesn't prevent you from making silly claims.

And claiming that CB23 was your benchmark of choice after trying to explain why GB6 was not good for cross-platform comparisons, well...

TwistedAndy · Jun 6, 2024

Nothingness said:
But you obviously lack the background to properly make use of a benchmark. Which doesn't prevent you from making silly claims.

Could you provide an example of those "silly claims"?

Nothingness said:
And claiming that CB23 was your benchmark of choice after trying to explain why GB6 was not good for cross-platform comparisons, well...

Unlike Geekbench, it shows that the processor with 128 cores is much faster than the one with 16 cores in the multi-core test

Additionally, it does not use SME, AVX-512, and other extensions that are missing on some platforms. Cinebench R23 uses AVX2 for x86 and NEON for ARM, which are widely supported. So, technically, this benchmark is correct.

There are some discussions regarding its use of the Intel Embree library, but it matches the situation in real life. A lot of libraries were designed by Intel. And a lot of them will not be re-optimized for ARM anytime soon.

For example, Apple, as a company, is not interested in optimizing PHP, Python, Ruby, Redis, MySQL, Docker, and a bunch of other software that is used for development and on production servers. Apple does not produce server equipment.

That's why I prefer Cinebench R23 as a quick way to get the overall performance numbers. Usually, they correlate with more specific benchmarks at https://openbenchmarking.org/.

poke01 · Jun 6, 2024

TwistedAndy said:
Cinebench R23 uses AVX2 for x86 and NEON for ARM, which are widely supported. So, technically, this benchmark is correct.

It’s not correct at all. Cinebench R23 does not use NEON properly at all, and all ARM CPUs are at a major disadvantage when you compare x86 and ARM CPUs together.

There is no worse benchmark than Cinebench R23 for ARM CPUs. It doesn’t even fully showcase the performance of an ARM CPU.

TwistedAndy said:
There are some discussions regarding its use of the Intel Embree library, but it matches the situation in real life. A lot of libraries were designed by Intel. And a lot of them will not be re-optimized for ARM anytime soon.

So how is this fair then? Cinebench R23 even favours Intel CPUs over AMDs as well. Not a great benchmark. Thank you for confirming your bias. No, R23 does not reflect real world performance. The 14900K is beaten by the 7950X in Blender and it’s the opposite in Cinebench.

TwistedAndy said:
For example, Apple, as a company, is not interested in optimizing PHP, Python, Ruby, Redis, MySQL, Docker, and a bunch of other software that is used for development and on production servers. Apple does not produce server equipment.

That's why I prefer Cinebench R23 as a quick way to get the overall performance numbers. Usually, they correlate with more specific benchmarks at https://openbenchmarking.org/.

Wow, you came to this conclusion, but that is not an objective site. I would say it has the worst benchmarks for measuring ARM performance. Not even AMD uses benchmarks from that website.

TwistedAndy · Jun 6, 2024

poke01 said:
It’s not correct at all. Cinebench R23 does not use NEON properly at all, and all ARM CPUs are at a major disadvantage when you compare x86 and ARM CPUs together.

But what makes you think that Geekbench, for example, is using AVX2 "properly"?

poke01 said:
There is no worse benchmark than Cinebench R23 for ARM CPUs. It doesn’t even fully showcase the performance of an ARM CPU.

And we return to the initial question: why do we use benchmarks?

The fact that some benchmarks are optimized for Apple or Intel does not change the performance of real apps, which may not have all those optimizations in place. For example, the Intel Xe GPU shows great results in many benchmarks, but in real games, the situation is not always that great.

That's completely fine if you work in Geekbench or play games in 3DMark, but I believe many of us don't.

Let me give you an example. I'm a full-time software developer doing some back- and front-end stuff. I care about the performance in PHP, Node, SQL, and some other additional things, like Java-based IDE and services like Redis, Memcached, etc.

It's a very specific use case. And in this case, the backend runs 1.5 to 2 times faster on 12900HK compared to M3 Pro. The Node tooling is nearly on par. IDE and additional services are slightly faster on Intel (except Redis, where Intel is noticeably faster).

If we compare GB6 results for the 12900HK and M3 Pro, the latter is much faster (3100 vs 2500). But that doesn't match the actual situation.

If I take Cinebench R23, it shows a similar performance, which is way closer to reality than Geekbench.

Yes, maybe, someday, some of those apps will get support for new ISA extensions and optimizations, but before that time, we have the performance that we have.

FlameTail · Jun 6, 2024

There are 2 purposes of benchmarks:

1. To gauge how a given CPU performs in your workloads.

2. To gauge the objective performance of the given CPU.

As a tech enthusiast, I am more interested in the latter.

TwistedAndy · Jun 6, 2024

FlameTail said:
There are 2 purposes of benchmarks:

1. To gauge how a given CPU performs in your workloads.

2. To gauge the objective performance of the given CPU.

As a tech enthusiast, I am more interested in the latter.

Now, we use Geekbench, Cinebench, and all other similar benches mostly to speculate about theoretical performance gains and IPC.

Also, PR departments use them to sell us some new products and manipulate the numbers. It's very easy to cherry-pick some benchmarks and make bold claims.

In reality, you'll end up with the situation I described

poke01 · Jun 6, 2024

TwistedAndy said:
Let me give you an example. I'm a full-time software developer doing some back- and front-end stuff. I care about the performance in PHP, Node, SQL, and some other additional things, like Java-based IDE and services like Redis, Memcached, etc.

It's a very specific use case. And in this case, the backend runs 1.5 to 2 times faster on 12900HK compared to M3 Pro. The Node tooling is nearly on par. IDE and additional services are slightly faster on Intel (except Redis).

If we compare GB6 results for the 12900HK and M3 Pro, the latter is much faster (3100 vs 2500). But that doesn't match the actual situation.

That’s why Geekbench is part of the equation, not all of it.

The 12900HK uses up to 115 watts and has more cores and threads than an M3 Pro.

Honestly, that’s pretty good if the M3 Pro is only 2x slower, because the M3 Pro is up to 2.5x more efficient and has lower clocks and fewer threads.
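The trade-off can be made explicit: efficiency is performance divided by power, so "half as fast" and "2.5x more efficient" together imply the slower chip draws about one fifth of the power in that test. A sketch with hypothetical numbers, assuming both measurements cover the same workload and compare average package power.

```python
def perf_per_watt(score: float, watts: float) -> float:
    """Work done per unit of power for one measured run."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return score / watts


def relative_efficiency(score_a: float, watts_a: float,
                        score_b: float, watts_b: float) -> float:
    """How many times more work per joule chip A does than chip B."""
    return perf_per_watt(score_a, watts_a) / perf_per_watt(score_b, watts_b)


# Hypothetical: A is half as fast as B but draws one fifth of B's power.
print(relative_efficiency(50, 20, 100, 100))  # 2.5
```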

TwistedAndy · Jun 6, 2024

poke01 said:
That’s why Geekbench is part of the equation, not all.

The 12900HK uses up to 115 watts and has more cores and threads than an M3 Pro.

Honestly, that’s pretty good if the M3 Pro is only 2x slower, because the M3 Pro is up to 2.5x more efficient and has lower clocks and fewer threads.

The situation with power consumption is much more complex than "12900HK uses up to 115W".

The peak power consumption is defined by PL1 and PL2 limits, which can be easily changed. Usually, the sustained power limit (PL1) is between 35 and 54W. It is dynamically changed by Intel DTT and IPF.

During regular work with a bunch of stuff running, the SoC package power hovers around 10-20W. When I launch some batch processing scripts with PHP, Redis, MySQL, and other stuff, the package power jumps to 20-30W.

That's more than the 10-20W total package power of the M3 Pro, but it does not matter in my case. The fans are not audible in either case anyway.

If we take the overall developer experience, it depends on what you're doing. Now, Windows offers a better developer experience in general, but there are cases when you have to use macOS (iOS development, mostly).

naukkis · Jun 6, 2024

TwistedAndy said:
According to Geekbench, the 16-core AMD Ryzen 9 7950X is 32% faster in the multi-core test than the AMD EPYC 9754 with 128 cores:


Again: 16 cores are faster than 128 cores using the same architecture. Does that match reality?

Absolutely. For most workloads, even multithreaded ones, execution is limited by the single-threaded portion, per Amdahl's law. So for normal desktop use cases, even multithreaded ones, the 7950X is a faster option than the 9754 because its per-thread performance is better. That's also the reason AMD and Intel don't even try to sell consumers 128-core CPUs; those would not be the best-performing parts anyway.

And as most typical workloads today are multithreaded, the MT score that GB6 measures is pretty much the best performance comparison one can make when comparing CPUs. Single-thread score is important, but today it's even more important to have a CPU design that scales well when executing a few threads, and gaining that performance is the most difficult part for a CPU designer. There are no cheap tricks, like chiplets, to excel in that comparison.
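Both sides of this argument fall out of Amdahl's law once per-core speed is included. A minimal sketch: the 16-core part's per-core speed is taken as 1.0 and the 128-core part's as 0.55, a hypothetical stand-in for its lower clocks, not a measured figure.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when `parallel_fraction` of the work
    parallelizes perfectly and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and cores >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_throughput(per_core_speed: float, parallel_fraction: float, cores: int) -> float:
    """Overall speed relative to one core of per-core speed 1.0."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


# Hypothetical per-core speeds: 1.0 for the 16-core part, 0.55 for the 128-core part.
for p in (0.90, 0.99, 0.999):
    fast_few = relative_throughput(1.0, p, 16)
    slow_many = relative_throughput(0.55, p, 128)
    print(f"p={p}: 16 cores -> {fast_few:.1f}, 128 cores -> {slow_many:.1f}")
```

Under these assumptions the 16-core part comes out ahead at p = 0.90, while the 128-core part wins by a wide margin at p = 0.99 and above: which CPU is "faster" depends on how parallel the workload is.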

xiewe3wq · Jun 6, 2024

TwistedAndy said:
The situation with power consumption is much more complex than "12900HK uses up to 115W".

The peak power consumption is defined by PL1 and PL2 limits, which can be easily changed. Usually, the sustained power limit (PL1) is between 35 and 54W. It is dynamically changed by Intel DTT and IPF.

During regular work with a bunch of stuff running, the SoC package power hovers around 10-20W. When I launch some batch processing scripts with PHP, Redis, MySQL, and other stuff, the package power jumps to 20-30W.

That's more than the 10-20W total package power of the M3 Pro, but it does not matter in my case. The fans are not audible in either case anyway.

If we take the overall developer experience, it depends on what you're doing. Now, Windows offers a better developer experience in general, but there are cases when you have to use macOS (iOS development, mostly).

As a C++ developer myself, Apple Silicon is an absolute joy to use. M3 often compiles code faster than Intel CPUs, see for example this comparison.

CPU Compile Speed Benchmarks (forum.crystal-lang.org)

Apple M3 Pro - 2m:25s
Intel i9-14900K - 3m:20s

This can also be seen in the Geekbench 6 Clang subtest results, where the 12900HK is 25% slower than the M3 Pro.
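The two compile times quoted above translate into a ratio as follows. The small parser is a hypothetical helper written only for the two duration formats that appear in this thread.

```python
import re


def to_seconds(duration: str) -> int:
    """Parse durations written like '2m25s' or '3m:20s' into seconds."""
    match = re.fullmatch(r"(\d+)m:?(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


m3_pro = to_seconds("2m:25s")     # 145 s
i9_14900k = to_seconds("3m:20s")  # 200 s
print(f"14900K takes {i9_14900k / m3_pro - 1:.0%} longer")
print(f"M3 Pro finishes in {m3_pro / i9_14900k:.1%} of the 14900K's time")
```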

WelshBloke · Jun 6, 2024

The Hardcard said:
Has anyone seen the Task Manager during a Geekbench run? Are all the 256 boxes pegged?

If it's the same as earlier versions, it's super bursty. It never used to peg CPU usage, even on 8-core CPUs.

poke01 · Jun 6, 2024

TwistedAndy said:
The peak power consumption is defined by PL1 and PL2 limits, which can be easily changed. Usually, the sustained power limit (PL1) is between 35 and 54W. It is dynamically changed by Intel DTT and IPF.

During regular work with a bunch of stuff running, the power package for the SoC is hovering at nearly 10-20W. When I launch some batch processing scripts with PHP, Redis, MySQL, and other stuff, the package power jumps to 20-30W.

I have a hard time believing this as the default PL1 for this CPU is 75 watts in balanced mode. It can also go much higher when turbo boost is activated too.

TwistedAndy · Jun 6, 2024

naukkis said:
Absolutely. For most workloads, even multithreaded ones, execution is limited by the single-threaded portion, per Amdahl's law. So for normal desktop use cases, even multithreaded ones, the 7950X is a faster option than the 9754 because its per-thread performance is better. That's also the reason AMD and Intel don't even try to sell consumers 128-core CPUs; those would not be the best-performing parts anyway.

And as most typical workloads today are multithreaded, the MT score that GB6 measures is pretty much the best performance comparison one can make when comparing CPUs. Single-thread score is important, but today it's even more important to have a CPU design that scales well when executing a few threads, and gaining that performance is the most difficult part for a CPU designer. There are no cheap tricks, like chiplets, to excel in that comparison.

From my experience, there are cases when you need to have as many cores as possible:

1. Running tests in parallel. This type of work scales pretty well with the number of cores, especially on huge projects.
2. Code compilation. The more cores you have, the better, especially on huge projects. I don't write in C/C++/Flutter, so I don't see that frequently. But getting 2x more performance from 4x more cores is a good result here.
3. Complex projects with a lot of microservices, docker containers, and other stuff. The performance scales pretty well here as well because each container or worker can utilize separate cores.
4. Project indexing. It's not a big problem for small projects, but once they become bigger, it may take more time. It also scales pretty well, especially on fast SSDs.
5. Build and release tasks. It's related mostly to JS/TS projects built with Webpack, Vite, or other building tools. And they can benefit from multiple cores.
6. Also, there are server-related cases when you have dozens of workers, services, and other stuff, but it's pretty rare to have the production setup running on the developer machine.
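Items 1 and 5 are largely batches of independent jobs, and how far they scale with core count can be estimated from the job durations alone. A minimal sketch, assuming perfectly independent tasks and greedy longest-first scheduling (real test runners and build tools schedule differently); the task durations are hypothetical.

```python
import heapq


def makespan(task_seconds, workers: int) -> float:
    """Wall-clock time to run independent tasks on `workers` cores using
    longest-processing-time-first greedy scheduling (ignores I/O and contention)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    loads = [0.0] * workers  # busy time per worker; a list of zeros is a valid min-heap
    for t in sorted(task_seconds, reverse=True):
        lightest = heapq.heappop(loads)  # least-loaded worker takes the next-longest task
        heapq.heappush(loads, lightest + t)
    return max(loads)


suite = [30.0, 30.0] + [1.0] * 200  # hypothetical: two slow tests plus 200 quick ones
for cores in (4, 8, 16, 32):
    print(f"{cores:>2} workers: {makespan(suite, cores):.0f} s")
```

In this hypothetical suite, going from 4 to 8 workers roughly halves the wall time (65 s to 33 s), but beyond that the two 30-second tests set a floor, so scaling depends as much on the distribution of task durations as on core count.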

In the case of running tests or project indexing, the performance scales similarly to Cinebench R23. It's a good representation of the best-case scenario. Geekbench 6 is a terrible metric here.

Usually, the more time-consuming the task, the higher the chance that it can scale well.

For example, one big company that makes Java products has a separate 2P (dual-socket) server to run tests over the weekend.

poke01 said:
I have a hard time believing this as the default PL1 for this CPU is 75 watts in balanced mode. It can also go much higher when turbo boost is activated too.

The PL1/PL2 levels depend on the laptop vendor and can be easily changed. In the case of Intel 12900HK, the 45W PL1 is absolutely fine. I was playing with 55-60W, but there's only a small performance improvement. So it's not worth it.

In the case of Dell XPS 17, for example, it is 45W on average, which can be dynamically adjusted between 35 and 55W. PL2 is set to 115W, but the laptop never reaches it. Usually, it's nearly 70-80W for 28 seconds.
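A rough way to picture how PL1, PL2, and the time window interact: the chip may exceed PL1 (up to PL2) until a running average of its power catches up with PL1. The sketch below is a simplified model, not Intel's firmware algorithm. It assumes an exponentially weighted average with time constant tau. The 45 W PL1 and 115 W PL2 are the XPS 17 figures quoted above, while the 80 W demand and tau = 28 s are hypothetical inputs.

```python
import math


def simulate_power(demand_watts: float, pl1: float, pl2: float, tau: float,
                   seconds: int, dt: float = 1.0) -> list:
    """Simplified turbo model: draw up to PL2 while an exponentially weighted
    moving average of package power is below PL1, otherwise clamp to PL1."""
    alpha = 1.0 - math.exp(-dt / tau)  # EWMA weight for time constant tau
    avg = 0.0
    trace = []
    for _ in range(int(seconds / dt)):
        limit = pl2 if avg < pl1 else pl1
        power = min(demand_watts, limit)
        avg += alpha * (power - avg)  # update the running average
        trace.append(power)
    return trace


trace = simulate_power(demand_watts=80, pl1=45, pl2=115, tau=28, seconds=60)
burst = sum(1 for p in trace if p > 45)
print(f"{burst} s at {trace[0]:.0f} W, then {trace[-1]:.0f} W")
```

With these inputs the model holds 80 W for 24 seconds before settling at 45 W, the same general shape as the "70-80W for 28 seconds" behavior described above, though the real numbers depend on vendor firmware settings.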

Doug S · Jun 6, 2024

Why are you guys arguing with him? He clearly wants a benchmark to tell him what he wants to hear: that Intel CPUs are the best. Any benchmarks that show ARM or AMD CPUs beating Intel are biased. The ones that show Intel winning are legit. You won't convince someone like that with logic, it is a waste of time.
