Discussion Rudi_Float_Bench v0.02a

Jul 27, 2020
23,075
16,243
146
Download here (only ~12 KB in size): https://drive.google.com/file/d/1l7PU3W0u82iJovpbmJ9FTnhGMHJoVJGw/view?usp=sharing

If it complains about a missing vcruntime140 DLL, install the Visual C++ redistributable: https://aka.ms/vs/17/release/vc_redist.x64.exe

EDIT: v0.02 with an additional AVX-512 specific binary (it will either crash or exit unexpectedly if the CPU is lacking the necessary ISA extensions): https://drive.google.com/file/d/12RuZsWdhNueu7th2HCuzblBA8CUGFu9u/view?usp=sharing

Thanks, MS_AT and Hail, for identifying the compiler issue.

And now for some scores!
Reactions: lightmanek

MS_AT

Senior member
Jul 15, 2024
525
1,107
96
So what is this benchmark measuring? For sure not what it claims: those intrinsics are AVX512-specific, so there is no way it could run on anything but Xeon.
 
Jul 27, 2020
23,075
16,243
146
Visual Studio 2022 is apparently putting in an alternate AVX2 codepath. Seems they wised up and did the right thing, for once.

The actual hot loop code:

View attachment 119235

 
Reactions: lightmanek

MS_AT

Senior member
Jul 15, 2024
525
1,107
96
Have you verified that AVX-512 ops are actually in the compiled binary? Any sensible optimizing compiler will simply discard them at higher optimization levels (O2, O3, maybe even O1) if their results are never used. And if you want a meaningful benchmark, you should build with at least the O2 equivalent.

Unfortunately godbolt is unusable on mobile so I cannot check.
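
To illustrate the point about discarded work, here is a minimal sketch in plain C++. It is not the benchmark's hot loop (that was only posted as an attachment); `g_sink` and `fma_kernel` are hypothetical names. It shows one common way to keep a loop's result observable so the optimizer cannot drop the loop as dead code. It does not prove which instructions end up in the binary; checking the disassembly is still needed for that.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sink: a store to a volatile object is an observable side
// effect, so the compiler is not allowed to remove it.
static volatile float g_sink = 0.0f;

// Multiply-accumulate over two arrays, repeated `iterations` times.
// The accumulated value is written to the sink and returned, so the
// arithmetic feeding it cannot be treated as unused.
float fma_kernel(const std::vector<float>& a,
                 const std::vector<float>& b,
                 std::size_t iterations) {
    float acc = 0.0f;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < n; ++i) {
            acc += a[i] * b[i];
        }
    }
    g_sink = acc;  // keeps the result live even if the caller ignores it
    return acc;
}
```

If the store to the sink is removed and the caller ignores the return value, an optimizing build is free to drop the whole loop, which would leave the timer measuring almost nothing.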
 
Reactions: igor_kavinski
Jul 27, 2020
23,075
16,243
146
Well, what about your 64 core AMD Rome ??
Unfortunately, Ice Lake wins

Because:

The benchmark has some bug. It won't use more than 64 threads. I'll have to check the code tomorrow to see if there is a hard limit in there.
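
The cause of the 64-thread cap is not established here. One thing worth ruling out, purely as an assumption: Windows organizes logical processors into processor groups of at most 64, so a count taken from a single group tops out at 64. A minimal sketch of counting across all groups follows; `logical_processor_count` is a hypothetical helper, not something from the benchmark.

```cpp
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: number of logical processors to size a worker pool.
unsigned logical_processor_count() {
#ifdef _WIN32
    // Counts active logical processors across every processor group,
    // instead of only the group the calling thread belongs to.
    const DWORD all_groups = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    if (all_groups > 0) {
        return static_cast<unsigned>(all_groups);
    }
#endif
    // hardware_concurrency() may return 0 when the value is unknown.
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : 1u;
}
```

Counting is only half of it. On Windows versions where a process stays inside one group by default, the extra threads would also have to be assigned to the other groups (for example with SetThreadGroupAffinity) before they can run there.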

If Ice Lake is really using AVX-512, that might explain how 48 threads are able to beat 64 physical Zen 2 cores, though the Epyc puts up a fierce fight armed only with AVX2 and gets pretty close. Considering that the Ice Lake server cost $7000 and my Epyc cost no more than $1500, I would say the Epyc wins fair and square.

However, if the compiler has "cheated" and replaced the AVX-512 code with AVX2 code, the only possible explanation for the Epyc's loss is higher frequency on the Ice Lake CPU (3.95 GHz vs. 2.91 GHz on the Epyc). But even then, with a deficit of roughly 1 GHz, the Epyc comes really close (screenshots will be posted soon!).

Also waiting for MS_AT's confirmation on whether the binary contains AVX-512 instructions to further understand the Epyc's performance.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,755
15,789
136
Rome has no AVX-512 support. Zen 4 has quite a bit. Zen 5 kicks butt! If I get time to load up my Turin, I will run it. Even at 2.3 GHz, it beats Zen 4s at 3.5 GHz in AVX-512 stuff.
 
Jul 27, 2020
23,075
16,243
146
9800X3D. Seems kinda low?
View attachment 119246
Can't say unless someone posts their 9800X3D score for comparison.

BUT, you may lose up to 9% of your score if you don't run it with admin rights.

7-Max may help a bit too but doesn't always work.

It's a pure floating point bench. It only taxes the FPUs themselves.

The current champ Ice Lake Xeon (until more users test) does 42 Mops/s per thread. Yours is doing 55.25.
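
For comparing chips with different thread counts and clocks, the per-thread figure can be worked out like this. A small sketch, assuming the per-thread number is simply the total score divided by the number of threads used; both function names are made up for illustration.

```cpp
// Per-thread throughput: total Mops/s divided by the number of worker threads.
// Returns 0 for a zero thread count instead of dividing by zero.
double per_thread_mops(double total_mops, unsigned threads) {
    return threads == 0 ? 0.0 : total_mops / static_cast<double>(threads);
}

// Rough clock-normalized figure: per-thread Mops/s divided by the clock in GHz.
// Returns 0 for a non-positive clock value.
double per_thread_mops_per_ghz(double per_thread, double ghz) {
    return ghz <= 0.0 ? 0.0 : per_thread / ghz;
}
```

With the Ice Lake numbers quoted in this thread (42 Mops/s per thread at 3.95 GHz), the second function gives about 10.6 Mops/s per thread per GHz. Clocks under load can differ from the advertised ones, so treat that as a ballpark only.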
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,678
2,874
136
Disabling AVX512 made no difference to scores, which would seem to corroborate the assertion that the AVX512 instructions are getting compiled away.

Running as Administrator did nothing for me.

7-Max didn't crash the application, at least. No change to scores either.
 
Jul 27, 2020
23,075
16,243
146
Crap!

Back to the drawing board

I have to agree that the compiler stripped out the AVX-512 instructions. Guess I need to push out v0.02a with a fix.

I may need a dedicated AVX-512 binary because I can't afford the time to do proper CPU feature detection.
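
For reference, a basic runtime check for AVX-512 Foundation is fairly short. This is a sketch under assumptions, not code from the benchmark: `cpu_has_avx512f` is a hypothetical name, the MSVC branch uses the `__cpuid`/`__cpuidex`/`_xgetbv` intrinsics, and the GCC/Clang branch relies on `__builtin_cpu_supports`. Anything that is not x86 simply reports false.

```cpp
#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_X64) || defined(_M_IX86))
#define RFB_USE_MSVC_CPUID 1   // hypothetical macro, used only in this sketch
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical helper: true when AVX-512 Foundation is reported as usable.
bool cpu_has_avx512f() {
#if defined(RFB_USE_MSVC_CPUID)
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);
    if (regs[0] < 7) return false;       // CPUID leaf 7 not available

    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;  // CPUID.1:ECX bit 27
    if (!osxsave) return false;

    // XCR0 bits 1 and 2 (SSE/AVX state) plus bits 5, 6, 7 (AVX-512 state)
    // must all be enabled by the OS, otherwise the registers are not saved.
    const unsigned long long xcr0 = _xgetbv(0);
    if ((xcr0 & 0xE6ULL) != 0xE6ULL) return false;

    __cpuidex(regs, 7, 0);
    return (regs[1] & (1 << 16)) != 0;   // CPUID.(EAX=7,ECX=0):EBX bit 16
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;                        // not an x86 build
#endif
}
```

A single binary could call this once at startup and pick the AVX-512 or the AVX2 kernel accordingly, with the AVX-512 kernel living in a separate translation unit built with AVX-512 enabled. Whether mixing codepaths is desirable for a benchmark like this is a separate question.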
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,678
2,874
136
A benchmark that uses different instruction sets based on what the CPU has available is also not a very good benchmark IMO. Consistency and comparability go right out the window.

Edit: Before someone bites my head off, I mean a benchmark like this where the intention is to test how quickly the CPU can do a specific operation. If different CPUs are doing different operations, what are you even trying to compare then? Since no real work is being done, it's not measuring how fast different CPUs can accomplish a greater task.
 
Reactions: igor_kavinski
Jul 27, 2020
23,075
16,243
146
You are right, in a way
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,755
15,789
136
This is a little better with SMT disabled.

Well, after damn MS did an update on the last boot, it won't paste!

But it's 2308!

 
Reactions: igor_kavinski
Jul 27, 2020
23,075
16,243
146
A benchmark that uses different instruction sets based on what the CPU has available is also not a very good benchmark IMO. Consistency and comparability right out the window
Well, yeah I agree that for proper comparison, both CPUs should use the same ISA extensions so the score tells us which chip is designed better.
 