Question The AVX-512 thread

Jul 27, 2020
23,093
16,257
146
Here's your chance to spam a thread with all manner of AVX-512 related stuff, including

technical discussion, benchmarks,

software including libraries explicitly designed to take advantage of AVX-512 acceleration,

your personal experiences enjoying AVX-512 acceleration,

your future hardware/software wish list with AVX-512 support,

if you dropped the big bucks on Threadripper or server CPUs mainly for AVX-512 etc.


I'll start off with some links:

The origins of AVX-512: https://tomforsyth1000.github.io/papers/LRBNI%20origins%20v4%20full%20fat.pdf

Benchmarking the performance and energy consumption of the AVX512 and VNNI instruction sets: https://addi.ehu.es/bitstream/handle/10810/58088/TFG_Jon_Arriaran.pdf?sequence=2



https://albertvilella.substack.com/p/intels-avx-512-use-cases-part1 (sadly requires a subscription to read in full)
 
Jul 27, 2020
23,093
16,257
146
+79% improvement????

Dude, you should try out the Alan Wake 2 benchmark if the AMD BIOS has an option to enable/disable AVX-512. If these guys used AVX-512 in 3DMark, maybe they used it in their latest engine too?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,768
15,791
136
If you have an easy benchmark to run that uses a few cores and needs bandwidth, I can fire up my Turin 64-core. It's not as fast as a 9950X (2.3 GHz fully loaded), but 12 memory channels work really well if you need the bandwidth.
 
Jul 27, 2020
23,093
16,257
146
Maybe once I figure out how to run the x265 encoder with and without AVX-512, I can put it in the PES Handbrake benchmark package for easy testing.
 

Jul 27, 2020
23,093
16,257
146
DC tasks aren't easy to benchmark due to the variable work units they push to different clients. And you would have to disable AVX-512 in the BIOS, if that is even possible, to see, say, whether the CPU crunches through more work units in 24 hours with AVX-512 than without.
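
For anyone attempting that BIOS toggle on Linux, here is a minimal sketch for confirming the switch actually took effect and for turning two 24-hour work-unit counts into a percentage. It assumes an x86 Linux box where `/proc/cpuinfo` lists feature flags such as `avx512f`; the function names are made up for illustration, and the work-unit counts are whatever you tally yourself.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a "flags : ..." line listing its features.
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            flags.update(f for f in values.split() if f.startswith("avx512"))
    return sorted(flags)


def uplift_percent(units_without, units_with):
    """Percent more work units completed with AVX-512 than without."""
    return (units_with / units_without - 1.0) * 100.0
```

Feed it the file contents, e.g. `avx512_flags(open("/proc/cpuinfo").read())`; an empty list after the BIOS change suggests the OS no longer sees AVX-512. Because work units vary in size, the percentage is only meaningful over a long run or with a pinned set of work units.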
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,768
15,791
136
Well, disabling AVX-512 is a possibility, but with these tasks you will see a gain of upwards of 40% on Zen 4 and up to 80% on Zen 5 versus no AVX-512.

It was just an idea that puts it in the best light.
 

Jan Olšan

Senior member
Jan 12, 2017
506
981
136
Maybe once I figure out how to run the x265 encoder with and without AVX-512, I can put it in the PES Handbrake benchmark package for easy testing.
--asm avx512 from the command line; for Handbrake, see the picture in the article
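
A rough sketch of how that could be scripted with the standalone x265 binary, timing one run without the flag and one with `--asm avx512`. It assumes `x265` is on the PATH, that the input is a `.y4m` file (so no resolution flags are needed), and that AVX-512 kernels stay off unless requested; the helper names and the `medium` preset are placeholders, not anything from the article.

```python
import subprocess
import time


def x265_cmd(source, output, avx512, binary="x265"):
    """Build an x265 command line, adding --asm avx512 only when requested."""
    cmd = [binary, "--input", source, "--preset", "medium"]
    if avx512:
        cmd += ["--asm", "avx512"]
    cmd += ["--output", output]
    return cmd


def timed_run(cmd):
    """Run a command to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def speedup_percent(baseline_s, candidate_s):
    """Throughput gain of the candidate run over the baseline, in percent."""
    return (baseline_s / candidate_s - 1.0) * 100.0
```

Usage would be along the lines of `speedup_percent(timed_run(x265_cmd("clip.y4m", "a.hevc", False)), timed_run(x265_cmd("clip.y4m", "b.hevc", True)))`. It is worth checking x265's startup log to confirm which SIMD level it actually picked before trusting the numbers.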

 
Jul 27, 2020
23,093
16,257
146
Unfortunately, it doesn't work for x265, which is what people are more interested in. I spent considerable time trying different variations of that command parameter with HandBrakeCLI but couldn't get the x265 encoder to choose anything other than AVX2. It seems to work only in the GUI, which I can't automate in a benchmarking script.
 
Jul 27, 2020
23,093
16,257
146
A fascinating article
Definitely! Reading that makes me think that one of the reasons for disabling AVX-512 on Alder Lake may have been that they knew they were going to try for frequencies north of 5.5 GHz with Raptor Lake, and AVX-512 throttling could have led to massive frequency dips of 1 GHz or more. I wish someone kind and wealthy (hard to find but they exist!) would donate more recent hardware, like the W7-3545X, to the Chips and Cheese guys. Reading about the 10900X in 2025 is kinda sad.
 

yuri69

Senior member
Jul 16, 2013
623
1,083
136
That C&C article is very in-depth but... it repeatedly compares against Intel's first-gen AVX-512 implementation, Skylake-X. That means a 2024 core vs a 2017 core, which is rather unfortunate given there are Cooper Lake, Ice Lake, Sapphire Rapids, Emerald Rapids, and Granite Rapids.
 

MS_AT

Senior member
Jul 15, 2024
526
1,110
96
Definitely! Reading that makes me think that one of the reasons for disabling AVX-512 on Alder Lake may have been that they knew they were going to try for frequencies north of 5.5 GHz with Raptor Lake, and AVX-512 throttling could have led to massive frequency dips of 1 GHz or more. I wish someone kind and wealthy (hard to find but they exist!) would donate more recent hardware, like the W7-3545X, to the Chips and Cheese guys. Reading about the 10900X in 2025 is kinda sad.
Intel solved the Skylake problems, at least partially, with Ice Lake. I would expect that Raptor Lake would do no worse than Zen 5. https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
 

MS_AT

Senior member
Jul 15, 2024
526
1,110
96
Xeons had double FMA units and clients were cut to 1, whereas Zen 5 is full featured - that could make a big difference to code that uses those.
I meant when it comes to maintaining the frequency, not performance. The biggest problem with the Skylake implementation was that it was not worth it to sprinkle AVX-512 into the code. Now that is not a problem with Zen 4/5, and it should not be with newer Xeons, since client CPUs from Intel lack AVX-512.

And about Xeons:
Across all of the benchmarks run with AVX-512 on/off, when AVX-512 was enabled for the Xeon Platinum 8380 2P "Ice Lake" server and running these AVX-512 heavy workloads it saw a ~175MHz drop on average to the peak CPU frequency. Now with Sapphire Rapids this is no longer the case but the peak CPU frequency was similar with/without AVX-512. As shown previously, AMD EPYC Genoa also doesn't experience the AVX-512 downclocking. With Ice Lake it wasn't too bad to begin with unlike the AVX-512 Skylake days.
Quoted from https://www.phoronix.com/review/intel-sapphirerapids-avx512/8. Of course it's not at the level of detail that C&C go into, but it supports the case that Intel has improved the behaviour since Skylake-X.
 

Jan Olšan

Senior member
Jan 12, 2017
506
981
136
Unfortunately, it doesn't work for x265, which is what people are more interested in. I spent considerable time trying different variations of that command parameter with HandBrakeCLI but couldn't get the x265 encoder to choose anything other than AVX2. It seems to work only in the GUI, which I can't automate in a benchmarking script.
It absolutely should work with the x265 command line, but you are not using that.
If you are using a different frontend, be it ffmpeg or the HandBrake CLI, you need to look up the CLI option for passing extra parameters to the encoder.
Based on this I think you need --encopts asm=avx512

If --encopts is already in your command line, add the asm option to the end of its value, separated by a colon.

Code:
-x, --encopts <string>  Specify advanced encoding options in the same
                           style as mencoder (all encoders except theora):
                           option1=value1:option2=value2
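
For a benchmarking script, the colon-separated format above could be handled with something like the sketch below. It only builds the argument list; whether HandBrakeCLI actually forwards `asm=avx512` to x265 is exactly what is in dispute here, so treat that part as an assumption to verify in the encode log. The helper names are made up for illustration.

```python
def add_encopt(encopts, key, value):
    """Append key=value to a colon-separated option string.

    Any existing entry for the same key is dropped first, so the
    new value always ends up last.
    """
    pairs = [p for p in encopts.split(":")
             if p and p.split("=", 1)[0] != key]
    pairs.append(f"{key}={value}")
    return ":".join(pairs)


def handbrake_cmd(source, output, encopts, binary="HandBrakeCLI"):
    """Build a HandBrakeCLI x265 job that passes extra options via --encopts."""
    return [binary, "-i", source, "-o", output,
            "-e", "x265", "--encopts", encopts]
```

For example, `add_encopt("bframes=4:aq-mode=3", "asm", "avx512")` gives `bframes=4:aq-mode=3:asm=avx512`, which can then be handed to `handbrake_cmd` and run with `subprocess.run`.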

Also, Handbrake doesn't let you copy the resulting commandline of the encoding job (or show it in the logs)?
I fondly remember MeGUI for that. It casually taught you to really use the commandline encoder without fluff, by basically constructing the command for you (and having the various encoder options really well commented...). It also worked by calling the original encoder executables, not its own shim binary, so problems like this were avoided.
 
Jul 27, 2020
23,093
16,257
146
It's funny that the need for AVX-512 arose from a software renderer: https://www.gamedeveloper.com/game-platforms/rad-launches-pixomatic----new-software-renderer




The 9950X's full-width AVX-512 implementation arrives almost on the 20-year anniversary of Pixomatic!

One wonders whether, had Intel not messed up AVX-512's rollout to consumer CPUs, we could have had a modern software renderer for games written in AVX-512, giving us the chance to enjoy Cyberpunk 2077 with Minecraft-like pixelated graphics.

Maybe some day the average Joe's 128-thread consumer CPU WILL run games in software mode at acceptable speed and graphics quality, and we won't have to bother with a mandatory fat GPU anymore.
 

lamedude

Golden Member
Jan 14, 2011
1,209
15
81
Nearly five years ago Mike Sartain and I had just put the wraps on our x86 software renderer, Pixomatic. We had done everything we could think of to speed it up, and while it had certainly gotten a lot faster, it was still so much slower than hardware that we knew we could never close the gap. As we were setting up in the RAD Game Tools booth at Game Developers Conference one morning, I said to Mike: "Man, if only Intel had a lerp [linear interpolation] instruction!"

Mike pointed across the aisle at the Intel booth. "Maybe you should ask for one."

The odds seemed long, to say the least, but I didn't have any better ideas, so I went over and talked with Dean Macri, our developer rep. That resulted in a couple of maverick Intel architects, Doug Carmean and Eric Sprangle, coming over to chat with us later; and somehow, over the course of five years, that simple question led to a team at RAD -- which grew to include Tom Forsyth and Atman Binstock -- working with Intel to help design an instruction set extension and write a software graphics pipeline for it.
 