> AMD's plan: Get enthusiasts to snap up 9800X3D at launch due to FOMO. Same enthusiasts then sell their 9800X3D for 9950X3D in a few months.

Who wants those used electrons! Useless.
Nvidia used to do the same with the x080 and x080 Ti...
My Dolphin Bench 5.0 run looks abnormally fast compared to the run shown above. Can you run yours to verify? I am using a +50 MHz offset with -20 CO on my three fastest CCD0 cores and a -10 offset on all CCD1 cores, but it still seems like something isn't right. I ran it twice with an identical score each time.
> I did download it and run, with the browser open and all the other crap, and got 142 s. Will do a clean run shortly and report with a screenshot.

That's within a percent or so of mine, that's good. What they did with a 9700X to achieve 172 s is beyond me. Even their top run, a 9600X at 165 s, is still ~10% slower than what we are seeing on our CPUs.
Maybe dog-slow RAM or the 65 W limit made such a difference?
https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program/details?product=18466

CSRC said:
> Currently tested algorithms are included in EPYC 9005 Series "Turin", EPYC 8005 Series "Sorano", EPYC Embedded 9005 Series "Turin", EPYC Embedded 8005 Series "Sorano", and Ryzen "Shimada Peak". DRBG implemented in the AMD True Random Number Generator (TRNG).
Isn't it a mostly single-threaded load? Also, while the overall duration of the benchmark is OK-ish, the individual parts of it are very short. So maybe their lower results come down to RAM latency, the power plan, more conservative boost-related BIOS settings, or the way they launch the benchmark (e.g. in a scripted environment)?
Where is the link to see all these Zen 5 results? I want to see the master list from the source, and the source post so I can run this myself.
Dolphin definitely had a solid performance increase when it was made multithread-capable; I think it uses two cores, if I recall correctly. If their poor performance in this bench were related to RAM, power plan, BIOS, etc., that should have shown up in other single- or lightly-threaded benchmarks in the same review. Losing ~10% versus what we've been posting seems like an awful lot, as none of us are doing anything crazy with RAM or tuning.
> And we can’t forget about the potential 9950X3R2D2.

Nah, that one is still coming out, with SMT 4.
> Why are none of these results here: https://docs.google.com/spreadsheet...dgA7ciL76mBs/edit?gid=485052351#gid=485052351

Here they are:
Normal run just after Windows logon:
Asus ROG Strix-E Gaming, PBO 225 W, FCLK 102, CO CCD0 -20, CO CCD1 -30 (not optimized, as there was no time, but fully stable for work and games), Corsair XC7 CPU block, custom water loop with 1x480 + 1x360 rads set to silent (CPU + RX 7900 XTX loop)
View attachment 110954
With Affinity set to CCD0 real cores only (every 2nd core):
View attachment 110956
A massive 3 s quicker! WIN! 😅
> Why are none of these results here: https://docs.google.com/spreadsheet...dgA7ciL76mBs/edit?gid=485052351#gid=485052351 That is supposed to be all results.

They are there. Go to the seconds column (C) and sort A-Z. The first two are your submission and mine.
Wow, a 12-core 9900X got second place! Cool!
View attachment 110963
The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements, from three to 94 times faster, compared to standard implementations. AVX-512 enables processing large chunks of data in parallel using 512-bit registers, which can hold 16 single-precision or 8 double-precision floating-point values per operation. This optimization is ideal for compute-heavy tasks in general, and for video and image processing in particular.
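The lane counts above follow directly from the register width; as a quick sanity check (plain arithmetic, not FFmpeg code):

```python
REGISTER_BITS = 512

# Values a 512-bit register holds per operation:
singles = REGISTER_BITS // 32  # 32-bit single-precision floats
doubles = REGISTER_BITS // 64  # 64-bit double-precision floats

print(singles, doubles)  # 16 8
```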
> It's from 3 to 94 times faster on FFmpeg compared to SSE3/AVX2.

For that one specific function it's 94x against baseline C, with SSE3 (40x) and AVX2 (67x) already being huge improvements. The AVX-512 implementation is "only" about 40% faster than the AVX2 version. Always read the primary source.
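That "only ~40% faster" figure falls straight out of the reported multipliers (using the speedups quoted above):

```python
avx2 = 67.0    # reported speedup vs baseline C
avx512 = 94.0  # reported speedup vs baseline C

# AVX-512 speedup relative to the AVX2 path:
rel = avx512 / avx2
print(f"{rel:.2f}x, i.e. ~{(rel - 1) * 100:.0f}% faster")  # 1.40x, i.e. ~40% faster
```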
I was just about to post this. That's just Tom's Hardware doing their British-tabloid-level articles.
Sorry, fixed that. I was in a hurry leaving home and got fooled by the Tom's Hardware title.
While modern compilers are good, they don't have an understanding of the code and its intentions. But that could all change with future AI/LLM-backed compilers, where the compiler could understand the intent and optimize as well as, or better than, an experienced assembly programmer (even assembly programmers have to make some sacrifices to keep the code manageable, which in turn can reduce performance). AI could rewrite the code to get the maximum possible performance, as well as slim down bloated applications.

Maybe not solely related to Zen 5, but still interesting news regarding AVX-512 support in FFmpeg:
FFmpeg devs boast of up to 94x performance boost after implementing handwritten AVX-512 assembly code
AVX-512 can benefit the average Joe, it appears. (www.tomshardware.com)
It's from 3 to 94 times faster on FFmpeg compared to baseline C. Not 94%, but 94 times.
Sounds like an AI sales pitch. Still, asking current models to write that kind of code is usually a funny experience of spotting all the places it shoots itself in the foot.
> After all, they are being trained mostly on those bloated apps...

True, AI might feel the need to bloat the code more and make it less efficient, if that is what it was trained on.
With a proper AGI, possibly, but at the moment "AI" is just a probability machine, outputting the most likely code it has come across based on the prompt and its training set. There is no understanding of the code or its purpose. Depending on what it's trained on, it may produce slower code (because that code is more common in the training set) than what you originally had. I doubt there is enough good assembly code in the training sets to allow anything approaching optimal output.