Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 871

lightmanek

Senior member
Feb 19, 2017
476
1,092
136
My Dolphin Bench 5.0 run looks abnormally fast compared to the run shown above. Can you run yours to verify? I am using a +50MHz, -20 CO offset on my 3 fastest CCD 0 cores and a -10 offset on all CCD 1 cores, but it still seems like something isn't right. I ran it twice and got an identical score each time.

I downloaded it and ran it with the browser open and all the other crap running; I got 142s.

Will do a clean run shortly and report with screenshot.
 
Reactions: igor_kavinski

Josh128

Senior member
Oct 14, 2022
511
865
106
I downloaded it and ran it with the browser open and all the other crap running; I got 142s.

Will do a clean run shortly and report with screenshot.
That's within a percent or so of mine, that's good. What they did with a 9700X to achieve 172s is beyond me. Even their top run, a 9600X, is 165s, which is still ~10% slower than what we are seeing on our CPUs.
 

StefanR5R

Elite Member
Dec 10, 2016
6,057
9,106
136
AMD hasn't admitted to much in public as to what they will use Zen 5 in, beyond the products they have launched so far. However, in mid-September, NIST's CSRC validated the EPYC 9005 Series ASP Cryptographic CoProcessor firmware/hardware and random number generator. The description of the validations makes it seem as if EPYC 8005 and Threadripper 9000 aren't as unreal as the nonexistence of public roadmaps suggests.
CSRC said:
Currently tested algorithms are included in EPYC 9005 Series "Turin", EPYC 8005 Series "Sorano", EPYC Embedded 9005 Series "Turin", EPYC Embedded 8005 Series "Sorano", Ryzen "Shimada Peak". DRBG implemented in the AMD True Random Number Generator (TRNG).
https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program/details?product=18466
https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program/details?product=18470
 

StefanR5R

Elite Member
Dec 10, 2016
6,057
9,106
136
Maybe dog slow RAM or 65W made such a difference?
Isn't it a mostly single-threaded load? Also, while the overall duration of the benchmark is OKish, the individual parts of it are very short. So, maybe their lower results come down to RAM latency, power plan, more conservative boost-related BIOS settings, the way they launch the benchmark (e.g. in a scripted environment)...?
 

lightmanek

Senior member
Feb 19, 2017
476
1,092
136
Here they are:

Normal run just after Windows logon:
Asus ROG Strix-E Gaming, PBO225W, FCLK 102, CO CCD0 -20, CO CCD1 -30 (not optimized as no time, but fully stable for work and games), Corsair XC7 CPU block, custom water loop with 1x480 + 1x360 set to silent (CPU + R7900XTX loop)


With Affinity set to CCD0 real cores only (every 2nd core):



Massive 3s quicker! WIN! 😅
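
For anyone wanting to reproduce the "every 2nd core" affinity above, it boils down to a bitmask. A minimal sketch, assuming a 16-core/32-thread CPU where CCD0 owns logical processors 0-15 and SMT siblings are numbered adjacently (0/1, 2/3, ...). That numbering is an assumption about how the OS enumerates the cores, and the helper name is hypothetical:

```python
def every_second_cpu_mask(first_cpu: int, cpu_count: int) -> int:
    """Return an affinity bitmask selecting every second logical CPU
    in the range [first_cpu, first_cpu + cpu_count)."""
    mask = 0
    for cpu in range(first_cpu, first_cpu + cpu_count, 2):
        mask |= 1 << cpu  # set the bit for this logical CPU
    return mask


# Assumed layout: CCD0 = logical CPUs 0..15, one SMT sibling skipped per core.
ccd0_mask = every_second_cpu_mask(first_cpu=0, cpu_count=16)
print(hex(ccd0_mask))  # 0x5555
```

On Windows, a hexadecimal mask of this kind is what `start /affinity` expects, but check how your own system numbers its logical processors before relying on it.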
 

lightmanek

Senior member
Feb 19, 2017
476
1,092
136
Isn't it a mostly single-threaded load? Also, while the overall duration of the benchmark is OKish, the individual parts of it are very short. So, maybe their lower results come down to RAM latency, power plan, more conservative boost-related BIOS settings, the way they launch the benchmark (e.g. in a scripted environment)...?

I'm learning about Dolphin, so yes, it looks to be ST only. It seems to depend mainly on core frequency and a bit on memory timings.
 

Josh128

Senior member
Oct 14, 2022
511
865
106
Isn't it a mostly single-threaded load? Also, while the overall duration of the benchmark is OKish, the individual parts of it are very short. So, maybe their lower results come down to RAM latency, power plan, more conservative boost-related BIOS settings, the way they launch the benchmark (e.g. in a scripted environment)...
Dolphin definitely had a solid performance increase when they made it multithread-capable. I think it uses 2 cores, if I recall correctly. If their poor performance in this bench were related to RAM, power plan, BIOS, etc., that should have shown in other single or lightly threaded benchmarks in the same review. To lose 10% vs what we've been posting seems to be an awful lot, as none of us are doing anything crazy with RAM or tuning.
 

lightmanek

Senior member
Feb 19, 2017
476
1,092
136
Dolphin definitely had a solid performance increase when they made it multithread-capable. I think it uses 2 cores, if I recall correctly. If their poor performance in this bench were related to RAM, power plan, BIOS, etc., that should have shown in other single or lightly threaded benchmarks in the same review. To lose 10% vs what we've been posting seems to be an awful lot, as none of us are doing anything crazy with RAM or tuning.

10% in this bench is around 500MHz on the core clock!!

Either that, or Windows "Zen" update made such a difference?

PS. I like my 9950x more every day 😌
PS.2 Till 9950X3D launches ...
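
The "10% is around 500MHz" estimate above is simple proportional math. A rough sketch, assuming the score scales linearly with core clock (which is only approximately true here, since memory timings also play a part) and using 5.0 and 5.7 GHz purely as illustrative clocks. The function name is made up for this example:

```python
def clock_delta_mhz(clock_mhz: float, percent: float) -> float:
    """Clock change (in MHz) that corresponds to a given percent change,
    under the assumption that performance scales linearly with clock."""
    return clock_mhz * percent / 100


# Illustrative clocks only; the actual sustained boost depends on the chip and tuning.
for clock in (5000, 5700):
    print(f"{clock} MHz: 10% is about {clock_delta_mhz(clock, 10):.0f} MHz")
```

So a 10% gap would need roughly 500 to 570 MHz of clock difference at these speeds, which is far more than typical PBO or CO tuning gives.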
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,389
15,513
136
Here they are:

Normal run just after Windows logon:
Asus ROG Strix-E Gaming, PBO225W, FCLK 102, CO CCD0 -20, CO CCD1 -30 (not optimized as no time, but fully stable for work and games), Corsair XC7 CPU block, custom water loop with 1x480 + 1x360 set to silent (CPU + R7900XTX loop)
View attachment 110954

With Affinity set to CCD0 real cores only (every 2nd core):

View attachment 110956

Massive 3s quicker! WIN! 😅
Why are none of these results here: https://docs.google.com/spreadsheet...dgA7ciL76mBs/edit?gid=485052351#gid=485052351

That is supposed to be all results.
 

deasd

Senior member
Dec 31, 2013
576
957
136
Maybe not solely related to Zen 5, but still interesting news regarding AVX-512 support in FFmpeg:


The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements — from three to 94 times faster — compared to standard implementations. AVX-512 enables processing large chunks of data in parallel using 512-bit registers, which can handle up to 16 single-precision FLOPS or 8 double-precision FLOPS in one operation. This optimization is ideal for compute-heavy tasks in general, but in the case of video and image processing in particular.

It's from 3 to 94 times faster in FFmpeg compared to baseline C. Not 94%, but 94 times.
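
The 16 and 8 figures in the quote are the number of values that fit in one register, which is just the register width divided by the element width. A small sketch (the function name is hypothetical) comparing 128-bit SSE, 256-bit AVX2 and 512-bit AVX-512 registers:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one SIMD register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


for name, width in (("SSE", 128), ("AVX2", 256), ("AVX-512", 512)):
    singles = simd_lanes(width, 32)  # 32-bit single-precision floats
    doubles = simd_lanes(width, 64)  # 64-bit double-precision floats
    print(f"{name}: {singles} floats or {doubles} doubles per register")
```

Note that these are elements per register, not a throughput figure; how many operations per cycle a core actually sustains depends on its execution units.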
 

Gideon

Golden Member
Nov 27, 2007
1,842
4,379
136
For that one specific function, it's 94x against baseline C, with SSE3 (40x) and AVX2 (67x) already having huge speed improvements. The AVX-512 implementation is "only" 40% faster than the AVX2 version. Always read the primary source.
I was just about to post this. That's just Tom's Hardware doing their British-tabloid-level articles.

I'm still salty they kept financing that thing and shut down AnandTech.
 

deasd

Senior member
Dec 31, 2013
576
957
136
For that one specific function, it's 94x against baseline C, with SSE3 (40x) and AVX2 (67x) already having huge speed improvements. The AVX-512 implementation is "only" 40% faster than the AVX2 version. Always read the primary source.
Sorry, fixed that. I was in a hurry leaving home and got fooled by the Tom's Hardware title.


According to FFmpeg's official data, the AVX-512 versions differ by -3%/15%/59%/40% against AVX2.
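
To see how a 94x headline turns into "only" 40%: both speedups are measured against the same baseline C code, so the gain of one SIMD version over another is the ratio of the two. A quick check using the 40x/67x/94x figures quoted above (the function name is just for illustration):

```python
def relative_gain_percent(new_speedup: float, old_speedup: float) -> float:
    """Percent gain of one implementation over another, given both
    speedups measured against the same baseline."""
    return (new_speedup / old_speedup - 1) * 100


# Speedups over baseline C for the one function quoted above.
sse3, avx2, avx512 = 40, 67, 94

print(f"AVX-512 vs AVX2: {relative_gain_percent(avx512, avx2):.1f}%")
print(f"AVX2 vs SSE3:    {relative_gain_percent(avx2, sse3):.1f}%")
```

This covers only that single function; the other figures in the list above are for different functions.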


 

JustViewing

Senior member
Aug 17, 2022
225
408
106
Maybe not solely related to Zen 5, but still interesting news regarding AVX-512 support in FFmpeg:




It's from 3 to 94 times faster in FFmpeg compared to baseline C. Not 94%, but 94 times.
While modern compilers are good, they don't understand the code and its intentions. But that could all change with future AI/LLM-backed compilers, where the compiler could understand the intention and optimize as well as or better than an experienced assembly programmer (even assembly programmers have to make some sacrifices to keep the code manageable, which in turn can reduce performance). AI could rewrite the code to get the maximum possible performance as well as reduce application bloat.
 
Reactions: Tlh97 and Joe NYC

MS_AT

Senior member
Jul 15, 2024
365
798
96
While modern compilers are good, they don't understand the code and its intentions. But that could all change with future AI/LLM-backed compilers, where the compiler could understand the intention and optimize as well as or better than an experienced assembly programmer (even assembly programmers have to make some sacrifices to keep the code manageable, which in turn can reduce performance). AI could rewrite the code to get the maximum possible performance as well as reduce application bloat.
Sounds like an AI sales pitch. Still, asking current models to write that kind of code is usually a funny exercise in looking for all the places they shoot themselves in the foot.

After all, they are being trained mostly on those bloated apps...
 

Aeonsim

Junior Member
May 10, 2020
13
42
91
While modern compilers are good, they don't understand the code and its intentions. But that could all change with future AI/LLM-backed compilers, where the compiler could understand the intention and optimize as well as or better than an experienced assembly programmer (even assembly programmers have to make some sacrifices to keep the code manageable, which in turn can reduce performance). AI could rewrite the code to get the maximum possible performance as well as reduce application bloat.
With a proper AGI, possibly, but at the moment "AI" models are just probability machines, outputting the most likely code they've come across based on the prompt and their training set. There is no understanding of the code or its purpose. Depending on what they're trained on, they may produce slower code (because it is more common/likely in the training set) than what you originally had. I doubt there is enough good assembly code in their training sets to allow anything approaching optimal outcomes.

I've played around a bit with getting them to try optimizing or writing SIMD algorithms for HPC, and they're pretty terrible at anything more than a one or two liner.
 