> AMD's plan: Get enthusiasts to snap up 9800X3D at launch due to FOMO. Same enthusiasts then sell their 9800X3D for 9950X3D in a few months.

Who wants those used electrons! Useless.
Nvidia used to do the same with the x080 and x080 Ti...
My Dolphin Bench 5.0 run looks abnormally fast compared to the run shown above. Can you run yours to verify? I am using a +50 MHz offset with -20 CO on my three fastest CCD0 cores and a -10 offset on all CCD1 cores, but it still seems like something isn't right. I ran it twice with an identical score each time.
> I did download it and run, with the browser open and all the other crap, and got 142 s. Will do a clean run shortly and report with a screenshot.

That's within a percent or so of mine, that's good. What they did with a 9700X to achieve 172 s is beyond me. Even their top run, a 9600X at 165 s, is still ~10% slower than what we are seeing on our CPUs.
Maybe dog-slow RAM or the 65 W limit made such a difference?
https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program/details?product=18466

CSRC said:
> Currently tested algorithms are included in EPYC 9005 Series "Turin", EPYC 8005 Series "Sorano", EPYC Embedded 9005 Series "Turin", EPYC Embedded 8005 Series "Sorano", and Ryzen "Shimada Peak". DRBG implemented in the AMD True Random Number Generator (TRNG).
Isn't it a mostly single-threaded load? Also, while the overall duration of the benchmark is OK-ish, the individual parts of it are very short. So maybe their lower results come down to RAM latency, the power plan, more conservative boost-related BIOS settings, or the way they launch the benchmark (e.g. in a scripted environment)?
Where is the link to see all these Zen 5 results? I want to see the master list from the source, and the source post so I can run this myself.
Dolphin definitely had a solid performance increase when it was made multithread-capable; I think it uses two cores, if I recall correctly. If their poor performance in this bench were related to RAM, power plan, BIOS, etc., that should have shown up in other single- or lightly-threaded benchmarks in the same review. Losing ~10% versus what we've been posting seems like an awful lot, as none of us are doing anything crazy with RAM or tuning.
> And we can’t forget about the potential 9950X3R2D2.

Nah, that one is still coming out, with SMT 4.
> Why are none of these results here: https://docs.google.com/spreadsheet...dgA7ciL76mBs/edit?gid=485052351#gid=485052351

Here they are:
Normal run just after Windows logon:
Asus ROG Strix-E Gaming, PBO 225 W, FCLK 102, CO CCD0 -20, CO CCD1 -30 (not optimized, as there was no time, but fully stable for work and games), Corsair XC7 CPU block, custom water loop with 1x480 + 1x360 rads set to silent (CPU + RX 7900 XTX loop)
View attachment 110954
With Affinity set to CCD0 real cores only (every 2nd core):
View attachment 110956
A massive 3 s quicker! WIN! 😅
> Why are none of these results here: https://docs.google.com/spreadsheet...dgA7ciL76mBs/edit?gid=485052351#gid=485052351 That is supposed to be all results.

They are there. Go to the seconds column (C) and sort A-Z. The first two are your submission and mine.
Wow, a 12-core 9900X got second place! Cool!
View attachment 110963
The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements, from three to 94 times faster, compared to standard implementations. AVX-512 enables processing large chunks of data in parallel using 512-bit registers, which can hold 16 single-precision or 8 double-precision floating-point values per operation. This optimization is ideal for compute-heavy tasks in general, and for video and image processing in particular.
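The lane counts above follow directly from the register width; as a quick sanity check (plain arithmetic, not FFmpeg code):

```python
REGISTER_BITS = 512

# Values a 512-bit register holds per operation:
singles = REGISTER_BITS // 32  # 32-bit single-precision floats
doubles = REGISTER_BITS // 64  # 64-bit double-precision floats

print(singles, doubles)  # 16 8
```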
> It's from 3 to 94 times faster on FFmpeg compared to SSE3/AVX2.

For that one specific function it's 94x against baseline C, with SSE3 (40x) and AVX2 (67x) already being huge improvements. The AVX-512 implementation is "only" about 40% faster than the AVX2 version. Always read the primary source.
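That "only ~40% faster" figure falls straight out of the reported multipliers (using the speedups quoted above):

```python
avx2 = 67.0    # reported speedup vs baseline C
avx512 = 94.0  # reported speedup vs baseline C

# AVX-512 speedup relative to the AVX2 path:
rel = avx512 / avx2
print(f"{rel:.2f}x, i.e. ~{(rel - 1) * 100:.0f}% faster")  # 1.40x, i.e. ~40% faster
```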
I was just about to post this. That's just Tom's Hardware doing their British-tabloid-level articles.
Sorry, fixed that. I was in a hurry leaving home and got fooled by the Tom's Hardware title.
While modern compilers are good, they don't have an understanding of the code and its intentions. But that could all change with future AI/LLM-backed compilers, where the compiler could understand the intent and optimize as well as, or better than, an experienced assembly programmer (even assembly programmers have to make some sacrifices to keep the code manageable, which in turn can reduce performance). AI could rewrite the code to get the maximum possible performance, as well as slim down bloated applications.

Maybe not solely related to Zen 5, but still interesting news regarding AVX-512 support in FFmpeg:
FFmpeg devs boast of up to 94x performance boost after implementing handwritten AVX-512 assembly code
AVX-512 can benefit the average Joe, it appears. (www.tomshardware.com)
It's from 3 to 94 times faster on FFmpeg compared to baseline C. Not 94%, but 94 times.
Sounds like an AI sales pitch. Still, asking current models to write that kind of code is usually a funny experience of spotting all the places it shoots itself in the foot.
> After all, they are being trained mostly on those bloated apps...

True, AI might feel the need to bloat the code more and make it less efficient, if that is what it was trained on.
With a proper AGI, possibly, but at the moment "AI" is just a probability machine, outputting the most likely code it has come across based on the prompt and its training set. There is no understanding of the code or its purpose. Depending on what it's trained on, it may produce slower code (because that code is more common in the training set) than what you originally had. I doubt there is enough good assembly code in the training sets to allow anything approaching optimal output.