Discussion Zen 5 Architecture & Technical discussion


Hitman928

Diamond Member
Apr 15, 2012
6,019
10,339
136

yuri69

Senior member
Jul 16, 2013
530
944
136

It's official: no parallel decoding across the 2 clusters with SMT off.
Sadly only 2 applications were profiled. Anyways, this article highlights the Zen 5 bottlenecks: a tiny increase to the int PRF, with the ROB still being smaller than Golden Cove's. The unified int scheduler also regressed in total capacity. Oh well

Profiling a gaming workload would be nice.
These results make no sense. We had David Huang measuring a 10% IPC increase on the gimped (Strix Point) Zen 5 core while AT measured 11% on Granite Ridge Zen 5 in SPECint 1T
Dunno. He used GCC 10, which is over 4 years old. This means it simulates an outdated environment rather than a stable one.
 

del42sa

Member
May 28, 2013
99
114
106
Zen 5 continues to enjoy very fast cache to cache transfers within a cluster. However, cross-cluster latencies are high compared to prior generations. At nearly 200 ns, cross-cluster latencies aren’t far off from cross-socket latencies on a server platform. It’s a regression compared to prior Zen generations (80 ns), where cross-cluster latencies were more comparable to worst-case latencies on a monolithic mesh based design.
Zen 5’s biggest stall reason is the ROB filling up, which is a good thing because it suggests other resources are appropriately sized. AMD’s revamped NSQ setup deserves credit for basically eliminating stalls due to lack of FP/vector register file entries, an issue that Zen 4 struggled with. On the other hand, Zen 5’s integer register file only got a small capacity increase, and frequently finds itself full.
https://chipsandcheese.com/2024/08/14/amds-ryzen-9950x-zen-5-on-desktop/
 

MS_AT

Member
Jul 15, 2024
188
414
91
Dunno. He used GCC 10, which is over 4 years old. This means it simulates an outdated environment rather than a stable one.
His choice of flags is sub-optimal. He is using -march=native with a compiler that was released before any of the architectures he tests came into existence. Therefore the compiler, not recognizing the specific architectures, will default to a base target that enables SSE2 at most. For the CPUs at hand this is not so significant because they are all unknown to GCC 10, but if he tested something that was known to GCC 10 (let's say Coffee Lake or Zen 2) he would put the others at a disadvantage, as the compiler would generate AVX2 code for those targets, which could affect specific subtests like x264. Here it only flattens the results. Just something to keep in mind.
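One way to check what a given compiler actually enables under a particular -march setting is to dump its predefined macros and look for the ISA feature macros. The following is a minimal sketch, assuming a gcc binary is on the PATH; the helper names parse_isa_macros and query_compiler are made up for illustration, and the macro list is only a small sample.

```python
import subprocess

# A small, non-exhaustive sample of GCC's ISA feature macros.
ISA_MACROS = ("__SSE2__", "__AVX__", "__AVX2__", "__AVX512F__")


def parse_isa_macros(dump: str) -> set:
    """Return the ISA feature macros that appear in a `gcc -dM -E` dump."""
    found = set()
    for line in dump.splitlines():
        parts = line.split()
        # Lines of interest look like: #define __AVX2__ 1
        if len(parts) >= 2 and parts[0] == "#define" and parts[1] in ISA_MACROS:
            found.add(parts[1])
    return found


def query_compiler(march: str = "native", cc: str = "gcc") -> set:
    """Ask the compiler which sampled ISA macros it defines for -march=<march>.

    Assumes `cc` is a GCC-compatible driver; raises if the compiler is
    missing or rejects the target name.
    """
    result = subprocess.run(
        [cc, f"-march={march}", "-dM", "-E", "-x", "c", "-"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_isa_macros(result.stdout)
```

Comparing `query_compiler("native")` with `query_compiler("znver2")` on the same machine would show whether the old compiler really falls back to a narrower feature set for a CPU it does not know.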
Sadly only 2 applications were profiled. Anyways, this article highlights the Zen 5 bottlenecks: a tiny increase to the int PRF, with the ROB still being smaller than Golden Cove's. The unified int scheduler also regressed in total capacity. Oh well
Actually, while total capacity got lower, the effective capacity should be higher, as address generation has its own scheduler. The problem is that the strain on those could be greater since they added more execution resources? Also, L1i shows a greater miss ratio than Zen 4. I think David Huang was also pointing at this, but in C&C's data the difference is not so dramatic, hmm
 

StefanR5R

Elite Member
Dec 10, 2016
5,885
8,746
136
We had David Huang measuring a 10% IPC increase on the gimped (Strix Point) Zen 5 core while AT measured 11% on Granite Ridge Zen 5 in SPECint 1T
More precisely, David Huang reports +9% at 5.1 GHz fixed clock and +11% at 4.2 GHz fixed clock (both int rate-1, Strix Point versus Phoenix) while AnandTech reports +11% at unspecified, freewheeling clock (int rate-1, Granite Ridge versus Raphael).
 

inf64

Diamond Member
Mar 11, 2011
3,863
4,535
136
More precisely, David Huang reports +9% at 5.1 GHz fixed clock and +11% at 4.2 GHz fixed clock (both int rate-1, Strix Point versus Phoenix) while AnandTech reports +11% at unspecified, freewheeling clock (int rate-1, Granite Ridge versus Raphael).
The Granite Ridge 9950X boosts 25 to 50 MHz lower than the 7950X for ST according to Gamers Nexus, so it's basically an iso-clock comparison in the AT review.
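To put a number on "basically iso clock", the score uplift can be divided by the clock ratio. This is a rough sketch only: it assumes performance scales linearly with core clock, and it assumes a 5700 MHz baseline (the 7950X's rated maximum boost), since the actual single-thread clocks in the review are not stated in this thread.

```python
def per_clock_uplift(score_uplift: float, new_mhz: float, old_mhz: float) -> float:
    """Convert a fractional score uplift into a per-clock (IPC-like) uplift.

    score_uplift is fractional (0.11 for +11%). Linear scaling with clock
    is an approximation, not a measured property.
    """
    return (1.0 + score_uplift) / (new_mhz / old_mhz) - 1.0


# Assumed baseline clock for the older part; not taken from the review.
OLD_MHZ = 5700.0

for deficit in (25.0, 50.0):
    uplift = per_clock_uplift(0.11, OLD_MHZ - deficit, OLD_MHZ)
    print(f"{deficit:.0f} MHz lower: {uplift:+.1%} per clock")
```

Under these assumptions the +11% score works out to roughly +11.5% to +12.0% per clock, so the correction is at most about one percentage point.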
 

MS_AT

Member
Jul 15, 2024
188
414
91
Huang actually measured 11% after retesting (he realized he initially used a different compiler flag between Zen 4 and Zen 5).
The funny thing is he reverted to telling the compiler to treat the targets as znver3 due to a bug in the znver4 GCC cost tables [and he had used znver4 for Zen 4 and Zen 5, as the version of GCC he is using doesn't have a znver5 target]
 

Jan Olšan

Senior member
Jan 12, 2017
396
680
136
The Granite Ridge 9950X boosts 25 to 50 MHz lower than the 7950X for ST according to Gamers Nexus, so it's basically an iso-clock comparison in the AT review.
Do they test with an AIO? It may be the other way around if you use an air cooler. The 7950X would have trouble staying in its highest boost bins, while the 9950X's lower temperature gives it more breathing space. AMD's reps said they consider it an advantage that liquid/AIO cooling is no longer a soft requirement and you can comfortably use air coolers again. That's of course influenced by them trying to sell their product.

Our test measured lower clocks on the 9950X in games and multithreaded tasks compared to the 7950X with a Noctua NH-U14S. But in a single-threaded task (FLAC encoding, the last graph), things turn around completely: the 9950X manages ~5650 MHz compared to ~5500 MHz on the 7950X.

So that story about being able to comfortably use air coolers again and not needing an AIO as much sounds like it has some truth to it.
 

mmaenpaa

Member
Aug 4, 2009
89
151
106
It is possible AVX512 might matter soon enough.
I wonder if there is a tool for Windows which could analyze what AVX (if any) is used in a running program?

For example, I asked Mastercam if it uses any form of AVX but have not received an answer yet. I am building a new workstation for this, and some toolpath runs currently take 20-30 minutes on a 5900X.

I assume I can get up to 40% more performance with a 7950X, but maybe there is more uplift using a 9950X at this time (worth the 200€ difference)
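No such tool is named in this thread. One rough static approach is to disassemble the executable and its DLLs (for instance with `dumpbin /disasm` or `objdump -d`) and count which vector register classes show up. The sketch below only classifies disassembly text that has already been produced; the function name and the register-name heuristic are assumptions, it does not observe a running program, and AVX instructions being present in a binary does not prove they run on the hot path.

```python
import re
from collections import Counter

# Register names are a rough proxy for the vector width an instruction uses.
# This misses AVX-512 code that only touches xmm/ymm registers.
ZMM = re.compile(r"\bzmm\d+\b", re.IGNORECASE)
YMM = re.compile(r"\bymm\d+\b", re.IGNORECASE)
XMM = re.compile(r"\bxmm\d+\b", re.IGNORECASE)


def classify_vector_use(disassembly: str) -> Counter:
    """Count disassembly lines by the widest vector register they mention."""
    counts = Counter()
    for line in disassembly.splitlines():
        if ZMM.search(line):
            counts["zmm (AVX-512)"] += 1
        elif YMM.search(line):
            counts["ymm (AVX/AVX2)"] += 1
        elif XMM.search(line):
            counts["xmm (SSE or 128-bit AVX)"] += 1
    return counts
```

Feeding it the saved disassembly of the main executable and its largest DLLs gives a first hint of whether any wide vector code exists at all; a profiler would still be needed to see what the toolpath calculation actually executes.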
 
Reactions: Tlh97 and Ken g6

gdansk

Platinum Member
Feb 8, 2011
2,829
4,190
136
I wonder if there is a tool for Windows which could analyze what AVX (if any) is used in a running program?

For example, I asked Mastercam if it uses any form of AVX but have not received an answer yet. I am building a new workstation for this, and some toolpath runs currently take 20-30 minutes on a 5900X.

I assume I can get up to 40% more performance with a 7950X, but maybe there is more uplift using a 9950X at this time (worth the 200€ difference)
Always assume it doesn't.
 
Jul 27, 2020
19,594
13,435
146
I assume I can get up to 40% more performance with a 7950X, but maybe there is more uplift using a 9950X at this time (worth the 200€ difference)

Maybe someone with a 9950X could run it for you?
 

mmaenpaa

Member
Aug 4, 2009
89
151
106

Maybe someone with a 9950X could run it for you?
That one needs a fully licensed Mastercam; I am planning to run it on my customer's current workstation.

Hmm, it seems there is an almost fully working Home Learning Edition, maybe I will try that for benchmarking.

I really like to test these programs and get real-world data vs. benchmarks.

Next are Solidworks & EPLAN (I need to build workstations for those too)
 
Reactions: lightmanek

Mahboi

Senior member
Apr 4, 2024
956
1,721
96
At 4:30, Mike Clark basically answers the question "why did you only get 16% IPC" with "the software isn't ready for Zen 5, so it'll improve and probably Zen 6/7 will get the credit for what Zen 5 did".
FineWining in Ryzen too now?
 

Mahboi

Senior member
Apr 4, 2024
956
1,721
96
Agreed. But it is interesting how he mentions that essentially the problem is the shift from the 6 wide decode to 4 wide, and also the former 4 ALUs. He isn't asked about FP or INT or anything, just "why's the IPC only that?", and the answer jumps specifically to decode width and ALU count, which IIRC only went from 4 to 6 in INT. So my little ear got attentive there. Seems like even he knows what the crux of the complaints is. And he claims that software will grow enough over time (which apparently doesn't mean only Z5 but possibly also 6/7).
 
Reactions: igor_kavinski

DavidC1

Senior member
Dec 29, 2023
776
1,230
96
Do we know the clock rates of N3E Zen 5C?
I'm also assuming it is lower than that but it is odd to be so confident without measuring.
There's this too.

Zen 5 and 5C actually showed a 6% gap between the two per clock. If Skymont is indeed 2% faster, and the Int advantage for Zen 5 is 10%, you have SKT and Zen 5C being less than 2% apart. If Darkmont gets a 5% advantage like Crestmont, you'll have Intel's E core being faster than Zen 5C.
If that was not the goal, they would do it like Golden Cove and Lion Cove and try to add decoders in a single cluster.
Golden and Lion don't have clusters. Monts do. Tremont, Gracemont, Skymont.
 

StefanR5R

Elite Member
Dec 10, 2016
5,885
8,746
136
Zen 5 and 5C actually showed a 6% gap between the two per clock.
This is from a 4c16MB CCX to 8c8MB CCX comparison, at a clock speed magnitude of 5 GHz, isn't it?

[Edit: Whoops, more likely at a clock speed magnitude of 3 GHz at least in the case of the 8c8MB CCX, as 3.3 GHz is its peak boost clock.]
 