Discussion Zen 5 Architecture & Technical discussion

yuri69 · Aug 14, 2024

OneRaichu

SPEC int - Zen 5 - 3.6GHz - GCC 10 - JEDEC

gdansk · Aug 14, 2024

yuri69 said:
OneRaichu

SPEC int - Zen 5 - 3.6GHz - GCC 10 - JEDEC

I can't view it but if yuri69 linked it then it must be terrible.

Care to share with the non-x users?

yuri69 · Aug 14, 2024

gdansk said:
Care to share with the non-x users?

Sure thing

inf64 · Aug 14, 2024

These results make no sense. We had David Huang measuring 10% IPC increase on gimped (StrixPoint) Zen5 core while AT measured 11% on Granite Ridge Zen 5 in Specint 1T

The AMD Ryzen 9 9950X and Ryzen 9 9900X Review: Flagship Zen 5 Soars - and Stalls

www.anandtech.com

Hitman928 · Aug 14, 2024

inf64 said:
These results make no sense. We had David Huang measuring 10% IPC increase on gimped (StrixPoint) Zen5 core while AT measured 11% on Granite Ridge Zen 5 in Specint 1T

The AMD Ryzen 9 9950X and Ryzen 9 9900X Review: Flagship Zen 5 Soars - and Stalls

www.anandtech.com

Huang actually measured 11% after retesting (he realized he initially used a different compiler flag between Zen 4 and Zen 5).

yuri69 · Aug 14, 2024

CouncilorIrissa said:
AMD’s Ryzen 9950X: Zen 5 on Desktop

AMD’s desktop Zen 5 products, codenamed Granite Ridge, are the latest in the company’s line of high performance consumer offerings. Here, we’ll be looking at AMD’s Ryzen 9 9…

chipsandcheese.com

It's official: no parallel decoding across the 2 clusters with SMT off.

Sadly, only 2 applications were profiled. Anyway, this article highlights the Zen 5 bottlenecks: a tiny increase to the int PRF, with the ROB still being smaller than Golden Cove's. The unified int scheduler also regressed in total capacity. Oh well.

Profiling a gaming workload would be nice.

inf64 said:
These results make no sense. We had David Huang measuring 10% IPC increase on gimped (StrixPoint) Zen5 core while AT measured 11% on Granite Ridge Zen 5 in Specint 1T

Dunno. He used GCC 10, which is over 4 years old. That makes it representative of an outdated environment rather than a merely stable one.

del42sa · Aug 14, 2024

Zen 5 continues to enjoy very fast cache to cache transfers within a cluster. However, cross-cluster latencies are high compared to prior generations. At nearly 200 ns, cross-cluster latencies aren’t far off from cross-socket latencies on a server platform. It’s a regression compared to prior Zen generations (80 ns), where cross-cluster latencies were more comparable to worst-case latencies on a monolithic mesh based design.

Zen 5’s biggest stall reason is the ROB filling up, which is a good thing because it suggests other resources are appropriately sized. AMD’s revamped NSQ setup deserves credit for basically eliminating stalls due to lack of FP/vector register file entries, an issue that Zen 4 struggled with. On the other hand, Zen 5’s integer register file only got a small capacity increase, and frequently finds itself full.

https://chipsandcheese.com/2024/08/14/amds-ryzen-9950x-zen-5-on-desktop/

MS_AT · Aug 14, 2024

yuri69 said:
Dunno. He used GCC 10, which is over 4 years old. That makes it representative of an outdated environment rather than a merely stable one.

His choice of flags is sub-optimal. He is using -march=native with a compiler that was released before any of the architectures he tests existed. The compiler therefore does not recognize the specific architectures and will default to a base target that enables SSE2 at most. For the CPUs at hand this is not so significant, because they are all unknown to GCC 10. But if he tested something that GCC 10 does know (let's say Coffee Lake or Zen 2), he would put the others at a disadvantage, as the compiler would generate AVX2 code for those targets, which could affect specific subtests like x264. Here it only flattens the results. Just something to keep in mind.

yuri69 said:
Sadly, only 2 applications were profiled. Anyway, this article highlights the Zen 5 bottlenecks: a tiny increase to the int PRF, with the ROB still being smaller than Golden Cove's. The unified int scheduler also regressed in total capacity. Oh well.

Actually, while total capacity got lower, the effective capacity should be higher, as address generation has its own scheduler. The problem is that the strain on those could be greater, since they added more execution resources? Also, L1i shows a greater miss ratio than on Zen 4. I think David Huang was also pointing at this, but in C&C the difference is not so dramatic, hmm.

StefanR5R · Aug 14, 2024

inf64 said:
We had David Huang measuring 10% IPC increase on gimped (StrixPoint) Zen5 core while AT measured 11% on Granite Ridge Zen 5 in Specint 1T

More precisely, David Huang reports +9% at 5.1 GHz fixed clock and +11% at 4.2 GHz fixed clock (both int rate-1, Strix Point versus Phoenix) while AnandTech reports +11% at unspecified, freewheeling clock (int rate-1, Granite Ridge versus Raphael).

poke01 · Aug 14, 2024

https://x.com/9550pro/status/1823738626536300808?s=46

Oh great..

inf64 · Aug 14, 2024

StefanR5R said:
More precisely, David Huang reports +9% at 5.1 GHz fixed clock and +11% at 4.2 GHz fixed clock (both int rate-1, Strix Point versus Phoenix) while AnandTech reports +11% at unspecified, freewheeling clock (int rate-1, Granite Ridge versus Raphael).

Granite Ridge 9950X boosts 25 to 50 MHz lower than 7950X for ST according to Gamers Nexus, so it's basically an iso-clock comparison in the AT review.
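
For anyone who wants to sanity-check how much a small clock difference moves a per-clock figure, here is a minimal sketch. The function name `per_clock_uplift` is made up for illustration, the 5700 MHz baseline is an assumed placeholder rather than a measured value, and the calculation assumes the score scales linearly with clock, which is only an approximation.

```python
def per_clock_uplift(score_ratio: float, clock_new_mhz: float, clock_old_mhz: float) -> float:
    """Return the per-clock (IPC-like) uplift as a fraction.

    score_ratio is new_score / old_score. Assumes performance scales
    linearly with clock, which is only an approximation.
    """
    if clock_new_mhz <= 0 or clock_old_mhz <= 0:
        raise ValueError("clocks must be positive")
    clock_ratio = clock_new_mhz / clock_old_mhz
    return score_ratio / clock_ratio - 1.0


if __name__ == "__main__":
    # +11% is the AnandTech figure quoted in the thread. The clocks are
    # placeholders: a 50 MHz deficit against an assumed 5700 MHz baseline.
    uplift = per_clock_uplift(1.11, 5650.0, 5700.0)
    print(f"per-clock uplift: {uplift:.1%}")
```

With those placeholder clocks the correction is under one percentage point, which is in line with calling the comparison basically iso-clock.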

MS_AT · Aug 14, 2024

Hitman928 said:
Huang actually measured 11% after retesting (he realized he initially used a different compiler flag between Zen 4 and Zen 5).

The funny thing is he reverted to telling the compiler to treat the targets as znver3 due to a bug in the znver4 GCC cost tables [and he used znver4 for Zen 4 and Zen 5, as the version of GCC he is using doesn't have a znver5 target].

yuri69 · Aug 14, 2024

MS_AT said:
The funny thing is he reverted to telling the compiler to treat the targets as znver3 due to a bug in the znver4 GCC cost tables [and he used znver4 for Zen 4 and Zen 5, as the version of GCC he is using doesn't have a znver5 target].

Details here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1e3aa9c9278db69d4bdb661a750a7268789188d6

Jan Olšan · Aug 14, 2024

inf64 said:
Granite Ridge 9950X boosts 25 to 50 MHz lower than 7950X for ST according to Gamers Nexus, so it's basically an iso-clock comparison in the AT review.

Do they test with an AIO? It may be the other way around if you use an air cooler. The 7950X would have trouble staying in its highest boost bins, while the 9950X's lower temperature gives it more breathing space. AMD's reps said they consider it an advantage that liquid/AIO is no longer a soft requirement and you can use air coolers comfortably again. That's of course influenced by them trying to sell their product.

Our test measured lower clocks on the 9950X in games and multithreaded tasks compared to the 7950X with a Noctua NH-U14S. But in a single-threaded task (FLAC encoding, the last graph), things turn around completely: the 9950X manages ~5650 MHz compared to ~5500 MHz on the 7950X.

So that story about being able to comfortably use air coolers again and not needing AIO as much sounds like having some truth to it.

del42sa · Aug 15, 2024

https://forum.beyond3d.com/threads/amd-execution-thread-2024.63467/post-2347971

mmaenpaa · Aug 15, 2024

gdansk said:
It is possible AVX512 might matter soon enough.

I wonder if there is a tool for Windows which could analyze what AVX (if any) is used in a running program?

For example, I asked Mastercam if it uses any form of AVX but have not received any answer yet. I am building a new workstation for this, and some toolpath runs currently take 20-30 minutes on a 5900X.

I assume I can get up to 40% more performance with the 7950X, but maybe there is more uplift using the 9950X at this time (worth the 200€ difference).
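
Not an answer for a running program, but a rough static check is possible. Below is a minimal sketch that counts how many lines of a disassembly listing reference xmm, ymm or zmm registers. The listing would have to come from a disassembler such as `dumpbin /disasm` or `objdump -d`; the function name `count_vector_usage` and the sample listing are made up for illustration. Finding zmm references only shows that AVX-512 code exists in the binary, not that the hot path actually executes it.

```python
import re
from collections import Counter

# Matches vector register names such as xmm0, ymm15, zmm31 (Intel or AT&T syntax).
_VECTOR_REG = re.compile(r"\b(xmm|ymm|zmm)\d+\b", re.IGNORECASE)


def count_vector_usage(disasm: str) -> Counter:
    """Count disassembly lines that reference each vector register width.

    A line is counted once per register class it mentions, so a line using
    both ymm and zmm registers increments both counters.
    """
    counts = Counter()
    for line in disasm.splitlines():
        classes = {m.group(1).lower() for m in _VECTOR_REG.finditer(line)}
        for reg_class in classes:
            counts[reg_class] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical hand-written listing, not output from a real binary.
    sample = """
      vaddps ymm0, ymm0, ymm1
      vaddps zmm0, zmm0, zmm1
      addps  xmm0, xmm1
      add    rax, rbx
    """
    usage = count_vector_usage(sample)
    for reg_class in ("xmm", "ymm", "zmm"):
        print(reg_class, usage[reg_class])
```

Roughly: zmm hits point to AVX-512, ymm hits to AVX/AVX2 (or AVX-512 at 256-bit width), and xmm hits alone do not distinguish SSE from AVX.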

gdansk · Aug 15, 2024

mmaenpaa said:
I wonder if there is a tool for Windows which could analyze what AVX (if any) is used in a running program?

For example, I asked Mastercam if it uses any form of AVX but have not received any answer yet. I am building a new workstation for this, and some toolpath runs currently take 20-30 minutes on a 5900X.

I assume I can get up to 40% more performance with the 7950X, but maybe there is more uplift using the 9950X at this time (worth the 200€ difference).

Always assume it doesn't.

igor_kavinski · Aug 15, 2024

mmaenpaa said:
I assume I can get up to 40% more performance with the 7950X, but maybe there is more uplift using the 9950X at this time (worth the 200€ difference).

Benchmark 3.0

This is a continuation of Benchmark 2.0. I thought a new and clean slate would make things easier . The file for Benchmark 3.0 : Benchmark 3_0 for 2017.zip 1. So I propose Benchmark 3_0 . For starters the old file was a whopping 120 megabytes . This new file is 1.2 megabytes a 100 fold reduction ...

www.emastercam.com

Maybe someone with 9950X could run it for you?

mmaenpaa · Aug 15, 2024

igor_kavinski said:
Benchmark 3.0

This is a continuation of Benchmark 2.0. I thought a new and clean slate would make things easier . The file for Benchmark 3.0 : Benchmark 3_0 for 2017.zip 1. So I propose Benchmark 3_0 . For starters the old file was a whopping 120 megabytes . This new file is 1.2 megabytes a 100 fold reduction ...

www.emastercam.com

Maybe someone with 9950X could run it for you?

That one needs a fully licensed Mastercam, I am planning to run it on my customer's current workstation.

Hmm, it seems there is almost fully working Home Learning Edition, maybe I will try that for benchmarking.

I really like to test these programs and get real world data vs. benchmarks.

Next is Solidworks & EPLAN (I need to make workstations for those too)

igor_kavinski · Aug 15, 2024

mmaenpaa said:
Hmm, it seems there is almost fully working Home Learning Edition, maybe I will try that for benchmarking.

Do let us know how it goes.

Mahboi · Aug 17, 2024

At 4:30, Mike Clark basically answers the question "why did you only get 16% IPC" with "the software isn't ready for Zen 5, so it'll improve and probably Zen 6/7 will get the credit for what Zen 5 did".
FineWining in Ryzen too now?

Thunder 57 · Aug 17, 2024

"The software isn't ready" has never turned out to be true IIRC.

Mahboi · Aug 17, 2024

Agreed. But it is interesting how he mentions that essentially the problem is the shift from the 6 wide decode to 4 wide, and also the former 4 ALUs. He isn't asked about FP or INT or anything, just "why's the IPC only that?", and the answer jumps specifically to decode width and ALU count, which IIRC only went from 4 to 6 in INT. So my little ear got attentive there. Seems like even he knows what the crux of the complaints is. And he claims that software will catch up over time (which apparently means the benefit may show up not only on Z5 but possibly on 6/7 too).

DavidC1 · Aug 18, 2024

gdansk said:
Do we know the clock rates of N3E Zen 5C?
I'm also assuming it is lower than that but it is odd to be so confident without measuring.

There's this too.

Zen 5 and 5C actually showed a 6% gap between the two per clock. If Skymont is indeed 2% faster, and the Int advantage for Zen 5 is 10%, you have SKT and Zen 5C being less than 2% apart. If Darkmont gets a 5% advantage like Crestmont did, you'll have Intel's E core being faster than Zen 5C.

Jan Olšan said:
If that was not the goal, they would do it like Golden Cove and Lion Cove and try to add decoders in a single cluster.

Golden Cove and Lion Cove don't have clusters. The Monts do: Tremont, Gracemont, Skymont.

StefanR5R · Aug 18, 2024

DavidC1 said:
Zen 5 and 5C actually showed a 6% gap between the two per clock.

This is from a 4c16MB CCX to 8c8MB CCX comparison, at a clock speed magnitude of 5 GHz, isn't it?

[Edit: Whoops, more likely at a clock speed magnitude of 3 GHz at least in the case of the 8c8MB CCX, as 3.3 GHz is its peak boost clock.]
