I can't view it, but if yuri69 linked it, then it must be terrible.
These results make no sense. We had David Huang measuring a 10% IPC increase on the gimped (Strix Point) Zen 5 core, while AT measured 11% on Granite Ridge Zen 5 in SPECint 1T.
Sadly, only 2 applications were profiled. Anyways, this article highlights the Zen 5 bottlenecks: a tiny increase to the int PRF, with the ROB still smaller than Golden Cove's. The unified int scheduler also regressed in total capacity. Oh well.
AMD’s Ryzen 9950X: Zen 5 on Desktop
AMD’s desktop Zen 5 products, codenamed Granite Ridge, are the latest in the company’s line of high performance consumer offerings. Here, we’ll be looking at AMD’s Ryzen 9 9…
chipsandcheese.com
It's official: no parallel decoding with the 2 clusters when SMT is off.
Dunno. He used GCC 10, which is over 4 years old, so it simulates an outdated environment rather than a stable one.
Zen 5 continues to enjoy very fast cache to cache transfers within a cluster. However, cross-cluster latencies are high compared to prior generations. At nearly 200 ns, cross-cluster latencies aren’t far off from cross-socket latencies on a server platform. It’s a regression compared to prior Zen generations (80 ns), where cross-cluster latencies were more comparable to worst-case latencies on a monolithic mesh based design.
https://chipsandcheese.com/2024/08/14/amds-ryzen-9950x-zen-5-on-desktop/
Zen 5’s biggest stall reason is the ROB filling up, which is a good thing because it suggests other resources are appropriately sized. AMD’s revamped NSQ setup deserves credit for basically eliminating stalls due to lack of FP/vector register file entries, an issue that Zen 4 struggled with. On the other hand, Zen 5’s integer register file only got a small capacity increase, and frequently finds itself full.
His choice of flags is suboptimal. He is using -march=native with a compiler that was released before any of the architectures he tests existed. The compiler, not recognizing those specific architectures, will therefore default to a base target that enables SSE2 at most. For the CPUs at hand this is not so significant, because they are all unknown to GCC 10. But if he tested something GCC 10 does know (say Coffee Lake or Zen 2), he would put the others at a disadvantage, as the compiler would generate AVX2 code for those targets, which could affect specific subtests like x264. Here it only flattens the results. Just something to keep in mind.
Actually, while total capacity got lower, the effective capacity should be higher, as address generation has its own scheduler. The problem is that the strain on those schedulers could be greater, since they added more execution resources? Also, the L1i shows a greater miss ratio than on Zen 4. I think David Huang was also pointing at this, but in C&C the difference is not so dramatic, hmm.
More precisely, David Huang reports +9% at a 5.1 GHz fixed clock and +11% at a 4.2 GHz fixed clock (both int rate-1, Strix Point versus Phoenix), while AnandTech reports +11% at an unspecified, freewheeling clock (int rate-1, Granite Ridge versus Raphael).
According to Gamers Nexus, the Granite Ridge 9950X boosts 25 to 50 MHz lower than the 7950X in ST, so the AT review is basically an iso-clock comparison.
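A quick way to see why a 25 to 50 MHz boost deficit barely matters: if performance is assumed to scale as IPC × clock, the implied IPC ratio is the measured performance ratio divided by the clock ratio. The sketch plugs in AT's +11% with a hypothetical 5,700 MHz single-thread clock for the 7950X (an illustrative round number, not a measured value); `ipc_uplift` is a hypothetical helper, and the model ignores that memory latency does not scale with core clock.

```python
def ipc_uplift(perf_ratio: float, clock_new_mhz: float, clock_old_mhz: float) -> float:
    """Per-clock (IPC) ratio implied by a measured performance ratio.

    Assumes performance = IPC x clock, i.e. it ignores that memory latency
    does not scale with core clock.
    """
    return perf_ratio / (clock_new_mhz / clock_old_mhz)


# Hypothetical inputs: AT's +11% 1T result and an assumed 5,700 MHz 7950X clock,
# with the 9950X running 25 or 50 MHz lower.
OLD_CLOCK_MHZ = 5700
for new_clock in (OLD_CLOCK_MHZ - 25, OLD_CLOCK_MHZ - 50):
    uplift = ipc_uplift(1.11, new_clock, OLD_CLOCK_MHZ)
    print(f"{new_clock} MHz: implied IPC gain {(uplift - 1) * 100:.1f}%")
# 5675 MHz: implied IPC gain 11.5%
# 5650 MHz: implied IPC gain 12.0%
```

Even at the 50 MHz end, the clock deficit shifts the implied IPC gain by under one percentage point, which is why the AT number can be read as roughly iso-clock.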
Huang actually measured 11% after retesting (he realized he had initially used a different compiler flag for Zen 4 than for Zen 5).
The funny thing is that he reverted to telling the compiler to treat the targets as znver3 due to a bug in the znver4 GCC cost tables [and he used znver4 for Zen 4 and Zen 5, as the version of GCC he is using doesn't have a znver5 target].
Details here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1e3aa9c9278db69d4bdb661a750a7268789188d6
Do they test with an AIO? It may be the other way around if you use an air cooler: the 7950X would have trouble staying in its highest boost bins, while the 9950X's lower temperatures give it more breathing room. AMD's reps said they consider it an advantage that liquid/AIO cooling is no longer a soft requirement and that you can comfortably use air coolers again. That's of course influenced by them trying to sell their product.
It is possible AVX-512 might matter soon enough.
I wonder if there is a tool for Windows that could analyze which AVX (if any) is used by a running program?
Always assume it doesn't.
For example, I asked Mastercam if it uses any form of AVX but have not received an answer yet. I am building a new workstation for this, and some toolpath runs currently take 20-30 minutes on a 5900X.
I assume I can get up to 40% more performance with a 7950X, but maybe the 9950X offers more uplift at this point (worth the 200€ difference)?
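A crude static check for the AVX question is to disassemble the program's executable and DLLs (for example with `dumpbin /disasm` from the MSVC tools on Windows, or `objdump -d -M intel`) and count which vector register widths appear. The sketch assumes you already have one instruction per string (mnemonic plus operands, with addresses and bytes stripped); `classify` and `isa_histogram` are hypothetical helpers. It is only a heuristic: it shows what code the binary contains, not what actually runs, since many programs ship several code paths and pick one at runtime, and it cannot tell EVEX-encoded (AVX-512VL) xmm/ymm instructions from VEX-encoded ones.

```python
import re
from collections import Counter
from typing import Iterable

# Heuristic register patterns (Intel or AT&T syntax).
ZMM_OR_MASK = re.compile(r"\bzmm\d+\b|\{%?k[1-7]\}")  # 512-bit register or AVX-512 opmask
YMM = re.compile(r"\bymm\d+\b")                        # 256-bit AVX/AVX2 register
XMM = re.compile(r"\bxmm\d+\b")                        # 128-bit SSE or AVX register


def classify(insn: str) -> str:
    """Rough ISA bucket for one disassembled instruction (not exhaustive)."""
    text = insn.strip().lower()
    mnemonic = text.split(maxsplit=1)[0] if text else ""
    if ZMM_OR_MASK.search(text):
        return "avx512"
    if YMM.search(text):
        return "avx256"
    if XMM.search(text):
        # VEX-encoded 128-bit ops carry a 'v' prefix (vaddss); legacy SSE does not (addps).
        return "avx128" if mnemonic.startswith("v") else "sse"
    return "other"


def isa_histogram(insns: Iterable[str]) -> Counter:
    """Count how many instructions fall into each bucket."""
    return Counter(classify(i) for i in insns)


sample = [
    "vaddps zmm0, zmm1, zmm2",
    "vmovdqa ymm0, ymmword ptr [rdi]",
    "vaddss xmm0, xmm0, xmm1",
    "addps xmm0, xmm1",
    "mov eax, 1",
]
print(isa_histogram(sample))
```

A binary with zero `avx512` hits almost certainly never runs AVX-512 code; a nonzero count only says such code exists somewhere, so it doesn't answer the runtime question on its own.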
Benchmark 3.0
This is a continuation of Benchmark 2.0. I thought a new and clean slate would make things easier. The file for Benchmark 3.0: Benchmark 3_0 for 2017.zip 1. So I propose Benchmark 3_0. For starters the old file was a whopping 120 megabytes. This new file is 1.2 megabytes, a 100 fold reduction ...
www.emastercam.com
Maybe someone with a 9950X could run it for you?
That one needs a fully licensed Mastercam; I am planning to run it on my customer's current workstation.
Hmm, it seems there is an almost fully working Home Learning Edition; maybe I will try that for benchmarking.
Do let us know how it goes.
Do we know the clock rates of N3E Zen 5C?
There's this too.
I'm also assuming it is lower than that, but it is odd to be so confident without measuring.
If that was not the goal, they would have done it like Golden Cove and Lion Cove and tried to add decoders in a single cluster.
Golden Cove and Lion Cove don't have clusters. The Monts do: Tremont, Gracemont, Skymont.
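To make the "parallel decoding with clusters" point concrete, here is a toy throughput model (an illustration, not how any specific core is implemented). Each fetch block, ending at a taken branch, goes to one decode cluster. When more than one cluster serves a single thread, consecutive blocks decode in parallel, which is the Mont-style approach. With SMT off, the earlier post says Zen 5 doesn't decode in parallel across its 2 clusters, which corresponds to `clusters=1` here. The cluster count, the 4-wide width, and `decode_cycles` itself are hypothetical; the model ignores the op cache, branch-prediction bandwidth, and in-order reassembly of the decoded ops.

```python
import math
from typing import Sequence


def decode_cycles(block_sizes: Sequence[int], clusters: int, width: int) -> int:
    """Toy throughput model of clustered decode.

    Each entry in block_sizes is one fetch block (instructions up to a taken
    branch). Blocks are handed out round-robin to the clusters serving this
    thread; each cluster decodes at most `width` instructions per cycle from
    its current block. Returns cycles until every block is decoded.
    """
    if clusters < 1 or width < 1:
        raise ValueError("clusters and width must be >= 1")
    busy = [0] * clusters  # cycles of work queued on each cluster
    for i, n in enumerate(block_sizes):
        busy[i % clusters] += math.ceil(n / width)
    return max(busy)


blocks = [6, 3, 8, 5, 4, 7]  # hypothetical fetch-block sizes
print(decode_cycles(blocks, clusters=1, width=4))  # 10
print(decode_cycles(blocks, clusters=2, width=4))  # 5
```

With one cluster per thread, the six blocks take 10 cycles; letting two clusters work on alternate blocks halves that, which is the single-thread benefit clustered decoders are after.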
Zen 5 and 5C actually showed a 6% gap between the two per clock.
This is from a 4c/16MB CCX versus 8c/8MB CCX comparison, at clock speeds on the order of 5 GHz, isn't it?
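The cache side of that comparison can be put in numbers using the CCX configurations named above. The L3 is shared within a CCX, so a single-threaded test can use the whole CCX's L3, while under all-core load the per-core share is what matters:

```latex
\begin{aligned}
\text{single thread (whole CCX L3):}\quad & \frac{16\ \text{MB}}{8\ \text{MB}} = 2 \\
\text{all cores loaded (L3 per core):}\quad & \frac{16\ \text{MB}/4}{8\ \text{MB}/8} = \frac{4\ \text{MB}}{1\ \text{MB}} = 4
\end{aligned}
```

So a per-clock gap measured across these two CCXs may partly reflect a 2× to 4× difference in L3 capacity rather than the cores alone.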