Discussion: Zen 5 Architecture & Technical Discussion


lopri

Elite Member
Jul 27, 2002
13,217
600
126
Why is this thing so bad at Super Pi? With the 7700X I got 6 seconds for 1M, but with the 9700X I can't get under 7 seconds.
 

lopri

Elite Member
Jul 27, 2002
13,217
600
126
Super Pi is purely single-threaded. The power limit has no impact until you go way lower.
 

Det0x

Golden Member
Sep 11, 2014
1,253
3,952
136
Ok, maybe I wasn't clear enough. The CCD-to-IOD interface limits you to 64GB/s, while a 6000MT/s DDR5 setup provides a theoretical 96GB/s. Since CCD-to-IOD bandwidth is the limiting factor here, it doesn't matter how fast your DRAM is if you saturate the CCD-to-IOD link first (probably better to have a bit more headroom for various controller-related overheads).

AVX512 would love to use the bandwidth but it won't be able to.
Not correct: a single-CCD Zen 4 scales a little with memory speed, even in 2:1 8000MT/s vs 1:1 6600MT/s.
My own results in the Clam cache/mem benchmark:

Latency ranking:
  1. SR 2x16gigs @ 6600MT/s 1:1 mode = 68.75ns
  2. DR 2x32gigs @ 6600MT/s 1:1 mode = 70.17ns
  3. SR 2x16gigs @ 8000MT/s 2:1 mode = 70.24ns
  4. DR 2x32gigs @ 8000MT/s 2:1 mode = 71.84ns

Bandwidth read-modify-write (ADD) ranking:
  1. SR 2x16gigs @ 8000MT/s 2:1 mode = 97.11GB/s
  2. DR 2x32gigs @ 8000MT/s 2:1 mode = 92.87GB/s
  3. SR 2x16gigs @ 6600MT/s 1:1 mode = 91.23GB/s
  4. DR 2x32gigs @ 6600MT/s 1:1 mode = 87.34GB/s
A few comments, in random order, on my findings above:

A single 8-core Zen 4 CCD can take advantage of the higher bandwidth afforded by 2:1 mode vs 1:1 mode, even though the common misconception on many forums is that there is no benefit, because people can hardly see any difference in the gimmicky AIDA64 memory bench. (It's also easy to double-check this in other benchmarks such as y-cruncher / GB3 membench, which will show the same.)

The next question would naturally be what the "best memory setup" is: 1:1 mode with its lower latency, or 2:1 with its higher bandwidth. There is no easy answer, as it all depends on which benchmark/game you're comparing the numbers in. Some will prefer latency while others prefer bandwidth, so you just have to check on an individual basis.

But what I can say is that I pretty much always think higher memory speed is better, be it in 1:1 mode or 2:1 mode... From time to time I see people limit themselves to something like 6000/6200MT/s because they think it's faster in games than, say, 6400MT/s for some reason (?)

My next observation is that I did not find any bandwidth benefit from "dual rank" (quad) in the Clam cache/mem benchmark, but Karhu is seemingly showing higher MB/s. I suspect this is because of the larger memory size tested, not increased bandwidth from DR. I will do some more DR Karhu runs where I limit the tested memory size to the same as SR and check if the numbers change. (y) Edit: it's also possible that the forced GDM enabled with DR is eating up the bandwidth benefit compared to SR.

I have also seen some complaints about people having a hard time tuning memory on the 1.1.7.0 PatchA FireRangeP AGESA; I can only say it's working pretty well for me on the ASUS GENE, even though I'm using a beta BIOS. But be warned: stabilizing DR 64gigs @ 8000MT/s is still insanely hard. I think I spent about 5x the time on this profile compared to all the others combined... It's really on a razor's edge; ±5 mV on some rails and you can forget about 10k Karhu.

I have also saved all the pictures here, in case this forum goes loco again with the screenshots.
The same should be true for Zen 5, as they share the same memory system.
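For reference, the 96GB/s theoretical figure quoted at the top of this exchange is just transfer rate × bus width. A quick Rust sketch of the arithmetic, assuming a standard dual-channel DDR5 setup (2 × 64-bit channels, i.e. 8 bytes per transfer per channel):

```rust
// Back-of-the-envelope theoretical DRAM bandwidth for the speeds discussed.
// Assumption: dual-channel DDR5, 2 channels x 8 bytes per transfer.
fn main() {
    // GB/s (decimal) = MT/s * bytes_per_transfer * channels / 1000
    let dram_gbs = |mt_s: u64| mt_s * 8 * 2 / 1000;

    let at_6000 = dram_gbs(6000); // theoretical peak at DDR5-6000
    let at_8000 = dram_gbs(8000); // theoretical peak at DDR5-8000 (2:1 mode)

    println!("DDR5-6000: {at_6000} GB/s theoretical");
    println!("DDR5-8000: {at_8000} GB/s theoretical");
}
```

At DDR5-8000 the theoretical DRAM number rises to 128GB/s, which is why a single-CCD link in the 64GB/s class becomes the bottleneck long before the DRAM itself does.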
 
Last edited:
Jul 27, 2020
19,761
13,556
146
Super Pi is purely single-threaded. The power limit has no impact until you go way lower.
It's legacy code. If PiFast from BenchMate or the following multithreaded Rust program also shows Zen 5 losing to Zen 4, then yes, we can say: Houston, we have a problem.

 

Nothingness

Diamond Member
Jul 3, 2013
3,054
2,020
136
It's legacy code. If PiFast from BenchMate or the following multithreaded Rust program also shows Zen 5 losing to Zen 4, then yes, we can say: Houston, we have a problem.
There are two sides to this: some people run some random obsolete benchmark, get odd results, and draw conclusions from them (not saying anyone is doing that here); OTOH, legacy code still has to run fast enough.

But at this point I'm not sure what the point is of running that obsolete, unmaintained PiFast, especially when it hasn't been characterized (what SIMD extensions does it use? Is it memory-bottlenecked?).
 

Det0x

Golden Member
Sep 11, 2014
1,253
3,952
136
In all its beauty 😘


But on a more serious note, guys: watch out with the direct-die frame v2.
Even if TG says it supports Zen 5, it's not without problems.

Long story short, I was getting a pretty bad temperature spread across the cores after the delid.


7 remounts later I found the problem (yes, this took hours):
the frame had been pressing down on the glue on each side of the CCDs.


The temperature spread @ 310W PPT after the fix is looking much better.


 
Last edited:

MS_AT

Senior member
Jul 15, 2024
216
518
96
Not correct: a single-CCD Zen 4 scales a little with memory speed, even in 2:1 8000MT/s vs 1:1 6600MT/s.
My own results in the Clam cache/mem benchmark:

Latency ranking:
  1. SR 2x16gigs @ 6600MT/s 1:1 mode = 68.75ns
  2. DR 2x32gigs @ 6600MT/s 1:1 mode = 70.17ns
  3. SR 2x16gigs @ 8000MT/s 2:1 mode = 70.24ns
  4. DR 2x32gigs @ 8000MT/s 2:1 mode = 71.84ns

Bandwidth read-modify-write (ADD) ranking:
  1. SR 2x16gigs @ 8000MT/s 2:1 mode = 97.11GB/s
  2. DR 2x32gigs @ 8000MT/s 2:1 mode = 92.87GB/s
  3. SR 2x16gigs @ 6600MT/s 1:1 mode = 91.23GB/s
  4. DR 2x32gigs @ 6600MT/s 1:1 mode = 87.34GB/s

The same should be true for Zen 5, as they share the same memory system.
I could have been more precise. So, just to clear up the first point: I was talking about bandwidth only, not latency.

The part that I ignored is the fact that the CCD-to-IOD connection is 32B read and 16B write per fabric clock (based on one of the earlier Chips and Cheese investigations), with both lanes, so to speak, usable at the same time, which gives you a higher bandwidth limit for a test that mixes reads and writes. A pure read should show bandwidth closer to 32B × IF clock. Unless I have missed something in my analysis.
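Under those assumptions (32B read + 16B write per fabric clock, both directions usable concurrently), the ceilings work out as below; the 2000 MHz FCLK here is an assumed example value, not a measured one:

```rust
// Per-direction CCD<->IOD limits under the assumed 32B-read / 16B-write
// per-FCLK link widths, plus the combined ceiling for a read-modify-write
// pattern that exercises both directions at once.
fn main() {
    let fclk_mhz = 2000u64; // example FCLK, an assumption for illustration

    let read_limit = 32 * fclk_mhz / 1000; // GB/s pure-read ceiling
    let write_limit = 16 * fclk_mhz / 1000; // GB/s pure-write ceiling
    let mixed_limit = read_limit + write_limit; // if both lanes saturate

    println!("read {read_limit} GB/s, write {write_limit} GB/s, mixed up to {mixed_limit} GB/s");
}
```

If both lanes really can run concurrently, a mixed read/write test could approach ~96GB/s on this example FCLK, which would be consistent with the ~97GB/s read-modify-write (ADD) numbers quoted above exceeding the 64GB/s pure-read figure.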
 