> Why is this thing so bad at Super Pi? With 7700X I got 6 secs for 1M but with 9700X can't do it under 7 secs.

Is your 9700X at 105W? If not, that may explain it.
> Super pi is purely single threaded.

Try turning SMT off and running it in the hidden Administrator account.
> Why is this thing so bad at Super Pi? With 7700X I got 6 secs for 1M but with 9700X can't do it under 7 secs.

Isn't SuperPI limited to x87 and SSE2? If so, it might be impacted by changes in instruction latency in Zen5.
> Ok, maybe I wasn't clear enough. The CCD-to-IOD interface limits you to 64GB/s, while 6000MT/s DDR5 setups provide a theoretical 96GB/s. Since CCD-to-IOD bandwidth is the limiting factor here, it doesn't matter how fast your DRAM is if you saturate the CCD-to-IOD link first (probably better to have a bit of extra headroom for various controller-related overheads). AVX512 would love to use the bandwidth, but it won't be able to. The same should be true for Zen5, as they share the same memory system.

Not correct, single-CCD Zen4 scales a little with memory speeds even in 2:1 8000MT/s vs 1:1 6600MT/s.

A few comments in random order on my findings above:
A single 8-core Zen4 CCD can take advantage of the higher bandwidth afforded by 2:1 mode vs 1:1 mode, even if the common misconception on many forums is that there is no benefit because they can hardly see any difference in the gimmicky AIDA64 memory bench. (It's also easy to double-check this in other benchmarks such as y-cruncher or the GB3 membench, which will show the same.)
The next question would naturally be what the "best memory setup" is: 1:1 mode with its lower latency, or 2:1 with its higher bandwidth. There is no easy answer, as it all depends on which benchmark/game you are comparing the numbers in. Some will prefer latency while others prefer bandwidth, so you just have to check on an individual basis.
But what I can say is that I pretty much always think higher memory speed is better, be it in 1:1 mode or 2:1 mode. From time to time I see some people limit themselves to something like 6000/6200MT/s because they think it's faster in games than, say, 6400MT/s for some reason (?)
My next observation is that I did not find any bandwidth benefit from "dual rank" (quad) in the Clam cache/mem benchmark, but Karhu is seemingly showing higher MB/s. I suspect this is because of the larger memory size tested, not increased bandwidth from DR. I will do some more DR Karhu runs where I limit the tested memory size to the same as SR and check if the numbers change. (y) Edit: It's also possible that the forced GDM (enabled with DR) is eating up the bandwidth benefit compared to SR.
I have also seen some complaints about people having a hard time tuning memory on the 1.1.7.0 PatchA FireRangeP AGESA; I can only say it is working pretty well for me on the ASUS GENE, even if I'm using a beta BIOS. But be warned, stabilizing DR 64GB @ 8000MT/s is still insanely hard; I think I spent like 5x the time on this profile compared to all others combined. It's really on a razor's edge: +/-5mV on some rails and you can forget about 10k Karhu.
I have also saved all the pictures here, in case this forum goes loco again with the screenshots.
> Super pi is purely single threaded. Power limit has no impact until you go way lower.

It's legacy code. If PiFast from Benchmate or the following multithreaded Rust program also shows Zen 5 losing to Zen 4, then yes, we can say: Houston, we have a problem.
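The Rust program itself doesn't seem to have survived in this thread, so as a purely illustrative stand-in (not the original benchmark), a minimal multithreaded pi estimator in Rust could look something like this:

```rust
use std::thread;
use std::time::Instant;

// Illustrative stand-in only -- NOT the program referenced in the post.
// Estimates pi with the Leibniz series, splitting the terms across threads.
fn parallel_pi(terms: u64, threads: u64) -> f64 {
    let chunk = terms / threads;
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            // Last thread picks up any remainder terms.
            let start = t * chunk;
            let end = if t == threads - 1 { terms } else { start + chunk };
            thread::spawn(move || {
                let mut sum = 0.0_f64;
                for k in start..end {
                    let sign = if k % 2 == 0 { 1.0 } else { -1.0 };
                    sum += sign / (2 * k + 1) as f64;
                }
                sum
            })
        })
        .collect();
    4.0 * handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
}

fn main() {
    let t0 = Instant::now();
    let pi = parallel_pi(400_000_000, 8);
    println!("pi ~= {:.10} in {:?}", pi, t0.elapsed());
}
```

Anything that scales across threads like this will of course behave nothing like SuperPi's single-threaded x87/SSE2 loop, which is the point of the comparison.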
> It's legacy code. If PiFast from Benchmate or the following multithreaded Rust program also shows Zen 5 losing to Zen 4, then yes, we can say: Houston, we have a problem.

There are two sides to this: some people run some random obsolete benchmark, get odd results, and draw conclusions (not saying anyone is doing the latter here); OTOH, legacy code still has to run fast enough.
Especially if it hasn't been characterized (what SIMD extensions are used? Is it memory-bottlenecked?).
I think it's been updated to at least use SSE2 (that's why I pointed out above that it might hit the latency increase of some instructions in Zen5).
Released in 2003, so the best it would be using is SSE2, if that.
> I think it's been updated to at least use SSE2 (that's why I pointed out above that it might hit the latency increase of some instructions in Zen5).

I hope Det0x will go to the trouble of compiling that Rust benchmark. For science.
> Can confirm Zen5 being slower in SuperPi than Zen4.
> Both these results are mine (done with static clocks)
> View attachment 108063
> View attachment 108062

But your 9950X is not retail. Any chance the retail is better?
> But your 9950X is not retail. Any chance the retail is better?

As you may recall, he compared his ES to a bunch of retail units and it was the best, so he kept it.
> He got the golden sample. Lucky fella! 🍀🍀🍀🍀

The sample is the lucky one! Imagine if it had gotten into the hands of someone who ignored it because it was "just" an ES; it would have been sitting around in his desk somewhere, collecting dust.
> Not correct, single-CCD Zen4 scales a little with memory speeds even in 2:1 8000MT/s vs 1:1 6600MT/s.

I could have been more precise. So, just to clear up the first point: I was talking about bandwidth only, not latency.
My own results with Clam cache/mem benchmark:
Results in Clam cache/mem benchmark: AMD DDR5 OC And 24/7 Daily Memory Stability Thread (www.overclock.net)
Latency ranking:
- SR 2x16gigs @ 6600MT/s 1:1 mode = 68.75ns
- DR 2x32gigs @ 6600MT/s 1:1 mode = 70.17ns
- SR 2x16gigs @ 8000MT/s 2:1 mode = 70.24ns
- DR 2x32gigs @ 8000MT/s 2:1 mode = 71.84ns

Bandwidth read-modify-write (ADD) ranking:
- SR 2x16gigs @ 8000MT/s 2:1 mode = 97.11GB/s
- DR 2x32gigs @ 8000MT/s 2:1 mode = 92.87GB/s
- SR 2x16gigs @ 6600MT/s 1:1 mode = 91.23GB/s
- DR 2x32gigs @ 6600MT/s 1:1 mode = 87.34GB/s
The same should be true for Zen5, as they share the same memory system.
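A quick back-of-the-envelope check on the figures quoted in this discussion. This is a sketch under stated assumptions: dual-channel DDR5 moving 8 bytes per channel per transfer, and a CCD read path of 32 bytes per FCLK cycle at 2000MHz (which is where the 64GB/s figure mentioned earlier comes from); real controller overheads shave these theoretical peaks.

```rust
// Theoretical peak DRAM bandwidth for a dual-channel DDR5 setup:
// transfers per second * 8 bytes per 64-bit channel * 2 channels.
fn dram_peak_gbs(mts: f64) -> f64 {
    mts * 8.0 * 2.0 / 1000.0
}

// Assumed CCD->IOD read bandwidth: 32 bytes per FCLK cycle.
fn ccd_read_gbs(fclk_mhz: f64) -> f64 {
    fclk_mhz * 32.0 / 1000.0
}

fn main() {
    println!("6000MT/s dual-channel DRAM: {:.0} GB/s", dram_peak_gbs(6000.0)); // 96 GB/s
    println!("8000MT/s dual-channel DRAM: {:.0} GB/s", dram_peak_gbs(8000.0)); // 128 GB/s
    println!("CCD read @ FCLK 2000MHz:    {:.0} GB/s", ccd_read_gbs(2000.0)); // 64 GB/s
}
```

Note that the measured read-modify-write numbers above can exceed the one-way read figure, since that pattern exercises both the read and write directions of the link at once.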
> Reading the Lion Cove/Skymont analysis at David Huang's Blog, there are interesting comparisons to Zen 5:
> * Intel really went from 6-wide to 8-wide x86 decode this gen, while AMD apparently sticks to 4-wide
> * Skymont internal structure sizing is dangerously close to Zen 5 (except FP-related)
> * Lion Cove vs Zen 5 SPEC2017 INT scores achieved at 4.2GHz are very close

TL;DR - cores dangerously close to each other, collisions expected.
> TL;DR - cores dangerously close to each other, collisions expected.

They are called dense cores instead of efficient cores for a reason, although it is weird that they aren't as efficient even though AMD touted a perf/W gain.

The one thing that does not sit right with me is the efficiency of the cores in the dense cluster; less efficiency than the vanilla cores is weird.
View attachment 108440
> They are called dense cores instead of efficient cores for a reason.

AMD themselves showed Zen 4c improving efficiency in low-power scenarios with Phoenix 2. Also note the wording: "better optimized for NT efficiency, and size". The idea was to gain the density jump while also preserving, or preferably improving, efficiency. That being said, David Huang's package power readings might not be enough to tell the whole story here, but for what it's worth, they show a regression.