Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Hans Gruber · Aug 16, 2024

Josh128 said:
AMD's response to this inquiry will be very telling. I doubt they admit what is really going on. Is it a regression that was necessary due to architectural design choices, a result of halting design at a specific point to meet an internal launch date goal, or is it a silicon level bug that might or might not be fixable by a new stepping or microcode?

More and more it looks like desktop Zen 5 should have just been delayed, even if it would be a 6 month plus delay, to get this and other performance anomalies ironed out. I can't wait to see the core latencies on Zen 5C 3nm Turin, which is rumored to have the fabled 16 core CCX.

I found some leaks of Arrow Lake 5 series CPU Geekbench results. Zen 5 beat Intel Arrow Lake significantly in single thread and got smoked in multi thread results. I can almost guarantee you 100% that AMD's problem is in their AGESA BIOS. I will not even hazard a guess as to what the real Zen 5 performance should/will be when they sort out their mess.

It took AMD 6 months or more to get Zen 4 ironed out, and out of the box Zen 4 was pretty good to begin with. With all the testing anomalies reviewers ran into, it seems obvious that AMD released a completely flawed AGESA BIOS with Zen 5.

AMD really needs to fix their DDR5 bandwidth limits. They should have upgraded their Infinity Fabric controller with Zen 4, and certainly by Zen 5. That is holding back the significant performance gains faster RAM would provide.

tsamolotoff · Aug 16, 2024

Hans Gruber said:
AMD really needs to fix their DDR5 bandwidth limits. They should have upgraded their Infinity Fabric controller with Zen 4, and certainly by Zen 5. That is holding back the significant performance gains faster RAM would provide.

While I completely agree, the issue here isn't fabric or IMC, it's something that happens when one CCD accesses the nonlocal L3 cache of the second CCD. As Ryan noted, it's slower than memory access and on par with the typical access time of non-local memory of a 2S/4S/8S system, which should never happen considering this is the same IO die as AMD had with Zen 4 (and as we see, this 'far cache' access time was roughly the same for all CPUs since Zen 2).

marees · Aug 16, 2024

tsamolotoff said:
While I completely agree, the issue here isn't fabric or IMC, it's something that happens when one CCD accesses the nonlocal L3 cache of the second CCD. As Ryan noted, it's slower than memory access and on par with the typical access time of non-local memory of a 2S/4S/8S system, which should never happen considering this is the same IO die as AMD had with Zen 4 (and as we see, this 'far cache' access time was roughly the same for all CPUs since Zen 2).

Wouldn't AMD have caught this in their internal testing?

Hans Gruber · Aug 16, 2024

tsamolotoff said:
While I completely agree, the issue here isn't fabric or IMC, it's something that happens when one CCD accesses the nonlocal L3 cache of the second CCD. As Ryan noted, it's slower than memory access and on par with the typical access time of non-local memory of a 2S/4S/8S system, which should never happen considering this is the same IO die as AMD had with Zen 4 (and as we see, this 'far cache' access time was roughly the same for all CPUs since Zen 2).

From my understanding, Zen 4 CPUs are only stable up to DDR5-6000, in some cases DDR5-6400. With Zen 5 it's more of the same. Newegg has memory kits that go up to 8400 MT/s. I am adding the DDR5 memory issues with Ryzen to the Zen 5 problem. I figured they would have solved that problem. Lisa said that Zen 4 was going to be a memory OCer's dream. They said the sweet spot was 6400 MT/s. All the Ryzen builders are focusing on DDR5-6000 sticks with CAS 30 timings or better.

There is no advancement in DDR5 memory support with Zen 5. The memory market went from DDR5-7200 kits in the early Zen 4 days to 8400 MT/s kits now. Zen 5 cannot even support DDR5-7000 or better. There should be no limitation on memory speeds in a Zen 5 system or a Zen 4 system. The sky should be the limit.

Abwx · Aug 16, 2024

Hans Gruber said:
Zen 5 cannot even support DDR5-7000 or better. There should be no limitation on memory speeds in a Zen 5 system or a Zen 4 system. The sky should be the limit.

In their Zen 5 specific gaming review, ComputerBase used RAM at up to 8000 MT/s and say that it ran smoothly at this speed, and that's with a 2 x 24 GB kit; even the 7950X3D gets up to 7200.

Ryzen 9 9950X & 9900X: Gaming-Benchmarks

This test analyzes the gaming performance of the Ryzen 9 9950X and Ryzen 9 9900X, including benchmarks with RAM OC up to DDR5-8000.

www.computerbase.de

moinmoin · Aug 16, 2024

Doug S said:
Maybe Linux slants more towards the performance side of power management (more aggressive with clock ramping, which would help Geekbench) and Windows slants more towards saving power.

First of all, this is highly configurable, both under Windows and under Linux. But there may well be a slant, in that while the Windows scheduler is about the worst one out there, energy saving features are usually supported earlier under Windows than under Linux. But I doubt Ryzen 9000's performance oddities can be ascribed to (correctly working) energy saving features.

If benchmarks were to include joules used, one could see whether power usage differs along with performance, or whether something else must be amiss.
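
A rough sketch of the joules idea, assuming a benchmark run logs package power samples with timestamps (the function names and the sample format are made up for illustration, not any tool's API). Energy is just power integrated over time, here with the trapezoidal rule:

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("need exactly one power sample per timestamp")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two neighbouring samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def average_power_w(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Average power over the logged interval: energy divided by duration."""
    duration = timestamps_s[-1] - timestamps_s[0]
    if duration <= 0:
        raise ValueError("need a positive logging interval")
    return energy_joules(timestamps_s, power_w) / duration
```

Two runs of the same fixed workload can then be compared on joules as well as on score: if the slower run also burns fewer joules, power management is a plausible suspect; if it burns the same or more, something else is amiss.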

Hans Gruber · Aug 16, 2024

Abwx said:
In their Zen 5 specific gaming review, ComputerBase used RAM at up to 8000 MT/s and say that it ran smoothly at this speed, and that's with a 2 x 24 GB kit; even the 7950X3D gets up to 7200.

Ryzen 9 9950X & 9900X: Gaming-Benchmarks

This test analyzes the gaming performance of the Ryzen 9 9950X and Ryzen 9 9900X, including benchmarks with RAM OC up to DDR5-8000.

www.computerbase.de

Hopefully all of the current Zen 5 issues and the DDR5 memory issues will be worked out in the upcoming AGESA bios updates.

del42sa · Aug 16, 2024

tsamolotoff · Aug 16, 2024

Hans Gruber said:
All the Ryzen builders are focusing on DDR5-6000 sticks with CAS 30 timings or better.

Not all, and not really. Also, you can't really compare AMD's 1:1 mode to anything Intel does, as Intel always has 1:2 mode enabled. And in 1:2 mode, it's not hard to get to 7800-8000, sometimes even with a 2DPC board. In any case, it's not relevant to this interchiplet communication issue; slower or faster RAM shouldn't affect this at all, as far as I can understand.

In any case, my current setup (cheap $110 24 Gbit Chinese sticks) works just fine in memory-intensive calculations (actual speedup as compared to 6400 1:1 mode with same timings in terms of nanoseconds).
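
For anyone following the 1:1 vs 1:2 talk, a small sketch of the arithmetic, assuming "1:1" and "1:2" refer to the memory controller clock (UCLK) relative to the memory clock (MEMCLK); that reading and the function names are assumptions for illustration. DDR transfers twice per clock, so the memory clock in MHz is half the MT/s figure, and a timing given in cycles converts to nanoseconds as cycles x 2000 / (MT/s):

```python
def memory_clocks_mhz(data_rate_mts: float, uclk_divider: int = 1) -> dict:
    """Return MEMCLK and UCLK in MHz for a DDR data rate in MT/s.

    uclk_divider=1 models 1:1 mode, uclk_divider=2 models 1:2 mode.
    """
    if uclk_divider not in (1, 2):
        raise ValueError("only 1:1 and 1:2 modes are modelled here")
    memclk = data_rate_mts / 2  # double data rate: two transfers per clock
    return {"memclk": memclk, "uclk": memclk / uclk_divider}


def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """Convert a CAS latency in MEMCLK cycles to nanoseconds."""
    # One MEMCLK cycle lasts 2000 / data_rate nanoseconds.
    return cl_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    print(memory_clocks_mhz(6000, uclk_divider=1))  # DDR5-6000 in 1:1 mode
    print(memory_clocks_mhz(8000, uclk_divider=2))  # DDR5-8000 in 1:2 mode
    print(cas_latency_ns(30, 6000), "ns at DDR5-6000 CL30")
    print(cas_latency_ns(38, 8000), "ns at DDR5-8000 CL38")
```

So a faster kit in 1:2 mode runs the controller slower than DDR5-6000 in 1:1 does, which is why the comparison in nanoseconds matters more than the MT/s number on the box.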

tsamolotoff · Aug 16, 2024

marees said:
Wouldn't AMD have caught this in their internal testing ?

Well, as I've noted earlier, this might have been some sort of conscious tradeoff made by AMD, maybe related to the fact that the IO die for next gen Epyc is new and not the same as with SOHO Zen 5.

Abwx · Aug 16, 2024

Hans Gruber said:
Hopefully all of the current Zen 5 issues and the DDR5 memory issues will be worked out in the upcoming AGESA bios updates.

For the time being there are no memory issues, since max RAM frequency has been extended significantly; the only issue so far is the inter-CCD latency, which is unexpected and still not explained.

moinmoin said:
If benchmarks were to include joules used, one could see whether power usage differs along with performance, or whether something else must be amiss.

Looking at ComputerBase's numbers, AMD did a very good job on the efficiency front, and comparisons with the 7000 series are pointless since that's not their competition.

At 142W PPT the 9700X is 7% faster than the 14600K, which pulls 191W; it would actually need something like 112W to just match the Intel chip, and that's 70% better perf/watt at isoperf.

The 9900X at 162W is just 3% below the 14700K, which pulls 266W, so 180W would be enough to match the Intel opponent, and that would amount to 47% better perf/watt at isoperf.

Last but not least, at 200W the 9950X is 10% faster than the 14900K@275W, and 150W is enough to match Intel's top contender, which would be a massive 83% better perf/watt at isoperf.
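
The iso-performance arithmetic above can be checked with a few lines. At equal performance, the perf/watt advantage is simply the ratio of the two power figures minus one. The "matching power" inputs are the post's own estimates, and the function name is made up for illustration:

```python
def isoperf_efficiency_gain_pct(rival_power_w: float, matching_power_w: float) -> float:
    """Perf/watt advantage in percent when both chips deliver the same performance."""
    if rival_power_w <= 0 or matching_power_w <= 0:
        raise ValueError("power figures must be positive")
    return (rival_power_w / matching_power_w - 1.0) * 100.0


if __name__ == "__main__":
    # (Intel power in W, estimated AMD power in W needed to match it), as quoted in the post.
    cases = {
        "9700X vs 14600K": (191, 112),
        "9900X vs 14700K": (266, 180),
        "9950X vs 14900K": (275, 150),
    }
    for name, (rival_w, matching_w) in cases.items():
        gain = isoperf_efficiency_gain_pct(rival_w, matching_w)
        print(f"{name}: {gain:.1f}% better perf/watt at isoperf")
```

Each result lands within a point of the quoted percentages, so the conclusion stands or falls with the estimated matching wattages rather than with the arithmetic.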

MS_AT · Aug 16, 2024

tsamolotoff said:
In any case, it's not relevant to this interchiplet communication issue, slower or faster RAM shouldn't affect this at all, as far as I can understand.

Actually, somebody with a dual-CCD Zen 5 chip could try varying the RAM timings while keeping the IF frequency constant, and run a core-to-core latency test. If the curve weren't flat, that would mean the CCDs are synced through RAM, which would be surprising to say the least.
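
A minimal sketch of how the results of that experiment could be judged, assuming someone has already collected one cross-CCD latency figure per RAM timing configuration with their latency tool of choice. The function names and the 5% tolerance are arbitrary assumptions, not an established threshold:

```python
from typing import Mapping


def latency_spread_pct(latency_ns_by_config: Mapping[str, float]) -> float:
    """Spread between the slowest and fastest configuration, in percent of the fastest."""
    values = list(latency_ns_by_config.values())
    if not values:
        raise ValueError("need at least one measurement")
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("latencies must be positive")
    return (highest - lowest) / lowest * 100.0


def curve_is_flat(latency_ns_by_config: Mapping[str, float], tolerance_pct: float = 5.0) -> bool:
    """True if cross-CCD latency barely moves while RAM timings change."""
    return latency_spread_pct(latency_ns_by_config) <= tolerance_pct
```

Feed it a mapping from timing profile to measured cross-CCD latency, all taken at the same IF frequency. A flat curve says RAM is not in the path; a clear slope would point at the CCDs syncing through memory.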

lucasworais · Aug 16, 2024

Abwx said:
comparisons with the 7000 series are pointless

No. LoL

sl0519 · Aug 16, 2024

These are the preliminary ARL benchmarks, from Jaykihn at Xwitter.
https://x.com/jaykihn0

As a comparison, I pulled Uniko's testing of the recent 9950X.

Looking at GB 5.4.5, the QS scored 2455, which trails the 9950X by around 5%. In multicore, however, the QS beats the 9950X by almost 8%, but also keep in mind that GB doesn't scale well with more cores (correct me if I'm wrong). It's impressive if we consider the fact that the score was obtained without HT, but for people looking for a generational improvement, this is exactly Zen 5% lol.

Jan Olšan · Aug 16, 2024

moinmoin said:
First of all, this is highly configurable, both under Windows and under Linux. But there may well be a slant, in that while the Windows scheduler is about the worst one out there, energy saving features are usually supported earlier under Windows than under Linux. But I doubt Ryzen 9000's performance oddities can be ascribed to (correctly working) energy saving features.

If benchmarks were to include joules used, one could see whether power usage differs along with performance, or whether something else must be amiss.

It could also be the scheduler never giving a single process maximum cycles available, so that QoS is better for multitasking/GUI responsiveness.

Also Windows probably does more work in background compared to a Linux system and I think that is a factor in lots of the "game X runs faster on Linux OMG" articles. IMHO these differences should not be automatically ascribed to "bad scheduler".

I used to hear the same about memory management. Then I read a writeup by somebody who actually knew how memory management works in Windows and Linux, the strengths/weaknesses and it was not a simple story, at all.

Internet randos also routinely spit on NTFS but it's a FS that is so ridiculously reliable IMHO (while I recall data-endangering issues being discussed for almost any Linux FS.. except JFS which went out of fashion by the time I started following this.)

Det0x · Aug 16, 2024

Abwx said:
In their Zen 5 specific gaming review, ComputerBase used RAM at up to 8000 MT/s and say that it ran smoothly at this speed, and that's with a 2 x 24 GB kit; even the 7950X3D gets up to 7200.

Ryzen 9 9950X & 9900X: Gaming-Benchmarks

This test analyzes the gaming performance of the Ryzen 9 9950X and Ryzen 9 9900X, including benchmarks with RAM OC up to DDR5-8000.

www.computerbase.de

Seems like mainstream reviewers have finally started to run the higher speeds that "we overclockers" have run for like 1.5 years already 👍
Just too bad it's slow EXPO timings atm

Abwx · Aug 16, 2024

lucasworais said:
No. LoL

Why, is the 7000 series manufactured and sold by Intel?

Det0x said:
Seems like mainstream reviewers have finally started to run the higher speeds that "we overclockers" have run for like 1.5 years already 👍
Just too bad it's slow EXPO timings atm

Those are the timings he apparently used:

DDR5-8000 and the timings 38-48-48-98

He also said that the 9600X/9700X/9950X worked flawlessly but the 9900X did not. He had no time to tweak the timings, since he was hard pressed by the short time available before releasing the review, so he used only out-of-the-box timings, and added that the 9900X could eventually work as well at 8000 MT/s with some tweaks.

Det0x · Aug 16, 2024

Abwx said:
Why, is the 7000 series manufactured and sold by Intel?

Those are the timings he apparently used:

He also said that the 9600X/9700X/9950X worked flawlessly but the 9900X did not. He had no time to tweak the timings, since he was hard pressed by the short time available before releasing the review, so he used only out-of-the-box timings, and added that the 9900X could eventually work as well with some tweaks.

tRFC+tREFI are the timings that will give the highest performance gain, sub-timings matter also! ;-)

tsamolotoff · Aug 16, 2024

Abwx said:
Those are the timings he apparently used:

I'd say that RFC, REFI, RRDL/RRDS/FAW and SCLs are more important for gaming performance than primaries

Abwx · Aug 16, 2024

Det0x said:
tRFC is the timing that will give the highest performance gain, sub-timings matter also! ;-)

tsamolotoff said:
I'd say that RFC, REFI, RRDL/RRDS/FAW and SCLs are more important for gaming performance than primaries

If the reviewers keep the chips for some time we'll surely have revised reviews for the tests with OCed RAM. The current reviews were rushed: at ComputerBase the CPU perf reviewer, Volker Rißka, said on their forum that he got only 3 days to make all the perf measurements, and I guess it was no different for other outlets.

majord · Aug 16, 2024

sl0519 said:
These are the preliminary ARL benchmarks, from Jaykihn at Xwitter.
https://x.com/jaykihn0

As a comparison, I pulled Uniko's testing of the recent 9950X.

Looking at GB 5.4.5, the QS scored 2455, which trails the 9950X by around 5%. In multicore, however, the QS beats the 9950X by almost 8%, but also keep in mind that GB doesn't scale well with more cores (correct me if I'm wrong). It's impressive if we consider the fact that the score was obtained without HT, but for people looking for a generational improvement, this is exactly Zen 5% lol.

Actually, by the numbers you posted there, it's Zen 13%.

Det0x · Aug 16, 2024

Abwx said:
If the reviewers keep the chips for some time we'll surely have revised reviews for the tests with OCed RAM. The current reviews were rushed: at ComputerBase the CPU perf reviewer, Volker Rißka, said on their forum that he got only 3 days to make all the perf measurements, and I guess it was no different for other outlets.

Yeah, it's kinda strange I got 2 months to play with my sample before release, and reviewers got ~1 week 🤔

Josh128 · Aug 16, 2024

Hmm.. has nobody else yet made the correlation between the horrible cross CCX latency on Strix and the horrible cross CCD/CCX latency on Granite Ridge? They are very similar, with GNR having some worse outliers. I think this puts to bed the idea that it has anything to do with AGESA or packaging. This is a conscious design decision.

Crazy thought, but what if it has something to do with power efficiency management? Lower speed cross CCX comms could possibly mean lower power consumption? I think if that's the case it seems like a ridiculous choice, but if mobile was the priority, it might be. Who knows?

StefanR5R · Aug 16, 2024

Hitman928 said:
From what we have so far, the driver does not do anything for parking the cores, that behavior is controlled by the CPU/firmware itself. The driver is there to prevent thread migration from the active CCD to the inactive CCD unnecessarily.

Isn't "core parking" an interaction between the Windows process scheduler, power management driver, and CPU? (That is, the process scheduler avoids putting tasks onto a subset of logical CPUs, and idle CPUs go into a low power state. The latter was the whole reason why "core parking" was invented back in the day; now it is used/abused for quite different purposes, as processors have become increasingly complex, making it ever more difficult for OSs' process schedulers and power managers to make good decisions.)

Hitman928 said:
The 7950x wouldn't park cores on the 2nd CCD.

Although maybe it could if one wanted to(?).

Benefits of spreading a lowly…medium parallel workload onto both CCXs of dual-CCX Ryzens:

More cache is available.
More GMI link width is available.

Benefits of concentrating a lowly…medium parallel workload onto one of the CCXs of dual-CCX Ryzens:

Threads which happen to interact share cache lines.

The latter is important in such cases as — for example — increasing lo-res/low-detail video game FPS from 450 to 600, or to efficiently run multithreaded FFTs to search for large primes.
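
To try the "concentrate on one CCX" case by hand on Linux, a process can be pinned with `os.sched_setaffinity`. The sketch below assumes a 16-core dual-CCX part on which Linux numbers the first hardware threads 0..15 and their SMT siblings 16..31; that numbering is an assumption and should be verified with `lscpu -e` before relying on it. The helper names are made up for illustration:

```python
import os


def ccx_cpu_set(ccx_index: int, cores_per_ccx: int = 8, total_cores: int = 16, smt: bool = True) -> set:
    """Logical CPU numbers belonging to one CCX, under the assumed numbering scheme."""
    if cores_per_ccx <= 0 or total_cores % cores_per_ccx != 0:
        raise ValueError("total_cores must be a multiple of cores_per_ccx")
    if not 0 <= ccx_index < total_cores // cores_per_ccx:
        raise ValueError("ccx_index out of range")
    first = ccx_index * cores_per_ccx
    cpus = set(range(first, first + cores_per_ccx))
    if smt:
        # Assumed layout: the SMT sibling of core N is logical CPU N + total_cores.
        cpus |= {cpu + total_cores for cpu in cpus}
    return cpus


def pin_current_process(cpus: set) -> set:
    """Restrict the current process to the requested CPUs that it may actually use."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    allowed = cpus & os.sched_getaffinity(0)
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, allowed)
    return allowed
```

Calling `pin_current_process(ccx_cpu_set(0))` before starting worker threads keeps interacting threads on one L3, while leaving the affinity alone lets the scheduler spread them over both CCXs for more cache and link width.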

inf64 said:
What I would like is AMD to come out and explain how they got the application Performance numbers they showed in the charts (which is not reproducible by anyone else right now). There is no way they achieved those numbers and everyone else is doing something wrong.

First of all, it was borderline impossible to reproduce any of this for launch day reviews: In the tight time frame from sample reception until the end of the review embargo, reviewers must have had their hands full with going through multiple BIOS updates and Windows reinstallations, perhaps going through questions and answers with AMD or other reviewers because they had a hard time making things work, and getting as much of their own planned set of benchmarks as possible done and written up.

However, now that the rush to publish is over, somebody could take the sparse details which are given in the infamous endnotes GNR-01…GNR-04, tell themselves "imagine you had to produce some bar charts for a presentation which your very CEO is going to give at Computex", and then see if they could achieve what AMD subtitled with "* all results are 'up to'". There is certainly quite some room to optimize here and un-optimize there; it's mostly a question of whether you are more afraid of getting fired in May because the numbers aren't looking good enough, or of getting fired in August because 3rd parties won't match these numbers on different setups.

tsamolotoff said:
https://twitter.com/x/status/1824103659019628717

Josh128 said:
AMD's response to this inquiry will be very telling. I doubt they admit what is really going on. Is it a regression that was necessary due to architectural design choices, a result of halting design at a specific point to meet an internal launch date goal, or is it a silicon level bug that might or might not be fixable by a new stepping or microcode?

One thing which they appear to have done in Zen 5 relative to Zen 3/4, according to Granite Ridge die shots, is to implement the L3 cache within a considerably reduced area. Right now I can't see though how this could affect penalties to cross-CCX traffic.

The other obvious change in Zen 5 is that they co-designed for 8c and 16c CCXs. But again it is not obvious to me how this pertains to traffic outside a CCX.

Third, Turin has got a new IOD, which notably has to support a) more GMI links than Genoa's IOD, b) up to 16 rather than 8 cores per CCX. Maybe this had repercussions on what they did with the CCD's GMI links or on how L3 tags are managed, all while they insisted on not updating the client IOD alongside.

marees said:
Wouldn't AMD have caught this in their internal testing?

Likely they have; the question is what they determined the impact of this on real workloads to be.

Hans Gruber · Aug 16, 2024

StefanR5R said:
Isn't "core parking" an interaction between the Windows process scheduler, power management driver, and CPU? (That is, the process scheduler avoids putting tasks onto a subset of logical CPUs, and idle CPUs go into a low power state. The latter was the whole reason why "core parking" was invented back in the day; now it is used/abused for quite different purposes, as processors have become increasingly complex, making it ever more difficult for OSs' process schedulers and power managers to make good decisions.)

Although maybe it could if one wanted to(?).

Benefits of spreading a lowly…medium parallel workload onto both CCXs of dual-CCX Ryzens:

More cache is available.

More GMI link width is available.

Benefits of concentrating a lowly…medium parallel workload onto one of the CCXs of dual-CCX Ryzens:

Threads which happen to interact share cache lines.

The latter is important in such cases as — for example — increasing lo-res/low-detail video game FPS from 450 to 600, or to efficiently run multithreaded FFTs to search for large primes.

First of all, it was borderline impossible to reproduce any of this for launch day reviews: In the tight time frame from sample reception until the end of the review embargo, reviewers must have had their hands full with going through multiple BIOS updates and Windows reinstallations, perhaps going through questions and answers with AMD or other reviewers because they had a hard time making things work, and getting as much of their own planned set of benchmarks as possible done and written up.

However, now that the rush to publish is over, somebody could take the sparse details which are given in the infamous endnotes GNR-01…GNR-04, tell themselves "imagine you had to produce some bar charts for a presentation which your very CEO is going to give at Computex", and then see if they could achieve what AMD subtitled with "* all results are 'up to'". There is certainly quite some room to optimize here and un-optimize there; it's mostly a question of whether you are more afraid of getting fired in May because the numbers aren't looking good enough, or of getting fired in August because 3rd parties won't match these numbers on different setups.

One thing which they appear to have done in Zen 5 relative to Zen 3/4, according to Granite Ridge die shots, is to implement the L3 cache within a considerably reduced area. Right now I can't see though how this could affect penalties to cross-CCX traffic.

The other obvious change in Zen 5 is that they co-designed for 8c and 16c CCXs. But again it is not obvious to me how this pertains to traffic outside a CCX.

Third, Turin has got a new IOD, which notably has to support a) more GMI links than Genoa's IOD, b) up to 16 rather than 8 cores per CCX. Maybe this had repercussions on what they did with the CCD's GMI links or on how L3 tags are managed, all while they insisted on not updating the client IOD alongside.

Likely they have; the question is what they determined the impact of this on real workloads to be.

So what you are saying, in a nutshell: a lot of people are getting fired at AMD for Zen 5 in September.
